Overview
Organizations with hundreds of gigabytes of storage often want to back up only a fraction of the data residing on their storage devices. One driver is the need to have backups of important files and folders available when a disaster strikes. Another is the need to identify and restore data from a backup set quickly, without spending time recovering an entire volume. Selective backup is also a cost-effective way (in terms of storage) to guard business-critical data, adding a second layer of data protection at a lower price.
What is File Level Backup?
File level backup is a way of backing up individual files and folders that reside on a storage volume. In contrast to volume level backup, where the user has to back up the entire volume, file level backup is defined at the level of files and folders. This gives users a finer degree of granularity.
Why File Level Backup?
As mentioned earlier, file level backup, in contrast to image level (volume) backup, gives organizations the ability to save on data storage cost. This type of backup has more to offer than just being cost effective, however. It is a faster way of backing up mission-critical data. It allows data operators to arrange the backup data into separate logical groups, giving IT staff more flexibility when defining their backup policies. This makes information lifecycle management (ILM) easier for companies and allows them to realize sizeable savings in their operational budget over time.
The primary advantage of backing up fewer files (which translates into fewer data blocks) is speed: the backup completes sooner, generates less network traffic and incurs minimal storage overhead. A second benefit appears at recovery time. With fewer files in the backup set, searching for a particular file or folder is quicker, and the overall time required for data recovery is generally shorter. This can reduce both recovery time and system downtime.
Moreover, this grouping of data also benefits IT staff from a data lifecycle management (DLM) standpoint. It allows them to add more meaning to their stored data sets and to label and sort their backup sets logically into different categories. For example, all financial documents can be grouped together into a ‘Finance’ backup set. Similarly, firms dealing with software R&D can group their source code into a ‘Dev’ backup set.
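The grouping idea can be sketched in a few lines of Python. This is only an illustration: the rule table, the folder names and the function `group_into_backup_sets` are hypothetical and are not part of any product described in this paper.

```python
from pathlib import PureWindowsPath

# Hypothetical rules: backup-set name -> folder names that belong to it.
BACKUP_SET_RULES = {
    "Finance": {"finance", "accounting"},
    "Dev": {"src", "build-scripts"},
}


def group_into_backup_sets(paths, rules=BACKUP_SET_RULES):
    """Assign each path to the first backup set whose folder names appear in it."""
    sets = {name: [] for name in rules}
    sets["Unassigned"] = []
    for path in paths:
        # Compare path components case-insensitively, as Windows does.
        parts = {part.lower() for part in PureWindowsPath(path).parts}
        for name, folders in rules.items():
            if parts & folders:
                sets[name].append(path)
                break
        else:
            # No rule matched this path.
            sets["Unassigned"].append(path)
    return sets
```

Calling `group_into_backup_sets` with a list of file paths returns a dictionary keyed by backup set name, which a backup policy could then schedule set by set.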
Preparing for the Disaster
Figure 1 shows a conceptual representation of a system implementing file level backups. The components that constitute the overall system are a workstation (client box), network connectivity (such as a LAN) and a backup server.
Figure 1: High level disaster recovery system connectivity
A high level analysis of the system suggests that this process flow performs well in a typical disaster recovery system design. The following describes the minimum set of features a typical disaster recovery (DR) product has to offer in order to be useful for data operators. These include, but are not limited to, the following:
- For data protection and better handling of a disaster situation, backups should be written to fast, reliable, robust and fault-tolerant data storage, such as a NAS, SAN or IP SAN. Hence, the DR product should support backup storage on one of those devices.
- Automatic volume recovery is another fundamental feature, since it allows near real-time recovery of data. It helps restore the system to the last known good state that was backed up before a major system corruption occurred.
- Another useful feature is the ability to restore data from a selected group of backup sets. This type of point-in-time recovery allows more control in a DR situation, i.e. a user can restore data to any known good state that was backed up. When evaluating any DR product, the capability to selectively restore data to a particular point in time should be considered a must-have feature.
- Data compression is another important feature that can improve both time and space utilization. With compression, one should see not only a drop in overall network traffic during the backup process but also better bandwidth utilization and reduced storage consumption.
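The compression point in the list above can be illustrated with a minimal sketch. It uses zlib from the Python standard library purely as a stand-in; the paper does not state which compression algorithm a given DR product uses, and the helper name `compression_savings` is hypothetical.

```python
import zlib


def compression_savings(data: bytes, level: int = 6):
    """Return (original size, compressed size, fraction saved) for one payload."""
    compressed = zlib.compress(data, level)
    # Guard against division by zero for an empty payload.
    saved = 1 - len(compressed) / len(data) if data else 0.0
    return len(data), len(compressed), saved
```

Fewer bytes after compression means fewer bytes sent over the network and fewer bytes stored on the backup device. The actual saving depends heavily on the data: repetitive text compresses well, while data that is already compressed may not shrink at all.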
Disaster Scenario
Taking backups is just the starting point for securing critical data. What is crucial is the ability to anticipate disasters that can occur in a typical corporate environment. This list includes, but is not limited to, catastrophic events such as fires, floods or other “acts of God”, hardware failures such as disk crashes, network or other communication malfunctions, malicious data modifications, accidental data updates and so on.
Let’s examine a typical disaster scenario that can happen in the day-to-day routine of any organization. Analyzing this situation will help us define a backup strategy that prepares for a similar accident, one that could have a negative effect on a company’s performance.
Scenario: Consider a situation where an unforeseen event (such as a power surge) has damaged business-critical data such as financial, inventory and/or CRM-related documents and spreadsheets. The overall health of the system is, however, intact, and it is running with no other issues.
Backup Strategy: To recover from such a disaster while minimizing system downtime, the data operator needs to take a proactive approach in defining backup policies. If the data is highly valuable to the business, one should not only define a policy for volume level backups of critical disk storage, but also schedule file level backups.
To make a file level backup set available as early as possible, it is desirable to run the file level backup of important files before running the full volume level backup. Suppose, for example, that a disaster occurs at hour t-1, whereas the file level backup was completed at hour t-2. In that case the data operator would be able to recover from the disaster using the available file level backup.
Figure 2: Time line showing file level and volume level backups
This, however, would not be possible at all if only a full volume level backup was scheduled, as it will take significantly longer for the volume backup to complete than it will for a selective file level backup.
Figure 2 shows a timeline for a hypothetical chain of activities occurring during a working day at a company. In essence, having a file level backup in addition to a volume backup improves the odds of recovering from a disaster that compromises important data files on the system, and provides an extra layer of safety for managing business-critical data.
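The timeline reasoning can be expressed as a small sketch that picks the most recent backup completed before the disaster. The `Backup` record and the function `latest_usable_backup` are hypothetical names introduced for illustration. The usage example below also assumes that the volume level backup would finish at hour t, which the text does not state explicitly.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Backup:
    kind: str            # "file" or "volume"
    completed_at: float  # hour at which the backup finished


def latest_usable_backup(backups, disaster_at) -> Optional[Backup]:
    """Return the most recent backup that finished before the disaster, if any."""
    usable = [b for b in backups if b.completed_at < disaster_at]
    return max(usable, key=lambda b: b.completed_at, default=None)
```

With t = 10, a file level backup completed at hour 8 (t-2) and a disaster at hour 9 (t-1), the function returns the file level backup. If only a volume level backup finishing at hour 10 had been scheduled, it returns `None`, meaning nothing is available to restore from.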
Sonasoft Solution
Sonasoft, a leading provider of disaster recovery and business continuance solutions, released its SonaSafe for File Systems product a couple of years ago. This product has now been enhanced to add file level backup to its existing storage backup and restore functionality. It allows users to group their data logically into separate backups, which makes information lifecycle management (ILM) easier for IT staff using the SonaSafe for File Systems product.
The new product release includes significant improvements to the user interface of the File System product, which allow users to pick files and/or folders (in addition to volumes) to be backed up. The interface goes a step further by letting the user either include the selection in the backup set or exclude it. This include-or-exclude selection process makes it easier for backup operators to create backup plans, since it gives them the flexibility to define the backup set in the way that makes the most sense for their business requirements.
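The include-or-exclude selection can be sketched as follows. This is a minimal illustration, not the product's interface or logic; the function names, the `mode` parameter and the sample paths are all hypothetical.

```python
from pathlib import PureWindowsPath


def _is_under(path, item):
    """True if path is the selected item itself or lies inside the selected folder."""
    p, f = PureWindowsPath(path), PureWindowsPath(item)
    return p == f or f in p.parents


def resolve_backup_set(volume_files, selection, mode="include"):
    """Turn a user selection into the list of files to back up.

    mode="include": back up only the selected files and folders.
    mode="exclude": back up everything on the volume except the selection.
    """
    if mode not in ("include", "exclude"):
        raise ValueError("mode must be 'include' or 'exclude'")

    def selected(path):
        return any(_is_under(path, item) for item in selection)

    if mode == "include":
        return [p for p in volume_files if selected(p)]
    return [p for p in volume_files if not selected(p)]
```

Including a small folder is convenient when only a few items matter, while excluding is convenient when almost everything matters except, say, a scratch directory.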
Based on the user's selection of files and/or folders, the SonaSafe File System agent extracts information from the on-disk NTFS file system. Using a proprietary algorithm, the File System agent (implemented as a process running on the machine that contains the data to be backed up) collects block/sector information for each file included in the backup set and stores this allocation information in the form of a bitmap file. Next, it performs a block-by-block backup of the selected files and folders. All data blocks are first compressed on the fly on the server where the original data resides, and are then sent over the network to the network attached storage.
The resulting backup set files are typically significantly smaller than the original data set, and can only be read and restored by the SonaSafe for File Systems agent, which applies the corresponding decompression algorithm.
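The general shape of a bitmap-driven, block-by-block backup can be sketched as below. This is not Sonasoft's proprietary algorithm, which the paper does not disclose. The block size, the in-memory `volume` bytes, the `file_allocations` map (file name to block numbers) and the use of zlib are all assumptions made for illustration. A real agent would read allocation data from NTFS metadata instead.

```python
import zlib

BLOCK_SIZE = 4096  # assumed block size for this illustration


def build_bitmap(total_blocks, file_allocations, selected_files):
    """Return a bitmap with one bit set for every block used by a selected file."""
    bitmap = bytearray((total_blocks + 7) // 8)
    for name in selected_files:
        for block in file_allocations[name]:
            bitmap[block // 8] |= 1 << (block % 8)
    return bitmap


def marked_blocks(bitmap, total_blocks):
    """Yield the block numbers whose bit is set, in ascending order."""
    for block in range(total_blocks):
        if bitmap[block // 8] & (1 << (block % 8)):
            yield block


def backup_blocks(volume, bitmap, total_blocks):
    """Compress each marked block on the source side; yield (block, payload)."""
    for block in marked_blocks(bitmap, total_blocks):
        start = block * BLOCK_SIZE
        yield block, zlib.compress(volume[start:start + BLOCK_SIZE])


def restore_blocks(backup, total_blocks):
    """Rebuild a volume image; blocks that were not backed up stay zero-filled."""
    image = bytearray(total_blocks * BLOCK_SIZE)
    for block, payload in backup:
        start = block * BLOCK_SIZE
        image[start:start + BLOCK_SIZE] = zlib.decompress(payload)
    return bytes(image)
```

Only the blocks marked in the bitmap are read, compressed and transferred, which is why the backup set can be much smaller than the volume. The sketch assumes the volume length is a whole multiple of `BLOCK_SIZE`.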
Summary
This paper provides a conceptual look at a solution implementing efficient file level backup of critical data and underlines its importance. Having this feature in place helps mid-size to large corporations with very large quantities of business-critical data handle their file system backup and data storage more effectively and efficiently. Corporate data operators can organize data into separate tiers by specifying backup policies at the granularity of files and folders. This allows them to keep data storage economical and to better arrange and organize the backup sets in their corporation’s data archives.
Sonasoft recognizes the importance of ILM for an organization and has designed the file level backup feature in the SonaSafe for File Systems product so that it will assist organizations in improving their ROI and maintaining low cost operations, thus improving overall profit margin for the company.
About Sonasoft®
Sonasoft Corp. automates the disk-to-disk backup and recovery process for Microsoft Exchange, SQL and Windows Servers with its groundbreaking SonaSafe® Point-Click Recovery® solutions. Designed to simplify and eliminate human error in the backup and recovery process, SonaSafe solutions also centralize the management of multiple servers and provide a cost-effective turnkey disaster recovery strategy for companies of all sizes. For more information, please visit www.sonasoft.com.
