Why disk-based backup?

CIOL Bureau

NEW DELHI, INDIA: What comes to mind first when we hear “backup”? Tape! Tape has long been established as a backup medium for critical enterprise data. But with the capacity of magnetic disk drives increasing rapidly and the cost per gigabyte of storage falling equally rapidly, backing up data onto disk has become a viable alternative, or addition, to tape backup. In this article, we take a look at the important trends that are driving this change, and at things to consider while deploying disk-based backup.

In a 24x7 world, businesses are increasingly dependent – critically – on continuous access to data. At the same time, data is growing exponentially across businesses. While the former is forcing smaller backup windows, the latter actually demands larger backup windows.
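To make this tension concrete, the time needed for a full backup can be estimated as data size divided by sustained throughput. The short Python sketch below uses made-up figures (a 100 MB/s backup stream and data sets of 500 GB to 2,000 GB); these numbers are assumptions chosen for illustration, not measurements of any product.

```python
def backup_window_hours(data_gb: float, throughput_mb_per_s: float) -> float:
    """Hours needed to copy data_gb at a sustained throughput (1 GB = 1024 MB)."""
    seconds = (data_gb * 1024) / throughput_mb_per_s
    return seconds / 3600


# Hypothetical figures: the same backup stream, with the data set doubling.
for data_gb in (500, 1000, 2000):
    hours = backup_window_hours(data_gb, throughput_mb_per_s=100)
    print(f"{data_gb} GB needs about {hours:.1f} hours")
```

At a fixed throughput the required window grows in direct proportion to the data, which is why growing data and shrinking windows pull in opposite directions.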

More importantly, when data must be recovered after loss or corruption, there is little time to spare, because a longer recovery means longer downtime, and downtime has a cost.

Not only is tape backup slow, recovery from tape is also slow, due to the sequential access nature of data stored on tape. What is more, if data needs to be retrieved from a tape which is kept off-line, it takes a long time to retrieve the correct tape and then restore the data. And then, accurate data recovery from tape is not guaranteed, because tape media is prone to errors.

Disk, being a random-access medium, lends itself easily to quick data recovery. And the advanced technologies available today, such as snapshots, help ensure accurate and consistent data recovery from disk.

Snapshots are point-in-time copies of data which cannot be modified (i.e. they are “frozen” images of the data). In case of data loss, you can either bring the snapshot copy online in place of the original (plain data recovery), or you can bring the copy online in a new location (“cloning”).
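The two recovery options can be sketched with a toy model. The Python class below is only an illustration under simplifying assumptions: the `Volume` class and its method names are hypothetical, blocks live in an in-memory dictionary, and no real storage system is implemented this way.

```python
from types import MappingProxyType


class Volume:
    """Toy block store with read-only, point-in-time snapshots."""

    def __init__(self):
        self.blocks = {}       # live data: block number -> bytes
        self.snapshots = {}    # snapshot name -> frozen block map

    def write(self, block_no, data):
        self.blocks[block_no] = data

    def snapshot(self, name):
        # Freeze the current block map; the proxy rejects any modification.
        self.snapshots[name] = MappingProxyType(dict(self.blocks))

    def restore(self, name):
        """Plain recovery: bring the snapshot online in place of the original."""
        self.blocks = dict(self.snapshots[name])

    def clone(self, name):
        """Cloning: bring the snapshot online as a new, separate volume."""
        new_volume = Volume()
        new_volume.blocks = dict(self.snapshots[name])
        return new_volume


vol = Volume()
vol.write(0, b"good data")
vol.snapshot("nightly")
vol.write(0, b"corrupted")       # damage happens after the snapshot

copy = vol.clone("nightly")      # inspect the old data in a new location
vol.restore("nightly")           # or roll the original back in place
```

After the last two lines, both `copy` and `vol` hold the data as it was when the snapshot was taken, while the snapshot itself stays untouched.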

Of course, tape backups cannot be completely eliminated. Tape has the advantage that it can be transported to another location and kept securely in a “vault”. So a backup administrator can use a combination of disk-based backup and tape-based backup to achieve data retention, recovery, security and cost objectives. Using disk-based backups allows us to keep more recent copies of data online for quick recovery. At the same time, using tape backup allows us to keep older copies of data offline for a long period of time.

Backup consistency

It is important to ensure that the data backed up on disk is consistent, because then it can be quickly brought online in case of data loss or disaster. Consistency can be at different levels: crash consistency, file system consistency or application-level consistency. Usually it is not difficult to ensure that a snapshot copy is crash consistent. But getting file system consistency and application-level consistency requires a little more effort.

In a NAS (network attached storage) environment, the software on the storage system runs a quick “file system consistency check” before initiating a snapshot copy, to ensure that snapshots are file system consistent. In the case of SAN (storage area network), the file system runs on the host and the snapshot mechanism needs to synchronize the host file system actions with the storage system. If some data is cached in the host's memory, this data should be flushed to disk before initiating the snapshot copy. The time taken to complete the copy should be small, since most host file systems can tolerate only short windows of disk unavailability. These actions on the host side ensure that the snapshot copy is “file system consistent”.
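The flush-before-snapshot step can be illustrated on a single file. In the Python sketch below, `flush_then_snapshot` and the `take_snapshot` callback are hypothetical names; the callback merely stands in for whatever command a storage system offers to start a snapshot, and here it simply reads back what another reader can see in the file.

```python
import os
import tempfile


def flush_then_snapshot(file_obj, take_snapshot):
    """Push cached data down to disk, then trigger the snapshot.

    take_snapshot is a hypothetical callback standing in for the
    storage system's own snapshot command.
    """
    file_obj.flush()                # program buffer -> operating system cache
    os.fsync(file_obj.fileno())     # operating system cache -> disk
    return take_snapshot()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.bin")

    def read_back():
        # Stand-in snapshot: capture what is in the file right now.
        with open(path, "rb") as reader:
            return reader.read()

    with open(path, "wb") as f:
        f.write(b"pending write")   # may still sit in a memory buffer
        on_disk = flush_then_snapshot(f, read_back)

print(on_disk)
```

Without the flush, the stand-in snapshot could miss data that was still held in memory, which is the inconsistency the host-side actions are meant to prevent.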

For application-level consistency, the application needs to be part of the snapshot copy initiation process. If the application is caching some of the data in its memory, this data should be flushed to disk before making a snapshot. Here again, the time to complete the snapshot should be small, since most applications can tolerate only short windows of disk unavailability. Some database applications are now designed to run in a “hot backup” mode, where the application logs its I/O to memory temporarily while the storage is unavailable. These actions on the application side ensure that the snapshot copy is “application consistent”.
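A rough sketch of the “hot backup” idea, in Python, is shown below. The `HotBackupStore` class and its methods are hypothetical and heavily simplified: storage is a plain list, and the snapshot is just a copy of that list. Real databases implement this mode in their own, more elaborate ways.

```python
class HotBackupStore:
    """Toy application that holds writes in memory during a backup."""

    def __init__(self):
        self.disk = []           # records that have reached storage
        self.pending = []        # records logged in memory during backup
        self.in_backup = False

    def write(self, record):
        if self.in_backup:
            self.pending.append(record)   # storage is frozen; log in memory
        else:
            self.disk.append(record)

    def begin_backup(self):
        self.in_backup = True
        return list(self.disk)            # stand-in for the snapshot copy

    def end_backup(self):
        self.in_backup = False
        self.disk.extend(self.pending)    # apply the logged writes in order
        self.pending.clear()


store = HotBackupStore()
store.write("order-1")
snapshot = store.begin_backup()
store.write("order-2")                    # arrives while the snapshot runs
store.end_backup()
```

The snapshot contains only `order-1`, while the application never had to stop accepting writes; `order-2` reaches storage once the backup ends.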

Many modern filesystems have journaling capability, which is useful while restoring data from a snapshot copy. Both the filesystem data and its journal of changes are stored in the snapshot copy. The filesystem is able to recover the snapshot data quickly by inspecting the journal and either playing back the pending entries, or discarding them.
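The replay-or-discard decision can be sketched as follows. This Python example assumes, purely for illustration, that each journal entry carries a `committed` flag and that only committed entries are played back; the `recover` function and the entry layout are hypothetical, and actual journal formats differ from one file system to another.

```python
def recover(data, journal):
    """Return the data with committed journal entries applied.

    Entries that were never committed are discarded, so a half-finished
    change does not reach the recovered copy.
    """
    recovered = dict(data)
    for entry in journal:
        if entry["committed"]:
            recovered[entry["key"]] = entry["value"]   # play back
        # uncommitted entries are simply skipped (discarded)
    return recovered


# Hypothetical contents of a snapshot: file data plus its journal.
snapshot_data = {"a.txt": "v1", "b.txt": "v1"}
snapshot_journal = [
    {"key": "a.txt", "value": "v2", "committed": True},
    {"key": "b.txt", "value": "v2", "committed": False},
]

result = recover(snapshot_data, snapshot_journal)
print(result)
```

Because only the journal has to be inspected, recovery time depends on the number of pending entries rather than on the total size of the data.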

Additional uses of disk-based backup

Networked storage vendors like NetApp are introducing innovative ways to use backup copies of data for more than just backup purposes. One such technology is “cloning”, wherein a disk-based replica of the original LUN can be created. A “LUN clone” is a replica of the LUN as it was when a snapshot was taken. The clone can be made available to the same host (or another host) as an entirely new LUN. This cloned LUN does not need to occupy extra storage space, because it uses the snapshot copy on the disk. It can be mounted as a read-only copy for data verification. It can also be made a writeable clone, using extra disk space only for the changes in data. Writeable clones enable efficient test and development operations while introducing new applications or modifying and upgrading current applications. Cloned LUNs are also very useful in restoring a small part of the data out of the entire backup. This could be a single table in a database, or a single file out of several hundreds or thousands of files. The random-access nature of disks makes this an extremely easy task.
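Why a writeable clone needs extra space only for changes can be shown with a small overlay model. The Python sketch below is a simplified illustration, not a description of any vendor's implementation; the `WritableClone` class and its method names are hypothetical.

```python
class WritableClone:
    """Toy clone that shares snapshot blocks and stores only its own changes."""

    def __init__(self, snapshot_blocks):
        self._base = snapshot_blocks   # shared with the snapshot, never modified
        self._delta = {}               # blocks changed in this clone only

    def read(self, block_no):
        # Prefer the clone's own version; otherwise fall back to the snapshot.
        if block_no in self._delta:
            return self._delta[block_no]
        return self._base[block_no]

    def write(self, block_no, data):
        self._delta[block_no] = data

    def extra_blocks(self):
        """Number of blocks the clone occupies beyond the snapshot."""
        return len(self._delta)


snapshot_blocks = {0: b"table-A", 1: b"table-B", 2: b"table-C"}
clone = WritableClone(snapshot_blocks)
clone.write(1, b"test change")     # e.g. a trial upgrade on the clone

print(clone.read(0), clone.read(1), clone.extra_blocks())
```

Here the clone exposes all three blocks but consumes space for just the one it changed, and the snapshot underneath is left as it was.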

Disk-based backups are also very useful in mirroring data to a remote site for disaster recovery. Mirroring can be done over either a LAN or a WAN, and uses a small fraction of the available network bandwidth. However, mirroring cannot tolerate frequent changes to the source data, because of the risk of losing synchronization. In such situations, mirroring the disk-based backup is safer, because its contents do not change with time.

The other possibility is to implement tiered data retention with disk-based backups. Old backup copies can be moved to less expensive “archival” storage, which is nothing but an array of low-cost SATA disks, or they can be moved to tape. This “tiering” depends on the importance and criticality of the data to be archived, how frequently it needs to be referenced, and how quickly it must be available when called upon.
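Such a policy can be written down as a simple rule. In the Python sketch below, the function name, the tier labels and the thresholds (seven days, one read per month) are all hypothetical values picked for illustration; an actual policy would be set by each organization's own retention and recovery objectives.

```python
def choose_tier(age_days, reads_per_month, needs_fast_restore):
    """Pick a storage tier for a backup copy (illustrative thresholds only)."""
    if age_days <= 7 or needs_fast_restore:
        return "primary disk"      # recent or critical: keep online
    if reads_per_month >= 1:
        return "SATA archive"      # older but still referenced now and then
    return "tape"                  # old and rarely touched: keep offline


# Hypothetical backup copies: (age in days, reads per month, fast restore?)
copies = [(2, 10, False), (90, 3, False), (400, 0, False), (400, 0, True)]
tiers = [choose_tier(*c) for c in copies]
print(tiers)
```

The three inputs mirror the factors named above: how critical the data is, how often it is referenced, and how fast it must come back.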

Summary

With the cost of disks falling rapidly, disk-based backup is becoming more prevalent. In environments which need high uptime and cannot tolerate unavailability of data, this technology is getting serious attention. As more innovative uses of this technology are discovered over time, disk-based backup will soon become an important tool in the hands of every storage and backup administrator.

The author is a senior engineer at Network Appliance.
