Archiving to the cloud

CIOL Bureau

BANGALORE, INDIA: Cloud computing usage has grown rapidly in the last few years. While most of the focus has been on running compute jobs in the cloud, another use of the cloud is archiving.

Archiving to the cloud refers to keeping your most important data in the cloud, so that it remains safe in the event of a data centre outage.

You may want to keep your archival data in the cloud for several reasons: to reduce your data centre cost, to provide a second copy of your data, or to share it. First, however, let us understand archiving and look at the desirable characteristics of a storage system used for it.

Let us Understand Archiving

An ‘archive’ is a repository of information. Archiving refers to identifying data that has become static (it is no longer being modified), and then moving it from expensive primary storage to cheaper archival storage. The primary storage may keep just a pointer to the back-end copy of the data, so that the data can be retrieved whenever it is needed.

The archived data could include things like patient records at a hospital, or transaction logs at a bank. This data is important and needs to be retained for the long term, but it is not needed on a daily basis. Archiving rarely needed data frees up space in the primary storage, which helps overall performance for all types of data.

Archiving and Cloud — Made for each other

The desirable attributes of an archival system are large capacity, low cost of storage and good long-term retention capabilities.

By its very nature, an off-site cloud storage system improves data retention, because any disaster or other disturbance at your primary data centre does not affect the off-site storage. Many cloud storage systems are designed to be large, expandable and scalable.

If they are also designed to be low cost and capable of long-term retention, they are ideal candidates for data archival.

An example scheme

How can we use such a cloud storage system effectively? Let us illustrate this by an example.



In the systems shown in the figures above, a file server acts as the central repository of data for an office.

As far as the client machines and the end users are concerned, all of their data resides on the file server which is the primary data store.

However, unknown to them, software running on the primary store identifies candidate files for archival and moves them to low-cost secondary storage. The archived files are replaced by stubs, which are much smaller in size.
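The stub mechanism described here can be illustrated with a minimal sketch. It assumes a local directory stands in for the secondary store, and the `.stub` suffix, the JSON stub layout and the function names are all hypothetical, chosen only for illustration. A real archiving product would hook into the file system rather than rename files.

```python
import json
import shutil
import tempfile
from pathlib import Path

STUB_SUFFIX = ".stub"  # hypothetical marker for stub files


def archive_file(path: Path, secondary: Path) -> Path:
    """Move `path` to the secondary store and leave a small stub behind."""
    secondary.mkdir(parents=True, exist_ok=True)
    # Simplification: files are keyed by name only, so names must be unique.
    target = secondary / path.name
    size = path.stat().st_size
    shutil.move(str(path), str(target))
    # The stub records where the real data went, not the data itself.
    stub = path.with_name(path.name + STUB_SUFFIX)
    stub.write_text(json.dumps({"archived_to": str(target), "size": size}))
    return stub


def recall_file(stub: Path) -> Path:
    """Bring the archived file back to its original place and drop the stub."""
    info = json.loads(stub.read_text())
    original = stub.with_name(stub.name[: -len(STUB_SUFFIX)])
    shutil.move(info["archived_to"], str(original))
    stub.unlink()
    return original
```

In this sketch the stub is a few dozen bytes regardless of the size of the file it replaces, which is where the space saving on the primary store comes from.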

The storage administrator can set policies to select which files are archived. The policies can be based on the file’s location, size, age or a number of other factors. When a client machine requests an archived file, it should be fetched from the secondary store and handed to the client immediately.
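A policy of this kind might be expressed as follows. This is only a sketch: the `ArchivePolicy` class, its field names and the thresholds in the usage note are assumptions made for illustration, not the settings of any particular product.

```python
import time
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Optional

SECONDS_PER_DAY = 86400


@dataclass
class ArchivePolicy:
    """Hypothetical policy combining location, size and age."""
    under: str            # only files below this directory qualify
    min_size_bytes: int   # ignore files smaller than this
    min_age_days: float   # ignore files modified more recently than this

    def matches(self, path: str, size: int, mtime: float,
                now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        age_days = (now - mtime) / SECONDS_PER_DAY
        in_location = PurePosixPath(self.under) in PurePosixPath(path).parents
        return (in_location
                and size >= self.min_size_bytes
                and age_days >= self.min_age_days)
```

For example, `ArchivePolicy(under="/share/projects", min_size_bytes=1_000_000, min_age_days=180)` would select only files under `/share/projects` that are at least 1 MB in size and have not been modified for about six months.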

Before the advent of cloud storage systems, archiving was done primarily on tape or content-addressable storage. Now we can add the cloud as another archival storage option in this system.

What did we gain?

Let us consider the advantages of bringing the cloud into this architecture. To begin with, one can replace older tape-based archival storage with cloud storage. Cloud storage is cheaper and more scalable than tape storage systems. It can also be accessed from anywhere.

Multiple small offices can share a single large cloud storage account, which brings economies of scale.

Secondly, the data can be accessed from another file server very easily in case of a disaster or some other data loss event at the primary store. One just needs to keep a back-up of the cloud storage account credentials and other metadata in a safe location, to quickly recover all data.

If you also consider that tape access is sequential, and that the correct tapes must be labeled and loaded before data can be read, it is clear that the cloud is a viable alternative to tape storage for archival.

Object vs. Namespace view

Is cloud storage a type of object storage? Or does it offer a hierarchical file and directory view? The answer is 'Yes' to both these questions.

Several cloud storage systems (an example is EMC’s Atmos) are designed to be large, distributed, massively scalable storage systems. So naturally they store their data as objects, rather than in a traditional file system.

However, they also have the capability to store metadata with each object. This metadata may store the full path for the object in the host’s namespace. This gives us the ability to query for a piece of data stored on the cloud both with its object ID, and with its file path.
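The two access paths can be illustrated with a toy in-memory store. This is not the API of Atmos or any other real service: the `ToyObjectStore` class, its method names and the `host_path` metadata key are all hypothetical, used only to show how per-object metadata makes both lookups possible.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class StoredObject:
    data: bytes
    metadata: Dict[str, str] = field(default_factory=dict)


class ToyObjectStore:
    """In-memory stand-in for a cloud object store (illustration only)."""

    def __init__(self) -> None:
        self._objects: Dict[str, StoredObject] = {}

    def put(self, data: bytes, host_path: str) -> str:
        # The store assigns the object ID; the host's path rides along
        # as metadata attached to the object.
        object_id = uuid.uuid4().hex
        self._objects[object_id] = StoredObject(data, {"host_path": host_path})
        return object_id

    def get_by_id(self, object_id: str) -> bytes:
        # Direct lookup by object ID.
        return self._objects[object_id].data

    def find_by_path(self, host_path: str) -> Optional[str]:
        # Namespace lookup: in this toy version, a scan over the metadata.
        for object_id, obj in self._objects.items():
            if obj.metadata.get("host_path") == host_path:
                return object_id
        return None
```

In this toy version the ID lookup is a single dictionary access while the path lookup scans every object. A real service would index its metadata differently, so the sketch says nothing about actual performance.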

This ‘hybrid’ ability is very useful for archiving. Archiving software running on the host may need to recover data from the cloud storage if it loses its own metadata. The namespace view is very useful in such cases. During normal operations, the software can keep the object IDs in its own metadata, and so will use the object access methods (which are faster).
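The recovery case might look like the sketch below. It assumes the cloud store can list each object's ID together with its metadata, and that the host path was saved under a `host_path` key when the object was written. Both the listing format and the key name are hypothetical.

```python
from typing import Dict, Iterable, Tuple


def rebuild_index(
    listing: Iterable[Tuple[str, Dict[str, str]]]
) -> Dict[str, str]:
    """Rebuild the host's path -> object ID map from per-object metadata."""
    index: Dict[str, str] = {}
    for object_id, metadata in listing:
        path = metadata.get("host_path")
        if path is not None:  # skip objects that carry no path metadata
            index[path] = object_id
    return index
```

Once the map is rebuilt, the archiving software can go back to addressing objects by ID during normal operations.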

Summary

Cloud storage is an important category of emerging storage media. Many of its characteristics are especially suited for archiving. If you are considering cloud storage adoption, you should seriously look at archiving as your first foray into this field.

The author is a consultant software engineer at EMC Corporation.
