High cost, tape dependency restrict dedupe

Deepa

BANGALORE, INDIA: At a time when digital data over the metro is doubling every 18 months, data deduplication has emerged as the latest buzz in the storage industry. Data deduplication, or dedupe, removes redundant data over the metro.

“Redundant data is identified using hashing (the process of identifying unique segments of blocks) on the client, or in larger environments, a local backup can be taken and only unique data sent to a central site. The latter method is efficient because restore from recent backups is local, not over the WAN. So it’s much quicker to recover than if it had to be carried over the WAN,” says Vivek Anand, regional director, India & SAARC, CommVault, in an interview with CIOL, where he shares his views on deduplication. Excerpts:

CIOL: Is space optimization through data dedupe the same in all circumstances? 

Vivek Anand: Categorically not. Data deduplication comes in many forms and should not be seen only as a gain to be had at the end point (or dedupe appliance). If you consider the lifespan of data, it typically begins at a client and finally comes to rest in a long-term store - be that local tape, vaulted tape or even an online archive.

Before the data finishes its journey, it is stored and held for variable periods of time in different locations and on different media to satisfy recovery or access criteria (defined by regulatory requirements or business practice). If we isolate deduplication to an appliance, then we only gain the benefit of deduplication at a single point and for a limited period of time. We also miss the opportunity to rationalize data where it makes sense (for example, before it is transmitted over a network or stored on a device other than a dedupe appliance).

Also, not all data is the same, and optimization ratios differ based on the data type. For example, video and image data are less likely to contain similar blocks than document and message data and, as such, will have a detrimental effect on dedupe ratios.
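To illustrate why ratios depend on data type, here is a minimal Python sketch of hash-based block deduplication. It is not CommVault's implementation: the fixed 4 KB block size, the use of SHA-256 and the function name dedupe_ratio are all assumptions made for illustration, and the two byte strings are synthetic stand-ins for document-like and video-like data.

```python
import hashlib
import os


def dedupe_ratio(data: bytes, block_size: int = 4096) -> float:
    """Return logical size divided by unique stored size (hypothetical helper)."""
    seen = set()   # digests of blocks already stored
    stored = 0     # bytes that would actually be written
    for i in range(0, len(data), block_size):
        block = data[i:i + block_size]
        digest = hashlib.sha256(block).digest()
        if digest not in seen:
            seen.add(digest)
            stored += len(block)
    return len(data) / stored if stored else 1.0


# Ten identical 4 KB blocks, standing in for repeated documents or messages.
repetitive = (b"quarterly report " * 256)[:4096] * 10
# Random bytes, standing in for already-compressed video or image data.
random_like = os.urandom(4096 * 10)

print(dedupe_ratio(repetitive))
print(dedupe_ratio(random_like))
```

The repetitive input collapses to a single stored block, while the random input shares no blocks at all, so nothing is saved.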

Customers also have a mixture of redundant and new data. Redundant data provides excellent deduplication ratios whereas "net new" data reduces the overall ratio. The point is that customers are not the same in terms of overall requirements, data held and data requirements in the longer term.
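The effect of "net new" data on the overall ratio can be sketched with simple arithmetic. The Python function below is a hypothetical helper, and the figures (100 TB of data, a 20:1 ratio on the redundant portion) are invented for illustration; it also assumes net new data is stored at 1:1.

```python
def overall_ratio(redundant_tb: float, redundant_ratio: float, new_tb: float) -> float:
    """Blended dedupe ratio, assuming net new data does not deduplicate at all."""
    logical = redundant_tb + new_tb                   # data as the customer sees it
    stored = redundant_tb / redundant_ratio + new_tb  # data actually kept on disk
    return logical / stored


# Illustrative figures only: 100 TB in total, redundant portion dedupes 20:1.
mostly_redundant = overall_ratio(redundant_tb=95, redundant_ratio=20, new_tb=5)
more_new_data = overall_ratio(redundant_tb=80, redundant_ratio=20, new_tb=20)

print(round(mostly_redundant, 2))
print(round(more_new_data, 2))
```

Under these assumptions, raising the share of net new data from 5 to 20 per cent cuts the blended ratio by more than half, which is why two customers with the same product can see very different results.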

Finally, when moving deduplicated data from disk media to tape to take advantage of long-term cost benefits, CommVault is the only vendor to migrate data "as is", without the need to reverse the deduplication process, which would result in additional storage requirements.

CIOL: What are the challenges in this space? 

VA: The challenges come in many forms. Firstly, customers need to understand the net benefits of deduplication. It may be that a standard non-deduplication solution is of more value. A number of the customers whom we've met in India have not made the decision to migrate to backup or archive to disk in the first instance.

As such, deduplication will be of limited value to them, except potentially where remote data is being serviced - in which case CommVault would advise on mixed solutions.

 
The marketing surrounding deduplication would suggest that every customer will win in terms of both ROI and recovery service levels. Whilst this may be true of recovery speeds, achieving an ROI requires a change in the customer's environment, and that ROI may be non-existent if the existing solutions already meet the recovery requirements in place. As such, education is critical.

Additional challenges include network availability in more remote locations, where a combination of deduplication and effective policy design is important.

CIOL: Data dedupe adoption levels are still very low in India. Why?

VA: With data dedupe, one can keep granular recovery online for as long as possible, even with data growth running at between 50 and 150 per cent CAGR.

However, adoption of deduplication is typically the result of a prior move towards backup or archive to disk. Most customers in India to whom we've spoken have not upgraded from traditional tape-based systems to disk as the primary storage media for servicing backup or archive. Without this move to disk media, deduplication is not deliverable.

Additionally, there are costs associated with both deduplication and the move to disk media. 

CIOL: Do you think that, with the greater penetration of data dedupe technology, tapes will be replaced with disks in India? 

VA: Labour costs mean that manually moving tapes around is less costly than in the West. However, the ability to restore data quickly from further back (retention times) and the increasing reach of compliance would drive massive data growth in secondary systems, also affecting the Indian market.

Dedupe vendors often focus on the removal of tape, but if you can dedupe data to tape without negatively impacting your restore times, then you can make massive savings. Tape libraries use about three per cent of the power that the same space on disk uses, bringing the issue of cost of power into play.

Dedupe also becomes a ‘conveyor belt’ issue; it extends short-term capacity for fast granular restore, but at some point that ‘conveyor belt’ runs out and tape has to kick in again. ‘Re-hydration’ then becomes a problem, and all those tapes and drives you thought you could scrap suddenly become valuable again – except exploding the data by up to 20x suddenly becomes a real problem.
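A rough sketch of the re-hydration arithmetic, in Python, shows the scale of the problem. The 10 TB deduplicated store and the 1.5 TB tape capacity are illustrative assumptions, not figures from the interview; only the 20x expansion factor comes from the answer above, where it is quoted as an upper bound. The function name tapes_needed is hypothetical.

```python
import math


def tapes_needed(deduped_tb: float, expansion: float, tape_capacity_tb: float) -> int:
    """Tapes required once deduplicated data is expanded by the given factor."""
    rehydrated_tb = deduped_tb * expansion
    return math.ceil(rehydrated_tb / tape_capacity_tb)


# Assumed figures: 10 TB held on a dedupe store, 1.5 TB per tape cartridge.
as_is = tapes_needed(deduped_tb=10, expansion=1, tape_capacity_tb=1.5)
rehydrated = tapes_needed(deduped_tb=10, expansion=20, tape_capacity_tb=1.5)

print(as_is, rehydrated)
```

Under these assumptions, writing the data to tape in deduplicated form takes 7 cartridges, while re-hydrating it first takes 134.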

CIOL: What will be the trends in the storage market with respect to data deduplication?

VA: Dedupe was needed in hardware because of the initial horsepower required to do it. As soon as software vendors like CommVault worked out how to do it efficiently, hardware vendors were always going to lose out. Awareness of data type and purpose gives software vendors a massive efficiency advantage and allows the management to work very effectively.

It also allows us to re-purpose the technique for many uses – why limit yourself to one type of hardware platform or software niche when you can have dedupe work for you across all of your secondary storage for less money?
