Jim Simon of Quantum says that data de-duplication reduces the bandwidth required to transmit data over networks by 90 percent or more. He believes that de-duplication techniques need to be integrated with the other elements of a data protection strategy.
In an interaction with CIOL, he talked in detail about the need for data de-duplication among enterprises and the benefits it brings.
CIOL: What are the key business benefits of data de-duplication?
Jim Simon: The business benefits from data de-duplication start with increasing overall data integrity and end with reducing overall data protection costs. Data de-duplication lets users reduce the amount of disk they need for backup by 90 percent or more.
With reduced acquisition costs—and reduced power, space, and cooling requirements—disk becomes suitable for first stage backup and restore and for retention that can easily extend to months.
With data on disk, restore service levels are higher, media handling errors are reduced, and more recovery points are available on fast recovery media. What all of that really means is that data protection is improved, service is faster, and costs are reduced.
CIOL: Naturally, companies considering de-duplication are wary of losing vital data that's falsely deemed duplicative. Is this an issue, and how can companies implementing data de-duplication technology guard against this eventuality?
JS: The base technology used in the mainstream data de-duplication systems was built around a methodology designed with the integrity of user data as the first concern. I'm in a good position to comment on this topic because the primary patent for variable-length, block-based data de-duplication is held by Quantum Corporation. That means the developers closest to the technology are part of a company that is an industry leader specializing in backup, recovery, and archive.
Incidentally, this data security is not just theoretical. Today, there are thousands of users all over the world safely protecting petabytes of data with products that rely on data de-duplication techniques.
CIOL: Major disasters such as Hurricane Katrina and new laws enacted specifying data retention and retrieval policies for litigation purposes are making companies wake up to the stark realities associated with their disaster recovery capabilities. What advantages can data de-duplication technologies offer in terms of disaster recovery?
JS: This is a very important question. When you write backup to conventional disk, you always need to carry out another step to provide site-loss protection, and as Katrina and the recent fires in Southern California remind us, disaster recovery protection is absolutely essential for critical data. Data de-duplication really helps with this issue because it reduces the bandwidth that's needed to transmit data over networks by 90 percent or more.
That happens because most backup jobs only hold a small percentage of really new data—typically less than five percent. By linking replication with de-duplication, we can transmit an entire backup set over a network, but only have to move a few new blocks. That means that replication over standard WANs is, for the first time, a practical tool for DR—and users can create remote copies of data every day without having to transport tapes.
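To make the mechanism concrete, here is a minimal Python sketch of replication linked with de-duplication. It is an illustration under stated assumptions, not Quantum's implementation: it cuts the backup into fixed-size blocks for simplicity (the technology discussed above uses variable-length blocks), identifies blocks by their SHA-256 hash, and uses an in-memory dictionary to stand in for the remote site. The names `BLOCK_SIZE`, `split_blocks`, `replicate`, and `restore` are hypothetical.

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical fixed block size, chosen only for simplicity


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> list[bytes]:
    """Cut a backup stream into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def replicate(backup: bytes, remote_store: dict[str, bytes]) -> tuple[list[str], int]:
    """Replicate a backup to a remote block store, sending only missing blocks.

    Returns the ordered list of block hashes (the "recipe" for the backup)
    and the number of bytes that actually had to be transmitted.
    """
    recipe = []
    bytes_sent = 0
    for block in split_blocks(backup):
        digest = hashlib.sha256(block).hexdigest()
        if digest not in remote_store:
            # Only blocks the remote site has never seen cross the network.
            remote_store[digest] = block
            bytes_sent += len(block)
        recipe.append(digest)
    return recipe, bytes_sent


def restore(recipe: list[str], remote_store: dict[str, bytes]) -> bytes:
    """Rebuild a full backup at the remote site from its recipe."""
    return b"".join(remote_store[digest] for digest in recipe)


if __name__ == "__main__":
    remote = {}  # assumption: stands in for the block store at the DR site

    # Day 1: a backup made of 100 distinct blocks; everything is new.
    day1 = b"".join(bytes([i]) * BLOCK_SIZE for i in range(100))
    _, sent1 = replicate(day1, remote)

    # Day 2: the same backup with 3 of the 100 blocks changed.
    blocks = split_blocks(day1)
    for k, i in enumerate((10, 50, 90)):
        blocks[i] = bytes([200 + k]) * BLOCK_SIZE
    day2 = b"".join(blocks)
    recipe2, sent2 = replicate(day2, remote)

    print(f"day 1 sent {sent1} bytes, day 2 sent {sent2} bytes")
    print(f"day 2 traffic is {100 * sent2 / len(day2):.0f}% of the full backup")
    assert restore(recipe2, remote) == day2
```

In this toy run the second backup is the same size as the first, yet only the three changed blocks are transmitted, and the remote site can still rebuild either day's backup in full from its recipe.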