'Identifying critical processes is the key'

CIOL Bureau

Soumitra Agarwal, marketing director, NetApp India, told B.V. Shiva Shankar that an organization must determine the minimum infrastructure that should be available to users at any point of time without incurring business losses. According to him, identifying the truly critical processes that cannot be interrupted at any time is most important. Excerpts from the interview:

CIOL: What is the importance of business continuity in the current scenario?

Soumitra Agarwal: Given that many businesses are 24x7x365, any downtime has a negative impact on the business. Downtime here refers to the unavailability of data when it needs to be accessed; in other words, even if the infrastructure is up, it counts as downtime if access to data is unavailable for any reason. The cost of downtime can be broadly classified into the following types:

Tangible costs: Lost revenue (e.g., dispatches and invoicing from the factory stop because delivery orders cannot be processed), lost manpower time (e.g., waiting for the system to come up while processing a transaction), marketing costs, bank fees/penalties, legal costs, lost intellectual property value (information value).

Intangible costs: Lost sales opportunities, employee retention, loss in share value, damage to brand, damage to goodwill, etc.

CIOL: What are the main challenges for business continuity?

SA: The important challenges that organizations encounter while drawing up a BC plan are:

  •  Lowest possible recovery time objective (RTO),
  •  Significantly lower cost of acquisition and implementation,
  •  NIL or minimal need for additional network bandwidth,
  •  Flexibility to meet different recovery point objectives (RPO),
  •  Ability to leverage investment in existing IT infrastructure, and
  •  Type of DR required.

The two most critical parameters are RTO and RPO. Where the RTO is nil, the redundancy should be a fully clustered architecture in which failover is immediate. Where the RTO is set at a few hours, one does not need clustering, but will need to replicate data to the DR site periodically. Where the RPO requires access to the latest version of the data, as it stood just before the disaster, synchronous replication is recommended; where the RPO is a few hours or even days, asynchronous replication will do.
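The rule of thumb above can be written down as a small decision helper. This is only a minimal sketch in Python: the `Objectives` class and the `recommend` function are hypothetical names introduced for illustration, and the logic simply restates the interview's guidance (nil RTO implies clustering, nil RPO implies synchronous replication) rather than any vendor's sizing method.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class Objectives:
    """Hypothetical container for the two parameters discussed above."""
    rto: timedelta  # maximum tolerable time to restore service
    rpo: timedelta  # maximum tolerable window of lost data


def recommend(obj: Objectives) -> dict:
    """Restate the interview's rule of thumb as a lookup."""
    zero = timedelta(0)
    # Nil RTO: failover must be immediate, so the sites are fully clustered.
    # Otherwise periodic replication to the DR site is considered enough.
    architecture = (
        "fully clustered, immediate failover"
        if obj.rto == zero
        else "no clustering, periodic replication to DR site"
    )
    # Nil RPO: the DR copy must match the data just before the disaster.
    replication = "synchronous" if obj.rpo == zero else "asynchronous"
    return {"architecture": architecture, "replication": replication}


if __name__ == "__main__":
    print(recommend(Objectives(rto=timedelta(0), rpo=timedelta(0))))
    print(recommend(Objectives(rto=timedelta(hours=4), rpo=timedelta(hours=2))))
```

In practice the thresholds are rarely this binary; the sketch is meant only to make explicit that RTO drives the choice of architecture while RPO drives the choice of replication mode.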

As for the type of DR, organizations need to decide between the following options:

  •  DR for data only: Data should be available in case of a disaster at the primary site, but the users can wait for the primary site to come back up, pull back the mirrored data and then start operations.
  •  DR for operations: Users cannot wait for the primary site to come back up and want to be able to start operations from the DR site itself. In this case, not only the data but also critical infrastructure such as servers, applications, peripherals and the network needs to be replicated at the DR site.

Examples of such critical processes could be bill processing/printing for a utility company, or a vehicle dispatch and scheduling system for a logistics services company. The BCP must, at a minimum, take care of the continuous operation of the critical processes.

As an organization's business grows and evolves, its BCP needs will also change in terms of SLA expectations from users, cost of downtime, RTO/RPO, etc. It is therefore useful to update the organization's BCP periodically.

 

CIOL: How well is the industry prepared to manage an impending disaster?

SA: In general, most organizations that depend on data for running key business processes are preparing themselves to manage a potential disaster. NetApp's objective is not only to bring to such customers our field-proven, industry leading solutions, but also to offer them our established consulting services, based on best practice expertise of our global professional services organization, to help them in BC planning.

CIOL: Is a one-stop solution possible to ensure business continuity?

SA: Business continuity planning has to account for redundancy of different IT infrastructure layers - storage, network access, servers, applications, peripherals. We must not forget the most important component - people. While vendors at each layer have solutions for DR, the job of integrating various layers and implementing DR as a one-stop shop is usually performed by systems integrators. In some cases, enterprises first prefer third-party consultancy firms to assess and plan a BC implementation; and then they decide on the infrastructure for the same.

CIOL: In terms of best practice, what steps must an organization take to prevent a potential disaster?

SA: Prevention is better than cure, they say! Where organizations have a choice, the first step is to choose the right location, i.e., not to locate the primary data center in earthquake-prone or flood-prone areas. Once the location is chosen, the organization needs to look at the following aspects of planning its primary data center:

  • Physical infrastructure (e.g., power supply redundancy, fireproofing)
  • Environmental (e.g., cooling)
  • Connectivity infrastructure (e.g., phone lines, datacom links and their redundancy)
  • Technical infrastructure: Hardware, network, applications, storage, IT management.

Organizations are advised to hold practice drills at regular intervals to sensitize personnel to what they should do if a disaster strikes. A drill also tests the organization's readiness to respond to a disaster situation. However, the practice drill need not be restricted to IT systems; it may cover all functional areas. For example, an organization with an office in a high-rise building might want to test its disaster readiness with a mock fire drill.

Organizations need to be fully prepared for a potential disaster and have policies and procedures in place to minimize the losses arising from it.

CIOL: What are the popular solutions available in the market for DRP? What should be the basis for selecting DRP? Should recovery time objective (RTO) and recovery point objective (RPO) be the barometer or should it be dependent on the situation?

SA: There are several considerations to be made when deciding which type of disaster recovery solution to implement:

  •  Is your goal high availability, ongoing backup and recovery, or remote recovery in the event of a site disaster?
  •  Which applications need protection? Not all applications are equal, so you need to tier applications based on how critical they are and have different sets of RPOs for each application tier.
  • What is your RPO in case of a disaster? A zero RPO implies no data loss, which can only be addressed by a synchronous replication solution. This requires a high-bandwidth network and can only mirror sites that are no more than 100 km apart. If you can afford to go with a higher RPO, an asynchronous replication solution with RPO in minutes or hours is much more flexible and cost-effective and has the ability to provide coverage for more of your applications.
  • What is your RTO in case of a disaster?
  • How far away from the primary storage site will the disaster recovery site be located? This will depend on the likely geographic impact of the disaster that you are protecting against. Many large organizations tend to mirror between their existing geographically distributed data centers as it eliminates the cost of setting up a new DR site.
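As a rough illustration of how the considerations above interact, the following Python sketch tiers applications by RPO and checks each tier against the distance to the DR site. The `AppTier` class, the `plan_replication` function and the application names are hypothetical; the 100 km limit for synchronous replication is the figure quoted in the interview, not a universal constant.

```python
from dataclasses import dataclass

# Distance limit for synchronous mirroring as quoted in the interview.
MAX_SYNC_DISTANCE_KM = 100


@dataclass(frozen=True)
class AppTier:
    """Hypothetical application tier with its own RPO."""
    name: str
    rpo_minutes: int  # 0 means no data loss is acceptable


def plan_replication(tier: AppTier, dr_distance_km: float) -> str:
    """Pick a replication mode for one tier, given the DR site distance."""
    if tier.rpo_minutes == 0:
        # Zero RPO can only be met synchronously, which limits the distance.
        if dr_distance_km > MAX_SYNC_DISTANCE_KM:
            return "infeasible: zero RPO needs a DR site within 100 km"
        return "synchronous"
    # Any non-zero RPO can be served by the cheaper asynchronous option.
    return "asynchronous"


if __name__ == "__main__":
    # Illustrative tiers only; real tiering depends on the business.
    tiers = [
        AppTier("billing", rpo_minutes=0),
        AppTier("reporting", rpo_minutes=240),
    ]
    for tier in tiers:
        print(tier.name, "->", plan_replication(tier, dr_distance_km=60))
```

Keeping the RPO per tier rather than per organization reflects the point that not all applications are equal: only the most critical tier pays the bandwidth and distance cost of synchronous replication.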