Avoid Common Disaster Recovery Plan Pitfalls

Disaster Recovery planning can be painstaking.  There are so many nuanced areas of focus that it is easy to miss key information that could hinder or block restoring systems and data within the time frames required by the organization.  Exercising plans is essential to help illuminate these hidden risks.  Here are some items we frequently find missing even in very mature disaster recovery plans.

1. Escalation Criteria/Requirements – ensure the plan identifies a clear procedure not only for escalating the detection of an issue that may require plan activation, but also for notifying key contacts when the recovery is not going according to plan.  Contact information is essential, of course, but identifiable and measurable criteria that, if met, would require the notification of key staff members are often undocumented.  Without these guidelines in place, key performers will continue to bang their heads against a wall while the clock ticks away, when a simple report on the roadblock or request for assistance could have easily saved valuable time.

2. Data Backup – Few IT professionals overlook data backup under normal circumstances.  That isn’t always the case when disaster recovery environments are being utilized.  Ensure that the plan contains instructions for enabling the backup of data being entered into DR systems.  The business users of the backup systems should also be alerted to the Recovery Point Objective (RPO) for the DR environment.  If the RPO is not socialized, the assumption will be that the DR systems have the same capabilities as production, and any loss of data in the event of a DR system failure would make the post-incident review more than uncomfortable.

3. Special Authorities – document the special access rights necessary to perform recovery tasks.  Do not assume that personnel with access will be available.  Capture the procedure for obtaining the IDs/passwords necessary in the event that key performers are not able to work.

4. Log of Actions/Events – capture a log of the actions taken during the recovery.  It’s unfair for management to assume that every decision made during the event will prove to be the right choice.  It’s not unfair to assume that a decision made was the right one based on the situation at the time when the decision was needed.  The ability to refer to a comprehensive log of actions and events will prove handy in responding to questions when reviewing the incident.  The log will also be useful as a means of improving recovery plans.

5. Failback Procedures – ensure that the plan contains the procedures to reverse any automatic or manual failover performed during the recovery.  DR plans are often remiss in detailing how to return to normal.  The process may not be as simple as stepping back through the failover procedure.  Make sure the procedure is exercised and well documented.
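
One way to make the escalation criteria in item 1 identifiable and measurable is to write each criterion as a simple check.  The sketch below, in Python, flags an unfinished recovery task once it has used more than a set share of its planned time.  The RecoveryTask fields, the needs_escalation function, and the 50% threshold are illustrative assumptions and would need to be replaced with the criteria your organization agrees on.

```python
from dataclasses import dataclass


@dataclass
class RecoveryTask:
    """A single step in the recovery plan (hypothetical structure)."""
    name: str
    budget_minutes: float   # time the plan allots to this task
    elapsed_minutes: float  # time spent on the task so far
    completed: bool = False


def needs_escalation(task: RecoveryTask, threshold: float = 0.5) -> bool:
    """Return True when an unfinished task has used more than
    `threshold` (a fraction, assumed here to be 0.5) of its time budget."""
    if task.completed:
        return False
    return task.elapsed_minutes > threshold * task.budget_minutes


# Example: a database restore budgeted at 120 minutes, 70 minutes in.
restore = RecoveryTask("Restore customer database", 120, 70)
if needs_escalation(restore):
    print(f"Escalate: '{restore.name}' is behind schedule")
```

A check like this does not replace judgment, but it gives key performers a documented trigger for reporting a roadblock rather than leaving the decision to the moment.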

The Work of Innovation

There is a common misconception that innovation involves people sitting around thinking about things until something groundbreaking comes to mind.  That abstract notion of what it means to be innovative and to bring an innovative thought to reality has next to nothing in common with the truth.  The easiest way to prove this is to examine a few job descriptions for innovation leaders/managers/contributors.  Yes, you can actually get a job in innovation.  That is because the true form of innovation is hard work.  It is neither as esoteric nor as conceptual as most would believe.  It is a discipline that is much more similar to science than art.

Checking the job descriptions for innovative leaders and contributors, we can identify several themes.  These positions all require a mix of all or most of the following:

  • Experience in designing and applying a structured methodology to conducting tests/experimentation

  • A disciplined approach to evaluating problem statements and solutions

  • Experience in developing strategy statements and recommendations

  • Prior involvement with feasibility studies and the development of business cases

  • Ability to produce results individually and as a part of a dynamic cross-functional team

  • Ability to creatively apply technology to solve problems

  • Mature project management skills

  • Experience in gathering and analyzing consumer insights

  • Experience in gathering data and translating it into relevant implications and strategy

There is very little magic in the list above.  Judging from the requirements for working in the field of innovation, innovation largely consists of establishing and managing a scientific, measurable method of testing and evaluating a possible solution for feasibility and effectiveness against the strategic goals of an organization.  In other words, innovation is hard work.

This means that organizations that are typically thought of as ‘innovative’ are not beating their competition with some type of complex creative thought that is innate and gifted to a select few employees of that organization.  These organizations are not winning because they have managed to secure more high-level, creative-type employees than their competitors.  They are not lucky, nor have they discovered the secret to harnessing the portions of the brain that most of us cannot.  They are quite simply working harder than everyone else, and they are doing it consistently.

Starting a Business Continuity/Disaster Recovery (BC/DR) Program

This series is dedicated to providing direction for applying Project Management principles to starting a Business Continuity or Disaster Recovery (BC/DR) Program.  This is the first installment of a multi-part series.  In this installment we will focus on the Project Initiation phase.  Subsequent segments will be aimed at additional phases of starting a BC/DR Program, on improving an existing BC/DR Program, and on elevating a mature program to a new level of efficiency and effectiveness.

Starting a Business Continuity Program

Launching a BC/DR Program requires its own plan.  This is not a plan as in a recovery or response plan, but a plan in the sense of a project plan.  Starting a BC/DR Program is no different from starting any project, and success essentially hinges on your project management skills.  You may want to reach out to the Project Management Office (PMO) if you are fortunate enough to be part of an organization that has one.  The PMO may be able to provide an experienced project manager who can assist by applying current project management theory and techniques to the initiative.  If your organization does not have a PMO, or a resource is not available, then gaining a basic understanding of project management is the starting point.

There are many available information sources for project management principles.  The Project Management Institute (PMI), http://www.pmi.org/, is the leading authority in the field.  The PMI offers training and certification, and most community colleges and universities offer courses in project management.

So let’s take a real-life approach to this and assume that you were invited into your supervisor’s office or your supervisor’s supervisor’s office on Friday afternoon, and, due to some outstanding work in a field that has nothing to do with business continuity or project management, you were “offered the opportunity” to start and lead the organization’s business continuity program.  You will do this, of course, while managing your non-business continuity, non-project management work responsibilities.  I feel your pain.  So, here’s where you are: you didn’t sleep much this weekend, you have a huge new project in your lap along with a bunch of other things on your already-full plate, and you’re probably not getting enough time, money, or people to make it happen.  Step 1 – keep reading.

This is still a project, and we still need to approach it as such despite the possibility that we are short on time and resources.  Here are the basics we need to know about project management and its application to starting a BC/DR Program.

Project Initiation

Project initiation is the first phase of project management.  Project Initiation is typically where a business case is created to provide the rationale for undertaking the project and proving that it is feasible.  Management will use the business case to ultimately determine if the project will be approved.  This may have already taken place and the project assigned to us after the fact.  If, however, we will be part of creating the business case, there are a ton of templates available online as well as recommendations for writing a good one.  Check internally first because there may be a standard template specifically for use by your organization.

The Business Case for a BC/DR Program
The business case needs to explain the why for performing the project.  Focus on describing the need for the project and how it solves an issue that the organization is facing.  Provide examples that are not exclusively IT focused as this can expand the scope of the case beyond traditional boundaries and allow areas like Supply Chain, HR, and other customer impacting areas to be included or considered. Without a BC/DR Program, the entire organization is at risk.  The organization could experience a disruption that causes injuries to associates and/or the inability to provide the products and/or services normally provided to clients.  Without a BC/DR Program there is a risk in regard to providing the safest possible working conditions for employees, and there are operational risks that could include regulatory and contractual breaches, diminished reputational status, financial loss and loss of financial opportunity, and a diminished competitive capability.

The goal of the project is the creation of a program that is focused on improving safety for all personnel and raising the state of readiness for the organization by understanding and mitigating risk and instilling an ever-improving culture of resilience.  The business case should demonstrate the value of performing the project.  For this part, refer to the Business Continuity Institute (BCI), http://www.thebci.org, a leading authority in the field of business continuity.  The BCI offers a paper for download that details how business continuity delivers ROI: http://news.thebci.org/news/business-continuity-delivers-return-on-investment-164635  This section can also leverage relevant industry requirements.  These are often the driver for the creation of a BC/DR Program.  Depending on the industry, the ability of an organization to continue operations can hinge upon proving it has an effective BC/DR Program.

While the benefits and ROI of implementing a BC/DR Program can be difficult to express numerically, one way to do so is to establish the cost of downtime.  The factors involved in determining the cost of downtime will vary greatly from industry to industry and organization to organization, but if we can have a few minutes with the CFO, we may be able to derive a dollar amount that can adequately highlight the value a BC/DR Program will bring.  (The CFO would make a great Executive Sponsor – keep this in mind for later.)  Ask for an estimate of the losses expected for a day where no work activity could be performed.  If you are part of an organization where the products and services provided are extremely time-sensitive, the cost of downtime may be measured in hours, rather than days.  In either case, the value of a BC/DR Program is in improving safety for employees and mitigating the cost of downtime.  Be careful not to imply that a BC/DR Program will ensure safety or that downtime can be completely avoided.  A BC/DR Program can only promise to improve safety and minimize downtime.
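
To show how a daily loss figure from the CFO might be turned into a number for the business case, here is a minimal sketch in Python.  It simply prorates a full-day loss across operating hours, which is a simplifying assumption: real losses are rarely linear, and the function name, parameters, and dollar amounts below are invented for illustration.

```python
def downtime_cost(loss_per_day: float,
                  operating_hours_per_day: float,
                  outage_hours: float) -> float:
    """Estimate the loss for an outage by prorating a full-day loss figure.

    Assumes losses accrue evenly across operating hours (a simplification).
    """
    if operating_hours_per_day <= 0:
        raise ValueError("operating_hours_per_day must be positive")
    if outage_hours < 0:
        raise ValueError("outage_hours cannot be negative")
    loss_per_hour = loss_per_day / operating_hours_per_day
    return loss_per_hour * outage_hours


# Hypothetical figures: $240,000 lost per 8-hour day, 3-hour outage.
estimate = downtime_cost(loss_per_day=240_000,
                         operating_hours_per_day=8,
                         outage_hours=3)
print(f"Estimated loss: ${estimate:,.0f}")  # prints "Estimated loss: $90,000"
```

Even a rough estimate like this gives management a concrete figure to weigh against the cost of the program.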

The business case will also need to detail the requirements for the project.  In this section we need to provide information on what will be done, who will do it, how it will be done, and the timeline (when) for completion.  The “who” will depend on how many people we can involve.  If it’s just going to be you, you may want to include estimates for contracting with outside consultants.  If it is just you, be savvy with the timeline estimate because the revision process for the business case will most certainly include shortening the project time frame.  These project requirements will set you and the organization up for success.  Understanding your current team’s high-level bandwidth, level of effort, and deadlines will help you determine the resources required to meet your project goal.  Too often we see organizations asking employees to “Just Do it!” while these eager employees struggle to do more with limited resources.  Planning will provide a logical progression to achieve success and meet your organization’s goals.

We can be more certain regarding what will be done and how it will be done.   Here are some traditional deliverables (what will be done) for the project:

  1. Business Continuity Policy

  2. Business Impact Analysis

  3. Threat Assessment

Understand that there is a debate within the Business Continuity industry over whether to perform the Threat Assessment or the Business Impact Analysis (BIA) first.  We will not wade into that discussion in this installment, although you can see we’ve placed the BIA before the Threat Assessment.  Our position is that the BIA should come first; however, there is enough flexibility in the sequence that they can be performed concurrently if desired.

The Business Continuity Policy will establish the requirements and responsibilities for the BC/DR Program.   The Threat Assessment will examine the likelihood, impact, and state of readiness for threats to the organization, and the BIA will establish the Recovery Time Objective (RTO) for the processes engaged by the organization.  (The RTO is the measurement of time in which a business process or service must be recovered following a disruption.)  Note that we are referring to our deliverable as a Threat Assessment, rather than a Risk Assessment.  These are two different things.  A threat assessment is identifiable with standard business continuity procedures while a Risk Assessment is wider in scope.  The Threat Assessment and BIA will provide the background and organizational understanding for establishing the program.

Prior to writing the Business Continuity Policy, it will be helpful to review a few resources:

The documents above will give you the essential steps for completing the tasks required to start a program, and, more importantly, will provide you with an overall understanding of what is necessary for establishing a successful BC/DR Program.

As you formulate the Business Continuity Policy, cite the need for a Steering Committee.  The Steering Committee should include an executive sponsor – someone from upper management who agrees to serve as the chair of the committee.  (Recall the reference made earlier to the CFO.)  The executive sponsor provides a valuable top-level presence to the program, functions as the voice of the program to other members of executive management, and assists in avoiding and ending impasses that could occur between equals.  Include a suggested structure for the Steering Committee.  In addition to the Executive Sponsor/Chairperson and the BC/DR Manager, propose that leadership from the business areas of the organization also serve as committee members.  Their support for the program will be essential to long-term success.  We will eventually request that each business area participate in the BIA and in building and maintaining recovery plans.

Designing and delivering an effective BIA is a major endeavor.  The Business Case should include the BIA scope, design, and delivery method(s).  There is some crossover here between Project Initiation and Project Planning.  We will need to plan the project at least at a high level in order to provide an idea of the scope of the BIA.  Determining the scope of the BIA is the first task.  The size and structure of the organization as well as the staff that can be allocated to the task will be considerations.  If the staff is not considerable, but the size of the organization is, it may be necessary to implement the BIA in carefully planned phases or to narrow the scope to a limited portion of the organization.  Part of that determination should include the implementation method(s).  Face-to-face meetings are preferred, but they may not be feasible given resource restrictions.  The use of a business continuity software tool may help as well.  Distribution of electronic files developed in Word or Excel can be effective, but compiling the data for analysis and reporting can be time consuming.  A blended approach to implementation is often required given restrictions on travel and staffing.  If company culture allows, consider engaging an external consulting firm to collaborate on the design and provide the delivery of the BIA.  This may be the best possible use of any financial resources the project may include, as the results of the assessment will be delivered along with external endorsement.

As for BIA design requirements, capture the need to measure impact using a qualitative and quantitative method.  Many organizations allow BIA participants to provide their opinion on how serious the impact of the outage would be within their area of specialization.  This is not recommended as most people are passionate about their work and find it difficult to provide an estimate of impact without allowing that passion to bias their assessment.  If specific criteria are provided for determining impact, the BIA results are more likely to represent an accurate depiction of how an interruption would affect normal activities.   This will be vital for selecting appropriate recovery strategies later.  Include the time frames in which RTOs will be expressed.  Provide a Tier structure that defines how processes will be categorized.
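
As a rough illustration of a Tier structure driven by specific criteria rather than opinion, the Python sketch below assigns a tier from a process’s RTO.  The tier boundaries (4, 24, and 72 hours) and the function name are hypothetical; each organization would set its own time frames in the policy.

```python
# Hypothetical tier boundaries: (tier number, maximum RTO in hours).
# Anything slower than the last boundary falls into the lowest tier.
TIER_LIMITS_HOURS = [(1, 4), (2, 24), (3, 72)]
LOWEST_TIER = 4


def tier_for_rto(rto_hours: float) -> int:
    """Return the tier for a process based on its RTO in hours."""
    if rto_hours < 0:
        raise ValueError("rto_hours cannot be negative")
    for tier, limit in TIER_LIMITS_HOURS:
        if rto_hours <= limit:
            return tier
    return LOWEST_TIER


# Example with made-up processes and RTOs.
for process, rto in [("Order entry", 2), ("Payroll", 48), ("Archiving", 168)]:
    print(f"{process}: Tier {tier_for_rto(rto)}")
```

Publishing the boundaries alongside the BIA lets participants see exactly why a process landed in a given tier.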

The policy should also state that the BIA will capture dependencies on IT assets and vendors.  Speaking with IT leadership is advised as IT may already have RTOs and classifications for applications and assets.  Sharing the same measurements, if possible, will simplify the mapping of IT dependencies and the identification of gaps between business needs and IT capabilities.  Detail the need for IT to provide current application Recovery Time Actual (RTA) and Recovery Point Actual (RPA) information.  The RTA is a measure of time in which it has been demonstrated that an application or other IT asset can be recovered.  The RPA is a measure of time indicating the true age of the data associated with an application that can be recovered by IT.   In some cases a disruption may mean that data entered into an application will be lost if it was entered within a certain time period prior to the disruption.  These measurements will ideally come from the results of IT recovery exercises, rather than estimates of what is currently possible.
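
The comparison of business needs with IT capabilities described above can be sketched as a simple gap check: for each dependency, compare the RTO the business requires with the RTA that IT has demonstrated.  The Python below is only an illustration; the Dependency structure, the recovery_gaps function, and the sample data are assumptions, not output from any particular tool.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Dependency:
    """Links a business process to an application it relies on."""
    process: str
    application: str
    rto_hours: float  # what the business needs, from the BIA
    rta_hours: float  # what IT has demonstrated in recovery exercises


def recovery_gaps(deps: List[Dependency]) -> List[Tuple[str, str, float]]:
    """Return (process, application, shortfall in hours) wherever the
    demonstrated recovery time exceeds the required recovery time."""
    return [
        (d.process, d.application, d.rta_hours - d.rto_hours)
        for d in deps
        if d.rta_hours > d.rto_hours
    ]


# Made-up example data.
deps = [
    Dependency("Payroll", "HR system", rto_hours=8, rta_hours=24),
    Dependency("Order entry", "Sales portal", rto_hours=4, rta_hours=2),
]
for process, app, shortfall in recovery_gaps(deps):
    print(f"{process} needs {app} {shortfall:g} hours sooner than demonstrated")
```

The same pattern could be applied to recovery point measurements, comparing the data loss the business can tolerate with the RPA.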

Include the minimum requirement for refreshing the BIA in the policy.  Many organizations will perform the BIA on an annual or bi-annual basis.  The available methods of delivery and staffing will factor into how often the BIA can be repeated.  If a software tool to support the BC/DR Program is available, indicate that the BIA should be updated whenever there is a change in how a process is performed, where it is performed, or if the technology utilized or the role of a supporting vendor is amended.  Maintaining BIA data continually allows the organization to be more confident in the selection of strategies for recovery and more efficient in managing the resources allocated to enabling those strategies.

The Threat Assessment should provide a score for potential threats to the organization that considers the likelihood of the threat and the expected impact if the threat were realized.  The Good Practice Guidelines provides a useful scoring model for threat assessments.  Enhance the model by accounting for any mitigation measures in place to reduce each threat.  This will ensure that the most likely and most impactful threats come to the forefront.  In order to determine the likelihood of each threat, examine historical disaster frequency data.  Here are a few websites that may be helpful:

https://www.unisdr.org/we/inform/disaster-statistics

https://ourworldindata.org/natural-catastrophes/

http://www.ifrc.org/world-disasters-report-2014/data

http://www.emdat.be/database

https://www.fema.gov/disasters/grid/year

Understand that accounting for every conceivable threat is not possible.  Try to keep the analysis simple.  The assumption should be that both the BIA and the Threat Assessment will evolve and improve over time and as the organization changes.
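
In keeping with that simplicity, a threat score that combines likelihood, impact, and mitigation can be sketched in a few lines of Python.  This is not the Good Practice Guidelines model; the 1-to-5 scales, the mitigation factor, the multiplication formula, and the sample threats are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Threat:
    name: str
    likelihood: int    # 1 (rare) to 5 (frequent), hypothetical scale
    impact: int        # 1 (minor) to 5 (severe), hypothetical scale
    mitigation: float  # 0.0 (no measures) to 1.0 (fully mitigated)

    def score(self) -> float:
        """Likelihood times impact, reduced by mitigation in place."""
        return self.likelihood * self.impact * (1.0 - self.mitigation)


def rank_threats(threats: List[Threat]) -> List[Threat]:
    """Order threats from highest to lowest score."""
    return sorted(threats, key=lambda t: t.score(), reverse=True)


# Made-up example data.
threats = [
    Threat("Flood", likelihood=2, impact=5, mitigation=0.25),
    Threat("Power outage", likelihood=4, impact=5, mitigation=0.5),
    Threat("Ransomware", likelihood=3, impact=3, mitigation=0.0),
]
for t in rank_threats(threats):
    print(f"{t.name}: {t.score():.1f}")
```

With these sample values the power outage ranks first even after its mitigation is taken into account, which is the kind of result the scoring is meant to surface.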

The policy should include specifics for program assessment and reporting.  Include information on the standards that should apply to the program based on your review of ISO 22301 and other relevant industry-specific requirements.  Your location in terms of state/province and nation may require additional compliance standards for the program.  The standards ultimately adopted by the organization, as well as those applied by your industry and government entities, will drive much of the design of the status reporting that is necessary for the program.

Internal and external audit findings should be part of the program reporting requirements.  Reach out to the Internal Audit Department if possible to request a collaborative effort on areas of compliance and to introduce them to the relevant standards.  For BIAs, include reporting on completion rates, updates, reviews, and overall approval statuses.  Outline reporting on the RTO and Tier results from the BIA.  Reports detailing dependencies and any gaps between business needs and IT and vendor capabilities should be outlined.  Sample Threat Assessment reports are available online.  The Threat Assessment is not something that will need to be refreshed often.  Rather, it will be repeated for all of the organization’s locations and for newly acquired locations should the organization experience growth.

Following the advice provided here, a very persuasive business case can be developed to support the need for a BC/DR Program.  With the steps provided herein completed, we are through Phase 1 of the project.  Watch this space for the next installment covering Phase 2 – Project Planning.

Should Your Organization Use Business Continuity Software?

The debate over the use of software for business continuity planning is typically focused on the perceived value of the system functionality. Software vendors champion their automation features while critics cite the licensing cost and the complexity of implementation and administration. Most organizations hinge their final determination on whether the system capabilities are viewed to be worthy of the resources required to use the tool properly. This analysis is often flawed in that many organizations perform their evaluation while focused solely on current software capabilities and organizational requirements in conjunction with the present state of business continuity. The advantages of properly implemented business continuity software only expand as an analysis matures to include the long-term goals of the organization and the direction of business continuity as a whole.

The functional benefits of business continuity software are numerous:

Business continuity software facilitates global data updates by cascading individual changes throughout the system. This marks a direct return on investment that increases as the system is configured to import from or link directly to external systems of record.

Software improves standardization across the enterprise. While a document template will facilitate standardization to a certain degree, business continuity software typically allows administrators to enforce planning requirements using security and planning wizards/assistants/navigators. Planners must work within the framework designed for them. Many software packages allow for plan completion tracking and reporting of completion rates across the enterprise.

Most software packages allow end users to map recovery dependencies, illuminating relationships and enabling the remediation of exposures. When plans are developed in silos, the risk that recovery time objectives are not supported by predecessor business processes and/or information technology systems is magnified.

Software allows data integration across modules. Many software systems have evolved to include modules for business impact analysis, emergency notification, and incident management. Sharing the same database allows these software systems to support data sharing between plans, BIAs, emergency notification systems, and incident management tools.

The latest versions of business continuity software have dramatically increased their level of continuity intelligence. Some vendors have developed planning tools that incorporate guidance based on current industry standards such as BS 25999. The standard plan wizards/assistants/navigators include industry-specific methodology and allow for the further customization of end-user guidance.

Business continuity software facilitates responses to organizational changes. As organizations restructure, the storage functionality in most software packages enables plans to be relocated to reflect changes in business structure or geographical footprint. Plans can target the response and recovery of locations, business processes, applications, or network nodes. More importantly, if a current plan’s scope is to be divided across multiple plans, some software offers the ability to move a central component with all of its recovery details between plans. This type of change in word processing tools or spreadsheets is manual and cumbersome.

Evolving planning initiatives are accommodated more freely through software. The risks highlighted by events just over the last few years have renewed the industry focus on exposures associated with pandemics, nuclear energy production, and supply-chain resilience. Planning wizards/assistants/navigators can be updated to address these new initiatives and assigned to all or specific plans quickly. These planning tools can be enhanced to deliver instructional details for meeting new organizational guidelines and standards and to assist planners as they work to capture steps for addressing new threats. In the ever-changing business continuity landscape, this is critical.

Software supports the creation of business continuity metrics. A relational database allows the creation of complex reports that summarize business continuity information across all plans. Management increasingly requires an enterprise-level view of the current state of preparedness in order to determine program direction. Manually gathering data from documents for the creation of metrics is a monumental task, and few organizations are staffed at levels that allow for the consistent and continual collection of the required information. In the absence of a database, the generation of metrics will be too infrequent to provide value. Additionally, if metrics data must be compiled manually, there is a much greater risk of error. Strategy development is hindered if there is a lack of confidence in the accuracy of data and its ability to be representative of the entire organization. Management may be reluctant or unwilling to act on the information. As a result, planners will view their work as less meaningful to the organization.
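
One of the benefits listed above is the ability to map recovery dependencies and spot recovery time objectives that predecessors cannot support.  The Python sketch below shows the idea in its simplest form, assuming plan data is available as two plain dictionaries.  The function name, the data layout, and the sample processes are hypothetical and do not reflect any vendor’s product.

```python
from typing import Dict, List, Tuple


def unsupported_dependencies(
    rto_hours: Dict[str, float],
    depends_on: Dict[str, List[str]],
) -> List[Tuple[str, str]]:
    """Return (process, predecessor) pairs where the predecessor's RTO
    is longer than the RTO of the process that relies on it."""
    issues = []
    for process, predecessors in depends_on.items():
        for predecessor in predecessors:
            if rto_hours[predecessor] > rto_hours[process]:
                issues.append((process, predecessor))
    return issues


# Made-up example: order entry must be back in 4 hours, but one of
# the services it relies on is only planned to return within 24.
rto_hours = {
    "Order entry": 4,
    "Customer database": 2,
    "Credit check service": 24,
}
depends_on = {"Order entry": ["Customer database", "Credit check service"]}

for process, predecessor in unsupported_dependencies(rto_hours, depends_on):
    print(f"{process} (RTO {rto_hours[process]}h) depends on "
          f"{predecessor} (RTO {rto_hours[predecessor]}h)")
```

When plans live in separate documents, running a check like this means gathering the figures by hand, which is the exposure that planning in silos creates.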

Implementing BC Software will drive program commitment, innovation, and advancement:

Implementing software for business continuity planning improves the individual sense of plan ownership. Recent business continuity standards speak to the need to move beyond plan creation to the creation of an organizational culture of resilience. The goal is an embedded sense of risk awareness. Planners must be conscious of threats to safety and normal organizational activities, and they need to view their continuity plans as integrated components of normal processes. Creating that elevated sense of ownership is easier if planners recognize a significant investment of resources in support of resilience. Ironically, key aspects of the argument against business continuity software – cost and the challenge of implementation – become psychological allies in creating a resilient culture. The investment in business continuity software sends several impactful signals to the planning community. The first is that the program is not only approved but directly supported by senior management. Planners will view the dedication of financial and human resources as a tangible measure of the importance of the initiative to the overall organization. Planners will expect that their use of the tool and the output of their work will be evaluated. This valuation is enhanced as key stakeholders are involved at critical points in the system development life cycle and in the governance and change control processes.

As business continuity tools facilitate summary reporting, senior management can further mold a culture change by acting on the data and addressing exposures. If the data collected by and reported from the system is acted upon and creates change, planners will see the direct value of their work and view the effort to create a resilient culture as sustained. This is not to say that an organization cannot create and sustain a resilient culture without software. The challenge is much more significant, though, if the end users cannot identify a direct connection between the communications regarding the importance of the initiative and the resources dedicated in support of it.

The determination on whether to implement business continuity software should incorporate future organizational needs and the direction of continuity as an industry. The means of creating plans needs not only to support the planning requirements for today, but also to be flexible in adapting to the changing needs of the organization. The content currently mandated for plans will evolve as the organization changes. As new threats emerge, the medium in which plan data is captured will need to allow for that evolution. The question to ask is whether the current mode of planning provides the agility necessary to change the criteria for what is now considered a comprehensive and actionable plan. In the case of isolated, unrelated documents created using a template based on the organizational needs of the moment, the answer is no. Planning tools must be capable of supporting the regular revision of requirements and the distribution of new guidelines as the organization changes, new threats emerge, and new compliance standards are applied. Organizations using business continuity software will find it easier to revise planning requirements and implement them across the enterprise than those organizations using templates for word processing or spreadsheet programs.

Trends in business continuity further the argument for the use of software:

Continuity programs are increasingly finding themselves reorganized within the realm of risk management. It is a logical change. Business continuity bridges the gap for risk management by protecting the organization from prolonged outages caused by random events and from the cumulative related effects of an event that are difficult to identify through typical risk analysis. As a discipline of risk management, business continuity will be increasingly required to quantify resilience capability. One way software has begun to address this need is the concept of a continuous business impact analysis. For most organizations a business impact analysis is a yearly or less frequent endeavor. There is software available today that facilitates a continual BIA update capability in conjunction with the traditional plan update capability. These tools allow organizations to continually review current impact information rather than cycling BIA efforts on a yearly or less frequent basis. The focus of these tools on impact allows them to more closely align business continuity with risk management. If a more frequent or continual analysis of business impact is needed in the future, data must be captured in a form that can easily be revised, collected, and summarized. Continuity software provides a decided advantage in this regard.

An increasingly close alignment with governance and compliance standards is also emerging in the field. Business continuity governance and compliance are not new; however, the standards are more refined, the number of industries held to stringent guidelines is increasing, and the standards are revised more frequently than in the past. There are several software systems currently available that incorporate the more recognized standards and provide a means of measuring compliance. Until recently these capabilities were limited if available at all. Some of the more robust systems not only include the capability of guiding users toward the creation of compliant plans, but also allow for the measurement of plan compliance. Administrators can select the applicable standard and generate data to determine the current level of compliance. This is a major step for business continuity software as earlier generations of these programs provided only the means for creating plans while assuming the user was well-versed in continuity.

The recent software advances highlighted here point to a final trend for the industry. The number of business continuity software vendors has grown rapidly over the last few years. Their success will depend upon their ability to outperform their many competitors. The consumer is clearly the beneficiary of this increase in competition. The result will be valuable gains in functionality, ease of use, business continuity intelligence, and more competitive pricing. Increased competition will also mean more rapid responses to changes in the industry and improved responsiveness to client needs. The BC maturity gap between organizations utilizing continuity software and those that are not will only widen as software capabilities become more robust.

Organizations that implement business continuity software will derive functional and non-functional benefits, providing them with a competitive advantage that will only widen as business continuity moves into the future. The evolving demands on continuity programs are too great to be managed by means that were not intended specifically for business continuity.