Cyber Resilience - Risk Management Evolves

The development of modern cloud-based mobile applications has changed much of the IT landscape. New methods of application development (Agile and DevOps) have driven organizational change and brought IT to the forefront of new digital business models. Hyper-connected, massively distributed Cloud applications are also driving changes in traditional IT methods for business continuity and security, the two primary functions charged with managing IT risk.

Protecting information systems and ensuring their availability have been the province of the business continuity management (BCM) function. Access control, privacy, and identity management, on the other hand, have been handled by IT security. Although historically separate, both functions play an important role in managing operational risk. As cloud and mobile applications evolve, several factors suggest the need for a new approach to managing IT risk:

  • Cyber breaches are increasing in frequency. Many threat types, such as zero-day exploits and ransomware, create significant business impact with little or no advance warning.
  • Acceptable downtime for most IT systems is less than four hours. Many systems are designed to be resilient, with no downtime and minimal data loss, which places pressure on business continuity and cybersecurity incident response teams.
  • Business continuity and security metrics are often presented in technical terms, for example the amount of malware detected and blocked, or the percentage of backups successfully completed. Senior executives need to understand these programs in the context of business risk.
  • Protecting brand reputation is more important than ever. Companies must respond to disruptions (natural or cyber) with speed and accuracy to minimize the impact on brand reputation and external partners.
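One way to restate a technical metric in business terms, as the third point above suggests, is to convert downtime into financial exposure. The following is only a minimal sketch: the system name, the revenue figure, and the `downtime_exposure` helper are hypothetical, and only the four-hour tolerance comes from the list above.

```python
from dataclasses import dataclass

# Tolerance taken from the list above; everything else is illustrative.
ACCEPTABLE_DOWNTIME_HOURS = 4.0


@dataclass
class Outage:
    system: str
    hours_down: float
    revenue_per_hour: float  # hypothetical figure supplied by the business


def downtime_exposure(outage: Outage) -> dict:
    """Summarize an outage as revenue at risk and whether tolerance was exceeded."""
    return {
        "system": outage.system,
        "revenue_at_risk": outage.hours_down * outage.revenue_per_hour,
        "exceeds_tolerance": outage.hours_down > ACCEPTABLE_DOWNTIME_HOURS,
    }


# Hypothetical six-hour outage of an order-entry system.
report = downtime_exposure(Outage("order-entry", 6.0, 25_000.0))
print(report)
```

A report in this form gives executives a revenue figure and a yes/no tolerance flag rather than a raw count of failed jobs or blocked malware.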

Even where BCM and cybersecurity are managed separately, both teams must work together to develop and manage a common incident response process. This gives management and company stakeholders confidence that their organization has a consistent way to defend against threats and to respond in a manner that protects its brand.

This new approach, which merges security and business continuity functions while balancing risk and budget, is known as Cyber Resilience. Among its benefits are better incident response, an improved ability to manage risk, and more effective coordination of resources.

Responding to cyber threats has created new challenges for business continuity planners. Unlike a natural disaster, a cyber event creates a crime scene. Law enforcement and external stakeholders must be engaged, and public image must be managed. Business continuity and cybersecurity planners must develop response plans to manage these new threats.

Best practices for Cyber Resilience are still emerging. Most companies manage BCM and cybersecurity separately, while others have merged the functions. Because BCM and cybersecurity programs often compete for the same risk funding, it’s not uncommon to see conflicts in how these programs are led and managed.

One trend we are seeing is senior executives integrating these functions under a Chief Risk Officer (CRO). The goal is to create a holistic view of risk and a common method for organization, governance, and funding. The CRO is responsible for allocating funding where it’s needed most and for driving BCM and cybersecurity programs to ensure they are coordinated, integrated, and delivering value.

The nature of risk is constantly changing as Cloud and mobile applications evolve. Internal and external cyber threats will increase as the number of black hats grows. Consequently, BCM and cybersecurity will continue to evolve as Cyber Resilience tools and techniques mature and companies develop new ways to manage risk.

The CRO may become your new best friend.

This blog was co-authored by Jeff Marinstein, Founding Principal of Marinstein & Co. and Michael Puldy, Director of Global Business Continuity Management at IBM.

The Role of Analytics in Disaster Recovery

This is part 1 of a multi-part series on the evolution of analytics in disaster recovery.

It may seem odd to discuss the role of analytics in the field of disaster recovery. These disciplines appear to have little in common. Wikipedia describes Disaster Recovery (DR) as a set of policies and procedures to enable the recovery or continuation of vital technology infrastructure and systems following a natural or human-induced disaster. Analytics is described as the discovery and communication of meaningful patterns in data.

In this series I'll discuss how analytics will improve resilience, lower risk, and enhance business continuity. I'll explore how analytic DR services could come to market, which parties stand to benefit most, and some of the challenges that lie ahead. Part 1 discusses how analytics will enhance disaster recovery in the near term and presents a vision in which analytics and automation are combined to improve risk management.

The evolution of DR closely follows the development of IT, providing methods, products, and services to recover systems within required time frames and levels of data currency. From the early 1980s until about five years ago, disaster recovery focused mainly on the backup and recovery of physical computer systems. Given the need to recover physical systems to a like environment, vendors aggregated clients with like IT environments to provide shared DR services. These services made DR more affordable to many companies. This model of recovering physical systems worked well when acceptable downtime for most IT systems could be measured with a calendar.

Today, this is no longer true. Over 90% of all new applications are being developed for the Cloud. Cloud infrastructure, application characteristics and data structures are different. Cloud workloads are deployed in virtual environments, often spread across geographic boundaries. Many companies use combinations of private and public (hybrid) Clouds to run their applications. Cloud resources are dynamically added and removed based on capacity demand. And forget that calendar; downtime tolerance for most Cloud systems is minimal, measured with either a clock or stopwatch. 

By capturing and analyzing metadata stored in the Cloud stack, companies will be able to gain deep insight into data protection and disaster recovery. Analytics can be applied across the IaaS/PaaS layer to help companies better understand data protection and DR functions such as backup, replication, DR testing, and system recovery. Some tools used in physical DR setups already capture data that can be analyzed for insight into discrete functions, e.g. the success rate of data backups. Cloud analytics will go further, allowing companies to gather information across the full spectrum of data protection and DR functions to see how DR is working and how Cloud resources can be optimized. Analytic data and algorithms will be used to recommend how DR processes can be improved to produce better outcomes.
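The backup success rate mentioned above is the simplest kind of insight that job metadata can yield. The sketch below is illustrative only: the record layout (`system`, `finished`, `ok`), the sample data, and both helper functions are assumptions, not the format of any particular backup product.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical backup job records; field names are assumptions.
JOBS = [
    {"system": "crm", "finished": datetime(2024, 1, 1, 2, 0), "ok": True},
    {"system": "crm", "finished": datetime(2024, 1, 2, 2, 0), "ok": False},
    {"system": "erp", "finished": datetime(2024, 1, 1, 2, 0), "ok": True},
    {"system": "erp", "finished": datetime(2024, 1, 2, 2, 0), "ok": True},
]


def backup_success_rate(jobs):
    """Return the fraction of successful backup jobs per system."""
    totals, successes = defaultdict(int), defaultdict(int)
    for job in jobs:
        totals[job["system"]] += 1
        successes[job["system"]] += int(job["ok"])
    return {system: successes[system] / totals[system] for system in totals}


def stale_systems(jobs, now, max_age):
    """List systems whose latest successful backup is older than max_age."""
    latest_ok = {}
    for job in jobs:
        if job["ok"]:
            previous = latest_ok.get(job["system"])
            if previous is None or job["finished"] > previous:
                latest_ok[job["system"]] = job["finished"]
    systems = {job["system"] for job in jobs}
    return sorted(
        s for s in systems
        if s not in latest_ok or now - latest_ok[s] > max_age
    )


rates = backup_success_rate(JOBS)
stale = stale_systems(
    JOBS, now=datetime(2024, 1, 2, 12, 0), max_age=timedelta(hours=24)
)
print(rates, stale)
```

The second function hints at the broader point: once job metadata is gathered in one place, the same records can answer questions about data currency as well as raw success counts.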

DR analytics will benefit companies and vendors alike. DRaaS vendors will use analytics to optimize DR capacity and costs across Cloud infrastructure. Metadata can be mined across customer segments to produce useful benchmark data, helping customers improve DR and BC management.

The first wave of analytic implementations will be used to help companies improve data protection, monitor compliance, enhance DR testing, and design affordable resilience for critical IT systems. Analytics will also be used to help optimize DR Cloud capacity, costs, performance, and resource allocation.

But the use of analytics will not stop there. Cloud automation, inter-Cloud operability, IoT, and predictive analytics will be combined to usher in a new era that may change how DR is performed. I define this new era as predictive risk management. Predictive analytics will examine a variety of threat and risk data in real time and determine whether critical Cloud workloads are exposed to unacceptable levels of risk. These analytic models will be combined with Cloud automation to move workloads out of harm's way. This model of resilience will change how companies manage risk and how DRaaS vendors provide service. In future blogs I will discuss how this model might evolve and some of the challenges involved in bringing predictive risk services to market.
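To make the idea concrete, here is a deliberately simplified sketch of the decision step: score each region from risk signals, compare the score to a threshold, and propose moving critical workloads elsewhere. Everything in it is hypothetical, including the signal names, weights, threshold, and region labels. A real predictive model would be far richer, and the actual move would be carried out by Cloud automation tooling that is not shown.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    region: str
    critical: bool


# Hypothetical per-region risk signals, each scaled from 0.0 to 1.0.
SIGNALS = {
    "us-east": {"weather": 0.9, "cyber": 0.2, "power": 0.4},
    "eu-west": {"weather": 0.1, "cyber": 0.3, "power": 0.1},
}
WEIGHTS = {"weather": 0.5, "cyber": 0.3, "power": 0.2}  # assumed weighting
RISK_THRESHOLD = 0.5  # assumed level of "unacceptable" risk


def region_risk(region):
    """Combine a region's signals into a single weighted risk score."""
    signals = SIGNALS[region]
    return sum(WEIGHTS[name] * signals[name] for name in WEIGHTS)


def plan_moves(workloads):
    """Propose (workload, from_region, to_region) moves for critical workloads at risk."""
    safest = min(SIGNALS, key=region_risk)
    return [
        (w.name, w.region, safest)
        for w in workloads
        if w.critical
        and w.region != safest
        and region_risk(w.region) > RISK_THRESHOLD
    ]


workloads = [
    Workload("billing", "us-east", True),
    Workload("wiki", "us-east", False),
    Workload("auth", "eu-west", True),
]
moves = plan_moves(workloads)
print(moves)
```

The sketch separates scoring from planning so that the scoring function could later be replaced by a trained model without changing how moves are proposed.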

Disaster recovery techniques and technologies have evolved greatly over the past 30 years. Analytics in DR and the rise of Cloud computing will bring significant benefits, helping companies design truly resilient systems and optimize DR functions in ways never before possible.