Cyber Resilience - Risk Management Evolves

The development of modern cloud-based mobile applications has changed much of the IT landscape. New methods of application development (Agile and DevOps) have driven organizational change and brought IT to the forefront of new digital business models. Hyper-connected, massively distributed Cloud applications are also driving changes in traditional IT methods for business continuity and security - two primary functions tasked with managing IT risk.

Protecting information systems and ensuring their availability has been the province of the business continuity management (BCM) function. Access control, privacy, and identity management, on the other hand, have been handled by IT security. Historically separate, both functions play an important role in managing operational risk. As cloud and mobile applications evolve, several factors suggest the need for a new approach to managing IT risk:

  • Cyber breaches are increasing in frequency. Many threat types, such as zero-day exploits and ransomware, create significant business impact with little or no advance warning
  • Acceptable downtime for most IT systems is now less than four hours. Many systems are designed to be resilient, with no downtime and minimal data loss, placing pressure on business continuity and cybersecurity incident response teams
  • Business continuity and security metrics are often presented in technical terms - for example, the amount of malware detected and blocked, or the percentage of backups successfully completed. Senior executives need to understand these programs in the context of business risk
  • Protecting brand reputation is more important than ever. Companies must respond to disruptions (natural or cyber) with speed and accuracy to ensure minimal impact to brand reputation and external partners

While BCM and cybersecurity are often managed separately, both teams must work together to develop and manage a common incident response process. This gives management and company stakeholders confidence that their organization has a consistent way to defend against threats and respond in a manner that protects the brand.

This new approach, merging security and business continuity functions (while balancing risk and budget), is known as Cyber Resilience. Among its benefits are better incident response, an improved ability to manage risk, and more effective coordination of resources.

Responding to cyber threats has created new challenges that business continuity planners must deal with. Unlike a natural disaster, a cyber event creates a crime scene. Law enforcement and external stakeholders must be engaged, and public image must be managed. Business continuity and cybersecurity planners must develop response plans to manage these new threats.

Best practices for Cyber Resilience are still emerging. Most companies manage BCM and cybersecurity separately, while others have merged the functions. Because BCM and cybersecurity programs often compete for the same risk funding, it’s not uncommon to see conflicts in how these programs are led and managed.

One trend we are seeing is senior executives integrating these functions under the role of a Chief Risk Officer (CRO). The goal is to create a holistic view of risk and a common method for organization, governance, and funding. The CRO is responsible for allocating funding where it's needed most and for driving BCM and cybersecurity programs to ensure they are coordinated, integrated, and delivering value.

The nature of risk is constantly changing as Cloud and mobile applications evolve. Internal and external cyber threats will increase as the number of black hats grows. Consequently, BCM and cybersecurity will continue to evolve as Cyber Resilience tools and techniques mature and companies develop new ways to manage risk.

The CRO may become your new best friend.


This blog was co-authored by Jeff Marinstein, Founding Principal of Marinstein & Co. and Michael Puldy, Director of Global Business Continuity Management at IBM.

Cybersecurity Skills Grow in CT

In their 2016 State of Cybersecurity report, ISACA and RSA found that 74% of companies surveyed expect to fall prey to a cyberattack in 2016. In 2015, 60% of the survey's respondents fell victim to a phishing attack; 30% of those said the attacks occurred on a daily basis. 82% of companies report their Board of Directors is either concerned or very concerned about cybersecurity.

Despite the rise in threat levels, the skills gap in cybersecurity remains a serious problem. The security profession is struggling to find well-trained, highly skilled workers to fill open positions. More than 60% of organizations have too few infosec professionals. Here in CT, every major company has open jobs for cybersecurity professionals. Almost one-third of companies report that it takes six months to fill these jobs, and another 9% cannot fill open positions at all. This gap is forcing companies to hire people with insufficient skills and invest in training them. 60% of companies report that half (or fewer) of their cybersecurity job applicants are qualified upon hire.

The most significant gaps are an inability to understand the business and a lack of communication skills, and they affect all levels of cybersecurity professionals. In my previous blog I noted that many CISOs lack the ability to describe cybersecurity in business terms. On-the-job training and certification are the top methods of combating this skills gap.

For SMBs the problem is more acute. Smaller companies often lack the budget to properly address the cyber threat, and the resulting lack of robust security increases their risk; difficulty hiring skilled professionals leaves them further exposed. For these companies it may make sense to use a managed service provider (MSP) to improve their security. MSPs combine leading technology with skilled professionals to offer cybersecurity services. Companies offload the burden of selecting, installing, and managing complex technology while trained cybersecurity experts monitor and manage their environment and mitigate risks.

Southern CT has seen a surge in the availability of tech talent fueled by a variety of government, quasi-government, and non-profit activity. Into this growing talent pool we welcome Blackstratus, which has moved its CYBERShark security-as-a-service operating unit to Stamford, CT. CYBERShark takes Blackstratus' proven security and compliance platform and delivers it in the Cloud at a fraction of the cost. The service provides 24x7 monitoring, real-time alerts, and remediation of malicious activity.

"We're truly excited to be part of CT's thriving tech community and really excited to be part of CT's extended and integrated ecosystem for doing business here," Blackstratus CEO Dale Cline told several dozen employees and public officials. Read more about the Blackstratus announcement here and get more info about CYBERShark and Blackstratus here.

Cyber Risk on the Rise

This week I attended an excellent conference on Cyber Security. TakeDownCon, run by EC-Council and hosted by the UConn School of Business in Stamford, CT, provided great speakers with separate tracks for CISOs and technologists. I highly recommend an EC-Council event if you’re looking to learn more about Cyber Security or obtain certifications.

In 2015, over 169 million personal records were exposed as a result of cyber intrusions, the result of more than 780 publicized breaches across the education, healthcare, government, and financial sectors. The average cost per stolen record exceeded $150; in the healthcare sector it was $360. Despite the rising threat posed by foreign governments, hacktivists, and cyber criminals, only 38% of global organizations report they are prepared to handle a sophisticated cyber attack.

Here are some key takeaways from the conference:

  • Companies are not framing the issues of cyber risk in business terms. This creates a disconnect with senior executives and the Board of Directors. Cyber programs produce volumes of data and dashboards but do little to describe Cyber Security issues in business terms. As a result, many programs remain underfunded and understaffed despite the growing threat landscape.

  • An effective cyber program cannot be implemented until a company knows where all of its data is, who would want to access it, and why. As computing becomes more distributed (through Cloud and mobile), it becomes harder to identify where all the data resides, and the growing number of endpoints increases the cyber threat. Many companies cannot even identify how many servers they have or where all of their data is located.

  • There is an inherent tradeoff between security and convenience. Senior executives are often unwilling to sacrifice convenience for better security. Weak passwords, poorly administered systems, and the proliferation of devices with poor security controls are examples of vulnerabilities that stem from the desire for convenience. Hackers exploit these vulnerabilities with relative ease.

  • There are hundreds of vendors selling security products and services. According to the experts, most of these are of limited use. Security products are often implemented without a properly designed risk management framework; in essence, many companies throw technology at the problem only to find that they are still vulnerable to hackers. Products end up providing a false sense of security unless the company has learned how to manage risk.

  • The majority of cyber attacks result from exploiting human behavior, e.g. opening email attachments that install malware. Companies are beginning to develop analytics to examine and predict behavior and identify employees who may attempt to steal corporate information. These analytics examine online behavior, badge in/out times, login times, system use, files downloaded or copied, social media activity, and other HR-related data to profile employees. These behavioral analytics are a new line of defense for companies and may become a Cyber Security best practice as they evolve.

  • Effective CISOs can add business value beyond protecting the company. A CISO at a major retailer installed thermal imaging on in-store cameras to analyze the traffic patterns of shoppers. Company executives used this data to tailor product placement based on traffic flow. By placing high-margin items in strategic high-traffic locations, the company increased profit by 4%.

  • US law prevents companies from using certain techniques that could help thwart cyber attacks. Federal and state computer crime laws make it illegal to hack (gain unauthorized access to a computer system). As a result, US companies are unable to deploy probes or take offensive action for fear of prosecution. Some companies have hired foreign groups to deploy cyber “weapons” hoping to prevent future hacks. There is an effort to create legislation allowing companies and civilians to act in their own defense without fear of prosecution.
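The behavioral-analytics idea above can be sketched in miniature: flag users whose latest activity deviates sharply from their own historical baseline. The following is a hedged toy example, not any vendor's product; the data, field names, and threshold are invented for illustration.

```python
from statistics import mean, stdev

def login_zscore(history, latest):
    """How far the latest login hour deviates from this user's own baseline."""
    if len(history) < 2:
        return 0.0
    sigma = stdev(history)
    if sigma == 0:
        return 0.0
    return (latest - mean(history)) / sigma

def flag_risky_users(login_hours, threshold=2.0):
    """login_hours: {user: [hour-of-day per login, most recent last]}.
    Returns users whose latest login is an outlier versus their history."""
    flagged = {}
    for user, hours in login_hours.items():
        z = login_zscore(hours[:-1], hours[-1])
        if abs(z) >= threshold:
            flagged[user] = round(z, 2)
    return flagged

logins = {"alice": [9, 9, 10, 9, 9, 10, 9, 3],   # sudden 3 a.m. login
          "bob":   [9, 10, 9, 9, 10, 9]}         # consistent 9-10 a.m. pattern
print(flag_risky_users(logins))                  # → {'alice': -12.88}
```

Real systems would combine many more signals (badge data, file transfers, social activity) and far more sophisticated models, but the principle is the same: each employee is compared against their own normal.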

Despite the amount of investment and innovation in Cyber Security technology, the threat landscape is widening and the risk of a data breach is increasing. Humans are the problem; our lack of understanding of cyber risk, coupled with our desire for convenience, creates opportunities for bad actors. The expanding role of the CISO is critical to engaging, educating, and helping senior executives effectively address cyber risk. As one speaker put it, there are two kinds of companies: those that have been hacked, and those that don’t yet know they’ve been hacked.

In another blog post I’ll dive into more details about Cyber risk and its ties to resilience.

Resilience for NoSQL - Meet Datos IO

The growth of Cloud, social, and mobile technology is driving increased use of NoSQL databases like Apache Cassandra, MongoDB, Amazon DynamoDB, and Google Bigtable. A recent study by ESG revealed that 56% of companies have NoSQL databases in production or deployed as part of a pilot or proof of concept. A similar study by Couchbase revealed that 90% of companies consider NoSQL important or critical to their business. Once the province of Internet innovators like Google, Amazon, Facebook, LinkedIn, and Twitter, NoSQL databases are now widespread across every major industry. Modern applications create new demands that developers and DBAs must address, such as:

  • Huge volumes of rapidly changing data including semi-structured, unstructured and polymorphic data.
  • Applications are now delivered as services that must be always on, accessible from mobile devices and scaled globally to millions of users.
  • Agile development methods where smaller teams work in sprints, iterate quickly and release code every few weeks, and in some cases daily.
  • Scale-out architectures using open source software, commodity servers and Cloud computing.

The choice for many of these applications is a distributed database (NoSQL) that stores portions of the data on multiple computers (nodes) within the network. These databases scale rapidly by adding nodes to a cluster making them effective for both Cloud and on-premise use. Their schema-less design allows data of different formats to be added accommodating the large amount of unstructured data (documents, audio, video, images) in use today. In-memory processing and the use of direct attached storage can provide fast processing of queries and support real time analytics.

The innovations in performance, cost, and agility found in NoSQL databases come with a tradeoff. These open source databases lack the robust internal tools necessary for effective data protection and disaster recovery (DR), and they are not as operationally mature as their relational counterparts. It took IBM, Oracle, and Microsoft many years to build robust data management and protection capabilities into their products. Low cost and high performance don’t matter very much if databases are offline or contain corrupted data. Historical approaches and the mainstream data protection solutions widely used today are not suitable for NoSQL data protection, disaster recovery, and copy management. Given the need for always-on resilient systems, NoSQL applications will require better tools for data protection, compliance, and disaster recovery. The issues include:

  • NoSQL databases are eventually consistent and scale-out using locally attached storage (DAS). LUN-based snapshots do not produce application-consistent copies of the database suitable for restore or use in DR.
  • Stopping database writes to allow the system to catch up and produce an application-consistent snapshot is not practical.
  • Traditional backups of cluster nodes don’t describe the dependencies between VMs and their applications. A detailed understanding of application configurations is required to restore a NoSQL cluster from VM backups.
  • NoSQL nodes can be dynamically added (and removed) causing data to move between the nodes. Backups using traditional products don’t reflect the current state of NoSQL nodes making them unusable for DR.
  • Taking application-inconsistent snapshots and running a database repair (replay) process takes hours to days and is not practical for low-RPO DR.

The main tools used today for NoSQL fall short of providing a robust, efficient solution for data protection and DR. Scripting is a common approach to backup and recovery; however, it is labor-intensive and prone to error as configurations constantly change. Replication is meant to address availability and is often not effective for backup and DR: if the database becomes corrupted, the bad data simply replicates itself across nodes. Keeping multiple replicas to support DR wastes storage space, creates management issues, and requires some method of deduplicating the data.
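The replication pitfall is easy to demonstrate with a toy model that assumes nothing about any particular database: replication faithfully copies whatever is written, corrupt or not, so only a point-in-time backup preserves the last known good state.

```python
class ReplicatedStore:
    """Toy key-value store that fans every write out to all replicas."""
    def __init__(self, replicas=3):
        self.nodes = [{} for _ in range(replicas)]

    def write(self, key, value):
        for node in self.nodes:          # replication: every node gets the write
            node[key] = value

store = ReplicatedStore()
store.write("balance", 100)
backup = dict(store.nodes[0])            # point-in-time backup taken here
store.write("balance", "CORRUPT")        # a bad write replicates everywhere
assert all(n["balance"] == "CORRUPT" for n in store.nodes)
assert backup["balance"] == 100          # only the backup retains the good value
```

Replication protects against node loss, not against bad data; that is why it cannot substitute for versioned backups.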

Given the importance of NoSQL to mission-critical applications, it is not surprising that new data protection and DR solutions have come to market. One such solution comes from Datos IO. Datos IO is driven by a vision to make data recoverable at scale for next-generation databases, and its product solves the main issues described above. It creates application-consistent point-in-time backups (versions) across all nodes of common NoSQL databases. These consistent copies remove manual effort during restores and the need to replay activity logs. Datos IO allows point-in-time versions to be created as frequently as every 15 minutes. Backups can be stored on-premise (in NFS format) or in a public cloud.

Backup is only part of the Datos IO solution. It also provides near-instantaneous data restore by storing NoSQL backup data in native format. Datos IO backs up the database at granular levels, providing the technology needed to meet low RTO/RPO scenarios. Backups are incremental forever, limiting the bandwidth required to keep NoSQL backups current and lowering operational costs. The Datos IO backup becomes the single point of truth about the state of a NoSQL database.
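The "incremental forever" technique can be sketched in the abstract (this is the general idea, not Datos IO's implementation): after one full copy, each subsequent backup stores only what changed, and a restore replays the chain of versions.

```python
def incremental_backup(current, last_seen):
    """Record only keys whose values changed since the previous backup."""
    return {k: v for k, v in current.items() if last_seen.get(k) != v}

def restore(chain):
    """Replay the full copy plus every delta to rebuild the latest state."""
    state = {}
    for delta in chain:
        state.update(delta)
    return state

full = {"a": 1, "b": 2}                              # initial full backup
delta = incremental_backup({"a": 1, "b": 3}, full)   # only "b" changed
print(delta)                    # → {'b': 3}
print(restore([full, delta]))   # → {'a': 1, 'b': 3}
```

A production system must also track deletions, deduplicate data, and maintain consistency across nodes; this sketch ignores all of that, but it shows why incremental-forever backups need so little bandwidth: each version carries only the changes.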

Disaster recovery can be accomplished with a single click and is configurable to meet a variety of needs. The database can be restored to the same, or an entirely different, cluster - an important feature for Cloud-based DR. This capability also allows Datos IO to support DevOps use cases by rapidly creating test/dev nodes or migrating data across Cloud environments. Datos IO is also space efficient, performing semantic deduplication of its backups and saving customers up to 70% on the cost of recovery storage.

Data protection and disaster recovery technologies follow new innovations to market. The explosive growth of NoSQL databases demands the operational maturity of their relational counterparts, and Datos IO is bringing the robust qualities of traditional data protection products to modern Cloud, big data, and distributed applications.

To learn more about data protection, Cloud, and trends in resilience, subscribe to my blog here.

The Case for Resilience

The IT analyst firm Gartner predicts that by 2020 there will be over 26 billion devices connected to the Internet. When your alarm clock goes off in the morning, it will notify your coffee maker to begin brewing. Five million new devices are attached to the Internet every day, streaming digital information to be captured, analyzed, and turned into useful information. Technology innovations such as Cloud computing, smartphones, and new distributed database structures (e.g. NoSQL) have replaced legacy IT systems to provide rapid, scalable IT services. The pace of business is accelerating, and our reliance on technology has never been greater. Speaking at a recent conference of business leaders in Davos, Switzerland, John Chambers, former CEO of Cisco, told an audience that “Forty percent of the companies in this room won't exist, in my opinion, in a meaningful way in 10 years unless they change dramatically”.

Today’s economy is being increasingly defined by digital technology. Companies have designed IT systems that connect them to their customers, suppliers, and partners in real time. Data from transactions and interactions is captured and analyzed resulting in faster decisions which reflect current market conditions. The Internet of Things (IoT) is allowing any device with an on-off switch to be connected to the Internet or each other. This includes cars, fitness trackers, coffee makers, jet engines, traffic lights, water systems, etc.

As companies race to integrate digital technology, their reliance on IT is increasing. The loss of IT systems or applications is felt immediately by customers, suppliers, and business partners. In many cases customers can fire you with two clicks of a mouse. The cost of downtime is increasing. A study by IDC revealed that for the Fortune 1000, the average total cost of unplanned application downtime per year is $1.25 billion to $2.5 billion, and the average cost of a critical application failure is $500,000 to $1 million per hour.

Since the 1980s, companies have relied on a centralized IT function to protect information and recover systems when they fail. During the past 35 years the disaster recovery industry grew in response to the need for information protection. That industry is now at an inflection point. The role of centralized IT is changing rapidly with the rise of Cloud computing and the proliferation of mobile devices. The ease and speed with which computing power can be purchased and new applications can be composed have complicated IT’s ability to provide reliability and ensure availability of distributed systems and data. Traditional methods for backing up data and providing disaster recovery are often not effective for cloud-native applications.

Consider that several years ago companies reported a tolerance for downtime of critical systems of 24-48 hours. A recent study by a leading IT industry analyst showed that 83% of companies now report a maximum acceptable downtime of four hours or less, and an additional 7% reported a downtime tolerance of an hour or less!

Meeting this demand will require a new way of thinking; resilience must be engineered into systems, as opposed to the traditional method of bolting disaster recovery onto the back end. Companies must shift their focus from planning to recover from failures to ensuring that systems keep running in the event of failures. This (not subtle) change will require new methods and skills, along with broader support from the C-suite and line-of-business leaders. It will also require tremendous innovation and a rethinking of industry regulations that deal with the protection and preservation of digital records.

Today, over 90% of all corporate applications are being designed for Cloud and mobile devices. Cisco predicts that from 2014 to 2019 Cloud traffic will quadruple. The IoT, connected devices, and advanced analytics may make us all feel smarter; however, they are also creating massive amounts of data which must be protected and new types of systems which must not fail. 90% of the data in the world today was created in the last two years. In the last 30 days, people watched 4 billion hours of YouTube videos, created 30 billion new pieces of content on Facebook, and sent 12 billion tweets.

I have been asked by many to share my thoughts and opinions on the state of disaster recovery. This blog is an attempt to do just that - to share, to hear ideas, to challenge you to think about these issues and for readers to challenge my thinking. I hope you will join me in this new venture, provide me with feedback, and share your thoughts. Together, we will have meaningful discussions about a topic near to our hearts. Welcome!

The Role of Analytics in Disaster Recovery

This is part 1 of a multi-part series on the evolution of analytics in disaster recovery

It may seem odd to discuss the role of analytics in the field of disaster recovery. These disciplines appear to have little in common. Wikipedia describes Disaster Recovery (DR) as a set of policies and procedures to enable the recovery or continuation of vital technology infrastructure and systems following a natural or human-induced disaster. Analytics is described as the discovery and communication of meaningful patterns in data.

In this series I'll discuss how analytics will improve resilience, lower risk and enhance business continuity. I'll explore how analytic DR services could come to market, which parties stand to benefit most, and some of the challenges that lie ahead. Part 1 will discuss how analytics will enhance disaster recovery (near term) and a vision in which analytics and automation are combined to improve risk management. 

The evolution of DR closely follows the development of IT, providing methods, products, and services to recover systems within required time frames and levels of data currency. From the early 1980s until about five years ago, disaster recovery mainly focused on the backup and recovery of physical computer systems. Given the need to recover physical systems to a like environment, vendors aggregated clients with similar IT environments to provide shared DR services. These services made DR more affordable for many companies. This model of recovering physical systems worked well when acceptable downtime for most IT systems could be measured with a calendar.

Today, this is no longer true. Over 90% of all new applications are being developed for the Cloud. Cloud infrastructure, application characteristics and data structures are different. Cloud workloads are deployed in virtual environments, often spread across geographic boundaries. Many companies use combinations of private and public (hybrid) Clouds to run their applications. Cloud resources are dynamically added and removed based on capacity demand. And forget that calendar; downtime tolerance for most Cloud systems is minimal, measured with either a clock or stopwatch. 

By capturing and analyzing metadata stored in the Cloud stack companies will be able to gain deep insight into data protection and disaster recovery. Analytics can be applied across the IaaS/PaaS layer and across DR functions to help companies better understand data protection and DR functions such as backup, replication, DR testing, and system recovery. It should be noted that some tools used in physical DR setups capture data that can be analyzed to gain insight into discrete functions, e.g. the success rate of data backups. Cloud analytics will allow companies to gather information across the spectrum of data protection and DR functions to gain insight into how DR is working, and how Cloud resources can be optimized. Analytic data and algorithms will be used to make recommendations on how DR processes can be improved to produce better outcomes.
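As a concrete (if greatly simplified) example of the kind of insight such analytics provide, here is a sketch that mines a backup-job log for overall success rate and per-system recovery-point exposure. The log and its field names are hypothetical, invented purely for illustration.

```python
from datetime import datetime

# Hypothetical backup-job metadata; field names are illustrative.
jobs = [
    {"system": "orders-db", "end": "2016-06-01T01:10", "ok": True},
    {"system": "orders-db", "end": "2016-06-02T01:12", "ok": False},
    {"system": "users-db",  "end": "2016-06-02T01:05", "ok": True},
]

def success_rate(jobs):
    """Fraction of backup jobs that completed successfully."""
    return sum(j["ok"] for j in jobs) / len(jobs)

def last_good_backup(jobs, system):
    """Most recent successful backup; the gap since then is the RPO exposure."""
    times = [datetime.fromisoformat(j["end"])
             for j in jobs if j["system"] == system and j["ok"]]
    return max(times) if times else None

print(round(success_rate(jobs), 2))          # → 0.67
print(last_good_backup(jobs, "orders-db"))   # → 2016-06-01 01:10:00
```

Aggregated across thousands of jobs and systems, exactly this sort of calculation turns raw DR metadata into the business-level answers executives ask for: how protected are we, and how much data would we lose right now?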

DR analytics will benefit companies and vendors alike. DRaaS vendors will use analytics to optimize DR capacity and costs across Cloud infrastructure. Metadata can be mined across customer segments to produce useful benchmark data helping customers improve DR and BC management.

The first wave of analytic implementations will be used to help companies improve data protection, monitor compliance, enhance DR testing, and design affordable resilience for critical IT systems. Analytics will also be used to help optimize DR Cloud capacity, costs, performance, and resource allocation.

But the use of analytics will not stop there. Cloud automation, inter-Cloud operability, IoT, and predictive analytics will be combined to usher in a new era that may change how DR is performed today. I define this new era as predictive risk management. Predictive analytics will examine a variety of threat and risk data (in real time) and determine if critical Cloud workloads are exposed to unacceptable levels of risk. These analytic models will be combined with Cloud automation to move workloads out of harm's way. This model of resilience will change how companies manage risk and how DRaaS vendors provide service. In future blogs I will discuss how this model might evolve and some of the challenges involved in bringing predictive risk services to market.
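A hedged sketch of how such a predictive placement decision might look in code follows; the signal names, weights, and threshold are entirely invented, and a real system would use trained models rather than a fixed lookup table.

```python
def risk_score(signals, weights=None):
    """Score a workload's exposure (0.0-1.0) from active threat signals."""
    weights = weights or {"storm_warning": 0.5,
                          "grid_instability": 0.3,
                          "active_intrusion": 0.9}
    if not signals:
        return 0.0
    return max(weights.get(s, 0.1) for s in signals)

def placement_decision(workload, signals, threshold=0.7):
    """Recommend migrating the workload when risk exceeds the threshold."""
    score = risk_score(signals)
    if score >= threshold:
        return f"migrate {workload} to alternate region (score={score})"
    return f"keep {workload} in place (score={score})"

print(placement_decision("erp", ["active_intrusion"]))  # recommends migration
print(placement_decision("erp", ["storm_warning"]))     # stays in place
```

Wired to Cloud automation, a decision like this could trigger workload relocation before an outage occurs, which is the essence of the predictive risk management model described above.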

Disaster recovery techniques and technologies have evolved greatly over the past 30 years. Analytics in DR and the rise of Cloud computing will bring significant benefits helping companies design truly resilient systems and optimize DR functions in ways never before possible.