The Case for Resilience

The IT analyst firm Gartner predicts that by 2020 there will be over 26 billion devices connected to the Internet. When your alarm clock goes off in the morning, it will notify your coffee maker to begin brewing. Five million new devices are attached to the Internet every day, streaming digital data to be captured, analyzed, and turned into useful information. Technology innovations such as Cloud computing, smartphones, and new distributed database structures (e.g., NoSQL) have replaced legacy IT systems to provide rapid, scalable IT services. The pace of business is accelerating, and our reliance on technology has never been greater. Speaking at a recent conference of business leaders in Davos, Switzerland, John Chambers, former CEO of Cisco, told an audience that “Forty percent of the companies in this room won't exist, in my opinion, in a meaningful way in 10 years unless they change dramatically.”

Today’s economy is increasingly defined by digital technology. Companies have designed IT systems that connect them to their customers, suppliers, and partners in real time. Data from transactions and interactions is captured and analyzed, resulting in faster decisions that reflect current market conditions. The Internet of Things (IoT) is allowing any device with an on-off switch to be connected to the Internet or to other devices. This includes cars, fitness trackers, coffee makers, jet engines, traffic lights, water systems, and more.

As companies race to integrate digital technology, their reliance on IT is increasing. The loss of IT systems or applications is felt immediately by customers, suppliers, and business partners. In many cases customers can fire you with two clicks of a mouse. The cost of downtime is rising as well. A study by IDC revealed that for the Fortune 1000, the average total cost of unplanned application downtime per year is $1.25 billion to $2.5 billion. The average cost of a critical application failure is $500,000 to $1 million per hour.

Since the 1980s, companies have relied on a centralized IT function to protect information and recover systems if they fail. During the past 35 years, the disaster recovery industry grew in response to the need for information protection. That industry is now at an inflection point. The role of centralized IT is changing rapidly with the rise of Cloud computing and the proliferation of mobile devices. The ease and speed with which computing power can be purchased and new applications can be composed have made it harder for IT to provide reliability and ensure the availability of distributed systems and data. Traditional methods for backing up data and providing disaster recovery are often not effective for cloud-native applications.

Consider that several years ago companies reported a tolerance for downtime of critical systems of 24 to 48 hours. A recent study by a leading IT industry analyst showed that 83% of companies now report a maximum acceptable downtime of 4 hours or less, and an additional 7% of companies reported a tolerance for downtime of 0 to 1 hour!

Meeting this demand will require a new way of thinking: resilience must be engineered into systems, as opposed to the traditional method of bolting disaster recovery onto the back end. Companies must shift their focus from planning to recover from failures to ensuring that systems keep running when failures occur. This (not subtle) change will require new methods and skills, along with broader executive support from the C-suite and line-of-business leaders. It will also require tremendous innovation and a rethinking of industry regulations that deal with the protection and preservation of digital records.

Today, over 90% of all corporate applications are being designed for Cloud and mobile devices. Cisco predicts that from 2014 to 2019 Cloud traffic will quadruple. The IoT, connected devices, and advanced analytics may make us all feel smarter; however, they are also creating massive amounts of data that must be protected and new types of systems that must not fail. Ninety percent of the data in the world today was created in the last 2 years. In the last 30 days people watched 4 billion hours of YouTube videos, created 30 billion new pieces of content on Facebook, and sent 12 billion tweets.

I have been asked by many to share my thoughts and opinions on the state of disaster recovery. This blog is an attempt to do just that: to share, to hear ideas, to challenge you to think about these issues, and to invite readers to challenge my thinking. I hope you will join me in this new venture, provide me with feedback, and share your thoughts. Together, we will have meaningful discussions about a topic near to our hearts. Welcome!