Resiliency, Redundancy, Recovery
Resiliency is the ability of a server, network, storage system, or entire data center to recover quickly and continue operating after an equipment failure, power outage, or other disruption.
Redundancy is a core concept when architecting and running a Data Center. Backup capabilities for critical Data Center functions are essential to the smooth management of a Data Center. If a primary service, function, feature, electrical service, or telecommunications line goes down, its redundant backup can pick up the slack, preserving uptime until the failing item is fixed.
Disaster recovery (DR) is an area of infrastructure and security planning that aims to minimize the impact of significant negative events on your organization. A disaster recovery plan is a structured document that instructs your staff on what to do in the event of a significant, unplanned incident.
Wells Fargo experienced failures in their online banking, mobile applications, and reportedly some or many ATMs on Thursday and Friday this past week. Payroll deposits were not credited to accounts when they should have been, and other banking services were problematic as well.
The company spokesperson said that it was not a cybersecurity issue but rather a single issue in a single data center: a smoke alarm was triggered in a Minnesota Data Center. However implausible or even preposterous that explanation may seem, it is a good reminder for all enterprises to perform a detailed health check on their existing Data Center Resiliency and Data Center Redundancy, and to certify their Recovery Time Objective (RTO) and Recovery Point Objective (RPO), along with full testing of their Disaster Recovery Run Book.
Wells Fargo is a public company that operates within the highly regulated financial services industry. They must conduct Disaster Recovery Tests at least annually. Beyond regulatory requirements, their very business depends on safeguarding customer financial information and providing access to highly accurate, real-time account balances.
Every company must understand their strengths and weaknesses in Resiliency, Redundancy, and Recovery. After all, how much to invest in the “Three R’s” in order to reduce risk is a Risk Management Decision. For many companies, like Wells Fargo, the investment in the “Three R’s” must be such that risk is dramatically reduced. That is why it seems impossible to believe that one event in one data center could prevent the access and delivery of so many key services. Clearly there was a flaw in their architecture or processes.
This is a call to action for every company to consider hiring a third party that is qualified in Enterprise Architecture and Data Protection/Backup/Recovery Technologies to perform an independent health check of their “Three R’s”. The health check would review the Enterprise Architecture to identify strengths and weaknesses, propose alternatives for specific actions that would further reduce risk, validate the Data Protection and Backup processes and technologies, and certify a test of Recovery Capabilities that would calculate an accurate and up-to-date RPO and RTO.
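To make the last step of that health check concrete, here is a minimal sketch in Python of how the results of a recovery test might be compared against RPO and RTO targets. It assumes the usual definitions: RPO is the maximum acceptable window of data loss, and RTO is the maximum acceptable time to restore service. The class name, function names, timestamps, and targets below are all hypothetical and chosen only for illustration; they do not describe any real test or any particular company's numbers.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class DrTestResult:
    """Timestamps captured during a (hypothetical) disaster recovery test."""
    last_good_backup: datetime   # most recent recoverable copy of the data
    outage_start: datetime       # when the simulated failure began
    service_restored: datetime   # when the service was usable again


def measured_data_loss(result: DrTestResult) -> timedelta:
    # Data written after the last good backup is lost; compare this to the RPO.
    return result.outage_start - result.last_good_backup


def measured_recovery_time(result: DrTestResult) -> timedelta:
    # Time from failure to restored service; compare this to the RTO.
    return result.service_restored - result.outage_start


def meets_objectives(result: DrTestResult,
                     rpo_target: timedelta,
                     rto_target: timedelta) -> bool:
    # The test passes only if both measurements are within their targets.
    return (measured_data_loss(result) <= rpo_target
            and measured_recovery_time(result) <= rto_target)


# Illustrative values only (assumptions, not real measurements).
result = DrTestResult(
    last_good_backup=datetime(2024, 1, 15, 5, 45),
    outage_start=datetime(2024, 1, 15, 6, 0),
    service_restored=datetime(2024, 1, 15, 10, 30),
)
rpo_target = timedelta(hours=1)
rto_target = timedelta(hours=4)

print(measured_data_loss(result))       # 0:15:00
print(measured_recovery_time(result))   # 4:30:00
print(meets_objectives(result, rpo_target, rto_target))  # False
```

In this made-up example the data-loss window is comfortably inside the RPO target, but the recovery took thirty minutes longer than the RTO target, so the test as a whole fails. That is exactly the kind of gap an independent test is meant to surface before a real incident does.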
I guarantee you that Wells Fargo had to believe that their Architecture and Processes had made it impossible to experience 24+ hours of downtime. They were wrong.