2011: Outage Checklist for Cloud Providers

May 3, 2011 | in Unified Monitoring

While Cloud computing continues to recover from Amazon’s widely publicized outage, the fact remains that outages have happened and will continue to happen.  Look no further than VMware’s first two public outages of Cloud Foundry, albeit a beta product.  With more and more critical business services moving to the Cloud (see SAP’s announcement with HP), what can be done to mitigate risk and restore the luster of the Cloud itself?

Checklist for the Cloud providers:

  • Communication
    In a world of instant communication and analysis, the lack of communication and insight during the Amazon outage was rather amazing.  Customers received vague status updates with little context, and there wasn’t a single blog post until four days after the outage.  Why?  For goodness’ sake, a website with a progress bar reading “4% restore completed” would have been better than being left in the dark.

  • Change Controls Please
    Cloud providers have a multitude of customers with different needs and levels of expertise.  When working on any system in the Cloud, change controls are a must.  In the case of VMware’s outage, a single engineer working on a “playbook” to better respond to future outages accidentally “touched a keyboard” and took out the entire network infrastructure. 

  • Post Mortem
    After the storm has passed, Cloud providers must give their customers detailed information on the outage itself and on the measures and controls being put in place to ensure it won’t happen again.  Customers need to understand how to protect themselves from future issues, and this requires detailed technical analysis.  In the end, fess up, take your lumps, and restore your customers’ faith in your ability to execute and maintain a Cloud.

  • Transparency
    With the Cloud service provider space heating up and competition getting fierce, the urge to obscure your “secret sauce” is understandable.  However, Cloud providers must resist the black box mentality of the past and offer a level of transparency to their customers.  This includes providing the proper APIs or monitoring capabilities to manage the entire heterogeneous infrastructure.  As Ronald Reagan said, “Trust, but verify.”

  • If You Love Someone, Set Them Free…
    Building a Cloud that acts like the electronic equivalent of a “cockroach motel” (you can check in but you can’t check out) isn’t smart business.  Lock your customers in with your vision, customer service, reliability, etc., and provide superior integration with other public and private Cloud offerings.  Interoperability isn’t a curse; it’s a blessing that gives your customers the peace of mind of knowing they can recover from catastrophic issues.  Sure, you’ll lose some customers, but you’ll win even more.

Finally, let’s end the shenanigans and admit that Cloud computing is difficult and complex, and that we don’t have all the answers.  Nearly every aspect of the modern datacenter is in a state of change, and as agility increases, so too does complexity.  Amazon, VMware, Microsoft, and other Cloud providers are wonderful companies that are on the cutting edge of a technological revolution.  As Cloud computing is in its infancy, we should expect growing pains, but we should also demand that our Cloud providers follow this checklist.

