How Much Do You Really Know About The Life of Your Cloud Resources?

You finally developed a Cloud strategy and shopped for an IaaS/PaaS that provides the services you need, with the compliance levels you require, at the most competitive prices you can find. Off to the races. You can now focus on the core of your business and worry less about IT. THE Cloud will carry you gently to success through all its benefits: agility, scalability, manageability (that being the responsibility of someone else) and sometimes lower costs. We have all heard what a beautiful view one gets from up there.

With all in place, you keep busy signing up customers and keeping the existing ones happy. Then one day you get this email:

Date 12/8/2016 18:50PM

Subject: Host Node Reboot

Our monitoring system indicated an issue with the hardware node hosting the instances listed in this email. Our engineering team has investigated the issue and initiated a restart of the host node in question.

Please note: While this event rebooted the instances listed in this email, we expect no impact on data and/or configurations.

Thank you,

 

Your first thought: “Whew, a problem was averted without customers noticing it!”

Your second thought: “This Cloud stuff really works. My old IT would have taken some time to sort the issue out once they heard about it from a customer.”

Your third thought might highlight a personality trait:

  • Option 1: “I do not want to know anything else about this, we move on.” The Ostrich approach to things you don’t want to deal with.
  • Option 2: “So was that really all? Did the event truly go unnoticed? Trust but verify.”

This was a real message received by one of our customers. Here is what v6Sonar told them about the same event. An “offline” notification was sent when the incident actually started, before the Cloud provider's email.

The “recovery” notification followed 10 minutes later.

This is the timeline of the incident:

Maybe there is no impact on the configuration and data for this host (which is not always the case), but what about the impact on the services delivered?

All of a sudden that unexpected yet “feel good” notification seems to raise more concerns than it delivers peace of mind. Now other thoughts rush through your head:

Your first thought: How many customers were affected and are now considering alternatives without even telling me they are upset?

Your second thought: How do I prevent such incidents from happening in the future, and how much will it cost me? Is this the right Cloud provider for me?

Your third thought: Whoever my Cloud provider is, I need to see for myself, in a simple, inexpensive way, how well my cloud-based service infrastructure is doing. Trust but verify.
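
To make "see for myself" a little more concrete, here is a minimal sketch of the idea, written in Python. It is an illustration only and not how v6Sonar works: the function names `is_reachable` and `detect_events` are hypothetical, the check is a plain TCP connection attempt, and a real monitor would probe from outside your provider's network and on a schedule. The sketch turns a series of up/down samples into the same kind of "offline" and "recovery" events described above, along with how long the outage lasted.

```python
import socket
from datetime import datetime, timedelta


def is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


def detect_events(samples):
    """Turn (timestamp, is_up) samples into offline/recovery events.

    Returns a list of (kind, timestamp, downtime) tuples. downtime is
    None for "offline" events and a timedelta for "recovery" events.
    """
    events = []
    down_since = None  # timestamp of the first failed sample, if currently down
    for ts, up in samples:
        if not up and down_since is None:
            down_since = ts
            events.append(("offline", ts, None))
        elif up and down_since is not None:
            events.append(("recovery", ts, ts - down_since))
            down_since = None
    return events


if __name__ == "__main__":
    # Made-up samples, one per minute, purely for illustration.
    start = datetime(2020, 1, 1, 12, 0)
    observed = [True, False, False, True, True]
    samples = [(start + timedelta(minutes=i), up) for i, up in enumerate(observed)]
    for kind, ts, downtime in detect_events(samples):
        print(kind, ts.isoformat(), downtime)
```

In practice the samples would come from calling something like `is_reachable` against your own service endpoint at a fixed interval. Note that the measured downtime can only be as precise as that interval.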