I usually don't spend time talking about outages. We all experience them, either as users or providers. We make mistakes, hopefully they're correctable, and we can get back to our day. But the dance during an outage is well choreographed by now: the provider acknowledges an issue, users grumble, the provider gets service back up and running, and finally sends out an explanation of what happened, why, and the steps taken to make sure it doesn't happen again.
Earlier this week Terremark's vCloud Express (VCE) service had some sort of outage. Latency and traceroute to my nodes were normal, with no load on the server and no memory/CPU/disk utilization to speak of. But any interaction at all was painfully slow. Usually as a user, I wait 5-10 minutes to see if it's a hiccup. Then I hit VCE's web site - in order of priority, I was looking for
I found none of those. Their support link pointed me towards a forum, so I put up a post asking what the story was. In the resulting thread with their support department, I was told that the forum is, indeed, how they interact with their customers.
Looking over the forum as I write this post several days later, I see a post from their Director of Product Management whitewashing the issue, claiming that the infrastructure was never fully down, a small subset of customers were affected, blah blah blah.
No matter how you look at this, this is a flubbed response:
Terremark, you got a $20MM investment from VMware for this... even if it was all software licenses, hire somebody other than a high-schooler to run it if you want any type of real success.
I've moved my nodes off Terremark, I've taken it off my list of cloud services I recommend to clients, and in general I can't really imagine using their service again unless I see a massive improvement in how things are run.