I'll try not to get too technical on this, but bear with me.
So this morning, starting at about 9 am EST, Microsoft Azure customers (including my company) started to experience issues with services depending on it. The issue is so bad that Microsoft's own support team could no
Basically, we set up something called Single Sign-On with many websites, so you don't need yet another username/password to use them. Just use the company-provided one. Very convenient.
Apparently the root of the issue is a hard-down datacenter in Texas due to failed cooling (rumor has it a lightning strike may have damaged the cooling system).
So it's about 1:30 pm and not all systems are 100% back online. Their own status page is still down. The question is, how could a single datacenter cause such widespread issues for customers across the entire US and possibly beyond? Some software architect should be fired for this. If I were in charge of cloud strategy, Azure would slide down a few notches.
It's not about having an outage. It's about eliminating a single point of failure in every aspect of the design of every system. It's about transparency and quickly fixing the issue.
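To make the "no single point of failure" idea concrete, here is a minimal sketch of what I mean, not how Azure actually works. The function name `first_healthy`, the region names, and the health check are all made up for illustration; the only point is that a caller should be able to fall through to another region when one is down.

```python
from typing import Callable, Iterable, Optional


def first_healthy(
    regions: Iterable[str],
    is_healthy: Callable[[str], bool],
) -> Optional[str]:
    """Return the first region whose health check passes, or None if all fail.

    `regions` and `is_healthy` are hypothetical: in a real system the check
    might be an HTTP probe against a regional endpoint.
    """
    for region in regions:
        try:
            if is_healthy(region):
                return region
        except Exception:
            # A check that raises is treated the same as an unhealthy region.
            continue
    return None


# Example with made-up region names: the first region is "down",
# so the caller falls through to the next one.
down = {"texas"}
chosen = first_healthy(["texas", "virginia", "california"], lambda r: r not in down)
print(chosen)
```

If every region fails the check, the function returns None, and that is the only case where the user should actually see an outage.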
Sources:
https://twitter.com/AzureSupport
http://downdetector.com/status/windows-azure
https://azure.microsoft.com/en-us/status/