The recent outage of Microsoft Azure raises some interesting questions. This isn’t the first big vendor cloud service outage; Amazon AWS and others have had their moments. Of course this has led to the recommendation that, to ensure continuity of your service, you should have a DR arrangement in place with a different provider. This works with Platform as a Service. But what we have been seeing is a move from PaaS up the value stack, with vendors offering their own rich ecosystems to build on, from Amazon SQS to Oracle’s latest announcement, the Oracle Internet of Things platform.

These solutions can be built with open standards and so on, but ultimately, when used, they create vendor lock-in, as no one else will have an equivalent capability with the same APIs. So how do you mitigate these outages, or even the risk of such an outage? Well, Oracle do claim you can actually run all their cloud capabilities on premise. But is that practical? As cloud is adopted, organisations are going to wind back their hardware capital outlay; after all, that is one of the value points of cloud.

So where does that leave us? Accepting the risk and trying to mitigate it in our own commercial agreements? And what about an IoT solution where you’re doing event stream processing and using period-on-period comparisons to set thresholds? There the data loss from an outage is likely to have ‘echoes’: your period analysis has holes in its data, and you get false thresholds because the hole skews the data whenever that period is used for comparison.
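To make the echo effect concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the function names, the four-slot period, the 50% tolerance, the sample counts, and the idea that lost data shows up as zeros. It is not how any particular IoT platform sets its thresholds.

```python
from typing import List, Optional

PERIOD = 4  # hypothetical: four time slots per comparison period


def period_thresholds(counts: List[float], period: int,
                      tolerance: float = 0.5) -> List[Optional[float]]:
    """Upper alert threshold per slot, taken from the same slot one period earlier."""
    thresholds: List[Optional[float]] = []
    for i in range(len(counts)):
        if i < period:
            thresholds.append(None)  # no earlier period to compare against
        else:
            thresholds.append(counts[i - period] * (1 + tolerance))
    return thresholds


def breaches(counts: List[float], period: int = PERIOD) -> List[int]:
    """Indexes of slots whose count exceeds the period-on-period threshold."""
    thresholds = period_thresholds(counts, period)
    return [i for i, (c, t) in enumerate(zip(counts, thresholds))
            if t is not None and c > t]


# Three periods of made-up event counts with no outage.
clean = [100, 120, 110, 90,
         100, 115, 108, 95,
         105, 118, 112, 92]

# Same stream, but an outage loses slots 5 and 6 (assumed recorded as 0).
with_gap = [100, 120, 110, 90,
            100, 0, 0, 95,
            105, 118, 112, 92]

clean_alerts = breaches(clean)
gap_alerts = breaches(with_gap)
print("alerts without outage:", clean_alerts)
print("alerts after outage:  ", gap_alerts)
```

In this sketch the outage raises no alert in the period where it happens; the false alerts appear one period later, in the slots that are compared against the hole. A real system would probably want to mark missing slots explicitly rather than treat them as zero, but that only helps if it knows the data is missing.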

Difficult questions with no obvious answers, other than to mitigate things commercially and push Microsoft and others to make things more robust. Time for Netflix’s Chaos Monkey?