At a recent IT trade show where I was on a panel discussing Cloud providers, an audience member asked an interesting question: how can customers hold their Cloud provider responsible for slow performance of their applications? They wanted to know how to avoid the next catastrophic outage and how to make sure they had SLAs that penalize the Cloud provider for poor application performance.
Reflecting on this a bit further, the answer might not be as simple as it seems. Is your Cloud provider really responsible for your application’s performance in their Cloud? Maybe it’s your network or Internet provider? Perhaps the DNS service provided by your friendly domain registrar is sloppy? Or could it be the slow CRM API, provided by yet another SaaS provider, that your application interfaces with? Current-generation applications are distributed and multi-layered, and the end services they provide depend on an even more distributed set of applications.
Trying to create an SLA with so many inter-dependent components and so many providers is not trivial. Worse – there is no obligation today for the different providers to share any data on their IaaS or PaaS performance, even if this data is available to them internally.
When selecting a Cloud provider of any kind, you will need to start factoring in operational transparency and the availability of service performance metrics via APIs. The performance of the provider’s infrastructure matters less than the performance of the service it delivers: redundancy and other design elements may keep the service unaffected even when part of the infrastructure fails, so infrastructure-level metrics alone may tell you little. You will also need a monitoring platform with open APIs that can aggregate performance data from different Cloud providers and give you a composite metric that maps to the performance of your own service. Finally, your monitoring platform has to provide a service-oriented view of these metrics and not just traditional metrics like CPU and memory.
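To make the idea of a composite metric concrete, here is a minimal sketch in Python. It assumes a request needs every dependency to succeed, that the dependencies fail independently, and that they are called one after another. The `ServiceMetric` type, the provider labels and the numbers are all made up for illustration; real figures would come from whatever metrics APIs your providers expose.

```python
from dataclasses import dataclass
from math import prod


@dataclass
class ServiceMetric:
    provider: str        # hypothetical provider label
    availability: float  # fraction of successful requests, 0.0 to 1.0
    latency_ms: float    # latency this dependency adds to one request


def composite_availability(metrics):
    """Availability of a request that needs every dependency to succeed.

    Assumes failures are independent, so the availabilities multiply.
    """
    return prod(m.availability for m in metrics)


def composite_latency_ms(metrics):
    """Latency of a request that calls each dependency sequentially."""
    return sum(m.latency_ms for m in metrics)


# Illustrative numbers only, not measurements from any real provider.
metrics = [
    ServiceMetric("iaas", 0.999, 40.0),
    ServiceMetric("dns", 0.9995, 15.0),
    ServiceMetric("crm_api", 0.995, 220.0),
]

print(f"composite availability: {composite_availability(metrics):.4%}")
print(f"composite latency: {composite_latency_ms(metrics):.0f} ms")
```

Even in this toy case the composite availability is lower than that of the weakest single dependency, which is why a per-provider SLA says little about the service your users actually see.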
As more enterprises move to Cloud-based services and infrastructure, their desire to work with a single vendor will force them to gravitate towards “cloud service aggregators”: single vendors that aggregate services from various Cloud providers. However, enterprises will demand SLAs from these aggregators, and this will require some way to identify the partner responsible for a failed SLA or an outage. Aggregators will need to collect automated performance and SLA metrics from their downstream partners and correlate this data to provide aggregate SLAs for the enterprise. This requires transparency in operations, open APIs and uniform SLA measurements. Although none of this is prevalent today, it will become a necessity in the near future.
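As a rough illustration of what identifying the responsible partner could look like, the sketch below compares each downstream partner’s measured availability against its own SLA target and lists the partners that fell short, worst first. The `PartnerReport` type, the partner names and the figures are hypothetical, and it assumes every partner reports availability measured in the same way, which is exactly the uniformity that is missing today.

```python
from dataclasses import dataclass


@dataclass
class PartnerReport:
    partner: str       # hypothetical downstream partner name
    sla_target: float  # availability the partner committed to
    measured: float    # availability measured over the reporting period


def breaches(reports):
    """Return the partners that missed their own SLA, worst shortfall first."""
    missed = [r for r in reports if r.measured < r.sla_target]
    return sorted(missed, key=lambda r: r.sla_target - r.measured, reverse=True)


# Illustrative numbers only.
reports = [
    PartnerReport("storage", sla_target=0.999, measured=0.9991),
    PartnerReport("network", sla_target=0.9995, measured=0.998),
    PartnerReport("crm_saas", sla_target=0.995, measured=0.990),
]

for r in breaches(reports):
    shortfall = r.sla_target - r.measured
    print(f"{r.partner}: missed SLA by {shortfall:.2%}")
```

With these sample figures the storage partner met its target, while the CRM and network partners did not. A real aggregator would also have to correlate the timing of failures across partners before assigning responsibility for a specific outage.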