Utility Computing SLAs
When comparing Utility Computing to "traditional computing" (for want of a better term), a question that often comes up concerns service level agreements, and service level management in general. ITIL (and other IT process frameworks) has ensured that most IT professionals understand the value of service levels and use them to broker an agreement between IT and the rest of the business. The cynical observer would see this as just a way for IT to cover its ass, but smart organizations with SLAs in place actually use them to benefit both the business (which knows what to expect) and the IT department (which knows what is expected of it). SLAs are usually performance-based, availability-based, or both (although there are plenty of other ways to measure service levels). A performance-based SLA will often state something like:
“A priority one call placed to the Service Desk between the hours of 8am and 6pm will be acknowledged within 5 minutes, and will have a support engineer working on it within 15 minutes”
… while an availability-based SLA might look like this:
“The Oracle Financials application will be available 99% of the time during normal business hours and 99.99% of the time during the end-of-quarter period”
So, when you move to a UC model, what are the SLAs? Who maintains them? What are the consequences if they are not met? If many different businesses all get their computing utility from the same shared resource, how are their different SLAs maintained? So many questions … but the answer is pretty simple.
Since UC is simply about providing business applications as a utility, there can only be one real measurement: application availability. No other measurement matters. The business (the consumer) of a UC service doesn't care about servers, databases or networks … it just needs its business applications available when it needs them. It doesn't even care how long it takes someone to respond to a call or provide a workaround, because those are merely contributing factors to application availability. If the application is available as it should be, then everything else falls into place.
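To make "application availability" concrete, here is a minimal sketch of how it might be measured: the fraction of a measurement period during which the application was not down. The function name `availability` and the input format (a list of non-overlapping outage start/end times) are hypothetical choices for illustration; real SLAs typically define precisely what counts as downtime, which this sketch ignores.

```python
from datetime import datetime


def availability(period_start, period_end, outages):
    """Return availability as a percentage of the measurement period.

    Hypothetical helper: `outages` is a list of (start, end) datetime pairs,
    assumed not to overlap. Any outage time outside the period is clipped.
    """
    total = (period_end - period_start).total_seconds()
    down = 0.0
    for start, end in outages:
        # Count only the part of each outage that falls inside the period.
        s = max(start, period_start)
        e = min(end, period_end)
        if e > s:
            down += (e - s).total_seconds()
    return 100.0 * (total - down) / total


# A 30-day month with a single 36-minute outage.
june = (datetime(2024, 6, 1), datetime(2024, 7, 1))
outage = [(datetime(2024, 6, 10, 9, 0), datetime(2024, 6, 10, 9, 36))]
print(f"{availability(*june, outage):.3f}%")  # 99.917%
```

A single percentage like this is all the consumer needs to see; everything behind it (servers, networks, response times) stays the provider's problem.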
So, if there's only one measurement … what should it be? As one of the pioneers in the Utility Computing space, Google has presumably put a lot of thought into this … and it settled on 99.9% availability as its SLA commitment. This means its applications can be unavailable for about 8.76 hours per year (0.1% of the 8,760 hours in a year), just over one business day in a whole year. If Google exceeds that, it starts giving free service (as defined in its SLA) to each of its 1.75 million business customers! 99.9% (often called "three nines") seems like a pretty realistic expectation for a UC customer … considering this is probably FAR less downtime than they experience with their self-maintained applications.
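The downtime figure above is simple arithmetic: allowed downtime equals the length of the measurement period multiplied by (1 − availability target). The sketch below uses a hypothetical helper, `downtime_budget_hours`, and assumes a 365-day year; it is not tied to how Google or any other provider actually measures its SLA.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; leap years ignored for simplicity


def downtime_budget_hours(availability_pct, period_hours=HOURS_PER_YEAR):
    """Maximum downtime (in hours) permitted by an availability target."""
    return period_hours * (1 - availability_pct / 100.0)


for target in (99.0, 99.9, 99.99):
    print(f"{target}% -> {downtime_budget_hours(target):.2f} hours/year")
# 99.0% -> 87.60 hours/year
# 99.9% -> 8.76 hours/year
# 99.99% -> 0.88 hours/year
```

Passing a shorter period shows how tight "three nines" becomes if it is measured monthly instead of yearly: `downtime_budget_hours(99.9, 30 * 24)` gives 0.72 hours, or about 43 minutes in a 30-day month.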
So there's one more thing that UC makes easier for its consumers. No complex SLAs to wade through; no IT department using SLAs to defend itself from the business it is supposed to be supporting; just a simple, measurable, actionable number that actually means something to the business.