The 100% SLA

A Simpler Way to Report Delivery

 

Service availability is usually expressed as a percentage: the service time actually achieved compared with the maximum possible service time. A target service level is set for the service provider to deliver, and the actual service level is then reported against that target. The target is normally set close to 100% but allows for the unforeseen, so it typically ends up as 99.5% or something similar. Penalties of various kinds are defined for any failure to achieve that notional service performance.
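
As a rough illustration of the traditional calculation described above, the sketch below computes a percentage availability figure and compares it with a target. The function name, the 8-hour service window, the 20 working days and the 48 minutes of downtime are all assumptions made up for the example; only the 99.5% target comes from the text.

```python
def availability_percent(agreed_minutes: float, downtime_minutes: float) -> float:
    """Achieved service time as a percentage of the maximum possible service time."""
    if agreed_minutes <= 0:
        raise ValueError("agreed service time must be positive")
    achieved_minutes = agreed_minutes - downtime_minutes
    return 100.0 * achieved_minutes / agreed_minutes


# Hypothetical month: 20 working days, 8-hour service window, 48 minutes of downtime.
agreed = 20 * 8 * 60   # maximum possible service time, in minutes
actual = availability_percent(agreed, downtime_minutes=48)
target = 99.5          # illustrative target, as mentioned in the text

status = "met" if actual >= target else "missed"
print(f"achieved {actual:.2f}% against a target of {target}%: {status}")
```

Even this small example needs a service window, a working calendar and a downtime total for every department before a single figure can be reported.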

This can get complicated. Quite often there are varying service times and varying targets across different departments. Unless you report weekly and ignore the calendar, you need to allow for the varying lengths of months; you also need to cover bank holidays and the like. Using this traditional model it is also difficult, and potentially misleading, to aggregate the various targets to report on the supplier's overall Service Delivery performance. Finally, reporting this in a form the users will understand requires a fair amount of data gathering, analysis and report writing. The end result, sadly, is often a complex report that rarely does anything to drive improvements in delivery, which is, after all, the whole point of the exercise.

The 100% SLA aims to overcome these difficulties.

Firstly, it must be remembered that the customer is not actually looking for anything other than 100% service availability; they want the service to be reliably available whenever they want to use it. They are also uninterested in your Service Window and would much prefer the services to be there either during office hours, which vary between departments (and which, incidentally, create their own problems in defining usable SLAs), or 24x7. In effect, the customer's requirement is for a Service Level of 100%.

So the starting point is to use that as the target.

Clearly, no matter how well the service is managed, this is unachievable in practice. Incidents will occur, changes will need to be made, backups and maintenance will need to be performed. But we can consider these events under two headings, planned and unplanned. That gives us the method for simplified reporting of performance against the 100% target.

Planned downtime is exactly that; it has been agreed with the customer, who is not expecting the service to be available for that timeframe. Therefore we can report planned downtime as the supplier having delivered less than 100% availability, but with the user fully aware in advance that this is going to occur and without breaching the SLA.

Unplanned downtime is defined as any service interruption that has not been agreed with the customer, whether internally or externally caused. Here too the service review focuses on the actual incident, but because the service was unexpectedly unavailable to the users, this is a breach of the SLA and subject to any penalties you may have agreed. Note, though, that there is no real value in debating how long the service was unavailable; it is far more important to explore the cause of the incident and the remedial actions required.
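
The planned/unplanned rule can be sketched in a few lines. This is a minimal illustration rather than a prescribed implementation: the `Downtime` record, its field names and the sample events are hypothetical. The record deliberately carries no duration, reflecting the point that the cause matters more than how long the service was away.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Downtime:
    # Hypothetical record layout, invented for this example.
    department: str
    description: str
    agreed_with_customer: bool  # True = planned, False = unplanned


def is_breach(event: Downtime) -> bool:
    """Against a 100% target, only downtime not agreed in advance breaches the SLA."""
    return not event.agreed_with_customer


events = [
    Downtime("Finance", "Scheduled database upgrade", agreed_with_customer=True),
    Downtime("Finance", "Storage controller failure", agreed_with_customer=False),
]

# Only the breaches need to go forward to the service review for root-cause discussion.
breaches = [e for e in events if is_breach(e)]
for e in breaches:
    print(f"BREACH ({e.department}): {e.description}")
```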

In terms of the service review, we no longer need to calculate the percentage delivery actually achieved for any given department or group of departments. The review can instead focus on the incidents that caused a loss of service availability. More importantly, the discussion can rightly focus on the events causing the unplanned downtime and debate any remedial actions required to prevent recurrence. In other words, Problem Management can be triggered at the service review as well as by the more traditional analysis of individual Incidents, and the user is more closely involved in the process.

Just as usefully, the same reporting concept can be applied to individual customer departments, to the whole enterprise, or any point in between. An added benefit is that the overarching SLA can be a lot simpler, focussing on the conditions of service delivery rather than spending time defining service windows and penalty regimes.
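
To illustrate why the same report works at any level, the sketch below filters one list of downtime events by scope: a single department, a group of departments, or the whole enterprise. No percentages need to be weighted or combined. The `Downtime` tuple, the `review_report` function and the sample data are assumptions made for the example.

```python
from typing import List, NamedTuple, Optional, Set, Tuple


class Downtime(NamedTuple):
    # Hypothetical record layout, invented for this example.
    department: str
    description: str
    planned: bool


def review_report(
    events: List[Downtime], departments: Optional[Set[str]] = None
) -> Tuple[List[Downtime], List[Downtime]]:
    """Return (planned, unplanned) events for the given departments, or for all if None."""
    in_scope = [e for e in events if departments is None or e.department in departments]
    planned = [e for e in in_scope if e.planned]
    unplanned = [e for e in in_scope if not e.planned]
    return planned, unplanned


events = [
    Downtime("Finance", "Scheduled database upgrade", planned=True),
    Downtime("Finance", "Storage controller failure", planned=False),
    Downtime("Sales", "Network switch outage", planned=False),
]

# The same call serves a single department or the whole enterprise.
_, finance_breaches = review_report(events, {"Finance"})
_, enterprise_breaches = review_report(events)
print(len(finance_breaches), len(enterprise_breaches))
```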

The end result is a simple-to-manage, transparent and totally consistent reporting mechanism that aligns with the customer's actual expectation and has a direct impact on service improvements. Which is why we do it in the first place.