What is a service-level agreement (SLA)?
You might have heard the term service-level agreement, commonly known as an SLA, or seen it buried in an agreement you signed. But what exactly is it, and how does it work in the real world?
Simply put, a service-level agreement (SLA) defines the level of service a customer expects from a supplier. It lays out the metrics by which that service is measured and the remedies or penalties, if any, that apply when the agreed-on service levels are not achieved. Usually, SLAs are between companies and external service providers, but they may also be between two departments within a company.
We like to think of an SLA as a guide that allows two parties to set proper expectations about when things will get done.
Now let’s take a look at some metrics to see how this applies and why the details are important.
Service availability: the share of time the service is available for use. A telecom company's SLA, for example, may promise network availability of 99.999 percent (that works out to about five and a quarter minutes of downtime per year). In general, the more reliable a service is, the better, but that reliability comes at a cost. We are used to many services just working all the time, but the reality is more complicated: even major services such as Office 365 and G Suite have outages. It is also important to note that not all services need the same level of availability. For example, e-commerce operations typically have extremely aggressive SLAs (99.999 percent uptime is not an uncommon requirement for a site that generates millions of dollars an hour). But the same company might be fine if the fax is down for a few days.
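To make the availability arithmetic concrete, here is a minimal Python sketch that converts an availability percentage into a yearly downtime budget. The function name allowed_downtime_minutes and the use of an average 365.25-day year are our own assumptions for illustration; a real SLA defines its own measurement window and exclusions (such as scheduled maintenance).

```python
# Illustrative sketch: the names and the 365.25-day year are assumptions,
# not terms taken from any particular SLA.
MINUTES_PER_YEAR = 365.25 * 24 * 60  # average year, including leap days


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Return the yearly downtime budget, in minutes, for an availability target."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100 percent")
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


# Compare a few common availability targets.
for target in (99.0, 99.9, 99.99, 99.999):
    minutes = allowed_downtime_minutes(target)
    print(f"{target}% uptime allows about {minutes:.2f} minutes of downtime per year")
```

For 99.999 percent this comes out to roughly 5.26 minutes per year, which matches the figure quoted above. Each extra "nine" cuts the downtime budget by a factor of ten, which is a large part of why higher availability costs more.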
Response time: this is a key metric that gets used a lot but can be useless in many cases. It generally means how quickly a service provider will respond to a request. While a response can be reassuring, what people really need is for the issue to be resolved, not just acknowledged. Many service providers automate the reply to tickets so that they can report a very quick “response time”.
Resolution time: this is usually the most important metric. It generally means how long after you submit a request it will be taken care of. Usually this depends on when the issue was received and on its priority. For example, an issue with a mission-critical server would be flagged as high priority and require a faster resolution than a single user having an issue with Microsoft Word. One important point to keep in mind is that certain issues require multiple vendors to work together, and they may have different SLAs. Let’s say your Internet is not working and your IT service provider determines that the issue is with the ISP (Internet service provider). Your IT service provider’s SLA may be 1 hour to resolve this type of outage, but the ISP might have a 24-hour SLA. In this case your IT provider has to wait for the ISP to resolve the issue; all they can do is follow up.
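The multi-vendor point can be sketched in a few lines of Python. This is a simplified model under our own assumption that, when every vendor in the chain has to act, the realistic worst case is set by the slowest SLA. The function name effective_resolution_sla is hypothetical, and the 1-hour and 24-hour values are the example figures from above.

```python
from datetime import timedelta


def effective_resolution_sla(vendor_slas: dict[str, timedelta]) -> tuple[str, timedelta]:
    """Return the vendor with the slowest resolution SLA, and that SLA.

    Simplifying assumption: the issue is only resolved once the slowest
    vendor in the chain has done its part.
    """
    if not vendor_slas:
        raise ValueError("at least one vendor SLA is required")
    slowest_vendor = max(vendor_slas, key=vendor_slas.get)
    return slowest_vendor, vendor_slas[slowest_vendor]


# Example values from the Internet outage scenario.
slas = {
    "IT service provider": timedelta(hours=1),
    "ISP": timedelta(hours=24),
}

vendor, sla = effective_resolution_sla(slas)
hours = sla.total_seconds() / 3600
print(f"Plan for up to {hours:.0f} hours to resolve; the limiting vendor is the {vendor}")
```

In practice handoffs between vendors add time of their own, so treat the slowest SLA as a floor for planning rather than a guarantee.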
In summary, it’s important to understand what is mission critical for your business and to make sure all your service providers can deliver the SLA you require.