How service level agreements can ensure real uptime
Large telcos and networking vendors have long used service level agreements (SLAs) to overpromise and underdeliver, leaving customers with serious service issues and substandard connectivity.
Vendors are, of course, obliged to fulfil any agreement they make with a customer, but the telecoms industry has often left its clients befuddled by the terms of the agreement. Providers can write contracts to ensure they evade accountability, leaving customers without the connectivity they believed they were paying for, says Ashwath Nagaraj, chief technology officer of Aryaka Networks.
With cloud services becoming more fundamental for businesses in the face of the Covid-19 pandemic, it is important to re-evaluate the SLA. The challenge now is to remodel the agreements to serve the customers in this new environment.
Barely getting by
In our digital age, every business relies on staying connected. Manufacturing, healthcare, financial services, indeed any sector: all need reliable network infrastructure to operate.
At a time when many companies have sped up their digital transformation initiatives, this is especially true. As they transform their businesses, customers want assurance that their network won’t collapse under the strain. Time that should be focused on transforming the business should not be wasted addressing network issues.
What is the role of SLAs in this?
An SLA is a contracted promise of performance from the provider. The customer should be able to calculate the business impact of the delivered SLA. The SLA also tells the customer what penalties the provider faces for failing to meet the uptime guarantee. Should the vendor renege on its promises, the agreement enables the customer to claim compensation.
The customer is essentially making a trade-off: paying more for a higher SLA service to minimise downtime when downtime has a high business impact. A good SLA will leave little in doubt; a bad one will use jargon-filled descriptions to obscure its meaning.
Four important things to look for in the SLA:
- The promise – this must be clear, not mind-boggling
- The measurement, and the ability to verify compliance
- The penalties on failure to meet the promise
- The ability to collect the penalty (a.k.a. service credit)
Most SLAs promise an “uptime”. The promise also includes the definition of downtime. Vendors will promise a number of ‘nines’, with five nines equating to 99.999% uptime (which allows about 5 minutes of downtime a year, or just under 30 seconds a month), four nines equating to 99.99% uptime (about 53 minutes of downtime a year, or just under 5 minutes a month), and so on.
Five nines is a common headline figure for telco SLAs. The trouble is that when you check the fine print, a wider range of numbers is cited. Other promises cover latency, jitter and similar metrics. All need to be clearly defined.
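The ‘nines’ figures quoted above are rounded, and it can help to work out the exact downtime budget before comparing offers. The following is a minimal sketch in Python; the function name `downtime_budget` is illustrative, and it assumes a 365-day year with a month taken as one twelfth of that.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600       # assumption: non-leap year
SECONDS_PER_MONTH = SECONDS_PER_YEAR / 12  # assumption: average month


def downtime_budget(nines):
    """Return (minutes of downtime per year, seconds of downtime per month)
    allowed by an uptime promise with the given number of nines."""
    unavailability = 10 ** -nines          # e.g. 5 nines -> 0.00001
    minutes_per_year = unavailability * SECONDS_PER_YEAR / 60
    seconds_per_month = unavailability * SECONDS_PER_MONTH
    return minutes_per_year, seconds_per_month


for n in (2, 3, 4, 5):
    per_year, per_month = downtime_budget(n)
    print(f"{n} nines: {per_year:.2f} min/year, {per_month:.1f} s/month")
```

Under these assumptions, five nines works out to roughly 5.3 minutes a year (about 26 seconds a month) and four nines to roughly 53 minutes a year.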
How does the vendor measure downtime? Does the measurement have the sampling resolution to catch violations of the promised uptime? Does it really correspond to the network downtime you experience? An example of a misleading SLA would be one that claims 99.999% uptime, measured monthly (that’s about 30 seconds of downtime per month), but actually checks the network only once a minute and counts downtime in whole minutes.
A network could be down for 35 seconds of every minute, which is effectively 0% uptime (you never get 30 seconds of continuous service), while the vendor could still claim 100% uptime (it saw no minute where its probe failed). Look for an SLA that specifies measurement granular enough to show real uptime rather than a meaningless figure that flatters the vendor.
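The sampling problem described above is easy to reproduce. The sketch below is a hypothetical simulation, not any vendor’s actual measurement method: it assumes a network that is up for the first 25 seconds of every minute and down for the remaining 35, and a probe that happens to fire at the start of each minute. The names `is_up` and `measured_uptime` are illustrative.

```python
def is_up(second):
    """Hypothetical network: up for the first 25 s of each minute,
    down for the remaining 35 s."""
    return second % 60 < 25


def measured_uptime(total_seconds, probe_interval, probe_offset=0):
    """Fraction of probes that found the network up when sampling
    every `probe_interval` seconds, starting at `probe_offset`."""
    probes = range(probe_offset, total_seconds, probe_interval)
    return sum(is_up(t) for t in probes) / len(probes)


DAY = 24 * 3600

per_minute = measured_uptime(DAY, probe_interval=60)  # vendor's coarse probe
per_second = measured_uptime(DAY, probe_interval=1)   # fine-grained probe

print(f"Probing once a minute: {per_minute:.1%} uptime reported")
print(f"Probing once a second: {per_second:.1%} uptime reported")
```

In this scenario the once-a-minute probe always lands in an ‘up’ window and reports 100%, while per-second sampling shows the network available well under half the time. Shifting `probe_offset` into the ‘down’ window would flip the coarse result to 0%, which shows how arbitrary the coarse figure is.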
Backing up the promise must be a meaningful penalty for failure. Connectivity failures significantly hurt productivity, hindering a business’s ability to perform.
This is where many vendors contrive to hide their inability to deliver on their promise. For example, one vendor claims 99.999% uptime but gives a single 5% penalty for network availability anywhere between 97.9% and 99.999%. So if you have 15 hours of downtime in a month, which is 1,800 times the promised downtime, you get a 5% credit. If you paid a lot of money for 99.999% uptime because even 30 seconds of downtime was costly to you, 15 hours of downtime is probably going to kill your business. This is not a five nines service; it’s less than a two nines service.
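The flat-tier example above can be checked with a few lines of arithmetic. This is a sketch under stated assumptions: a 30-day month, and a credit schedule modelled only on the single tier described in the example (the function names and the handling of availability below 97.9% are illustrative, since that case is not described).

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # assumption: 30-day month

PROMISED = 99.999    # headline uptime, in percent
TIER_FLOOR = 97.9    # lowest availability covered by the flat tier
FLAT_CREDIT = 5.0    # percent of fees credited anywhere in the tier


def downtime_seconds(availability_pct):
    """Downtime per month implied by an availability percentage."""
    return (100.0 - availability_pct) / 100.0 * SECONDS_PER_MONTH


def credit_pct(availability_pct):
    """Service credit under the single flat tier from the example."""
    if availability_pct >= PROMISED:
        return 0.0
    if availability_pct >= TIER_FLOOR:
        return FLAT_CREDIT
    return None  # not specified in the example


budget = downtime_seconds(PROMISED)   # downtime the promise allows
worst = downtime_seconds(TIER_FLOOR)  # downtime at the bottom of the tier

print(f"Promised budget: {budget:.1f} s/month")
print(f"Bottom of tier:  {worst / 3600:.2f} h/month")
print(f"Ratio:           {worst / budget:.0f}x the promised downtime")
print(f"Credit at 99.998%: {credit_pct(99.998)}%, at 97.9%: {credit_pct(97.9)}%")
```

With unrounded figures the bottom of the tier is about 15.1 hours of downtime against a budget of about 26 seconds, a ratio of roughly 2,100; the 1,800 figure comes from rounding to 15 hours and 30 seconds. Either way, a near miss and a catastrophic month earn the same 5% credit.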
Some providers offer as little as 2% of the daily circuit fees, a negligible amount for them, which leaves them with most of a customer’s fee whether or not they perform. Demanding a higher percentage is one obvious strategy for countering these issues. At a minimum, it is advisable to put a clause in the agreement allowing you to break the contract if the provider breaches the SLA too often.
Ensure that the vendor maintains an accurate record of historical and current SLA measurements, and that this data is available to you as a customer. Many vendors make it the customer’s responsibility to continuously monitor the network and to prove, weeks after the fact, that there was an outage.
Of course, you won’t get this data from service providers that don’t measure uptime themselves or are embarrassed about the figures.
The author is Ashwath Nagaraj, chief technology officer at Aryaka Networks.