iCore Ltd

Keeping businesses moving …

Outsourcing & Clock Starts: is your SLA being cheated?

As outsourcing best practices and challenges will be the theme for our upcoming blog posts, this week we focus on a challenge that many of you will have faced or could currently be experiencing – clock starts. You may be finding that SLA reporting from your outsourcer does not match the reality of your users’ experience of availability, and this could be the reason why.

What do we mean by Clock Starts?

When we talk about ‘clock starts’, we are talking about two different things:

1) When the SLA clock actually starts. Is it when the ticket is logged or, retrospectively, when the incident should first have been identified?

2) What severity measurement applies and when does it start for an incident? Does it start when the incident is first logged or when the severity of the incident changes?

Both are important and can significantly impact the way that your SLAs are reported. We will discuss the meaning of both types of ‘clock starts’, why they are important, and what you can do to address them. This is especially important when measuring the impact of major incidents, but in principle can be applied to all incidents.

1) When does an incident start?

When you are reviewing an incident, especially the impact of a service outage, should you take the start time from when the ticket was first logged, or from when event monitoring shows the incident first occurred?

Your outsourced partner will usually argue that the clock should start when the ticket was logged, as this was the first time that the business was aware it was impacted by the incident. This time can also be measured easily, directly from the service management tool.

But the incident probably occurred before then…

A proactive service provider should be aware of an incident before it is logged by the business. To incentivise this, your outsourced service provider must be held accountable for when the incident should have been identified through monitoring, rather than when it was logged.

Take batch processing as an example. Your business would not be aware that a batch process had failed until after the event, at which point a ticket would be logged for the incident to be investigated. This means that the clock would start much later than the failure itself, and the duration measured against the SLA would therefore be shorter than the actual length of the impact to your business.

In this case, a monitoring capability needs to be in place to identify when the batch process failed, and that event should raise the failure in the service management tool, thus starting the clock. When a user logs a ticket relating to the batch process failure, it can then be related to the ticket raised from the event, and the clock start would be taken from the original ticket.

This isn’t practical for every incident, but when reviewing Major Incidents and their impact on the SLA, it is important to agree the principle for when the clock starts, so that the measure reflects how long an incident actually lasts.
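The batch processing example above can be sketched in a few lines of Python. This is only an illustration: the function name `measured_outage` and all of the timestamps are assumptions made for the example, not values from any particular service management tool.

```python
from datetime import datetime, timedelta
from typing import Optional


def measured_outage(
    resolved_at: datetime,
    ticket_logged_at: datetime,
    event_detected_at: Optional[datetime] = None,
) -> timedelta:
    """Return the outage duration counted against the SLA.

    The clock starts at the earlier of the monitoring event and the
    user's ticket. With no monitoring event, it falls back to the ticket.
    """
    clock_start = ticket_logged_at
    if event_detected_at is not None and event_detected_at < ticket_logged_at:
        clock_start = event_detected_at
    return resolved_at - clock_start


# Hypothetical timeline: the batch fails overnight, a user notices in the morning.
batch_failed = datetime(2024, 1, 15, 2, 0)
ticket_logged = datetime(2024, 1, 15, 8, 30)
resolved = datetime(2024, 1, 15, 10, 30)

from_ticket = measured_outage(resolved, ticket_logged)
from_event = measured_outage(resolved, ticket_logged, batch_failed)

print(from_ticket)  # 2:00:00
print(from_event)   # 8:30:00
```

With these assumed times, measuring from the ticket reports a two-hour outage, while measuring from the monitoring event reports eight and a half hours, which is the impact the business actually experienced.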

2) What SLA measure applies and when does it start?

We recommend that a single, agreed definition of the SLA measure that applies for an incident is adhered to by all key parties. That way there is no ambiguity, standards are aligned, and service can be restored more effectively.

Let’s look at a possible scenario…

A user logs a ticket with the Service Desk relating to internet connectivity. This user is the only one in the office, so it only impacts them. As it is an isolated issue, the Service Desk agent categorises it as low severity. However, as more people come into the office and start to log tickets, it is re-prioritised as a high severity incident. Reports then start to come through from other sites that they also can’t access the internet.  At this point it is realised that the issue is impacting most users, and is therefore raised as a critical severity incident.

Once the incident is resolved, the severity is lowered until confirmation comes in that all sites now have service again, at which point the incident is closed.

So, the question is, when does the measurement period start for the Critical Severity incident?

1) When the ticket is changed to Critical Severity, or

2) When the ticket is first logged.

Again, your outsourced partner will argue that it wasn’t a critical incident until it was identified as being so. Yet it was their failure to identify the extent of the incident that delayed its correct classification as critical.

So when does the Clock Start?

Ultimately, this is all about the service being provided to the business, and how to measure and incentivise the provider of the service to identify an incident as quickly as possible, and with the correct understanding of impact.

This can only be done if the clock on an incident starts as early as possible, and the highest severity of the incident is applied across the life of the incident from a measurement perspective.

In practical terms this isn’t always possible while the incident ticket is active. However, when the incident is retrospectively reviewed, there is no reason why the clock on the incident can’t be updated to reflect the actual time.
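A retrospective review of the internet connectivity scenario could be sketched as follows. The severity ranking, the class and function names, and the timestamps are all assumptions made for illustration. The list of changes covers only the period up to resolution, so the lowering of severity after the fix is deliberately left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Assumed ranking of severities; a higher number is more severe.
SEVERITY_RANK = {"low": 1, "high": 2, "critical": 3}


@dataclass
class SeverityChange:
    at: datetime
    severity: str


def highest_severity(changes):
    """Return the most severe level the incident reached."""
    return max((c.severity for c in changes), key=lambda s: SEVERITY_RANK[s])


def reviewed_duration(changes, resolved_at):
    """Duration from the first log to resolution (clock starts as early as possible)."""
    return resolved_at - min(c.at for c in changes)


def reclassified_duration(changes, resolved_at):
    """Duration counted only from when the highest severity was first set."""
    top = highest_severity(changes)
    return resolved_at - min(c.at for c in changes if c.severity == top)


# Hypothetical timeline for the connectivity incident.
changes = [
    SeverityChange(datetime(2024, 1, 15, 7, 30), "low"),
    SeverityChange(datetime(2024, 1, 15, 8, 45), "high"),
    SeverityChange(datetime(2024, 1, 15, 9, 30), "critical"),
]
resolved_at = datetime(2024, 1, 15, 11, 0)

print(highest_severity(changes))                    # critical
print(reviewed_duration(changes, resolved_at))      # 3:30:00
print(reclassified_duration(changes, resolved_at))  # 1:30:00
```

In this sketch the reviewed measure treats the incident as critical for three and a half hours, whereas counting only from the reclassification would report an hour and a half.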

Clock Starts and Stops

Don’t let a service provider use clock starts and stops as a way of avoiding responsibility for an SLA breach. Instead, treat them as an opportunity to incentivise the provider to deliver a better level of service.

If you start an SLA measure as soon as possible and only stop the clock at times that are properly agreed with the business, then the service provider is incentivised to find ways of improving the levels of service.

This shouldn’t be about giving the service provider ways of avoiding or evading responsibility; it should be about them taking responsibility and finding ways to improve service.
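As a rough sketch of the clock-stop principle, the Python below deducts a pause from the measured time only if that pause was agreed with the business. The function name `sla_elapsed`, the tuple layout, and the timestamps are hypothetical, and the pauses are assumed not to overlap one another.

```python
from datetime import datetime, timedelta


def sla_elapsed(start, end, pauses):
    """Elapsed SLA time, deducting only pauses agreed with the business.

    `pauses` is a list of (pause_start, pause_end, agreed) tuples,
    assumed not to overlap one another.
    """
    elapsed = end - start
    for pause_start, pause_end, agreed in pauses:
        if not agreed:
            continue  # a stop the business did not agree to does not pause the clock
        # Clip the pause to the incident window before deducting it.
        overlap = min(pause_end, end) - max(pause_start, start)
        if overlap > timedelta(0):
            elapsed -= overlap
    return elapsed


# Hypothetical incident with one agreed and one unagreed clock stop.
incident_start = datetime(2024, 1, 15, 9, 0)
incident_end = datetime(2024, 1, 15, 13, 0)
pauses = [
    (datetime(2024, 1, 15, 10, 0), datetime(2024, 1, 15, 10, 30), True),
    (datetime(2024, 1, 15, 11, 0), datetime(2024, 1, 15, 12, 0), False),
]

elapsed = sla_elapsed(incident_start, incident_end, pauses)
print(elapsed)  # 3:30:00
```

Only the agreed half-hour pause is deducted from the four-hour incident, so the hour-long stop that was never agreed still counts against the SLA.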

If you would like to discuss how iCore can help you achieve your sourcing goals, then contact us on +44 (0) 203 8211252 or email us at info@icore-ltd.com