SLA and SLO fundamentals and how to calculate SLA (2024)

An SLA (Service-Level Agreement) is an agreement you make with your clients or users. It is expressed as a measured metric that can be time-based or aggregate-based.

We can calculate the tolerated downtime for a given number of nines of availability using the following formula:

Tolerated downtime = (1 - availability) × period duration

For example, a web application with an availability of 99.95% can be down for at most 4.38 hours per year.

The following table gives the maximum tolerated downtime per year, month, week, day, and hour.

[Table: maximum tolerated downtime per year/month/week/day/hour for each availability level]
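To make the downtime arithmetic concrete, here is a minimal sketch in Python. The function name `tolerated_downtime`, the `PERIODS` mapping, and the choice of a 365-day year and a 30-day month are assumptions made for illustration; they are not taken from the original table.

```python
from datetime import timedelta

# Period lengths used for the calculation.
# Assumption: a year is 365 days and a month is 30 days.
PERIODS = {
    "year": timedelta(days=365),
    "month": timedelta(days=30),
    "week": timedelta(days=7),
    "day": timedelta(days=1),
    "hour": timedelta(hours=1),
}


def tolerated_downtime(availability_pct: float, period: timedelta) -> timedelta:
    """Return the maximum downtime allowed in `period` for the given availability."""
    return period * (1 - availability_pct / 100)


# Print one row per availability level, one column per period.
for pct in (99.0, 99.9, 99.95, 99.99):
    row = ", ".join(
        f"{name}: {tolerated_downtime(pct, length)}" for name, length in PERIODS.items()
    )
    print(f"{pct}% -> {row}")
```

With these assumptions, 99.95% over a year comes out at roughly 4.38 hours, which matches the example above.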

Let’s imagine a backend with an API that serves 250M requests per day under an aggregate-based SLA of 99.99%. To meet it, the API cannot return more than 25k errors per day (0.01% of 250M).

The aggregate-based SLA is computed as:

Aggregate-based SLA = (total requests - failed requests) / total requests × 100

Note that a request counts as an error only if it returns an HTTP 5XX (server error) status.
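The 25k figure can be checked with a short calculation. This is a sketch only; the function name `error_budget` is hypothetical and simply turns an SLA percentage into a number of tolerated failed requests.

```python
def error_budget(total_requests: int, sla_pct: float) -> int:
    """Return how many failed requests an aggregate-based SLA tolerates."""
    if not 0 <= sla_pct <= 100:
        raise ValueError("sla_pct must be between 0 and 100")
    # The tolerated share of failures is (100 - SLA) percent of all requests.
    return round(total_requests * (100 - sla_pct) / 100)


# 250M requests per day at 99.99% leaves a budget of 25,000 errors per day.
print(error_budget(250_000_000, 99.99))
```

The same function shows how quickly the budget shrinks: each extra nine divides the number of tolerated errors by ten.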

An SLO (Service-Level Objective) defines the expectations and goals, tied to an SLA, that the company should achieve during a defined period of time.

Example:

Imagine a company whose current uptime-based SLA is 95%, which means it tolerates up to 1.5 days of downtime per month (which is very bad).

We will set our next objective so that the SLA reaches 99%.

To get there, we have to take several actions. Here are some examples:

  • Review the hardware infrastructure
  • Add preventive monitoring
  • Find the root causes of the downtimes
  • Review the network configurations
  • Review for any single point of failure

In this example, we are going to collect Nginx logs and ship them to Elasticsearch so that we can visualize them in Kibana and build an SLA dashboard.

Why Vector as the log collector? Because it is a lightweight, ultra-fast tool for building observability pipelines. We will use it to collect, transform, and route the Nginx logs to Elasticsearch.

Step 1: Configure Nginx to produce more detailed logs.
Edit /etc/nginx/nginx.conf and, in the http block, add the following:

log_format apm '"$time_local" client=$remote_addr '
'method=$request_method request="$request" '
'request_length=$request_length '
'status=$status bytes_sent=$bytes_sent '
'body_bytes_sent=$body_bytes_sent '
'referer=$http_referer '
'user_agent="$http_user_agent" '
'upstream_addr=$upstream_addr '
'upstream_status=$upstream_status '
'request_time=$request_time '
'upstream_response_time=$upstream_response_time '
'upstream_connect_time=$upstream_connect_time '
'upstream_header_time=$upstream_header_time';

In your Nginx server definition, edit the log configuration as follows:

server {
    # ...
    access_log /var/log/nginx/access.log apm;
    error_log /var/log/nginx/error.log;
    # ...
}
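With this format, every access-log line becomes a sequence of key=value fields. To see what that gives us before wiring up a collector, here is a small sketch in Python that parses one line in the apm format. The sample line, the `LINE_RE` pattern, and the `parse_line` helper are all made up for illustration; the pattern is a stricter variant of the same idea, not the one used by Vector later in this article.

```python
import re
from datetime import datetime

# Hypothetical sample line in the "apm" log format defined above.
SAMPLE_LINE = (
    '"10/Oct/2024:13:55:36 +0000" client=203.0.113.7 method=GET '
    'request="GET /api/health HTTP/1.1" request_length=120 status=200 '
    'bytes_sent=512 body_bytes_sent=256 referer=- user_agent="curl/8.0.1" '
    'upstream_addr=127.0.0.1:8080 upstream_status=200 request_time=0.012 '
    'upstream_response_time=0.011 upstream_connect_time=0.001 '
    'upstream_header_time=0.010'
)

# One named group per field of the log format.
LINE_RE = re.compile(
    r'^"(?P<timestamp>[^"]*)" client=(?P<client>\S+) method=(?P<method>\S+) '
    r'request="(?P<request>[^"]*)" request_length=(?P<request_length>\d+) '
    r'status=(?P<status>\d{3}) bytes_sent=(?P<bytes_sent>\d+) '
    r'body_bytes_sent=(?P<body_bytes_sent>\d+) referer=(?P<referer>.*?) '
    r'user_agent="(?P<user_agent>[^"]*)" upstream_addr=(?P<upstream_addr>\S+) '
    r'upstream_status=(?P<upstream_status>\S+) request_time=(?P<request_time>\S+) '
    r'upstream_response_time=(?P<upstream_response_time>\S+) '
    r'upstream_connect_time=(?P<upstream_connect_time>\S+) '
    r'upstream_header_time=(?P<upstream_header_time>\S+)$'
)


def parse_line(line: str) -> dict:
    """Parse one apm-format line into a dict with a few typed fields."""
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError("line does not match the apm log format")
    fields = match.groupdict()
    # $time_local looks like 10/Oct/2024:13:55:36 +0000
    fields["timestamp"] = datetime.strptime(fields["timestamp"], "%d/%b/%Y:%H:%M:%S %z")
    fields["status"] = int(fields["status"])
    fields["request_time"] = float(fields["request_time"])
    return fields


record = parse_line(SAMPLE_LINE)
print(record["status"], record["request"], record["request_time"])
```

The status field is the one that matters for the SLA calculation; the timing fields are useful later when looking for root causes.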

Step 2: Install Vector to collect, parse and ship your Nginx Logs.

curl --proto '=https' --tlsv1.2 -sSf https://sh.vector.dev | bash

See the Vector documentation for other installation methods.

Use VRL (Vector Remap Language) to parse the Nginx logs:

[sources.nginx_source]
type = "file"
ignore_older_secs = 600
include = [ "/var/log/nginx/access.log" ] # the apm format is written to the access log
read_from = "beginning"
max_line_bytes = 102_400
max_read_bytes = 2_048
[transforms.modify_logs]
type = "remap"
inputs = ["nginx_source"]
#parse each line with VRL regex
source = """
. = parse_regex!(.message, r'^\"(?P<timestamp>.*)\" client=(?P<client>.*) method=(?P<method>.*) request=\"(?P<method_type>.*) (?P<request_path>.*) (?P<http_version>.*)\" request_length=(?P<request_length>.*) status=(?P<status>.*) bytes_sent=(?P<bytes_sent>.*) body_bytes_sent=(?P<body_bytes_sent>.*) referer=(?P<referer>.*) user_agent=\"(?P<user_agent>.*)\" upstream_addr=(?P<upstream_addr>.*) upstream_status=(?P<upstream_status>.*) request_time=(?P<request_time>.*) upstream_response_time=(?P<upstream_response_time>.*) upstream_connect_time=(?P<upstream_connect_time>.*) upstream_header_time=(?P<upstream_header_time>.*)$')
# convert metrics to proper types
.timestamp = to_timestamp!(.timestamp)
.request_length = to_int!(.request_length)
.status = to_int!(.status)
.bytes_sent = to_int!(.bytes_sent)
.body_bytes_sent = to_int!(.body_bytes_sent)
.upstream_status = to_int!(.upstream_status)
.request_time = to_float!(.request_time)
.upstream_response_time = to_float!(.upstream_response_time)
.upstream_connect_time = to_float!(.upstream_connect_time)
.upstream_header_time = to_float!(.upstream_header_time)
.host = "YOUR_HOSTNAME_HERE"
"""

# for debug mode only - output to console
#[sinks.debug_sink]
#type = "console"
#inputs = ["modify_logs"]
#target = "stdout"
#encoding = "json"

# OUTPUT 1: Elasticsearch
[sinks.send_to_elastic]
type = "elasticsearch"
inputs = [ "modify_logs" ]
endpoint = "https://elasticsearch-endpoint.com"
index = "websitelogs-%F"
mode = "bulk"
auth.user="xxxxxxxxxxxxxxxxxxxxxxx"
auth.password="xxxxxxxxxxxxxxxxxxx"
auth.strategy="basic"

Then restart the Vector service:

systemctl restart vector.service

Check the logs of the Vector systemd unit:

journalctl -u vector.service

The output should look like the following:

[Screenshot: Vector service logs]

For demo purposes, I have deployed an Elasticsearch instance on Elastic Cloud.

In Kibana, we need to create an index pattern to read from the Elasticsearch indices.

Note: the time field should be the timestamp from the logs, not the ingest time (the Elasticsearch default).

Our logs are successfully being shipped into Elasticsearch, so the next step is to create a dashboard in Kibana with some graphs in order to calculate our current SLA.

Let’s explore the logs in Kibana.

We got around 8,300 hits in the last 15 minutes, and every log entry is parsed into structured fields:

[Screenshot: a parsed log entry in Kibana]

Let’s create our first graph, the count of hits:

[Screenshot: hit count graph]

As agreed in the formula above, we count only 5XX responses as failed requests and treat every other status code as successful (4XX responses are attributed to client behavior).

So our aggregate-based SLA is 87.04% over the last 15 minutes and 98.56% over the last 30 days.
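The percentage shown on the dashboard comes from counting 5XX responses against all requests in the selected time window. A minimal sketch of that calculation in Python follows; the `aggregate_sla` helper and the status codes in the example are hypothetical and are not the data behind the figures above.

```python
def aggregate_sla(status_codes) -> float:
    """Return the aggregate-based SLA (in percent) for a window of HTTP status codes."""
    codes = list(status_codes)
    if not codes:
        raise ValueError("no requests in the window")
    # Only server errors (5XX) count as failed; 4XX is treated as client behavior.
    failed = sum(1 for code in codes if 500 <= code <= 599)
    return (len(codes) - failed) / len(codes) * 100


# Hypothetical window: 95 successes, 3 client errors, 2 server errors.
window = [200] * 95 + [404] * 3 + [502] * 2
print(f"{aggregate_sla(window):.2f}%")
```

Running the same function over a short window and a long window is what lets us compare the two figures.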

The gap between the last 15 minutes and the last 30 days suggests that an incident is in progress.

Our dashboard will start to look like this:

[Screenshot: SLA dashboard in Kibana]

Metrics are more powerful than logs: you can use them to build real-time dashboards.
In this tutorial, we used Nginx logs and Elasticsearch as a document-store database.
For better performance and real-time dashboards, however, I highly recommend using metrics instead of logs.

For that, we can use time-series databases such as InfluxDB or Warp10 to store our metrics and use Grafana as a visualization tool.

In Vector, configure an output sink to InfluxDB by adding the following block:

# OUTPUT 2: InfluxDB database
[sinks.influxdb_output]
type = "influxdb_logs"
inputs = [ "modify_logs" ]
bucket = "vector-bucket"
consistency = "any"
database = "xxxxxxxxxxx"
endpoint = "https://your-endpoint.com"
password = "your-password-here"
username = "username"
batch.max_events=1000
batch.timeout_secs=60
namespace = "service"

Then restart your Vector service.

In this article, we covered what SLAs and SLOs are and how an aggregate-based SLA can be calculated from Nginx logs.
In the next article, I will cover how to calculate a time-based SLA and how to improve the SLA by finding root causes.

Author: Geoffrey Lueilwitz