Best Practices for API Rate Limits and Quotas with Moesif to Avoid Angry Customers (2024)

As with any online service, your API users expect high availability and good performance. This also means one customer should not be able to starve another customer’s access to your API. Rate limiting is a defensive measure that protects your API from being overwhelmed with requests and improves general availability. Similarly, quota management ensures customers stay within their contract terms and obligations, so you are able to monetize your API. Without quota management, a customer could easily use far more resources than their plan allows, even while staying within your overall server rate limits. Yet incorrect implementations can make customers angry when their requests do not work as expected. Worse, a bad rate limiting implementation could itself fail, causing all requests to be rejected. This guide walks through the different types of rate limits and quotas, then through ways to set up rate limiting that protects your API without making customers angry.

How rate limits and quotas work

Both quotas and rate limits work by tracking the number of requests each API user makes within a defined time interval and then taking some action when a user exceeds the limit. That action could be rejecting the request with a 429 Too Many Requests status code, sending a warning email, or adding a surcharge, among other things. Just as different metrics are needed to measure different goals, different rate limits are used to achieve different goals.

Rate limits vs quota management

There are two different types of rate limiting, each with different use cases. Short term rate limits focus on protecting servers and infrastructure from being overwhelmed, whereas long term quotas focus on managing the cost and monetization of your API’s resources.

Rate limits

Short term rate limits look at the number of requests per second or per minute and help “even out” spikes and bursty traffic patterns to offer backend protection. Because short term rate limits are calculated in real-time, there is usually little customer-specific context. Instead, these rate limits may be measured using a simple counter per IP address or API key.
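As a rough illustration of the "simple counter" idea, here is a minimal fixed-window sketch in Python. The class name, the in-memory dictionary, and the injectable clock are all assumptions made for this example. A production limiter would more likely keep its counters in a shared cache.

```python
import time


class FixedWindowLimiter:
    """Counts requests per client key inside fixed time windows (illustrative only)."""

    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be exercised without sleeping
        self.state = {}     # key -> (window index, requests counted in that window)

    def allow(self, key):
        """Return True if this request still fits in the key's current window."""
        window = int(self.clock() // self.window_seconds)
        seen_window, count = self.state.get(key, (window, 0))
        if seen_window != window:
            count = 0  # a new window has started, so the counter resets
        if count >= self.limit:
            self.state[key] = (window, count)
            return False
        self.state[key] = (window, count + 1)
        return True
```

A fixed window is the simplest variant. It can let through up to twice the limit across a window boundary, which is usually acceptable for backend protection.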

Example use cases for rate limits:

Protect downstream services from being overloaded by traffic spikes
Increase availability and prevent certain DDoS attacks from bringing down your API
Provide a time buffer to handle capacity scaling operations
Ensure consistent performance for customers and even out load on databases and other dependent services
Reduce costs due to uneven utilization of downstream compute and storage capacity

Identifier

Due to their time sensitivity, short term rate limits need a mechanism to identify different clients without relying heavily on external context. Some rate limiting mechanisms will use IP addresses, but this can be inaccurate. For example, some customers may call your API from many different servers. A more robust solution may use the API key or the user_id of the customer.
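One way to express that preference is sketched below. It assumes the API key arrives in an `X-API-Key` header, which is a hypothetical convention for this example rather than something the article prescribes. The function falls back to the IP address only when no key is present.

```python
def client_identifier(headers, remote_addr):
    """Pick the identity to rate limit on: the API key if present, else the IP address."""
    # HTTP header names are case-insensitive, so normalize before looking them up.
    lowered = {name.lower(): value for name, value in headers.items()}
    api_key = lowered.get("x-api-key")  # hypothetical header name
    if api_key:
        return f"key:{api_key}"
    return f"ip:{remote_addr}"
```

Prefixing the identifier with its type keeps an API key and an IP address from ever colliding in the same counter.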

Scope

Short term rate limits can be scoped either to a single server or to a distributed cluster of instances using a cache system like Redis. You can also use information within the request, such as the API endpoint, for additional scope. This can be helpful for offering different rate limits for different services depending on their capacity. For example, certain services, such as launching batch jobs or running complex queries on a database, may be very costly to serve and can be easily overwhelmed. Short term rate limits can be imperfect given their real-time nature, which makes them a poor fit for billing and financial terms but great for backend protection.
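To make endpoint scoping concrete, the following sketch keys a token bucket on the pair of API key and endpoint. The endpoint paths and numbers are invented for illustration, and the state lives in process memory. A cluster-wide limit would need the same logic backed by a shared store such as Redis.

```python
import time

# Hypothetical per-endpoint limits: (burst capacity, refill rate in tokens per second).
ENDPOINT_LIMITS = {
    "/search": (20, 10.0),
    "/batch-jobs": (2, 0.1),  # an expensive endpoint gets a much tighter limit
}
DEFAULT_LIMIT = (10, 5.0)


class ScopedTokenBucket:
    """Token bucket rate limiter scoped to (API key, endpoint)."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.buckets = {}  # (api_key, endpoint) -> (tokens left, time of last update)

    def allow(self, api_key, endpoint):
        capacity, rate = ENDPOINT_LIMITS.get(endpoint, DEFAULT_LIMIT)
        now = self.clock()
        scope = (api_key, endpoint)
        tokens, last = self.buckets.get(scope, (float(capacity), now))
        # Refill in proportion to elapsed time, never beyond the burst capacity.
        tokens = min(capacity, tokens + (now - last) * rate)
        if tokens < 1:
            self.buckets[scope] = (tokens, now)
            return False
        self.buckets[scope] = (tokens - 1, now)
        return True
```

A token bucket tolerates short bursts up to the capacity while holding the long-run rate at the refill rate, which suits endpoints with uneven cost.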

Quota management

Unlike short term rate limits, long term quotas measure customer utilization of your API over longer durations such as per hour, per day, or per month. Quotas are not designed to prevent a spike from overwhelming your API. Rather, quotas regulate your API’s resources by ensuring a customer stays within their agreed contract terms. Because you may have a variety of different API service tiers, quotas are usually dynamic for each customer, which makes them more complex to handle than short-term rate limiting. Besides quota obligations, historical trends in customer behaviors can be used for spam detection and automatically blocking users who may be violating your API’s terms of service (ToS).

Example use cases for quota limits:

Block intentional abuse such as sending spam messages, scraping, or creating fake reviews
Reduce unintentional abuse while allowing a customer’s usage to burst if needed
Properly monetize your API via metering and usage-based billing
Ensure a customer does not consume too many resources or rack up your cloud bill
Enforce contract terms of service and prevent “freeloaders”

Identifier

Long term quotas are almost always calculated on a per-tenant or per-customer level. IP addresses won’t work for these cases because an IP address can change, or a single customer may call your API from multiple servers and so circumvent enforcement.

Scope

Because quotas usually enforce the financial and legal terms of a contract, they should be unified across all servers and be accurate. There can’t be any “guesstimation” when it comes to quotas.
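A per-customer quota that varies by plan can be sketched as follows. The plan names and allowances are hypothetical, and the in-memory counter stands in for the single shared, durable store that accurate quota accounting would actually require.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical plan tiers and their monthly request allowances.
PLAN_QUOTAS = {"free": 1_000, "pro": 100_000}


class MonthlyQuota:
    """Tracks requests per customer per calendar month against the customer's plan."""

    def __init__(self, plans_by_customer):
        self.plans_by_customer = plans_by_customer  # customer_id -> plan name
        self.usage = defaultdict(int)               # (customer_id, "YYYY-MM") -> count

    def record(self, customer_id, when=None):
        """Count one request and return how many remain this month (negative if over)."""
        when = when or datetime.now(timezone.utc)
        period = when.strftime("%Y-%m")
        self.usage[(customer_id, period)] += 1
        quota = PLAN_QUOTAS[self.plans_by_customer[customer_id]]
        return quota - self.usage[(customer_id, period)]
```

Returning the remaining allowance, rather than a plain yes or no, leaves the caller free to choose between blocking, warning, or billing an overage.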

How to implement rate limiting

A gateway server like NGINX or Kong is usually the ideal spot to integrate rate limiting, as most external requests will be routed through your gateway layer. For short term rate limit violations, the standard response is to reject requests with 429 Too Many Requests. Additional information can be added to the response headers or body telling the client when the throttle will be cleared or when the request can be retried.
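The shape of such a rejection might look like the sketch below, which is framework-agnostic and returns a plain (status, headers, body) triple. The function name and the JSON body fields are assumptions for this example. `Retry-After` is the standard HTTP header for telling a client how long to wait.

```python
import json
import math


def too_many_requests(retry_after_seconds, detail="Rate limit exceeded"):
    """Build a (status, headers, body) triple for a throttled request."""
    wait = max(1, math.ceil(retry_after_seconds))  # round up so clients never retry early
    headers = {
        "Content-Type": "application/json",
        "Retry-After": str(wait),  # seconds until a retry is reasonable
    }
    body = json.dumps({"error": detail, "retry_after_seconds": wait})
    return 429, headers, body
```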

For long term quota violations, a number of different actions can be taken. You could reject the requests, as with short term rate limiting, or you could handle the violation in other ways, such as adding an overage fee.
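As one possible way to compute an overage fee, the sketch below bills each started block of 1,000 requests beyond the plan allowance. The block size and the pricing model are assumptions for illustration only. Amounts are kept in integer cents to avoid floating point rounding in billing.

```python
def overage_charge_cents(requests_used, included_requests, cents_per_1000_over):
    """Charge for usage beyond the plan allowance, per started block of 1,000 requests."""
    over = max(0, requests_used - included_requests)
    blocks = -(-over // 1000)  # ceiling division using integers only
    return blocks * cents_per_1000_over
```

For example, 120,500 requests on a plan that includes 100,000 is 20,500 over, which rounds up to 21 blocks.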

Blocking users exceeding their quota

An easy way to manage quotas is with Moesif’s API Governance features. These let you add rules that regulate your API with just a simple SDK and a few clicks within the UI. Instructions on how to do this are below:

Within Moesif, create a user cohort under the User Lookup tab. Add the criteria for when a user is considered to be exceeding their quota. In this example, a user exceeds the quota when they make more than 1,000 requests to /purchases or /purchases/:id/decline within a one-hour period.

Now that we have created the cohort, go to API Governance under the Alerting & Governance tab. From here, create a new governance rule. In this case, we short circuit the request with the status code 429 Too Many Requests. We also provide an informational message explaining why the request was rejected.

Informing customers of rate limit and quota violations

As with any fault or error condition, you should have active monitoring and alerting to understand when customers are approaching or exceeding their limits or quotas. Your customer success team should proactively reach out to customers who run into these issues and help them optimize their integration. Because manual outreach can be slow and hard to scale, you should also have a system in place that automatically informs customers when they run into rate limits, since rejected transactions can cause issues in their applications.

An easy way to keep customers informed of such issues is via Moesif’s behavioral email feature. Instructions on how to do this are below:

Within Moesif, create a user cohort under the User Lookup tab. Add the criteria for when to alert customers, such as the number of API calls made or a rate limit header reaching a certain threshold. In this example, we add the filter response.headers.Ratelimit-Remaining < 10.

Now that we have created the cohort, go to Behavioral Emails under the Alerting & Governance tab. From here, create a new email template and design it to fit your requirements.

Rate limit remaining headers

Besides sending emails, it’s also helpful to inform the customer of any rate limit remaining using HTTP response headers. There is an Internet Draft that specifies the headers RateLimit-Limit, RateLimit-Remaining and RateLimit-Reset.
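On the server side, populating these headers could look like the sketch below. It assumes `RateLimit-Reset` carries the number of seconds until the window resets, as in the draft versions that define these three fields. The draft has been revised over time, so check the current version before relying on exact names or semantics.

```python
def rate_limit_headers(limit, used, seconds_until_reset):
    """Build the three rate limit response headers named in the Internet Draft."""
    remaining = max(0, limit - used)  # never advertise a negative remainder
    return {
        "RateLimit-Limit": str(limit),
        "RateLimit-Remaining": str(remaining),
        # Assumption: reset is expressed as delta seconds, not a timestamp.
        "RateLimit-Reset": str(max(0, int(seconds_until_reset))),
    }
```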

By adding these headers, developers can easily set up their HTTP clients to retry once the correct time has passed. Otherwise, you may see unnecessary traffic, as a developer won’t know exactly when to retry a rejected request. This can create a bad customer experience.
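A client that honors these headers might be sketched as follows. The `send` callable is a hypothetical stand-in for whatever HTTP client is in use and is assumed to return a (status, headers, body) triple. Header values are assumed to be plain seconds, so a `Retry-After` given as an HTTP date is simply skipped here.

```python
import time


def seconds_until_retry(headers, default=1.0):
    """Read the wait time from Retry-After or RateLimit-Reset, in that order."""
    lowered = {name.lower(): value for name, value in headers.items()}
    for name in ("retry-after", "ratelimit-reset"):
        value = lowered.get(name)
        if value is None:
            continue
        try:
            return max(0.0, float(value))
        except ValueError:
            continue  # for example an HTTP date, which this sketch does not parse
    return default


def call_with_retry(send, max_attempts=3, sleep=time.sleep):
    """Call send() and retry on 429 after waiting as the server asked (max_attempts >= 1)."""
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status != 429 or attempt == max_attempts - 1:
            return status, headers, body
        sleep(seconds_until_retry(headers))
```

Capping the number of attempts matters as much as waiting, because a client that retries forever only adds to the traffic the limit was meant to shed.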

Rate limit implementation errors

Even a protection mechanism like rate limiting can have errors of its own. For example, a bad network connection to Redis could cause reads of rate limit counters to fail. In such scenarios, it’s important not to reject all requests or lock out users just because your Redis cluster is inaccessible. Your rate limiting implementation should fail open rather than fail closed, meaning all requests are allowed while the rate limit implementation is faulting.
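Failing open can be as small as a wrapper around the limiter call, as in this sketch. `check_limit` is a hypothetical callable that consults the counter store and may raise if that store is unreachable. The logger name is likewise an assumption.

```python
import logging

logger = logging.getLogger("ratelimit")


def allow_request(check_limit, key):
    """Ask the limiter backend; if the backend itself fails, let the request through."""
    try:
        return bool(check_limit(key))
    except Exception:
        # Fail open: a broken limiter must not turn into an outage for every customer.
        logger.warning("rate limiter unavailable, failing open for %s", key, exc_info=True)
        return True
```

Logging the failure keeps the fail-open path visible, so an unreachable counter store gets noticed rather than silently disabling rate limiting.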

This also means rate limiting is not a workaround for poor capacity planning. You should still have sufficient capacity to handle these requests, or design your system to scale to handle a large influx of new requests. This can be done through auto-scaling, timeouts, and automatic trips that enable your API to still function.

Conclusion

Quotas and rate limits are two tools that enable you to better manage and protect your API resources. Yet rate limits differ from quotas in their business use case, and it’s critical to understand the differences and limitations of each. It’s also important to provide tooling so that customers can stay informed of rate limit issues, along with a way to audit 4xx errors, including 429.


Derric Gilling

Co-founder & CEO @Moesif. Previously Computer Architect @Intel. Studied @UMichigan.

San Francisco
