A ChatGPT rate limit is a restriction the OpenAI API places on how often a user or client can send requests to the server within a given time frame. Rate limits are a standard measure in APIs, and they serve several key purposes:
- They act as a safeguard against misuse or abuse of the API. A malicious actor might, for instance, flood the API with requests in an attempt to overwhelm it or cause service disruptions. Rate limits let OpenAI prevent this kind of behavior.
- They also help ensure equitable access to the API for everyone. If a single person or organization sent an excessive number of requests, the API could slow down for everyone else. By capping the number of requests any one user can send, OpenAI ensures that as many people as possible can use the API without degraded performance.
- Finally, rate limits help OpenAI manage the aggregate load on its infrastructure. A sudden surge in requests could strain the servers and cause performance issues; rate limits let OpenAI maintain a steady, consistent user experience.
What are the rate limits for the ChatGPT API?
The rate limits that apply to your organization are listed under the 'Rate limits' section of the account management page.
Note that rate limits are imposed at the organization level, not per individual user, and depend on the specific endpoint being accessed and the type of account you hold. OpenAI measures rate limits in two ways: RPM (requests per minute) and TPM (tokens per minute). The following table outlines the default rate limits for the API; these limits can be raised to fit your needs once you've completed and submitted the Rate Limit Increase Request Form.
| | TEXT & EMBEDDING | CHAT | EDIT | IMAGE | AUDIO |
|---|---|---|---|---|---|
| Free trial users | 3 RPM / 150,000 TPM | 3 RPM / 40,000 TPM | 3 RPM / 150,000 TPM | 5 images/min | 3 RPM |
| Pay-as-you-go users (first 48 hours) | 60 RPM / 250,000 TPM | 60 RPM / 60,000 TPM | 20 RPM / 150,000 TPM | 50 images/min | 50 RPM |
| Pay-as-you-go users (after 48 hours) | 3,500 RPM / 350,000 TPM | 3,500 RPM / 90,000 TPM | 20 RPM / 150,000 TPM | 50 images/min | 50 RPM |
For example, for the `gpt-3.5-turbo-16k` model, the TPM limits for pay-as-you-go users are twice the chat values above: 120K TPM during the first 48 hours and 180K TPM thereafter.
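Beyond the account page, the API itself reports your current limits in its HTTP response headers. The sketch below calls the chat completions endpoint directly with the `requests` package and prints whatever `x-ratelimit-*` headers come back; it assumes a valid key in the `OPENAI_API_KEY` environment variable, and the exact header names are taken from OpenAI's documentation rather than this article.

```python
import os
import requests

# Minimal sketch: call the chat completions endpoint directly and
# inspect the x-ratelimit-* response headers (header names assumed
# from OpenAI's documentation).
response = requests.post(
    "https://api.openai.com/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
    json={
        "model": "gpt-3.5-turbo",
        "messages": [{"role": "user", "content": "Hello"}],
    },
)

for name, value in response.headers.items():
    if name.lower().startswith("x-ratelimit"):
        print(f"{name}: {value}")  # e.g. x-ratelimit-remaining-requests
```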
For legacy models, the unit of TPM (tokens per minute) varies by model:
| TYPE | 1 TPM EQUALS |
|---|---|
| davinci | 1 token per minute |
| curie | 25 tokens per minute |
| babbage | 100 tokens per minute |
| ada | 200 tokens per minute |
In practical terms, this means you can send roughly 200 times more tokens per minute to an ada model than to a davinci model.
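A quick sketch of that conversion: given a quota expressed in TPM units, the effective token throughput follows from the model's multiplier in the table above. The helper function here is hypothetical, written only to make the arithmetic concrete.

```python
# Multipliers from the table above: how many actual tokens per minute
# one TPM unit buys for each legacy model family.
TPM_MULTIPLIER = {"davinci": 1, "curie": 25, "babbage": 100, "ada": 200}

def effective_tokens_per_minute(tpm_quota: int, model_family: str) -> int:
    """Translate a TPM quota into actual tokens/minute for a legacy model."""
    return tpm_quota * TPM_MULTIPLIER[model_family]

# With the same 150,000 TPM quota, ada accepts 200x more tokens than davinci:
print(effective_tokens_per_minute(150_000, "davinci"))  # 150000
print(effective_tokens_per_minute(150_000, "ada"))      # 30000000
```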
It's crucial to understand that you can hit the rate limit through either metric, whichever is exhausted first. For instance, if you sent 20 requests of only 100 tokens each to the Edit endpoint, you would max out your 20 RPM limit even though those requests total just 2,000 tokens, far below the 150k TPM cap.
GPT-4 rate limits
During the restricted beta launch of GPT-4, the model enforces stricter rate limits to manage demand. The default rate limits for `gpt-4`/`gpt-4-0613` are 40k TPM and 200 RPM, while for `gpt-4-32k`/`gpt-4-32k-0613` the limits are 150k TPM and 20 RPM.
Due to capacity constraints, OpenAI is currently unable to process requests for rate limit increases. At this stage, the model is intended primarily for experimentation and prototyping rather than high-volume production use.
How do ChatGPT rate limits work?
Let's say your rate limit is 60 requests per minute and 150k davinci tokens per minute. You'll be throttled when you either hit the request cap or exhaust your token allocation, whichever occurs first. For instance, if your maximum is 60 requests per minute, you can issue roughly 1 request every second.
If you send 1 request every 800ms, then once you reach your rate limit you only need to delay your program by 200ms before sending the next request; otherwise, subsequent requests will fail. At a default of 3,000 requests per minute, you can effectively send 1 request every 20ms, or 0.02 seconds.
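A minimal sketch of that pacing logic: a client-side throttle simply spaces calls by 60 / RPM seconds, sleeping away whatever time the previous call didn't use. The `request_fn` callable here is a hypothetical stand-in for whatever API call you make.

```python
import time

def throttled_calls(request_fn, rpm_limit: int, n_requests: int):
    """Space out n_requests so they never exceed rpm_limit requests/minute."""
    min_interval = 60.0 / rpm_limit  # e.g. 60 RPM -> 1.0s, 3,000 RPM -> 0.02s
    results = []
    for _ in range(n_requests):
        start = time.monotonic()
        results.append(request_fn())
        elapsed = time.monotonic() - start
        # If the call finished faster than the interval (e.g. in 800ms at
        # 60 RPM), sleep the remaining ~200ms before the next request.
        if elapsed < min_interval:
            time.sleep(min_interval - elapsed)
    return results
```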
What happens if I hit a rate limit error?
Hitting a rate limit means you have made too many requests in a short period, and the API will reject further requests until a certain amount of time has passed.
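A common way to handle this is to retry with exponential backoff. Here is a minimal sketch, assuming the pre-1.0 `openai` Python package, in which rate-limit rejections surface as `openai.error.RateLimitError`:

```python
import random
import time

import openai  # pre-1.0 SDK assumed; reads OPENAI_API_KEY from the environment

def chat_with_backoff(messages, max_retries: int = 5):
    """Retry on rate-limit errors, doubling the wait (plus jitter) each time."""
    delay = 1.0
    for attempt in range(max_retries):
        try:
            return openai.ChatCompletion.create(
                model="gpt-3.5-turbo", messages=messages
            )
        except openai.error.RateLimitError:
            if attempt == max_retries - 1:
                raise  # give up after the final attempt
            time.sleep(delay + random.random())  # jitter avoids thundering herd
            delay *= 2

response = chat_with_backoff([{"role": "user", "content": "Hello"}])
print(response["choices"][0]["message"]["content"])
```

The jitter (a random fraction of a second added to each wait) keeps many clients that hit the limit simultaneously from all retrying at the same instant.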
Rate limits vs max_tokens
Each model OpenAI provides has a fixed limit on the number of tokens that can be passed as input in a single request, and this per-model maximum cannot be raised. For instance, if you're using the `text-ada-001` model, you can send at most 2,048 tokens per request.
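To stay under a model's context limit, you can count tokens before sending. A sketch using the `tiktoken` package, assuming `r50k_base` is the encoding used by the legacy ada-family models:

```python
import tiktoken

# Sketch: count tokens before sending so a prompt never exceeds the
# model's fixed context window (2,048 tokens for text-ada-001).
# r50k_base is assumed to be the encoding for the legacy ada models.
MAX_CONTEXT = 2048
encoding = tiktoken.get_encoding("r50k_base")

prompt = "Translate the following text to French: Hello, world!"
n_tokens = len(encoding.encode(prompt))

if n_tokens > MAX_CONTEXT:
    raise ValueError(f"Prompt is {n_tokens} tokens; the model accepts {MAX_CONTEXT}.")
print(f"Prompt uses {n_tokens} of {MAX_CONTEXT} tokens.")
```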