OpenAI’s API is powered by a variety of models, each with distinct abilities and cost structures. Through fine-tuning, these base models can be modified to a certain extent to accommodate your particular needs. Provided below is a list of OpenAI GPT Models.
OpenAI GPT Models
MODELS | DESCRIPTION |
---|---|
GPT-5 | Under development, available – Late 2024 |
GPT-4 | A set of models that improve on GPT-3.5 and can understand as well as generate natural language or code |
GPT-3.5 | A set of models that improve on GPT-3 and can understand as well as generate natural language or code |
GPT-3 | A set of models that can understand and generate natural language |
TTS | A set of models that can convert text into natural sounding spoken audio |
DALL·E | A model that can generate and edit images given a natural language prompt |
Whisper | A model that can convert audio into text |
Embeddings | A set of models that can convert text into a numerical form |
Codex | A set of models that can understand and generate code, including translating natural language to code |
Moderation | A fine-tuned model that can detect whether text may be sensitive or unsafe |
Also, OpenAI recently published open source models including Point-E, Whisper, Jukebox, and CLIP.
Below you can find short description of each Model. Also you can read more about Limitations of GPT models, Methods OpenAI use and follow Updates.
GPT-5
Under development, available – Late 2024
GPT-4 Turbo
The latest GPT-4 model with improved instruction following, JSON mode, reproducible outputs, parallel function calling, and more. Returns a maximum of 4,096 output tokens. This preview model is not yet suited for production traffic.
CONTEXT WINDOW: 128,000 tokens
TRAINING DATA: Up to Apr 2023
GPT-4
GPT-4 is a robust multimodal model that currently accepts and emits text, with the future prospect of image inputs. It’s capable of addressing challenging tasks with superior precision, courtesy of its extensive general knowledge and refined reasoning skills. Designed primarily for chat, GPT-4, like its predecessor gpt-3.5-turbo, also excels at traditional completion tasks.
CONTEXT WINDOW: 128,000 tokens
TRAINING DATA: Up to Apr 2023
GPT-3.5
The GPT-3.5 models are adept at understanding and generating both natural language and code. Among these, gpt-3.5-turbo stands out as the most powerful and cost-efficient. Although optimized for chat-based tasks, it also shows impressive performance with standard completion tasks.
CONTEXT WINDOW: 16,385 tokens
TRAINING DATA: Up to Sep 2021
GPT-3 (deprecated)
GPT-3 models are equipped to understand and generate natural language. Although surpassed by the more powerful GPT-3.5 models, the original GPT-3 base models, including davinci, curie, ada, and babbage, are currently the only models available for fine-tuning.
DALL·E
DALL·E is an AI model that creates realistic images and artworks from natural language descriptions. The DALL·E API facilitates the creation of new images of a specified size, editing of existing images, or generating variations of user-provided images.
The current API supports the second iteration of the DALL·E model, which offers superior resolution, accuracy, and realism compared to its predecessor. The most known competitor of Dalle is Stable Diffusion by Stability AI.
DALL·E 3
The latest DALL·E model released in Nov 2023.
TTS
TTS is an AI model that converts text to natural sounding spoken text. We offer two different model variates, tts-1
is optimized for real time text to speech use cases and tts-1-hd
is optimized for quality. These models can be used with the Speech endpoint in the Audio API.
MODEL | DESCRIPTION |
---|---|
tts-1 | Text-to-speech 1New The latest text to speech model, optimized for speed. |
tts-1-hd | Text-to-speech 1 HDNew The latest text to speech model, optimized for quality. |
Whisper
The Whisper model, offered via the OpenAI API, is a versatile speech recognition model trained on a variety of audio data. It’s capable of speech recognition, language identification, and speech translation across multiple languages. Currently, the Whisper v2-large model, also known as whisper-1, is available. OpenAI has also recently launched ChatGPT Whisper APIs.
Embeddings
The text-embedding-ada-002 model is a second-generation embedding model that presents a more cost-efficient solution than the previous 16 first-generation models. Embeddings are numerical representations of text that indicate the relationship between two text pieces, useful for search, clustering, recommendations, anomaly detection, and classification.
Codex
The Codex models are proficient in understanding and generating code. Their training data consists of natural language and billions of lines of open-source code from GitHub. They show the highest proficiency in Python and are skilled in twelve languages including:
- JavaScript,
- Go,
- Perl,
- PHP,
- Ruby,
- Swift,
- TypeScript,
- SQL,
- Shell.
During the limited beta, Codex models offer free usage with reduced rate limits, with future pricing plans to accommodate a broader range of applications.
Moderation
The Moderation models aim to ensure content adherence to OpenAI’s usage policies. They provide classification capabilities to detect various content categories, such as hate, hate/threatening, self-harm, sexual, sexual/minors, violence, and violence/graphic.