On Tuesday, OpenAI announced a sizable update to its large language model APIs, including GPT-4 and GPT-3.5-turbo. The update adds a new function-calling capability, substantial cost reductions, and a 16,000-token context window option for the GPT-3.5-turbo model.
In large language models (LLMs), the “context window” acts like short-term memory, holding the contents of the prompt or, in the case of a chatbot, the entire conversation. Expanding context size has become something of a technological race: Anthropic recently announced a 100,000-token context window for its Claude language model, and OpenAI has built a 32,000-token version of GPT-4, though it is not yet widely available to the public.
Along those lines, OpenAI has just released a new version of GPT-3.5-turbo with a 16,000-token context window. It’s called “gpt-3.5-turbo-16k,” and it allows a prompt up to 16,000 tokens long. With four times the context length of the standard 4,000-token version, gpt-3.5-turbo-16k can process about 20 pages of text in a single request. That is a notable boost for developers who need the model to handle and respond to larger chunks of text.
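The “about 20 pages” figure is easy to sanity-check with rough conversion rates. The constants below are common rules of thumb, not numbers from OpenAI: roughly 0.75 English words per token and roughly 600 words per printed page.

```python
# Back-of-envelope estimate of how much prose fits in a context window.
# Assumed conversion rates (rules of thumb, not from OpenAI):
WORDS_PER_TOKEN = 0.75   # ~0.75 English words per token
WORDS_PER_PAGE = 600     # ~600 words per printed page

def approx_pages(context_tokens: int) -> float:
    """Estimate how many pages of prose fit in a given context window."""
    return context_tokens * WORDS_PER_TOKEN / WORDS_PER_PAGE

print(round(approx_pages(16_000)))  # gpt-3.5-turbo-16k: ~20 pages
print(round(approx_pages(4_000)))   # standard gpt-3.5-turbo: ~5 pages
```

In practice the real ratio varies by language and content; code and non-English text consume more tokens per word.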
OpenAI listed at least four other major changes to its GPT APIs, described at length in the announcement post:
- Function-calling capability in the Chat Completions API
- Updated and “easier to steer” versions of GPT-4 and GPT-3.5-turbo
- A 75 percent price cut on the “ada” embeddings model
- A 25 percent price cut on input tokens for GPT-3.5-turbo

With function calling, developers can now more easily build apps that call external tools, convert natural language into API calls, or query a database. For example, the model can turn a prompt like “Email Anya to see if she wants to get coffee next Friday” into a function call like “send_email(to: string, body: string)”. Notably, this feature also makes it easier for API users to get consistently JSON-formatted output, which was difficult to achieve before.
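The flow can be sketched in Python without touching the network. The `send_email` function and its schema below are hypothetical examples (the model never runs your code; it returns the function name plus JSON arguments, and your application executes the call):

```python
import json

# Hypothetical function schema, in the JSON Schema format the
# Chat Completions API expects for its "functions" parameter.
SEND_EMAIL_SCHEMA = {
    "name": "send_email",
    "description": "Send an email to a recipient.",
    "parameters": {
        "type": "object",
        "properties": {
            "to": {"type": "string", "description": "Recipient address"},
            "body": {"type": "string", "description": "Email body text"},
        },
        "required": ["to", "body"],
    },
}

def dispatch(function_call: dict) -> str:
    """Parse the model's JSON-formatted arguments and run the matching local function."""
    args = json.loads(function_call["arguments"])
    if function_call["name"] == "send_email":
        # Stub: a real app would hand this to an email library here.
        return f"emailed {args['to']}: {args['body']}"
    raise ValueError(f"unknown function: {function_call['name']}")

# A real API response would carry a "function_call" field shaped like this
# (values here are made up for illustration):
fake_model_output = {
    "name": "send_email",
    "arguments": '{"to": "anya@example.com", "body": "Coffee next Friday?"}',
}
print(dispatch(fake_model_output))
```

In a live integration, you would pass `SEND_EMAIL_SCHEMA` in the request’s `functions` list, check the returned message for a `function_call`, run `dispatch`, and feed the result back to the model in a follow-up message.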
As for “steerability,” a term of art for getting an LLM to behave the way you want, OpenAI says its new “gpt-3.5-turbo-0613” model will offer “more reliable steerability via the system message.” The system message in the API is a special directive that tells the model how to behave, such as “You are Grimace. You never talk about anything else.”
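In a Chat Completions request, the system message is simply the first entry in the messages list. A minimal sketch of the request body (the user question here is an invented example):

```python
# Minimal Chat Completions request body showing the system message,
# which sets the persona/instructions for the whole conversation.
messages = [
    {"role": "system", "content": "You are Grimace. You never talk about anything else."},
    {"role": "user", "content": "What's your favorite drink?"},
]
request = {"model": "gpt-3.5-turbo-0613", "messages": messages}
# A client would send this via the Chat Completions endpoint; the 0613
# model is the one OpenAI says follows the system message more reliably.
print(request["messages"][0]["role"])  # system
```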
Beyond the functional improvements, OpenAI is also offering substantial cost savings. Notably, the price of input tokens for the popular GPT-3.5-turbo has dropped by 25 percent. Developers can now use the model for about $0.0015 per 1,000 input tokens and $0.002 per 1,000 output tokens, which works out to roughly 700 pages per dollar. The gpt-3.5-turbo-16k variant is priced at $0.003 per 1,000 input tokens and $0.004 per 1,000 output tokens.
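Since input and output tokens are billed at different rates, estimating a request’s cost takes a small calculation. Using the per-1,000-token prices quoted above:

```python
# USD per 1,000 tokens, from the prices quoted above.
PRICES = {
    "gpt-3.5-turbo":     {"input": 0.0015, "output": 0.002},
    "gpt-3.5-turbo-16k": {"input": 0.003,  "output": 0.004},
}

def cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one request: input and output billed separately."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1000

# Example: send 10,000 tokens to the 16k model and get 1,000 tokens back.
print(f"${cost('gpt-3.5-turbo-16k', 10_000, 1_000):.3f}")  # $0.034
```

Note that in a chatbot, the entire conversation history is resent as input on every turn, so input-token costs compound as a session grows.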
Also, OpenAI is cutting the price of its “text-embedding-ada-002” embeddings model by a substantial 75 percent. This model is more of a behind-the-scenes workhorse than its conversational siblings. An embedding model works like a translator for computers: it turns words and concepts into a numerical representation that machines can compare, which is important for tasks like searching text and surfacing relevant content.
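The idea is easiest to see with toy vectors. The 3-dimensional embeddings below are made up for illustration (text-embedding-ada-002 actually returns 1,536-dimensional vectors); what matters is that related texts end up close together, as measured by cosine similarity:

```python
import math

# Made-up 3-D "embeddings" for illustration; a real embedding model
# produces much higher-dimensional vectors with the same property:
# semantically related texts map to nearby vectors.
embeddings = {
    "coffee":      [0.90, 0.10, 0.20],
    "espresso":    [0.85, 0.15, 0.25],
    "spreadsheet": [0.10, 0.90, 0.30],
}

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Related concepts score higher than unrelated ones:
print(cosine_similarity(embeddings["coffee"], embeddings["espresso"]))
print(cosine_similarity(embeddings["coffee"], embeddings["spreadsheet"]))
```

A search application embeds every document once, embeds the query at lookup time, and returns the documents whose vectors score highest against the query.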
Since OpenAI continually revises its models, the old ones won’t stick around forever. Today the company also announced that it is beginning to deprecate some earlier versions of these models, including gpt-3.5-turbo-0301 and gpt-4-0314, which will be phased out. Developers can continue using the older models until September 13; after that date, they will no longer be available.