GPT-4, OpenAI's latest AI model, now approaches the speed of its predecessor, GPT-3.5. A recent study found that GPT-4's median latency has held steady at under 1ms per token over the past three months, while latency at the 99th percentile more than halved over the same period. As a result, the majority of requests are now processed faster by GPT-4 than by GPT-3.5.
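As an illustration, the median and 99th-percentile figures above can be reproduced from raw per-request timings. The sketch below uses a hypothetical list of per-token latencies (the numbers are invented for the example, not taken from the study):

```python
import statistics

# Hypothetical per-token latencies (in milliseconds) collected from GPT-4 requests.
latencies_ms = [0.62, 0.71, 0.58, 0.93, 0.66, 1.40, 0.70, 0.81, 0.64, 2.10]

# Median: half of all requests are faster than this value.
median = statistics.median(latencies_ms)

# 99th percentile: only 1% of requests are slower than this value.
p99 = statistics.quantiles(latencies_ms, n=100)[98]

print(f"median: {median:.2f} ms/token, p99: {p99:.2f} ms/token")
```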
Latency has three main components: network round-trip time, queuing time, and processing time. Processing time can vary significantly with the complexity and length of the prompt, and a high token count does not always mean a slower response. For example, a simple 204-token prompt can be answered in just 4.5 seconds, whereas a complex 33-token prompt can take up to 32 seconds to process.
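To see how prompt complexity, rather than length alone, drives processing time, one can time a request end to end and normalize by the number of generated tokens. A minimal sketch follows, assuming the official openai Python client (v1 interface), an API key in the environment, and placeholder prompts chosen only for illustration:

```python
import time
from openai import OpenAI  # assumes the official openai package, v1 interface

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def time_completion(prompt: str, model: str = "gpt-4") -> None:
    """Measure end-to-end latency and per-token latency for one request."""
    start = time.perf_counter()
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    elapsed = time.perf_counter() - start

    # Generation dominates latency, so normalize by the number of completion tokens.
    completion_tokens = max(response.usage.completion_tokens, 1)
    print(f"{elapsed:.1f}s total, {elapsed / completion_tokens:.3f}s per generated token")


# Placeholder prompts: one long but simple, one short but complex.
time_completion("List the days of the week. " * 20)
time_completion("Prove that there are infinitely many prime numbers.")
```

The same loop, run repeatedly against both models, yields the distributions from which the median and 99th-percentile figures cited above are computed.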
GPT-4 remains more expensive than GPT-3.5, but it is no longer slower for the majority of queries.
The study also explores another intriguing question: does latency increase as a user approaches their throughput limits? In other words, does OpenAI deliberately slow down heavy users? The results will be published in a future article.