The AI arms race continues apace: Anthropic is releasing its newest model, called Claude 3.5 Sonnet, which it claims is equal to or better than OpenAI’s GPT-4o or Google’s Gemini across a wide variety of tasks. The new model is now available to Claude users on the web and iOS, and Anthropic is also making it available to developers.
Claude 3.5 Sonnet will ultimately be the middle model in the lineup – Anthropic uses the name Haiku for its smaller model, Sonnet for the conventional mid-range option, and Opus for its higher-end model. (The names are weird, but every AI company seems to be naming things in their own weird ways, so we’ll let that slide.) But the company says 3.5 Sonnet outperforms 3 Opus, and its benchmarks show it does so by a very wide margin. The new model is also apparently twice as fast as the previous one, which could be an even bigger deal.
AI model benchmarks should always be considered with caution; there are so many of them that it’s easy to choose the ones that make you look good, and the models and products are changing so fast that no one seems to hold a lead for long. That said, Claude 3.5 Sonnet looks impressive: it outperformed GPT-4o, Gemini 1.5 Pro, and Meta’s Llama 3 400B in seven of the nine overall benchmarks and four of the five vision benchmarks. Again, don’t make too much of this, but it looks like Anthropic has built a legitimate competitor in this space.
What does all this really mean? Anthropic says Claude 3.5 Sonnet will be much better at writing and translating code, managing multi-step workflows, interpreting tables and graphs, and transcribing text from images. This new and improved Claude is also apparently better at understanding humor and can write in a much more human way.
Along with the new model, Anthropic is also introducing a new feature called Artifacts. With Artifacts, you’ll be able to see and interact with the results of your Claude requests: if you ask the model to design something for you, it can now show you what it looks like and let you edit it directly in the app. If Claude writes you an email, you can edit it in the Claude app instead of copying it to a text editor. It’s a small but smart feature – these AI tools need to become more than just chatbots, and features like Artifacts simply give the app more to do.
In fact, Artifacts seems to be a sign of Anthropic’s long-term vision for Claude. Anthropic has long said it is primarily focused on enterprises (even as it hires consumer technology people like Instagram co-founder Mike Krieger) and said in its press release announcing Claude 3.5 Sonnet that it plans to turn Claude into a tool for companies to “securely centralize their knowledge, documents and ongoing work in a shared space.” This sounds more like Notion or Slack than ChatGPT, with Anthropic’s models at the heart of the entire system.
For now, however, the model is the big news. And the pace of improvement here is incredible to watch: Anthropic released Claude 3 Opus in March, proudly saying it was as good as GPT-4 and Gemini 1.0, before OpenAI and Google released better versions of their models. Now, Anthropic has taken its next step, and it certainly won’t be long before its competitors do too. Claude is not as talked about as Gemini or ChatGPT, but it is very much in the running.