Claude just got upgraded, and why you should use it over ChatGPT

| How good is the new Claude 3.5 Sonnet, and what is computer use?


On Oct 22, 2024, Anthropic announced an upgraded Claude 3.5 Sonnet, a new model called Claude 3.5 Haiku, and computer use.
We are going to see how they compare to other publicly available models like OpenAI’s GPT-4o or Google’s Gemini.


Claude 3.5 Sonnet - Anthropic’s top model.
Claude 3.5 Haiku - Fast and efficient model (uses fewer tokens/resources).

[Figure: Claude 3.5 benchmark comparison table]

Claude 3.5 Sonnet is now the top model on SWE-bench Verified.
SWE-bench tests language models on real-world GitHub issues (coding).

  • Resolved 49%
[Figure: SWE-bench leaderboard, Oct 2024]


Claude 3.5 Sonnet also scores higher than GPT-4o on TAU-bench.


Computer use (beta) -

With computer use, the AI can control your computer and the apps on it by looking at the screen, moving the cursor, clicking buttons, and typing text.
Computer use is still experimental and is currently available with Claude 3.5 Sonnet.
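To make the idea concrete, here is a minimal sketch of the loop that sits between a model and a computer: the model asks for an action, and a small program carries it out and reports back. Nothing here calls Anthropic's API. `FakeDesktop`, `run_action`, and the action names are hypothetical stand-ins picked to mirror the four behaviors above (look at the screen, move the cursor, click, type), so treat it as an illustration rather than the real interface.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDesktop:
    """Hypothetical stand-in for a real screen, mouse, and keyboard."""
    cursor: tuple = (0, 0)
    typed: list = field(default_factory=list)
    clicks: list = field(default_factory=list)

    def screenshot(self) -> str:
        # A real harness would return image bytes; this returns a description.
        return f"screenshot with cursor at {self.cursor}"


def run_action(desktop: FakeDesktop, action: dict) -> str:
    """Apply one model-requested action and return a text result."""
    kind = action["action"]
    if kind == "screenshot":
        return desktop.screenshot()
    if kind == "mouse_move":
        x, y = action["coordinate"]
        desktop.cursor = (x, y)
        return f"moved to {desktop.cursor}"
    if kind == "left_click":
        desktop.clicks.append(desktop.cursor)
        return f"clicked at {desktop.cursor}"
    if kind == "type":
        desktop.typed.append(action["text"])
        return f"typed {action['text']!r}"
    raise ValueError(f"unsupported action: {kind}")


# Hypothetical sequence of actions a model might request (assumed, not real output).
requested = [
    {"action": "screenshot"},
    {"action": "mouse_move", "coordinate": [320, 240]},
    {"action": "left_click"},
    {"action": "type", "text": "hello"},
]

desktop = FakeDesktop()
results = [run_action(desktop, a) for a in requested]
```

In a real setup, each result (usually a fresh screenshot) would be sent back to the model so it can decide on the next action, and the loop would repeat until the task is done.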

Computer use docs


OSWorld benchmark results for Claude 3.5 Sonnet in the screenshot-only category.
(OSWorld evaluates AI models’ ability to use computers the way people do.)

  • 50 steps - 22%
  • 15 steps - 14.9%
[Figure: OSWorld benchmark, Oct 2024]


OpenAI (ChatGPT) offers GPT-4o and GPT-4o mini as free models. After 4 or 5 prompts, you will hit the free plan limit and get switched to 4o mini.

Claude 3.5 Sonnet is also available free for limited use, and you should consider using Sonnet because it is a better model than 4o.

(Claude can’t create images or access the internet, but you can upload images or docs.)


*Comparison table is taken from Anthropic’s post
