A month after the release of OpenAI’s GPT-4o, a new language model has taken the spotlight: Anthropic’s Claude 3.5 Sonnet, announced today, outperforms all existing models on key benchmark tests, according to the company. It is also faster and more cost-effective than its Claude 3 predecessors.
However, there’s a difference between launching a powerful model and users actually experiencing its full potential. Anthropic’s Claude 3.5 Sonnet doesn’t seem to struggle with this issue. On the day of its release, numerous AI enthusiasts and experts took to the internet to share their positive experiences and showcase the impressive capabilities of what is being dubbed the “most intelligent” LLM.
Enhancing Coding and Product Development
AI influencer Allie K. Miller shared that Claude 3.5 Sonnet created a fully functional web app for the game Mancala from a single screenshot of the game’s instructions in just 25 seconds. Miller was amazed to find that the model not only coded the game but also provided the game’s rules and a preview for testing.
Similarly, the account TestingCatalog News demonstrated how Claude 3.5 Sonnet built a simple contact form in React JSX, which ran successfully in the newly launched “Artifacts” playground, an interactive feature accompanying the chatbot interface.
Impressively, Claude 3.5 Sonnet also managed to recreate iconic imagery from the 1995 movie Hackers, specifically the “Data flow” 3D scene, on its first attempt.
Pietro Schirano, the founder of EverArt, an AI image generation startup, combined Claude 3.5 Sonnet with another tool, Maestro, and noted that it showed potential signs of AGI (Artificial General Intelligence). He asked the model to create a Mario game clone using geometric shapes, and within three minutes, it produced character animations and unique shapes.
Anthropic Endorsements
Naturally, Anthropic’s own developers are endorsing Claude 3.5 Sonnet. Alex Albert, who leads the company’s developer relations team, noted that the model is becoming proficient at coding and at autonomously fixing pull requests. He predicted that within a year, a large portion of code will be generated by LLMs.
Maggie Vo, another technical staffer at Anthropic, shared her excitement on X, mentioning that Claude 3.5 Sonnet could now handle half of her job responsibilities.
Pressure on OpenAI
With Claude 3.5 Sonnet overtaking OpenAI’s GPT-4o and being priced similarly, OpenAI faces increased pressure to validate its models’ superiority. Ethan Mollick, a professor at the University of Pennsylvania’s Wharton School, remarked that the Artifacts feature of Claude 3.5 Sonnet is akin to a simpler version of GPT-4’s Code Interpreter. After testing the new model, he shared a video demonstrating how he created and edited a playable game using Claude.
Some, like X user @kimmonismus, criticized OpenAI for lagging behind in the race towards AGI, accusing the company of over-promising and under-delivering, especially when competitors like Anthropic roll out substantial features without much fanfare.
Remaining Challenges
Despite the widespread praise, Claude 3.5 Sonnet still has flaws. For instance, it struggles with simple cognitive tasks like playing tic-tac-toe and makes basic errors, such as misunderstanding simple math problems. Tech journalist Timothy B. Lee showcased a blunder where the model incorrectly assessed the value of coins.
Nonetheless, these issues are relatively minor, and Claude 3.5 Sonnet represents a significant advance for Anthropic and the development of LLMs as a whole. The continuous improvements in AI performance show no signs of slowing down, fueled by the ever-growing availability of computing resources like GPUs.