Today, Meta Platforms, the parent company of Facebook, Instagram, WhatsApp, and Oculus VR, announced that it will begin a small U.S. trial of a new multimodal AI designed to work with its Ray-Ban Meta smart glasses, developed in partnership with the eyewear brand Ray-Ban. The news follows the company's recent release of Audiobox, its new voice cloning AI.
According to a video posted on Instagram by Meta’s chief technology officer Andrew Bosworth, the new multimodal AI is scheduled for a public launch in 2024. He explained that the assistant would not only answer your questions but also provide information about the world around you by using the glasses’ camera. The trial will run through an early access program starting this week in the U.S., although Bosworth did not say how to participate.
The latest version of the glasses, introduced at Meta’s annual Connect conference in Menlo Park last September, starts at $299. The current models already ship with a built-in, voice-controlled AI assistant, similar to Amazon’s Alexa or Apple’s Siri, but it has limitations: it cannot respond intelligently to photos, video, or the live view captured by the glasses’ built-in cameras.
In his Instagram post, Bosworth demonstrated one of the new capabilities of the multimodal version by showing himself wearing the glasses and looking at a piece of wall art depicting California. He also held a smartphone, indicating that the AI might need to be paired with a phone to operate. The AI successfully identified the art as a “wooden sculpture” and described it as “beautiful.”
Meta’s CEO Mark Zuckerberg shared a video showing him using the new multimodal AI with the Ray-Ban Meta smart glasses. The AI could perform several tasks, such as describing a shirt and suggesting matching pants, writing a witty caption about a dog in a costume, identifying a fruit he was holding, and translating text from a meme from Spanish to English.
This move aligns with Meta’s broad adoption of AI across its products and platforms, including its push for open-source AI with its Llama 2 large language model. However, it is intriguing that Meta’s first multimodal AI effort is appearing in a device rather than as an open-source model online.
The integration of generative AI into hardware has been gradual, led by smaller startups like Humane with its “Ai Pin” running OpenAI’s GPT-4V. OpenAI, on the other hand, has offered GPT-4V, its multimodal AI, through its ChatGPT app for iOS and Android, accessible with a ChatGPT Plus or Enterprise subscription.
This development also recalls Google’s earlier experiment with Google Glass, a smart glasses prototype from the 2010s that faced criticism for its design and limited practical use despite initial excitement.
Will Meta’s new multimodal AI for Ray-Ban Meta smart glasses overcome the challenges faced by Google Glass? Has the public’s perception shifted enough to embrace a product that includes a camera on your face? Only time will tell.