Nous Research, a private research group renowned for its work in the large language model (LLM) field, recently introduced a new vision-language model called Nous Hermes 2 Vision. This model, available on Hugging Face, is a lightweight, open-source extension of their previous OpenHermes-2.5-Mistral-7B model. It includes advanced vision capabilities, enabling it to interpret images and extract text from visual content.
However, shortly after launch, users found that the model hallucinated more often than expected, producing a range of errors. As a result, the project was renamed Hermes 2 Vision Alpha. A more stable version is expected to offer the same capabilities with fewer problems.
Named after Hermes, the messenger of the Greek gods, Nous Hermes 2 Vision Alpha is designed to handle the nuances of human communication. It combines images supplied by the user with the knowledge it has learned to produce detailed answers in natural language. For example, given a photo of a burger, it can assess whether the meal is healthy.
Although ChatGPT, powered by GPT-4V, also accepts image prompts, Nous Hermes 2 Vision stands out in two respects. First, it uses the SigLIP-400M vision encoder instead of the larger 3B-parameter encoders commonly used, which makes the model lighter and more efficient on vision-language tasks. Second, it was trained on a custom dataset that includes function calling, which lets users prompt the model with a function call to extract written information from an image.
Despite these advances, early use of the model revealed persistent issues. Shortly after release, one of Nous Research's co-founders raised concerns about the model's tendency to hallucinate and make errors, and the model was subsequently designated an alpha release. Quan Nguyen, the research fellow leading AI efforts at Nous, acknowledged the problems and promised an updated version that addresses them by the end of the month.
Although some issues remain unresolved, Nguyen noted that function calling still works well as long as a good schema is provided. He added that Nous may release a model dedicated to function calling if user feedback is positive.
To date, Nous Research has developed 41 open-source models across various architectures and capabilities, as part of its Hermes, YaRN, Capybara, Puffin, and Obsidian series.