Galileo, a pioneer in enterprise generative AI, has introduced Galileo Luna, a suite of Evaluation Foundation Models (EFMs) designed to change how companies evaluate their GenAI systems. With Luna, Galileo aims to address the speed, cost, and accuracy problems that have kept generative AI from being widely adopted in production.
Luna was developed to overcome the limitations of existing GenAI evaluation methods, which are often slow, costly, and inaccurate. Vikram Chatterji, Co-Founder and CEO of Galileo, explained that Luna was created because production environments need evaluations that are fast, cost-effective, and highly accurate.
Luna’s development is a significant milestone for Galileo, which has been a leader in the enterprise GenAI space since its founding in early 2021. The nearly year-long R&D effort reflects the company’s commitment to advancing AI evaluation.
Galileo Luna and its suite of EFMs outperform current AI evaluation methods. In benchmark tests that measured accuracy by AUROC (area under the receiver operating characteristic curve), Luna scored 0.78, ahead of competitors such as GPT-3.5, TruLens Groundedness, and RAGAS Faithfulness.
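For readers unfamiliar with the metric, AUROC can be read as the probability that an evaluator gives a higher score to a randomly chosen problematic response than to a randomly chosen clean one. The sketch below computes it by direct pairwise comparison. The function name, labels, and scores are hypothetical toy values chosen for illustration; they are not Luna's benchmark data or Galileo's implementation.

```python
def auroc(labels, scores):
    """Fraction of (positive, negative) pairs in which the positive example
    receives the higher score; ties count as half a win."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative example")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: 1 = response contains a hallucination, 0 = it does not.
toy_labels = [1, 1, 0, 1, 0, 0]
# Hypothetical evaluator scores (higher = more likely hallucinated).
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]

toy_auroc = auroc(toy_labels, toy_scores)
print(f"toy AUROC = {toy_auroc:.3f}")
```

In this toy set, 8 of the 9 positive/negative pairs are ranked correctly, so the result is about 0.889. A score of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives some context for the reported 0.78.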
The key to Luna’s innovation is its purpose-built small language models, designed specifically for tasks such as hallucination detection, context quality assessment, and data leakage prevention. This specialized approach allows Luna to excel in speed, cost, and accuracy. Chatterji explained that the tailored models significantly reduce computational overhead, making evaluations much faster and cheaper than those run with GPT-3.5.
Luna also stands out for its accuracy, surpassing other methods by up to 20% in detecting issues such as hallucinations and prompt injections. Its design combines multi-headed small language models with techniques such as intelligent chunking, which help it retain context and produce more precise evaluations.
For a workload of one million queries per month, Luna is far more cost-effective than other methods. Luna costs just $175 per month, compared with $6,248 for GPT-3.5, $7,994 for RAGAS Faithfulness, and $16,641 for TruLens Groundedness.
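To put those figures in proportion, the short sketch below derives the per-query cost and the cost ratio relative to Luna from the monthly numbers quoted above. The variable names are illustrative; only the dollar amounts and the one-million-query volume come from the article.

```python
# Monthly cost in USD for evaluating one million queries, as quoted in the article.
monthly_cost_usd = {
    "Luna": 175,
    "GPT-3.5": 6248,
    "RAGAS Faithfulness": 7994,
    "TruLens Groundedness": 16641,
}
QUERIES_PER_MONTH = 1_000_000

luna_cost = monthly_cost_usd["Luna"]

# Cost of a single evaluation for each method.
cost_per_query = {name: cost / QUERIES_PER_MONTH for name, cost in monthly_cost_usd.items()}

# How many times more expensive each method is than Luna.
cost_ratio = {name: cost / luna_cost for name, cost in monthly_cost_usd.items()}

for name in monthly_cost_usd:
    print(f"{name:<22} ${cost_per_query[name]:.6f}/query  {cost_ratio[name]:.1f}x Luna")
```

On these numbers the alternatives come out at roughly 36, 46, and 95 times Luna's monthly cost, respectively.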
One of Luna’s most remarkable features is its ability to function without traditional ground truth datasets, thanks to pre-trained evaluation models fine-tuned on various domain-specific datasets. This eliminates the need for creating custom test sets, streamlining the evaluation process and reducing reliance on extensive human-generated data.
Luna is relevant to industries requiring high reliability and speed in AI evaluations, such as healthcare, finance, and telecommunications. It particularly benefits large-scale enterprise applications involving millions of queries per month.
Luna is also fast, with a latency of just 0.232 seconds per query, a significant improvement over methods such as GPT-3.5 and Galileo ChainPoll. This makes Luna up to 11 times faster than competing approaches.
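A rough back-of-the-envelope calculation shows what these latency figures imply at scale. It rests on two assumptions that the article does not state: that "11 times faster" refers to per-query latency, and that queries are evaluated one at a time with no batching or parallelism.

```python
LUNA_LATENCY_S = 0.232   # seconds per query, from the article
MAX_SPEEDUP = 11         # "up to 11 times faster", from the article
QUERIES = 1_000_000      # monthly volume used in the article's cost comparison

# Assumption: the speedup is a ratio of per-query latencies, so the slowest
# competing method would take about 11x Luna's latency per query.
implied_slowest_latency_s = LUNA_LATENCY_S * MAX_SPEEDUP

# Assumption: strictly sequential evaluation, one query at a time.
luna_hours = QUERIES * LUNA_LATENCY_S / 3600
slowest_hours = QUERIES * implied_slowest_latency_s / 3600

print(f"Implied slowest latency: {implied_slowest_latency_s:.3f} s/query")
print(f"Luna, 1M queries:        {luna_hours:.1f} hours")
print(f"Slowest, 1M queries:     {slowest_hours:.1f} hours")
```

Under these assumptions, a month's worth of evaluations takes about 64 hours of sequential compute with Luna, against roughly 709 hours for a method 11 times slower.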
Additionally, Luna can be customized to meet specific customer needs with Galileo’s Fine Tune product, achieving accuracy levels of 95% or higher for crucial tasks in sectors like pharmaceuticals and financial services.
As the generative AI landscape rapidly evolves, Galileo remains committed to staying at the forefront of innovation. Chatterji stated that Luna will expand in three key areas: supporting more evaluation task types, continually improving accuracy, and further reducing costs and latency.
With Luna, Galileo has strengthened its position as a leader in enterprise GenAI evaluation. As more organizations seek to leverage generative AI, Luna’s ability to provide fast, cost-effective, and accurate evaluations will be key to driving widespread adoption and maximizing the potential of this transformative technology.