The rising popularity of large language models (LLMs) has sparked interest in embedding models, deep learning systems that convert various types of data into numerical representations. Embedding models play a crucial role in retrieval-augmented generation (RAG), a significant enterprise application of LLMs. Their potential, however, extends well beyond current RAG applications. Recent advances have been impressive, and 2024 is expected to bring even more breakthroughs.
How Embeddings Work
Embedding models transform data, such as images or text documents, into numerical vectors that capture their essential features. These models are trained on large datasets to learn which features best distinguish different pieces of data.
For instance, in computer vision, embeddings can identify features like objects, shapes, colors, and visual patterns. In text applications, embeddings capture semantic details like concepts, locations, people, companies, and objects.
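As a concrete illustration, here is a minimal sketch of text embeddings using the open-source sentence-transformers library. The model name and example sentences are assumptions chosen for illustration, not a recommendation.

```python
# A minimal sketch of text embeddings; the model is one of many options.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

sentences = [
    "The company reported strong quarterly earnings.",
    "Revenue grew significantly last quarter.",
    "The hiking trail offers beautiful mountain views.",
]

# Each sentence becomes a fixed-length numerical vector.
embeddings = model.encode(sentences)
print(embeddings.shape)  # (3, 384) for this model

# Semantically similar sentences get a higher cosine similarity score.
print(util.cos_sim(embeddings[0], embeddings[1]))  # high: both about earnings
print(util.cos_sim(embeddings[0], embeddings[2]))  # low: unrelated topics
```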
In RAG applications, embedding models encode the features of a company’s documents, storing them in a vector store—a specialized database for embeddings. When a new prompt arrives, the application calculates its embedding and retrieves documents from the vector store that have similar embeddings. The relevant document content is then incorporated into the prompt, enabling the LLM to generate responses based on that information. This method personalizes LLM responses using proprietary documents or information not part of their training data and helps mitigate issues like hallucinations, where LLMs generate incorrect information due to missing data.
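In code, the retrieval step might look like the following sketch. It reuses the same assumed sentence-transformers model and a plain in-memory similarity search; a production RAG system would typically rely on a dedicated vector store such as FAISS, Pinecone, or pgvector.

```python
# A hedged sketch of RAG retrieval with an in-memory "vector store".
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

# Stand-ins for a company's proprietary documents.
documents = [
    "Our refund policy allows returns within 30 days of purchase.",
    "Support is available Monday through Friday, 9am to 5pm.",
    "Premium subscribers get priority shipping on all orders.",
]
doc_embeddings = model.encode(documents, normalize_embeddings=True)

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k documents whose embeddings are closest to the query's."""
    q = model.encode(query, normalize_embeddings=True)
    scores = doc_embeddings @ q          # cosine similarity (vectors are normalized)
    top = np.argsort(scores)[::-1][:k]
    return [documents[i] for i in top]

# Retrieved text is then prepended to the LLM prompt as context.
context = "\n".join(retrieve("How long do I have to return an item?"))
prompt = f"Answer using this context:\n{context}\n\nQuestion: ..."
```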
Beyond Basic RAG
Although RAG has been a valuable addition to LLMs, the benefits of retrieval and embeddings extend beyond document matching.
Embeddings are primarily used for retrieval and for visualizing concepts. But retrieval has broader applications than simple chatbot question answering. It can be essential across LLM use cases, matching prompts to tasks such as SQL query generation, data extraction, long-form writing, or workflow automation. Retrieval is a core step in enhancing LLMs with relevant context, and many enterprise LLM applications will require some form of it.
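As an illustration of prompt-to-task matching, a router can embed an incoming request and compare it against example prompts for each task. The sketch below assumes the sentence-transformers library and made-up task examples; it is not any particular product's implementation.

```python
# A hedged sketch of routing prompts to task handlers via embeddings.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

# One representative prompt per task, assumed for illustration.
task_examples = {
    "sql_query": "Show me total sales by region for last quarter",
    "data_extraction": "Pull the invoice numbers and amounts from this document",
    "long_form": "Write a detailed report on our Q3 performance",
}
task_names = list(task_examples)
task_embeddings = model.encode(list(task_examples.values()))

def route(prompt: str) -> str:
    """Match an incoming prompt to its most similar task."""
    scores = util.cos_sim(model.encode(prompt), task_embeddings)[0]
    return task_names[int(scores.argmax())]

print(route("List the top ten customers by revenue"))  # likely "sql_query"
```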
Embeddings have applications beyond just document retrieval. For example, researchers at the University of Illinois at Urbana-Champaign and Tsinghua University recently used embedding models to lower the costs of training coding LLMs. They developed a method to pick the smallest, most diverse subset of a dataset needed to train the model effectively, thus maintaining high quality with fewer examples.
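The researchers' exact method is more sophisticated, but a generic greedy selection over embeddings conveys the idea: repeatedly pick the example farthest from everything already chosen, so a small subset covers the dataset's diversity. The sketch below uses random vectors as stand-ins for real example embeddings.

```python
# A generic sketch of embedding-based subset selection (greedy k-center),
# illustrating the idea rather than the paper's actual algorithm.
import numpy as np

def select_diverse_subset(embeddings: np.ndarray, k: int) -> list[int]:
    """Greedily pick k examples that spread out over the embedding space."""
    selected = [0]  # start from an arbitrary example
    # Distance from every point to its nearest already-selected point.
    dists = np.linalg.norm(embeddings - embeddings[0], axis=1)
    for _ in range(k - 1):
        idx = int(dists.argmax())  # farthest from everything chosen so far
        selected.append(idx)
        dists = np.minimum(dists, np.linalg.norm(embeddings - embeddings[idx], axis=1))
    return selected

# Train on the selected subset instead of the full dataset.
rng = np.random.default_rng(0)
data = rng.normal(size=(10_000, 384))  # stand-in for real example embeddings
subset = select_diverse_subset(data, k=500)
```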
Embeddings for Enterprise Applications
Vector embeddings make it possible to work with unstructured and semi-structured data. Semantic search, the technique underlying RAG, is just one use case. Embeddings can also represent other data types, such as images, audio, and video, and emerging multimodal transformers are making those applications practical.
Companies are exploring embedding models to analyze the vast amounts of unstructured data they generate. For instance, embeddings can help categorize millions of customer feedback messages or social media posts to identify trends, common themes, and shifts in sentiment.
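A common pattern is to embed the messages and cluster them so similar feedback lands together. The sketch below assumes the same sentence-transformers model plus scikit-learn's KMeans; the messages and cluster count are illustrative.

```python
# A minimal sketch of theme discovery in customer feedback via clustering.
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

model = SentenceTransformer("all-MiniLM-L6-v2")

feedback = [
    "The checkout page keeps timing out.",
    "Love the new dashboard design!",
    "Payment failed twice before going through.",
    "The updated UI looks fantastic.",
]
embeddings = model.encode(feedback)

# Group messages into themes; each cluster gathers similar complaints or praise.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(embeddings)
for msg, label in zip(feedback, kmeans.labels_):
    print(label, msg)
```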
Embeddings are thus well suited to enterprises that need to sift through massive volumes of data to spot trends and extract insights.
Fine-tuned Embeddings
2023 saw significant progress in fine-tuning LLMs with custom datasets, though it remains challenging. Currently, only a few companies with the necessary expertise and data are engaging in fine-tuning.
Organizations will likely follow a natural progression: start with RAG, which is easier to implement, then optimize with fine-tuning. As open-source models improve, more organizations are expected to fine-tune their own models, though doing so remains more complex than using RAG.
Fine-tuning embeddings has its own challenges, such as sensitivity to data shifts. Embeddings trained on short queries may not perform well on longer ones or may struggle with different types of questions. Strong in-house machine learning teams are usually needed for effective fine-tuning, making out-of-the-box solutions more practical for most enterprises.
However, recent advancements have made the training process for embedding models more efficient. For instance, a Microsoft study showed that pre-trained LLMs like Mistral-7B can be fine-tuned for embedding tasks with smaller datasets generated by a powerful LLM. This approach is simpler than traditional methods that require extensive manual work and costly data collection.
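As a rough sketch of what such fine-tuning can look like in practice (not the Microsoft paper's exact recipe), the sentence-transformers training API can fine-tune a base model on (query, passage) pairs with an in-batch contrastive loss. The pairs here stand in for synthetic data that would be generated by a stronger LLM.

```python
# A hedged sketch of contrastive fine-tuning for an embedding model.
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

model = SentenceTransformer("all-MiniLM-L6-v2")  # stand-in base model

# Synthetic (query, passage) pairs; in practice, LLM-generated at scale.
train_examples = [
    InputExample(texts=["What is our refund window?",
                        "Returns are accepted within 30 days of purchase."]),
    InputExample(texts=["When is support available?",
                        "Support hours are Monday to Friday, 9am to 5pm."]),
]
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=2)

# In-batch negatives: each pair's passage acts as a negative for other queries.
train_loss = losses.MultipleNegativesRankingLoss(model)
model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1)
```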
With the rapid advancements in LLMs and embedding models, we can expect many exciting developments in the coming months.