Real-time database vendor Rockset is enhancing the AI capabilities of its database with improved vector search and scalability.
Rockset’s origins trace back to the open-source RocksDB key-value store created at Meta (formerly Facebook). The evolution of this technology supports Rockset’s real-time indexing capabilities. The company has secured $105 million in funding, including a $44 million round announced in August.
With the new update, Rockset is advancing into the generative AI arena, offering vector search as part of its real-time database platform. First previewed in April, the vector search feature has been refined and is now generally available. JetBlue is one of the early adopters and has described how it uses Rockset. Rockset is also integrating with popular tools, including LangChain for AI orchestration and the LlamaIndex data framework.
Venkat Venkataramani, co-founder and CEO of Rockset, explains that the company's vector search capability has reached general availability (GA), allowing users to build similarity indexes for approximate nearest neighbor (ANN) search at scale, with real-time updates to vector embeddings and metadata.
The market for vector search capabilities has become very competitive in 2023. Vectors, numerical representations of data, are essential for powering large language models (LLMs). New purpose-built vector databases such as Pinecone and Milvus have emerged alongside established database platforms such as DataStax, MongoDB, and Neo4j, which now support vector embeddings.
Rockset differentiates itself as a real-time database. Venkataramani highlights that when new data arrives in a Rockset database, the database index and vector embeddings are updated in real time, with latency in the single-digit milliseconds. Rockset's compute-compute separation architecture dedicates one set of resources to building indexes and another to serving queries, which improves both real-time indexing and query performance.
Unlike other vector databases that require periodic index rebuilding, Rockset supports real-time updates.
There are various methods to enable vector search, including approximate nearest neighbor (ANN) and the more precise k-nearest neighbor (KNN). KNN finds the exact top matches, but it is computationally intensive, especially on large datasets. ANN returns results that are close enough while requiring far less computation. Rockset uses both KNN and ANN, depending on the query and the data: its query optimizer determines which approach will return results fastest.
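The tradeoff between the two approaches can be illustrated with a minimal Python sketch. It is not Rockset's implementation. The class name `CoarsePartitionIndex`, the `nprobe` parameter, and the centroid-bucket scheme are assumptions chosen for illustration: exact KNN scores every stored vector, while the toy ANN index scores only the vectors in the few buckets closest to the query.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def exact_knn(vectors, query, k):
    """Exact KNN: score every vector and return the ids of the k closest."""
    order = sorted(range(len(vectors)),
                   key=lambda i: squared_distance(vectors[i], query))
    return order[:k]


class CoarsePartitionIndex:
    """Toy ANN index (hypothetical): vectors are bucketed by nearest centroid."""

    def __init__(self, centroids):
        self.centroids = centroids
        self.buckets = [[] for _ in centroids]
        self.vectors = []

    def _nearest_centroids(self, vector, n):
        order = sorted(range(len(self.centroids)),
                       key=lambda c: squared_distance(self.centroids[c], vector))
        return order[:n]

    def add(self, vector):
        """Insert one vector into its bucket without rebuilding the index."""
        vector_id = len(self.vectors)
        self.vectors.append(vector)
        bucket = self._nearest_centroids(vector, 1)[0]
        self.buckets[bucket].append(vector_id)
        return vector_id

    def search(self, query, k, nprobe=1):
        """Approximate search: score only vectors in the nprobe closest buckets."""
        candidates = []
        for c in self._nearest_centroids(query, nprobe):
            candidates.extend(self.buckets[c])
        candidates.sort(key=lambda i: squared_distance(self.vectors[i], query))
        return candidates[:k]


# Illustrative data: 500 random 8-dimensional vectors, 10 sampled as centroids.
random.seed(0)
data = [[random.random() for _ in range(8)] for _ in range(500)]
centroids = random.sample(data, 10)

index = CoarsePartitionIndex(centroids)
for vector in data:
    index.add(vector)

query = [random.random() for _ in range(8)]
exact = exact_knn(data, query, k=5)
approx = index.search(query, k=5, nprobe=2)

# Recall: the fraction of the true top 5 that the approximate search found.
recall = len(set(exact) & set(approx)) / 5
print("exact:", exact)
print("approx:", approx)
print("recall:", recall)
```

In this sketch, raising `nprobe` scans more buckets, which moves results toward the exact answer at a higher compute cost. Because `add` touches only one bucket, a newly inserted vector is searchable immediately, loosely mirroring the real-time update behavior the article describes.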
Real-time updates to vector embeddings are a core feature of Rockset, allowing ANN indexes to reflect the latest data within milliseconds.
Despite recent AI advancements from OpenAI, such as the GPT builder and the Assistants API, Venkataramani believes that vector databases will continue to be essential, particularly for larger enterprise applications and generative AI use cases. He argues that many large companies have security and compliance requirements that prevent them from sending all their data to third parties for chatbot development. Venkataramani sees ongoing demand for vector database capabilities to power retrieval-augmented generation (RAG) and other use cases, including similarity search at scale.
In summary, Venkataramani believes that while the use cases for vector databases may evolve, their necessity will persist for developing AI applications.