DataStax is making it simpler for developers to build retrieval-augmented generation (RAG) applications for generative AI with a new data API, available now.
DataStax is a key player behind the open-source Apache Cassandra database, which forms the base of its AstraDB cloud database service. In 2023, like many other database providers, DataStax added vector database capabilities to its platform. At a recent event, the CEO of DataStax boldly claimed that Cassandra is the best database for generative AI.
Vector database capabilities are essential for RAG applications, which combine large language models (LLMs) with data platforms to deliver more accurate, better-tailored results.
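In a RAG application, the database handles the retrieval step. It finds the stored text chunks whose embeddings are closest to the embedding of the user's question, and those chunks are passed to the LLM as context. The minimal sketch below shows that step in plain Python under stated assumptions: the three-dimensional vectors are toy values standing in for embeddings from a real model, `retrieve` and `build_prompt` are hypothetical helpers, and no LLM is actually called.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query_vec, chunks, k=2):
    """Hypothetical helper: return the k chunk texts most similar to query_vec."""
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c["vector"]), reverse=True)
    return [c["text"] for c in ranked[:k]]

def build_prompt(question, context):
    """Hypothetical helper: prepend the retrieved context to the user's question."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {question}"

# Toy "embeddings"; a real application would get these from an embedding model.
chunks = [
    {"text": "Cassandra is a distributed NoSQL database.", "vector": [0.9, 0.1, 0.0]},
    {"text": "RAG adds retrieved documents to an LLM prompt.", "vector": [0.1, 0.9, 0.2]},
    {"text": "Vector search finds semantically similar items.", "vector": [0.2, 0.8, 0.4]},
]
query = [0.1, 0.95, 0.3]  # toy embedding of the user's question
context = retrieve(query, chunks, k=2)
print(build_prompt("What does RAG do?", context))
```

The augmented prompt, not the raw question, is what would be sent to the LLM, which is how retrieval grounds the model's answer in the stored data.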
Since July 2023, AstraDB users have had access to vector capabilities, but they needed to use the Cassandra Query Language (CQL) to query data. The new data API changes this, allowing developers to use Python and JavaScript to access the database. This narrows the gap between DataStax and purpose-built vector databases like Pinecone, which recently updated its platform with serverless database functionality.
There has been a divide between vector-only databases and hybrid databases that offer robust query models. DataStax aims to bridge this gap with the new data API.
The new data API doesn’t introduce new vector capabilities to AstraDB but makes the development process easier. It reduces the mismatch between developers’ needs and the database’s offerings. Since July 2023, about half of the new users on AstraDB are focused on building generative AI applications. The challenge had been that developers couldn’t easily use familiar programming languages like Python and JavaScript for these tasks.
Previously, developers had to use CQL, which required deeper knowledge of data modeling. The new API simplifies this by handling vectorization automatically and offering a more straightforward interface in Python and JavaScript. It also stores and indexes vector data efficiently at the database level, so developers face a lower learning curve without giving up performance.
APIs are crucial
Traditional database APIs often translate code written in languages like Python or JavaScript into the database’s query language, much as object-relational mappers (ORMs) do. The DataStax data API works differently: it builds directly on high-performance primitives in Cassandra’s architecture that support a variety of query patterns. Connecting at this deeper layer improves overall query performance.
The data API exposes a simple JSON-based data format, making it easy for developers to send and retrieve data. Because this format sits directly on the database’s storage-level primitives, the simpler interface does not come at the cost of performance.
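To make the idea of a JSON-based interface with automatic vectorization concrete, here is a minimal in-memory sketch. It is not DataStax’s data API: the `ToyCollection` class, the `insert`/`find` command names, the `sortByText` field, and the hashed bag-of-words `embed` function are all hypothetical, chosen only to show the pattern of a client sending plain JSON text while the store computes and indexes the vectors itself.

```python
import json
import math

DIMS = 8  # toy embedding size; real embedding models use hundreds of dimensions

def embed(text):
    """Hypothetical stand-in for server-side vectorization: a hashed bag of words."""
    vec = [0.0] * DIMS
    for word in text.lower().split():
        vec[sum(map(ord, word)) % DIMS] += 1.0  # deterministic toy hash
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

class ToyCollection:
    """Hypothetical in-memory store that accepts and returns JSON strings."""

    def __init__(self):
        self.docs = []

    def handle(self, request_json):
        request = json.loads(request_json)
        if "insert" in request:
            doc = dict(request["insert"]["document"])
            doc["vector"] = embed(doc["text"])  # computed by the store, not the client
            self.docs.append(doc)
            return json.dumps({"status": {"insertedCount": 1}})
        if "find" in request:
            query_vec = embed(request["find"]["sortByText"])
            limit = request["find"].get("limit", 1)
            # Non-empty texts yield unit vectors, so the dot product is cosine similarity.
            scored = sorted(
                self.docs,
                key=lambda d: sum(a * b for a, b in zip(query_vec, d["vector"])),
                reverse=True,
            )
            # Strip the internal vector before returning documents to the client.
            hits = [{k: v for k, v in d.items() if k != "vector"} for d in scored[:limit]]
            return json.dumps({"data": {"documents": hits}})
        return json.dumps({"errors": [{"message": "unknown command"}]})

coll = ToyCollection()
coll.handle(json.dumps({"insert": {"document": {"_id": "1", "text": "cassandra stores rows"}}}))
coll.handle(json.dumps({"insert": {"document": {"_id": "2", "text": "vector search finds similar text"}}}))
reply = coll.handle(json.dumps({"find": {"sortByText": "similar text search", "limit": 1}}))
print(json.loads(reply)["data"]["documents"])
```

The design point is that the client never touches a vector or a schema: it sends text wrapped in JSON and gets JSON back, while embedding and similarity ranking happen inside the store.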
Speeding up vectors with JVector engine
A significant component of DataStax’s vector database enhancements is the JVector search engine within AstraDB. JVector is an open-source, embedded vector search engine built by DataStax that uses the DiskANN algorithm. DiskANN is a graph-based approximate nearest neighbor (ANN) search algorithm designed to keep most of its index on disk rather than in memory, which lets it outpace other algorithms at large storage and distribution scales.
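Graph-based ANN search, the family DiskANN belongs to, can be illustrated with a short sketch: each vector is a node linked to nearby vectors, and a query walks the graph greedily toward its nearest neighbors while keeping a bounded beam of candidates. This is a generic illustration, not JVector’s or DiskANN’s code. The brute-force `build_knn_graph` is a simplification (DiskANN’s Vamana construction prunes edges differently and lays the graph out on disk), and the names `degree` and `beam_width` are illustrative.

```python
import math
import random

def build_knn_graph(points, degree):
    """Link each point to its `degree` nearest neighbors.

    Brute-force construction for illustration only; DiskANN's Vamana
    construction instead prunes edges so the graph keeps longer-range links.
    """
    graph = {}
    for i, p in enumerate(points):
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        graph[i] = others[:degree]
    return graph

def greedy_search(points, graph, query, entry, beam_width, k):
    """Beam search: repeatedly expand the closest unvisited node in the beam."""
    beam = [entry]
    visited = set()
    while True:
        unvisited = [i for i in beam if i not in visited]
        if not unvisited:
            break  # every candidate in the beam has been expanded
        node = min(unvisited, key=lambda i: math.dist(query, points[i]))
        visited.add(node)
        merged = set(beam) | set(graph[node])
        # Keep only the beam_width candidates closest to the query.
        beam = sorted(merged, key=lambda i: math.dist(query, points[i]))[:beam_width]
    return beam[:k], len(visited)

rng = random.Random(0)
points = [(rng.random(), rng.random()) for _ in range(500)]
graph = build_knn_graph(points, degree=10)
query = (0.5, 0.5)
approx, n_expanded = greedy_search(points, graph, query, entry=0, beam_width=16, k=5)
exact = sorted(range(len(points)), key=lambda i: math.dist(query, points[i]))[:5]
recall = len(set(approx) & set(exact)) / len(exact)
print(f"recall@5={recall:.2f}, nodes expanded={n_expanded} of {len(points)}")
```

Because the search only evaluates distances for the nodes it expands and their neighbors, it can avoid scanning the whole dataset; the trade-off is that the returned neighbors are not guaranteed to be the exact nearest ones, which is what "approximate" refers to.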
According to DataStax, the JVector engine gives AstraDB better relevancy and recall than other vector databases. The company says it is committed to open-sourcing its vector work, including JVector and the data API, which benefits both the Cassandra community and AstraDB customers.
DataStax is dedicated to supporting the open-source ecosystem while ensuring that developers find it easy to choose the right cloud service.