DataStax, Google partner to bring vector search to NoSQL AstraDB

6/12/2023

This resource is based on an article originally published here.

DataStax is partnering with Google to bring vector search to its AstraDB NoSQL database-as-a-service in an attempt to make Apache Cassandra more compatible with AI and large language model (LLM) workloads.

Vector search, which relies on representing data as vectors, is seen by database vendors as a key capability, especially in the wake of the generative AI boom. It can reduce the time required to train AI models by cutting down the need to structure data, a practice prevalent with current search technologies. Instead, vector search can match on the attributes of a data point that are relevant to the query.

“Vector search enables developers to search a database by context or meaning rather than keywords or literal values. This is done by using embeddings, for example, Google Cloud’s API for text embedding, which can represent semantic concepts as vectors to search unstructured datasets such as text and images,” DataStax said in a statement.
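To make "search by meaning" concrete, here is a minimal brute-force sketch that assumes the embeddings have already been computed. The three-dimensional vectors and document strings are invented for illustration (real text embeddings have hundreds of dimensions), and `cosine_similarity` and `vector_search` are hypothetical helpers, not part of any DataStax or Google API. Note that the query "I can't sign in" shares no keywords with the documents it matches.

```python
import math

# Toy 3-dimensional "embeddings", invented for illustration only.
# In practice these would come from an embedding model or API.
DOCUMENTS = {
    "How do I reset my password?": [0.9, 0.1, 0.0],
    "Steps to recover a forgotten login": [0.8, 0.2, 0.1],
    "Quarterly revenue grew 12%": [0.0, 0.1, 0.95],
}


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def vector_search(query_vec: list[float], k: int = 2) -> list[tuple[str, float]]:
    """Brute-force k-nearest-neighbour search ranked by cosine similarity."""
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in DOCUMENTS.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical embedding of the query "I can't sign in".
query = [0.85, 0.15, 0.05]
for text, score in vector_search(query):
    print(f"{score:.3f}  {text}")
```

This scan compares the query with every stored vector, which is fine for a toy example; vector-enabled databases typically rely on approximate nearest-neighbour indexes so that queries stay fast as the data grows.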

Embeddings can be seen as powerful tools that enable natural-language search across a large corpus of data in different formats and extract the most relevant pieces of data, DataStax said.

Vector databases are seen by analysts as a “hot ticket” item for 2023 as enterprises look for ways to reduce spending while building generative AI-based applications.

AstraDB’s vector search accessible via Google-powered NoSQL copilot

Vector search along with other updates will be accessible inside AstraDB via a Google-powered NoSQL copilot that will also help DataStax customers build AI applications, the company said.

Under the hood, the NoSQL copilot combines Cassandra’s vector search, Google Cloud’s Vertex AI generative AI services, LangChain, and Google Cloud BigQuery.

“DataStax and GCP co-designed NoSQL copilot as an LLM Memory toolkit that would then plug into LangChain and make it easy to combine the Vertex Gen AI service with Cassandra for caching, vector search, and chat history retrieval. This then makes it easy for enterprises to build their own Copilot for their business applications and use the combination of AI services on their own data sets held in Cassandra,” said Ed Anuff, chief product officer at DataStax.
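The "LLM memory" pattern Anuff describes can be sketched in a few lines. This is not the NoSQL copilot's code or CassIO's API: it is a hypothetical, in-memory illustration in which `InMemoryMemory` stands in for Cassandra tables (document vectors plus per-session chat history) and the `llm` callable stands in for a hosted model such as Vertex AI. All names, and the use of dot-product similarity (equivalent to cosine for unit-length vectors), are assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable

Vector = list[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


@dataclass
class InMemoryMemory:
    """Hypothetical stand-in for the Cassandra tables a copilot might use."""
    documents: list[tuple[str, Vector]] = field(default_factory=list)
    history: dict[str, list[str]] = field(default_factory=dict)

    def add_document(self, text: str, vec: Vector) -> None:
        self.documents.append((text, vec))

    def search(self, query_vec: Vector, k: int) -> list[str]:
        # Rank stored documents by similarity to the query vector.
        ranked = sorted(self.documents, key=lambda d: dot(query_vec, d[1]), reverse=True)
        return [text for text, _ in ranked[:k]]

    def append_turn(self, session_id: str, turn: str) -> None:
        self.history.setdefault(session_id, []).append(turn)

    def recent_turns(self, session_id: str, n: int) -> list[str]:
        return self.history.get(session_id, [])[-n:]


def answer(memory: InMemoryMemory, session_id: str, question: str,
           question_vec: Vector, llm: Callable[[str], str], k: int = 2) -> str:
    """Retrieve context by vector search, add recent chat history, call the LLM."""
    context = memory.search(question_vec, k)
    past = memory.recent_turns(session_id, 4)
    prompt = "\n".join(["Context:", *context, "Conversation so far:", *past,
                        f"user: {question}", "assistant:"])
    reply = llm(prompt)
    # Persist both sides of the exchange so later turns can retrieve them.
    memory.append_turn(session_id, f"user: {question}")
    memory.append_turn(session_id, f"assistant: {reply}")
    return reply
```

Keeping retrieval, history, and the model call behind separate, swappable pieces mirrors the idea of a toolkit that plugs a database into a framework such as LangChain: the storage layer can change without touching the prompt-assembly logic.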

Plugging into LangChain, an open source framework aimed at simplifying the development of generative AI-powered applications that use large language models, is made possible by an open source library jointly developed by the two companies.

The library, dubbed CassIO, aims to make it easy to add Cassandra-based databases to generative AI software development kits (SDKs) such as LangChain.

Enterprises can use CassIO to build sophisticated AI assistants, implement semantic caching for generative AI, browse LLM chat history, and manage Cassandra-backed prompt templates, DataStax said.
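Semantic caching, one of the uses listed above, reuses a stored LLM response when a new prompt is close in meaning to one already answered, rather than only when the text matches exactly. The sketch below is a hypothetical, in-memory illustration of the idea, not CassIO's implementation: `SemanticCache`, `cached_call`, the 0.95 similarity threshold, and the toy vectors are all assumptions, and a real deployment would store the entries in a database table.

```python
import math
from typing import Callable, Optional

Vector = list[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


class SemanticCache:
    """Hypothetical cache keyed by prompt embeddings instead of exact text."""

    def __init__(self, threshold: float = 0.95) -> None:
        self.threshold = threshold  # minimum cosine similarity for a cache hit
        self.entries: list[tuple[Vector, str]] = []

    def lookup(self, prompt_vec: Vector) -> Optional[str]:
        # Find the most similar cached prompt, if any.
        best = max(self.entries, key=lambda e: cosine(prompt_vec, e[0]), default=None)
        if best is not None and cosine(prompt_vec, best[0]) >= self.threshold:
            return best[1]
        return None

    def store(self, prompt_vec: Vector, response: str) -> None:
        self.entries.append((prompt_vec, response))


def cached_call(cache: SemanticCache, prompt: str, prompt_vec: Vector,
                llm: Callable[[str], str]) -> str:
    """Return a cached response for a similar prompt, or call the LLM and cache it."""
    hit = cache.lookup(prompt_vec)
    if hit is not None:
        return hit
    response = llm(prompt)
    cache.store(prompt_vec, response)
    return response
```

The threshold is the key design choice: set too low, unrelated prompts get each other's answers; set too high, the cache rarely hits and saves few model calls.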

Other integrations with Google include the ability for enterprises using Google Cloud to import and export data between Cassandra-based databases and Google’s BigQuery data warehouse through the Google Cloud Console, in order to create and serve machine learning-based features.

A second integration with Google will allow AstraDB subscribers to pipe real-time data between Cassandra and Google Cloud services to monitor generative AI model performance, DataStax said.

DataStax has also partnered with SpringML to help accelerate the development of generative AI applications using SpringML’s data science and AI service offerings.

Availability of vector search for Cassandra

AstraDB, built on Apache Cassandra, will arguably be one of the first offerings to bring vector search to the open source distributed database. Vector search for Apache Cassandra itself is currently planned for its 5.0 release, according to a post by the database’s community, of which DataStax is a member.

AstraDB’s vector search is currently in public preview and can be used for non-production workloads, DataStax said, adding that it will initially be available exclusively on Google Cloud and later be extended to other public clouds.
