LlamaIndex


Quick start

To use the integration, install it with pip install llama-index-vector-stores-lancedb. Then run the script below to try it out:

python
import logging
import sys

# Uncomment to see debug logs
# logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)
# logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))

from llama_index.core import SimpleDirectoryReader, StorageContext, VectorStoreIndex
from llama_index.vector_stores.lancedb import LanceDBVectorStore
import openai

# LlamaIndex uses OpenAI for embeddings by default, so set your key first
openai.api_key = "sk-..."

documents = SimpleDirectoryReader("./data/your-data-dir/").load_data()
print("Document ID:", documents[0].doc_id, "Document Hash:", documents[0].hash)

# For LanceDB Cloud:
# vector_store = LanceDBVectorStore( 
#     uri="db://db_name", # your remote DB URI
#     api_key="sk_..", # lancedb cloud api key
#     region="your-region" # the region you configured
#     ...
# )

vector_store = LanceDBVectorStore(
    uri="./lancedb", mode="overwrite", query_type="vector"
)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context
)
# Restrict retrieval to one source file with a Lance SQL-like filter string
lance_filter = "metadata.file_name = 'paul_graham_essay.txt'"
retriever = index.as_retriever(vector_store_kwargs={"where": lance_filter})
response = retriever.retrieve("What did the author do growing up?")

Check out the complete example here: LlamaIndex demo
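The quick-start script stops at the retrieve call without showing the results. As a minimal sketch of how you might inspect them, the helper below builds one summary line per hit. It assumes each result exposes score and node, and that each node exposes metadata (a dict) and get_content(), which is how LlamaIndex's NodeWithScore objects are expected to behave. summarize_hits is a hypothetical name used for illustration and is not part of either library.

```python
import textwrap


def summarize_hits(hits, width=80):
    """Return one summary line per retrieved hit.

    Assumption: each hit has `.score` and `.node`, and each node has
    `.metadata` (a dict) and `.get_content()`. The function is duck-typed
    on purpose, so it does not import LlamaIndex itself.
    """
    lines = []
    for rank, hit in enumerate(hits, start=1):
        file_name = hit.node.metadata.get("file_name", "<unknown>")
        # Collapse whitespace and truncate long chunks for display.
        preview = textwrap.shorten(
            hit.node.get_content(), width=width, placeholder="..."
        )
        score = "n/a" if hit.score is None else f"{hit.score:.3f}"
        lines.append(f"{rank}. [{score}] {file_name}: {preview}")
    return lines


# Usage with the quick-start script:
# for line in summarize_hits(response):
#     print(line)
```

Keeping the helper free of library imports makes it easy to try against stand-in objects before running it on real retrieval results.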

Filtering

For metadata filtering, you can pass a Lance SQL-like filter string, as in the quick-start example above. Alternatively, you can build the filter with the MetadataFilters class from LlamaIndex:

python
from llama_index.core.vector_stores import (
    MetadataFilters,
    FilterOperator,
    FilterCondition,
    MetadataFilter,
)

query_filters = MetadataFilters(
    filters=[
        MetadataFilter(
            key="creation_date", operator=FilterOperator.EQ, value="2024-05-23"
        ),
        MetadataFilter(
            key="file_size", value=75040, operator=FilterOperator.GT
        ),
    ],
    condition=FilterCondition.AND,
)
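For comparison, the same two conditions could also be written as a Lance SQL-like filter string. The sketch below builds such a string from (column, operator, value) triples. build_lance_filter is a hypothetical helper, not a LanceDB or LlamaIndex function. It assumes metadata columns are addressed with the metadata. prefix, as in the quick-start filter, and that the string filter accepts AND and the usual comparison operators.

```python
def build_lance_filter(conditions, prefix="metadata."):
    """Join (column, operator, value) triples into one SQL-like filter string.

    Hypothetical helper for illustration. Assumes the `metadata.` column
    prefix used in the quick-start example.
    """
    allowed = {"=", "!=", ">", ">=", "<", "<="}
    parts = []
    for column, op, value in conditions:
        if op not in allowed:
            raise ValueError(f"unsupported operator: {op!r}")
        if isinstance(value, str):
            # Standard SQL escaping: double any embedded single quote.
            literal = "'" + value.replace("'", "''") + "'"
        else:
            # Numbers are emitted as-is.
            literal = str(value)
        parts.append(f"{prefix}{column} {op} {literal}")
    return " AND ".join(parts)


where = build_lance_filter(
    [("creation_date", "=", "2024-05-23"), ("file_size", ">", 75040)]
)
# The result can be passed as vector_store_kwargs={"where": where}
```

Quoting string values in one place avoids hand-assembling filter strings, which is where stray quotes tend to cause errors.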

The next example combines these filters with a hybrid query and a reranker. For complete documentation, refer here. It uses the ColBERT reranker; make sure to install the dependencies required by the reranker you choose.

python
from lancedb.rerankers import ColbertReranker

reranker = ColbertReranker()
vector_store._add_reranker(reranker)

query_engine = index.as_query_engine(
    filters=query_filters,
    vector_store_kwargs={
        "query_type": "hybrid",
    }
)

response = query_engine.query("How much did Viaweb charge per month?")

As the snippet above shows, you can set or override query_type when you create the query engine or retriever, regardless of the value the vector store was created with.
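A minimal sketch of the same idea for a retriever is shown below. make_retriever is a hypothetical wrapper. It relies only on index.as_retriever accepting vector_store_kwargs with the query_type and where keys, both of which appear in the examples on this page.

```python
def make_retriever(index, query_type=None, where=None):
    """Create a retriever with an optional per-retriever query_type and filter.

    Hypothetical convenience wrapper: it only forwards the keys that were
    provided, so the vector store's own settings apply otherwise.
    """
    kwargs = {}
    if query_type is not None:
        kwargs["query_type"] = query_type
    if where is not None:
        kwargs["where"] = where
    return index.as_retriever(vector_store_kwargs=kwargs)


# Usage with the index built in the quick start:
# retriever = make_retriever(index, query_type="hybrid")
# nodes = retriever.retrieve("How much did Viaweb charge per month?")
```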

API reference

The full list of parameters for LanceDBVectorStore is:

Methods