Quantization compresses high-dimensional float vectors into a smaller, approximate representation. Instead of being stored as float32 or float64 values, each vector is stored in compressed form, without too much of a compromise in search quality. Quantization is helpful when:
LanceDB currently exposes multiple quantized vector index types, including:
IVF_PQ – Inverted File index with Product Quantization (default). See the vector indexing page for examples on IVF_PQ
IVF_RQ – Inverted File index with RaBitQ quantization (binary, 1 bit per dimension). See below for examples
IVF_PQ is the default indexing option in LanceDB. It comes with quantization by default and works well in many cases. However, in cases where more drastic compression is needed, RaBitQ is also a reasonable option.
RaBitQ is a binary quantization method that represents each normalized embedding using 1 bit per dimension, plus a couple of small corrective scalars. In practice, a 1024‑dimensional float32 vector that would normally take 4 KB can be compressed to roughly a few hundred bytes with RaBitQ, while still maintaining reasonable recall.
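The storage figures above can be checked with some quick arithmetic. The sketch below is only a back-of-the-envelope estimate: the function names are made up for illustration, and the assumption of two float32 corrective scalars per vector is a guess at "a couple of small corrective scalars", not LanceDB's actual on-disk layout.

```python
import math


def float_vector_bytes(dim: int, bytes_per_value: int = 4) -> int:
    """Size of an uncompressed vector (float32 by default)."""
    return dim * bytes_per_value


def quantized_code_bytes(
    dim: int,
    num_bits: int = 1,
    num_scalars: int = 2,   # assumption: two corrective scalars per vector
    scalar_bytes: int = 4,  # assumption: each scalar is a float32
) -> int:
    """Size of the packed bit code plus the corrective scalars."""
    code_bytes = math.ceil(dim * num_bits / 8)
    return code_bytes + num_scalars * scalar_bytes


raw = float_vector_bytes(1024)
compressed = quantized_code_bytes(1024, num_bits=1)
print(raw, compressed, round(raw / compressed, 1))
```

Under these assumptions the 1-bit code alone is 128 bytes. Any further per-vector or per-partition metadata the index keeps would add to that, so treat the result as a lower bound rather than a measured size.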
When you build an IVF_RQ index:
At query time, the incoming embedding goes through the same transformation. Similarity is estimated using fast binary dot products plus the corrective factors, which restores much of the original metric quality, especially in higher dimensions.
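To make the idea of "binary codes plus a corrective factor" concrete, here is a deliberately simplified sketch in plain Python. It is not LanceDB's implementation and not the full RaBitQ algorithm: it skips the random rotation that RaBitQ applies before taking signs (the test data here is already isotropic Gaussian, which stands in for that step), it keeps the query in floating point, and it stores a single corrective scalar. The helper names `normalize`, `encode`, and `estimate_dot` are made up for this example.

```python
import math
import random


def normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def encode(unit_vec):
    """Keep one sign bit per dimension plus one corrective scalar."""
    dim = len(unit_vec)
    bits = [x >= 0 for x in unit_vec]
    # Corrective scalar: dot product between the vector and its own
    # quantized form, sign(x) / sqrt(dim).
    alignment = sum(abs(x) for x in unit_vec) / math.sqrt(dim)
    return bits, alignment


def estimate_dot(bits, alignment, query):
    """Estimate <stored, query> from the sign bits and the scalar."""
    dim = len(query)
    signed_sum = sum(q if bit else -q for bit, q in zip(bits, query))
    return signed_sum / (math.sqrt(dim) * alignment)


rng = random.Random(0)
dim = 1024
stored = normalize([rng.gauss(0, 1) for _ in range(dim)])
# A query correlated with the stored vector (true dot product near 0.7).
query = normalize([s + rng.gauss(0, 1) / math.sqrt(dim) for s in stored])

bits, alignment = encode(stored)
exact = sum(s * q for s, q in zip(stored, query))
approx = estimate_dot(bits, alignment, query)
print(f"exact={exact:.3f} approx={approx:.3f}")
```

Dividing by `alignment` is what corrects the systematic shrinkage introduced by replacing each coordinate with its sign. The remaining error shrinks roughly with the square root of the dimension, which is consistent with the note above that quality holds up better in higher dimensions.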
Compared to IVF_PQ, RaBitQ:
For a deeper dive into the theory and some benchmark results, see the blog post: LanceDB’s RaBitQ Quantization for Blazing Fast Vector Search.
You can create a RaBitQ-backed vector index by setting index_type="IVF_RQ" when calling create_index.
num_bits controls how many bits per dimension are used:
Python:

import lancedb

# Connect to LanceDB
db = lancedb.connect("/path/to/db")
table = db.open_table("my_table")

table.create_index(
    vector_column_name="vector",
    index_type="IVF_RQ",
    num_bits=1,
)

TypeScript:

import * as lancedb from "@lancedb/lancedb";

async function createIvfRqIndex() {
  // Connect to LanceDB
  const db = await lancedb.connect("/path/to/db");
  const table = await db.openTable("my_table");

  await table.createIndex("vector", {
    config: lancedb.Index.ivfRq({
      numBits: 1,
    }),
  });
}

await createIvfRqIndex();

Rust:

use lancedb::{
    connect,
    index::{Index, vector::IvfRqIndexBuilder},
};

// Connect to LanceDB
let db = connect("/path/to/db").execute().await.unwrap();
let table = db.open_table("my_table").execute().await.unwrap();

table
    .create_index(
        &["vector"],
        Index::IvfRq(IvfRqIndexBuilder::default().num_bits(1)),
    )
    .execute()
    .await
    .unwrap();

1 bit is the classic RaBitQ setting, but you could (at higher computational cost) set it to 2, 4 or 8 bits if you want to improve the fidelity for better precision or recall.
It’s also possible to tune the number of IVF partitions in IVF_RQ, similar to how you would in IVF_PQ.
The full list of parameters to the algorithm is given below.
distance_type: Literal["l2", "cosine", "dot"], defaults to "l2"
num_partitions: Optional[int], defaults to None
num_bits: int, defaults to 1
max_iterations: int, defaults to 50
sample_rate: int, defaults to 256
target_partition_size: Optional[int], defaults to None
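For reference, the defaults listed above can be written down as a small typed container. `IvfRqParams` below is a hypothetical helper for illustration only, not a LanceDB class. It mirrors the documented names and defaults and adds two basic sanity checks, which are assumptions drawn from the list above rather than the library's own validation.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class IvfRqParams:
    """Hypothetical holder for the documented IVF_RQ parameters."""

    distance_type: str = "l2"
    num_partitions: Optional[int] = None
    num_bits: int = 1
    max_iterations: int = 50
    sample_rate: int = 256
    target_partition_size: Optional[int] = None

    def __post_init__(self):
        # Assumed checks, based only on the parameter list above.
        if self.distance_type not in ("l2", "cosine", "dot"):
            raise ValueError(f"unsupported distance_type: {self.distance_type}")
        if self.num_bits < 1:
            raise ValueError("num_bits must be a positive integer")


defaults = IvfRqParams()
tuned = IvfRqParams(distance_type="cosine", num_bits=2, num_partitions=256)
print(asdict(defaults))
print(asdict(tuned))
```

Writing the parameters down this way makes it easy to log or compare index configurations. Whether every one of these names is accepted by a given create_index call depends on the SDK and version, so check the API reference before passing them through.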