This section provides answers to the most common questions asked about LanceDB Cloud. By following these guidelines, you can ensure a smooth, performant experience with LanceDB Cloud.
Yes! It is recommended to establish a single database connection and maintain it throughout your interaction with the tables within.
LanceDB uses HTTP connections to communicate with the servers. By reusing the Connection object, you avoid the overhead of repeatedly establishing HTTP connections, significantly improving efficiency.
Table object?
For optimal performance, table = db.open_table() should be called once and used for all subsequent table operations.
If there are changes to the opened table, the table will always reflect the latest version of the data.
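The reuse pattern above can be sketched as follows. This is a minimal illustration, assuming `db` is the single connection object you created once at startup (for example with `lancedb.connect(...)` and your own URI and credentials). `TableCache` is a hypothetical helper, not part of LanceDB. It only guarantees that `open_table` is called once per table name.

```python
from typing import Any, Dict


class TableCache:
    """Hypothetical helper: open each table once and reuse the handle."""

    def __init__(self, db: Any) -> None:
        # `db` is the single, long-lived connection object.
        self._db = db
        self._tables: Dict[str, Any] = {}

    def get(self, name: str) -> Any:
        # Call db.open_table() only the first time a table name is requested.
        if name not in self._tables:
            self._tables[name] = self._db.open_table(name)
        return self._tables[name]
```

In application code you would create one `TableCache(db)` next to the connection and call `cache.get("my_table")` wherever a table handle is needed.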
We support IVF_PQ and IVF_HNSW_SQ as values for the index_type argument passed to create_index.
LanceDB Cloud tunes the indexing parameters automatically to achieve the best tradeoff
between query latency and query quality.
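As a rough sketch of requesting one of the two index types named above: the wrapper `request_vector_index` is hypothetical, and the column name `"vector"` and the `"cosine"` metric are assumptions you should replace with your own. It passes only `metric`, `vector_column_name`, and `index_type` to `create_index` and leaves the remaining parameters to LanceDB Cloud's automatic tuning.

```python
from typing import Any

# Index types named in this FAQ.
SUPPORTED_VECTOR_INDEX_TYPES = ("IVF_PQ", "IVF_HNSW_SQ")


def request_vector_index(
    table: Any,
    column: str = "vector",      # assumed column name
    index_type: str = "IVF_PQ",
    metric: str = "cosine",      # assumed distance metric
) -> None:
    """Hypothetical wrapper that validates index_type before calling create_index."""
    if index_type not in SUPPORTED_VECTOR_INDEX_TYPES:
        raise ValueError(
            f"index_type must be one of {SUPPORTED_VECTOR_INDEX_TYPES}, got {index_type!r}"
        )
    # No partition or sub-vector counts are passed; tuning is left to the service.
    table.create_index(
        metric=metric,
        vector_column_name=column,
        index_type=index_type,
    )
```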
create_index()? Does creating an index too early cause unbalanced indices?
create_index is asynchronous. LanceDB decides in the background when to
trigger the index build job. When the table data is updated, we optimize
the existing indices accordingly so that query performance is not impacted.
No! LanceDB Cloud triggers an asynchronous background job to index the new vectors.
Even though indexing is asynchronous, your vectors will still be immediately searchable.
LanceDB uses brute-force search to search over unindexed rows. This makes your new data
immediately available but may increase latency temporarily.
To disable the brute-force part of search, set the fast_search flag in your query to true.
No! Similar to adding data to the table, LanceDB Cloud triggers an asynchronous background job to update the existing indices. Therefore, no action is needed from users and newly updated data will be available for search immediately. There is absolutely no downtime expected.
No! LanceDB will automatically optimize the FTS index for you. Meanwhile, newly updated data will be available for search immediately.
This applies to scalar indices as well.
While LanceDB Cloud indexes are typically created quickly, best practices differ between index types:
Full-Text Search (FTS) and Scalar Indexes
Queries executed immediately after create_fts_index or create_scalar_index calls
may fail if the background indexing process hasn’t completed.
Wait for index confirmation before querying.
Vector Indexes
Queries after create_index will not generate errors,
but may experience degraded performance during ongoing index optimization.
For consistent performance, wait until indexing finishes.
It’s recommended to use list_indices to verify index creation before querying. As an alternative, you can check the table details
in the UI, where the existing indices will be displayed.
You can call index_stats with the index name to check the number of
indexed and unindexed rows.
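The checks described above can be combined into a small polling helper. This is a sketch under assumptions: that each entry returned by `list_indices` exposes a `name` attribute, and that `index_stats` returns an object with a `num_unindexed_rows` field (or `None` if the index does not exist yet). The helper names `index_is_listed` and `wait_for_index` are hypothetical. Verify the field names against the client version you use.

```python
import time
from typing import Any, Callable


def index_is_listed(table: Any, index_name: str) -> bool:
    # Assumes each entry from list_indices() has a `name` attribute.
    return any(getattr(ix, "name", None) == index_name for ix in table.list_indices())


def wait_for_index(
    table: Any,
    index_name: str,
    timeout_s: float = 300.0,
    poll_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Poll until the index is listed and reports no unindexed rows."""
    deadline = time.monotonic() + timeout_s
    while True:
        if index_is_listed(table, index_name):
            stats = table.index_stats(index_name)
            # Assumes the stats object has a `num_unindexed_rows` field.
            if stats is not None and stats.num_unindexed_rows == 0:
                return stats
        if time.monotonic() >= deadline:
            raise TimeoutError(f"index {index_name!r} not ready after {timeout_s}s")
        sleep(poll_s)
```

For FTS and scalar indices, calling a helper like this before the first query avoids the failure case described above. For vector indices it is optional and only affects performance consistency.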
It is strongly recommended to create scalar indices on the filter columns. Scalar indices
will reduce the amount of data that needs to be scanned and thus speed up the filter.
LanceDB supports BITMAP, BTREE, and LABEL_LIST as our scalar index types. You
can see more details here.
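A minimal sketch of indexing filter columns, assuming the Python client's `create_scalar_index(column, index_type=...)` call. The helper `index_filter_columns` and the column names in the usage note are hypothetical. Which index type suits which column depends on your data.

```python
from typing import Any, Dict

# Scalar index types named in this FAQ.
SCALAR_INDEX_TYPES = ("BTREE", "BITMAP", "LABEL_LIST")


def index_filter_columns(table: Any, columns: Dict[str, str]) -> None:
    """Hypothetical helper: create one scalar index per filter column."""
    # Validate every requested type first so no index is created on bad input.
    for column, index_type in columns.items():
        if index_type not in SCALAR_INDEX_TYPES:
            raise ValueError(f"unsupported scalar index type {index_type!r} for {column!r}")
    for column, index_type in columns.items():
        table.create_scalar_index(column, index_type=index_type)
```

For example, `index_filter_columns(table, {"id": "BTREE", "category": "BITMAP"})` would request one index on each of two assumed columns.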
LanceDB implements an optimization algorithm that decides whether to append a delta index or to fully retrain the existing index.
Yes! LanceDB supports blazing-fast vector search with metadata filtering. Both prefiltering (default) and postfiltering are supported. We have seen 30ms as the p50 latency for a dataset size of 15 million. You can see here for more details.
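A filtered vector search might look like the sketch below. It assumes the Python query builder's `search(...).where(..., prefilter=...).limit(...).to_list()` chain. The wrapper `filtered_search` and the filter expression in the usage note are illustrative only.

```python
from typing import Any, List, Sequence


def filtered_search(
    table: Any,
    query_vector: Sequence[float],
    filter_sql: str,
    k: int = 10,
    prefilter: bool = True,
) -> List[Any]:
    """Hypothetical wrapper: vector search restricted by a SQL filter."""
    return (
        table.search(query_vector)
        # prefilter=True applies the filter before the vector search,
        # prefilter=False applies it to the vector search results.
        .where(filter_sql, prefilter=prefilter)
        .limit(k)
        .to_list()
    )
```

A call such as `filtered_search(table, embedding, "category = 'news'", k=5)` would return up to five matching rows, assuming a `category` column exists in your table.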
id?
LanceDB Cloud currently does not support an ID or primary key column. We recommend adding a user-defined ID column. To significantly improve the performance of queries with SQL clauses, create a scalar BITMAP or BTREE index on this column.
Multiple factors can impact query latency. To reduce query latency, consider the following:
weak_read_consistency_interval_seconds parameter on the query node to trade off
between read consistency and query performance.
fast_search work?
If you do not need to query from the unindexed data, you can call fast_search to
make queries faster, with the unindexed data excluded.
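As a sketch, and assuming your client version exposes `fast_search()` as a method on the query builder, a search that skips unindexed rows could be written as below. The wrapper name `search_indexed_only` is hypothetical.

```python
from typing import Any, List, Sequence


def search_indexed_only(table: Any, query_vector: Sequence[float], k: int = 10) -> List[Any]:
    """Hypothetical wrapper: search only the indexed rows."""
    return (
        table.search(query_vector)
        # Skip the brute-force pass over rows that are not indexed yet.
        .fast_search()
        .limit(k)
        .to_list()
    )
```

Rows added since the last index update will not appear in these results until the background indexing job has covered them.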