RabitQ quantization is now supported for vector indices. Full-text search latencies are now reduced significantly. Scalar indices are now supported for JSON columns with type-aware indexing. We improved the implementation of our KMeans algorithm to run ~30x faster than before, with even more gains at large k. This results in IVF-based vector indices being built more quickly.
Vector Index Enhancements
target_partition_size parameter for vector indices, making num_partitions optional with sensible defaults per index type (lance#4616)
parent.child) for both scalar and vector indices (lance#4682)
Full-Text Search Improvements
JSON Support
Storage and Performance
use_opendal=true storage option (lance#4597)
Index Management
Metadata and Configuration
Version (lance#4754)
LANCE_LOG_FILE environment variable for file logging (lance#4721)
Java Bindings
merge_insert API (lance#4685)
compact functionality (lance#4703)
enable_stable_row_ids option to WriteParams (lance#4674)
Operation::Update fields including fields_modified and new fields from lance#4589 (lance#4788)
Python Bindings
Session when opening datasets for cache reuse (lance#3927)
open_session in Python (lance#4581)
use_index option for merge insert operations (lance#4688)
Format Changes – Version 2.1 Encoding
Bloom Filter Index
Operation::UpdateConfig and its Python/Java bindings now use different fields (backwards compatible when serialized) (lance#4350)
MERGED state to MemWAL index; mark_mem_wal_as_flushed renamed to mark_mem_wal_as_merged (lance#4673)
target_partition_size parameter added for vector indices, changing recommended configuration (lance#4616)
"blfoat16" → "bfloat16" in datatypes.rs (lance#4852)
bytes_read for zonemap scans (lance#4830)
LanceFileWriter (lance#4600)
FlatMatchQuery to support List of Utf8 (lance#4742)
shallow_clone referring to wrong base path (lance#4617)
rows_per_zone in zone map index (lance#4692)
num_cpus::get() calls (lance#4768)
ProjectionPlan generation for full schema (lance#4743)
FilteredRead for better limit query performance (lance#4798)
lance_table::format::Index to IndexMetadata to avoid confusion (lance#4760)
performance.md for index cache section (lance#4738)
Full-Text Search runs dramatically faster, and index creation is now easier to manage.
)Full-Text Search runs dramatically faster, and index creation is now easier to manage.
LanceDB’s Full-Text Search now delivers dramatically faster performance and more relevant results. Complex queries with 50-100 terms run 3-8x faster with optimized algorithms, while improved caching and scoring ensure higher quality results at interactive speeds.
Long query optimization: Complex queries with 50-100 terms now execute 3-8x faster through WAND algorithm improvements. lance#4576
Smart query execution: Automatic fallback between WAND and flat search based on selectivity, with configurable thresholds and block-level pruning to skip unpromising document blocks. lance#4551, lance#4570
Precision-Ranked Results: Benefit from more relevant search rankings thanks to fixes in our BM25 scoring algorithm, ensuring the best answers surface first. lance#4525
Performance telemetry removal: Eliminated 80% overhead from hot paths, achieving 4-5x speedup. lance#4536
Paginated search results: Support for limit and offset in both vector and full-text search queries.
Custom index names: Users can now set custom names for indices via API and SQL. Setting train=false disables training for certain index types.
Empty index creation: Introduced the ability to create empty scalar indices, enabling users to define index structures upfront without initial data. lance#4033
More storage control: Added new configuration options for object store caching and behavior. lance#4509 #enterprise
Configurable timeouts: Set an overall request timeout for remote clients in Python and Node.js SDKs. This enhancement gives users more robust control over request reliability and latency. lancedb#2550
Streaming for large operations: Process massive insert, merge, and create_table requests without loading the entire dataset into memory.
Empty projection support: Introduced support for empty projections in queries, allowing users to execute queries that return no columns. This enhancement is particularly useful for operations like insert or delete, where the focus is on modifying data without retrieving any results.
Optimized remote queries: Improved performance for indexed queries by optimizing cache use and enabling remote filtered reads.
Scalar index prewarming: Preload frequently accessed scalar indices into memory on server startup for faster cold starts. #enterprise
Enhanced Merge Insert Observability: Introduced explain_plan and analyze_plan functions for merge_insert operations, enabling users to visualize and assess the execution plan and performance metrics of merge operations. lance#4295
Detailed CPU metrics: The analyze_plan() output now includes cumulative CPU time for each operator. lance#4519
Granular FTS controls: New tuning knobs and metrics for full-text search provide deeper insight and predictable performance. lance#4555, lance#4560
Automatic conflict resolution for delete: Delete operations now handle conflicts gracefully to ensure data integrity. lance#4407
Expression depth limit: Implemented a safeguard to prevent panics from excessively deep filter expressions. lance#4403
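A depth safeguard of this kind can be sketched in a few lines. This is only an illustration, not Lance's implementation: the nested-tuple expression format, the MAX_DEPTH value, and the function names are all hypothetical. The idea is to measure nesting depth iteratively and reject an over-deep expression with an ordinary error instead of recursing until the stack overflows.

```python
from typing import Any

MAX_DEPTH = 500  # hypothetical limit, not Lance's actual value


class ExpressionTooDeepError(ValueError):
    """Raised when a filter expression nests deeper than the limit."""


def expression_depth(expr: Any) -> int:
    """Return the nesting depth of an expression without recursing.

    An expression is either a leaf (any non-tuple value) or a tuple of
    the form (operator, child, child, ...). This format is hypothetical.
    """
    deepest = 0
    stack = [(expr, 1)]
    while stack:
        node, depth = stack.pop()
        deepest = max(deepest, depth)
        if isinstance(node, tuple):
            # node[0] is the operator name; the rest are children.
            for child in node[1:]:
                stack.append((child, depth + 1))
    return deepest


def check_depth(expr: Any, limit: int = MAX_DEPTH) -> None:
    """Raise ExpressionTooDeepError if expr nests deeper than limit."""
    depth = expression_depth(expr)
    if depth > limit:
        raise ExpressionTooDeepError(
            f"filter expression depth {depth} exceeds limit {limit}"
        )
```

Because the walk uses an explicit stack, even a pathological filter with thousands of nested NOTs is rejected cleanly rather than crashing the process.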
FTS Cache Performance: Fixed a critical memory sizing issue that was causing premature cache evictions, increasing cache hit rates from ~5% to over 80% for dramatically faster repeated queries. lance#4513
File Reading Stability: Resolved a panic that could occur when attempting to read from a file after its last row had been deleted. lance#4452
Search Cache Integrity: Fixed cache conflicts between different data partitions that could lead to inconsistent search results and degraded performance. lance#4490
Accurate Performance Metrics: Corrected inaccurate elapsed time reporting for IVF index nodes, ensuring reliable performance monitoring. lance#4491
Avoided column name collision in merge_insert: Prevented potential column name conflicts in merge_insert operations by renaming an internal column, ensuring smooth data ingestion. lance#4499
Fixed index out-of-bounds error in posting iterator: Fixed an index out-of-bounds error in the posting list iterator that could cause crashes during vector search queries. lance#4587
Fixed BTree Prewarm Offset Overflow: Resolved an offset overflow issue when prewarming BTree indices, preventing crashes during startup.
Azure Cache Isolation: Fixed a critical bug where databases with identical names in different Azure storage accounts were sharing a cache, preventing potential data corruption. #enterprise
Fixed incorrect boolean filter results: Fixed bugs in negative filters (NOT EQUAL) that could cause missing rows or duplicates, ensuring accurate query results.
Fixed performance regression in indexed point lookups: Resolved a regression that caused efficient indexed point lookups to incorrectly fall back to slow full table scans.
Performance improvements across vector search and indexing and enhanced Cloud UI.
HNSW-Accelerated Partition Computation: Partition computation is now accelerated with HNSW (Hierarchical Navigable Small World), cutting end-to-end indexing time by up to 50%. The optimization maintains high recall while significantly reducing CPU and memory usage during index creation. lance#4089
Up to 500× Faster Range Queries: Range queries like “value >= 1000 and value < 2000” on 1M int32 values now execute in 100µs instead of 50ms, dramatically boosting hybrid search performance. lance#4248
Faster L2 Distance Computation: >10% speedup in vector search by optimizing common-dimension batch L2 operations. lance#4321
B-tree Index Prewarm: Frequently accessed index pages are now proactively cached in memory, improving query latency. lance#4235
Faster Merge Insert Updates: Improved update-only operations with optimized join strategy—speeding up data merges with conditional logic. lance#4253
Improved Cloud Load Balancing: Better tenant isolation and fault tolerance across query nodes.
Streaming Ingestion with Automatic Index Optimization: Indices are now updated automatically during streaming ingestion for consistent performance. These updates do not block other operations on the table, such as compaction.
Storage Handle Reuse: Reduced overhead for bulk table creation by fixing excessive object store handle creation. lancedb#2505
GCP Autoscaling Support: Enabled autoscaling in GCP deployments to automatically adjust resources based on demand, ensuring optimal performance and cost efficiency for customer workloads. #enterprise
Session-Based Cache Control: Python and TypeScript users can now customize caching behavior per session, which is ideal for large datasets and enterprise deployments. lancedb#2530
Automatic Conflict Resolution for Updates: Update operations now support retries with exponential backoff to handle concurrent writes. lance#4167
Multi-Vector Support (JavaScript): Added multivector support to the JavaScript/TypeScript SDK. lancedb#2527
Ngram Tokenizer for FTS: Flexible tokenization for full-text search, supporting languages and use cases with partial or fuzzy matches. lancedb#2507
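To illustrate why n-gram tokenization enables partial matches, here is a minimal character n-gram sketch. The function names, the lowercase normalization, and the default n-gram length of 3 are assumptions made for the example; they are not taken from LanceDB's tokenizer configuration.

```python
def ngram_tokens(text: str, min_n: int = 3, max_n: int = 3) -> list[str]:
    """Split text into lowercase character n-grams, word by word."""
    tokens = []
    for word in text.lower().split():
        for n in range(min_n, max_n + 1):
            # Slide a window of width n across the word.
            for start in range(len(word) - n + 1):
                tokens.append(word[start:start + n])
    return tokens


def shares_ngram(query: str, document: str) -> bool:
    """True if the query and document have at least one n-gram in common."""
    return bool(set(ngram_tokens(query)) & set(ngram_tokens(document)))
```

With this scheme a partial query such as "lanc" still overlaps with "LanceDB" on the trigrams "lan" and "anc", which a whole-word tokenizer would miss.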
Index Creation Stability: Fixed errors when entire FTS posting lists were deleted. lance#4156
Token Set Remapping: Ensures proper index consistency when updating FTS data. lance#4180
Phrase Query Precision Fix: Addressed floating point precision issues to avoid missed results; also fixed decompression edge cases. lance#4223
Phrase Query Error Message Fix: Returns more informative error when phrase queries lack position support. lance#4342
B-tree Redundant Page Loads: Eliminated duplicate page loads for better scalar index performance. lance#4246
Filtered Read Pagination Fix: Respects offset/limit for pagination even when rows are deleted. lance#4351
Schema Alignment with Missing Columns: Fixed a Node.js bug where schema alignment would fail when using embedding functions with Arrow table inputs that had missing columns. lancedb#2516
Python nprobes Fix: Resolves validation errors when setting both min and max nprobes. lancedb#2556
Empty List Table Creation Fix: Fixed crashes when creating tables from empty lists with predefined schemas.
Dataset Version Race Condition: Prevents version rollbacks during concurrent queries. lancedb#2479
Case-Insensitive Filter Comparison Fix: Ensures accurate matching for string filters regardless of text case. lance#4278
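The retry behavior behind the automatic conflict resolution for updates in this release can be pictured with a generic retry loop. This is a sketch under stated assumptions, not Lance's code: CommitConflictError, retry_with_backoff, the retry count, the base delay, and the added jitter are all hypothetical choices for illustration.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class CommitConflictError(Exception):
    """Hypothetical error signalling a concurrent-write conflict."""


def retry_with_backoff(
    operation: Callable[[], T],
    max_retries: int = 5,
    base_delay: float = 0.05,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run operation, retrying on conflict with exponentially growing delays."""
    for attempt in range(max_retries + 1):
        try:
            return operation()
        except CommitConflictError:
            if attempt == max_retries:
                raise  # out of retries: surface the conflict to the caller
            # The delay doubles each attempt; jitter spreads out competing writers.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay))
    raise AssertionError("unreachable")
```

Passing the sleep function as a parameter keeps the loop easy to test without real waiting.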
More advanced features added to Full-text Search and optimized BYOC deployment.
Full-Text Search (FTS) Enhancements:
Expanded FTS capabilities with:
SHOULD, MUST, and MUST_NOT for expressive, intuitive search. (Python users can also use AND/OR or &/|.)
slop parameter, allowing matches where terms are close together but not necessarily adjacent or in exact order, enabling typo-tolerant and flexible phrase search.
Native Helm Chart Support:
Added native Helm chart deployment for Kubernetes, streamlining BYOC (Bring Your Own Cloud) deployments and improving infrastructure management. #enterprise
KNN Scan Pushdown Optimization:
Improved vector search performance and reduced memory usage by supporting KNN scan pushdown. #enterprise
Query Resource Limits:
Introduced concurrent request limits and scan row constraints to prevent resource exhaustion and maintain system stability under high load. #enterprise
Improved Vector Search with Selective Filters:
Split nprobes into minimum_nprobes and maximum_nprobes for more efficient vector search. The system starts with minimum_nprobes and increases up to maximum_nprobes if not enough results are found. lancedb#2430
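The probing behavior described above can be sketched as follows. This is a simplified model, not the actual implementation: the adaptive_probe function, the (row_id, distance) partition layout, and the assumption that partitions arrive already ordered nearest-centroid first are all hypothetical.

```python
from typing import Callable, Sequence


def adaptive_probe(
    partitions: Sequence[Sequence[tuple[int, float]]],
    passes_filter: Callable[[int], bool],
    k: int,
    minimum_nprobes: int,
    maximum_nprobes: int,
) -> tuple[list[tuple[int, float]], int]:
    """Search partitions (assumed ordered nearest-centroid first).

    Each partition is a sequence of (row_id, distance) pairs. Returns the
    top-k filtered matches and the number of partitions actually probed.
    """
    matches: list[tuple[int, float]] = []
    probed = 0
    limit = min(maximum_nprobes, len(partitions))
    for partition in partitions[:limit]:
        # Stop widening once the minimum is covered and k results exist.
        if probed >= minimum_nprobes and len(matches) >= k:
            break
        matches.extend(
            (row_id, dist) for row_id, dist in partition if passes_filter(row_id)
        )
        probed += 1
    matches.sort(key=lambda pair: pair[1])
    return matches[:k], probed
```

With an unselective filter the search stops at minimum_nprobes; with a selective filter it keeps probing until k rows pass or maximum_nprobes is reached, which is where the efficiency gain comes from.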
Cloud Guardrails:
Enforced API payload limits (100MB) to prevent heavy workloads from degrading cloud service quality, with extra checks on merge_insert to avoid introducing large workloads.
Embedding Function Error with Existing Vector Column:
Fixed a TypeScript SDK error when adding data that already includes the vector column and a registered embedding function is present. lancedb#2433
create_table Errors with Existing Tables:
Fixed errors when using create_table with mode=overwrite or exists_ok=true on an existing table.
Indexing Skipped with Certain Compaction Configurations:
Fixed an issue where indexing criteria were not included in lance_agent, causing the index to not be created as expected under certain compaction settings.
Failed Login After Changing Account:
Fixed a login failure that occurred when a user signed up for LanceDB Cloud, dropped out, and then rejoined an organization with the same email via an invite.
Column Disordering in KNN Scanning:
Fixed an error in plans that union indexed and unindexed data, where the KNNScan node returned data in a different order than its output schema. #enterprise
Divide-by-Zero on Empty Table:
Fixed an issue where creating an index failed on an empty table, either after deleting the last row or when creating an index on an already empty table.
Revamped LanceDB Cloud onboarding, added UMAP visualization, and improved upsert performance
merge_insert for better control over long-running upserts [lancedb#2378]
grpc.concurrency_limit_per_connection setting in the plan executor for fine-grained control.
large_binary column: Users can now filter on large binary columns in their queries. [lance#3797]
LABEL_LIST columns, ensuring scalar indices are updated correctly on data changes.
merge_insert: Any index fragments associated with modified data are now properly removed during a horizontal merge_insert, preventing index corruption and ensuring indices always reflect the current state of the data. [lancedb#3863]
Enhanced Performance and Improved Version Control
table.tags.create/list/update/delete/checkout: Enables semantic versioning through intuitive tagging instead of numeric versioning.
wait_for_index: Ensures complete data indexing with configurable wait_timeout.
k parameters to prevent overflow. [lancedb#2354]
BETWEEN clause: Improved BETWEEN query handling to return 0 results when start > end instead of panicking. [lance#3706]
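The BETWEEN behavior can be illustrated with a small range lookup over sorted values. This is a sketch only: the between function and the use of Python's bisect module are stand-ins for illustration, not the index code that was fixed.

```python
from bisect import bisect_left, bisect_right


def between(sorted_values: list[int], start: int, end: int) -> list[int]:
    """Return values v with start <= v <= end (SQL BETWEEN is inclusive)."""
    if start > end:
        # An inverted range matches nothing; return early instead of
        # computing bounds that cross each other.
        return []
    lo = bisect_left(sorted_values, start)
    hi = bisect_right(sorted_values, end)
    return sorted_values[lo:hi]
```

Handling the inverted range explicitly makes the empty result a defined outcome rather than an accident of how the bounds are computed.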
Enhanced Full-Text Search and Advanced Query Debugging Features
explain_plan: Diagnose query performance and debug unexpected results by inspecting the execution plan.
analyze_plan: Analyze query execution metrics to optimize performance and resource usage. Such metrics include execution time, number of rows processed, I/O stats, and more.
restore: Revert to a specific prior version of your dataset and modify it from a verified, stable state.
datetime columns, which previously prevented data inspection.
Multivector Search ready and Table data preview available in Cloud UI
Drop_index added to SDK: Users can remove unused or outdated indexes from their tables.
prefilter parameter enforcement: Fixed a bug where the prefilter parameter was not honored in FTS queries.
distance_range() compatibility: Fixed errors when performing vector searches with distance_range() on unindexed rows.
Support Hamming Distance and GPU based indexing ready
hamming as a distance metric (joining l2, cosine, dot) for binary vector similarity search.
list_indices and index_stats now always fetch the latest version of the table by default unless a specific version is explicitly provided.
create_index is called to create a vector index on tables with fewer than 256 rows.
createTable() failed to correctly save embeddings and for mergeInsert not utilizing saved embeddings. lancedb#2065
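For readers unfamiliar with the hamming metric, the following sketch shows how it compares bit-packed binary vectors. The function names and the brute-force scan are hypothetical illustrations; an index would use optimized routines rather than this loop.

```python
def hamming_distance(a: bytes, b: bytes) -> int:
    """Count differing bits between two equal-length packed binary vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    # XOR leaves a 1 wherever the bits differ; count those 1 bits.
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def nearest(query: bytes, vectors: list[bytes], k: int) -> list[tuple[int, int]]:
    """Return (index, distance) for the k vectors closest to query."""
    scored = [(i, hamming_distance(query, v)) for i, v in enumerate(vectors)]
    return sorted(scored, key=lambda pair: pair[1])[:k]
```

Unlike l2, cosine, or dot, the result is an integer count of mismatched bits, which suits binary embeddings and hashes.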
Performant SQL queries at scale and more cost-effective vector search
distance_range() to return search results with a lower bound, an upper bound, or a range [lance#3326]
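The three modes (lower bound only, upper bound only, or both) amount to a filter over result distances. The sketch below is an illustration, not the SDK's implementation: filter_by_distance_range is a hypothetical helper, and the inclusive-lower, exclusive-upper convention is an assumption.

```python
from typing import Optional


def filter_by_distance_range(
    results: list[tuple[int, float]],
    lower_bound: Optional[float] = None,
    upper_bound: Optional[float] = None,
) -> list[tuple[int, float]]:
    """Keep (row_id, distance) pairs with lower_bound <= distance < upper_bound.

    Either bound may be None, meaning unbounded on that side. The
    inclusive-lower / exclusive-upper convention is an assumption.
    """
    kept = []
    for row_id, distance in results:
        if lower_bound is not None and distance < lower_bound:
            continue
        if upper_bound is not None and distance >= upper_bound:
            continue
        kept.append((row_id, distance))
    return kept
```

Supplying only an upper bound is the common case of discarding matches that are too far away to be meaningful.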
.checkout API: Resolved inconsistencies in the checkout method when specifying the version parameter. lancedb#1988