Vector search on object storage: Performance at scale without the infrastructure tax

Most vector databases require three systems: raw data in a lake, metadata in a warehouse, embeddings in a vector index. Three copies of state. Three things to keep in sync.

LanceDB stores everything in one table on object storage. Compute nodes are stateless. No RAM constraints at scale.

Tomorrow's AI is being built on LanceDB today

Why teams switch

Compute-storage separation

Data lives on object storage at $0.02/GB/month. Compute scales with query load, not data size. 10 TB of data, one small query node during off-peak. No paying for idle capacity.
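The arithmetic above is easy to sanity-check. A minimal sketch, using the $0.02/GB/month rate quoted here; the function name and the 10 TB figure are illustrative, not quoted pricing:

```python
# Back-of-envelope storage cost at object-storage rates.
# Rate taken from the text above; everything else is illustrative.

def monthly_storage_cost(total_gb: float, rate_per_gb: float = 0.02) -> float:
    """Object-storage cost scales linearly with data size, not query load."""
    return total_gb * rate_per_gb

ten_tb_gb = 10 * 1024  # 10 TB expressed in GB
print(monthly_storage_cost(ten_tb_gb))  # ~$205/month to hold 10 TB
```

Compute is billed separately and scales with queries, which is why a single small query node can front the whole 10 TB during off-peak hours.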

One table. Actual data

Embeddings, metadata, and raw blobs in the same table. Not links to S3. Blobs. Vector search, full-text search, and SQL filtering compose into a single query. No round trips.
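A pure-Python sketch of what "compose into a single query" means: metadata filter, vector ranking, and blob retrieval in one pass over one table. The table contents and the brute-force scan are illustrative only; a real engine would use indexes rather than a linear scan.

```python
import math

# One table: each row holds the embedding, metadata, and the raw blob inline.
# (Hypothetical rows; a real table would live on object storage.)
table = [
    {"vector": [0.9, 0.1], "category": "cat", "blob": b"raw image bytes A"},
    {"vector": [0.1, 0.9], "category": "dog", "blob": b"raw image bytes B"},
    {"vector": [0.8, 0.2], "category": "cat", "blob": b"raw image bytes C"},
]

def search(query, where_category, k=2):
    """SQL-style filter + vector ranking, blobs returned with the hits."""
    candidates = [r for r in table if r["category"] == where_category]
    candidates.sort(key=lambda r: math.dist(query, r["vector"]))
    return candidates[:k]  # blobs come back inline; no second fetch to S3

hits = search([1.0, 0.0], "cat")
print([h["blob"] for h in hits])
```

The point of the sketch: because the blob is a column in the same table, the answer set already contains the original bytes, so there is no second round trip to fetch them.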

Write a new column without rewriting the table

Adding a column doesn't rewrite existing data. Zero copy. New embedding model? Access control column? Test two models side by side? Column-level operations, not table-level rewrites.
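A toy model of why this is zero copy, assuming a columnar layout where each column is its own file and the table is a manifest pointing at those files. This models the idea only, not the actual Lance on-disk format:

```python
# Columnar sketch: adding a column writes one new "file" and a new manifest
# version; existing column files are never rewritten.
column_files = {
    "id": [1, 2, 3],
    "embedding_v1": [[0.1], [0.2], [0.3]],
}
manifest_v1 = {"columns": ["id", "embedding_v1"]}

def add_column(name, values):
    column_files[name] = values  # write only the new column's data
    return {"columns": manifest_v1["columns"] + [name]}  # new manifest version

# Test a second embedding model side by side with the first.
manifest_v2 = add_column("embedding_v2", [[0.5], [0.6], [0.7]])

# Old data was never touched; both model versions are queryable.
assert column_files["embedding_v1"] == [[0.1], [0.2], [0.3]]
print(manifest_v2["columns"])
```

Row-oriented stores pay a full-table rewrite for the same operation, because every row's bytes change when a field is appended.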

IVF-based indexing

Inserts go to the appropriate partition without touching others. Deletes are handled by lightweight deletion bitmaps. MVCC enables concurrent reads and writes. Index rebuilds happen asynchronously.
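The insert and delete paths can be sketched in a few lines. The centroids and data here are illustrative (real IVF centroids come from k-means training), but the routing logic is the point: an insert touches exactly one partition, and a delete flips a bit instead of rewriting data.

```python
import math

# IVF sketch: vectors are partitioned by nearest centroid.
centroids = [[0.0, 0.0], [10.0, 10.0]]   # illustrative; k-means in practice
partitions = [[] for _ in centroids]     # vectors per partition
deleted = [set() for _ in centroids]     # lightweight deletion "bitmaps"

def insert(vec):
    """Route to the nearest centroid; only that partition is touched."""
    pid = min(range(len(centroids)),
              key=lambda i: math.dist(vec, centroids[i]))
    partitions[pid].append(vec)
    return pid, len(partitions[pid]) - 1

def delete(pid, row):
    """Mask the row at read time; no data rewrite."""
    deleted[pid].add(row)

pid, row = insert([9.5, 10.2])  # lands in the [10, 10] partition
delete(pid, row)                # partition 0 was never touched by either op
```

Because writes are localized like this, readers on other partitions see no contention, which is what makes the MVCC story cheap.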

Comparison

Cost
  Traditional vector database: RAM-bound. Replication for availability means 2-3x storage. Thousands of dollars per month at 100M vectors.
  LanceDB: Object storage at $0.02/GB/month. Only the hot index in memory. Stateless compute.

Scale
  Traditional vector database: What has to stay hot? Index structures, caches, metadata, and allocator overhead, all in RAM.
  LanceDB: Large index persisted to disk. Lance format optimized for random access. Small hot set in memory.

Search
  Traditional vector database: Embeddings and metadata in the vector DB. Raw docs/images in S3. A second call for originals.
  LanceDB: Vector, full-text, and SQL in one query against one table. Raw blobs stored inline.

Data model
  Traditional vector database: Three systems: a lake for raw data, a warehouse for metadata, a vector DB for embeddings.
  LanceDB: One table. The embedding is a column. Metadata in other columns. Raw binary in another column.

Schema changes
  Traditional vector database: A new column means rewriting every row. A full rewrite tax every time your app evolves.
  LanceDB: Column-level operations. Existing columns untouched. Zero copy.

Best for
  Traditional vector database: Sub-millisecond p99 at any cost. Small static datasets. Zero infrastructure decisions.
  LanceDB: Best cost-to-performance at scale. Teams that want to understand and control their infrastructure.

The Power of the Lance Format

Vector Search

  • Fast scans and random access from the same table — no tradeoff
  • Zero-copy access for high throughput without serialization overhead

Multi-Modal

  • Raw data, embeddings, and metadata in one table — not pointers to blob storage
  • No separate metadata store to keep in sync

Enterprise-Grade Requirements

Security

Granular RBAC, SSO integration, and VPC deployment options.

Governance

Data versioning and time-travel capabilities for auditability.

Support

Dedicated technical account management and guaranteed SLAs.

Talk to Engineering

Or try LanceDB OSS: the same code scales to Cloud.