Full-Text Search Index

LanceDB Cloud and Enterprise provide performant full-text search based on BM25, allowing you to incorporate keyword-based search in your retrieval solutions.
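For readers unfamiliar with BM25, the following is a minimal sketch of the textbook scoring formula in plain Python. It is not LanceDB's implementation: the function name bm25_scores, the whitespace tokenization, and the k1 and b values (commonly used defaults) are assumptions made for illustration only.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document against the query with a textbook BM25 formula."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for toks in tokenized for term in set(toks))

    scores = []
    for toks in tokenized:
        term_freq = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            tf = term_freq[term]
            if tf == 0:
                continue
            df = doc_freq[term]
            # Smoothed inverse document frequency (always positive).
            idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
            # Term-frequency saturation with document-length normalization.
            norm = tf + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "the quick brown fox",
    "lazy dogs sleep all day",
    "a quick quick fox jumps over the lazy dog",
]
scores = bm25_scores("quick fox", docs)
```

Documents that contain none of the query terms score zero, rare terms weigh more than common ones, and repeated terms give diminishing returns, which is why a short document with one exact match can outrank a longer one that repeats the term.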

Note

The create_fts_index API returns immediately, but the FTS index itself is built asynchronously in the background, so it may not be ready to serve queries right away.

Python

import lancedb

# Connect to LanceDB
db = lancedb.connect(
  uri="db://your-project-slug",
  api_key="your-api-key",
  region="us-east-1"
)

table_name = "lancedb-cloud-quickstart"
table = db.open_table(table_name)
table.create_fts_index("text")

TypeScript

import * as lancedb from "@lancedb/lancedb"

const db = await lancedb.connect({
  uri: "db://your-project-slug",
  apiKey: "your-api-key",
  region: "us-east-1"
});

const tableName = "lancedb-cloud-quickstart"
const table = await db.openTable(tableName);
await table.createIndex("text", {
  config: lancedb.Index.fts()
});

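Because the index is built asynchronously, wait for it to finish before running full-text queries. The index name used here, text_idx, is the indexed column name followed by _idx.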

Python

index_name = "text_idx"
table.wait_for_index([index_name])

TypeScript

const indexName = "text_idx"
// Block until the index is ready, giving up after 60 seconds
await table.waitForIndex([indexName], 60)
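If an application prefers to poll on its own schedule instead of blocking in a single call, the same idea can be sketched with a generic helper. Everything here is hypothetical: wait_until and fake_index_ready are illustrative names, and the readiness check is a stand-in that you would replace with a real status check against your table.

```python
import time


def wait_until(is_ready, timeout_s=60.0, poll_interval_s=1.0):
    """Poll is_ready() until it returns True or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        if is_ready():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval_s)


# Stand-in for a real status check: reports ready on the third call.
calls = {"count": 0}


def fake_index_ready():
    calls["count"] += 1
    return calls["count"] >= 3


ready = wait_until(fake_index_ready, timeout_s=5.0, poll_interval_s=0.01)
```

Returning False on timeout, rather than raising, leaves it to the caller to decide whether to retry, fall back to a non-indexed query path, or fail.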

FTS Configuration Parameters

LanceDB supports the following configurable parameters for full-text search:

Parameter	Type	Default	Description
with_position	bool	True	Store token positions (required for phrase queries)
base_tokenizer	str	"simple"	Text splitting method: "simple" (split on whitespace and punctuation), "whitespace" (split on whitespace only), or "raw" (treat the entire text as a single token)
language	str	"English"	Language for tokenization (stemming/stop words)
max_token_length	int	40	Maximum token size in bytes; tokens exceeding this length are omitted from the index
lower_case	bool	True	Convert tokens to lowercase
stem	bool	False	Apply stemming (e.g., "running" → "run")
remove_stop_words	bool	False	Remove common stop words
ascii_folding	bool	False	Normalize accented characters
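As a sketch of how these parameters might be supplied from Python, the snippet below collects them in a dictionary and forwards them as keyword arguments. It assumes create_fts_index accepts keyword arguments with the same names as the table rows; FTS_OPTIONS and build_fts_index are hypothetical names introduced only for this example, and the non-default values are arbitrary choices.

```python
# Options mirroring the parameter table above. Values match the listed
# defaults unless a comment says otherwise.
FTS_OPTIONS = {
    "with_position": True,      # keep positions so phrase queries work
    "base_tokenizer": "simple",
    "language": "English",
    "max_token_length": 40,
    "lower_case": True,
    "stem": True,               # non-default: match "running" with "run"
    "remove_stop_words": True,  # non-default: drop common stop words
    "ascii_folding": True,      # non-default: match "café" with "cafe"
}


def build_fts_index(table, column="text", options=FTS_OPTIONS):
    """Start building an FTS index on `column` with the given options.

    `table` is expected to be an opened LanceDB table; the keyword names
    are assumed to match the parameter table in this document.
    """
    table.create_fts_index(column, **options)
```

Keeping the options in one place makes it easier to apply the same analysis settings to every table that should behave consistently at query time.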

Note

- The max_token_length parameter helps optimize indexing performance by filtering out non-linguistic content such as base64 data and long URLs.
- When with_position is disabled, phrase queries will not work, but the index is smaller and indexing is faster.
- ascii_folding is useful for handling international text (e.g., "café" → "cafe").
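To make the effect of these options concrete, here is a rough pure-Python imitation of a tokenization pipeline. It is an approximation for illustration, not LanceDB's tokenizer: the tokenize function, the tiny STOP_WORDS set, and the order of the steps are all assumptions, and stemming is left out because it needs a language-specific stemmer.

```python
import re
import unicodedata

STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in"}  # tiny illustrative list


def tokenize(text, base_tokenizer="simple", max_token_length=40,
             lower_case=True, remove_stop_words=False, ascii_folding=False):
    """Approximate the effect of the FTS options on a piece of text."""
    if base_tokenizer == "simple":
        tokens = re.findall(r"\w+", text)  # split on whitespace and punctuation
    elif base_tokenizer == "whitespace":
        tokens = text.split()              # split on whitespace only
    elif base_tokenizer == "raw":
        tokens = [text]                    # the whole text is one token
    else:
        raise ValueError(f"unknown base_tokenizer: {base_tokenizer!r}")

    result = []
    for token in tokens:
        # Omit tokens whose UTF-8 encoding exceeds max_token_length bytes.
        if len(token.encode("utf-8")) > max_token_length:
            continue
        if lower_case:
            token = token.lower()
        if ascii_folding:
            # Decompose accented characters, then drop the non-ASCII marks.
            decomposed = unicodedata.normalize("NFKD", token)
            token = decomposed.encode("ascii", "ignore").decode("ascii")
        if remove_stop_words and token in STOP_WORDS:
            continue
        result.append(token)
    return result


sample = "The Café, reopened!"
default_tokens = tokenize(sample)
folded_tokens = tokenize(sample, remove_stop_words=True, ascii_folding=True)
```

Comparing default_tokens with folded_tokens shows how stop-word removal and ASCII folding change what ends up in the index, and therefore which query strings can match.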