Full-Text Search Index

LanceDB Cloud and Enterprise provide performant full-text search based on BM25, allowing you to incorporate keyword-based search in your retrieval solutions.

Note: The create_fts_index API returns immediately, but the FTS index is built asynchronously.

Python

    import lancedb

    # Connect to LanceDB
    db = lancedb.connect(
        uri="db://your-project-slug",
        api_key="your-api-key",
        region="us-east-1"
    )

    table_name = "lancedb-cloud-quickstart"
    table = db.open_table(table_name)

    # Returns immediately; the index is built in the background
    table.create_fts_index("text")

TypeScript

    import * as lancedb from "@lancedb/lancedb"

    const db = await lancedb.connect({
      uri: "db://your-project-slug",
      apiKey: "your-api-key",
      region: "us-east-1"
    });

    const tableName = "lancedb-cloud-quickstart"
    const table = await db.openTable(tableName);

    await table.createIndex("text", {
      config: lancedb.Index.fts()
    });

Check FTS index status using the methods above.

Python

    index_name = "text_idx"
    table.wait_for_index([index_name])

TypeScript

    const indexName = "text_idx"
    await table.waitForIndex([indexName], 60)

FTS Configuration Parameters

LanceDB supports the following configurable parameters for full-text search:

| Parameter | Type | Default | Description |
|---|---|---|---|
| with_position | bool | True | Store token positions (required for phrase queries) |
| base_tokenizer | str | "simple" | Text splitting method: "simple" splits by whitespace and punctuation; "whitespace" splits by whitespace only; "raw" treats the text as a single token |
| language | str | "English" | Language for tokenization (stemming and stop words) |
| max_token_length | int | 40 | Maximum token size in bytes; tokens exceeding this length are omitted from the index |
| lower_case | bool | True | Convert tokens to lowercase |
| stem | bool | False | Apply stemming (e.g., "running" → "run") |
| remove_stop_words | bool | False | Remove common stop words |
| ascii_folding | bool | False | Normalize accented characters |

Note:

- The max_token_length parameter helps optimize indexing performance by filtering out non-linguistic content such as base64 data and long URLs.
- When with_position is disabled, phrase queries will not work, but the index is smaller and indexing is faster.
- ascii_folding is useful for handling international text (e.g., "café" → "cafe").
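The parameters above can be collected in one place before the index is created. The following is a minimal sketch, assuming that create_fts_index accepts each parameter as a keyword argument with the same name as in the parameter list. The `FtsConfig` class and the `build_fts_index` helper are hypothetical names introduced here for illustration and are not part of LanceDB.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class FtsConfig:
    """Hypothetical container; defaults mirror the documented parameter defaults."""

    with_position: bool = True
    base_tokenizer: str = "simple"
    language: str = "English"
    max_token_length: int = 40
    lower_case: bool = True
    stem: bool = False
    remove_stop_words: bool = False
    ascii_folding: bool = False

    def __post_init__(self):
        # Only the three documented tokenizers are accepted in this sketch.
        if self.base_tokenizer not in ("simple", "whitespace", "raw"):
            raise ValueError(f"unknown base_tokenizer: {self.base_tokenizer!r}")
        if self.max_token_length <= 0:
            raise ValueError("max_token_length must be positive")

    def supports_phrase_queries(self) -> bool:
        # Phrase queries need token positions to be stored.
        return self.with_position


def build_fts_index(table, column, config=FtsConfig()):
    """Hypothetical helper: forward the config as keyword arguments.

    Assumption: table.create_fts_index takes these keyword names.
    """
    table.create_fts_index(column, **asdict(config))
```

With the `table` opened earlier, a call such as `build_fts_index(table, "text", FtsConfig(stem=True, ascii_folding=True))` would request an index with stemming and accent normalization. As with the plain call, the index would still be built asynchronously.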