Schema evolution enables non-breaking modifications to a database table’s structure — such as adding columns, altering data types, or dropping fields — to adapt to evolving data requirements without service interruptions. LanceDB supports ACID-compliant schema evolution through granular operations (add/alter/drop columns), allowing you to:
- Iterate Safely: Modify schemas in production with versioned datasets and backward compatibility
- Scale Seamlessly: Handle ML model iterations, regulatory changes, or feature additions
- Optimize Continuously: Remove unused fields or enforce new constraints without downtime
Schema Evolution Operations
LanceDB supports three primary schema evolution operations:
- Adding new columns: Extend your table with additional attributes
- Altering existing columns: Change column names, data types, or nullability
- Dropping columns: Remove unnecessary columns from your schema
Adding New Columns
You can add new columns to a table with the
add_columns
method in Python or
addColumns
in TypeScript/JavaScript.
New columns are populated based on SQL expressions you provide.
Setting Up the Example Table
First, let’s create a sample table with product data to demonstrate schema evolution:
table_name = "schema_evolution_add_example"
data = [
{
"id": 1,
"name": "Laptop",
"price": 1200.00,
"vector": np.random.random(128).tolist(),
},
{
"id": 2,
"name": "Smartphone",
"price": 800.00,
"vector": np.random.random(128).tolist(),
},
{
"id": 3,
"name": "Headphones",
"price": 150.00,
"vector": np.random.random(128).tolist(),
},
{
"id": 4,
"name": "Monitor",
"price": 350.00,
"vector": np.random.random(128).tolist(),
},
{
"id": 5,
"name": "Keyboard",
"price": 80.00,
"vector": np.random.random(128).tolist(),
},
]
table = db.create_table(table_name, data, mode="overwrite")
const tableName = "schema_evolution_add_example";
const data = [
{
id: 1,
name: "Laptop",
price: 1200.0,
vector: Array.from({ length: 128 }, () => Math.random()),
},
{
id: 2,
name: "Smartphone",
price: 800.0,
vector: Array.from({ length: 128 }, () => Math.random()),
},
{
id: 3,
name: "Headphones",
price: 150.0,
vector: Array.from({ length: 128 }, () => Math.random()),
},
{
id: 4,
name: "Monitor",
price: 350.0,
vector: Array.from({ length: 128 }, () => Math.random()),
},
{
id: 5,
name: "Keyboard",
price: 80.0,
vector: Array.from({ length: 128 }, () => Math.random()),
},
];
const table = await db.createTable(tableName, data, { mode: "overwrite" });
Adding Calculated Columns
You can add new columns that are calculated from existing data using SQL expressions:
# Add a discounted price column (10% discount)
table.add_columns({"discounted_price": "cast((price * 0.9) as float)"})
// Add a discounted price column (10% discount)
await table.addColumns([
{ name: "discounted_price", valueSql: "cast((price * 0.9) as float)" },
]);
Adding Columns with Default Values
Add boolean columns with default values for status tracking:
# Add a stock status column with default value
table.add_columns({"in_stock": "cast(true as boolean)"})
// Add a stock status column with default value
await table.addColumns([
{ name: "in_stock", valueSql: "cast(true as boolean)" },
]);
Adding Nullable Columns
Add timestamp columns that can contain NULL values:
# Add a nullable timestamp column
table.add_columns({"last_ordered": "cast(NULL as timestamp)"})
// Add a nullable timestamp column
await table.addColumns([
{ name: "last_ordered", valueSql: "cast(NULL as timestamp)" },
]);
cast(NULL as timestamp)
.
Altering Existing Columns
You can alter columns using the
alter_columns
method in Python or
alterColumns
in TypeScript/JavaScript. This allows you to:
- Rename a column
- Change a column’s data type
- Modify nullability (whether a column can contain NULL values)
Setting Up the Example Table
Create a table with a custom schema to demonstrate column alterations:
table_name = "schema_evolution_alter_example"
data = [
{
"id": 1,
"name": "Laptop",
"price": 1200,
"discount_price": 1080.0,
"vector": np.random.random(128).tolist(),
},
{
"id": 2,
"name": "Smartphone",
"price": 800,
"discount_price": 720.0,
"vector": np.random.random(128).tolist(),
},
{
"id": 3,
"name": "Headphones",
"price": 150,
"discount_price": 135.0,
"vector": np.random.random(128).tolist(),
},
{
"id": 4,
"name": "Monitor",
"price": 350,
"discount_price": 315.0,
"vector": np.random.random(128).tolist(),
},
{
"id": 5,
"name": "Keyboard",
"price": 80,
"discount_price": 72.0,
"vector": np.random.random(128).tolist(),
},
]
schema = pa.schema(
{
"id": pa.int64(),
"name": pa.string(),
"price": pa.int32(),
"discount_price": pa.float64(),
"vector": pa.list_(pa.float32(), 128),
}
)
table = db.create_table(table_name, data, schema=schema, mode="overwrite")
const tableName = "schema_evolution_alter_example";
const data = [
{
id: 1,
name: "Laptop",
price: 1200,
discount_price: 1080.0,
vector: Array.from({ length: 128 }, () => Math.random()),
},
{
id: 2,
name: "Smartphone",
price: 800,
discount_price: 720.0,
vector: Array.from({ length: 128 }, () => Math.random()),
},
{
id: 3,
name: "Headphones",
price: 150,
discount_price: 135.0,
vector: Array.from({ length: 128 }, () => Math.random()),
},
{
id: 4,
name: "Monitor",
price: 350,
discount_price: 315.0,
vector: Array.from({ length: 128 }, () => Math.random()),
},
{
id: 5,
name: "Keyboard",
price: 80,
discount_price: 72.0,
vector: Array.from({ length: 128 }, () => Math.random()),
},
];
const table = await db.createTable(tableName, data, { mode: "overwrite" });
Renaming Columns
Change column names to better reflect their purpose:
# Rename discount_price to sale_price
table.alter_columns({"path": "discount_price", "rename": "sale_price"})
// Rename discount_price to sale_price
await table.alterColumns([
{
path: "discount_price",
rename: "sale_price",
},
]);
Changing Data Types
Convert column data types for better performance or compatibility:
# Change price from int32 to int64 for larger numbers
table.alter_columns({"path": "price", "data_type": pa.int64()})
Making Columns Nullable
Allow columns to contain NULL values:
# Make the name column nullable
table.alter_columns({"path": "name", "nullable": True})
// Make the name column nullable
await table.alterColumns([
{
path: "name",
nullable: true,
},
]);
Multiple Changes at Once
Apply several alterations in a single operation:
# Rename, change type, and make nullable in one operation
table.alter_columns(
{
"path": "sale_price",
"rename": "final_price",
"data_type": pa.float64(),
"nullable": True,
}
)
// Rename and make nullable in one operation
await table.alterColumns([
{
path: "sale_price",
rename: "final_price",
nullable: true,
},
]);
Dropping Columns
You can remove columns using the
drop_columns
method in Python or
dropColumns
in TypeScript/JavaScript.
Setting Up the Example Table
Create a table with temporary columns that we’ll remove:
table_name = "schema_evolution_drop_example"
data = [
{
"id": 1,
"name": "Laptop",
"price": 1200.00,
"temp_col1": "X",
"temp_col2": 100,
"vector": np.random.random(128).tolist(),
},
{
"id": 2,
"name": "Smartphone",
"price": 800.00,
"temp_col1": "Y",
"temp_col2": 200,
"vector": np.random.random(128).tolist(),
},
{
"id": 3,
"name": "Headphones",
"price": 150.00,
"temp_col1": "Z",
"temp_col2": 300,
"vector": np.random.random(128).tolist(),
},
]
table = db.create_table(table_name, data, mode="overwrite")
const tableName = "schema_evolution_drop_example";
const data = [
{
id: 1,
name: "Laptop",
price: 1200.0,
temp_col1: "X",
temp_col2: 100,
vector: Array.from({ length: 128 }, () => Math.random()),
},
{
id: 2,
name: "Smartphone",
price: 800.0,
temp_col1: "Y",
temp_col2: 200,
vector: Array.from({ length: 128 }, () => Math.random()),
},
{
id: 3,
name: "Headphones",
price: 150.0,
temp_col1: "Z",
temp_col2: 300,
vector: Array.from({ length: 128 }, () => Math.random()),
},
];
const table = await db.createTable(tableName, data, { mode: "overwrite" });
Dropping Single Columns
Remove individual columns that are no longer needed:
# Remove the first temporary column
table.drop_columns(["temp_col1"])
// Remove the first temporary column
await table.dropColumns(["temp_col1"]);
Dropping Multiple Columns
Remove several columns at once for efficiency:
# Remove the second temporary column
table.drop_columns(["temp_col2"])
// Remove the second temporary column
await table.dropColumns(["temp_col2"]);
Vector Column Considerations
Vector columns (used for embeddings) have special considerations. When altering vector columns, you should ensure consistent dimensionality.
Converting List to FixedSizeList
A common schema evolution task is converting a generic list column to a fixed-size list for performance:
vector_dim = 768 # Your embedding dimension
table.alter_columns(dict(path="embedding", data_type=pa.list_(pa.float32(), vector_dim)))