# Pydantic

Pydantic is a data validation library in Python. LanceDB integrates with Pydantic for schema inference, data ingestion, and query result casting. Using `lancedb.pydantic.LanceModel`, users can seamlessly integrate Pydantic with the rest of the LanceDB APIs.
First, import the necessary LanceDB and Pydantic modules:
```python
import lancedb
from lancedb.pydantic import Vector, LanceModel
```

Next, define your Pydantic model by inheriting from `LanceModel` and specifying your fields, including a vector field:

```python
class PersonModel(LanceModel):
    name: str
    age: int
    vector: Vector(2)
```

Set the database connection URL:

```python
url = "./example"
```

Now you can create a table, add data, and perform vector search operations:

```python
db = lancedb.connect(url)
table = db.create_table("person", schema=PersonModel)
table.add(
    [
        PersonModel(name="bob", age=1, vector=[1.0, 2.0]),
        PersonModel(name="alice", age=2, vector=[3.0, 4.0]),
    ]
)
assert table.count_rows() == 2
person = table.search([0.0, 0.0]).limit(1).to_pydantic(PersonModel)
assert person[0].name == "bob"
```

## Vector Field
LanceDB provides the `lancedb.pydantic.Vector` function to define a vector field in a Pydantic model.
```python
>>> import pydantic
>>> import pyarrow as pa
>>> from lancedb.pydantic import pydantic_to_schema, Vector
...
>>> class MyModel(pydantic.BaseModel):
...     id: int
...     url: str
...     embeddings: Vector(768)
>>> schema = pydantic_to_schema(MyModel)
>>> assert schema == pa.schema([
...     pa.field("id", pa.int64(), False),
...     pa.field("url", pa.utf8(), False),
...     pa.field("embeddings", pa.list_(pa.float32(), 768))
... ])
```

This example demonstrates how LanceDB automatically converts Pydantic field types to their corresponding Apache Arrow data types. The `pydantic_to_schema()` function takes a Pydantic model and generates an Arrow schema where:

- `int` fields become `pa.int64()` (64-bit integers)
- `str` fields become `pa.utf8()` (UTF-8 encoded strings)
- `Vector(768)` becomes `pa.list_(pa.float32(), 768)` (a fixed-size list of 768 float32 values)
- The `False` parameter indicates that the fields are not nullable
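To see nullability in action, here is a minimal sketch, assuming that `Optional[...]` annotations map to nullable Arrow fields (the model below, `NullableModel`, is made up for illustration):

```python
>>> from typing import Optional
>>> import pydantic
>>> import pyarrow as pa
>>> from lancedb.pydantic import pydantic_to_schema
...
>>> class NullableModel(pydantic.BaseModel):  # hypothetical model for illustration
...     id: int                       # required -> expected to be non-nullable
...     note: Optional[str] = None    # Optional -> expected to be nullable
>>> schema = pydantic_to_schema(NullableModel)
>>> assert schema.field("id").nullable is False
>>> assert schema.field("note").nullable is True
```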
## Type Conversion

LanceDB automatically converts Pydantic fields to Apache Arrow data types. Currently supported type conversions:
| Pydantic Field Type | PyArrow Data Type |
|---|---|
| `int` | `pyarrow.int64` |
| `float` | `pyarrow.float64` |
| `bool` | `pyarrow.bool_` |
| `str` | `pyarrow.utf8()` |
| `list` | `pyarrow.List` |
| `BaseModel` | `pyarrow.Struct` |
| `Vector(n)` | `pyarrow.FixedSizeList(float32, n)` |
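To illustrate the `BaseModel` → struct mapping from the table above, here is a small sketch; the nested `Address` and outer `Customer` models are made up for this example:

```python
>>> import pydantic
>>> import pyarrow as pa
>>> from lancedb.pydantic import pydantic_to_schema
...
>>> class Address(pydantic.BaseModel):   # hypothetical nested model
...     city: str
...     zip_code: str
>>> class Customer(pydantic.BaseModel):  # hypothetical outer model
...     id: int
...     address: Address
>>> schema = pydantic_to_schema(Customer)
>>> # The nested model is expected to become a struct field with the same sub-fields.
>>> assert pa.types.is_struct(schema.field("address").type)
>>> assert [f.name for f in schema.field("address").type] == ["city", "zip_code"]
```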
LanceDB supports creating an Apache Arrow Schema from a `pydantic.BaseModel` via the `lancedb.pydantic.pydantic_to_schema` method.
```python
>>> from typing import List, Optional
>>> import pydantic
>>> import pyarrow as pa
>>> from lancedb.pydantic import pydantic_to_schema, Vector
...
>>> class FooModel(pydantic.BaseModel):
...     id: int
...     s: str
...     vec: Vector(1536)  # fixed_size_list<item: float32>[1536]
...     li: List[int]
...
>>> schema = pydantic_to_schema(FooModel)
>>> assert schema == pa.schema([
...     pa.field("id", pa.int64(), False),
...     pa.field("s", pa.utf8(), False),
...     pa.field("vec", pa.list_(pa.float32(), 1536)),
...     pa.field("li", pa.list_(pa.int64()), False),
... ])
```

This example shows a more complex Pydantic model with various field types and demonstrates how LanceDB handles:

- Basic types: `int` and `str` fields
- Vector fields: `Vector(1536)` creates a fixed-size list of 1536 float32 values
- List fields: `List[int]` becomes a variable-length list of int64 values
- Schema generation: the `pydantic_to_schema()` function automatically converts all of these types to their Arrow equivalents
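To tie this back to table creation, here is a minimal usage sketch, assuming the generated Arrow schema can be passed to `create_table` directly; the table name `foo`, the directory `./example`, and the sample row are made up for illustration:

```python
from typing import List

import lancedb
import pydantic
from lancedb.pydantic import Vector, pydantic_to_schema


class FooModel(pydantic.BaseModel):
    id: int
    s: str
    vec: Vector(1536)
    li: List[int]


# Create an empty table from the Arrow schema generated by pydantic_to_schema.
db = lancedb.connect("./example")
table = db.create_table("foo", schema=pydantic_to_schema(FooModel))

# Rows matching the schema can then be added as plain dictionaries.
table.add([{"id": 1, "s": "hello", "vec": [0.1] * 1536, "li": [1, 2, 3]}])
```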