LanceDB Powers Netflix Media Data Lake
To enable the next generation of media analytics and machine learning, we are building the Media Data Lake at Netflix, a data lake designed specifically for media assets at Netflix using LanceDB. We have partnered with our data platform team on integrating LanceDB into our Big Data Platform.
- Media ML Data Engineering, Netflix
A deep dive on Netflix’s Media ML Data Engineering, a new specialization that bridges the gap between traditional data engineering and the unique demands of media-centric machine learning, and how they build the Media Data Lake with LanceDB.
Read the Netflix Tech Blog: "From Facts & Metrics to Media Machine Learning: Evolving the Data Engineering Function at Netflix"
💼 Case Study: How CodeRabbit Leverages LanceDB for AI-Powered Code Reviews
“LanceDB transformed how we handle context at scale. While other vector databases hit cost and performance walls, LanceDB scales effortlessly with our growth—from startup to enterprise. Its multimodal capabilities and deployment flexibility were game-changers, enabling us to deliver the depth of analysis our customers expect while maintaining sub-second response times across millions of code reviews.”
Manage Lance Tables in Any Catalog using Lance Namespace and Spark
Lance Namespace is an open specification built on top of the storage-based Lance table and file format. It provides a standardized way for metadata services such as Apache Hive Metastore, Apache Gravitino, Unity Catalog, AWS Glue Data Catalog, and others to store and manage Lance tables. This means you can use Lance tables alongside your existing data lakehouse infrastructure.
🎤 Event Recap!
LanceDB made a small tour around the world in August. We started with a workshop with dltHub at Berlin PyData Con, then stopped in Amsterdam to present at the inaugural Open Lakehouse Meetup with Databricks and DuckDB, followed by a keynote at AI_Dev Con. Our last stop was London, which began with a meetup generously hosted by AWS London, and we wrapped up the tour with our VLDB workshop on the Lance paper.
📣 Coming up in Sep:
Join us on September 25 for the live webinar: Apache Spark™ and Lance Spark Connector! 🚀 The Lance Spark Connector brings Lance's AI-native multimodal storage to Spark. We'll cover how Spark can work efficiently with embeddings, images, videos, and documents using Lance's random access, indexing, and vector/blob support.
📚 Good Reads
Feature Engineering with Geneva
Columnar File Reader in Depth – Structural Encoding
LanceDB WikiSearch: Native Full-Text Search on 41M Wikipedia Docs
🗞️ LanceDB Enterprise Product News
Feature | Description |
---|---|
Faster and more accurate Full-Text Search (FTS) | Complex FTS queries (50–100 terms) now run 3–8x faster with improved relevance and ranking. |
Simpler data loading | Insert, merge, and create tables seamlessly without worrying about dataset size or batch tuning. |
Flexible search results | Support for `limit` and `offset` in both vector and full-text search allows easy pagination of large result sets. |
Better observability for `merge_insert` | Use `explain_plan` and `analyze_plan` to visualize execution and identify performance bottlenecks. |
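The `limit` and `offset` pagination pattern mentioned in the table above can be sketched in a library-agnostic way. This is a minimal illustration under stated assumptions: `run_search` is a hypothetical stand-in for any vector or full-text search call that accepts `limit` and `offset`, `RANKED_RESULTS` is made-up sample data, and `iter_pages` is a helper invented for this example. None of these names are part of the LanceDB API.

```python
from typing import Callable, Iterator, List, Sequence

# Made-up sample data: a ranked result set with 7 hits.
RANKED_RESULTS: List[str] = [f"doc-{i}" for i in range(1, 8)]


def run_search(limit: int, offset: int = 0) -> List[str]:
    """Hypothetical search call: skip `offset` hits, return at most `limit`."""
    return RANKED_RESULTS[offset:offset + limit]


def iter_pages(
    search: Callable[[int, int], Sequence[str]], page_size: int
) -> Iterator[List[str]]:
    """Yield successive pages until a short or empty page is returned."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    offset = 0
    while True:
        page = list(search(page_size, offset))
        if not page:
            return  # nothing left to fetch
        yield page
        if len(page) < page_size:
            return  # a short page means this was the last one
        offset += page_size  # advance to the start of the next page


# Walk the whole result set three hits at a time.
pages = list(iter_pages(run_search, page_size=3))
for number, page in enumerate(pages, start=1):
    print(f"page {number}: {page}")
```

With a real table, the body of `run_search` would be replaced by the actual search query, passing the same `limit` and `offset` values through to it.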
🤝 Community contributions
GEO Data Type support coming to Lance! Thanks to contributions from our community members @ddupg and @jaystarshot, Lance now supports a Geo type. Geo index and query optimizations are coming soon too! A shoutout to the individual contributors from ByteDance and Uber for making this possible!
A heartfelt thank you to our community contributors of lance and lancedb this past month: @majin1102 @fangbo @wojiaodoubao @pimdh @ebyhr @yanghua @HaochengLIU @imededin @HubertY @chenghao-guo @lorinlee @vlovich @adrian-wang @ddupg @LeoReeYang @emmanuel-ferdman @adi-ray @yuvalif @Heisenberg208 @mocobeta @MarkMcCaskey @reedloden
🔦 Open Source Releases Spotlight
Project | Version | Highlights |
---|---|---|
LanceDB | 0.22.0 | Integration with Lance Namespace, support for multi-level namespace management. |
Lance | 0.35.0 | JSONB data type and index support, Apache OpenDAL integration, `lance-tools` CLI command, `contains_tokens` UDF for full-text search. |
Lance | 0.34.0 | Shallow clone support, zone map index support, row-level conflict resolution for Delete, metadata diff API. |
Lance | 0.33.0 | File format 2.1 official release (2.1 files written with earlier versions of the library may not be readable due to breaking changes during development). Java transaction commit API for all commit types. |
Lance Namespace | 0.0.6 - 0.0.14 | Python and Rust SDK release. |
Lance Ray | 0.0.1 - 0.0.5 | Integration with Lance Namespace. |
Lance Spark | 0.0.2 - 0.0.11 | Support for CREATE TABLE with a fixed-size vector column, support for UPDATE and DELETE. |