Blog category: Engineering
Lance File 2.1 is Now Stable
The 2.1 file version is now stable. Learn what that means for you and what's coming next.
Introducing Lance Data Viewer: A Simple Way to Explore Lance Tables
A lightweight open source web UI for exploring Lance datasets, viewing schemas, and browsing table data with vector visualization support.
LanceDB's RaBitQ Quantization for Blazing Fast Vector Search
Introducing RaBitQ quantization in LanceDB for higher compression, faster indexing, and better recall on high‑dimensional embeddings.
Building Semantic Video Recommendations with TwelveLabs and LanceDB
Build semantic video recommendations using TwelveLabs embeddings, LanceDB storage, and Geneva pipelines with Ray.
Setup Real-Time Multimodal AI Analytics with Apache Fluss (incubating) and Lance
Learn how to build real-time multimodal AI analytics by integrating Apache Fluss streaming storage with Lance's AI-optimized lakehouse. This guide demonstrates streaming multimodal data processing for RAG systems and ML workflows.
Productionalize AI Workloads with Lance Namespace, LanceDB, and Ray
Learn how to productionalize AI workloads with Lance Namespace's enterprise stack integration and the scalability of LanceDB and Ray for end-to-end ML pipelines.
LanceDB's Geneva: Scalable Feature Engineering
Learn how to build scalable feature engineering pipelines with Geneva and LanceDB. This demo transforms image data into rich features including captions, embeddings, and metadata using distributed Ray clusters.
LanceDB WikiSearch: Native Full-Text Search on 41M Wikipedia Docs
No more Tantivy! We stress-tested native full-text search in our latest massive-scale search demo. Let's break down how it works and what we did to scale it.
Manage Lance Tables in Any Catalog using Lance Namespace and Spark
Access and manage your Lance tables in Hive, Glue, Unity Catalog, or any catalog service using Lance Namespace with the latest Lance Spark connector.
Columnar File Readers in Depth: Structural Encoding
Deep dive into LanceDB's dual structural encoding approach - mini-block for small data types and full-zip for large multimodal data. Learn how this optimizes compression and random access performance compared to Parquet.
S3 Vectors vs LanceDB: Cost, Latency, and the Hidden Trade-offs
Is it worth the hype? Comparing Amazon S3 Vectors and LanceDB for RAG and agentic systems.
What is the LanceDB Multimodal Lakehouse?
Introducing the Multimodal Lakehouse - a unified platform for managing AI data from raw files to production-ready features, now part of LanceDB Enterprise.
Columnar File Readers in Depth: Repetition & Definition Levels
A deep dive into repetition and definition levels: how columnar file readers represent nested and nullable data.
Columnar File Readers in Depth: Column Shredding
A deep dive into column shredding: how nested records are broken down into flat columns inside a columnar file reader.
Columnar File Readers in Depth: Compression Transparency
A deep dive into compression transparency: which compression schemes a reader can still reason about after encoding, and why that matters.
A Practical Guide to Training Custom Rerankers
A practical guide to training custom rerankers to improve retrieval quality, with hands-on steps from the LanceDB team.
The Future of Open Source Table Formats: Apache Iceberg and Lance
Where open source table formats are headed, and how Apache Iceberg and Lance fit into that future.
Lance File 2.1: Smaller and Simpler
What changed in the Lance 2.1 file format and how the new version makes files smaller and simpler.
RAG with GRPO Fine-Tuned Reasoning Model
How to build a RAG pipeline around a reasoning model fine-tuned with GRPO.
Creating a FinTech AI Agent From Scratch
A walkthrough of creating a FinTech AI agent from scratch.
Chunking Analysis: Which is the right chunking approach for your language?
A comparison of chunking approaches to help you choose the right one for your language.
Agentic RAG Using LangGraph: Build an Autonomous Customer Support Agent
Build an autonomous customer support agent using LangGraph and LanceDB that automatically fetches, classifies, drafts, and responds to emails with RAG-powered policy retrieval.
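As a taste of the routing pattern such an agent uses, here is a minimal LangGraph sketch; the node names, state fields, and classification rule are illustrative placeholders, not the post's actual pipeline:

```python
from typing import TypedDict
from langgraph.graph import StateGraph, END

class SupportState(TypedDict):
    email: str
    category: str
    draft: str

def classify(state: SupportState) -> dict:
    # Hypothetical rule-based stand-in; the real agent would use an LLM classifier.
    return {"category": "policy" if "refund" in state["email"].lower() else "general"}

def retrieve_policy(state: SupportState) -> dict:
    # Placeholder for a RAG lookup against policy documents stored in LanceDB.
    return {"draft": "Per our refund policy, ..."}

def draft_reply(state: SupportState) -> dict:
    return {"draft": state.get("draft", "") + "\nThanks for reaching out!"}

graph = StateGraph(SupportState)
graph.add_node("classify", classify)
graph.add_node("retrieve_policy", retrieve_policy)
graph.add_node("draft_reply", draft_reply)
graph.set_entry_point("classify")
graph.add_conditional_edges(
    "classify",
    lambda s: s["category"],
    {"policy": "retrieve_policy", "general": "draft_reply"},
)
graph.add_edge("retrieve_policy", "draft_reply")
graph.add_edge("draft_reply", END)

app = graph.compile()
print(app.invoke({"email": "I want a refund for my order"}))
```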
Python Package to convert image datasets to lance type
A Python package that converts image datasets into the Lance format.
Late Interaction & Efficient Multi-modal Retrievers Need More Than a Vector Index
Why late interaction and efficient multi-modal retrievers need more than a vector index.
The Case for Random Access I/O
One of the reasons we started the Lance file format and have been investigating new encodings is because we wanted a format with better support for random access.
My Summer Internship Experience at LanceDB
I'm Raunak, a master's student at the University of Illinois, Urbana-Champaign. This summer, I had the opportunity to intern as a Software Engineer at LanceDB, an early-stage startup based in San Francisco.
Columnar File Readers in Depth: APIs and Fusion
The API used to read files has evolved over time, from simple full table reads to batch reads and eventually to iterative record batch readers. Lance takes this a step further to return a stream of read tasks.
Developers, Ditch the Black Box: Welcome to Continue
Remember flipping through coding manuals? Those quickly became relics with the rise of Google and Stack Overflow, a one-stop shop for developer queries.
Columnar File Readers in Depth: Parallelism without Row Groups
A deep dive into how the Lance file reader achieves parallelism without relying on row groups.
Benchmarking Cohere Rerankers with LanceDB
Improve retrieval quality by reranking LanceDB results with Cohere and ColBERT. You’ll plug rerankers into vector, FTS, and hybrid search and compare accuracy on real datasets.
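For a sense of what the setup looks like in code, here is a hedged sketch of reranking a LanceDB hybrid query; the database path, table, and "text" column are placeholders, the table is assumed to have been created with an embedding function, and CohereReranker needs a COHERE_API_KEY in the environment:

```python
import lancedb
from lancedb.rerankers import CohereReranker, ColbertReranker

db = lancedb.connect("./lancedb")            # placeholder path
tbl = db.open_table("docs")                  # assumed: embedding function configured, "text" column
tbl.create_fts_index("text", replace=True)   # hybrid search needs a full-text index

query = "how do I trade off recall and latency?"
for reranker in (CohereReranker(), ColbertReranker()):
    hits = (
        tbl.search(query, query_type="hybrid")
           .rerank(reranker=reranker)
           .limit(10)
           .to_pandas()
    )
    print(type(reranker).__name__, hits["text"].head().tolist())
```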
Lance v2: A New Columnar Container Format
An introduction to Lance v2, a new columnar container format.
Effortlessly Loading and Processing Images with Lance: a Code Walkthrough
Working with large image datasets in machine learning can be challenging, often requiring significant computational resources and efficient data-handling techniques.
A Practical Guide to Fine-Tuning Embedding Models
A practical guide to fine-tuning embedding models, with hands-on steps from the LanceDB team.
Columnar File Readers in Depth: Backpressure
Streaming data applications can be tricky. When you can read data faster than you can process it, bad things tend to happen. The various solutions to this problem are largely classified as backpressure.
Designing a Table Format for ML Workloads
Design considerations for a table format built for machine learning workloads.
GraphRAG: Hierarchical Approach to Retrieval-Augmented Generation
An overview of GraphRAG, a hierarchical approach to retrieval-augmented generation.
Track AI Trends: CrewAI Agents & RAG
This article will teach us how to make an AI Trends Searcher using CrewAI Agents and their Tasks. But before diving into that, let's first understand what CrewAI is and how we can use it for these applications.
Multimodal Myntra Fashion Search Engine Using LanceDB
Build a multimodal fashion search engine with LanceDB and CLIP embeddings. Follow a step‑by‑step workflow to register embeddings, create the table, query by text or image, and ship a Streamlit UI.
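A condensed sketch of the embedding-registry pattern the post builds on; the table name, image paths, and query text are placeholders, and `open-clip-torch` must be installed for the OpenCLIP embedding function:

```python
import lancedb
from lancedb.pydantic import LanceModel, Vector
from lancedb.embeddings import get_registry

clip = get_registry().get("open-clip").create()   # OpenCLIP embedding function

class FashionItem(LanceModel):
    image_uri: str = clip.SourceField()               # embedded automatically at ingest
    vector: Vector(clip.ndims()) = clip.VectorField()

db = lancedb.connect("./lancedb")
tbl = db.create_table("fashion", schema=FashionItem, exist_ok=True)
tbl.add([{"image_uri": p} for p in ["images/001.jpg", "images/002.jpg"]])  # placeholder paths

# Text-to-image search: the query string goes through CLIP's text encoder.
results = tbl.search("red floral dress").limit(5).to_pydantic(FashionItem)
```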
Custom Datasets for Efficient LLM Training Using Lance
How to build custom datasets for efficient LLM training using the Lance format, with practical steps and examples.
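As a rough illustration of the idea (column names and paths here are made up), pre-tokenized samples can be written to a Lance dataset and fetched by row index during training:

```python
import lance
import pyarrow as pa

# Hypothetical pre-tokenized corpus: one row per training sample.
table = pa.table({
    "tokens": pa.array([[101, 2054, 102], [101, 2129, 102]], type=pa.list_(pa.int32())),
})
lance.write_dataset(table, "train_tokens.lance", mode="overwrite")

ds = lance.dataset("train_tokens.lance")
# Cheap random access by row id suits shuffled training loops.
batch = ds.take([1, 0], columns=["tokens"])
print(batch.to_pydict())
```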
Implementing Corrective RAG in the Easiest Way
Even though text-generation models are good at producing fluent content, they sometimes fall short on factual accuracy. This happens because of the way they are trained.
Hybrid Search and Custom Reranking with LanceDB
Combine keyword and vector search for higher‑quality results with LanceDB. This post shows how to run hybrid search and compare rerankers (linear combination, Cohere, ColBERT) with code and benchmarks.
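To make the knobs concrete, here is a hedged snippet using the built-in linear-combination reranker; the path, table, and column names are placeholders and the table is assumed to have an embedding function configured:

```python
import lancedb
from lancedb.rerankers import LinearCombinationReranker

db = lancedb.connect("./lancedb")
tbl = db.open_table("docs")                 # placeholder table with a "text" column
tbl.create_fts_index("text", replace=True)  # required for the keyword side of hybrid search

# Blend vector similarity and BM25: weight is the share given to the vector score.
reranker = LinearCombinationReranker(weight=0.7)

results = (
    tbl.search("sharded index maintenance", query_type="hybrid")
       .rerank(reranker=reranker)
       .limit(10)
       .to_pandas()
)
```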
Hybrid Search: RAG for Real-Life Production-Grade Applications
How hybrid search makes RAG work in real-life, production-grade applications, with practical steps and examples.
Efficient RAG with Compression and Filtering
How to make RAG pipelines more efficient with compression and filtering, with practical steps and examples.
Inverted File Product Quantization (IVF_PQ): Accelerate Vector Search by Creating Indices
Compress vectors with PQ and accelerate retrieval with IVF_PQ in LanceDB. The tutorial explains the concepts, memory savings, and a minimal implementation with search tuning knobs.
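A minimal, hedged sketch of building and querying an IVF_PQ index in LanceDB; the data is random stand-in embeddings and the parameter values are illustrative, not recommendations:

```python
import numpy as np
import lancedb

db = lancedb.connect("./lancedb")
dim = 768
vectors = np.random.rand(10_000, dim).astype("float32")   # stand-in embeddings
tbl = db.create_table(
    "vectors",
    [{"id": i, "vector": v.tolist()} for i, v in enumerate(vectors)],
    exist_ok=True,
)

# IVF_PQ: cluster vectors into partitions (IVF), then product-quantize within them (PQ).
tbl.create_index(
    metric="cosine",
    num_partitions=256,   # total number of IVF partitions (coarse clusters)
    num_sub_vectors=96,   # PQ sub-vectors; must divide the vector dimension (768 / 96 = 8)
)

# Recall/latency knobs: more probes and refinement improve recall at some latency cost.
hits = (
    tbl.search(vectors[0])
       .nprobes(20)
       .refine_factor(10)
       .limit(10)
       .to_pandas()
)
```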
Modified RAG: Parent Document & Bigger Chunk Retriever
A modified RAG setup using parent document and bigger-chunk retrievers, with practical steps and examples.
Search Within an Image with Segment Anything
How to search within an image using Segment Anything, with practical steps and examples.
MemGPT: OS Inspired LLMs That Manage Their Own Memory
An overview of MemGPT: OS-inspired LLMs that manage their own memory, with practical steps and examples.
Hybrid Search: Combining BM25 and Semantic Search for Better Results with Langchain
Have you ever thought about how search engines find exactly what you're looking for? They usually use a mix of looking for specific words and understanding the meaning behind them.
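A hedged sketch of that mix using LangChain's ensemble retriever with LanceDB on the semantic side; the toy corpus, model name, and weights are placeholders, and package layout can differ across LangChain versions:

```python
from langchain.retrievers import EnsembleRetriever
from langchain_community.retrievers import BM25Retriever
from langchain_community.vectorstores import LanceDB
from langchain_community.embeddings import HuggingFaceEmbeddings

texts = [
    "Lance is a columnar format built for random access.",
    "BM25 rewards exact keyword matches.",
]  # toy corpus

bm25 = BM25Retriever.from_texts(texts)   # keyword side (needs the rank_bm25 package)
bm25.k = 4

embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
semantic = LanceDB.from_texts(texts, embeddings).as_retriever(search_kwargs={"k": 4})

# Blend keyword and semantic scores; the weights are a tuning knob.
hybrid = EnsembleRetriever(retrievers=[bm25, semantic], weights=[0.4, 0.6])
docs = hybrid.invoke("how are exact keywords scored?")
```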
Accelerate Vector Search Applications Using OpenVINO & LanceDB
In this article, we use CLIP from OpenAI for text-to-image and image-to-image search, and compare the PyTorch model against the FP16 and INT8 OpenVINO formats in terms of speedup.
Advanced RAG: Precise Zero-Shot Dense Retrieval with HyDE
In the world of search engines, the quest to find the most relevant information is a constant challenge. Researchers are always on the lookout for innovative ways to improve the effectiveness of search results.
Better RAG with Active Retrieval Augmented Generation FLARE
A practical walkthrough of FLARE (active retrieval augmented generation) for better RAG, by Akash A.
GPU-Accelerated Indexing in LanceDB
Speed up vector index training in LanceDB with CUDA or Apple Silicon (MPS). See how GPU‑accelerated IVF/PQ training compares to CPU and how to enable it in code.
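A short, hedged example of turning the accelerator on; the table is a placeholder and the index parameters are illustrative:

```python
import torch
import lancedb

db = lancedb.connect("./lancedb")
tbl = db.open_table("vectors")   # placeholder table with a vector column

# Use whichever accelerator is available; index training runs on CPU otherwise.
accelerator = (
    "cuda" if torch.cuda.is_available()
    else "mps" if torch.backends.mps.is_available()
    else None
)

tbl.create_index(
    metric="cosine",
    num_partitions=256,
    num_sub_vectors=96,
    accelerator=accelerator,   # "cuda" or "mps" offloads IVF/PQ training to the GPU
)
```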
Reduce Hallucinations from LLM-Powered Agents Using Long-Term Memory
How long-term memory reduces hallucinations from LLM-powered agents, with practical steps and examples.
Scalable Computer Vision with LanceDB & Voxel51
Scalable computer vision pipelines with LanceDB and Voxel51, with practical steps and examples.
Lance, Windows. Windows, Lance
It was Spring of 2012. After being an avid user for 2+ years, I finally decided to join Wes McKinney and work on pandas full time.
My SIMD Is Faster than Yours
An untold story about how we make LanceDB vector search fast.
Benchmarking Random Access in Lance
In this short blog post we'll take you through some simple benchmarks to show the random access performance of the Lance format.
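In the same spirit, here is a hedged sketch of a point-lookup micro-benchmark with the Lance Python API; the dataset URI and column name are placeholders:

```python
import random
import time
import lance

ds = lance.dataset("./my_dataset.lance")   # placeholder URI; object-store paths also work
n = ds.count_rows()

# Time batched point lookups at random row ids -- the access pattern Lance optimizes for.
indices = random.sample(range(n), k=1_000)
start = time.perf_counter()
rows = ds.take(indices, columns=["id"])    # placeholder column name
elapsed = time.perf_counter() - start
print(f"{len(indices)} rows in {elapsed:.3f}s ({elapsed / len(indices) * 1e3:.2f} ms/row amortized)")
```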