Multimodal Recipe Agent

Learn how to build a sophisticated AI agent that can understand both text and images to help users discover recipes. This tutorial demonstrates how to combine LanceDB’s multimodal capabilities with PydanticAI to create an intelligent recipe assistant.

What You’ll Build

  - Colab Tutorial (Sample Data)
  - Full Demo Application (Real Dataset)

Tutorial Overview

Colab Tutorial (Quick Start)

Interactive notebook covering:

  1. Data Preparation: Work with sample recipe data
  2. Embedding Generation: Create text and image embeddings (a code sketch of steps 2-5 follows this list)
  3. LanceDB Setup: Store multimodal data efficiently
  4. Agent Development: Build a PydanticAI agent with custom tools
  5. Testing: Try the agent with sample queries
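
To make steps 2-5 concrete, here is a minimal sketch. It assumes the CLIP checkpoint `clip-ViT-B-32` from sentence-transformers, a `recipes` table stored under `data/`, and an OpenAI model for the agent; the notebook's actual models, schema, and prompts may differ.

```python
import lancedb
from pydantic_ai import Agent
from sentence_transformers import SentenceTransformer

# Step 2: one CLIP model places recipe text and food photos in the
# same vector space, so text queries can match images and vice versa.
clip = SentenceTransformer("clip-ViT-B-32")  # assumed model choice

# Step 3: store a few sample recipes with their embeddings in LanceDB.
db = lancedb.connect("data")
sample = [
    {
        "title": "Margherita Pizza",
        "ingredients": "dough, tomato, mozzarella, basil",
        "vector": clip.encode("Margherita Pizza: dough, tomato, mozzarella, basil").tolist(),
    },
    {
        "title": "Caprese Salad",
        "ingredients": "tomato, mozzarella, basil, olive oil",
        "vector": clip.encode("Caprese Salad: tomato, mozzarella, basil, olive oil").tolist(),
    },
]
table = db.create_table("recipes", data=sample, mode="overwrite")

# Step 4: a PydanticAI agent with a custom tool that searches the table.
agent = Agent(
    "openai:gpt-4o",  # assumed model; any PydanticAI-supported model works
    system_prompt="You help users find recipes. Use the search tool.",
)

@agent.tool_plain
def search_recipes(query: str) -> list[dict]:
    """Return the recipes whose embeddings are closest to the query."""
    return (
        table.search(clip.encode(query).tolist())
        .select(["title", "ingredients"])
        .limit(3)
        .to_list()
    )

# Step 5: try a sample query (requires an OPENAI_API_KEY in the environment).
result = agent.run_sync("Suggest something Italian with tomatoes")
print(result.output)  # .data on older PydanticAI releases
```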

Full Demo (Complete Application)

Complete codebase including:

  1. Real Dataset: Download and process thousands of recipes
  2. Streamlit Interface: Full chat application with image upload (sketched after this list)
  3. Production Features: Error handling, logging, and monitoring
  4. Deployment Ready: Complete with all necessary files
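
For a flavor of the chat interface (item 2 above), the core of such an app is a loop like the following sketch. The `ask_agent` helper here is a placeholder standing in for the real agent call in `agent.py`; the full demo's interface is more elaborate.

```python
import streamlit as st


def ask_agent(question: str, image_bytes: bytes | None) -> str:
    """Placeholder: the full demo routes this to the PydanticAI agent."""
    return f"(agent reply to: {question!r})"


st.title("Multimodal Recipe Agent")

# Optional photo of a dish, so the agent can search by image.
uploaded = st.file_uploader("Upload a dish photo", type=["jpg", "jpeg", "png"])

if "messages" not in st.session_state:
    st.session_state.messages = []

# Replay the conversation so far.
for msg in st.session_state.messages:
    with st.chat_message(msg["role"]):
        st.markdown(msg["content"])

if prompt := st.chat_input("Ask for a recipe..."):
    st.session_state.messages.append({"role": "user", "content": prompt})
    with st.chat_message("user"):
        st.markdown(prompt)

    answer = ask_agent(prompt, uploaded.getvalue() if uploaded else None)

    st.session_state.messages.append({"role": "assistant", "content": answer})
    with st.chat_message("assistant"):
        st.markdown(answer)
```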

Prerequisites

Quick Start

Option 1: Interactive Tutorial (Google Colab)

Open In Colab

Perfect for learning! This Colab notebook provides a step-by-step tutorial with sample data. No setup required - just click and start learning about multimodal agents.

Option 2: Full Demo Application (Local Setup)

Download the Complete Tutorial

Download Tutorial Files

Setup Instructions

```bash
# 1. Extract the downloaded files to a folder
# 2. Navigate to the folder in terminal
cd multimodal-recipe-agent

# 3. Install dependencies with uv
uv sync

# 4. Download the Kaggle dataset
# Visit: https://www.kaggle.com/datasets/pes12017000148/food-ingredients-and-recipe-dataset-with-images
# Extract recipes.csv to the data/ folder

# 5. Import the dataset
uv run python import.py

# 6. Run the complete Streamlit chat application
uv run streamlit run app.py
```

Complete experience! This gives you the full Streamlit chat interface with a real recipe dataset. Requires downloading the dataset from Kaggle but provides the complete production-ready application.

Dataset Information

The full demo uses the Food Ingredients and Recipe Dataset with Images from Kaggle (linked in the setup steps above), which pairs thousands of recipes with food photographs.

Code Files

This tutorial includes complete, runnable code; the folder structure below shows each file and its role.

Folder Structure

When you download the tutorial, organize your files like this:

```text
multimodal-recipe-agent/
├── multimodal-recipe-agent.ipynb  # Interactive tutorial
├── agent.py                       # Core agent implementation
├── app.py                         # Streamlit chat interface
├── import.py                      # Data processing script
├── pyproject.toml                 # Project configuration
├── uv.lock                        # Dependency lock file
├── README.md                      # Project documentation
└── data/                          # Generated data (created after import)
    ├── recipes.csv               # Recipe dataset
    ├── images/                   # Recipe images
    └── recipes.lance             # LanceDB database
```
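
As a rough sketch of what `import.py` does with the real dataset: read `recipes.csv`, embed each recipe's photo with CLIP, and write everything to the `recipes.lance` table. The column names (`Title`, `Ingredients`, `Image_Name`) match the Kaggle dataset, but treat the details as assumptions; the shipped script is the source of truth.

```python
from pathlib import Path

import lancedb
import pandas as pd
from PIL import Image
from sentence_transformers import SentenceTransformer

clip = SentenceTransformer("clip-ViT-B-32")  # same assumed model as above
df = pd.read_csv("data/recipes.csv")

records = []
for row in df.itertuples():
    image_path = Path("data/images") / f"{row.Image_Name}.jpg"
    if not image_path.exists():
        continue  # skip recipes whose image is missing
    records.append({
        "title": row.Title,
        "ingredients": row.Ingredients,
        "image_path": str(image_path),
        # Embedding the photo lets uploaded images match recipes directly.
        "vector": clip.encode(Image.open(image_path)).tolist(),
    })

db = lancedb.connect("data")
db.create_table("recipes", data=records, mode="overwrite")
print(f"Imported {len(records)} recipes")
```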

Key Technologies

  - LanceDB: multimodal vector storage and search
  - PydanticAI: agent framework with typed custom tools
  - Streamlit: chat interface with image upload
  - uv: dependency management and script running

Ready to build your first multimodal AI agent? Let’s get started!

View Tutorial Notebook