🍳 Multimodal Recipe Agent
A complete AI-powered recipe search application that understands both text and images using LanceDB, PydanticAI, and Streamlit.
Features
Colab Tutorial
- Interactive Learning: Step-by-step notebook with sample recipes
- Core Concepts: Learn multimodal agent development
- No Setup Required: Run directly in your browser
Full Demo Application
- Semantic Recipe Search: Find recipes by describing what you want to cook
- Visual Recipe Discovery: Upload a photo to find similar recipes
- Conversational Interface: Chat with an AI agent about cooking
- Multimodal Storage: Recipe text, images, and vectors stored together in LanceDB
- Production Ready: Complete with error handling and logging
Quick Start
Option 1: Interactive Tutorial (Google Colab)
Perfect for learning! This Colab notebook provides a step-by-step tutorial with sample data. No setup required - just click and start learning about multimodal agents.
Option 2: Full Demo Application (Local Setup)
1. Download and Setup
# Download the tutorial files from GitHub
# Extract all files to a folder named 'multimodal-recipe-agent'
# Navigate to the folder
cd multimodal-recipe-agent
2. Install Dependencies
uv sync
3. Download and Import Full Dataset
First, download the dataset:
- Visit Kaggle Recipe Dataset
- Download the dataset and extract it to your multimodal-recipe-agent folder
- Ensure the recipes.csv file is in the data/ directory
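Before running the import, it can help to confirm the files are where the steps above expect them. The following is a minimal sketch, not part of the project: the helper name missing_inputs is hypothetical, and it only checks the two paths this README names (data/recipes.csv and import.py).

```python
from pathlib import Path
from typing import List


def missing_inputs(project_root: str) -> List[str]:
    """Return the expected input paths that do not exist under project_root."""
    root = Path(project_root)
    # These two paths come from the README; the helper itself is hypothetical.
    expected = [root / "data" / "recipes.csv", root / "import.py"]
    return [p.relative_to(root).as_posix() for p in expected if not p.exists()]


if __name__ == "__main__":
    missing = missing_inputs(".")
    print("Ready to import" if not missing else f"Missing: {missing}")
```

Run it from inside the multimodal-recipe-agent folder; an empty result means both files were found.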
Then run the import script:
uv run python import.py
This will:
- Process the downloaded recipe dataset from Kaggle
- Generate text and image embeddings for thousands of recipes
- Store everything in a LanceDB database
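The steps above can be sketched in plain Python to show the shape of the data that ends up in the database. This is an illustration under stated assumptions, not the contents of import.py: the column names title and ingredients are hypothetical, and toy_embed is a hashed bag-of-words stand-in for a real embedding model such as Sentence Transformers. Image embeddings and the LanceDB write are left out.

```python
import csv
import hashlib
import io
import math
from typing import Dict, List

DIM = 16  # toy vector size; real embedding models use hundreds of dimensions


def toy_embed(text: str, dim: int = DIM) -> List[float]:
    """Hash each word into a bucket, then L2-normalize (stand-in for a real model)."""
    vec = [0.0] * dim
    for word in text.lower().split():
        bucket = int(hashlib.sha256(word.encode("utf-8")).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def build_records(csv_file) -> List[Dict[str, object]]:
    """Turn CSV rows into records that carry both the text and its vector."""
    records = []
    for row in csv.DictReader(csv_file):
        # Hypothetical column names; the real dataset may use different ones.
        text = f"{row['title']}. {row['ingredients']}"
        records.append({"title": row["title"], "text": text, "vector": toy_embed(text)})
    return records


# Two in-memory rows stand in for data/recipes.csv.
sample = io.StringIO(
    "title,ingredients\n"
    "Pancakes,eggs flour milk\n"
    "Chicken Pasta,chicken pasta garlic\n"
)
records = build_records(sample)
print(len(records), len(records[0]["vector"]))
```

Keeping the text and its vector in the same record is what lets a single table answer both "show me this recipe" and "find recipes like this one".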
4. Run the Complete Application
Streamlit Chat App:
uv run streamlit run app.py
Jupyter Notebook Tutorial:
uv run jupyter notebook multimodal-recipe-agent.ipynb
Project Structure
multimodal-recipe-agent/
├── multimodal-recipe-agent.ipynb # Interactive tutorial
├── agent.py # PydanticAI agent implementation
├── app.py # Streamlit chat interface
├── import.py # Data import and processing
├── pyproject.toml # Modern Python project configuration
├── uv.lock # Locked dependency versions
├── README.md # This file
└── data/ # Data directory (holds the downloaded dataset and the import output)
    ├── recipes.csv # Recipe dataset
    ├── images/ # Recipe images
    └── recipes.lance # LanceDB database
Download Instructions
- Download the tutorial files from the GitHub repository
- Extract all files to a folder named multimodal-recipe-agent
- Ensure all files are in the same directory - this is important for imports to work
- Navigate to the folder in your terminal before running commands
Usage
Text Search
- Ask questions like “Find me healthy pasta recipes with chicken”
- Search by ingredients: “What can I make with eggs, flour, and milk?”
Image Search
- Upload a photo of a dish in the Streamlit sidebar
- The agent finds recipes whose images look most like your photo
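Visual similarity search generally comes down to comparing embedding vectors. The sketch below ranks a few made-up three-dimensional vectors by cosine similarity; it assumes cosine as the metric, which this README does not confirm. In the real application the vectors would come from CLIP and the search would run inside LanceDB rather than in a Python loop.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def top_k(
    query: Sequence[float], index: Dict[str, Sequence[float]], k: int = 2
) -> List[Tuple[str, float]]:
    """Score every stored vector against the query and keep the k best."""
    scored = [(name, cosine(query, vec)) for name, vec in index.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Made-up vectors for illustration; real image embeddings are much longer.
image_index = {
    "Margherita Pizza": [0.9, 0.1, 0.0],
    "Tomato Soup": [0.6, 0.0, 0.8],
    "Green Salad": [0.0, 1.0, 0.1],
}
uploaded_photo_vector = [1.0, 0.2, 0.0]

matches = top_k(uploaded_photo_vector, image_index, k=2)
print([name for name, _ in matches])  # ['Margherita Pizza', 'Tomato Soup']
```

A brute-force scan like this is fine for a handful of recipes; a vector database exists so the same query stays fast over thousands of them.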
Chat Interface
- Have a conversation with the recipe assistant
- Ask follow-up questions about ingredients or cooking methods
- Get detailed recipe information with images
Key Technologies
- LanceDB: Multimodal vector database for efficient storage and retrieval
- PydanticAI: Modern AI agent framework with type safety
- Sentence Transformers: Text embeddings for semantic search
- CLIP: Vision-language model for image understanding
- Streamlit: Interactive web application framework
Requirements
- Python 3.8+
- CUDA (optional, for GPU acceleration)
How It Works
- Data Import: import.py processes recipe data, generates embeddings, and stores everything in LanceDB
- AI Agent: agent.py creates a PydanticAI agent with tools for searching recipes
- Web Interface: app.py provides a Streamlit chat interface for interacting with the agent
- Tutorial: multimodal-recipe-agent.ipynb walks through the implementation step-by-step
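To make the agent-and-tools idea concrete, here is a plain-Python sketch of the shape a recipe search tool might take: a typed function the agent can call, plus a formatter for the chat reply. Everything in it is hypothetical (RecipeHit, search_recipes, format_for_chat, and the sample data), it does not use the PydanticAI API, and it swaps in simple keyword overlap where the real tool would query LanceDB with embeddings.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class RecipeHit:
    title: str
    ingredients: List[str]


# Tiny in-memory stand-in for the table built by import.py.
RECIPES = [
    RecipeHit("Pancakes", ["eggs", "flour", "milk"]),
    RecipeHit("Chicken Pasta", ["chicken", "pasta", "garlic"]),
    RecipeHit("Omelette", ["eggs", "cheese"]),
]


def search_recipes(query: str, limit: int = 3) -> List[RecipeHit]:
    """Hypothetical tool: rank recipes by how many query words match an ingredient."""
    words = set(re.findall(r"[a-z]+", query.lower()))
    scored = [(len(words & set(r.ingredients)), r) for r in RECIPES]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [recipe for score, recipe in ranked if score > 0][:limit]


def format_for_chat(hits: List[RecipeHit]) -> str:
    """Turn tool output into the text a chat interface would display."""
    if not hits:
        return "No matching recipes found."
    return "\n".join(f"- {h.title} ({', '.join(h.ingredients)})" for h in hits)


hits = search_recipes("What can I make with eggs, flour, and milk?")
print(format_for_chat(hits))
```

Typed inputs and outputs are the part that carries over to a real agent framework: the model decides when to call the tool, and the framework validates what goes in and what comes back.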
Development
This project demonstrates:
- Building AI agents with multimodal capabilities
- Using LanceDB for vector storage and retrieval
- Creating custom tools for PydanticAI agents
- Building conversational interfaces with Streamlit
- Handling both text and image inputs in a single agent
License
This project is part of the LanceDB tutorials and follows the same license terms.