Lesson 8: Setup
Before we build agents, we need infrastructure. This lesson sets up the services that make agents useful—databases for memory, vector stores for knowledge, and observability for debugging.
- The Stack: Why agents need more than just an LLM API key.
- PostgreSQL: Persistent storage for sessions, memories, and structured data.
- Qdrant: Vector database for semantic search and RAG.
- Arize Phoenix: Observability platform to trace and debug agent calls.
- The Glue: Docker Compose to run everything locally with one command.
Why This Stack?
A basic chatbot needs nothing but an API key. An agent needs infrastructure:
| Capability | Requires | Tool |
|---|---|---|
| Remember this conversation | Session storage | PostgreSQL |
| Remember facts about me | Long-term memory | PostgreSQL |
| Answer from my documents | Vector search | Qdrant |
| Debug why it failed | Call tracing | Arize Phoenix |
| Query business data | Relational database | PostgreSQL |
You could use files, SQLite, or in-memory stores for prototypes. We're using production-grade tools from day one so you learn patterns that scale.
Prerequisites
- Docker Desktop installed and running
- Python 3.11+ with uv package manager
- OpenAI API key (or compatible provider like Anthropic, Groq)
Project Setup
Start by creating the project with uv. This gives you a Python project with dependency management and a virtual environment—all lesson scripts will run from here.
uv init agents
cd agents
Install the core dependencies for the entire tutorial upfront. Each library serves a specific purpose across the lessons:
uv add \
agno \
openai \
"psycopg[binary]" \
sqlalchemy \
qdrant-client \
arize-phoenix \
opentelemetry-sdk \
opentelemetry-exporter-otlp \
openinference-instrumentation-agno \
python-dotenv \
"markitdown[pdf]" \
pypdf \
mcp \
yfinance \
"chonkie[semantic]"
| Package | Used For |
|---|---|
| agno | Agent framework (all lessons) |
| openai | LLM API calls |
| psycopg[binary] | PostgreSQL connections (history, memory, tools) |
| sqlalchemy | Database engine used by agno internally |
| qdrant-client | Vector database for RAG (Lesson 13) |
| arize-phoenix | Tracing and observability (all lessons) |
| opentelemetry-sdk | Telemetry pipeline |
| opentelemetry-exporter-otlp | Export traces to Phoenix via OTLP |
| openinference-instrumentation-agno | Auto-trace agno agent runs, tools, and memory into Phoenix |
| python-dotenv | Load the .env file |
| markitdown[pdf] | Document conversion for RAG ingestion (Lesson 13) |
| pypdf | PDF parsing (Lesson 13) |
| mcp | Model Context Protocol server/client (Lesson 12) |
| yfinance | Stock data for tool examples (Lesson 12) |
| chonkie[semantic] | Semantic chunking for RAG ingestion (Lesson 13) |
Now create the folder structure:
agents/
├── docker-compose.yml
├── tools/
│   ├── init.sql
│   ├── seed_data.sql
│   └── reset_data.py
├── .env
└── (lesson scripts will go here)
mkdir tools
Docker Compose
This file defines three services that start together. One command (docker compose up) gives you a complete backend.
services:
  phoenix:
    image: arizephoenix/phoenix:latest
    ports:
      - "6006:6006"
      - "4317:4317"
    environment:
      PHOENIX_SQL_DATABASE_URL: postgresql://ai:ai@postgres:5432/phoenix
    depends_on:
      postgres:
        condition: service_healthy

  postgres:
    image: postgres:17
    ports:
      - "5532:5432"
    environment:
      POSTGRES_USER: ai
      POSTGRES_PASSWORD: ai
      POSTGRES_DB: ai
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U ai"]
      interval: 5s
      timeout: 5s
      retries: 5
    volumes:
      - pgdata:/var/lib/postgresql/data
      - ./tools/init.sql:/docker-entrypoint-initdb.d/init.sql
      - ./tools/seed_data.sql:/seed_data.sql

  qdrant:
    image: qdrant/qdrant:latest
    ports:
      - "6333:6333"
      - "6334:6334"
    volumes:
      - qdrant_storage:/qdrant/storage

volumes:
  pgdata:
  qdrant_storage:
What Each Service Does
PostgreSQL 17 — The workhorse database. We'll use it for:
- ai database: Agent sessions, memories, and agno's internal tables
- phoenix database: Arize Phoenix stores its traces here
- crm_demo database: Fake CRM data for tool-calling exercises
Port 5532 (rather than the default 5432) avoids conflicts if you already have a local Postgres running.
Qdrant — Vector database optimized for similarity search. When your agent needs to find "relevant documents," it converts the query to a vector and searches Qdrant. We'll use it for RAG in Lesson 13.
Arize Phoenix — Open-source observability for LLM applications. Every agent call, tool invocation, and LLM request gets traced. When something breaks, Phoenix shows you exactly what happened. Access the UI at http://localhost:6006.
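Later lessons configure tracing properly; as a rough preview, the wiring is plain OpenTelemetry plus the openinference instrumentor for agno. A minimal sketch, assuming Phoenix's OTLP HTTP endpoint on port 6006 (treat it as illustrative, not the lesson's final setup):

```python
# Sketch: point agno traces at the local Phoenix instance (assumed endpoint).
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from openinference.instrumentation.agno import AgnoInstrumentor

provider = TracerProvider()
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="http://localhost:6006/v1/traces"))
)
AgnoInstrumentor().instrument(tracer_provider=provider)  # traces agent runs and tool calls
```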
Database Initialization
When Postgres starts for the first time (with an empty data volume), it runs any scripts in /docker-entrypoint-initdb.d/ in alphabetical order. We mount two files: init.sql runs automatically and sets up the schema, and it loads seed_data.sql (mounted at /seed_data.sql) via \i.
init.sql — Schema Setup
This creates the databases and tables we need:
-- Initialize databases for the AI Agents tutorial
-- Create Phoenix telemetry database
CREATE DATABASE phoenix;
-- Create CRM demo database
CREATE DATABASE crm_demo;
-- Connect to crm_demo and create schema + seed data
\c crm_demo
CREATE TABLE customers (
id SERIAL PRIMARY KEY,
name VARCHAR(100),
email VARCHAR(100),
company VARCHAR(100),
industry VARCHAR(50),
created_at DATE DEFAULT CURRENT_DATE
);
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
customer_id INT REFERENCES customers(id),
product VARCHAR(100),
amount DECIMAL(10,2),
status VARCHAR(20),
created_at DATE DEFAULT CURRENT_DATE
);
\i /seed_data.sql
Three databases serve different purposes:
- ai: Created automatically by Postgres (it's the default). Agno stores sessions and memories here.
- phoenix: Phoenix's trace storage. You won't query this directly.
- crm_demo: Our playground for tool-calling demos. Simple CRM with customers and orders.
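Once the stack is running (see Starting the Stack below), a quick psql check from the host confirms all three exist (the password is ai):

```bash
# List databases on the tutorial Postgres (port 5532)
psql -h localhost -p 5532 -U ai -c '\l'
```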
seed_data.sql — Test Data
Realistic-ish data for testing agent tools. Eight companies, various order patterns:
-- Seed data for CRM demo database
INSERT INTO customers (name, email, company, industry, created_at) VALUES
('Alice Johnson', 'alice@acme.com', 'Acme Corp', 'Manufacturing', '2024-01-15'),
('Bob Smith', 'bob@globex.com', 'Globex Inc', 'Technology', '2024-02-20'),
('Carol White', 'carol@initech.com', 'Initech', 'Finance', '2024-03-10'),
('David Lee', 'david@umbrella.com', 'Umbrella Corp', 'Healthcare', '2024-04-05'),
('Emma Davis', 'emma@stark.io', 'Stark Industries', 'Technology', '2024-05-12'),
('Frank Miller', 'frank@wayneent.com', 'Wayne Enterprises', 'Finance', '2024-06-18'),
('Grace Kim', 'grace@cyberdyne.ai', 'Cyberdyne Systems', 'Technology', '2024-07-22'),
('Henry Chen', 'henry@oscorp.com', 'Oscorp', 'Healthcare', '2024-08-30');
INSERT INTO orders (customer_id, product, amount, status, created_at) VALUES
-- Alice (Acme) - big spender, loyal
(1, 'Enterprise License', 5000.00, 'completed', '2024-01-20'),
(1, 'Support Package', 1200.00, 'completed', '2024-02-15'),
(1, 'Training Package', 2500.00, 'completed', '2024-06-10'),
(1, 'Enterprise License', 5000.00, 'completed', '2024-07-01'),
-- Bob (Globex) - growing account
(2, 'Starter License', 500.00, 'completed', '2024-02-25'),
(2, 'Professional License', 2000.00, 'completed', '2024-05-15'),
(2, 'Support Package', 1200.00, 'pending', '2024-09-01'),
-- Carol (Initech) - enterprise, some issues
(3, 'Enterprise License', 5000.00, 'completed', '2024-03-15'),
(3, 'Training Package', 2500.00, 'cancelled', '2024-04-20'),
(3, 'Support Package', 1200.00, 'completed', '2024-05-01'),
(3, 'Consulting', 8000.00, 'completed', '2024-08-15'),
-- David (Umbrella) - new, cautious
(4, 'Starter License', 500.00, 'completed', '2024-04-10'),
(4, 'Professional License', 2000.00, 'pending', '2024-09-05'),
-- Emma (Stark) - tech-savvy, fast mover
(5, 'Enterprise License', 5000.00, 'completed', '2024-05-20'),
(5, 'API Access', 3000.00, 'completed', '2024-06-01'),
(5, 'Support Package', 1200.00, 'completed', '2024-06-15'),
(5, 'Custom Integration', 15000.00, 'pending', '2024-09-10'),
-- Frank (Wayne) - big deal pending
(6, 'Enterprise License', 5000.00, 'completed', '2024-06-25'),
(6, 'Consulting', 8000.00, 'pending', '2024-09-01'),
(6, 'Training Package', 2500.00, 'pending', '2024-09-15'),
-- Grace (Cyberdyne) - AI company, power user
(7, 'Enterprise License', 5000.00, 'completed', '2024-07-28'),
(7, 'API Access', 3000.00, 'completed', '2024-08-05'),
(7, 'Custom Integration', 15000.00, 'completed', '2024-08-20'),
-- Henry (Oscorp) - refund case
(8, 'Professional License', 2000.00, 'completed', '2024-09-01'),
(8, 'Support Package', 1200.00, 'refunded', '2024-09-10');
Each customer has a story: Alice is loyal, Bob is growing, Carol had a cancelled order, Henry got a refund. This variety lets you test agent tools against realistic edge cases.
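Once the stack is up, a quick aggregate query makes a decent sanity check on the seed data, and it's the kind of question you'll later ask an agent to answer with a SQL tool. For example, against crm_demo:

```sql
-- Completed revenue per customer, biggest accounts first
SELECT c.name, c.company, SUM(o.amount) AS completed_revenue
FROM customers c
JOIN orders o ON o.customer_id = c.id
WHERE o.status = 'completed'
GROUP BY c.name, c.company
ORDER BY completed_revenue DESC;
```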
Reset Script
As you experiment, databases accumulate cruft—old sessions, test memories, failed experiments. This script returns everything to a clean state:
"""
Reset Script: Blank slate for all data.
Usage: uv run tools/reset_data.py
Clears:
- Agent memories, sessions, knowledge (ai database)
- Phoenix traces (phoenix database)
- CRM data (crm_demo database)
- Qdrant vectors
"""
from pathlib import Path
import psycopg
SEED_SQL = Path(__file__).parent / "seed_data.sql"
from qdrant_client import QdrantClient
# AI database - drop all user tables (recreated on next run)
conn = psycopg.connect("postgresql://ai:ai@localhost:5532/ai")
cur = conn.cursor()
cur.execute("""
SELECT schemaname, tablename FROM pg_tables
WHERE schemaname NOT IN ('pg_catalog', 'information_schema')
""")
tables = cur.fetchall()
for schema, table in tables:
cur.execute(f'DROP TABLE IF EXISTS "{schema}"."{table}" CASCADE')
conn.commit()
cur.close()
conn.close()
print(f"Dropped: {len(tables)} tables in ai database")
# CRM data (crm_demo database)
crm_conn = psycopg.connect("postgresql://ai:ai@localhost:5532/crm_demo")
crm_cur = crm_conn.cursor()
crm_cur.execute("TRUNCATE orders, customers RESTART IDENTITY CASCADE;")
crm_cur.execute(SEED_SQL.read_text())
crm_conn.commit()
crm_cur.close()
crm_conn.close()
print("Reset: CRM (8 customers, 25 orders)")
# Qdrant
qdrant = QdrantClient(url="http://localhost:6333")
qdrant.delete_collection("knowledge-demo")
qdrant.delete_collection("knowledge-semantic")
print("Deleted: Qdrant collections")
# Phoenix traces - may not exist on fresh install
phoenix_conn = psycopg.connect("postgresql://ai:ai@localhost:5532/phoenix")
phoenix_cur = phoenix_conn.cursor()
phoenix_cur.execute("""
DO $
BEGIN
TRUNCATE spans, traces, projects, span_annotations, trace_annotations,
project_sessions, project_session_annotations CASCADE;
EXCEPTION WHEN undefined_table THEN
NULL;
END $;
""")
phoenix_conn.commit()
phoenix_cur.close()
phoenix_conn.close()
print("Cleared: Phoenix traces")
print("\nReset complete.")
Run it whenever you want a fresh start:
uv run tools/reset_data.py
Environment Variables
Create a .env file in your project root. Agno and other tools will read from it:
OPENAI_API_KEY=sk-proj-...
OPENAI_MODEL_ID=gpt-4o-mini
PHOENIX_COLLECTOR_ENDPOINT=http://localhost:6006
Add .env to your .gitignore. Never commit API keys.
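To confirm the file is being picked up, a throwaway snippet using python-dotenv (the filename check_env.py is just an example) does the job:

```python
# check_env.py - verify that .env loads (example filename)
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the current directory
print("Model:", os.getenv("OPENAI_MODEL_ID"))
print("Key present:", bool(os.getenv("OPENAI_API_KEY")))
```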
Starting the Stack
# Start all services (first run downloads images)
docker compose up -d
# Verify everything is running
docker compose ps
# View logs if something fails
docker compose logs phoenix
After startup, verify each service:
| Service | URL | What to Check |
|---|---|---|
| Phoenix | http://localhost:6006 | Web UI loads |
| Qdrant | http://localhost:6333/dashboard | Dashboard shows "0 collections" |
| Postgres | psql -h localhost -p 5532 -U ai -d crm_demo | Can connect, tables exist |
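If you prefer the terminal, roughly equivalent checks look like this (adjust ports if you changed them in docker-compose.yml):

```bash
# Qdrant: a fresh install returns an empty collections list
curl http://localhost:6333/collections

# Postgres: list the CRM tables (password: ai)
psql -h localhost -p 5532 -U ai -d crm_demo -c '\dt'
```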
Stopping and Cleaning Up
# Stop services (data persists in volumes)
docker compose down
# Stop AND delete all data (full reset)
docker compose down -v
The -v flag removes volumes. Use it when you want to start completely fresh, including re-running init.sql.
Troubleshooting
Port conflicts: If 5532, 6006, or 6333 are in use, either stop the conflicting service or change ports in docker-compose.yml.
Postgres won't start: Check logs with docker compose logs postgres. Common issue: corrupted volume. Fix with docker compose down -v && docker compose up -d.
Phoenix shows no traces: Ensure PHOENIX_COLLECTOR_ENDPOINT is set and agents are configured to send telemetry (covered in later lessons).
Qdrant connection refused: The service takes a few seconds to start. Wait and retry.
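If a script races the containers at startup, a small wait loop (a sketch; it polls the standard collections endpoint) saves the guesswork:

```bash
# Poll until Qdrant answers on its default port
until curl -sf http://localhost:6333/collections > /dev/null; do
  echo "waiting for qdrant..."
  sleep 1
done
echo "Qdrant is up"
```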
What's Next
With infrastructure running, you're ready to build agents. In Lesson 9, we'll create a simple stateless agent—no memory, no tools, just LLM in → response out. From there, each lesson adds a capability: history, memory, tools, knowledge, teams.
The stack you just set up will support all of it.