Lesson 8: Setup
Before we build agents, we need infrastructure. This lesson sets up the services that make agents useful—databases for memory, vector stores for knowledge, and observability for debugging.
- The Stack: Why agents need more than just an LLM API key.
- PostgreSQL: Persistent storage for sessions, memories, and structured data.
- Qdrant: Vector database for semantic search and RAG.
- Arize Phoenix: Observability platform to trace and debug agent calls.
- The Glue: Docker Compose to run everything locally with one command.
Why This Stack?
A basic chatbot needs nothing but an API key. An agent needs infrastructure:
| Capability | Requires | Tool |
|---|---|---|
| Remember this conversation | Session storage | PostgreSQL |
| Remember facts about me | Long-term memory | PostgreSQL |
| Answer from my documents | Vector search | Qdrant |
| Debug why it failed | Call tracing | Arize Phoenix |
| Query business data | Relational database | PostgreSQL |
You could use files, SQLite, or in-memory stores for prototypes. We're using production-grade tools from day one so you learn patterns that scale.
Prerequisites
- Docker Desktop installed and running
- Python 3.11+ with uv package manager
- OpenAI API key (or compatible provider like Anthropic, Groq)
Project Setup
Start by creating the project with uv. This gives you a Python project with dependency management and a virtual environment—all lesson scripts will run from here.
uv init agents
cd agents
Install the core dependencies for the entire tutorial upfront. Each library serves a specific purpose across the lessons:
uv add \
agno \
openai \
"psycopg[binary]" \
sqlalchemy \
qdrant-client \
arize-phoenix \
opentelemetry-sdk \
opentelemetry-exporter-otlp \
openinference-instrumentation-agno \
python-dotenv \
"markitdown[pdf]" \
pypdf \
mcp \
yfinance \
"chonkie[semantic]"
| Package | Used For |
|---|---|
| agno | Agent framework (all lessons) |
| openai | LLM API calls |
| psycopg[binary] | PostgreSQL connections (history, memory, tools) |
| sqlalchemy | Database engine used by agno internally |
| qdrant-client | Vector database for RAG (Lesson 13) |
| arize-phoenix | Tracing and observability (all lessons) |
| opentelemetry-sdk | Telemetry pipeline |
| opentelemetry-exporter-otlp | Export traces to Phoenix via OTLP |
| openinference-instrumentation-agno | Auto-trace agno agent runs, tools, and memory into Phoenix |
| python-dotenv | Load the .env file |
| markitdown[pdf] | Document conversion for RAG ingestion (Lesson 13) |
| pypdf | PDF parsing (Lesson 13) |
| mcp | Model Context Protocol server/client (Lesson 12) |
| yfinance | Stock data for tool examples (Lesson 12) |
| chonkie[semantic] | Semantic chunking for RAG ingestion (Lesson 13) |
Now create the folder structure:
agents/
├── docker-compose.yml
├── tools/
│   ├── init.sql
│   ├── seed_data.sql
│   └── reset_data.py
├── .env
└── (lesson scripts will go here)
mkdir tools
Docker Compose
This file defines three services that start together. One command (docker compose up) gives you a complete backend.
services:
  phoenix:
    image: arizephoenix/phoenix:latest
    ports:
      - "6006:6006"
      - "4317:4317"
    environment:
      PHOENIX_SQL_DATABASE_URL: postgresql://ai:ai@postgres:5432/phoenix
    depends_on:
      postgres:
        condition: service_healthy

  postgres:
    image: postgres:17
    ports:
      - "5532:5432"
    environment:
      POSTGRES_USER: ai
      POSTGRES_PASSWORD: ai
      POSTGRES_DB: ai
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U ai"]
      interval: 5s
      timeout: 5s
      retries: 5
    volumes:
      - pgdata:/var/lib/postgresql/data
      - ./tools/init.sql:/docker-entrypoint-initdb.d/init.sql
      - ./tools/seed_data.sql:/seed_data.sql

  qdrant:
    image: qdrant/qdrant:latest
    ports:
      - "6333:6333"
      - "6334:6334"
    volumes:
      - qdrant_storage:/qdrant/storage

volumes:
  pgdata:
  qdrant_storage:
What Each Service Does
PostgreSQL 17 — The workhorse database. We'll use it for:
- ai database: Agent sessions, memories, and agno's internal tables
- phoenix database: Arize Phoenix stores its traces here
- crm_demo database: Fake CRM data for tool-calling exercises
Port 5532 (rather than the default 5432) avoids conflicts if you already have a local Postgres running.
Qdrant — Vector database optimized for similarity search. When your agent needs to find "relevant documents," it converts the query to a vector and searches Qdrant. We'll use it for RAG in Lesson 13.
Arize Phoenix — Open-source observability for LLM applications. Every agent call, tool invocation, and LLM request gets traced. When something breaks, Phoenix shows you exactly what happened. Access the UI at http://localhost:6006.
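Later lessons configure tracing properly; as a rough preview, the wiring is plain OpenTelemetry plus the openinference instrumentor for agno. A minimal sketch, assuming Phoenix's OTLP HTTP endpoint on port 6006 (treat it as illustrative, not the lesson's final setup):

```python
# Sketch: point agno traces at the local Phoenix instance (assumed endpoint).
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from openinference.instrumentation.agno import AgnoInstrumentor

provider = TracerProvider()
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="http://localhost:6006/v1/traces"))
)
AgnoInstrumentor().instrument(tracer_provider=provider)  # traces agent runs and tool calls
```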
Database Initialization
When Postgres starts for the first time (with an empty data volume), it runs any scripts in /docker-entrypoint-initdb.d/ in alphabetical order. We mount two files: init.sql runs automatically and sets up the schema, and it loads seed_data.sql (mounted at /seed_data.sql) via \i.
init.sql — Schema Setup
This creates the databases and tables we need:
-- Initialize databases for the AI Agents tutorial
-- Create Phoenix telemetry database
CREATE DATABASE phoenix;
-- Create CRM demo database
CREATE DATABASE crm_demo;
-- Connect to crm_demo and create schema + seed data
\c crm_demo
CREATE TABLE customers (
id SERIAL PRIMARY KEY,
name VARCHAR(100),
email VARCHAR(100),
company VARCHAR(100),
industry VARCHAR(50),
created_at DATE DEFAULT CURRENT_DATE
);
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
customer_id INT REFERENCES customers(id),
product VARCHAR(100),
amount DECIMAL(10,2),
status VARCHAR(20),
created_at DATE DEFAULT CURRENT_DATE
);
\i /seed_data.sql
Three databases serve different purposes:
- ai: Created automatically by Postgres (it's the default). Agno stores sessions and memories here.
- phoenix: Phoenix's trace storage. You won't query this directly.
- crm_demo: Our playground for tool-calling demos. Simple CRM with customers and orders.
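Once the stack is running (see Starting the Stack below), a quick psql check from the host confirms all three exist (the password is ai):

```bash
# List databases on the tutorial Postgres (port 5532)
psql -h localhost -p 5532 -U ai -c '\l'
```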
seed_data.sql — Test Data
Realistic-ish data for testing agent tools. Eight companies, various order patterns:
-- Seed data for CRM demo database
INSERT INTO customers (name, email, company, industry, created_at) VALUES
('Alice Johnson', 'alice@acme.com', 'Acme Corp', 'Manufacturing', '2024-01-15'),
('Bob Smith', 'bob@globex.com', 'Globex Inc', 'Technology', '2024-02-20'),
('Carol White', 'carol@initech.com', 'Initech', 'Finance', '2024-03-10'),
('David Lee', 'david@umbrella.com', 'Umbrella Corp', 'Healthcare', '2024-04-05'),
('Emma Davis', 'emma@stark.io', 'Stark Industries', 'Technology', '2024-05-12'),
('Frank Miller', 'frank@wayneent.com', 'Wayne Enterprises', 'Finance', '2024-06-18'),
('Grace Kim', 'grace@cyberdyne.ai', 'Cyberdyne Systems', 'Technology', '2024-07-22'),
('Henry Chen', 'henry@oscorp.com', 'Oscorp', 'Healthcare', '2024-08-30');
INSERT INTO orders (customer_id, product, amount, status, created_at) VALUES
-- Alice (Acme) - big spender, loyal
(1, 'Enterprise License', 5000.00, 'completed', '2024-01-20'),
(1, 'Support Package', 1200.00, 'completed', '2024-02-15'),
(1, 'Training Package', 2500.00, 'completed', '2024-06-10'),
(1, 'Enterprise License', 5000.00, 'completed', '2024-07-01'),
-- Bob (Globex) - growing account
(2, 'Starter License', 500.00, 'completed', '2024-02-25'),
(2, 'Professional License', 2000.00, 'completed', '2024-05-15'),
(2, 'Support Package', 1200.00, 'pending', '2024-09-01'),
-- Carol (Initech) - enterprise, some issues
(3, 'Enterprise License', 5000.00, 'completed', '2024-03-15'),
(3, 'Training Package', 2500.00, 'cancelled', '2024-04-20'),
(3, 'Support Package', 1200.00, 'completed', '2024-05-01'),
(3, 'Consulting', 8000.00, 'completed', '2024-08-15'),
-- David (Umbrella) - new, cautious
(4, 'Starter License', 500.00, 'completed', '2024-04-10'),
(4, 'Professional License', 2000.00, 'pending', '2024-09-05'),
-- Emma (Stark) - tech-savvy, fast mover
(5, 'Enterprise License', 5000.00, 'completed', '2024-05-20'),
(5, 'API Access', 3000.00, 'completed', '2024-06-01'),
(5, 'Support Package', 1200.00, 'completed', '2024-06-15'),
(5, 'Custom Integration', 15000.00, 'pending', '2024-09-10'),
-- Frank (Wayne) - big deal pending
(6, 'Enterprise License', 5000.00, 'completed', '2024-06-25'),
(6, 'Consulting', 8000.00, 'pending', '2024-09-01'),
(6, 'Training Package', 2500.00, 'pending', '2024-09-15'),
-- Grace (Cyberdyne) - AI company, power user
(7, 'Enterprise License', 5000.00, 'completed', '2024-07-28'),
(7, 'API Access', 3000.00, 'completed', '2024-08-05'),
(7, 'Custom Integration', 15000.00, 'completed', '2024-08-20'),
-- Henry (Oscorp) - refund case
(8, 'Professional License', 2000.00, 'completed', '2024-09-01'),
(8, 'Support Package', 1200.00, 'refunded', '2024-09-10');
Each customer has a story: Alice is loyal, Bob is growing, Carol had a cancelled order, Henry got a refund. This variety lets you test agent tools against realistic edge cases.
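Once the stack is up, a quick aggregate query makes a decent sanity check on the seed data, and it's the kind of question you'll later ask an agent to answer with a SQL tool. For example, against crm_demo:

```sql
-- Completed revenue per customer, biggest accounts first
SELECT c.name, c.company, SUM(o.amount) AS completed_revenue
FROM customers c
JOIN orders o ON o.customer_id = c.id
WHERE o.status = 'completed'
GROUP BY c.name, c.company
ORDER BY completed_revenue DESC;
```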
Reset Script
As you experiment, databases accumulate cruft—old sessions, test memories, failed experiments. This script returns everything to a clean state:
"""
Reset Script: Blank slate for all data.
Usage: uv run tools/reset_data.py
Clears:
- Agent memories, sessions, knowledge (ai database)
- Phoenix traces (phoenix database)
- CRM data (crm_demo database)
- Qdrant vectors
"""
from pathlib import Path
import psycopg
SEED_SQL = Path(__file__).parent / "seed_data.sql"
from qdrant_client import QdrantClient
# AI database - drop all user tables (recreated on next run)
conn = psycopg.connect("postgresql://ai:ai@localhost:5532/ai")
cur = conn.cursor()
cur.execute("""
SELECT schemaname, tablename FROM pg_tables
WHERE schemaname NOT IN ('pg_catalog', 'information_schema')
""")
tables = cur.fetchall()
for schema, table in tables:
cur.execute(f'DROP TABLE IF EXISTS "{schema}"."{table}" CASCADE')
conn.commit()
cur.close()
conn.close()
print(f"Dropped: {len(tables)} tables in ai database")
# CRM data (crm_demo database)
crm_conn = psycopg.connect("postgresql://ai:ai@localhost:5532/crm_demo")
crm_cur = crm_conn.cursor()
crm_cur.execute("TRUNCATE orders, customers RESTART IDENTITY CASCADE;")
crm_cur.execute(SEED_SQL.read_text())
crm_conn.commit()
crm_cur.close()
crm_conn.close()
print("Reset: CRM (8 customers, 25 orders)")
# Qdrant
qdrant = QdrantClient(url="http://localhost:6333")
qdrant.delete_collection("knowledge-demo")
qdrant.delete_collection("knowledge-semantic")
print("Deleted: Qdrant collections")
# Phoenix traces - may not exist on fresh install
phoenix_conn = psycopg.connect("postgresql://ai:ai@localhost:5532/phoenix")
phoenix_cur = phoenix_conn.cursor()
phoenix_cur.execute("""
DO $
BEGIN
TRUNCATE spans, traces, projects, span_annotations, trace_annotations,
project_sessions, project_session_annotations CASCADE;
EXCEPTION WHEN undefined_table THEN
NULL;
END $;
""")
phoenix_conn.commit()
phoenix_cur.close()
phoenix_conn.close()
print("Cleared: Phoenix traces")
print("\nReset complete.")
Run it whenever you want a fresh start:
uv run tools/reset_data.py
Environment Variables
Create a .env file in your project root. Agno and other tools will read from it:
OPENAI_API_KEY=sk-proj-...
OPENAI_MODEL_ID=gpt-4o-mini
PHOENIX_COLLECTOR_ENDPOINT=http://localhost:6006
Add .env to your .gitignore. Never commit API keys.
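To confirm the file is being picked up, a throwaway snippet using python-dotenv (the filename check_env.py is just an example) does the job:

```python
# check_env.py - verify that .env loads (example filename)
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the current directory
print("Model:", os.getenv("OPENAI_MODEL_ID"))
print("Key present:", bool(os.getenv("OPENAI_API_KEY")))
```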
Starting the Stack
# Start all services (first run downloads images)
docker compose up -d
# Verify everything is running
docker compose ps
# View logs if something fails
docker compose logs phoenix
After startup, verify each service:
| Service | URL | What to Check |
|---|---|---|
| Phoenix | http://localhost:6006 | Web UI loads |
| Qdrant | http://localhost:6333/dashboard | Dashboard shows "0 collections" |
| Postgres | psql -h localhost -p 5532 -U ai -d crm_demo | Can connect, tables exist |
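If you prefer the terminal, roughly equivalent checks look like this (adjust ports if you changed them in docker-compose.yml):

```bash
# Qdrant: a fresh install returns an empty collections list
curl http://localhost:6333/collections

# Postgres: list the CRM tables (password: ai)
psql -h localhost -p 5532 -U ai -d crm_demo -c '\dt'
```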
Stopping and Cleaning Up
# Stop services (data persists in volumes)
docker compose down
# Stop AND delete all data (full reset)
docker compose down -v
The -v flag removes volumes. Use it when you want to start completely fresh, including re-running init.sql.
Troubleshooting
Port conflicts: If 5532, 6006, or 6333 are in use, either stop the conflicting service or change ports in docker-compose.yml.
Postgres won't start: Check logs with docker compose logs postgres. Common issue: corrupted volume. Fix with docker compose down -v && docker compose up -d.
Phoenix shows no traces: Ensure PHOENIX_COLLECTOR_ENDPOINT is set and agents are configured to send telemetry (covered in later lessons).
Qdrant connection refused: The service takes a few seconds to start. Wait and retry.
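If a script races the containers at startup, a small wait loop (a sketch; it polls the standard collections endpoint) saves the guesswork:

```bash
# Poll until Qdrant answers on its default port
until curl -sf http://localhost:6333/collections > /dev/null; do
  echo "waiting for qdrant..."
  sleep 1
done
echo "Qdrant is up"
```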
What's Next
With infrastructure running, you're ready to build agents. In Lesson 9, we'll create a simple stateless agent—no memory, no tools, just LLM in → response out. From there, each lesson adds a capability: history, memory, tools, knowledge, teams.
The stack you just set up will support all of it.