
Lesson 13: Knowledge

Ground your agent's responses in your own documents using Retrieval-Augmented Generation (RAG).

Topics Covered
  • RAG Pipeline: How documents become searchable context.
  • Document Conversion: Turning PDFs, Word docs, and URLs into text.
  • Chunking Strategies: Fixed-size vs semantic chunking.
  • Vector Search: Finding relevant content by meaning.
  • Knowledge Integration: Connecting it all to your agent.

Why RAG?

LLMs know what they were trained on. They don't know:

  • Your company's internal docs
  • That PDF you downloaded yesterday
  • Content behind a login wall
  • Anything after their training cutoff

RAG solves this by retrieving relevant documents and injecting them into the prompt. The LLM generates responses grounded in your data, not just its training.

The RAG Pipeline

Ingestion (once per document):
Document → Convert to text → Chunk → Embed → Store in Qdrant

Query (every user message):
User question → Embed → Search Qdrant → Retrieve chunks → Inject into prompt → LLM responds

Let's break down each step.

Document Conversion

Before chunking, you need plain text. But documents come in many formats: PDF, Word, PowerPoint, HTML, even YouTube videos.

Tools like MarkItDown and Docling handle this conversion:

  • MarkItDown: Microsoft's tool. Handles Office formats, PDFs, HTML, YouTube transcripts, images (via OCR), and audio (via transcription). Outputs clean Markdown.
  • Docling: IBM's tool. Strong on complex PDFs with tables, figures, and multi-column layouts. Preserves document structure.

Both solve the same problem: take messy document formats and produce clean, parseable text. For most use cases, MarkItDown is simpler. For complex technical PDFs, Docling often does better.
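
As a quick sketch of both APIs (assuming the markitdown and docling packages are installed; Docling's exact method names may differ across versions):

from markitdown import MarkItDown
from docling.document_converter import DocumentConverter

source = "manual.pdf"  # hypothetical input file

# MarkItDown: one call, returns Markdown-like plain text
text = MarkItDown().convert(source).text_content

# Docling: structure-aware conversion, then export to Markdown
doc = DocumentConverter().convert(source).document
markdown = doc.export_to_markdown()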

Chunking: Fixed-Size vs Semantic

Once you have text, you need to split it into chunks small enough to embed and retrieve efficiently. Two approaches:

Fixed-Size Chunking

Split text every N characters with some overlap:

CHUNK_SIZE = 500
CHUNK_OVERLAP = 50

def chunk_text(text, size=CHUNK_SIZE, overlap=CHUNK_OVERLAP):
    """Split text into overlapping chunks."""
    chunks = []
    start = 0
    while start < len(text):
        end = start + size
        chunks.append(text[start:end])
        start = end - overlap
    return chunks
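
For example, with the defaults above a 1,200-character input produces three overlapping chunks:

chunks = chunk_text("x" * 1200)
print([len(c) for c in chunks])  # [500, 500, 300]; consecutive chunks share 50 characters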

Simple and predictable. But it might split mid-sentence or separate related content.

Semantic Chunking

Uses embeddings to find natural topic boundaries. Compares consecutive sentences—when similarity drops below a threshold, that's a good place to split:

SemanticChunking(
    embedder=embedder,
    chunk_size=500,
    similarity_threshold=0.5,
)

Chunks stay coherent because splits happen where meaning changes.
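
To make the boundary detection concrete, here is an illustrative sketch of the idea (not Agno's actual implementation). It assumes an embed(sentences) helper that returns one vector per sentence, for example backed by OpenAIEmbedder:

import numpy as np

def cosine(a, b):
    a, b = np.asarray(a), np.asarray(b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def semantic_split(sentences, embed, threshold=0.5):
    """Start a new chunk wherever consecutive sentences stop resembling each other."""
    vectors = embed(sentences)  # one embedding per sentence (assumed helper)
    chunks, current = [], [sentences[0]]
    for prev_vec, cur_vec, sentence in zip(vectors, vectors[1:], sentences[1:]):
        if cosine(prev_vec, cur_vec) < threshold:
            chunks.append(" ".join(current))  # similarity dropped: close the chunk here
            current = []
        current.append(sentence)
    chunks.append(" ".join(current))
    return chunks

Agno's SemanticChunking also respects chunk_size as a soft maximum, which this sketch omits.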

Which to Use?

For short, well-structured documents, the difference is often negligible. Fixed-size is faster and simpler.

For long documents with distinct sections (manuals, reports, books), semantic chunking tends to produce better retrieval because related content stays together.

In practice: start with fixed-size. If retrieval quality disappoints, try semantic.

Ingestion Scripts

Fixed-Size Ingestion

tools/ingest_knowledge.py
"""
Knowledge Ingestion Script

Usage: uv run tools/ingest_knowledge.py <source>

Examples:
uv run tools/ingest_knowledge.py https://example.com/doc.pdf
uv run tools/ingest_knowledge.py /path/to/file.docx
uv run tools/ingest_knowledge.py "https://youtube.com/watch?v=xxx"

Supported: PDF, Word, PowerPoint, Excel, HTML, YouTube, images, audio
"""

import sys
from dotenv import load_dotenv
from markitdown import MarkItDown
from agno.knowledge.embedder.openai import OpenAIEmbedder
from agno.knowledge.knowledge import Knowledge
from agno.vectordb.qdrant import Qdrant
from agno.db.postgres import PostgresDb

load_dotenv()

CHUNK_SIZE = 500
CHUNK_OVERLAP = 50


def chunk_text(text, size=CHUNK_SIZE, overlap=CHUNK_OVERLAP):
    """Split text into overlapping chunks."""
    chunks = []
    start = 0
    while start < len(text):
        end = start + size
        chunks.append(text[start:end])
        start = end - overlap
    return chunks


def main():
    source = sys.argv[1]
    print(f"Ingesting: {source}")

    # Parse document
    md = MarkItDown()
    result = md.convert(source)
    text = result.text_content
    print(f"Parsed: {len(text)} chars")

    # Chunk
    chunks = chunk_text(text)
    print(f"Chunks: {len(chunks)}")

    # Setup knowledge base
    embedder = OpenAIEmbedder(id="text-embedding-3-small")
    vector_db = Qdrant(collection="knowledge-demo", url="http://localhost:6333", embedder=embedder)
    contents_db = PostgresDb(
        db_url="postgresql+psycopg://ai:ai@localhost:5532/ai",
        knowledge_table="knowledge_contents",
    )
    knowledge = Knowledge(vector_db=vector_db, contents_db=contents_db)

    # Add chunks
    for i, chunk in enumerate(chunks):
        knowledge.add_content(
            name=f"{source}:chunk:{i}",
            text_content=chunk,
            metadata={"source": source, "chunk_index": i},
        )

    print(f"Done. Added {len(chunks)} chunks to knowledge base.")


main()

How it works:

  1. MarkItDown converts the source (PDF, URL, YouTube, etc.) to text
  2. Text is split into 500-character chunks with 50-char overlap
  3. Each chunk is embedded and stored in Qdrant (you can verify this with the quick check below)
  4. Metadata tracks source and position for debugging
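
To confirm the chunks actually landed in Qdrant (step 3), you can query the collection directly with the qdrant-client package. This quick check is not part of the lesson scripts:

from qdrant_client import QdrantClient

client = QdrantClient(url="http://localhost:6333")
# The stored vector count should match the chunk count printed by the script
print(client.count(collection_name="knowledge-demo"))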

Semantic Ingestion

tools/ingest_knowledge_semantic.py
"""
Semantic Knowledge Ingestion Script

Usage: uv run tools/ingest_knowledge_semantic.py <source>

Examples:
uv run tools/ingest_knowledge_semantic.py https://example.com/doc.pdf
uv run tools/ingest_knowledge_semantic.py /path/to/file.pdf

Uses semantic chunking (splits at topic boundaries) instead of fixed-size chunks.
"""

import sys
from dotenv import load_dotenv
from agno.knowledge.embedder.openai import OpenAIEmbedder
from agno.knowledge.chunking.semantic import SemanticChunking
from agno.knowledge.knowledge import Knowledge
from agno.knowledge.reader.pdf_reader import PDFReader
from agno.vectordb.qdrant import Qdrant
from agno.db.postgres import PostgresDb

load_dotenv()


def main():
    source = sys.argv[1]
    print(f"Ingesting: {source}")

    embedder = OpenAIEmbedder(id="text-embedding-3-small")
    vector_db = Qdrant(collection="knowledge-semantic", url="http://localhost:6333", embedder=embedder)
    contents_db = PostgresDb(
        db_url="postgresql+psycopg://ai:ai@localhost:5532/ai",
        knowledge_table="knowledge_semantic",
    )
    knowledge = Knowledge(vector_db=vector_db, contents_db=contents_db)

    reader = PDFReader(
        chunking_strategy=SemanticChunking(
            embedder=embedder,
            chunk_size=500,
            similarity_threshold=0.5,
        )
    )

    # skip_if_exists=True is default
    if source.startswith("http"):
        knowledge.add_content(url=source, reader=reader)
    else:
        knowledge.add_content(path=source, reader=reader)

    print("Done.")


main()

How it works:

  1. PDFReader handles document parsing
  2. SemanticChunking analyzes sentence embeddings to find topic boundaries
  3. Splits occur where consecutive sentences have similarity below 0.5
  4. Chunks respect chunk_size as a soft maximum

Note: This script uses Agno's built-in PDFReader. For other formats with semantic chunking, you'd need to adapt the approach.

The Agent

Once documents are ingested, connect the knowledge base to your agent:

13-knowledge.py
"""
Lesson 13: Knowledge (RAG)

Agent searches vector database for relevant context before responding. Documents
are chunked, embedded, and stored in Qdrant. On each query, similar chunks are
retrieved and injected into the prompt as context.

Run: uv run 13-knowledge.py
Try: "Ingredients for Massaman curry" | "How to make Tom Yum"

Observe in Phoenix (http://localhost:6006):
- Vector search span before LLM call
- Retrieved chunks in context
- Embedding calls for query

Ingest: uv run tools/ingest_knowledge.py <source>
Examples: ./recipe.pdf | https://example.com | "https://youtube.com/watch?v=xxx"
Reset: uv run tools/reset_data.py
"""

import os
from dotenv import load_dotenv
from phoenix.otel import register
from agno.agent import Agent
from agno.models.openai import OpenAIChat
from agno.db.postgres import PostgresDb
from agno.knowledge.embedder.openai import OpenAIEmbedder
from agno.knowledge.knowledge import Knowledge
from agno.vectordb.qdrant import Qdrant

load_dotenv()

register(project_name="13-knowledge", auto_instrument=True, batch=True, verbose=True)

db = PostgresDb(db_url="postgresql+psycopg://ai:ai@localhost:5532/ai")
embedder = OpenAIEmbedder(id="text-embedding-3-small")

vector_db = Qdrant(collection="knowledge-demo", url="http://localhost:6333", embedder=embedder)
contents_db = PostgresDb(
    db_url="postgresql+psycopg://ai:ai@localhost:5532/ai",
    knowledge_table="knowledge_contents",
)
knowledge = Knowledge(vector_db=vector_db, contents_db=contents_db)

agent = Agent(
    name="Knowledge Assistant",
    model=OpenAIChat(id=os.getenv("OPENAI_MODEL_ID")),
    instructions="You are a helpful assistant. Answer questions using the knowledge base. Be concise.",
    knowledge=knowledge,
    search_knowledge=True,
    db=db,
    user_id="demo-user",
    enable_user_memories=True,
    add_history_to_context=True,
    num_history_runs=5,
    markdown=True,
)

agent.cli_app(stream=True)

What's New

Knowledge setup:

embedder = OpenAIEmbedder(id="text-embedding-3-small")
vector_db = Qdrant(collection="knowledge-demo", url="http://localhost:6333", embedder=embedder)
contents_db = PostgresDb(...)
knowledge = Knowledge(vector_db=vector_db, contents_db=contents_db)
  • embedder: Converts text to vectors (same model used for ingestion and queries)
  • vector_db: Qdrant stores and searches vectors
  • contents_db: Postgres stores the actual text content
  • Knowledge: Coordinates both

Agent integration:

knowledge=knowledge,
search_knowledge=True,
  • knowledge: The knowledge base to search
  • search_knowledge: Automatically search before each response

Semantic Variant

Same agent, different collection:

13-knowledge-semantic.py
"""
Lesson 13b: Knowledge with Semantic Chunking

Unlike 13a's fixed-size chunks, semantic chunking uses embeddings to find
natural topic boundaries. Splits occur where meaning changes significantly,
keeping related content together.

Run: uv run 13-knowledge-semantic.py
Try: "Ingredients for Massaman curry" | "How to make Tom Yum"

Observe in Phoenix (http://localhost:6006):
- Chunks aligned to topic boundaries
- Compare retrieval quality vs 13a

Ingest: uv run tools/ingest_knowledge_semantic.py <pdf-url-or-path>
Reset: uv run tools/reset_data.py
"""

import os
from dotenv import load_dotenv
from phoenix.otel import register
from agno.agent import Agent
from agno.models.openai import OpenAIChat
from agno.db.postgres import PostgresDb
from agno.knowledge.embedder.openai import OpenAIEmbedder
from agno.knowledge.knowledge import Knowledge
from agno.vectordb.qdrant import Qdrant

load_dotenv()

register(project_name="13-knowledge-semantic", auto_instrument=True, batch=True, verbose=True)

db = PostgresDb(db_url="postgresql+psycopg://ai:ai@localhost:5532/ai")
embedder = OpenAIEmbedder(id="text-embedding-3-small")

vector_db = Qdrant(collection="knowledge-semantic", url="http://localhost:6333", embedder=embedder)
contents_db = PostgresDb(
    db_url="postgresql+psycopg://ai:ai@localhost:5532/ai",
    knowledge_table="knowledge_semantic",
)

knowledge = Knowledge(vector_db=vector_db, contents_db=contents_db)

agent = Agent(
    name="Knowledge Assistant",
    model=OpenAIChat(id=os.getenv("OPENAI_MODEL_ID")),
    instructions="You are a helpful assistant. Answer questions using the knowledge base. Be concise.",
    knowledge=knowledge,
    search_knowledge=True,
    db=db,
    user_id="demo-user",
    enable_user_memories=True,
    add_history_to_context=True,
    num_history_runs=5,
    markdown=True,
)

agent.cli_app(stream=True)

The only differences: collection="knowledge-semantic" and knowledge_table="knowledge_semantic". Different storage, same agent pattern.

Try It

First, ingest some content:

# A PDF from the web
uv run tools/ingest_knowledge.py https://example.com/cookbook.pdf

# A local file
uv run tools/ingest_knowledge.py ./recipes.docx

# A YouTube video (extracts transcript)
uv run tools/ingest_knowledge.py "https://youtube.com/watch?v=dQw4w9WgXcQ"

Then query:

uv run 13-knowledge.py
> What ingredients do I need for Massaman curry?
Based on the knowledge base, Massaman curry requires:
- Chicken or beef
- Massaman curry paste
- Coconut milk
- Potatoes
- Peanuts
- Fish sauce, palm sugar
...

> How long does it take to cook?
According to the recipe, total cook time is about 45 minutes...

The agent retrieves relevant chunks from your documents and uses them to answer.

Observe in Phoenix

Open http://localhost:6006 and look at traces for 13-knowledge.

You'll see new spans:

  1. Embedding call: Your question gets converted to a vector
  2. Vector search: Qdrant finds similar chunks
  3. LLM call: Retrieved chunks appear in the context

Look at the LLM input—you'll see your instructions plus the retrieved document chunks, then the user's question. The LLM answers based on that context.

How Retrieval Works

When you ask "What ingredients do I need?":

  1. Question is embedded using text-embedding-3-small
  2. Qdrant finds chunks with similar embeddings (cosine similarity; see the sketch below)
  3. Top-k chunks (default: 5) are retrieved
  4. Chunks are injected into the system prompt as context
  5. LLM generates a response grounded in that context
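
Step 2 boils down to cosine similarity between the query embedding and each stored chunk embedding. A minimal sketch of that comparison, assuming OPENAI_API_KEY is set and the openai and numpy packages are available:

import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(text: str) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=text)
    return np.array(resp.data[0].embedding)

query = embed("What ingredients do I need for Massaman curry?")
chunk = embed("Massaman curry uses curry paste, coconut milk, potatoes, and peanuts.")
similarity = float(query @ chunk / (np.linalg.norm(query) * np.linalg.norm(chunk)))
print(f"cosine similarity: {similarity:.3f}")  # higher means a closer semantic match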

The quality depends on:

  • Chunk quality: Do chunks contain coherent, complete information?
  • Embedding model: Does it capture semantic meaning well?
  • Top-k setting: Too few misses relevant content, too many adds noise

Key Concepts

  • RAG: Retrieve relevant docs, augment the prompt, generate
  • Chunking: Splitting documents into embeddable pieces
  • Embedding: Converting text to vectors for similarity search
  • Vector DB: Qdrant stores and searches embeddings
  • Retrieval: Finding relevant chunks by semantic similarity

What's Next

Your agent now has memory, tools, and knowledge. In Lesson 14, we combine multiple specialized agents into a team—a leader that coordinates specialists to handle complex tasks.