
Lesson 3: Vector DBs Basics

Topics Covered
  • What a vector database is and its main building blocks (collections, points, payloads).
  • How payload filters and indexing work in Qdrant.
  • Fixed constraints for vector size and distance metric.
  • How HNSW indexing enables fast similarity search.

When you have just a handful of embeddings, you can compare them in memory using Python and numpy. But as soon as you have thousands or millions of them, you need something faster and more organized. Enter a vector database.

A vector database stores your embeddings and lets you search for the most similar ones to a given vector. It’s like a search engine, but instead of matching words, it matches meaning based on vector similarity.
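
Before reaching for one, it helps to see the brute-force baseline it replaces. The sketch below uses made-up 384-dimensional vectors and scores a query against every stored embedding with numpy; this is fine for a few thousand vectors, but the cost grows linearly with the number of points.

import numpy as np

# Made-up embeddings (one per row) and a made-up query vector, for illustration only.
embeddings = np.random.rand(10_000, 384)
query = np.random.rand(384)

# Cosine similarity of the query against every stored vector, then take the top 5.
norms = np.linalg.norm(embeddings, axis=1) * np.linalg.norm(query)
scores = embeddings @ query / norms
top5 = np.argsort(scores)[-5:][::-1]
print(top5, scores[top5])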

Three popular options are:

  • Qdrant: Open-source, fast, supports filters and metadata, easy to run locally or in the cloud.
  • ChromaDB: Open-source, simple to set up, good for quick experiments and smaller projects.
  • Pinecone: A fully managed cloud service that scales easily, but is paid for larger workloads.

Core Parts of a Vector DB (Qdrant)

Before we start writing code, let’s break down the main building blocks you should understand when working with Qdrant.

Collection

A collection is like a table in a relational database. It’s where related records (called points) are stored together.

Just like all rows in an RDBMS table follow the same schema, all vectors in a collection share the same size (dimensionality, e.g., 384 or 768) and must use the same distance metric (cosine, dot, or Euclidean), defined when creating a collection.

Collections help keep your data organized. For example, you might store all user profile embeddings in one collection, and product description embeddings in another. This separation keeps searches faster and avoids mixing vectors that were generated by different models or for different purposes.

Point

A point is a single record in a collection. It has three parts:

  1. ID: a unique identifier (integer or string).
  2. Vector: the embedding itself (list of numbers).
  3. Payload: optional metadata that describes the point (e.g., {"type": "note", "user": "alice"}).

Example:

{
  "id": "3f9a1b82-8c2d-4a46-b6c5-9d2a6e84c1f5",
  "vector": [0.12, -0.09, ...],
  "payload": {
    "appliance": "dishwasher",
    "manufacturer": "Zanussi",
    "source_document": "https://example.eu-central-1.amazonaws.com"
  }
}

Payload

Each point in a vector database can include a payload - extra metadata stored alongside the vector. This can be anything in a JSON-like format, such as tags, categories, user IDs, timestamps, descriptions, or even a link to a source document in object storage.

The payload isn’t just for labels. You can use it to filter searches, for example, only looking at points where the appliance is "dishwasher" or narrowing results to a specific user or topic. This is especially useful when your database holds mixed content and you want to search within a smaller subset.

To make filtering faster, Qdrant can create indexes on payload fields, much like a traditional database. Once indexed, queries run faster because the database doesn’t have to read every point from disk. If you often filter by a field like type or user, indexing it can significantly speed up searches.
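
For example, here is a minimal sketch of creating a payload index with the Python client; the collection and field names are placeholders that happen to match the exercise later in this lesson.

from qdrant_client import QdrantClient

client = QdrantClient(url="http://localhost:6333")

# Index the "appliance" payload field as a keyword so exact-match filters on it are fast,
# much like indexing a text column in a relational database.
client.create_payload_index(
    collection_name="appliances",
    field_name="appliance",
    field_schema="keyword",
)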

Constraints to Remember

Vector Size is Fixed
All vectors in a collection must have the same number of dimensions.
If your model outputs 384 dimensions, every point in that collection must have exactly 384 numbers.

Distance Metric is Fixed
When you create a collection, you choose how similarity is measured:

  • Cosine → compares direction only (good for semantic meaning).
  • Dot product → compares direction + length (good for normalized embeddings, often faster).
  • Euclidean → compares straight-line distance (good when absolute position matters).

You cannot change this later for the same collection.
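
To make the difference concrete, here is a small numpy sketch that scores the same made-up pair of vectors with all three metrics; b points in the same direction as a but is twice as long.

import numpy as np

a = np.array([0.2, 0.1, 0.9])
b = 2 * a  # same direction as a, twice the length

cosine = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))  # 1.0: direction is identical
dot = np.dot(a, b)                  # grows with vector length, not just direction
euclidean = np.linalg.norm(a - b)   # non-zero: the absolute positions differ

print(cosine, dot, euclidean)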

How Vector Databases Find Matches in Milliseconds

A payload index in Qdrant helps speed up filtering, but it’s only part of the picture. To actually find the nearest vectors efficiently, Qdrant uses a vector index — a special data structure optimized for high-dimensional similarity search.

Comparing to Relational Databases

In a relational database:

  • You store rows with columns (e.g., name, age).
  • You can create an index (e.g., a B-tree) on a column like age so the database can find all rows where age > 30 quickly without performing a full table scan.

In a vector database:

  • Instead of columns, we have vectors (often hundreds of numbers each).
  • There’s no simple ordering to sort vectors for "closest" match, so a B-tree won’t help.
  • We use approximate nearest neighbor (ANN) algorithms that can jump directly to likely matches without checking every vector.

NSW and HNSW: The Basics

Qdrant uses HNSW (Hierarchical Navigable Small World), one of the fastest and most accurate ANN algorithms.

NSW (Navigable Small World)

Imagine every vector as a city on a huge map. Between some of these cities, there are "roads" that connect them, but only to the ones nearby. When the database searches for the closest match, it doesn’t visit every city. Instead, it starts in one city, looks around, and follows the roads toward places that seem closer to the target. At each stop, it repeats the process, getting nearer and nearer until it reaches the closest city it can find.
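
Here is a toy sketch of that greedy walk on a hypothetical 2-D "map" with hand-picked roads. Real NSW graphs are built automatically and live in high-dimensional space; this only illustrates the search step itself.

import numpy as np

# Five "cities" with 2-D coordinates, connected only to nearby cities.
points = {
    "a": np.array([0.0, 0.0]), "b": np.array([1.0, 0.5]),
    "c": np.array([2.0, 1.0]), "d": np.array([3.0, 0.5]),
    "e": np.array([4.0, 0.0]),
}
roads = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "e"], "e": ["d"]}

def greedy_search(query, start="a"):
    current = start
    while True:
        # Look at the current city and its neighbours, move to whichever is closest to the query.
        candidates = [current] + roads[current]
        best = min(candidates, key=lambda name: np.linalg.norm(points[name] - query))
        if best == current:   # no neighbour is closer: stop here
            return current
        current = best

print(greedy_search(np.array([3.2, 0.4])))  # walks a -> b -> c -> d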

HNSW (Hierarchical NSW)

HNSW improves NSW by adding multiple layers:

  • Top layer: only a few points with long “express” roads — you can move quickly across the space.
  • Middle layers: more points, shorter roads — you get closer to your target area.
  • Bottom layer: dense connections — you do the final fine-grained search.

This hierarchy makes it possible to:

  • Search millions of vectors in milliseconds.
  • Get results that are nearly as accurate as checking every single vector.
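
Qdrant exposes the main HNSW knobs when you create a collection. The sketch below is optional tuning you do not need for this lesson; the collection name, vector size, and parameter values are illustrative, and the defaults are fine for most workloads.

from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, HnswConfigDiff

client = QdrantClient(url="http://localhost:6333")
client.create_collection(
    collection_name="appliances_tuned",
    vectors_config=VectorParams(size=384, distance=Distance.COSINE),
    hnsw_config=HnswConfigDiff(
        m=16,              # edges per node; more edges = better recall, more memory
        ef_construct=100,  # candidate list size during index build; higher = slower build, better graph
    ),
)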

Other ANN Techniques You Might Hear About

While Qdrant uses HNSW, other vector databases and libraries might use:

  • IVF (Inverted File Index) – splits the space into clusters, searches only a few clusters.
  • PQ / OPQ (Product Quantization) – compresses vectors to save memory, trades some accuracy for speed.
  • Graph-based variants – like NSG or Vamana, similar ideas to NSW/HNSW with different optimizations.

Key Takeaway

In relational DBs, indexes speed up lookups by using sorted keys and tree structures.

In vector DBs, indexes like HNSW speed up similarity search by building a graph of connections between points, allowing the search to hop quickly toward the nearest neighbors instead of scanning all points.


Exercise 3

Now it’s time to put everything we’ve learned into practice. In this exercise, we’ll:

  1. Run a local Qdrant instance using Docker.
  2. Generate embeddings for a dummy dataset (using the models we’ve discussed).
  3. Store those embeddings in Qdrant with useful payload data.
  4. Run a few sample searches to see how it all works together.

By the end, you’ll see how the key pieces - embeddings, embedding models, Ollama, and now vector databases - fit together into a complete workflow.

Start a Local Qdrant Instance

We’ll start by pulling and running the latest Qdrant Docker image. To keep your data persistent and easy to inspect, it’s a good idea to mount a dedicated directory from your host machine into the container. This way, you can later explore the files and directory structure that Qdrant creates.

In this example, we’ll mount /data/qdrant on the host to /qdrant/storage inside the container:

docker run -p 6333:6333 \
  -p 6334:6334 \
  -v "/data/qdrant:/qdrant/storage" \
  qdrant/qdrant

Qdrant will now be running locally, with the REST API available at http://localhost:6333 and the gRPC interface at http://localhost:6334.

You can also open the built-in dashboard at http://localhost:6333/dashboard to explore the user interface and get familiar with its features.

Add Python Dependencies

Use uv to add the client library you need: qdrant_client is the official Python client for Qdrant.

uv add qdrant_client

Create and Populate a Collection

Now that we have a local instance running, let’s proceed with populating the database. In the script we first probe the embedding model to find out the vector size, then (re-)create a Qdrant collection with matching settings.

  • probe_vector_size() sends a sample string to Ollama and checks the length of the returned vector, since at the time of writing Ollama does not expose a way to query the vector size an embedding model will produce.
  • create_collection(...) deletes any existing appliances collection and creates a new one with:
    • Vector size set to the probed dimension.
    • Distance metric set to cosine (a good default for semantic similarity search).

After that, the script generates dummy records, converts them to embeddings, and uploads them to Qdrant in batches.

main.py
#!/usr/bin/env python3
import uuid
import random
import ollama
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct

QDRANT_URL = "http://localhost:6333"
COLLECTION = "appliances"
EMBED_MODEL = "bge-m3:latest"  # or "nomic-embed-text"

APPLIANCES = [
    "dishwasher", "washing_machine", "dryer", "refrigerator", "microwave",
    "oven", "cooktop", "coffee_machine", "vacuum_cleaner", "tv",
]
MANUFACTURERS = ["Miele", "Amica", "Zanussi", "Bosch", "Samsung", "LG", "Whirlpool", "Siemens", "Electrolux", "Beko"]
COLORS = ["white", "black", "silver", "stainless steel", "red", "blue", "matte grey"]

def probe_vector_size():
    out = ollama.embed(model=EMBED_MODEL, input="probe")
    return len(out["embeddings"][0])

def create_collection(client, dim):
    try:
        client.delete_collection(collection_name=COLLECTION)
    except Exception:
        pass
    client.create_collection(
        collection_name=COLLECTION,
        vectors_config=VectorParams(size=dim, distance=Distance.COSINE),
    )

def make_record():
    return {
        "id": str(uuid.uuid4()),
        "appliance": random.choice(APPLIANCES),
        "manufacturer": random.choice(MANUFACTURERS),
        "color": random.choice(COLORS),
    }

def main(n=200, batch_size=50):
    print("Connecting to Qdrant…")
    client = QdrantClient(url=QDRANT_URL)

    print("Probing embedding size via Ollama…")
    dim = probe_vector_size()
    print(f"Embedding size: {dim}")

    print("Creating collection…")
    create_collection(client, dim)

    print(f"Generating {n} records and embeddings…")
    for start in range(0, n, batch_size):
        end = min(start + batch_size, n)
        batch = [make_record() for _ in range(end - start)]

        texts = [f"{r['manufacturer']} {r['appliance']} in {r['color']}" for r in batch]
        emb = ollama.embed(model=EMBED_MODEL, input=texts)["embeddings"]

        points = [
            PointStruct(
                id=r["id"],
                vector=vec,
                payload={
                    "appliance": r["appliance"],
                    "manufacturer": r["manufacturer"],
                    "color": r["color"],
                },
            )
            for r, vec in zip(batch, emb)
        ]

        client.upsert(collection_name=COLLECTION, points=points, wait=True)
        print(f"Inserted {end} / {n}")

    stats = client.count(COLLECTION, exact=True)
    print(f"Done. Points in collection: {stats.count}")

if __name__ == "__main__":
    main(n=200, batch_size=50)

View the Data in the Qdrant UI

With the script complete, let’s confirm that our points were added.

  1. Open your browser and go to http://localhost:6333/dashboard

  2. From the left menu, select Collections. Examine the collections you have and review their properties, such as size, dimensionality, and distance metric.

  3. Click on the appliances collection and examine the points in the newly created collection.
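
The sample searches mentioned at the start of the exercise can also be run from Python. Here is a minimal sketch that reuses the collection and embedding model from main.py; the query text is just an example, and query_points requires a reasonably recent version of qdrant_client.

#!/usr/bin/env python3
import ollama
from qdrant_client import QdrantClient
from qdrant_client.models import Filter, FieldCondition, MatchValue

QDRANT_URL = "http://localhost:6333"
COLLECTION = "appliances"
EMBED_MODEL = "bge-m3:latest"

client = QdrantClient(url=QDRANT_URL)

# Embed the query text with the same model that produced the stored vectors.
query_vec = ollama.embed(model=EMBED_MODEL, input="silver Bosch dishwasher")["embeddings"][0]

# Plain semantic search: the five most similar points.
hits = client.query_points(collection_name=COLLECTION, query=query_vec, limit=5).points
for hit in hits:
    print(round(hit.score, 3), hit.payload)

# The same search restricted by a payload filter: dishwashers only.
filtered = client.query_points(
    collection_name=COLLECTION,
    query=query_vec,
    query_filter=Filter(
        must=[FieldCondition(key="appliance", match=MatchValue(value="dishwasher"))]
    ),
    limit=5,
).points
for hit in filtered:
    print(round(hit.score, 3), hit.payload)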

Summary

In this lesson, we covered the core concepts behind vector databases and why they are essential for working with large numbers of embeddings.
You learned about the key components in Qdrant — collections, points, and payloads — and how they parallel concepts from relational databases while being specialized for high-dimensional similarity search.

We also explored how payloads allow you to store metadata alongside vectors and how payload indexes speed up filtered searches.
The lesson introduced approximate nearest neighbor (ANN) algorithms, focusing on Qdrant’s HNSW index, which makes it possible to search millions of vectors in milliseconds by navigating a multi-layer graph of connections.

Finally, we applied these concepts in a practical exercise:

  • Running a local Qdrant instance.
  • Probing an embedding model to determine vector size.
  • Creating and populating a collection with generated appliance data and their embeddings.
  • Viewing and exploring the stored points in the Qdrant dashboard.

By the end, you should understand both the theory (how a vector DB works and what makes it fast) and the practice (how to set up and populate one for semantic search).