Lesson 4: Adding Traditional Storage

Topics Covered

Why vector databases should only store embeddings and small metadata.
How traditional storage complements a vector database for large data.
Linking vector DB results to full records via shared identifiers.
Combining Qdrant for semantic search with SQLite for detailed data retrieval.

A vector database is great for storing embeddings (the numeric vectors that represent the meaning of your data) plus an optional payload with small bits of metadata, such as tags, categories, or a short description. That is all it needs to run similarity search quickly.
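To make "similarity search" concrete, here is a minimal pure-Python sketch of the idea (not how Qdrant is implemented internally): each point holds an ID, a vector, and a small payload, and a query vector is compared against every point with cosine similarity, the metric this lesson's collection uses via `Distance.COSINE`. The three-dimensional vectors, the IDs, and the `search` helper are invented for illustration; real embeddings have far more dimensions.

```python
import math
from typing import List, Tuple

def cosine_similarity(a: List[float], b: List[float]) -> float:
    """dot(a, b) / (|a| * |b|): 1.0 = same direction, 0.0 = orthogonal (unrelated)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy "collection": every point has an ID, a vector and a small payload.
# Vectors and IDs are made up for illustration only.
POINTS = [
    {"id": "a1", "vector": [0.9, 0.1, 0.0], "payload": {"appliance": "dishwasher"}},
    {"id": "b2", "vector": [0.1, 0.9, 0.0], "payload": {"appliance": "kettle"}},
    {"id": "c3", "vector": [0.0, 0.2, 0.9], "payload": {"appliance": "iron"}},
]

def search(query: List[float], top_k: int = 2) -> List[Tuple[str, float]]:
    """Brute-force nearest neighbours: score every point, keep the best top_k."""
    scored = [(p["id"], cosine_similarity(query, p["vector"])) for p in POINTS]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]

print(search([1.0, 0.0, 0.0]))  # "a1" ranks first, then "b2"
```

A real vector database performs the same kind of ranking over millions of points, using an approximate index (HNSW in Qdrant's case) instead of scoring every point one by one.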

What a vector database isn’t designed for is storing large files or big chunks of raw data. Entire books, long documents, logs, full-resolution images, audio files, or videos don’t fit the way a vector database stores and serves data: putting them into payloads bloats the collection and slows down every query that returns those payloads.

Instead, you usually keep the heavy data in traditional storage: a relational database, a document store, or object storage like S3. In the vector DB, you store only the embedding and just enough metadata to identify the right record. When a search returns a match, you use that metadata (such as an ID or a URL) to fetch the full data from traditional storage.
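The two-step lookup can be sketched with the standard library alone. In this minimal sketch, the vector search is simulated by a ranked list of IDs, and the `documents` table and the `fetch_in_rank_order` helper are hypothetical. The helper fetches all matching rows in one query and then restores the vector ranking, because SQL's `IN (...)` gives no ordering guarantee.

```python
import sqlite3
from typing import Any, Dict, List, Optional

def fetch_in_rank_order(db: sqlite3.Connection,
                        ranked_ids: List[str]) -> List[Optional[Dict[str, Any]]]:
    """Fetch full records for IDs returned by a vector search, keeping the ranking."""
    if not ranked_ids:
        return []
    placeholders = ",".join(["?"] * len(ranked_ids))
    cur = db.execute(
        f"SELECT id, title, body FROM documents WHERE id IN ({placeholders})",
        ranked_ids,
    )
    columns = [c[0] for c in cur.description]
    by_id = {row[0]: dict(zip(columns, row)) for row in cur.fetchall()}
    # SQL "IN" does not return rows in list order, so re-apply the vector ranking.
    # IDs with no matching row (e.g. deleted records) come back as None.
    return [by_id.get(i) for i in ranked_ids]

# Hypothetical "traditional storage" table holding the heavy data.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE documents (id TEXT PRIMARY KEY, title TEXT, body TEXT)")
db.executemany("INSERT INTO documents VALUES (?, ?, ?)", [
    ("a1", "Dishwasher manual", "Full manual text goes here."),
    ("b2", "Kettle listing", "Full listing text goes here."),
])

# Pretend the vector database returned these IDs, best match first.
results = fetch_in_rank_order(db, ["b2", "zz", "a1"])
print([r["title"] if r else None for r in results])
# ['Kettle listing', None, 'Dishwasher manual']
```

Returning `None` for an unknown ID, rather than silently dropping it, makes mismatches between the two stores visible.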

The diagram below depicts the query process.

Let’s rebuild the application to handle this scenario. This course doesn’t cover the basics of relational or object storage, so we’ll assume you’re already familiar with both. For simplicity, we’ll use SQLite to hold the full records. Let’s move straight to an exercise.

Warning

This is not a reference architecture. It’s a thought exercise showing how you might split storage depending on the type of data. Always make sure that the storage solution you choose fits your specific scenario.

Exercise 4

Generate Test Data

In this step, we create a Python script that generates realistic appliance data and stores the full product details in SQLite and the embeddings in Qdrant for fast vector search.

The workflow is:

Setup: Connect to SQLite and Qdrant, probe the embedding model in Ollama to get the vector size, and create a matching Qdrant collection and a SQLite table.
Generate Records: Using predefined templates, generate 10,000 dummy appliance listings. Each record includes attributes such as appliance type, manufacturer, year, and features, plus a detailed description built from a template; the descriptions are passed to the embedding model to produce vectors.
Store Data in Batches: Store the full records in SQLite and the corresponding embeddings with selected metadata in Qdrant. Records are processed in batches to keep memory use bounded and to embed many descriptions per Ollama call; each batch is embedded and then inserted into both stores.

By the end, we have:

A SQLite database holding rich appliance descriptions and structured attributes.
A Qdrant collection with embeddings and key metadata, ready for semantic search.

create_data.py (Using SQLite)
#!/usr/bin/env python3
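import os
import random
import sqlite3
import uuid

import ollama
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct
from tqdm import tqdm

# ---- Init ----
# Adjust this path to your machine; the parent directory is created if missing.
SQLITE_PATH = os.path.expanduser("~/data/sqlite/appliances.db")
os.makedirs(os.path.dirname(SQLITE_PATH), exist_ok=True)

print("Initializing connections...")
client = QdrantClient(url="http://localhost:6333")
db = sqlite3.connect(SQLITE_PATH)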

EMBED_MODEL = 'bge-m3:latest'

print("Loading embedding model via Ollama...")
# get one sample embedding to detect vector size
_probe = ollama.embed(model=EMBED_MODEL, input="probe")
VECTOR_SIZE = len(_probe["embeddings"][0])
print(f"Model ready. Vector size: {VECTOR_SIZE}")

# Templates
APPLIANCES = [
    "dishwasher", "washing_machine", "dryer", "washer_dryer", "refrigerator",
    "freezer", "fridge_freezer", "microwave", "oven", "cooktop",
    "range", "hood", "coffee_machine", "kettle", "toaster",
    "blender", "mixer", "food_processor", "slow_cooker", "air_fryer",
    "pressure_cooker", "rice_cooker", "vacuum_cleaner", "robot_vacuum", "steam_mop",
    "iron", "air_purifier", "humidifier", "dehumidifier", "tv"
]

MANUFACTURERS = [
    "Miele", "Amica", "Zanussi", "Bosch", "Samsung",
    "LG", "Whirlpool", "Siemens", "Electrolux", "Beko"
]

COLORS = ["white", "black", "silver", "stainless steel", "red", "blue", "matte grey"]

CAPACITIES = [
    "7kg", "8kg", "9kg", "10kg",
    "12 place settings", "14 place settings",
    "250L", "300L", "400L", "500L",
    "500W", "1000W", "1500W", "2000W"
]

FEATURES = [
    "energy-saving mode", "WiFi connectivity", "touch controls",
    "quiet operation", "steam function", "fast cycle", "child lock",
    "self-cleaning", "adjustable shelves", "delay start timer",
    "anti-allergy program", "eco mode", "smartphone app control",
    "quick reheat", "grill function"
]

USE_CASES = [
    "families", "small apartments", "professional kitchens",
    "students", "large households", "office use",
    "holiday homes", "shared flats", "mobile homes"
]


MODELS = [
    "ProLine 500", "EcoFresh X2", "MaxWash 9000", "TurboHeat S3", "FreshKeep Pro",
    "QuickBake 300", "SteamMaster 7", "AirClean Plus", "CoolZone 400", "DryFast 5"
]

YEARS = [str(y) for y in range(2015, 2025)]
ENERGY_CLASSES = ["A+++", "A++", "A+", "A", "B", "C", "D"]
NOISE_DB = [str(n) for n in range(35, 71)]
WIDTH_CM = [str(w) for w in (45, 50, 55, 60, 70, 80, 90)]
HEIGHT_CM = [str(h) for h in (50, 60, 85, 150, 170, 180, 200)]
DEPTH_CM = [str(d) for d in (40, 45, 55, 60, 65, 70)]
PRICES = [f"{p} EUR" for p in range(50, 1501, 50)]
CONDITIONS = ["new", "like new", "excellent", "good", "fair"]
WARRANTY_MONTHS = [str(m) for m in (0, 3, 6, 12, 18, 24)]
HOURS_USED = [str(h) for h in (50, 100, 200, 500, 1000, 2000)]
LOCATIONS = ["Berlin", "Munich", "Hamburg", "Cologne", "Warsaw", "Gdansk",
             "Paris", "Madrid", "Rome", "Amsterdam", "Vienna", "Prague"]
DELIVERY_OPTIONS = ["courier", "local delivery", "pickup only", "delivery within 50km",
                    "free shipping", "shipping at buyer's cost"]

TEMPLATES = [
    "{manufacturer} {appliance} {model} in {color}, {year} release, {capacity}. Clean unit in {condition} condition with {warranty_months}-month warranty. Key features: {feature}, {feature2}, plus easy controls. Rated {energy_class} for low bills; noise ~{noise_db} dB. Size: {width_cm}×{height_cm}×{depth_cm} cm. Suits {use_case}. Price {price}. Pickup {location} or {delivery}.",
    "For sale: {manufacturer} {appliance} {model}, {color}, {capacity}. Great for {use_case}. You get {feature} and {feature2}, simple setup, and steady performance. Energy class {energy_class}; quiet at about {noise_db} dB. Used {hours_used} hours. Comes with manual and power cord. Can show working video. {delivery} or pickup in {location}. Price {price}.",
    "{manufacturer}'s {appliance} handles daily tasks with {feature} and {feature2}. Finish: {color}. Capacity: {capacity}. Good for {use_case}. Low energy draw ({energy_class}) and calm noise level ({noise_db} dB). Dimensions {width_cm}/{height_cm}/{depth_cm} cm. {condition} condition; cleaned and tested. Invoice available. Price {price}. Pickup {location}; shipping via {delivery} on request.",
    "Listing: {manufacturer} {appliance}, model {model}, {year}. Color {color}, capacity {capacity}. Fast start, clear display, and modes for {use_case}. Main perks: {feature}, {feature2}. Power efficient ({energy_class}). Noise about {noise_db} dB. Includes original accessories if needed. Stored indoors. Price {price}. Meet in {location} or send with {delivery}.",
    "Gently used {manufacturer} {appliance} in {color}. Big {capacity} capacity, ideal for {use_case}. Features include {feature} and {feature2}. Simple to keep clean, sturdy build. Energy label {energy_class}; quiet run ~{noise_db} dB. Size {width_cm}×{height_cm}×{depth_cm} cm. Condition: {condition}. Warranty {warranty_months} months. Price {price}. Pickup {location}/ship {delivery}.",
    "{appliance} by {manufacturer}, {model} {year}. Color: {color}. {capacity} capacity with quick cycles and presets for {use_case}. Comes with {feature} and also {feature2}. Efficient ({energy_class}), noise {noise_db} dB. I can demo on pickup. Freshly wiped down, no odors. Price {price}. Location {location}; {delivery} possible at buyer's cost.",
    "Selling my {manufacturer} {appliance} ({model}). Finish {color}, capacity {capacity}. Great for {use_case}. Has {feature}, {feature2}, and a clear front panel. Runs smooth; energy class {energy_class}. Measured around {noise_db} dB in normal mode. Few cosmetic marks only. Price {price}. Pickup {location}. Can ship via {delivery} with careful packing.",
    "Upgrade your setup with this {manufacturer} {appliance}. {color} body, roomy {capacity}. Strong at {use_case} thanks to {feature} and {feature2}. Uses little power ({energy_class}); steady noise around {noise_db} dB. Size {width_cm}/{height_cm}/{depth_cm}. Includes cable, quick guide, and mounting parts if needed. {condition} condition. Price {price}. {delivery} or collect in {location}.",
    "Clean {manufacturer} {appliance} {model}, {color}. Capacity {capacity}. Works great; tested on all main modes. Highlights: {feature}, {feature2}. Power rating {energy_class}. Quiet level near {noise_db} dB. Good for {use_case}. Light signs of use only. Comes from smoke-free home. Price {price}. Located {location}; {delivery} available.",
    "{manufacturer} {appliance} with {capacity} capacity in {color}. Simple menu, quick start, and safe lock. Feature set: {feature}, {feature2}. Saves power ({energy_class}) and stays around {noise_db} dB. Fits most spaces: {width_cm}×{height_cm}×{depth_cm} cm. Perfect for {use_case}. Still under warranty ({warranty_months} months). Asking {price}. Pickup {location} or ship {delivery}.",
    "Well-kept {manufacturer} {appliance}, model {model}. Color {color}, capacity {capacity}. Ready for {use_case}. You'll find {feature} plus {feature2}. Low running cost ({energy_class}). Noise reading {noise_db} dB on standard program. Includes cleaning kit and manual PDF. Price {price}. Meet in {location}. Courier via {delivery} possible.",
    "Owner sale: {appliance} by {manufacturer}, {year}. {color} finish, {capacity}. Strong daily driver for {use_case}. Key options: {feature}, {feature2}. Rated {energy_class}. Noise about {noise_db} dB measured with app. Clean tray and seals. Minor scuffs that don't affect use. Price {price}. {delivery} or collection in {location}.",
    "Refreshed and tested {manufacturer} {appliance}. Capacity {capacity}, good for {use_case}. Comes in {color}. Has {feature} and {feature2}. Runs cool and efficient ({energy_class}); steady noise near {noise_db} dB. Size: {width_cm}/{height_cm}/{depth_cm} cm. Includes basic parts; extra rack available on request. Price {price}. Pickup {location} or ship via {delivery}.",
    "Compact yet capable {manufacturer} {appliance} {model}. {color} shell, {capacity}. Modes for {use_case}. Standout features: {feature}, {feature2}. Energy class {energy_class}; quiet profile {noise_db} dB. Clean interior, no limescale spots. {condition} condition. Price {price}. Based in {location}. Can arrange {delivery}.",
    "High-capacity {manufacturer} {appliance} ideal for {use_case}. {capacity}, {color}. Feature pack includes {feature} and {feature2}. Uses less power ({energy_class}) and keeps noise near {noise_db} dB. Fits cabinets with {width_cm} cm width. Comes with hose/cable set. One owner. Price {price}. Pickup {location}; {delivery} at buyer's request.",
    "{manufacturer} {appliance} — model {model}, year {year}. {color} colorway, {capacity}. Clear display, quick program, and safe stop. Features: {feature}, {feature2}. Good rating ({energy_class}). Noise {noise_db} dB in normal use. Great for {use_case}. Fresh filter. Price {price}. I can send more photos or a test clip. {location} / {delivery}.",
    "Selling due to remodel: {appliance} by {manufacturer}. {color}, {capacity}. Works 100%, no leaks or errors. Includes {feature} and {feature2}. Efficient ({energy_class}); measured {noise_db} dB. Ideal for {use_case}. Comes with receipt copy; {warranty_months} months left. Price {price}. Collect {location} or ship {delivery}.",
    "{manufacturer} {appliance} with user-friendly panel. {color}, capacity {capacity}. Strong set for {use_case}. Helpful extras: {feature}, {feature2}. Power label {energy_class}; noise around {noise_db} dB. Outer case has minor marks only. Cleaned before sale. Price {price}. {location} pickup preferred; {delivery} possible.",
    "Neat {appliance} from {manufacturer}, {model}. Finish {color}. {capacity} capacity handles daily loads for {use_case}. Includes {feature} plus {feature2}. Runs on low power ({energy_class}) and stays close to {noise_db} dB. Size {width_cm}×{height_cm}×{depth_cm}. Comes with starter kit. Fair price {price}. {location} or {delivery}.",
    "Good value {manufacturer} {appliance}. {color} front, {capacity}. Best for {use_case}. Has {feature} and {feature2}; simple to maintain. Energy class {energy_class}, noise {noise_db} dB. {condition} condition with normal wear. Price {price}. Meet in {location}. Shipping with {delivery} if needed."
]

def setup_sqlite():
    print("Setting up SQLite table...")
    db.execute("DROP TABLE IF EXISTS appliances")
    db.execute("""
        CREATE TABLE appliances (
            id TEXT PRIMARY KEY,
            appliance TEXT,
            manufacturer TEXT,
            model TEXT,
            year TEXT,
            energy_class TEXT,
            noise_db TEXT,
            width_cm TEXT,
            height_cm TEXT,
            depth_cm TEXT,
            price TEXT,
            condition TEXT,
            warranty_months TEXT,
            hours_used TEXT,
            location TEXT,
            delivery TEXT,
            description TEXT,
            color TEXT,
            capacity TEXT,
            feature TEXT,
            feature2 TEXT,
            use_case TEXT
        )
    """)
    print("SQLite table created successfully!")

def setup_qdrant():
    print("Setting up Qdrant collection...")
    collection_name = "appliances"
    # Recreate the collection from scratch on every run.
    if client.collection_exists(collection_name=collection_name):
        client.delete_collection(collection_name=collection_name)
        print("Deleted existing collection")
    client.create_collection(
        collection_name=collection_name,
        vectors_config=VectorParams(size=VECTOR_SIZE, distance=Distance.COSINE),
    )
    # index for manufacturer
    client.create_payload_index(
        collection_name=collection_name,
        field_name="manufacturer",
        field_schema="keyword"
    )
    print("Qdrant collection created successfully!")

def pick_two_features():
    f1, f2 = random.sample(FEATURES, 2)
    return f1, f2

def make_record():
    appliance = random.choice(APPLIANCES)
    manufacturer = random.choice(MANUFACTURERS)
    color = random.choice(COLORS)
    capacity = random.choice(CAPACITIES)
    feature, feature2 = pick_two_features()
    use_case = random.choice(USE_CASES)
    model_name = random.choice(MODELS)
    year = random.choice(YEARS)
    energy = random.choice(ENERGY_CLASSES)
    noise = random.choice(NOISE_DB)
    width = random.choice(WIDTH_CM)
    height = random.choice(HEIGHT_CM)
    depth = random.choice(DEPTH_CM)
    price = random.choice(PRICES)
    condition = random.choice(CONDITIONS)
    warranty = random.choice(WARRANTY_MONTHS)
    hours_used = random.choice(HOURS_USED)
    location = random.choice(LOCATIONS)
    delivery = random.choice(DELIVERY_OPTIONS)
    template = random.choice(TEMPLATES)

    description = template.format(
        manufacturer=manufacturer,
        appliance=appliance.replace("_", " "),
        model=model_name,
        year=year,
        energy_class=energy,
        noise_db=noise,
        width_cm=width,
        height_cm=height,
        depth_cm=depth,
        price=price,
        condition=condition,
        warranty_months=warranty,
        hours_used=hours_used,
        location=location,
        delivery=delivery,
        color=color,
        capacity=capacity,
        feature=feature,
        feature2=feature2,
        use_case=use_case
    )

    return {
        "id": str(uuid.uuid4()),
        "appliance": appliance,
        "manufacturer": manufacturer,
        "model": model_name,
        "year": year,
        "energy_class": energy,
        "noise_db": noise,
        "width_cm": width,
        "height_cm": height,
        "depth_cm": depth,
        "price": price,
        "condition": condition,
        "warranty_months": warranty,
        "hours_used": hours_used,
        "location": location,
        "delivery": delivery,
        "description": description,
        "color": color,
        "capacity": capacity,
        "feature": feature,
        "feature2": feature2,
        "use_case": use_case
    }

def generate_appliance_data(n=10000, batch_size=100):
    print(f"Generating {n} appliance records...")
    setup_sqlite()
    setup_qdrant()

    for batch_start in tqdm(range(0, n, batch_size), desc="Processing batches"):
        batch_end = min(batch_start + batch_size, n)
        batch_records = []
        batch_descriptions = []

        for _ in range(batch_end - batch_start):
            record = make_record()
            batch_records.append(record)
            batch_descriptions.append(record["description"])

        # --- Ollama batch embeddings ---
        emb_out = ollama.embed(model=EMBED_MODEL, input=batch_descriptions)
        embeddings = emb_out["embeddings"]

        db.executemany("""
            INSERT INTO appliances (
                id, appliance, manufacturer, model, year, energy_class, noise_db,
                width_cm, height_cm, depth_cm, price, condition, warranty_months,
                hours_used, location, delivery, description, color, capacity, feature, feature2, use_case
            ) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
        """, [
            (
                r["id"], r["appliance"], r["manufacturer"], r["model"], r["year"], r["energy_class"], r["noise_db"],
                r["width_cm"], r["height_cm"], r["depth_cm"], r["price"], r["condition"], r["warranty_months"],
                r["hours_used"], r["location"], r["delivery"], r["description"], r["color"], r["capacity"],
                r["feature"], r["feature2"], r["use_case"]
            )
            for r in batch_records
        ])

        points = [
            PointStruct(
                id=record["id"],
                vector=embedding,
                payload={
                    "appliance": record["appliance"],
                    "manufacturer": record["manufacturer"],
                    "year": record["year"],
                    "energy_class": record["energy_class"],
                    "color": record["color"]
                }
            )
            for record, embedding in zip(batch_records, embeddings)
        ]

        try:
            client.upsert(collection_name="appliances", wait=True, points=points)
        except Exception as e:
            print(f"Error upserting batch: {e}")

    db.commit()
    print(f"Successfully generated and stored {n} appliance records!")
    sqlite_count = db.execute("SELECT COUNT(*) FROM appliances").fetchone()[0]
    
    try:
        qdrant_count = client.count(collection_name="appliances", exact=True)
        vector_count = qdrant_count.count
    except Exception as e:
        print(f"Error getting Qdrant count: {e}")
        vector_count = "Error"
    
    print(f"SQLite records: {sqlite_count}")
    print(f"Qdrant vectors: {vector_count}")

if __name__ == "__main__":
    generate_appliance_data(n=10000, batch_size=200)
    db.close()
    print("Data generation complete!")
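The script above commits SQLite once at the end and only logs a failed Qdrant upsert, so a failed batch can leave rows in SQLite that have no matching vectors. One way to keep the two stores aligned is to wrap each batch's SQLite insert in a transaction and roll it back if the vector write fails. The sketch below shows this with a hypothetical `write_batch` helper, a hypothetical `items` table, and stand-in `upsert_vectors` callbacks (in the real script the callback would wrap `client.upsert(...)`).

```python
import sqlite3
from typing import Callable, List, Tuple

def write_batch(db: sqlite3.Connection,
                rows: List[Tuple[str, str]],
                upsert_vectors: Callable[[List[str]], None]) -> bool:
    """Insert a batch into SQLite and push its vectors; keep the rows only if both succeed."""
    try:
        with db:  # commits if the block succeeds, rolls back if it raises
            db.executemany("INSERT INTO items (id, description) VALUES (?, ?)", rows)
            upsert_vectors([row_id for row_id, _ in rows])  # e.g. wraps client.upsert(...)
        return True
    except Exception as exc:
        print(f"Batch skipped, SQLite changes rolled back: {exc}")
        return False

def ok_upsert(ids: List[str]) -> None:
    """Stand-in for a vector store call that succeeds."""

def failing_upsert(ids: List[str]) -> None:
    """Stand-in for a vector store call that fails (e.g. Qdrant unreachable)."""
    raise ConnectionError("vector store unavailable")

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE items (id TEXT PRIMARY KEY, description TEXT)")

ok = write_batch(db, [("a", "first listing")], ok_upsert)
failed = write_batch(db, [("b", "second listing")], failing_upsert)
print(db.execute("SELECT id FROM items").fetchall())  # [('a',)]
```

This is not a true distributed transaction: if the upsert succeeds but the SQLite commit then fails, Qdrant keeps vectors with no matching row, so an occasional check that both stores hold the same set of IDs is still worthwhile.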

Querying Data

Once your dataset is stored in both SQLite and Qdrant, you can run semantic searches against it with the script below. It takes a natural-language query, embeds it with Ollama, queries Qdrant for the closest vectors, and then looks up the full product details in the local SQLite database. It also supports optional manufacturer filtering, JSON output, and a timing breakdown.
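The manufacturer filter in the script combines one `FieldCondition` per brand under `should`, which means a point passes if at least one condition matches (a logical OR). To make that visible, this sketch builds the equivalent JSON body that Qdrant's REST API accepts as a `filter`; the helper name `manufacturer_filter_json` is hypothetical, and the Python client builds this structure for you from the `Filter` model. Matching on a keyword field is exact, so each value must use the stored spelling.

```python
import json
from typing import Any, Dict, List, Optional

def manufacturer_filter_json(manufacturers: Optional[List[str]]) -> Optional[Dict[str, Any]]:
    """Build a Qdrant REST-style filter that matches ANY of the given manufacturers."""
    names = [m.strip() for m in (manufacturers or []) if m.strip()]
    if not names:
        return None  # no filter: search the whole collection
    # "should" with no "must" clauses means at least one condition has to match (OR).
    return {"should": [{"key": "manufacturer", "match": {"value": n}} for n in names]}

print(json.dumps(manufacturer_filter_json(["Bosch", "Miele"]), indent=2))
```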

query_data.py
#!/usr/bin/env python3
import argparse
import json
import logging
import os
import sqlite3
import time
from typing import List, Optional, Dict, Any

import ollama
from qdrant_client import QdrantClient
from qdrant_client.models import Filter, FieldCondition, MatchValue

# ---- Config ----
QDRANT_URL = "http://localhost:6333"
# Must point to the same file that create_data.py wrote.
SQLITE_PATH = os.path.expanduser("~/data/sqlite/appliances.db")
COLLECTION = "appliances"
EMBED_MODEL = "bge-m3:latest"


class Timer:
    def __init__(self):
        self.t0 = time.perf_counter()
        self.last = self.t0
        self.spans: Dict[str, float] = {}

    def mark(self, name: str):
        now = time.perf_counter()
        self.spans[name] = now - self.last
        self.last = now

    def total(self) -> float:
        return time.perf_counter() - self.t0

    def as_ms(self) -> Dict[str, int]:
        d = {k: int(v * 1000) for k, v in self.spans.items()}
        d["total_ms"] = int(self.total() * 1000)
        return d


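# Canonical spellings as stored in the Qdrant payload (same list as MANUFACTURERS
# in create_data.py). Keyword matching is exact, so input must map to these.
KNOWN_MANUFACTURERS = ["Miele", "Amica", "Zanussi", "Bosch", "Samsung",
                       "LG", "Whirlpool", "Siemens", "Electrolux", "Beko"]
CANONICAL = {m.lower(): m for m in KNOWN_MANUFACTURERS}


def build_filter(manufacturers: Optional[List[str]]) -> Optional[Filter]:
    if not manufacturers:
        return None
    # Map user input to the stored spelling ("lg" -> "LG", "bosch" -> "Bosch").
    # str.title() would turn "LG" into "Lg", which never matches.
    fixed = [CANONICAL.get(m.strip().lower(), m.strip()) for m in manufacturers if m.strip()]
    # "should": a point passes if it matches at least one manufacturer (OR).
    should = [FieldCondition(key="manufacturer", match=MatchValue(value=m)) for m in fixed]
    return Filter(should=should) if should else None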


def search_qdrant(client: QdrantClient, query_vec: List[float], top_k: int,
                  mfilter: Optional[Filter]):
    res = client.query_points(
        collection_name=COLLECTION,
        query=query_vec,
        query_filter=mfilter,
        with_payload=True,
        limit=top_k,
    ).points
    return res


def fetch_sqlite_rows(db, ids: List[str]) -> Dict[str, Dict[str, Any]]:
    """Return full SQLite records keyed by id; ranking is re-applied later from the Qdrant hits."""
    if not ids:
        return {}

    placeholders = ",".join(["?"] * len(ids))
    query = f"""
        SELECT id, appliance, manufacturer, model, year, energy_class, noise_db,
               width_cm, height_cm, depth_cm, price, condition, warranty_months,
               hours_used, location, delivery, description, color, capacity, feature, feature2, use_case
        FROM appliances
        WHERE id IN ({placeholders})
    """
    cur = db.execute(query, ids)
    # Build each record from the cursor's column names instead of hard-coded
    # indexes, so the dict keys always match the SELECT list.
    columns = [c[0] for c in cur.description]
    return {row[0]: dict(zip(columns, row)) for row in cur.fetchall()}


def main():
    parser = argparse.ArgumentParser(description="Semantic search over appliances with optional brand filter.")
    parser.add_argument("text", help="Query text, e.g. 'clean dishes'")
    parser.add_argument("--manufacturer", "-m", action="append",
                        help="Filter by manufacturer. Can be used multiple times, e.g. -m Bosch -m Miele")
    parser.add_argument("--top", "-k", type=int, default=5, help="Top K results (default: 5)")
    parser.add_argument("--json", action="store_true", help="Print JSON only (results)")
    parser.add_argument("--trace", action="store_true", help="Print timing breakdown")
    parser.add_argument("--log-level", default="WARNING",
                        help="logging level (DEBUG, INFO, WARNING, ERROR)")
    args = parser.parse_args()

    logging.basicConfig(level=getattr(logging, args.log_level.upper(), logging.WARNING),
                        format="%(levelname)s %(message)s")

    t = Timer()

    # Init
    client = QdrantClient(url=QDRANT_URL)
    db = sqlite3.connect(SQLITE_PATH)
    t.mark("init")

    # Embed via Ollama
    emb_out = ollama.embed(model=EMBED_MODEL, input=args.text)
    embedding = emb_out["embeddings"][0]
    t.mark("embed_query")

    # Qdrant
    mfilter = build_filter(args.manufacturer)
    hits = search_qdrant(client, embedding, args.top, mfilter)
    t.mark("qdrant_query")

    # SQLite
    ids = [str(h.id) for h in hits]
    sqlite_rows = fetch_sqlite_rows(db, ids)
    t.mark("sqlite_fetch")

    # Build results
    results: List[Dict[str, Any]] = []
    for h in hits:
        rid = str(h.id)
        row = sqlite_rows.get(rid)
        results.append({
            "id": rid,
            "score": h.score,
            "payload": h.payload,
            "record": row,
        })
    db.close()
    t.mark("build_and_close")

    timings_ms = t.as_ms()

    if args.json:
        out = {"results": results, "timings_ms": timings_ms if args.trace else None}
        print(json.dumps(out, indent=2, ensure_ascii=False))
        return

    # Pretty print
    if not results:
        print("No results.")
    else:
        print(f"\nQuery: {args.text}")
        if args.manufacturer:
            print(f"Filter: {', '.join([m.strip() for m in args.manufacturer])}")
        print(f"Top {args.top}:\n")
        for i, r in enumerate(results, 1):
            rec = r["record"] or {}
            desc = rec.get("description", "(no description)")
            manu = rec.get("manufacturer") or (r["payload"] or {}).get("manufacturer")
            app = rec.get("appliance") or (r["payload"] or {}).get("appliance")
            print(f"{i}. score={r['score']:.4f}  id={r['id']}")
            print(f"   {manu} {app} - {desc}\n")

    if args.trace:
        print("Timings (ms):")
        for k, v in timings_ms.items():
            print(f"  {k}: {v}")


if __name__ == "__main__":
    main()

How to test it

Make sure:

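Your SQLite database (e.g. ~/data/sqlite/appliances.db) exists and contains data from the generator script.
Qdrant is running and contains the appliances collection populated by the same run, so its IDs match the SQLite rows.
Ollama is running and the bge-m3 model is available (for example, after ollama pull bge-m3).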

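Run:

uv run query_data.py "best dishwasher" --trace

To filter by brand, limit the results, and print JSON instead:

uv run query_data.py "quiet washing machine" -m Bosch -m Miele -k 3 --json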

Summary

In this lesson, we demonstrated how to combine vector search with traditional storage to handle both semantic lookups and detailed record retrieval.
We used Qdrant to store embeddings along with minimal metadata for fast similarity searches, and SQLite to store complete product information such as descriptions, specifications, and pricing.

This architecture gives you:

Speed: Qdrant does the heavy lifting for semantic similarity search.
Rich detail: SQLite holds the complete records, keeping the vector DB payloads small.
Flexibility: You can filter queries (e.g., by manufacturer) and control the output format.

The pattern shown here carries over to other storage types such as PostgreSQL, MongoDB, or S3. The key is keeping a shared identifier between the vector database and the traditional storage so that results can be joined at query time.
