Lesson 4: How to Get Better Answers
Topics Covered
- RAG (Retrieval Augmented Generation): Connecting to external data.
- Fine-Tuning: Specializing the model's weights.
- Prompt Engineering: Optimizing how we ask questions.
What a model knows depends on its training data and cutoff date. But how can we improve its answers? There are three main methods.
1. RAG (Retrieval Augmented Generation)
RAG lets a model search external data that wasn't in its training set and incorporate the results into its answer.
How it works:
- Retrieval: The system searches a corpus of your documents (PDFs, Wikis, etc.) using vector embeddings to find semantically similar information.
- Augmentation: It adds this retrieved information to your original prompt.
- Generation: The model generates a response based on this enriched context.
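The three steps above can be sketched in a few lines of Python. This is a toy illustration: the corpus is made up, and a bag-of-words vector with cosine similarity stands in for a real embedding model and vector database.

```python
import math
from collections import Counter

# Toy corpus standing in for your documents (PDFs, wikis, etc.).
CORPUS = [
    "The refund policy allows returns within 30 days of purchase.",
    "Support tickets are answered within one business day.",
    "The warranty covers manufacturing defects for two years.",
]

def embed(text: str) -> Counter:
    """Bag-of-words 'embedding' -- a stand-in for a real embedding model."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def retrieve(query: str, corpus: list[str], k: int = 1) -> list[str]:
    """Retrieval: rank documents by semantic similarity to the query."""
    q = embed(query)
    return sorted(corpus, key=lambda doc: cosine(q, embed(doc)), reverse=True)[:k]

def augment(query: str, corpus: list[str]) -> str:
    """Augmentation: prepend the retrieved context to the original prompt."""
    context = "\n".join(retrieve(query, corpus))
    return f"Context:\n{context}\n\nQuestion: {query}"

# Generation would then send this enriched prompt to the model.
prompt = augment("How many days do I have to return a product?", CORPUS)
```

A real system would replace `embed` with a trained embedding model and `CORPUS` with a vector database, but the retrieve-augment-generate flow is the same.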
Pros:
- Access to up-to-date, domain-specific information.
- Reduces hallucinations by grounding answers in facts.
Cons:
- Performance cost: Retrieval adds latency.
- Processing cost: Requires infrastructure to manage vector embeddings and databases.
2. Fine-Tuning
Fine-Tuning takes an existing broad model and gives it additional specialized training on a focused dataset.
How it works:
- Updates the model's internal parameters (weights) using a specialized dataset (e.g., thousands of technical support logs).
- Uses supervised learning (input-output pairs) to teach the model to recognize domain-specific patterns.
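The weight update at the heart of fine-tuning can be shown on a toy one-parameter "model". The dataset and model here are stand-ins: real fine-tuning adjusts billions of weights, but by the same principle of nudging them to reduce loss on input-output pairs.

```python
# Hypothetical specialized dataset of (input, target) pairs.
dataset = [
    (1.0, 2.0),
    (2.0, 4.0),
    (3.0, 6.0),
]

w = 0.5    # "pretrained" weight; fine-tuning will specialize it
lr = 0.05  # learning rate

for _ in range(100):  # training epochs
    for x, y in dataset:
        pred = w * x                # model's prediction
        grad = 2 * (pred - y) * x   # gradient of squared error w.r.t. w
        w -= lr * grad              # the update that "bakes in" the pattern

# w converges toward 2.0, the relationship encoded in the dataset.
```

After training, the knowledge lives in the weight itself: inference needs no external lookup, which is why fine-tuned models answer fast, but updating the knowledge means running training again.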
Pros:
- Deep Domain Expertise: Great for specialized tasks.
- Faster Inference: No need to search external databases; knowledge is "baked in".
Cons:
- Training Complexity: Requires thousands of high-quality examples.
- Maintenance: Updating knowledge requires re-training.
- Catastrophic Forgetting: Risk of the model losing general capabilities.
3. Prompt Engineering
Sometimes the model already knows the answer; we just need to ask in a better way. Prompt Engineering involves crafting inputs to guide the model's attention to relevant patterns.
How it works:
- Direction: By including examples, context, or format instructions, you direct the model's attention mechanisms.
- Activation: Techniques like "think step-by-step" activate reasoning patterns learned during training.
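Both mechanisms above are plain string construction. A minimal sketch of a few-shot prompt builder, with illustrative example pairs and wording (nothing here is a required format):

```python
def build_prompt(question: str, examples: list[tuple[str, str]]) -> str:
    """Few-shot prompt: worked examples direct the model's attention to the
    desired pattern; the closing phrase activates step-by-step reasoning."""
    parts = ["Answer in the same format as the examples below.", ""]
    for q, a in examples:
        parts += [f"Q: {q}", f"A: {a}", ""]
    parts += [f"Q: {question}", "A: Let's think step by step."]
    return "\n".join(parts)

# Illustrative examples showing the model the expected format.
examples = [
    ("What is 2 + 2?", "2 + 2 = 4"),
    ("What is 3 + 5?", "3 + 5 = 8"),
]
prompt = build_prompt("What is 7 + 6?", examples)
```

Because the whole technique lives in the input string, you can iterate on it instantly, with no infrastructure or training, which is exactly the strength listed below.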
Pros:
- Immediate Results: No infrastructure changes or training required.
- Flexibility: Easy to test and iterate.
Cons:
- Trial and Error: Finding the perfect prompt is an art.
- Limited Context: Cannot teach the model truly new or private information.
Combining Them
In practice, these methods are often used together:
- RAG retrieves the specific cases relevant to the question.
- Prompt Engineering ensures the response follows the required format.
- Fine-Tuning gives the model deep command of domain-specific policies.
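A combined pipeline can be sketched as a single function. All components here are hypothetical stubs: `retrieve` would query a vector database, and `llm` would be a fine-tuned model behind an API.

```python
def answer_pipeline(question, corpus, retrieve, llm):
    """Sketch of the three methods working together (all names are
    illustrative stand-ins, not a real library)."""
    context = "\n".join(retrieve(question, corpus))  # RAG: fetch relevant cases
    prompt = (                                        # prompt engineering: shape the request
        "Use only the context below.\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer step by step:"
    )
    return llm(prompt)                                # fine-tuned model generates

# Usage with stub components:
corpus = ["Policy: returns are accepted within 30 days."]
stub_retrieve = lambda q, docs: docs   # pretend everything is relevant
stub_llm = lambda p: p                 # echo stub in place of a real model
result = answer_pipeline("Can I return this?", corpus, stub_retrieve, stub_llm)
```

Swapping the stubs for real components changes the pieces, not the shape: retrieval feeds the prompt, the prompt feeds the model.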
| Method | Strength | Trade-off |
|---|---|---|
| RAG | Up-to-date, external knowledge. | Higher latency & infrastructure cost. |
| Fine-Tuning | Deep domain expertise. | High training cost & maintenance. |
| Prompt Engineering | Flexible & immediate. | Cannot extend knowledge base. |