Module 4 · Putting LLMs to Work — Prompting, RAG & Agents

Grounding Models with RAG

65 min

Learning objectives

Explain why retrieval reduces hallucination and stale answers
Describe conceptually how a RAG pipeline retrieves and uses context
Identify situations where RAG is the right tool versus where it is not

The problem RAG solves

An LLM knows only what it learned during training, which is frozen at a cutoff date, and it has never seen your private documents. Ask it about last week's policy update or a customer's contract and it will either decline or, worse, confidently invent an answer. Retrieval-Augmented Generation addresses this by fetching the relevant source text and handing it to the model before it answers.

Retrieval-Augmented Generation (RAG) — A pattern that retrieves relevant documents at query time and inserts them into the prompt so the model answers from grounded source text.

Analogy

RAG turns a closed-book exam into an open-book exam. Instead of forcing the model to recall everything from memory, you let it look up the relevant pages first — and answer from what's actually in front of it.

How it works, conceptually

Rather than matching exact keywords, a typical RAG system searches by meaning. Each chunk of your documents is converted into an embedding, a list of numbers that captures its meaning. The user's question is embedded with the same model, and the system finds the chunks whose embeddings are closest to the question's, usually measured with a similarity score such as cosine similarity. Those chunks are pasted into the prompt as context, and the model is asked to answer using them.

Embedding — A numeric vector that represents the meaning of text so that similar meanings sit close together, enabling semantic search.
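To make "closest in meaning" concrete, here is a minimal Python sketch of cosine similarity, one common way to compare embeddings. The three-number vectors are made up for illustration; real embedding models return far longer vectors, and using cosine similarity (rather than, say, a dot product or Euclidean distance) is an assumption of this sketch.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: near 1.0 means same direction."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)

# Made-up 3-number "embeddings" for illustration only.
question = [0.9, 0.1, 0.0]        # "How do I get my money back?"
refund_chunk = [0.8, 0.2, 0.1]    # "Refunds are available within 30 days..."
shipping_chunk = [0.1, 0.2, 0.9]  # hypothetical: "Orders ship within 2 business days."

print(f"refund:   {cosine_similarity(question, refund_chunk):.2f}")
print(f"shipping: {cosine_similarity(question, shipping_chunk):.2f}")
```

This prints 0.98 for the refund chunk and 0.13 for the shipping chunk, so a retriever would pick the refund text for this question.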

Prepare: split source documents into chunks and store their embeddings in a vector index.
Retrieve: embed the user's question and find the most semantically similar chunks.
Augment: insert those chunks into the prompt as supporting context.
Generate: ask the model to answer using that context, ideally citing it.
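The four steps above can be sketched end to end in plain Python. This is a toy under loud assumptions: `embed` is a hypothetical stand-in that counts words instead of calling a real embedding model, each chunk is a single line, the "vector index" is just a list, the shipping line is a made-up distractor, and the generation step is left as a comment because it depends on whichever model API you use.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: word counts.
    A real model returns a dense vector that captures meaning, not just shared words."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def build_index(documents: list[str]) -> list[tuple[str, Counter]]:
    """Prepare: split documents into chunks (one per non-empty line) and embed each one."""
    chunks = [line.strip() for doc in documents for line in doc.splitlines() if line.strip()]
    return [(chunk, embed(chunk)) for chunk in chunks]

def retrieve(index: list[tuple[str, Counter]], question: str, k: int = 2) -> list[str]:
    """Retrieve: embed the question and return the k most similar chunks."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]

def augment(question: str, chunks: list[str]) -> str:
    """Augment: place the retrieved chunks in the prompt as context."""
    context = "\n".join(chunks)
    return (
        "Use ONLY the context below to answer. If the answer is not in\n"
        "the context, say you don't know.\n\n"
        f'Context:\n"""\n{context}\n"""\n\nQuestion: {question}'
    )

# Generate: send the string returned by augment() to your model's API (not shown).

docs = [
    "Refunds are available within 30 days of purchase with a receipt.\n"
    "Digital goods are non-refundable once downloaded.\n"
    "Orders ship within 2 business days."
]
index = build_index(docs)
question = "Are digital goods refundable?"
top_chunks = retrieve(index, question, k=1)
prompt = augment(question, top_chunks)
print(prompt)
```

The word-count stand-in also shows why semantic embeddings matter: for the lesson's question "Can a customer return a downloaded e-book after 10 days?" it ranks the digital-goods line last of the three, because counting shared words gives it no way to know that an e-book is a digital good.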

Example — Grounding the prompt

The retrieved policy text is supplied directly, and the model is instructed to stay within it.

Use ONLY the context below to answer. If the answer is not in
the context, say you don't know.

Context:
"""
Refunds are available within 30 days of purchase with a receipt.
Digital goods are non-refundable once downloaded.
"""

Question: Can a customer return a downloaded e-book after 10 days?

RAG reduces hallucination because the model answers from supplied source text rather than guessing from memory — and you can show users the sources.
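One way to make those sources showable is to number each retrieved chunk in the prompt and keep a mapping from number back to document. A minimal sketch, assuming each chunk carries a source ID; the IDs such as `refund-policy.txt#1` and the extra "Cite passage numbers" instruction are hypothetical additions to the prompt shown earlier.

```python
def build_cited_prompt(
    question: str, chunks: list[tuple[str, str]]
) -> tuple[str, dict[int, str]]:
    """Number each retrieved chunk so the model can cite it.

    chunks: (source_id, text) pairs; the source IDs here are hypothetical.
    Returns the prompt and a {citation number: source_id} mapping.
    """
    sources: dict[int, str] = {}
    lines = []
    for i, (source_id, text) in enumerate(chunks, start=1):
        sources[i] = source_id
        lines.append(f"[{i}] {text}")
    context = "\n".join(lines)
    prompt = (
        "Use ONLY the context below to answer. If the answer is not in\n"
        "the context, say you don't know. Cite passage numbers like [1].\n\n"
        f'Context:\n"""\n{context}\n"""\n\nQuestion: {question}'
    )
    return prompt, sources

prompt, sources = build_cited_prompt(
    "Can a customer return a downloaded e-book after 10 days?",
    [
        ("refund-policy.txt#1", "Refunds are available within 30 days of purchase with a receipt."),
        ("refund-policy.txt#2", "Digital goods are non-refundable once downloaded."),
    ],
)
print(prompt)
```

If the model's answer contains "[2]", the application can look up `sources[2]` and show the user the document that claim came from.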

When RAG is and isn't the right tool

Use RAG when	RAG is a poor fit when
Answers depend on private or frequently-changing documents	The task is reasoning or transformation, not fact lookup
You need citations to a trusted source	There is no reliable knowledge source to retrieve from
Knowledge updates often and retraining is impractical	The needed facts are general and stable enough that the base model already knows them

Watch out

RAG is only as good as what it retrieves. If the retrieval step surfaces the wrong or outdated chunk, the model will confidently answer from bad context. Garbage in, grounded garbage out — retrieval quality is the make-or-break step.
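Because retrieval is the make-or-break step, it is worth measuring on its own, before any model is involved. One common metric is recall@k: the fraction of test questions whose known-correct chunk appears in the top k results. A minimal sketch, assuming you have a small hand-labeled set of question/chunk pairs; `overlap_retriever` is a hypothetical stand-in for your real retriever.

```python
import re
from typing import Callable

CHUNKS = [
    "Refunds are available within 30 days of purchase with a receipt.",
    "Digital goods are non-refundable once downloaded.",
]

def tokens(text: str) -> set[str]:
    """Lowercased word set used by the stand-in retriever."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def overlap_retriever(question: str, k: int) -> list[str]:
    """Hypothetical stand-in retriever: rank chunks by words shared with the question."""
    q = tokens(question)
    return sorted(CHUNKS, key=lambda c: len(q & tokens(c)), reverse=True)[:k]

def recall_at_k(
    retrieve: Callable[[str, int], list[str]],
    labeled: list[tuple[str, str]],
    k: int,
) -> float:
    """Fraction of questions whose labeled correct chunk appears in the top k results."""
    hits = sum(1 for question, gold in labeled if gold in retrieve(question, k))
    return hits / len(labeled)

# Hand-labeled (question, correct chunk) pairs; a real test set would be much larger.
LABELED = [
    ("How many days do I have to request refunds?", CHUNKS[0]),
    ("Can I return a downloaded e-book?", CHUNKS[1]),
]

print(recall_at_k(overlap_retriever, LABELED, k=1))
print(recall_at_k(overlap_retriever, LABELED, k=2))
```

Here the stand-in misses the e-book question at k=1, so recall@1 is 0.5; with only two chunks, k=2 trivially retrieves everything. Tracking this number as you change chunking or embedding models shows whether retrieval is improving, independently of how the model phrases its answers.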

Knowledge check

Quick practice — not part of your exam score.

What is the primary reason RAG reduces hallucination?

In a typical RAG pipeline, what role do embeddings play?

For which task is RAG the LEAST appropriate solution?
