Module 3 · Generative AI & LLMs — Foundations

Capabilities & Hard Limits: Hallucination, Context Windows & Knowledge Cutoffs

65 min

Learning objectives

Explain why hallucination is a structural property of LLMs, not a rare bug
Describe what a context window is and the practical consequences of its limit
Explain knowledge cutoffs and how retrieval or tools extend a model's reach
Apply practical habits to use LLMs responsibly given these limits

Genuine capabilities

LLMs are genuinely good at a specific set of tasks: drafting and rewriting text, summarizing long documents, translating, extracting structured information from messy text, explaining concepts, brainstorming, and generating or debugging code. These strengths are valuable. The practitioner's skill lies in pairing them with an honest grasp of the limits below.

LLMs excel where fluent language transformation is the job and a human can verify the result. They are riskiest where unverifiable facts are taken at face value.

Hallucination: confident and wrong

Because an LLM generates what is statistically plausible rather than what is verified, it will sometimes produce fluent, confident output that is simply false — a fake citation, an invented statistic, a non-existent function name. This is called hallucination. It is not a malfunction you can fully patch away; it is a direct consequence of next-token prediction with no built-in fact-checker.

Hallucination — Fluent, confident model output that is factually wrong or fabricated, arising because the model predicts plausible text rather than verified truth.

Watch out

Hallucinations are most dangerous precisely because they sound authoritative. Never treat an LLM's factual claims, citations, names, numbers, or quotes as reliable without independent verification — especially in legal, medical, financial, or safety-critical contexts.

Example — The fabricated citation

Ask an LLM for sources on a niche topic and it may return real-looking references (plausible authors, a believable journal, a tidy year) that do not exist. The format is learned from millions of real citations; the specific facts are invented to fit. This has led to real-world court sanctions for lawyers who filed briefs citing AI-generated cases that did not exist.
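One narrow kind of verification can be automated. The sketch below assumes you hold the source text yourself and want to flag any passage the model put in double quotes that does not actually appear in that source. The function names are illustrative, and the check only catches fabricated verbatim quotes; paraphrased or numeric claims still need a human reader.

```python
import re


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences are ignored."""
    return re.sub(r"\s+", " ", text).strip().lower()


def unsupported_quotes(answer: str, source: str) -> list[str]:
    """Return each double-quoted passage in `answer` that is not found verbatim in `source`."""
    source_norm = normalize(source)
    quotes = re.findall(r'"([^"]+)"', answer)
    return [q for q in quotes if normalize(q) not in source_norm]


# Made-up source text and model answer, for illustration only.
source = "The device must be inspected every 12 months. Records are kept for 5 years."
answer = 'The policy says "inspected every 12 months" and that "records are kept for 10 years".'

print(unsupported_quotes(answer, source))  # ['records are kept for 10 years']
```

An empty list means only that no quoted passage was invented. It does not mean the answer as a whole is correct.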

The context window: a limited working memory

A model can only consider so much text at once. This limit is the context window, measured in tokens, and it covers the prompt and the generated response together. Modern windows are large (often hundreds of thousands of tokens), but they are finite. What happens when the input exceeds the window depends on the application: some reject the request outright, while many chat interfaces drop or truncate the earliest content, so the model effectively 'forgets' the start of a very long conversation or document.

Context window — The maximum number of tokens a model can attend to at once, including both the input prompt and the output it generates.

Analogy

The context window is like a desk, not a filing cabinet. Only what fits on the desk is in view; push new papers on and old ones slide off the edge. The model has no memory of anything that has dropped off — unless you put it back on the desk.
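The "papers sliding off the desk" behavior can be sketched in a few lines. This is a minimal illustration rather than how any particular product works: it assumes a rough rule of thumb of about four characters per token for English text (a real tokenizer would give different counts), and `fit_to_window` is a hypothetical helper name.

```python
def estimate_tokens(text: str) -> int:
    """Very rough estimate: about 4 characters per token (an assumption, not a real tokenizer)."""
    return max(1, len(text) // 4)


def fit_to_window(messages: list[str], max_tokens: int, reserve_for_reply: int) -> list[str]:
    """Keep the most recent messages that fit once room is reserved for the model's reply."""
    budget = max_tokens - reserve_for_reply
    kept: list[str] = []
    used = 0
    # Walk backwards from the newest message; stop at the first one that does not fit.
    for message in reversed(messages):
        cost = estimate_tokens(message)
        if used + cost > budget:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


history = ["a" * 400, "b" * 400, "c" * 400]  # roughly 100 tokens each
print(len(fit_to_window(history, max_tokens=300, reserve_for_reply=80)))  # 2
```

The oldest message is the one that gets dropped, which is why key facts stated early in a long conversation may need to be supplied again.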

Knowledge cutoff: frozen at training time

An LLM's built-in knowledge stops at its training cutoff — the date after which it saw no new data. Ask about events past that date and it will not know, and may confidently guess. To work with current or private information, the model must be given that text directly: pasted into the prompt, supplied via retrieval (RAG), or fetched through a connected tool such as web search.

Knowledge cutoff — The point in time after which a model has no training data; it has no inherent knowledge of later events unless that information is provided to it.
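Supplying text directly can look like the sketch below, which builds a grounded prompt by hand. It is a toy stand-in for retrieval, assuming plain word overlap as the relevance score (real RAG systems typically use embeddings or a search index), and the product name and passages are made up for illustration.

```python
def score(query: str, passage: str) -> int:
    """Toy relevance score: how many distinct query words also appear in the passage."""
    query_words = set(query.lower().split())
    passage_words = set(passage.lower().split())
    return len(query_words & passage_words)


def build_grounded_prompt(question: str, passages: list[str], top_k: int = 2) -> str:
    """Pick the top_k most relevant passages and place them in the prompt ahead of the question."""
    ranked = sorted(passages, key=lambda p: score(question, p), reverse=True)
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(ranked[:top_k]))
    return (
        "Answer using only the sources below. "
        "If they do not contain the answer, say so.\n\n"
        f"Sources:\n{context}\n\n"
        f"Question: {question}"
    )


# Hypothetical private notes the model could not have seen in training.
notes = [
    "Nimbus 1.0 shipped with basic sync.",
    "The Nimbus 2.0 release adds offline mode.",
    "Office parking is free on weekends.",
]
print(build_grounded_prompt("What does the Nimbus 2.0 release add?", notes))
```

The instruction to say when the sources do not contain the answer gives the model an alternative to guessing, though it does not guarantee the model will take it.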

Limit	Root cause	Practical mitigation
Hallucination	Predicts plausible text, no fact-checker	Verify claims; cite sources you control; ground with retrieval
Context window	Finite tokens in view at once	Summarize/chunk long inputs; re-supply key facts
Knowledge cutoff	No training data past a date	Provide current info via prompt, RAG, or tools

Treat factual output as a draft to verify, not an answer to trust.
Give the model the source material instead of relying on its memory.
For long tasks, summarize earlier context so key facts stay in the window.
Match the stakes to the safeguards: higher risk demands tighter human review.
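The habit of summarizing long material can be sketched as "split, summarize each piece, combine". The example below is a minimal outline under stated assumptions: chunks are measured in words rather than tokens, and `summarize` is a placeholder for whatever model call you would actually use, passed in as an ordinary function.

```python
from typing import Callable


def split_into_chunks(text: str, chunk_words: int) -> list[str]:
    """Split text into consecutive chunks of at most `chunk_words` words."""
    words = text.split()
    return [" ".join(words[i:i + chunk_words]) for i in range(0, len(words), chunk_words)]


def summarize_long_text(text: str, summarize: Callable[[str], str], chunk_words: int = 500) -> str:
    """Summarize each chunk separately, then join the partial summaries in order."""
    partial = [summarize(chunk) for chunk in split_into_chunks(text, chunk_words)]
    return "\n".join(partial)


# Stand-in "summarizer" that keeps only the first three words of each chunk.
def first_three_words(chunk: str) -> str:
    return " ".join(chunk.split()[:3])


document = "one two three four five six seven eight nine ten"
print(summarize_long_text(document, first_three_words, chunk_words=5))
```

Each partial summary is itself model output, so the first habit still applies: treat the combined summary as a draft to verify against the original.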

Knowledge check

Quick practice — not part of your exam score.

Why is hallucination considered a structural property of LLMs rather than a simple bug?

A user pastes a 500-page document that exceeds the model's context window. What is the most likely consequence?

To get reliable answers about an event that happened after a model's knowledge cutoff, the best approach is to:
