Module 8 · Responsible & Safe AI

Bias, Fairness & Transparency

65 min

Learning objectives

Identify the main sources of bias in AI systems and trace them to data, labels, or design choices
Explain why fairness involves genuine tradeoffs and cannot be reduced to a single metric
Distinguish explainability from interpretability and describe why transparency matters

Where bias actually comes from

AI systems learn patterns from data. If the data reflects historical or social inequities, the model is likely to reproduce them, and it can amplify them. Bias is rarely the result of malicious intent. It is usually the predictable consequence of skewed data, flawed labels, or design decisions made without considering who could be harmed.

Algorithmic bias — Systematic, unfair differences in a system's outputs across groups of people, traceable to data, labels, or design rather than deliberate intent.

Data bias — the training data over- or under-represents certain groups (e.g., mostly light-skinned faces in a vision dataset).
Label bias — the 'ground truth' itself encodes human prejudice (e.g., past hiring decisions used as labels).
Sampling/selection bias — the data collected does not match the population the model will serve.
Aggregation bias — one model is forced to serve groups that actually behave differently.
Deployment bias — a model is used in a context it was never designed or validated for.
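Data and sampling bias can often be caught before training by comparing who is in the dataset with who the model will serve. The sketch below is a minimal, hypothetical check: `representation_gaps` is an illustrative name, and it assumes you already know each group's share of the target population (for example, from your user base). It says nothing about label, aggregation, or deployment bias.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence, Tuple


def representation_gaps(
    train_groups: Sequence[Hashable],
    population_shares: Dict[Hashable, float],
) -> Dict[Hashable, Tuple[float, float, float]]:
    """Compare each group's share of the training data with its (assumed known,
    positive) share of the population the model will serve.

    Returns group -> (train_share, population_share, ratio).
    A ratio below 1 means the group is under-represented in training.
    """
    if not train_groups:
        raise ValueError("train_groups is empty")
    counts = Counter(train_groups)
    total = len(train_groups)
    report = {}
    for group, pop_share in population_shares.items():
        train_share = counts.get(group, 0) / total  # 0 if the group is missing entirely
        report[group] = (train_share, pop_share, train_share / pop_share)
    return report


# Hypothetical dataset: 80 examples from group A, 20 from group B,
# serving a population that is split evenly between them.
train = ["A"] * 80 + ["B"] * 20
for group, (train_share, pop_share, ratio) in representation_gaps(train, {"A": 0.5, "B": 0.5}).items():
    print(f"{group}: {train_share:.0%} of training data vs {pop_share:.0%} of population (ratio {ratio:.2f})")
```

Group B ends up with a ratio of 0.40, meaning it is heavily under-represented relative to the people the model will serve. Where to draw the line is a judgment call, not a fixed rule.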

Example — The biased hiring model

A company trains a résumé-screening model on ten years of its own hiring decisions. Historically it hired few women for technical roles, so the model learns to down-rank résumés that signal 'woman' — for example, mentioning a women's college or a women's sports team. The model is statistically 'accurate' at predicting past decisions, yet it automates and scales a discriminatory pattern.

Watch out

Removing the sensitive attribute (e.g., deleting 'gender' from the data) does NOT remove bias. Models can infer protected attributes from correlated proxies such as postal code, school name, or hobbies, so simply leaving the attribute out can hide the problem while leaving it intact.
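The proxy problem is easy to reproduce on toy data. In this hypothetical sketch, the scoring rule never sees gender, but it has learned (as a model trained on biased past decisions might) to penalize attendance at a women's college. The applicants, the penalty of 3, and the threshold of 5 are all invented for illustration.

```python
# Hypothetical applicants: (gender, attended_womens_college, years_experience).
# Both groups have the same spread of experience: 4, 5, 6 and 7 years.
APPLICANTS = [
    ("F", True, 5), ("F", False, 6), ("F", True, 4), ("F", False, 7),
    ("M", False, 5), ("M", False, 6), ("M", False, 4), ("M", False, 7),
]


def blind_score(attended_womens_college: bool, years_experience: int) -> int:
    """'Gender-blind' score: gender is not an input, but the proxy is penalized."""
    penalty = 3 if attended_womens_college else 0  # hypothetical learned penalty
    return years_experience - penalty


def selection_rate(gender: str, threshold: int = 5) -> float:
    """Share of applicants of the given gender whose blind score clears the threshold.
    Gender is used here only to audit the outcomes, never to score anyone."""
    group = [a for a in APPLICANTS if a[0] == gender]
    selected = [a for a in group if blind_score(a[1], a[2]) >= threshold]
    return len(selected) / len(group)


print(selection_rate("F"), selection_rate("M"))
```

With identical experience, women are selected at a rate of 0.5 and men at 0.75, even though gender never enters `blind_score`. Note that the audit itself needs the gender column: measuring this gap requires keeping the sensitive attribute available for evaluation, even when it is not a model input.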

Fairness is a tradeoff, not a checkbox

There is no single correct definition of fairness. Demographic parity (equal selection rates across groups) and equalized odds (equal true-positive and false-positive rates across groups) are both reasonable, but when groups have different base rates they cannot both hold unless the model's predictions carry no information about the true outcome. Choosing a fairness criterion is an ethical and contextual decision, not a purely technical one.
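The incompatibility follows from a short derivation. Let p_g be the base rate of group g (the share of the group whose true label is positive), and suppose equalized odds holds with a common true-positive rate t and false-positive rate f in every group. Each group's selection rate then depends on its base rate:

```latex
% p_g = P(Y = 1 \mid A = g): base rate of group g
% Equalized odds: TPR_a = TPR_b = t and FPR_a = FPR_b = f
\begin{aligned}
P(\hat{Y} = 1 \mid A = g) &= t\,p_g + f\,(1 - p_g) = f + (t - f)\,p_g \\
\text{Demographic parity: } f + (t - f)\,p_a &= f + (t - f)\,p_b \\
\iff (t - f)(p_a - p_b) &= 0
\end{aligned}
```

So both criteria can hold together only if the base rates are equal (p_a = p_b) or t = f. In the second case the model flags actual positives and actual negatives at the same rate, so its predictions say nothing about the true outcome.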

Fairness notion	What it requires	Tension
Demographic parity	Equal positive-outcome (selection) rates across groups	Ignores real differences in base rates (the share of each group that actually qualifies)
Equalized odds	Equal true-positive AND false-positive rates across groups	Cannot hold alongside demographic parity when base rates differ, unless the model is uninformative
Equal opportunity	Equal true-positive rates across groups (a relaxation of equalized odds)	Can still allow unequal false-positive rates
Individual fairness	Similar individuals treated similarly	Requires a defensible notion of 'similar'
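The group-level rows of the table can be measured directly from a model's predictions. This sketch assumes binary labels and predictions (0 or 1) and one group label per example; `group_metrics` is an illustrative helper, not a library function. The toy data uses a perfect classifier on two groups with different base rates.

```python
from collections import defaultdict
from typing import Dict, Hashable, Sequence


def group_metrics(
    y_true: Sequence[int],
    y_pred: Sequence[int],
    groups: Sequence[Hashable],
) -> Dict[Hashable, Dict[str, float]]:
    """Per-group selection rate, true-positive rate and false-positive rate
    for binary (0/1) labels and predictions."""
    counts = defaultdict(lambda: {"n": 0, "selected": 0, "pos": 0, "tp": 0, "neg": 0, "fp": 0})
    for y, y_hat, g in zip(y_true, y_pred, groups):
        c = counts[g]
        c["n"] += 1
        c["selected"] += y_hat
        if y == 1:
            c["pos"] += 1
            c["tp"] += y_hat  # predicted positive and actually positive
        else:
            c["neg"] += 1
            c["fp"] += y_hat  # predicted positive but actually negative
    nan = float("nan")  # rate is undefined when a group has no positives/negatives
    return {
        g: {
            "selection_rate": c["selected"] / c["n"],        # compared by demographic parity
            "tpr": c["tp"] / c["pos"] if c["pos"] else nan,  # compared by equal opportunity
            "fpr": c["fp"] / c["neg"] if c["neg"] else nan,  # equalized odds compares TPR and FPR
        }
        for g, c in counts.items()
    }


# Hypothetical data: group a has a base rate of 0.5, group b of 0.25.
y_true = [1, 1, 0, 0, 1, 0, 0, 0]
groups = ["a"] * 4 + ["b"] * 4
y_pred = list(y_true)  # a perfect classifier
for g, m in group_metrics(y_true, y_pred, groups).items():
    print(g, m)
```

Both groups get a TPR of 1.0 and an FPR of 0.0, so equalized odds holds, yet their selection rates are 0.5 and 0.25, so demographic parity fails. Equalizing the selection rates here would require introducing errors for at least one group.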

You cannot satisfy all fairness definitions at once. A responsible team chooses, documents, and justifies which definition fits the context — and accepts the tradeoff.
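Individual fairness is harder to measure because it needs an explicit notion of similarity. One common formalization is a Lipschitz condition: for any two people x and y, |f(x) − f(y)| ≤ L · d(x, y), where d is a task-specific distance. The sketch below checks every pair against that condition. The distance (difference in a single skill score), the scoring rule, and L = 1 are hypothetical choices, and choosing d is where the ethical judgment comes in.

```python
from itertools import combinations
from typing import Callable, List, Sequence, Tuple

Person = Sequence[float]


def individual_fairness_violations(
    people: Sequence[Person],
    score: Callable[[Person], float],
    distance: Callable[[Person, Person], float],
    lipschitz: float = 1.0,
) -> List[Tuple[int, int]]:
    """Return index pairs (i, j) whose score gap exceeds lipschitz * distance."""
    violations = []
    for i, j in combinations(range(len(people)), 2):
        gap = abs(score(people[i]) - score(people[j]))
        if gap > lipschitz * distance(people[i], people[j]):
            violations.append((i, j))
    return violations


def skill_distance(a: Person, b: Person) -> float:
    """Hypothetical similarity: only the task-relevant skill (index 0) matters."""
    return abs(a[0] - b[0])


def proxy_score(p: Person) -> float:
    """Hypothetical score that penalizes the proxy feature (index 1)."""
    return p[0] - 3 * p[1]


# Hypothetical people: [skill, attended_womens_college (0/1)].
people = [[5, 0], [5, 1], [7, 0]]
print(individual_fairness_violations(people, proxy_score, skill_distance))
```

People 0 and 1 have identical skill (distance 0) but scores of 5 and 2, so that pair is flagged, as is pair (1, 2). A distance that also counted the proxy feature could let the same scores pass, which is why the choice of d has to be justified and documented.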

Transparency and explainability

Stakeholders increasingly have a right to understand AI decisions that affect them. Explainability means giving a human a faithful, understandable reason for a particular output. Interpretability is a stronger, model-level property: whether the model's internal workings are themselves understandable. A short decision tree is interpretable; a billion-parameter network is not, although its individual outputs can still be explained after the fact.
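Explainability does not require an interpretable model. One deliberately simple post-hoc technique, sketched here, is occlusion: replace one feature at a time with a baseline value and record how much the output moves. The explainer only calls the model and never inspects its internals. The function names, features, weights, and zero baseline are all assumptions for illustration; established explanation methods handle feature interactions more carefully.

```python
from typing import Callable, Dict

Features = Dict[str, float]


def occlusion_attribution(
    predict: Callable[[Features], float],
    x: Features,
    baseline: Features,
) -> Features:
    """For each feature, return how much the score drops when that feature
    alone is replaced by its baseline value. Only calls predict()."""
    original = predict(x)
    attributions = {}
    for name in x:
        perturbed = dict(x)
        perturbed[name] = baseline[name]
        attributions[name] = original - predict(perturbed)
    return attributions


def opaque_model(f: Features) -> float:
    """Hypothetical stand-in for a black-box model; the explainer never looks inside."""
    return 0.4 * f["years_experience"] + 2.0 * f["certifications"] - 0.1 * f["gap_months"]


applicant = {"years_experience": 5, "certifications": 1, "gap_months": 6}
baseline = {name: 0 for name in applicant}
for name, contribution in occlusion_attribution(opaque_model, applicant, baseline).items():
    print(f"{name}: {contribution:+.2f}")
```

Because this stand-in happens to be linear, each attribution equals weight times value, so the explanation is faithful. For a model whose features interact, one-at-a-time occlusion can mislead, which is why the faithfulness of an explanation has to be checked rather than assumed.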

Analogy

Explainability is like a doctor explaining a diagnosis in plain language even though the underlying biology is complex. Interpretability is like being able to read the entire medical chart and follow every step yourself.

Explainability (XAI) — The degree to which a human can understand why an AI system produced a particular output.

Knowledge check

Quick practice — not part of your exam score.

A résumé-screening model trained on a company's past hiring decisions begins down-ranking women. The sensitive attribute 'gender' was never included as a feature. What is the most accurate explanation?

Why can a team generally NOT satisfy demographic parity and equalized odds simultaneously?

Which best distinguishes explainability from interpretability?
