Module 8 · Responsible & Safe AI
Bias, Fairness & Transparency
65 min
Learning objectives
- Identify the main sources of bias in AI systems and trace them to data, labels, or design choices
- Explain why fairness involves genuine tradeoffs and cannot be reduced to a single metric
- Distinguish explainability from interpretability and describe why transparency matters
Where bias actually comes from
AI systems learn patterns from data. If the data reflects historical or social inequities, the model tends to reproduce them and can amplify them. Bias is rarely the result of malicious intent; it is usually the predictable consequence of skewed data, flawed labels, or design decisions made without considering who could be harmed.
Algorithmic bias — Systematic, unfair differences in a system's outputs across groups of people, traceable to data, labels, or design rather than deliberate intent.
- Data bias — the training data over- or under-represents certain groups (e.g., mostly light-skinned faces in a vision dataset).
- Label bias — the 'ground truth' itself encodes human prejudice (e.g., past hiring decisions used as labels).
- Sampling/selection bias — the data collected does not match the population the model will serve.
- Aggregation bias — a single model is fit to pooled data from groups whose relationship between features and outcome actually differs, so it can fit some groups poorly.
- Deployment bias — a model is used in a context it was never designed or validated for.
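One simple way to look for data or sampling bias is to compare each group's share of the dataset with its share of the population the model will serve. The sketch below is illustrative only: the function name `representation_gap`, the group labels, and all numbers are assumptions, not part of the lesson.

```python
from collections import Counter

def representation_gap(sample_groups, population_shares):
    """Return (share in sample) - (share in population) for each group."""
    counts = Counter(sample_groups)
    total = len(sample_groups)
    return {
        group: counts.get(group, 0) / total - expected_share
        for group, expected_share in population_shares.items()
    }

# Hypothetical dataset: 80 examples from group A, 20 from group B.
sample = ["A"] * 80 + ["B"] * 20
# Hypothetical target population: the two groups are equally common.
population = {"A": 0.5, "B": 0.5}

gaps = representation_gap(sample, population)
for group, gap in gaps.items():
    # Positive gap = over-represented, negative gap = under-represented.
    print(f"group {group}: {gap:+.2f}")
```

A gap near zero does not prove the data is unbiased (labels can still be skewed), but a large gap is an early warning worth investigating.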
Example — The biased hiring model
A company trains a résumé-screening model on ten years of its own hiring decisions. Historically it hired few women for technical roles, so the model learns to down-rank résumés that signal 'woman' — for example, mentioning a women's college or a women's sports team. The model is statistically 'accurate' at predicting past decisions, yet it automates and scales a discriminatory pattern.
Watch out
Removing the sensitive attribute (e.g., deleting 'gender' from the data) does NOT remove bias. Models infer protected attributes from proxies — postal code, school name, hobbies — so naive blindness can hide the problem while leaving it intact.
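The proxy effect can be seen in a toy example. In this sketch, which uses invented applicants and a hypothetical rule called `blind_model`, the rule never sees gender, yet its selection rates still differ sharply by gender because the one feature it does see is correlated with it.

```python
# Hypothetical applicants: (gender, attended_womens_college).
# Gender is kept here only so we can audit the outcome afterwards.
applicants = [
    ("F", True), ("F", True), ("F", True), ("F", False),
    ("M", False), ("M", False), ("M", False), ("M", False),
]

def blind_model(attended_womens_college):
    """A 'gender-blind' rule: it only looks at the proxy feature."""
    return not attended_womens_college

def selection_rate(group):
    """Fraction of applicants in `group` that the blind rule selects."""
    proxies = [proxy for gender, proxy in applicants if gender == group]
    return sum(blind_model(proxy) for proxy in proxies) / len(proxies)

rate_f = selection_rate("F")
rate_m = selection_rate("M")
print(f"selected: women {rate_f:.2f}, men {rate_m:.2f}")
```

Auditing outcomes by group therefore requires keeping the sensitive attribute available for evaluation, even when it is excluded from the model's inputs.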
Fairness is a tradeoff, not a checkbox
There is no single correct definition of fairness. Demographic parity (equal selection rates across groups) and equalized odds (equal true-positive and false-positive rates across groups) are both reasonable, but they cannot both hold when the groups' base rates differ, unless the classifier's predictions carry no information about the true outcome. Choosing a fairness criterion is an ethical and contextual decision, not a purely technical one.
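The incompatibility follows from a short calculation. The notation here is introduced for illustration: $p_a$ is the base rate (share of truly positive cases) in group $a$, and TPR and FPR are the true-positive and false-positive rates.

```latex
% Selection rate in group a, split by true outcome:
P(\hat{Y}=1 \mid A=a) = \mathrm{TPR}_a \, p_a + \mathrm{FPR}_a \, (1 - p_a)

% Equalized odds: TPR_a = TPR_b = TPR and FPR_a = FPR_b = FPR.
% Demographic parity then requires, for groups a and b:
\mathrm{TPR}\, p_a + \mathrm{FPR}\,(1 - p_a) = \mathrm{TPR}\, p_b + \mathrm{FPR}\,(1 - p_b)

% Rearranging:
(\mathrm{TPR} - \mathrm{FPR})\,(p_a - p_b) = 0
```

So both criteria hold only if the base rates are equal ($p_a = p_b$) or the classifier selects truly positive and truly negative cases at the same rate ($\mathrm{TPR} = \mathrm{FPR}$), which means its predictions are uninformative.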
| Fairness notion | What it requires | Tension |
|---|---|---|
| Demographic parity | Equal positive-outcome rates across groups | May ignore real differences in qualified base rates |
| Equalized odds | Equal true-positive AND false-positive rates across groups | Conflicts with demographic parity whenever base rates differ across groups |
| Equal opportunity | Equal true-positive rates across groups (a relaxation of equalized odds) | Can still allow unequal false-positive rates |
| Individual fairness | Similar individuals treated similarly | Requires a defensible notion of 'similar' |
You cannot, in general, satisfy all fairness definitions at once. A responsible team chooses, documents, and justifies which definition fits the context, and accepts the resulting tradeoff.
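The tension between these criteria can be checked numerically. The following sketch uses invented records and a hypothetical helper, `group_rates`, to compute the selection rate, true-positive rate, and false-positive rate per group. The data is constructed so that equalized odds holds while demographic parity fails, because the two groups have different base rates.

```python
def group_rates(records, group):
    """Selection rate, TPR and FPR for one group.

    records: iterable of (group, y_true, y_pred) with 0/1 labels.
    """
    rows = [(y_true, y_pred) for g, y_true, y_pred in records if g == group]
    preds_for_positives = [p for y, p in rows if y == 1]
    preds_for_negatives = [p for y, p in rows if y == 0]
    return {
        "selection_rate": sum(p for _, p in rows) / len(rows),
        "tpr": sum(preds_for_positives) / len(preds_for_positives),
        "fpr": sum(preds_for_negatives) / len(preds_for_negatives),
    }

# Hypothetical data. Group A: 6 of 10 truly qualified; group B: 2 of 10.
records = (
    [("A", 1, 1)] * 3 + [("A", 1, 0)] * 3 + [("A", 0, 1)] * 1 + [("A", 0, 0)] * 3
    + [("B", 1, 1)] * 1 + [("B", 1, 0)] * 1 + [("B", 0, 1)] * 2 + [("B", 0, 0)] * 6
)

rates_a = group_rates(records, "A")
rates_b = group_rates(records, "B")
print("A:", rates_a)
print("B:", rates_b)
```

Here both groups have a true-positive rate of 0.5 and a false-positive rate of 0.25, yet group A is selected 40% of the time and group B 30% of the time. Forcing the selection rates to match would require changing the error rates for at least one group.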
Transparency and explainability
People affected by AI decisions increasingly expect, and in some jurisdictions are legally entitled to, an explanation of those decisions. Explainability is about giving a human a faithful, understandable reason for a particular output. Interpretability is a stronger, model-level property: whether the model's internal workings are themselves understandable (a short decision tree is interpretable; a billion-parameter network is not).
Analogy
Explainability is like a doctor explaining a diagnosis in plain language even though the underlying biology is complex. Interpretability is like being able to read the entire medical chart and follow every step yourself.
Explainability (XAI) — The degree to which a human can understand why an AI system produced a particular output.
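For an interpretable model, an explanation can be read directly from the model itself. The sketch below assumes a hypothetical linear scoring model; the feature names and weights are invented for illustration. Each feature's contribution to the score is just its weight times its value, so the explanation is exact. For complex models, explanation methods can only approximate this kind of breakdown, which is why faithfulness has to be checked rather than assumed.

```python
# Hypothetical weights for a linear score; values are invented.
WEIGHTS = {
    "years_experience": 0.5,
    "relevant_degree": 2.0,
    "employment_gap_years": -1.0,
}

def explain(applicant):
    """Return the score and per-feature contributions, largest impact first."""
    contributions = {name: WEIGHTS[name] * applicant[name] for name in WEIGHTS}
    score = sum(contributions.values())
    ranked = sorted(contributions.items(), key=lambda item: abs(item[1]), reverse=True)
    return score, ranked

score, reasons = explain(
    {"years_experience": 6, "relevant_degree": 1, "employment_gap_years": 1}
)
print(f"score = {score}")
for feature, contribution in reasons:
    print(f"  {feature}: {contribution:+.1f}")
```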
Knowledge check
Quick practice — not part of your exam score.
A résumé-screening model trained on a company's past hiring decisions begins down-ranking women. The sensitive attribute 'gender' was never included as a feature. What is the most accurate explanation?
Why can a team generally NOT satisfy demographic parity and equalized odds simultaneously?
Which best distinguishes explainability from interpretability?