Module 2 · How Machine Learning Works
Inside a Model: Features, Training & Inference
60 min
Learning objectives
- Explain what features are and why feature quality drives model quality
- Describe training as adjusting parameters to reduce error, and define the loss function's role
- Differentiate the training phase from the inference phase
What a model actually sees: features
A model doesn't see a house, a customer, or an email — it sees numbers describing them. Those descriptive inputs are called features. For a house: square footage, number of bedrooms, ZIP code, age. The model learns how these features relate to the thing you want to predict. Choosing and preparing good features is often the single biggest lever on whether a model works.
Feature — A measurable, model-readable input describing one aspect of an example (e.g., square footage, word count, day of week).
Garbage in, garbage out. The most powerful algorithm cannot rescue weak or irrelevant features. Practitioners spend much of their time getting features right.
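To make "numbers describing a house" concrete, here is a minimal sketch of turning one raw record into a feature list. The field names, the reference year, and the short list of known ZIP codes are all assumptions made up for illustration; real pipelines vary.

```python
# Hypothetical raw record for one house; the field names are illustrative.
house = {"square_feet": 1800, "bedrooms": 3, "zip_code": "94110", "year_built": 1995}

# Assumed vocabulary of ZIP codes seen in the training data.
KNOWN_ZIPS = ["94110", "94117", "94122"]


def to_features(record, current_year=2024):
    """Turn one raw house record into a flat list of numbers."""
    age = current_year - record["year_built"]
    # A ZIP code is a label, not a quantity, so encode it as one 0/1 flag
    # per known ZIP instead of feeding in the digits as a number.
    zip_flags = [1.0 if record["zip_code"] == z else 0.0 for z in KNOWN_ZIPS]
    return [float(record["square_feet"]), float(record["bedrooms"]), float(age)] + zip_flags


features = to_features(house)
print(features)  # [1800.0, 3.0, 29.0, 1.0, 0.0, 0.0]
```

Note the preparation step for the ZIP code: treating "94110" as the quantity 94,110 would tell the model that one neighborhood is "bigger" than another, which is the kind of weak feature the callout above warns about.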
Parameters: the knobs the model turns
Inside a model are numbers called parameters (also called weights). You can picture them as tunable knobs. In a simple model, each parameter controls how much one feature pushes the prediction up or down; in larger models, many parameters work together in ways that are harder to read off individually. A simple model might have a handful of parameters; a large language model has billions. Learning is nothing more, and nothing less, than finding good settings for these knobs.
Analogy
Imagine a soundboard with hundreds of sliders. At first they're set randomly and the mix sounds terrible. Training is like an engineer nudging each slider, listening, and nudging again until the mix sounds right. The 'sound' here is how close the model's predictions are to the truth.
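The knob picture can be sketched in a few lines for the simplest kind of model, a weighted sum. The weights and bias below are invented for illustration, not learned from real data, and `predict` is a hypothetical helper name.

```python
def predict(features, weights, bias):
    """Weighted sum: each weight scales how much its feature moves the prediction."""
    return sum(w * x for w, x in zip(weights, features)) + bias


# Features for one house: [square footage, bedrooms, age in years]
house = [1800.0, 3.0, 29.0]

# Illustrative knob settings (assumed values): dollars per square foot,
# dollars per bedroom, dollars per year of age, plus a base price.
weights = [150.0, 10000.0, -500.0]
bias = 50000.0

price = predict(house, weights, bias)
print(price)  # 335500.0
```

Turning the first knob from 150 up to 160 raises this prediction by 18,000 (10 more dollars for each of 1,800 square feet), which is exactly the sense in which a parameter controls how much a feature counts.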
Training: getting less wrong, step by step
Training starts with the parameters set to random or default values, so the model's first predictions are mostly wrong. The system measures how wrong using a loss function — a single number for total error. It then nudges the parameters in the direction that reduces the loss, makes new predictions, measures again, and repeats. Over many passes through the data, the loss shrinks and predictions improve.
Loss function — A formula that scores how far a model's predictions are from the correct answers. Training minimizes this score.
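As one possible illustration, the sketch below uses mean squared error, a common loss for predicting numbers. The lesson does not name a specific loss, so treat the choice and the made-up prices as assumptions.

```python
def mean_squared_error(predictions, targets):
    """Average of squared differences; 0.0 means every prediction is exact."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


# House prices in thousands of dollars (made-up numbers).
actual = [400.0, 250.0, 310.0]
rough_guess = [300.0, 280.0, 300.0]
better_guess = [370.0, 260.0, 305.0]

rough_loss = mean_squared_error(rough_guess, actual)
better_loss = mean_squared_error(better_guess, actual)
print(rough_loss > better_loss)  # True: closer predictions give a smaller loss
```

Squaring the differences means a miss of 100 counts far more than ten misses of 10, so this particular loss pushes training to fix the biggest errors first.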
Example — One training step, plainly
Predict a house price → it says $300k, actual is $400k → loss is large → adjust the knobs so square footage counts a bit more → predict again → now $370k → loss is smaller. Repeat across thousands of houses until the knobs stop improving much.
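The predict, measure, adjust, repeat cycle above can be sketched as a small training loop. This version assumes a one-feature model, mean squared error as the loss, and gradient descent as the rule for nudging the knobs; the data, learning rate, and step count are invented so that the example is easy to check.

```python
def train(sizes, prices, learning_rate=0.1, steps=2000):
    """Fit price = weight * size + bias by repeatedly reducing the loss."""
    weight, bias = 0.0, 0.0  # default starting values for the two knobs
    loss_history = []
    n = len(sizes)
    for _ in range(steps):
        # 1. Predict with the current knob settings.
        predictions = [weight * x + bias for x in sizes]
        # 2. Measure how wrong the predictions are (mean squared error).
        errors = [p - y for p, y in zip(predictions, prices)]
        loss_history.append(sum(e * e for e in errors) / n)
        # 3. Work out which direction reduces the loss (the gradient).
        grad_weight = 2 * sum(e * x for e, x in zip(errors, sizes)) / n
        grad_bias = 2 * sum(errors) / n
        # 4. Nudge each knob a small step in that direction.
        weight -= learning_rate * grad_weight
        bias -= learning_rate * grad_bias
    return weight, bias, loss_history


# Made-up data: size in thousands of square feet, price in thousands of
# dollars, generated from price = 200 * size + 50.
sizes = [1.0, 1.5, 2.0, 2.5]
prices = [250.0, 350.0, 450.0, 550.0]

weight, bias, loss_history = train(sizes, prices)
print(round(weight, 2), round(bias, 2))   # approaches 200 and 50
print(loss_history[0], loss_history[-1])  # large at first, near zero at the end
```

Because the data was generated from a known rule, you can see the loop recover it: the knobs settle close to 200 and 50, and the recorded loss falls from a large value to almost nothing.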
Inference: putting the trained model to work
Once training is done, the parameters are frozen and the model is deployed. Inference is the act of feeding it a new, unseen input and getting a prediction back. A single inference is usually fast and cheap compared to a training run. The expensive learning happens during training, which is done occasionally; inference is what happens millions of times in production.
| Phase | What happens | Frequency & cost |
|---|---|---|
| Training | Parameters adjusted to fit data; loss minimized | Done occasionally; computationally expensive |
| Inference | Frozen model predicts on new inputs | Done constantly in production; relatively cheap |
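The split in the table can be sketched as follows. `TrainedModel` is a hypothetical class, and its weight and bias are assumed to have come from an earlier training run; the point is only that inference reads the parameters and never changes them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen=True makes any later assignment raise an error
class TrainedModel:
    weight: float
    bias: float

    def predict(self, size):
        """Inference: one cheap calculation, with no parameter updates."""
        return self.weight * size + self.bias


# Assumed output of training: price = 200 * size + 50 (thousands of dollars).
model = TrainedModel(weight=200.0, bias=50.0)

# New, unseen houses (size in thousands of square feet).
new_sizes = [1.25, 3.0]
predictions = [model.predict(size) for size in new_sizes]
print(predictions)  # [300.0, 650.0]
```

Each prediction here is a single multiply and add, while finding the weight and bias in the first place took thousands of passes over the data. That gap is the cost difference the table describes.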
Watch out
A model only knows the patterns present in its training data. If real-world conditions drift away from that data (new customer behavior, a new product line), accuracy quietly degrades — a problem called data drift. Trained once does not mean correct forever.
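One simple way to notice this kind of shift is to compare a feature's recent values with the values seen in training. The sketch below is a rough heuristic rather than a standard method: `drift_score`, the threshold of 2.0, and the sample numbers are all assumptions chosen for illustration.

```python
from statistics import mean, stdev


def drift_score(training_values, recent_values):
    """How many training standard deviations the recent average has moved."""
    spread = stdev(training_values)
    if spread == 0:
        raise ValueError("training values have no spread to compare against")
    return abs(mean(recent_values) - mean(training_values)) / spread


# Square footage of houses seen during training (made-up numbers).
training_sizes = [1500, 1800, 2000, 1700, 1900]

similar_batch = [1600, 1850, 1750]  # looks like the training data
shifted_batch = [3200, 3500, 2900]  # e.g. a new line of much larger homes

ALERT_THRESHOLD = 2.0  # assumed cutoff; tune for your own data

for name, batch in [("similar", similar_batch), ("shifted", shifted_batch)]:
    score = drift_score(training_sizes, batch)
    print(name, "drift suspected" if score > ALERT_THRESHOLD else "looks familiar")
```

A high score does not prove the model is wrong, only that it is now being asked about inputs unlike the ones it learned from, which is a reasonable trigger for checking accuracy or retraining.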
Knowledge check
Quick practice — not part of your exam score.
During training, what is the model actually adjusting?
What is the role of the loss function in machine learning?
Which statement best describes the relationship between training and inference?