Module 2 · How Machine Learning Works

Inside a Model: Features, Training & Inference

60 min

Learning objectives

Explain what features are and why feature quality drives model quality
Describe training as adjusting parameters to reduce error, and define the loss function's role
Differentiate the training phase from the inference phase

What a model actually sees: features

A model doesn't see a house, a customer, or an email — it sees numbers describing them. Those descriptive inputs are called features. For a house: square footage, number of bedrooms, ZIP code, age. The model learns how these features relate to the thing you want to predict. Choosing and preparing good features is often the single biggest lever on whether a model works.

Feature — A measurable, model-readable input describing one aspect of an example (e.g., square footage, word count, day of week).
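As a rough illustration of "the model sees numbers", the sketch below turns one house record into a flat list of features. The field names, the list of known ZIP codes, and the function house_to_features are all hypothetical, chosen only for this example; real projects encode features in many different ways.

```python
# Hypothetical example: the field names and the list of known ZIP codes
# are assumptions made for illustration only.
KNOWN_ZIP_CODES = ["30301", "30302", "30303"]


def house_to_features(house):
    """Turn one house record (a dict) into a flat list of numbers."""
    # A ZIP code is a label, not a quantity, so it becomes one 0/1 flag
    # per known ZIP code (often called one-hot encoding).
    zip_flags = [1.0 if house["zip_code"] == z else 0.0 for z in KNOWN_ZIP_CODES]
    return [
        float(house["square_feet"]),
        float(house["bedrooms"]),
        float(house["age_years"]),
    ] + zip_flags


house = {"square_feet": 1500, "bedrooms": 3, "age_years": 20, "zip_code": "30302"}
features = house_to_features(house)
print(features)  # [1500.0, 3.0, 20.0, 0.0, 1.0, 0.0]
```

The ZIP code is handled separately because treating it as an ordinary number would suggest that a larger ZIP code means "more" of something, which it does not.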

Garbage in, garbage out. The most powerful algorithm cannot rescue weak or irrelevant features. Practitioners spend much of their time getting features right.

Parameters: the knobs the model turns

Inside a model are numbers called parameters (also called weights). You can picture them as tunable knobs. In a simple model, each parameter controls how much one feature pushes the prediction one way or another. A simple model might have a handful of parameters; a large language model has billions. Learning is nothing more, and nothing less, than finding good settings for these knobs.
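To make the knob picture concrete, here is a minimal sketch of a very simple model: a weighted sum of features. The weights, the bias, and the predict function are made-up values for illustration, not settings learned from real data.

```python
def predict(features, weights, bias):
    """Weighted sum: each weight says how strongly one feature pushes the price."""
    return bias + sum(w * x for w, x in zip(weights, features))


# Assumed, hand-picked knob settings (not learned from any data).
weights = [200.0, 10000.0]  # dollars per square foot, dollars per bedroom
bias = 50000.0              # starting price before any feature is counted

features = [1500.0, 3.0]    # square feet, bedrooms
price = predict(features, weights, bias)
print(price)  # 380000.0
```

Turning the first knob up (a larger weight on square footage) makes size matter more in every prediction; that is all a parameter change amounts to in a model this simple.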

Analogy

Imagine a soundboard with hundreds of sliders. At first they're set randomly and the mix sounds terrible. Training is like an engineer nudging each slider, listening, and nudging again until the mix sounds right. The 'sound' here is how close the model's predictions are to the truth.

Training: getting less wrong, step by step

Training starts with the parameters set to random or default values, so the model's first predictions are mostly wrong. The system measures how wrong using a loss function, which condenses the model's error across the training examples into a single number. It then nudges the parameters in the direction that reduces the loss, makes new predictions, measures again, and repeats. Over many passes through the data, the loss typically shrinks and predictions improve.

Loss function — A formula that scores how far a model's predictions are from the correct answers. Training minimizes this score.
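The lesson does not name a specific loss function, so the sketch below assumes one common choice, mean squared error: the average of the squared gaps between predictions and correct answers. Prices are in thousands of dollars and are made up for the example.

```python
def mean_squared_error(predictions, targets):
    """Average squared gap between predictions and correct answers."""
    squared_errors = [(p - t) ** 2 for p, t in zip(predictions, targets)]
    return sum(squared_errors) / len(squared_errors)


actual_prices = [400.0, 250.0]  # thousands of dollars (made-up data)

early_guess = [300.0, 250.0]    # first house is off by 100
later_guess = [370.0, 250.0]    # first house is off by 30

print(mean_squared_error(early_guess, actual_prices))  # 5000.0
print(mean_squared_error(later_guess, actual_prices))  # 450.0
```

The second score is lower because the predictions moved closer to the truth, and a perfect set of predictions would score 0.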

Example — One training step, plainly

Predict a house price → it says $300k, actual is $400k → loss is large → adjust the knobs so square footage counts a bit more → predict again → now $370k → loss is smaller. Repeat across thousands of houses until the knobs stop improving much.
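The predict, measure, adjust, repeat cycle can be sketched in a few lines. This version assumes a one-knob model (price = weight × size), mean squared error as the loss, and gradient descent as the way of nudging the knob; none of these choices is specified by the lesson, and the data, learning rate, and step count are made up.

```python
def train(sizes, prices, learning_rate=0.1, steps=50):
    """Fit price = weight * size by repeatedly nudging the weight."""
    weight = 0.0  # the single knob, starting from a poor default
    losses = []
    n = len(sizes)
    for _ in range(steps):
        # 1. Predict with the current knob setting.
        predictions = [weight * x for x in sizes]
        # 2. Measure how wrong the predictions are (mean squared error).
        loss = sum((p - y) ** 2 for p, y in zip(predictions, prices)) / n
        losses.append(loss)
        # 3. Slope of the loss with respect to the weight: it tells us
        #    which direction to turn the knob to reduce the loss.
        gradient = sum(2 * (p - y) * x for p, y, x in zip(predictions, prices, sizes)) / n
        # 4. Nudge the knob a small step in the loss-reducing direction.
        weight -= learning_rate * gradient
    return weight, losses


sizes = [1.0, 1.5, 2.0, 2.5]           # thousands of square feet (made up)
prices = [200.0, 300.0, 400.0, 500.0]  # thousands of dollars (made up)

weight, losses = train(sizes, prices)
print(round(weight, 3))        # close to 200, the pattern built into the data
print(losses[0] > losses[-1])  # True: the loss shrank during training
```

The learning rate controls the size of each nudge. Too large and the loss can grow instead of shrinking; too small and training takes many more steps.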

Inference: putting the trained model to work

Once training is done, the parameters are frozen and the model is deployed. Inference is the act of feeding it a new, unseen input and getting a prediction back. A single inference is usually fast and cheap compared to training. Training is the expensive learning step that happens occasionally; inference is what happens millions of times in production.

Phase	What happens	Frequency & cost
Training	Parameters adjusted to fit data; loss minimized	Done occasionally; computationally expensive
Inference	Frozen model predicts on new inputs	Done constantly in production; relatively cheap
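A minimal sketch of the inference phase, assuming the same kind of weighted-sum model: the parameters are fixed constants, and the code only predicts. There is no loss calculation and no parameter update. The constants and the infer function are hypothetical stand-ins for values a real training run would produce.

```python
# Assumed values standing in for parameters produced by an earlier training run.
# Stored in a tuple so they cannot be modified by accident.
FROZEN_WEIGHTS = (200.0, 10000.0)  # per square foot, per bedroom
FROZEN_BIAS = 50000.0


def infer(features):
    """Predict a price for one new house; nothing is learned or changed here."""
    return FROZEN_BIAS + sum(w * x for w, x in zip(FROZEN_WEIGHTS, features))


# New, unseen houses arriving in production: [square feet, bedrooms].
new_houses = [[1200.0, 2.0], [2000.0, 4.0]]
predictions = [infer(house) for house in new_houses]
print(predictions)  # [310000.0, 490000.0]
```

Each prediction here is a single pass of arithmetic, which is why one inference costs far less than training, where the same arithmetic is repeated across the whole dataset many times.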

Watch out

A model only knows the patterns present in its training data. If real-world conditions drift away from that data (new customer behavior, a new product line), accuracy quietly degrades — a problem called data drift. Trained once does not mean correct forever.
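One crude way to notice drift is to compare a feature's recent values with the values seen during training. The sketch below flags drift when the recent average sits more than a chosen number of training standard deviations away from the training average. The has_drifted function, the threshold of 2.0, and the data are assumptions for illustration; production monitoring usually relies on more careful statistical tests.

```python
import statistics


def has_drifted(training_values, recent_values, threshold=2.0):
    """Return True when the recent mean is far from the training mean."""
    train_mean = statistics.mean(training_values)
    train_std = statistics.stdev(training_values)
    shift = abs(statistics.mean(recent_values) - train_mean)
    # "Far" means more than `threshold` training standard deviations.
    return shift > threshold * train_std


# Square footage of houses seen during training (made-up data).
training_sizes = [1400.0, 1500.0, 1600.0, 1550.0, 1450.0]

similar_homes = [1480.0, 1520.0, 1510.0]  # looks like the training data
larger_homes = [2400.0, 2500.0, 2600.0]   # a new kind of listing

print(has_drifted(training_sizes, similar_homes))  # False
print(has_drifted(training_sizes, larger_homes))   # True
```

A True result does not prove the model is wrong, only that it is now being asked about inputs unlike those it learned from, which is a reasonable cue to re-check accuracy or retrain.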

Knowledge check

Quick practice — not part of your exam score.

During training, what is the model actually adjusting?

What is the role of the loss function in machine learning?

Which statement best describes the relationship between training and inference?
