End-of-lesson Quiz
5 questions · Hyperparameters & Tuning
Question 1 of 5
What is the difference between a parameter and a hyperparameter?
Parameters (weights and biases) are learned by the model during training; hyperparameters (learning rate, batch size, architecture) are chosen by the engineer before training. This distinction matters for reproducibility: hyperparameters must be version-controlled because the same training script with different hyperparameters produces a different model.
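The split can be seen in a minimal sketch (toy data and values, not from the lesson): the engineer fixes `LR` and `EPOCHS` up front, while the weight `w` is learned by gradient descent.

```python
# Hyperparameters -- chosen by the engineer, should be version-controlled.
LR = 0.01
EPOCHS = 100

# Toy data: y = 3x, so the "true" weight the model must learn is 3.
data = [(x, 3.0 * x) for x in range(1, 6)]

w = 0.0  # Parameter -- learned during training, never set by hand.
for _ in range(EPOCHS):
    # Gradient of mean squared error with respect to w.
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    w -= LR * grad

print(round(w, 3))  # converges to 3.0
```

Changing `LR` or `EPOCHS` and rerunning produces a different final `w`, which is exactly why those two numbers belong in version control alongside the script.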
Question 2 of 5
You train with lr=0.1 and the loss oscillates wildly. You switch to lr=0.0001 and the loss barely moves after 50 epochs. What should you try next?
Try an intermediate value on a log scale. Learning rate is the most important hyperparameter: too high causes oscillation or divergence; too low means slow or no progress. 1e-3 is the Adam default for a reason — it's a sane starting point for most problems.
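The three regimes show up even on a toy loss (hypothetical setup, plain gradient descent on (w − 3)², not the quiz's actual model):

```python
def train(lr, steps=50, w=0.0):
    """Minimise (w - 3)**2 by gradient descent and return the final w."""
    for _ in range(steps):
        grad = 2 * (w - 3)  # d/dw of (w - 3)**2
        w -= lr * grad
    return w

for lr in (1.0, 1e-4, 1e-1):
    print(f"lr={lr}: w={train(lr):.4f}")
# lr=1.0  -> w bounces between 0 and 6 forever (oscillation)
# lr=1e-4 -> w barely moves from its start at 0 (too slow)
# lr=1e-1 -> w lands near the optimum at 3 (just right)
```

Sweeping learning rates logarithmically (1.0, 0.1, 0.01, …) rather than linearly is the standard way to find the "just right" regime quickly.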
Question 3 of 5
You train a phishing detector with batch_size=1024 on 10,000 samples (~10 updates per epoch). What's the likely problem?
Very large batches mean few updates per epoch, and each update uses a very smooth, low-noise gradient that tends to settle in sharp minima: regions of low training loss but poor generalisation. Smaller batches add gradient noise that helps the model escape sharp minima and find flatter, better-generalising ones.
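The arithmetic behind "~10 updates per epoch" is worth making explicit (illustrative batch sizes below; only 1024 comes from the question):

```python
import math

N_SAMPLES = 10_000  # dataset size from the question

for batch_size in (1024, 256, 32):
    # One update per batch; the last, partial batch still counts.
    updates = math.ceil(N_SAMPLES / batch_size)
    print(f"batch_size={batch_size}: {updates} updates/epoch")
# batch_size=1024: 10 updates/epoch
# batch_size=256: 40 updates/epoch
# batch_size=32: 313 updates/epoch
```

Dropping from 1024 to 32 buys roughly 30× more (noisier) updates per epoch at the same data budget.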
Question 4 of 5
Your grid search tests 3 widths × 3 depths × 3 learning rates × 3 batch sizes = 81 models, each taking 2 minutes. What's a faster alternative that often finds equally good hyperparameters?
Random search wins because grid search wastes effort on unimportant dimensions (e.g. testing 3 batch sizes when only learning rate matters). Random search explores more unique values per dimension in the same number of trials — usually finding better configurations faster.
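The "more unique values per dimension" point can be made concrete with a sketch (candidate ranges are hypothetical; the 81-trial budget is from the question):

```python
import random

random.seed(0)
TRIALS = 81

# Grid search: 3 fixed candidates -> only 3 unique learning rates
# ever tested, no matter how many of the 81 trials use this axis.
grid_lrs = [1e-2, 1e-3, 1e-4]

# Random search: each trial samples a fresh learning rate,
# log-uniformly in [1e-4, 1e-2].
random_lrs = [10 ** random.uniform(-4, -2) for _ in range(TRIALS)]

print(len(set(grid_lrs)))    # 3 unique values explored
print(len(set(random_lrs)))  # 81 unique values explored
```

If learning rate turns out to be the only dimension that matters, random search has effectively run an 81-point sweep over it; grid search has run a 3-point one.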
Question 5 of 5
Why must you document and version-control your hyperparameters?
Hyperparameters are part of the model's identity. If your IDS recall drops from 92% to 78% after a retrain, the first question is 'what did we change?'. Without versioned hyperparameters, you can't answer that — and you can't roll back.
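A minimal way to make hyperparameters answerable and diffable is to write them next to every model artefact (the run directory, values, and `git_commit` field below are all hypothetical):

```python
import json
from pathlib import Path

# Everything that defines this run, including the exact training code.
hparams = {
    "lr": 1e-3,
    "batch_size": 128,
    "epochs": 20,
    "git_commit": "abc1234",  # hypothetical commit of the training script
}

run_dir = Path("runs/ids-retrain-001")  # hypothetical run directory
run_dir.mkdir(parents=True, exist_ok=True)
(run_dir / "hparams.json").write_text(json.dumps(hparams, indent=2))

# "What did we change?" is now `diff run-A/hparams.json run-B/hparams.json`.
print(json.loads((run_dir / "hparams.json").read_text())["lr"])
```

Experiment trackers (MLflow, Weights & Biases, or even a git-committed JSON file like this) all implement the same idea: no model artefact without its hyperparameters.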