End-of-lesson Quiz
5 questions · Hyperparameters & Tuning
Question 1 of 5
What is the difference between a parameter and a hyperparameter?
Parameters (weights and biases) are learned by the model during training; hyperparameters (learning rate, batch size, architecture) are chosen by the engineer before training. This distinction matters for reproducibility: hyperparameters must be version-controlled because the same training script with different hyperparameters produces a different model.
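The split can be seen in a minimal sketch (toy data and values, not from the lesson): the engineer fixes `LR` and `EPOCHS` up front, while the weight `w` is learned by gradient descent.

```python
# Hyperparameters -- chosen by the engineer, should be version-controlled.
LR = 0.01
EPOCHS = 100

# Toy data: y = 3x, so the "true" weight the model must learn is 3.
data = [(x, 3.0 * x) for x in range(1, 6)]

w = 0.0  # Parameter -- learned during training, never set by hand.
for _ in range(EPOCHS):
    # Gradient of mean squared error with respect to w.
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    w -= LR * grad

print(round(w, 3))  # converges to 3.0
```

Changing `LR` or `EPOCHS` and rerunning produces a different final `w`, which is exactly why those two numbers belong in version control alongside the script.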
Question 2 of 5
You train with lr=0.1 and the loss oscillates wildly. You switch to lr=0.0001 and the loss barely moves after 50 epochs. What should you try next?
Try an intermediate value on a log scale. Learning rate is the most important hyperparameter: too high causes oscillation or divergence; too low means slow or no progress. 1e-3 is the Adam default for a reason — it's a sane starting point for most problems.
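The three regimes show up even on a toy loss (hypothetical setup, plain gradient descent on (w − 3)², not the quiz's actual model):

```python
def train(lr, steps=50, w=0.0):
    """Minimise (w - 3)**2 by gradient descent and return the final w."""
    for _ in range(steps):
        grad = 2 * (w - 3)  # d/dw of (w - 3)**2
        w -= lr * grad
    return w

for lr in (1.0, 1e-4, 1e-1):
    print(f"lr={lr}: w={train(lr):.4f}")
# lr=1.0  -> w bounces between 0 and 6 forever (oscillation)
# lr=1e-4 -> w barely moves from its start at 0 (too slow)
# lr=1e-1 -> w lands near the optimum at 3 (just right)
```

Sweeping learning rates logarithmically (1.0, 0.1, 0.01, …) rather than linearly is the standard way to find the "just right" regime quickly.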
Question 3 of 5
You train a phishing detector with batch_size=1024 on 10,000 samples (~10 updates per epoch). What's the likely problem?
Very large batches mean few updates per epoch, and each update uses a very smooth, low-noise gradient that tends to settle in sharp minima: regions of low training loss but poor generalisation. Smaller batches add gradient noise that helps the model escape sharp minima and find flatter, better-generalising ones.
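The arithmetic behind "~10 updates per epoch" is worth making explicit (illustrative batch sizes below; only 1024 comes from the question):

```python
import math

N_SAMPLES = 10_000  # dataset size from the question

for batch_size in (1024, 256, 32):
    # One update per batch; the last, partial batch still counts.
    updates = math.ceil(N_SAMPLES / batch_size)
    print(f"batch_size={batch_size}: {updates} updates/epoch")
# batch_size=1024: 10 updates/epoch
# batch_size=256: 40 updates/epoch
# batch_size=32: 313 updates/epoch
```

Dropping from 1024 to 32 buys roughly 30× more (noisier) updates per epoch at the same data budget.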
Question 4 of 5
Your grid search tests 3 widths × 3 depths × 3 learning rates × 3 batch sizes = 81 models, each taking 2 minutes. What's a faster alternative that often finds equally good hyperparameters?
Random search wins because grid search wastes effort on unimportant dimensions (e.g. testing 3 batch sizes when only learning rate matters). Random search explores more unique values per dimension in the same number of trials — usually finding better configurations faster.
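The "more unique values per dimension" point can be made concrete with a sketch (candidate ranges are hypothetical; the 81-trial budget is from the question):

```python
import random

random.seed(0)
TRIALS = 81

# Grid search: 3 fixed candidates -> only 3 unique learning rates
# ever tested, no matter how many of the 81 trials use this axis.
grid_lrs = [1e-2, 1e-3, 1e-4]

# Random search: each trial samples a fresh learning rate,
# log-uniformly in [1e-4, 1e-2].
random_lrs = [10 ** random.uniform(-4, -2) for _ in range(TRIALS)]

print(len(set(grid_lrs)))    # 3 unique values explored
print(len(set(random_lrs)))  # 81 unique values explored
```

If learning rate turns out to be the only dimension that matters, random search has effectively run an 81-point sweep over it; grid search has run a 3-point one.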
Question 5 of 5
Why must you document and version-control your hyperparameters?
Hyperparameters are part of the model's identity. If your IDS recall drops from 92% to 78% after a retrain, the first question is 'what did we change?'. Without versioned hyperparameters, you can't answer that — and you can't roll back.
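A minimal way to make hyperparameters answerable and diffable is to write them next to every model artefact (the run directory, values, and `git_commit` field below are all hypothetical):

```python
import json
from pathlib import Path

# Everything that defines this run, including the exact training code.
hparams = {
    "lr": 1e-3,
    "batch_size": 128,
    "epochs": 20,
    "git_commit": "abc1234",  # hypothetical commit of the training script
}

run_dir = Path("runs/ids-retrain-001")  # hypothetical run directory
run_dir.mkdir(parents=True, exist_ok=True)
(run_dir / "hparams.json").write_text(json.dumps(hparams, indent=2))

# "What did we change?" is now `diff run-A/hparams.json run-B/hparams.json`.
print(json.loads((run_dir / "hparams.json").read_text())["lr"])
```

Experiment trackers (MLflow, Weights & Biases, or even a git-committed JSON file like this) all implement the same idea: no model artefact without its hyperparameters.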