# The most important hyperparameter
Learning rate controls how far the optimizer moves the weights on each update step. It is widely considered the single most impactful hyperparameter: get it wrong, and no amount of tuning anything else will save the run.
```python
# The weight update rule (gradient descent)
new_weight = old_weight - learning_rate * gradient
```
| Learning rate | Step size | Behaviour | Loss curve pattern |
|---|---|---|---|
| 0.0001 | Tiny | Very slow convergence | Loss crawls down gradually; may never reach the minimum within budget |
| 0.001 | Normal | Usually converges well | Smooth, steady decrease -- reaches the minimum efficiently |
| 0.01 | Bigger | Faster start, may oscillate | Quick initial drop, then bouncing near the minimum |
| 0.1 | Huge | Often diverges | Loss explodes or oscillates wildly -- never converges |
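To make those step sizes concrete, here is the update rule applied by hand to a single weight at each learning rate from the table (the weight and gradient values are illustrative, not from a real model):

```python
# One gradient-descent update for a single weight, at each learning rate.
old_weight = 0.5
gradient = 2.0  # illustrative gradient for this weight

for learning_rate in (0.0001, 0.001, 0.01, 0.1):
    step = learning_rate * gradient
    new_weight = old_weight - step
    print(f"lr={learning_rate}: step={step:.4f}, new_weight={new_weight:.4f}")
```

At lr=0.1 a single update moves the weight by 0.2, almost half its own magnitude, which is exactly why large rates overshoot the minimum.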
## The experiment: three learning rates
Train the same model architecture three times, changing only the learning rate, and plot the loss curves:
```python
import numpy as np
import matplotlib.pyplot as plt
from tensorflow import keras

# Synthetic placeholder data -- substitute your own dataset.
rng = np.random.default_rng(42)
X_train, y_train = rng.normal(size=(800, 10)), rng.integers(0, 2, size=800)
X_val, y_val = rng.normal(size=(200, 10)), rng.integers(0, 2, size=200)

learning_rates = [0.001, 0.01, 0.1]
histories = {}

for lr in learning_rates:
    # Identical architecture each run; only the learning rate changes.
    model = keras.Sequential([
        keras.layers.Dense(64, activation='relu', input_shape=(10,)),
        keras.layers.Dense(32, activation='relu'),
        keras.layers.Dense(1, activation='sigmoid'),
    ])
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=lr),
        loss='binary_crossentropy',
        metrics=['accuracy'],
    )
    histories[lr] = model.fit(
        X_train, y_train,
        validation_data=(X_val, y_val),
        epochs=50, batch_size=32, verbose=0,
    )

# Plot all three validation loss curves on one figure
for lr, h in histories.items():
    plt.plot(h.history['val_loss'], label=f'lr={lr}')
plt.xlabel('Epoch')
plt.ylabel('Validation Loss')
plt.legend()
plt.title('Learning Rate Comparison')
plt.show()
```
## Diagnosing learning rate from the loss curve
| Symptom | Diagnosis | Fix |
|---|---|---|
| Loss barely moves after many epochs | Learning rate too low | Increase by 3-10x |
| Loss decreases smoothly, then plateaus | Learning rate about right | Keep it; consider lr scheduling |
| Loss oscillates but trends downward | Learning rate slightly high | Reduce by 2-3x |
| Loss spikes up or diverges to NaN | Learning rate much too high | Reduce by 10x or more |
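The table above can be turned into a rough automated check. This is a sketch, not a library function: `diagnose_learning_rate` is a hypothetical helper, and its thresholds (1% total improvement counts as "barely moving", more than half the epoch-to-epoch deltas changing sign counts as oscillation) are illustrative:

```python
import math

def diagnose_learning_rate(losses, flat_tol=0.01):
    """Map a validation-loss history to one of the table's diagnoses."""
    # Divergence: NaN/inf, or loss blown up far above its starting point.
    if any(not math.isfinite(l) for l in losses) or losses[-1] > losses[0] * 10:
        return "diverging: reduce learning rate by 10x or more"
    # Barely moving: under 1% total improvement over the whole run.
    if (losses[0] - losses[-1]) / losses[0] < flat_tol:
        return "barely moving: increase learning rate by 3-10x"
    # Oscillation: count sign changes in the epoch-to-epoch deltas.
    deltas = [b - a for a, b in zip(losses, losses[1:])]
    flips = sum(1 for d1, d2 in zip(deltas, deltas[1:]) if d1 * d2 < 0)
    if flips > len(deltas) // 2:
        return "oscillating: reduce learning rate by 2-3x"
    return "converging: keep it; consider lr scheduling"
```

You could run this on `history.history['val_loss']` after each experiment instead of eyeballing every curve.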
## Practical starting points
| Optimizer | Default learning rate | Search range |
|---|---|---|
| Adam | 0.001 | [0.0001, 0.001, 0.01] |
| SGD | 0.01 | [0.001, 0.01, 0.1] |
| RMSprop | 0.001 | [0.0001, 0.001, 0.01] |
**Rule of thumb:** start with the optimizer's default. If the loss is unstable, reduce by 3x; if it decreases too slowly, increase by 3x. Three tries usually lands in a working range.
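That rule of thumb can be sketched as a small search loop. Here `train_fn` is a hypothetical callable standing in for "train the model at this rate and return its loss history", and the 1% improvement threshold is illustrative:

```python
def tune_learning_rate(train_fn, lr=0.001, tries=3):
    """Sketch of the 3x up/down rule of thumb for finding a working rate."""
    for _ in range(tries):
        losses = train_fn(lr)
        if losses[-1] > losses[0]:
            lr /= 3          # unstable: loss ended higher than it started
        elif (losses[0] - losses[-1]) / losses[0] < 0.01:
            lr *= 3          # too slow: under 1% total improvement
        else:
            return lr        # stable and making progress: keep this rate
    return lr
```

In practice you would cap each probe run at a few epochs so the search stays cheap.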
## Think Deeper
Try this:
You train a threat classifier with lr=0.1 and the loss oscillates wildly, never converging. You switch to lr=0.0001 and the loss barely moves after 50 epochs. What should you try, and why?
Try lr=0.001 -- the Adam optimizer default. Learning rate 0.1 overshoots the loss minimum (too-large steps), while 0.0001 takes steps too small to make progress in your epoch budget. In security ML, time matters: you need the model retrained and deployed before the threat landscape shifts. The middle ground balances convergence speed with stability.
Cybersecurity tie-in: In security ML, you often need to retrain models on shifting distributions (new attack types appear, old ones fade). A learning rate that worked last month may not work on this month's data. Building the habit of checking loss curves after every retrain ensures your threat detection model actually converged rather than silently failing.
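That habit can be enforced with a simple gate after each retrain. A minimal sketch, assuming `val_losses` is the `history.history['val_loss']` list returned by `model.fit`; the window size and tolerance are illustrative thresholds, not standards:

```python
import math

def converged(val_losses, window=5, tol=1e-3):
    """Post-retrain gate: True only if validation loss stayed finite,
    actually improved, and has flattened out over the last `window` epochs."""
    if any(not math.isfinite(l) for l in val_losses):
        return False                           # silently diverged to NaN/inf
    recent = val_losses[-window:]
    flat = max(recent) - min(recent) < tol     # loss has stopped moving
    improved = val_losses[-1] < val_losses[0]  # and it moved downward overall
    return flat and improved
```

A retrained threat model that fails this gate should not be deployed; adjust the learning rate and rerun instead.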