Step 4: Early Stopping

Stop training at the right moment


Why early stopping?

Setting epochs=200 without early stopping means the model trains for the full 200 epochs -- even when val_loss stopped improving at epoch 25. The extra 175 epochs waste computation and allow the model to slowly overfit as it memorises noise in the training data.

| Approach | Epochs trained | Risk |
| --- | --- | --- |
| Fixed epochs (too few) | 10 | Model has not converged -- underfitting |
| Fixed epochs (too many) | 200 | Model overfits past the optimum |
| Early stopping | 25 (automatic) | Stops at the best point automatically |

How EarlyStopping works

Keras monitors a metric (typically val_loss) and counts consecutive epochs without improvement. When the count reaches patience, training stops.

early_stop = keras.callbacks.EarlyStopping(
    monitor='val_loss',          # metric to watch
    patience=5,                  # stop after 5 non-improving epochs
    restore_best_weights=True,   # rewind to the best epoch's weights
    min_delta=0.001,             # minimum change to count as improvement
)

history = model.fit(
    X_train, y_train,
    validation_data=(X_val, y_val),
    epochs=200,                  # set high -- early stopping controls actual count
    batch_size=32,
    callbacks=[early_stop],
    verbose=0,
)

Key parameters

| Parameter | Purpose | Recommended value |
| --- | --- | --- |
| monitor | Which metric to watch | 'val_loss' (most common) |
| patience | How many non-improving epochs to wait | 5-20 (higher = more tolerant of plateaus) |
| restore_best_weights | Roll back to the best epoch when stopping | True (always) |
| min_delta | Minimum improvement to reset the counter | 0.001 (small improvements still count) |
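One parameter not shown in the table: EarlyStopping also accepts `mode`, which tells the callback which direction counts as improvement. For `val_loss`, smaller is better; for a metric like `val_accuracy`, larger is. Keras infers the direction for common metric names (`mode='auto'`), but being explicit avoids surprises. A minimal sketch:

```python
from tensorflow import keras

# Monitoring accuracy instead of loss: larger values are improvements
early_stop_acc = keras.callbacks.EarlyStopping(
    monitor='val_accuracy',
    mode='max',                  # accuracy should increase
    patience=10,
    restore_best_weights=True,
)
```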

Reading the training report

After training with early stopping, inspect the history to understand what happened:

# How many epochs actually ran?
actual_epochs = len(history.history['loss'])
print(f"Trained for {actual_epochs} epochs (max was 200)")

# Find the best epoch
val_losses = history.history['val_loss']
best_epoch = val_losses.index(min(val_losses)) + 1
print(f"Best val_loss at epoch {best_epoch}: {min(val_losses):.4f}")

# With restore_best_weights=True, the model already holds the best epoch's weights.
# evaluate() returns [loss, *metrics] when the model was compiled with metrics.
final_val_loss = model.evaluate(X_val, y_val, verbose=0)[0]
print(f"Current model val_loss: {final_val_loss:.4f}")
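The same bookkeeping can be checked without training anything, using a stand-in for `history.history`. The dictionary and its values below are hypothetical, chosen to mimic a run stopped by `patience=5`:

```python
# Stand-in for history.history after an early-stopped run (patience=5):
# val_loss bottoms out at epoch 4, then fails to improve for 5 straight epochs.
history = {
    'val_loss': [0.50, 0.40, 0.35, 0.30, 0.31, 0.32, 0.33, 0.31, 0.34],
}

actual_epochs = len(history['val_loss'])
val_losses = history['val_loss']
best_epoch = val_losses.index(min(val_losses)) + 1

print(f"Trained for {actual_epochs} epochs")                       # 9 of a possible 200
print(f"Best val_loss at epoch {best_epoch}")                      # epoch 4
print(f"Non-improving tail: {actual_epochs - best_epoch} epochs")  # 5 == patience
```

When early stopping fires, the last `patience` epochs are by definition the non-improving tail, so the best epoch sits `patience` epochs before the stop.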

Think Deeper

You set patience=5 and the model's best val_loss was at epoch 25. Training stopped at epoch 30. With restore_best_weights=True, which epoch's weights does the model use for prediction?

Epoch 25 — the epoch with the best validation loss. Without restore_best_weights=True, the model would use the epoch 30 weights, which are worse. In security ML, this matters because those last 5 epochs of overfitting could cause the model to miss new attack patterns it had previously learned to detect.
Cybersecurity tie-in: Early stopping is essential for security ML pipelines that retrain automatically. Without it, an automated nightly retrain of a phishing classifier could silently overfit -- performing perfectly on yesterday's phishing emails but missing tomorrow's new campaigns. Early stopping with restore_best_weights=True ensures every retrain produces the best-generalising model, not the most-memorised one.
