Why early stopping?
Setting epochs=200 without early stopping means the model trains for the full 200 epochs -- even when val_loss stopped improving at epoch 25. The extra 175 epochs waste computation and allow the model to slowly overfit as it memorises noise in the training data.
| Approach | Epochs trained | Risk |
|---|---|---|
| Fixed epochs (too few) | 10 | Model has not converged -- underfitting |
| Fixed epochs (too many) | 200 | Model overfits past the optimum |
| Early stopping | 25 (automatic) | Stops at the best point automatically |
How EarlyStopping works
Keras monitors a metric (typically val_loss) and counts consecutive epochs without improvement. When the count reaches patience, training stops.
```python
early_stop = keras.callbacks.EarlyStopping(
    monitor='val_loss',           # metric to watch
    patience=5,                   # stop after 5 non-improving epochs
    restore_best_weights=True,    # rewind to the best epoch's weights
    min_delta=0.001,              # minimum change to count as improvement
)

history = model.fit(
    X_train, y_train,
    validation_data=(X_val, y_val),
    epochs=200,                   # set high -- early stopping controls the actual count
    batch_size=32,
    callbacks=[early_stop],
    verbose=0,
)
```
Key parameters
| Parameter | Purpose | Recommended value |
|---|---|---|
| monitor | Which metric to watch | 'val_loss' (most common) |
| patience | How many non-improving epochs to wait | 5-20 (higher = more tolerant of plateaus) |
| restore_best_weights | Roll back to the best epoch when stopping | True (always) |
| min_delta | Minimum improvement to reset the counter | 0.001 (small improvements still count) |
Reading the training report
After training with early stopping, inspect the history to understand what happened:
```python
# How many epochs actually ran?
actual_epochs = len(history.history['loss'])
print(f"Trained for {actual_epochs} epochs (max was 200)")

# Find the best epoch (1-indexed)
val_losses = history.history['val_loss']
best_epoch = val_losses.index(min(val_losses)) + 1
print(f"Best val_loss at epoch {best_epoch}: {min(val_losses):.4f}")

# With restore_best_weights=True, the model already holds the best epoch's weights.
# evaluate() returns [loss, *metrics] when the model was compiled with metrics;
# index 0 is the loss.
final_val_loss = model.evaluate(X_val, y_val, verbose=0)
print(f"Current model val_loss: {final_val_loss[0]:.4f}")
```
Think Deeper
Try this:
You set patience=5 and the model's best val_loss was at epoch 25. Training stopped at epoch 30. With restore_best_weights=True, which epoch's weights does the model use for prediction?
Epoch 25 -- the epoch with the best validation loss. Without restore_best_weights=True, the model would use the epoch 30 weights, which are worse. In security ML this matters: those last 5 epochs of overfitting could cause the model to miss attack patterns it had previously learned to detect.
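To make the answer concrete, here is a small plain-Python sketch of that scenario. The numbers are synthetic, not real Keras output: the loss improves through epoch 25, then worsens for 5 straight epochs, exhausting patience=5 at epoch 30:

```python
# Synthetic val_loss curve: val_losses[i] is the loss after epoch i+1.
val_losses = [1.0 - 0.03 * e for e in range(25)]                     # improving through epoch 25
val_losses += [val_losses[-1] + 0.01 * k for k in (1, 2, 3, 4, 5)]   # 5 consecutive worse epochs

best_epoch = val_losses.index(min(val_losses)) + 1  # epoch with the best val_loss
stopped_at = len(val_losses)                        # patience=5 exhausted here

print(best_epoch, stopped_at)  # → 25 30
# With restore_best_weights=True the model keeps epoch 25's weights;
# without it, prediction would use the (worse) epoch 30 weights.
```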
Cybersecurity tie-in: Early stopping is essential for security ML pipelines that retrain automatically. Without it, an automated nightly retrain of a phishing classifier could silently overfit -- performing perfectly on yesterday's phishing emails but missing tomorrow's new campaigns. Early stopping with restore_best_weights=True ensures every retrain produces the best-generalising model, not the most-memorised one.