Step 1: Demonstrate Overfitting

Build a deliberately oversized network


Model capacity and overfitting

A network's capacity is the range of functions it can represent. More parameters mean higher capacity, and therefore the ability to fit more complex patterns. But when capacity far exceeds the complexity of the real signal, the model memorises the training data instead of learning the underlying pattern.

| Scenario   | Parameters | Training samples | Ratio (params/sample) | Risk   |
|------------|-----------:|-----------------:|----------------------:|--------|
| Balanced   | 1,000      | 10,000           | 0.1                   | Low    |
| Borderline | 10,000     | 10,000           | 1.0                   | Medium |
| Overfit    | 134,000    | 1,600            | 84                    | Severe |
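The ratios in the table can be reproduced with a few lines of Python (the scenario numbers are taken directly from the table above):

```python
# Parameter-to-sample ratios for the three scenarios
scenarios = {
    "Balanced":   (1_000, 10_000),
    "Borderline": (10_000, 10_000),
    "Overfit":    (134_000, 1_600),
}
for name, (params, samples) in scenarios.items():
    ratio = params / samples
    print(f"{name}: {ratio:.2f} parameters per sample")
```

The "Overfit" scenario comes out to 83.75, rounded to 84 in the table.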

The overfit architecture

In this exercise, you deliberately build a network that is far too large for the dataset: three Dense(256) layers on only 1,600 training samples.

| Layer  | Size | Activation | Parameters                 |
|--------|-----:|------------|----------------------------|
| Input  | 10   | --         | --                         |
| Dense  | 256  | relu       | 10 × 256 + 256 = 2,816     |
| Dense  | 256  | relu       | 256 × 256 + 256 = 65,792   |
| Dense  | 256  | relu       | 256 × 256 + 256 = 65,792   |
| Output | 1    | sigmoid    | 256 × 1 + 1 = 257          |
| Total  |      |            | 134,657                    |
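As a sanity check, each Dense layer's parameter count follows directly from the formula inputs × units + units (weights plus biases); a minimal sketch:

```python
def dense_params(n_in, n_out):
    """Parameters in a Dense layer: weights (n_in * n_out) plus biases (n_out)."""
    return n_in * n_out + n_out

# (inputs, units) for each layer in the overfit architecture
layers = [(10, 256), (256, 256), (256, 256), (256, 1)]
total = sum(dense_params(n_in, n_out) for n_in, n_out in layers)
print(total)  # 134657
```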

Reading the diverging loss curves

The telltale sign of overfitting is when training loss keeps decreasing but validation loss starts increasing. The gap between these two curves is the overfitting gap.

```python
# Build the deliberately oversized model
from tensorflow import keras

model = keras.Sequential([
    keras.layers.Dense(256, activation='relu', input_shape=(10,)),
    keras.layers.Dense(256, activation='relu'),
    keras.layers.Dense(256, activation='relu'),
    keras.layers.Dense(1, activation='sigmoid'),
])

model.compile(optimizer='adam', loss='binary_crossentropy',
              metrics=['accuracy'])

# Train and capture history
history = model.fit(X_train, y_train,
                    validation_data=(X_val, y_val),
                    epochs=50, batch_size=32, verbose=0)
```
```python
# Plot diverging loss curves
import matplotlib.pyplot as plt

plt.plot(history.history['loss'], label='Training loss')
plt.plot(history.history['val_loss'], label='Validation loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.title('Overfitting: training loss drops, val loss rises')
plt.show()
```
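A quick numeric companion to the plot is the epoch at which validation loss bottoms out: everything after that point is the divergence. A sketch using an illustrative, made-up loss sequence (your actual values will come from `history.history['val_loss']`):

```python
# Illustrative validation-loss curve; values are invented for the sketch
val_loss = [0.62, 0.55, 0.50, 0.47, 0.46, 0.48, 0.53, 0.60]

# Epoch with the lowest validation loss -- divergence starts after this
best_epoch = min(range(len(val_loss)), key=val_loss.__getitem__)
print(f"Validation loss bottoms out at epoch {best_epoch}")  # epoch 4
```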

Measuring the overfitting gap

Quantify overfitting by comparing training and validation metrics at the end of training:

```python
# Numerical overfitting evidence
train_loss = history.history['loss'][-1]
val_loss   = history.history['val_loss'][-1]
gap = val_loss - train_loss
print(f"Train loss: {train_loss:.4f}")
print(f"Val loss:   {val_loss:.4f}")
print(f"Gap:        {gap:.4f}  (larger = more overfit)")
```

Think Deeper

A network has 134,000 parameters but only 1,600 training samples. What ratio does that give, and why is it a problem for a security ML model?

134,000 ÷ 1,600 ≈ 84 parameters per sample — the network can memorise every training example, including noise. In security, an overfit IDS memorises exact attack signatures from training but misses novel attack variants because it never learned the underlying pattern (e.g. high connection rate + many failed logins = brute force).
Cybersecurity tie-in: An overfit intrusion detection model memorises specific attack signatures from training data (e.g. "exactly 47 SYN packets to port 445 in 2.3 seconds"). It catches those exact patterns perfectly but fails on novel attack variants that differ even slightly. Regularisation ensures the model learns the general pattern ("rapid repeated connection attempts to sensitive ports") instead.
