# Model capacity and overfitting
A network's capacity is, roughly, the range of functions it can represent. More parameters mean higher capacity and the ability to fit more complex patterns. But when capacity far exceeds the complexity of the real signal, the model memorises the training data instead of learning the underlying pattern.
| Scenario | Parameters | Training samples | Ratio | Risk |
|---|---|---|---|---|
| Balanced | 1,000 | 10,000 | 0.1 | Low |
| Borderline | 10,000 | 10,000 | 1.0 | Medium |
| Overfit | 134,000 | 1,600 | 84 | Severe |
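The ratio column is just parameters divided by training samples; a quick sketch reproduces the table's numbers:

```python
def param_ratio(n_params: int, n_samples: int) -> float:
    """Parameters per training sample -- a rough overfitting-risk indicator."""
    return n_params / n_samples

print(param_ratio(1_000, 10_000))    # balanced   → 0.1
print(param_ratio(10_000, 10_000))   # borderline → 1.0
print(param_ratio(134_000, 1_600))   # overfit    → 83.75 (~84)
```

There is no hard threshold, but once the ratio climbs well past 1, the network has more than enough freedom to memorise every sample.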
## The overfit architecture
In this exercise, you deliberately build a network that is far too large for the dataset: three Dense(256) layers on only 1,600 training samples.
| Layer | Size | Activation | Parameters |
|---|---|---|---|
| Input | 10 | -- | -- |
| Dense | 256 | relu | 10 x 256 + 256 = 2,816 |
| Dense | 256 | relu | 256 x 256 + 256 = 65,792 |
| Dense | 256 | relu | 256 x 256 + 256 = 65,792 |
| Output | 1 | sigmoid | 256 x 1 + 1 = 257 |
| Total | | | 134,657 |
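Each row's count follows the same formula: weights (`inputs × units`) plus one bias per unit. A small framework-independent helper reproduces the table's totals:

```python
def dense_params(n_in: int, n_out: int) -> int:
    """Parameter count of a Dense layer: n_in * n_out weights + n_out biases."""
    return n_in * n_out + n_out

# (input_size, units) for each Dense layer in the overfit architecture
layers = [(10, 256), (256, 256), (256, 256), (256, 1)]
counts = [dense_params(n_in, n_out) for n_in, n_out in layers]
print(counts)       # → [2816, 65792, 65792, 257]
print(sum(counts))  # → 134657
```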
## Reading the diverging loss curves
The telltale sign of overfitting is training loss that keeps decreasing while validation loss starts increasing. The gap between the two curves is the overfitting gap.
```python
from tensorflow import keras
import matplotlib.pyplot as plt

# Build the overfit model: three Dense(256) layers on 10 input features
model = keras.Sequential([
    keras.layers.Dense(256, activation='relu', input_shape=(10,)),
    keras.layers.Dense(256, activation='relu'),
    keras.layers.Dense(256, activation='relu'),
    keras.layers.Dense(1, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='binary_crossentropy',
              metrics=['accuracy'])

# Train and capture history
history = model.fit(X_train, y_train,
                    validation_data=(X_val, y_val),
                    epochs=50, batch_size=32, verbose=0)

# Plot diverging loss curves
plt.plot(history.history['loss'], label='Training loss')
plt.plot(history.history['val_loss'], label='Validation loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.title('Overfitting: training loss drops, val loss rises')
plt.show()
```
## Measuring the overfitting gap
Quantify overfitting by comparing training and validation metrics at the end of training:
```python
# Numerical overfitting evidence: compare final-epoch losses
train_loss = history.history['loss'][-1]
val_loss = history.history['val_loss'][-1]
gap = val_loss - train_loss
print(f"Train loss: {train_loss:.4f}")
print(f"Val loss:   {val_loss:.4f}")
print(f"Gap:        {gap:.4f} (larger = more overfit)")
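Beyond the final-epoch gap, it is often useful to locate *where* the curves diverge: the epoch with the lowest validation loss, which is where early stopping would have halted. A minimal sketch, shown here on hypothetical loss lists (with a real run you would pass `history.history['loss']` and `history.history['val_loss']`):

```python
def divergence_epoch(train_loss, val_loss):
    """Return the 0-based epoch where validation loss bottoms out.

    Training past this point only widens the overfitting gap.
    """
    return min(range(len(val_loss)), key=lambda e: val_loss[e])

# Hypothetical curves: training loss keeps falling, val loss turns at epoch 3
train = [0.9, 0.6, 0.4, 0.3, 0.2, 0.1]
val   = [0.9, 0.7, 0.6, 0.55, 0.6, 0.7]
print(divergence_epoch(train, val))  # → 3
```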
## Think Deeper
A network has 134,000 parameters but only 1,600 training samples. What ratio does that give, and why is it a problem for a security ML model?