## The Train/Validation/Test Split
When tuning hyperparameters (like tree depth), you cannot use the test set — that would leak test information into your model selection. Instead, use a three-way split:
| Set | Portion | Purpose |
|---|---|---|
| Training | 60% | The model learns from this data |
| Validation | 20% | Used to choose hyperparameters (e.g., best depth) |
| Test | 20% | Final evaluation ONLY — never touched during tuning |
Two parameters in the code below deserve a note. `random_state=42` makes the split reproducible (covered in Lesson 1.2), and `stratify=y` tells sklearn to preserve the class proportions on both sides of the split: if your data is 90% benign / 10% attack, both pieces will also be 90/10 instead of drifting randomly. This is critical for imbalanced security datasets.
```python
from sklearn.model_selection import train_test_split

# First split: carve off the test set (20% of the total)
X_temp, X_test, y_temp, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

# Second split: training and validation from the remainder
# (0.25 of the remaining 80% = 20% of the total)
X_train, X_val, y_train, y_val = train_test_split(
    X_temp, y_temp, test_size=0.25, random_state=42, stratify=y_temp)
```
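As a sanity check, the two-step split can be verified on a small synthetic dataset. The 1,000-sample 90/10 data below is made up purely for illustration:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data: 90% benign (0), 10% attack (1)
rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 5))
y = np.array([0] * 900 + [1] * 100)

# Same two-step split as above
X_temp, X_test, y_temp, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)
X_train, X_val, y_train, y_val = train_test_split(
    X_temp, y_temp, test_size=0.25, random_state=42, stratify=y_temp)

# Sizes come out 60% / 20% / 20%
print(len(X_train), len(X_val), len(X_test))
# Stratification keeps the attack proportion ~0.10 in every piece
print(y_train.mean(), y_val.mean(), y_test.mean())
```

Without `stratify`, a rare class can end up over- or under-represented in a small validation set by pure chance, which makes validation accuracy a noisy guide for tuning.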
## The Overfitting Diagnostic
Sweep `max_depth` from 1 to 20, recording training and validation accuracy at each depth:

```python
import matplotlib.pyplot as plt
from sklearn.tree import DecisionTreeClassifier

train_scores, val_scores = [], []
depths = range(1, 21)

for d in depths:
    tree = DecisionTreeClassifier(max_depth=d, random_state=42)
    tree.fit(X_train, y_train)
    train_scores.append(tree.score(X_train, y_train))
    val_scores.append(tree.score(X_val, y_val))

# Plot both curves: training (solid blue) vs validation (dashed red)
plt.plot(depths, train_scores, 'b-', label='Training')
plt.plot(depths, val_scores, 'r--', label='Validation')
plt.xlabel('max_depth')
plt.ylabel('Accuracy')
plt.legend()
plt.show()
```
## Reading the Divergence
| Depth | Train Acc | Val Acc | Gap | Diagnosis |
|---|---|---|---|---|
| 1 | ~0.65 | ~0.65 | ~0% | Underfitting — too simple |
| 3 | ~0.93 | ~0.92 | ~1% | Learning real patterns |
| 5 | ~0.99 | ~0.97 | ~2% | Sweet spot — best validation |
| 10 | 1.00 | ~0.95 | ~5% | Starting to overfit |
| 20 | 1.00 | ~0.94 | ~6% | Overfitting — memorising noise |
The overfitting point is where the gap between training and validation accuracy grows while validation accuracy plateaus or drops. Pick the depth just before this happens.
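The table's diagnosis column can be reproduced mechanically. Using the illustrative accuracies from the table (hypothetical numbers, not real model output), a simple gap-based check looks like:

```python
import numpy as np

# Illustrative accuracies from the table above (hypothetical values)
depths = np.array([1, 3, 5, 10, 20])
train_acc = np.array([0.65, 0.93, 0.99, 1.00, 1.00])
val_acc = np.array([0.65, 0.92, 0.97, 0.95, 0.94])

gap = train_acc - val_acc
# Flag overfitting: gap is large AND validation is below its peak
for d, t, v, g in zip(depths, train_acc, val_acc, gap):
    flag = "overfit" if g > 0.03 and v < val_acc.max() else ""
    print(f"depth={d:2d}  train={t:.2f}  val={v:.2f}  gap={g:.2f}  {flag}")

best = depths[np.argmax(val_acc)]
print("Best depth:", best)
```

The `g > 0.03` threshold here is an arbitrary illustrative cutoff; in practice you read the curve rather than apply a fixed number.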
## Finding the Sweet Spot Programmatically

```python
import numpy as np

# Find the depth with the highest validation accuracy
best_idx = int(np.argmax(val_scores))
best_depth = depths[best_idx]
best_train = train_scores[best_idx]
best_val = val_scores[best_idx]

print(f"Best depth: {best_depth}")
print(f"Train acc:  {best_train:.3f}")
print(f"Val acc:    {best_val:.3f}")
print(f"Gap:        {best_train - best_val:.3f}")
```
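Once `best_depth` is chosen, a common final step is to retrain on train+val at that depth and touch the test set exactly once, consistent with the "Final evaluation ONLY" rule in the table above. A minimal end-to-end sketch, using `make_classification` as a stand-in dataset (an assumption for illustration; substitute your own `X` and `y`):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Stand-in imbalanced dataset (~90% class 0, ~10% class 1)
X, y = make_classification(n_samples=1000, n_features=10,
                           weights=[0.9, 0.1], random_state=42)

# Same three-way split as above: 60% / 20% / 20%
X_temp, X_test, y_temp, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)
X_train, X_val, y_train, y_val = train_test_split(
    X_temp, y_temp, test_size=0.25, random_state=42, stratify=y_temp)

# Sweep depths, scoring on the validation set only
depths = range(1, 21)
val_scores = [DecisionTreeClassifier(max_depth=d, random_state=42)
              .fit(X_train, y_train).score(X_val, y_val) for d in depths]
best_depth = list(depths)[int(np.argmax(val_scores))]

# Final step: retrain on train+val at the chosen depth,
# then evaluate on the untouched test set exactly once
final = DecisionTreeClassifier(max_depth=best_depth, random_state=42)
final.fit(np.vstack([X_train, X_val]), np.concatenate([y_train, y_val]))
print(f"Test accuracy at depth {best_depth}: {final.score(X_test, y_test):.3f}")
```

Retraining on train+val after tuning is a design choice: the validation set has done its job, so folding it back in gives the final model more data without compromising the test estimate.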
## Think Deeper
Your intrusion detector scores 100% on training data and 74% on validation data. The security team says 'the model works.' What do you tell them?