Depth Controls Complexity
Every additional level lets the tree create finer distinctions:
| Depth | Behaviour | Risk |
|---|---|---|
| 1 | Single yes/no question | Underfit — too simple |
| 3–5 | Captures major patterns | Good generalisation |
| 10+ | Tiny leaves for individual samples | Overfit — memorises data |
| None | Grows until all leaves are pure | Severe overfitting |
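The effect of the depth ceiling can be seen directly. A minimal sketch on a synthetic dataset (the dataset, split, and `random_state` values here are illustrative, not from the text):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Illustrative synthetic data standing in for a real problem
X, y = make_classification(n_samples=500, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

for depth in [1, 5, None]:
    tree = DecisionTreeClassifier(max_depth=depth, random_state=42)
    tree.fit(X_train, y_train)
    # The unlimited tree (max_depth=None) typically reaches ~100% train
    # accuracy because it keeps splitting until its leaves are pure
    print(f"max_depth={depth}: actual depth={tree.get_depth()}, "
          f"train={tree.score(X_train, y_train):.2f}, "
          f"test={tree.score(X_test, y_test):.2f}")
```

Watching the actual depth (`get_depth()`) and the train/test scores side by side makes the table concrete: the stump stops at depth 1, while the unconstrained tree grows far deeper and memorises the training set.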
The Overfitting Signal
An unlimited tree achieves ~100% training accuracy but fails on new data. The gap between training and test accuracy is the overfitting indicator:
| | Underfit (depth=1) | Good (depth=5) | Overfit (depth=15) |
|---|---|---|---|
| Train accuracy | 65% | 99% | 100% |
| Test accuracy | 65% | 97% | 94% |
| Gap | 0% | 2% | 6% |
Pick the depth just before the gap starts widening — that's where generalisation is best.
Finding the Sweet Spot
```python
import matplotlib.pyplot as plt
from sklearn.tree import DecisionTreeClassifier

# Assumes X_train, X_test, y_train, y_test are already defined
train_scores, test_scores = [], []
for depth in range(1, 21):
    model = DecisionTreeClassifier(max_depth=depth, random_state=42)
    model.fit(X_train, y_train)
    train_scores.append(model.score(X_train, y_train))
    test_scores.append(model.score(X_test, y_test))

plt.plot(range(1, 21), train_scores, label='Train', marker='o')
plt.plot(range(1, 21), test_scores, label='Test', marker='s')
plt.xlabel('max_depth')
plt.ylabel('Accuracy')
plt.legend()
plt.title('Depth vs Accuracy — find the sweet spot')
plt.show()
```
The test curve typically rises, plateaus, then gently declines. The plateau is your target depth.
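The plateau can also be located programmatically rather than by eye. A minimal sketch using a hypothetical list of test accuracies (the numbers below are made up for illustration):

```python
import numpy as np

# Hypothetical test accuracies for max_depth = 1..10
test_scores = [0.65, 0.78, 0.88, 0.94, 0.97, 0.97, 0.96, 0.95, 0.95, 0.94]

# np.argmax returns the FIRST index of the maximum, so ties are
# broken toward the shallower (simpler) tree
best_depth = int(np.argmax(test_scores)) + 1  # +1 because depths start at 1
print(best_depth)  # → 5
```

Breaking ties toward the shallower tree follows the same principle as the table above: when two depths generalise equally well, prefer the simpler model.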
Other Ways to Control Overfitting
| Parameter | Effect |
|---|---|
| `min_samples_split=10` | Nodes with fewer than 10 samples won't split further |
| `min_samples_leaf=5` | Every leaf must have at least 5 samples |
| `max_features='sqrt'` | Only consider √n features at each split (adds randomness) |
These are regularisation techniques — they constrain the model to prevent memorisation.
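These constraints plug straight into the `DecisionTreeClassifier` constructor and can be combined. A minimal sketch using the parameter values from the table (the dataset is illustrative):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Illustrative synthetic data
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Combine several regularisers instead of relying on max_depth alone
model = DecisionTreeClassifier(
    min_samples_split=10,   # don't split nodes with fewer than 10 samples
    min_samples_leaf=5,     # every leaf keeps at least 5 samples
    max_features='sqrt',    # inspect only sqrt(n_features) candidates per split
    random_state=0,
)
model.fit(X, y)

# model.apply(X) returns the leaf index for each sample, so counting
# occurrences gives the leaf sizes — every leaf respects the floor
leaf_sizes = np.bincount(model.apply(X))
print(leaf_sizes[leaf_sizes > 0].min() >= 5)  # → True
```

Note the interaction: `min_samples_leaf=5` is a hard guarantee on leaf size, while `max_features='sqrt'` injects randomness into split selection, which is the same mechanism random forests use per tree.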
Think Deeper
Try this:
Plot training and test accuracy for depths 1–20. At what depth does the gap between them start growing fast?
Typically around depth 5–7. Training accuracy keeps rising toward 100%, but test accuracy plateaus or drops. The growing gap is the overfitting signal. In production security models, you'd pick the depth just before the gap starts widening — maximising generalisation to new, unseen traffic.
Cybersecurity tie-in: An overfit model memorises your training traffic patterns.
When an attacker uses a slightly different technique, the model fails because it learned noise, not signal.
Generalisation is the goal — a model that works on traffic it has never seen before.