Step 3: Feature Importance

Which features drive predictions?


Feature Importance

After training, model.feature_importances_ tells you how much each feature contributed to the tree's decisions.

Each value is the total impurity decrease (information gain) attributable to that feature across all splits, weighted by the fraction of samples reaching each split, and normalised so the values sum to 1.0:

importances = model.feature_importances_
for name, imp in sorted(zip(FEATURES, importances),
                        key=lambda x: x[1], reverse=True):
    bar = '█' * int(imp * 40)
    print(f"  {name:25s} {bar} {imp:.3f}")

A feature that splits near the root affects more samples, so its impurity decrease carries more weight and it accumulates more importance.
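You can see this on a toy dataset. A minimal sketch (the data and feature layout here are illustrative, not the lesson's network dataset): column 0 carries the signal, so it should dominate the importance scores, and the scores always sum to 1.0.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
# Only column 0 determines the label (plus a little noise)
y = (X[:, 0] + 0.1 * rng.normal(size=200) > 0).astype(int)

model = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
imps = model.feature_importances_
print(imps)         # column 0 dominates
print(imps.sum())   # normalised: always 1.0
```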

Why These Features Matter for Network Security

| Feature | High value suggests |
|---|---|
| connection_rate | Port scanning or brute-force attack |
| bytes_sent | Data exfiltration (large outbound transfer) |
| unique_dest_ports | Port scanning (probing many services) |
| duration_seconds | Low-and-slow attacks vs quick scans |
| failed_conns | Brute-force or malformed exploit attempts |

Visualising Feature Importance

import matplotlib.pyplot as plt

# Sort features by importance
sorted_idx = importances.argsort()[::-1]
plt.figure(figsize=(10, 5))
plt.barh(range(len(FEATURES)),
         importances[sorted_idx],
         align='center')
plt.yticks(range(len(FEATURES)),
           [FEATURES[i] for i in sorted_idx])
plt.xlabel('Feature Importance (MDI)')
plt.title('Which features drive predictions?')
plt.gca().invert_yaxis()
plt.tight_layout()
plt.show()

What Importance Does NOT Tell You

  • Whether the relationship is positive or negative
  • Whether the effect is linear or non-linear
  • Whether the feature would be important in a different model type
  • Correlated features split importance between them — each looks less important individually
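One way to cross-check MDI's blind spots is permutation importance, which measures how much accuracy drops when a feature's values are shuffled. A sketch using synthetic stand-in data (the lesson's `model` and dataset aren't reproduced here); features 0 and 1 drive the label, so shuffling them should hurt most:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = (X[:, 0] - X[:, 1] > 0).astype(int)  # features 2 and 3 are pure noise

model = DecisionTreeClassifier(random_state=0).fit(X, y)

# Shuffle each feature 10 times and average the resulting accuracy drop
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
print(result.importances_mean)
```

Unlike MDI, permutation importance is model-agnostic, so the same check works for any classifier, not just trees.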

Think Deeper

If you remove the top feature (connection_rate) and retrain, what happens to accuracy? Does the second feature become more important?

Accuracy drops because you removed the most discriminative signal. The second feature (bytes_sent) absorbs some of the lost signal and its importance score increases. This reveals feature redundancy — correlated features can partially substitute for each other.
Cybersecurity tie-in: Feature importance guides feature selection. If bytes_received has near-zero importance, you might drop it to simplify the model and reduce the data you need to collect. In production, less data collection = lower storage and processing cost.
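The ablation experiment can be run directly. A sketch with synthetic stand-in data (the lesson's dataset isn't shown here): a "top" feature determines the label, a correlated "second" feature partially mirrors it, and we compare the full model against one trained without the top feature.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n = 500
top = rng.normal(size=n)                 # stand-in for connection_rate
second = top + 0.5 * rng.normal(size=n)  # correlated stand-in for bytes_sent
noise = rng.normal(size=n)
y = (top > 0).astype(int)

X_full = np.column_stack([top, second, noise])
X_ablate = X_full[:, 1:]  # drop the top feature and retrain

results = {}
for name, X in [("full", X_full), ("ablated", X_ablate)]:
    Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)
    m = DecisionTreeClassifier(max_depth=4, random_state=0).fit(Xtr, ytr)
    results[name] = (m.score(Xte, yte), m.feature_importances_)
    print(name, results[name])
```

Expect the ablated model to lose some accuracy while the correlated second feature's importance rises, since it absorbs part of the removed signal.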
