Step 3: Feature Importance

Which features drive predictions?


Feature Importance

After training, model.feature_importances_ tells you how much each feature contributed to the tree's decisions.

Each value is the total impurity decrease (information gain) attributable to that feature across all splits, weighted by the fraction of samples reaching each split, and normalised so the values sum to 1.0:

importances = model.feature_importances_
for name, imp in sorted(zip(FEATURES, importances),
                        key=lambda x: x[1], reverse=True):
    bar = '█' * int(imp * 40)
    print(f"  {name:25s} {bar} {imp:.3f}")

A feature that splits near the root affects more samples, so its impurity decrease carries more weight and it accumulates more importance.
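You can see this on a toy dataset. A minimal sketch (the data and feature layout here are illustrative, not the lesson's network dataset): column 0 carries the signal, so it should dominate the importance scores, and the scores always sum to 1.0.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
# Only column 0 determines the label (plus a little noise)
y = (X[:, 0] + 0.1 * rng.normal(size=200) > 0).astype(int)

model = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
imps = model.feature_importances_
print(imps)         # column 0 dominates
print(imps.sum())   # normalised: always 1.0
```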

Why These Features Matter for Network Security

| Feature | High value suggests |
|---|---|
| connection_rate | Port scanning or brute-force attack |
| bytes_sent | Data exfiltration (large outbound transfer) |
| unique_dest_ports | Port scanning (probing many services) |
| duration_seconds | Low-and-slow attacks vs quick scans |
| failed_conns | Brute-force or malformed exploit attempts |

Visualising Feature Importance

import matplotlib.pyplot as plt

# Sort features by importance
sorted_idx = importances.argsort()[::-1]
plt.figure(figsize=(10, 5))
plt.barh(range(len(FEATURES)),
         importances[sorted_idx],
         align='center')
plt.yticks(range(len(FEATURES)),
           [FEATURES[i] for i in sorted_idx])
plt.xlabel('Feature Importance (MDI)')
plt.title('Which features drive predictions?')
plt.gca().invert_yaxis()
plt.tight_layout()
plt.show()

What Importance Does NOT Tell You

  • Whether the relationship is positive or negative
  • Whether the effect is linear or non-linear
  • Whether the feature would be important in a different model type
  • Correlated features split importance between them — each looks less important individually
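One way to cross-check MDI's blind spots is permutation importance, which measures how much accuracy drops when a feature's values are shuffled. A sketch using synthetic stand-in data (the lesson's `model` and dataset aren't reproduced here); features 0 and 1 drive the label, so shuffling them should hurt most:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = (X[:, 0] - X[:, 1] > 0).astype(int)  # features 2 and 3 are pure noise

model = DecisionTreeClassifier(random_state=0).fit(X, y)

# Shuffle each feature 10 times and average the resulting accuracy drop
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
print(result.importances_mean)
```

Unlike MDI, permutation importance is model-agnostic, so the same check works for any classifier, not just trees.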

Think Deeper

If you remove the top feature (connection_rate) and retrain, what happens to accuracy? Does the second feature become more important?

Accuracy drops because you removed the most discriminative signal. The second feature (bytes_sent) absorbs some of the lost signal and its importance score increases. This reveals feature redundancy — correlated features can partially substitute for each other.
Cybersecurity tie-in: Feature importance guides feature selection. If bytes_received has near-zero importance, you might drop it to simplify the model and reduce the data you need to collect. In production, less data collection = lower storage and processing cost.
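The ablation experiment can be run directly. A sketch with synthetic stand-in data (the lesson's dataset isn't shown here): a "top" feature determines the label, a correlated "second" feature partially mirrors it, and we compare the full model against one trained without the top feature.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n = 500
top = rng.normal(size=n)                 # stand-in for connection_rate
second = top + 0.5 * rng.normal(size=n)  # correlated stand-in for bytes_sent
noise = rng.normal(size=n)
y = (top > 0).astype(int)

X_full = np.column_stack([top, second, noise])
X_ablate = X_full[:, 1:]  # drop the top feature and retrain

results = {}
for name, X in [("full", X_full), ("ablated", X_ablate)]:
    Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)
    m = DecisionTreeClassifier(max_depth=4, random_state=0).fit(Xtr, ytr)
    results[name] = (m.score(Xte, yte), m.feature_importances_)
    print(name, results[name])
```

Expect the ablated model to lose some accuracy while the correlated second feature's importance rises, since it absorbs part of the removed signal.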
