Step 2: Train and Read the Tree

Visualise the tree and extract learned rules


Training the Tree

from sklearn.tree import DecisionTreeClassifier

model = DecisionTreeClassifier(
    max_depth=4,          # caps tree depth to reduce overfitting
    criterion='gini',     # splitting criterion (or 'entropy')
    random_state=42
)
model.fit(X_train, y_train)

Parameter            What it controls
max_depth            Maximum levels; None = grow until pure
criterion            'gini' (default) or 'entropy'
min_samples_split    Minimum samples to attempt a split
min_samples_leaf     Minimum samples in each leaf
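
To see what these parameters do in practice, the sketch below fits three trees and compares their size. It uses synthetic data from make_classification as a stand-in (an assumption; it is not the lab's dataset), and the names X_demo, y_demo, shallow, full and pruned are illustrative only.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption: NOT the lab's X_train / y_train).
X_demo, y_demo = make_classification(
    n_samples=500, n_features=6, n_informative=4, random_state=42
)

# Same data, three different growth limits.
shallow = DecisionTreeClassifier(max_depth=4, random_state=42).fit(X_demo, y_demo)
full = DecisionTreeClassifier(max_depth=None, random_state=42).fit(X_demo, y_demo)
pruned = DecisionTreeClassifier(min_samples_leaf=20, random_state=42).fit(X_demo, y_demo)

# get_depth() and get_n_leaves() report how large each fitted tree is.
for name, tree in [
    ("max_depth=4", shallow),
    ("max_depth=None", full),
    ("min_samples_leaf=20", pruned),
]:
    print(f"{name:22s} depth={tree.get_depth():2d}  leaves={tree.get_n_leaves():3d}")
```

An unrestricted tree usually ends up with many more leaves than a capped one, which is a common sign that it is memorising noise in the training set.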

Visualising the Tree

plot_tree() renders the tree as a matplotlib figure where you can see every decision:

from sklearn.tree import plot_tree
import matplotlib.pyplot as plt

plt.figure(figsize=(20, 10))
plot_tree(model,
          feature_names=FEATURES,
          class_names=CLASS_NAMES,
          filled=True,           # colour by majority class
          rounded=True,
          fontsize=10)
plt.tight_layout()
plt.show()

Each internal node shows the split rule, Gini impurity, sample count, class distribution (value), and majority class. Leaf nodes show the same fields except the split rule.
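
The Gini value printed in each node is 1 minus the sum of squared class proportions at that node. A minimal sketch, assuming synthetic three-class data (not the lab's dataset) and a hypothetical helper named gini, recomputes the root node's impurity by hand and compares it with the value scikit-learn stores in model.tree_.impurity:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier


def gini(class_counts):
    """Gini impurity: 1 - sum(p_k ** 2) over the class proportions p_k."""
    counts = np.asarray(class_counts, dtype=float)
    proportions = counts / counts.sum()
    return 1.0 - np.sum(proportions ** 2)


# Synthetic stand-in data (assumption: NOT the lab's dataset).
X_demo, y_demo = make_classification(
    n_samples=300, n_features=5, n_informative=3,
    n_classes=3, n_clusters_per_class=1, random_state=0,
)
demo_model = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_demo, y_demo)

# The root node (index 0) sees every training sample.
root_counts = np.bincount(y_demo)
root_gini_by_hand = gini(root_counts)
root_gini_sklearn = demo_model.tree_.impurity[0]

print("class counts at root:", root_counts)
print("Gini by hand:        ", root_gini_by_hand)
print("Gini from sklearn:   ", root_gini_sklearn)
```

A pure node (all samples in one class) has Gini 0, and an even two-class split has Gini 0.5, so lower values mean a more confident node.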

Reading the Tree as Text

from sklearn.tree import export_text

rules = export_text(model, feature_names=FEATURES)
print(rules)

Output looks like:

|--- connection_rate <= 50.50
|   |--- class: benign
|--- connection_rate >  50.50
|   |--- unique_dest_ports <= 20.50
|   |   |--- bytes_sent <= 100000.00
|   |   |   |--- class: benign
|   |   |--- bytes_sent >  100000.00
|   |   |   |--- class: exfiltration
|   |--- unique_dest_ports >  20.50
|   |   |--- class: port_scan

Each root-to-leaf path is an if/then rule, so you can translate these paths into firewall policies or SIEM alerts.
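
One way to get from the tree to rule strings is to walk model.tree_ and collect the conditions on each root-to-leaf path. The sketch below is an illustration under stated assumptions: the traffic data and its labelling rule are synthetic, and DEMO_FEATURES, tree_to_rules and the IF/THEN output format are hypothetical names, not part of scikit-learn or any SIEM.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Synthetic traffic (assumption: invented for illustration, not real data).
rng = np.random.default_rng(42)
n = 600
DEMO_FEATURES = ["connection_rate", "unique_dest_ports", "bytes_sent"]
X_demo = np.column_stack([
    rng.uniform(0, 100, n),       # connection_rate
    rng.integers(1, 50, n),       # unique_dest_ports
    rng.uniform(0, 200_000, n),   # bytes_sent
])
# Hypothetical labelling rule used only to generate the demo labels.
y_demo = np.where(
    X_demo[:, 0] <= 50, "benign",
    np.where(X_demo[:, 1] > 20, "port_scan",
             np.where(X_demo[:, 2] > 100_000, "exfiltration", "benign")),
)
demo_model = DecisionTreeClassifier(max_depth=4, random_state=42).fit(X_demo, y_demo)


def tree_to_rules(model, feature_names):
    """Return one (conditions, predicted_class) pair per leaf."""
    tree = model.tree_
    rules = []

    def walk(node, conditions):
        left, right = tree.children_left[node], tree.children_right[node]
        if left == right:  # leaf: both child pointers are -1
            label = model.classes_[np.argmax(tree.value[node][0])]
            rules.append((conditions, label))
            return
        name = feature_names[tree.feature[node]]
        threshold = tree.threshold[node]
        walk(left, conditions + [f"{name} <= {threshold:.2f}"])
        walk(right, conditions + [f"{name} > {threshold:.2f}"])

    walk(0, [])
    return rules


rules = tree_to_rules(demo_model, DEMO_FEATURES)
for conditions, label in rules:
    print("IF " + " AND ".join(conditions) + f" THEN {label}")
```

Rules generated this way are a draft: thresholds learned from one training set should be reviewed against real traffic before they go into a production policy.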


Think Deeper

Export the tree as text and find the first rule. Could you explain this rule to a non-technical SOC analyst?

Example: 'If connection_rate > 50.5 and unique_dest_ports > 20.5, classify as port_scan.' This is why decision trees are valuable in security: you can explain every prediction to a human, which is much harder with a neural network. Interpretability builds trust with analysts and auditors.
Cybersecurity tie-in: Tree rules can be translated directly into detection rules. For example, "connection_rate > 50.5 AND unique_dest_ports > 20.5 → ALERT: possible port scan" is a rule your SIEM can execute. In effect, the model drafts detection logic for you.
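
To explain a single prediction rather than the whole tree, scikit-learn's decision_path() returns the nodes one sample passes through. The following is a minimal sketch, assuming synthetic traffic data and a made-up labelling rule; DEMO_FEATURES, explain and the sample values are hypothetical.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Synthetic traffic (assumption: invented for illustration, not real data).
rng = np.random.default_rng(42)
n = 600
DEMO_FEATURES = ["connection_rate", "unique_dest_ports", "bytes_sent"]
X_demo = np.column_stack([
    rng.uniform(0, 100, n),
    rng.integers(1, 50, n),
    rng.uniform(0, 200_000, n),
])
# Hypothetical labelling rule used only to generate the demo labels.
y_demo = np.where(
    X_demo[:, 0] <= 50, "benign",
    np.where(X_demo[:, 1] > 20, "port_scan",
             np.where(X_demo[:, 2] > 100_000, "exfiltration", "benign")),
)
demo_model = DecisionTreeClassifier(max_depth=4, random_state=42).fit(X_demo, y_demo)


def explain(model, feature_names, sample):
    """List the tests one sample passed on its way to a leaf."""
    row = np.asarray(sample, dtype=float).reshape(1, -1)
    tree = model.tree_
    # For a single row, .indices holds the visited node ids, root first.
    node_ids = model.decision_path(row).indices
    steps = []
    for node in node_ids:
        if tree.children_left[node] == tree.children_right[node]:
            continue  # leaf node: no test to report
        feature = tree.feature[node]
        value, threshold = row[0, feature], tree.threshold[node]
        op = "<=" if value <= threshold else ">"
        steps.append(f"{feature_names[feature]} = {value:g} {op} {threshold:.2f}")
    return steps, model.predict(row)[0]


# Hypothetical flow: high connection rate, many destination ports.
sample = [80.0, 35, 5000.0]
steps, prediction = explain(demo_model, DEMO_FEATURES, sample)
for step in steps:
    print(step)
print("predicted class:", prediction)
```

The printed steps are the plain-language justification an analyst can check line by line against the flow record.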
