Step 2: Train and Read the Tree

Visualise the tree and extract learned rules

Training the Tree

from sklearn.tree import DecisionTreeClassifier

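model = DecisionTreeClassifier(
    max_depth=4,          # caps tree depth to reduce overfitting
    criterion='gini',     # split quality measure: 'gini' (default), 'entropy' or 'log_loss'
    random_state=42       # makes tie-breaking between equally good splits reproducible
)
model.fit(X_train, y_train)   # X_train: feature matrix, y_train: class labels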
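The snippet above assumes `X_train` and `y_train` already exist. Here is a minimal, self-contained sketch of the same training step on a synthetic dataset. The three feature names are borrowed from the example output later on this page, but the values and the labelling rule are invented for illustration and do not describe real traffic.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

FEATURES = ["connection_rate", "unique_dest_ports", "bytes_sent"]

# Hypothetical synthetic data: invented value ranges, not real network flows
rng = np.random.default_rng(0)
n = 1000
connection_rate = rng.uniform(0, 100, n)
unique_dest_ports = rng.integers(1, 61, n)      # integers 1..60
bytes_sent = rng.uniform(0, 200_000, n)
X = np.column_stack([connection_rate, unique_dest_ports, bytes_sent])

# Hypothetical labelling rule, used only to give the toy data some structure
y = np.where(connection_rate <= 50, "benign",
             np.where(unique_dest_ports > 20, "port_scan",
                      np.where(bytes_sent > 100_000, "exfiltration", "benign")))

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y)

model = DecisionTreeClassifier(max_depth=4, criterion="gini", random_state=42)
model.fit(X_train, y_train)

print("depth:", model.get_depth(), "leaves:", model.get_n_leaves())
print("classes:", model.classes_)               # sorted label order
print("test accuracy:", model.score(X_test, y_test))
```

Note that `model.classes_` comes out in sorted order. That ordering matters later when passing `class_names` to the plotting function.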

Parameter	What it controls
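`max_depth`	Maximum depth of the tree; None = keep splitting until every leaf is pure or has fewer than min_samples_split samples
`criterion`	Split quality measure: 'gini' (default), 'entropy' or 'log_loss'
`min_samples_split`	Minimum samples a node needs before a split is attempted (default 2)
`min_samples_leaf`	Minimum samples each leaf must contain (default 1)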
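The sketch below shows what these parameters do in practice. It fits the same kind of hypothetical synthetic data, with about 10% of the labels randomly reassigned to mimic noisy ground truth, then compares an unconstrained tree against two constrained ones. `get_depth()`, `get_n_leaves()` and the `tree_` arrays are standard scikit-learn attributes. The data and the 10% noise rate are assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical synthetic data (invented values and labelling rule)
rng = np.random.default_rng(1)
n = 1000
X = np.column_stack([
    rng.uniform(0, 100, n),        # connection_rate
    rng.integers(1, 61, n),        # unique_dest_ports
    rng.uniform(0, 200_000, n),    # bytes_sent
])
y = np.where(X[:, 0] <= 50, "benign",
             np.where(X[:, 1] > 20, "port_scan",
                      np.where(X[:, 2] > 100_000, "exfiltration", "benign")))

# Reassign ~10% of labels at random to simulate label noise
noisy = rng.random(n) < 0.10
y[noisy] = rng.choice(["benign", "exfiltration", "port_scan"], size=noisy.sum())

configs = {
    "unconstrained": {},
    "max_depth=4": {"max_depth": 4},
    "min_samples_leaf=20": {"min_samples_leaf": 20},
}
models = {}
for name, params in configs.items():
    clf = DecisionTreeClassifier(random_state=42, **params).fit(X, y)
    models[name] = clf
    is_leaf = clf.tree_.children_left == -1          # leaves have no children
    smallest_leaf = clf.tree_.n_node_samples[is_leaf].min()
    print(f"{name:<20} depth={clf.get_depth():>2}  leaves={clf.get_n_leaves():>3}  "
          f"smallest leaf={smallest_leaf}  train acc={clf.score(X, y):.2f}")
```

The unconstrained tree keeps splitting until it has memorised the noisy labels. That is the overfitting `max_depth` and `min_samples_leaf` are meant to curb, at the cost of some training accuracy.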

Visualising the Tree

plot_tree() renders the tree as a matplotlib figure where you can see every decision:

from sklearn.tree import plot_tree
import matplotlib.pyplot as plt

plt.figure(figsize=(20, 10))
plot_tree(model,
          feature_names=FEATURES,    # column names, in the same order as X_train
          class_names=CLASS_NAMES,   # must follow the order of model.classes_
          filled=True,               # colour by majority class
          rounded=True,
          fontsize=10)
plt.tight_layout()
plt.show()

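Each node shows the split rule (internal nodes only), the impurity (Gini here, because criterion='gini'), the number of training samples that reach the node, the per-class value distribution, and the majority class (because class_names is passed).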
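The per-node numbers that `plot_tree()` draws are also stored in the fitted model's low-level `tree_` object. The sketch below prints them as text, using the hypothetical synthetic dataset (invented feature values and labelling rule). Leaves are identified by `children_left == -1`. The majority class is taken as the argmax of `tree_.value`, which holds each node's per-class distribution.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

FEATURES = ["connection_rate", "unique_dest_ports", "bytes_sent"]

# Hypothetical synthetic data (invented values and labelling rule)
rng = np.random.default_rng(0)
n = 1000
X = np.column_stack([
    rng.uniform(0, 100, n),        # connection_rate
    rng.integers(1, 61, n),        # unique_dest_ports
    rng.uniform(0, 200_000, n),    # bytes_sent
])
y = np.where(X[:, 0] <= 50, "benign",
             np.where(X[:, 1] > 20, "port_scan",
                      np.where(X[:, 2] > 100_000, "exfiltration", "benign")))

model = DecisionTreeClassifier(max_depth=4, criterion="gini", random_state=42).fit(X, y)

t = model.tree_
for node in range(t.node_count):
    is_leaf = t.children_left[node] == -1
    rule = "leaf" if is_leaf else f"{FEATURES[t.feature[node]]} <= {t.threshold[node]:.2f}"
    majority = model.classes_[np.argmax(t.value[node])]   # class with the largest share
    print(f"node {node}: {rule} | gini={t.impurity[node]:.3f} | "
          f"samples={t.n_node_samples[node]} | majority={majority}")
```

This text form is handy when the figure is too large to read, for example on a headless server where no plot window is available.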

Reading the Tree as Text

from sklearn.tree import export_text

rules = export_text(model, feature_names=FEATURES)
print(rules)

The output looks like this (thresholds are learned from your data, so the exact numbers will vary):

|--- connection_rate <= 50.50
|   |--- class: benign
|--- connection_rate >  50.50
|   |--- unique_dest_ports <= 20.50
|   |   |--- bytes_sent <= 100000.00
|   |   |   |--- class: benign
|   |   |--- bytes_sent >  100000.00
|   |   |   |--- class: exfiltration
|   |--- unique_dest_ports >  20.50
|   |   |--- class: port_scan
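Before turning a branch into an alert, it helps to know how much training data supports it. `export_text()` accepts `show_weights=True`, which adds the number of training samples of each class at every leaf, and `decimals`, which controls threshold precision. The sketch below applies both to the same hypothetical synthetic data.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

FEATURES = ["connection_rate", "unique_dest_ports", "bytes_sent"]

# Hypothetical synthetic data (invented values and labelling rule)
rng = np.random.default_rng(0)
n = 1000
X = np.column_stack([
    rng.uniform(0, 100, n),        # connection_rate
    rng.integers(1, 61, n),        # unique_dest_ports
    rng.uniform(0, 200_000, n),    # bytes_sent
])
y = np.where(X[:, 0] <= 50, "benign",
             np.where(X[:, 1] > 20, "port_scan",
                      np.where(X[:, 2] > 100_000, "exfiltration", "benign")))

model = DecisionTreeClassifier(max_depth=4, random_state=42).fit(X, y)

# show_weights adds per-class sample counts to each leaf; decimals sets rounding
rules = export_text(model, feature_names=FEATURES, show_weights=True, decimals=1)
print(rules)
```

A leaf backed by only a handful of samples is a weaker basis for a production rule than one backed by hundreds.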

Each root-to-leaf path is a chain of threshold tests joined by AND, so you can turn these rules directly into firewall policies or SIEM alerts.
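The sketch below walks the fitted tree's `tree_` arrays and emits each root-to-leaf path as one AND-joined rule. It then keeps only the non-benign leaves as candidate alerts. The helper name `tree_to_rules` and the `IF ... THEN` output format are hypothetical, so adapt the output to whatever syntax your firewall or SIEM expects. The data is the same hypothetical synthetic set.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

FEATURES = ["connection_rate", "unique_dest_ports", "bytes_sent"]

# Hypothetical synthetic data (invented values and labelling rule)
rng = np.random.default_rng(0)
n = 1000
X = np.column_stack([
    rng.uniform(0, 100, n),        # connection_rate
    rng.integers(1, 61, n),        # unique_dest_ports
    rng.uniform(0, 200_000, n),    # bytes_sent
])
y = np.where(X[:, 0] <= 50, "benign",
             np.where(X[:, 1] > 20, "port_scan",
                      np.where(X[:, 2] > 100_000, "exfiltration", "benign")))
model = DecisionTreeClassifier(max_depth=4, random_state=42).fit(X, y)


def tree_to_rules(model, feature_names):
    """Hypothetical helper: one 'IF ... THEN label' string per root-to-leaf path."""
    t = model.tree_
    rules = []

    def walk(node, conditions):
        if t.children_left[node] == -1:  # leaf: record the accumulated path
            label = model.classes_[np.argmax(t.value[node])]
            rules.append(f"IF {' AND '.join(conditions) or 'TRUE'} THEN {label}")
            return
        name = feature_names[t.feature[node]]
        thr = t.threshold[node]
        # scikit-learn sends samples with value <= threshold to the left child
        walk(t.children_left[node], conditions + [f"{name} <= {thr:.2f}"])
        walk(t.children_right[node], conditions + [f"{name} > {thr:.2f}"])

    walk(0, [])
    return rules


rules = tree_to_rules(model, FEATURES)
alerts = [r for r in rules if not r.endswith("THEN benign")]  # candidate detections
for rule in alerts:
    print(rule)
```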

Think Deeper

Try this:

Export the tree as text and find the first rule. Could you explain this rule to a non-technical SOC analyst?

Example: 'If connection_rate > 50.5 and unique_dest_ports > 20, classify as port_scan.' This is why decision trees are valuable in security — you can explain every prediction to a human. Try doing that with a neural network. Interpretability builds trust with analysts and auditors.
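Explaining an individual prediction works the same way. `decision_path()` returns the nodes a sample visits, and each visited internal node contributes one condition. The minimal sketch below runs on the hypothetical synthetic data, using an invented flow record.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

FEATURES = ["connection_rate", "unique_dest_ports", "bytes_sent"]

# Hypothetical synthetic data (invented values and labelling rule)
rng = np.random.default_rng(0)
n = 1000
X = np.column_stack([
    rng.uniform(0, 100, n),        # connection_rate
    rng.integers(1, 61, n),        # unique_dest_ports
    rng.uniform(0, 200_000, n),    # bytes_sent
])
y = np.where(X[:, 0] <= 50, "benign",
             np.where(X[:, 1] > 20, "port_scan",
                      np.where(X[:, 2] > 100_000, "exfiltration", "benign")))
model = DecisionTreeClassifier(max_depth=4, random_state=42).fit(X, y)

flow = np.array([[80.0, 35, 5_000.0]])   # hypothetical flow: rate, ports, bytes
# Child nodes get larger ids than their parents, so sorting gives root-to-leaf order
path = np.sort(model.decision_path(flow).indices)
leaf = model.apply(flow)[0]
t = model.tree_

steps = []
for node in path:
    if node == leaf:
        continue                              # the leaf holds the answer, not a test
    feature = t.feature[node]
    value = flow[0, feature]
    thr = t.threshold[node]
    op = "<=" if value <= thr else ">"
    steps.append(f"{FEATURES[feature]} = {value:g} {op} {thr:.2f}")

prediction = model.predict(flow)[0]
print(" AND ".join(steps), "->", prediction)
```

The printed chain of conditions is the explanation you would hand to the analyst.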

Cybersecurity tie-in: Tree rules can be translated directly into detection rules. connection_rate > 50.5 AND unique_dest_ports > 20.5 → ALERT: possible port scan is a rule your SIEM can execute. In effect, the model writes detection logic for you.
