Training the Tree
from sklearn.tree import DecisionTreeClassifier
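model = DecisionTreeClassifier(
    max_depth=4,          # limits depth to prevent overfitting
    criterion='gini',     # splitting criterion (or 'entropy')
    random_state=42       # makes tie-breaking between equally good splits reproducible
)
model.fit(X_train, y_train)  # X_train, y_train: training features and labels, assumed prepared earlier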
| Parameter | What it controls |
|---|---|
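| max_depth | Maximum number of levels; None (default) grows until leaves are pure or hold fewer than min_samples_split samples |
| criterion | Impurity measure used to choose splits: 'gini' (default) or 'entropy' |
| min_samples_split | Minimum samples a node needs before a split is attempted (default 2) |
| min_samples_leaf | Minimum samples each leaf must contain (default 1) |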
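The criterion row is easier to grasp with numbers. Below is a minimal pure-Python sketch of the two impurity measures, computed from the class labels that reach a node; the function names gini and entropy are illustrative, not scikit-learn's API. A pure node scores 0 under both, while a 50/50 two-class node scores 0.5 (Gini) and 1.0 bit (entropy).

```python
import math
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def entropy(labels):
    """Shannon entropy in bits: sum of p * log2(1/p) over class proportions p."""
    n = len(labels)
    if n == 0:
        return 0.0
    return sum((count / n) * math.log2(n / count) for count in Counter(labels).values())


pure = ["benign"] * 10
mixed = ["benign"] * 5 + ["port_scan"] * 5
print(gini(pure), entropy(pure))    # 0.0 0.0
print(gini(mixed), entropy(mixed))  # 0.5 1.0
```

In practice the two criteria usually produce similar trees; Gini is the default and avoids computing a logarithm.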
Visualising the Tree
plot_tree() renders the tree as a matplotlib figure where you can see every decision:
from sklearn.tree import plot_tree
import matplotlib.pyplot as plt
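plt.figure(figsize=(20, 10))
plot_tree(model,
          feature_names=FEATURES,    # feature column names, in the same order as X_train
          class_names=CLASS_NAMES,   # class labels, in the (sorted) order of model.classes_
          filled=True,               # colour nodes by majority class
          rounded=True,              # draw node boxes with rounded corners
          fontsize=10)
plt.tight_layout()
plt.show()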
Each internal node shows its split rule; every node also shows its Gini impurity, sample count, per-class sample counts (value) and, because class_names is set, its majority class.
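Where does each node's split rule come from? At every node the tree tries candidate thresholds and keeps the one that gives the lowest weighted impurity across the two children. The sketch below does this for a single feature, assuming Gini impurity and hypothetical connection-rate data; it is a simplified illustration, not scikit-learn's optimized splitter, which also searches across all features and supports sample weights. Like scikit-learn, it places candidate thresholds midway between adjacent observed values, which is why exported thresholds tend to look like 50.50.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, weighted_gini) for the best 'feature <= threshold' split.

    Illustrative sketch: candidate thresholds are midpoints between consecutive
    distinct sorted values; threshold is None if no split beats the parent node.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))  # start from the unsplit parent's impurity
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # identical values cannot be separated by a threshold
        threshold = (lo + hi) / 2
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical connection rates and their labels.
rates = [3, 10, 22, 50, 51, 80, 120]
labels = ["benign"] * 4 + ["port_scan"] * 3
print(best_threshold(rates, labels))  # (50.5, 0.0)
```

Because the data separate cleanly between 50 and 51, the best threshold falls at their midpoint and both children are pure (weighted Gini 0.0).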
Reading the Tree as Text
from sklearn.tree import export_text
rules = export_text(model, feature_names=FEATURES)
print(rules)
The output looks like this:
|--- connection_rate <= 50.50
|   |--- class: benign
|--- connection_rate > 50.50
|   |--- unique_dest_ports <= 20.50
|   |   |--- bytes_sent <= 100000.00
|   |   |   |--- class: benign
|   |   |--- bytes_sent > 100000.00
|   |   |   |--- class: exfiltration
|   |--- unique_dest_ports > 20.50
|   |   |--- class: port_scan
Each root-to-leaf path is one rule: join its conditions with AND and you have a candidate firewall policy or SIEM alert.
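To make that concrete, the sketch below parses export_text output into (conditions, class) pairs, one per leaf, and prints the non-benign ones as alert rules. parse_export_text is a hypothetical helper, not part of scikit-learn; it assumes export_text's default layout and that no branch was truncated (export_text stops printing at its max_depth argument, 10 by default).

```python
def parse_export_text(text):
    """Convert export_text output into (conditions, class) pairs, one per leaf.

    Hypothetical helper: assumes the default layout and no truncated branches.
    """
    rules, path = [], []
    for line in text.splitlines():
        marker = line.find("|--- ")
        if marker == -1:
            continue  # skip blank or unrelated lines
        depth = line[:marker].count("|")  # one '|' per ancestor level
        body = line[marker + len("|--- "):].strip()
        del path[depth:]  # forget conditions from branches we have already left
        if body.startswith("class: "):
            rules.append((list(path), body[len("class: "):]))
        else:
            path.append(body)
    return rules


SAMPLE = """\
|--- connection_rate <= 50.50
|   |--- class: benign
|--- connection_rate > 50.50
|   |--- unique_dest_ports <= 20.50
|   |   |--- bytes_sent <= 100000.00
|   |   |   |--- class: benign
|   |   |--- bytes_sent > 100000.00
|   |   |   |--- class: exfiltration
|   |--- unique_dest_ports > 20.50
|   |   |--- class: port_scan
"""

for conditions, label in parse_export_text(SAMPLE):
    if label != "benign":
        print(" AND ".join(conditions), "-> ALERT:", label)
# connection_rate > 50.50 AND unique_dest_ports <= 20.50 AND bytes_sent > 100000.00 -> ALERT: exfiltration
# connection_rate > 50.50 AND unique_dest_ports > 20.50 -> ALERT: port_scan
```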
Think Deeper
Try this:
Export the tree as text and find the first rule. Could you explain this rule to a non-technical SOC analyst?
Example: 'If connection_rate > 50.5 and unique_dest_ports > 20.5, classify as port_scan.' This is why decision trees are valuable in security: you can explain every prediction to a human by following its path from root to leaf. Try doing that with a neural network. Interpretability builds trust with analysts and auditors.
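Explaining a single prediction means following its path and reporting each comparison along the way. The sketch below hand-encodes the example tree from the text output as nested dictionaries (TREE and explain are illustrative names, not scikit-learn objects) and turns one hypothetical event into a sentence an analyst could read. For a fitted model, scikit-learn's decision_path() method returns the nodes a sample visits, which could feed the same kind of explanation.

```python
# Hand-encoded copy of the example tree; thresholds taken from the sample output.
TREE = {
    "feature": "connection_rate", "threshold": 50.5,
    "left": {"leaf": "benign"},
    "right": {
        "feature": "unique_dest_ports", "threshold": 20.5,
        "left": {
            "feature": "bytes_sent", "threshold": 100000.0,
            "left": {"leaf": "benign"},
            "right": {"leaf": "exfiltration"},
        },
        "right": {"leaf": "port_scan"},
    },
}


def explain(event, node=TREE):
    """Walk the tree for one event; return (predicted class, list of reasons)."""
    reasons = []
    while "leaf" not in node:
        name, threshold = node["feature"], node["threshold"]
        value = event[name]
        if value <= threshold:  # as in scikit-learn, '<= threshold' goes to the left child
            reasons.append(f"{name} = {value} <= {threshold}")
            node = node["left"]
        else:
            reasons.append(f"{name} = {value} > {threshold}")
            node = node["right"]
    return node["leaf"], reasons


label, reasons = explain({"connection_rate": 120, "unique_dest_ports": 45, "bytes_sent": 2000})
print(f"Classified as {label} because " + " and ".join(reasons))
# Classified as port_scan because connection_rate = 120 > 50.5 and unique_dest_ports = 45 > 20.5
```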
Cybersecurity tie-in: tree rules translate directly into detection rules. For example,
connection_rate > 50.5 AND unique_dest_ports > 20.5 → ALERT: possible port scan
is a rule your SIEM can execute once it is written in the SIEM's own query syntax. In effect, the model writes the detection logic for you.
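A SIEM evaluates such a rule as a conjunction of threshold checks over each event or aggregation window. The sketch below compiles a rule string in the tree's "feature op threshold" format, using the tree's exact thresholds (50.5 and 20.5), into a Python predicate and scans a few records. compile_rule, the host field, and the records themselves are hypothetical, chosen only for illustration; a real deployment would express the same logic in the SIEM's query language.

```python
import operator

# export_text only emits these two comparison operators.
OPS = {">": operator.gt, "<=": operator.le}


def compile_rule(rule):
    """Compile 'feat > 50.5 AND feat2 > 20.5' into a predicate over an event dict."""
    checks = []
    for clause in rule.split(" AND "):
        feature, op, threshold = clause.split()
        checks.append((feature, OPS[op], float(threshold)))
    return lambda event: all(fn(event[f], t) for f, fn, t in checks)


is_port_scan = compile_rule("connection_rate > 50.5 AND unique_dest_ports > 20.5")

# Hypothetical per-host feature records, e.g. aggregated from flow logs.
events = [
    {"host": "10.0.0.5", "connection_rate": 300, "unique_dest_ports": 450},
    {"host": "10.0.0.9", "connection_rate": 12, "unique_dest_ports": 2},
    {"host": "10.0.0.7", "connection_rate": 80, "unique_dest_ports": 4},
]
for e in events:
    if is_port_scan(e):
        print(f"ALERT: possible port scan from {e['host']}")
# ALERT: possible port scan from 10.0.0.5
```

The third host has a high connection rate but touches few ports, so it fails the second condition and raises no alert.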