Step 3: Train and Evaluate

Scaling, fitting, confusion matrix, classification report


Feature Scaling

Logistic regression uses gradient descent to find the best weights. If features are on very different scales (e.g., url_length ranges 10–250 while has_at_symbol is 0/1), the optimiser converges slowly.

StandardScaler transforms each feature to have mean=0 and std=1:

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)   # fit AND transform
X_test_scaled  = scaler.transform(X_test)         # transform only!

Critical rule: fit the scaler on training data only. If you fit on the full dataset, you leak test distribution information into the model.
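One way to make this rule hard to break is to chain the scaler and classifier with sklearn's Pipeline: fitting the pipeline fits the scaler on training data only, and predicting applies transform only. A minimal sketch on synthetic data (the two features and the label rule here are made up for illustration, not the lab's dataset):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Toy stand-ins for url_length (10-250) and has_at_symbol (0/1)
rng = np.random.default_rng(42)
X = np.column_stack([rng.integers(10, 250, 200), rng.integers(0, 2, 200)])
y = (X[:, 0] > 120).astype(int)  # synthetic label, illustration only

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# .fit() runs fit_transform on the training fold; .predict() runs
# transform only on the test fold — the leak cannot happen by accident.
pipe = make_pipeline(StandardScaler(), LogisticRegression(random_state=42))
pipe.fit(X_train, y_train)
print(pipe.score(X_test, y_test))
```

The same pipeline object also works with cross_val_score, where fitting the scaler inside each fold matters even more.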

Training the Classifier

from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix

model = LogisticRegression(random_state=42)   # 42 keeps the run reproducible (see Lesson 1.2)
model.fit(X_train_scaled, y_train)

y_pred = model.predict(X_test_scaled)
print(classification_report(y_test, y_pred,
                            target_names=['Legitimate', 'Phishing']))

Reading the Classification Report

| Metric    | What it measures                                       | Security meaning                         |
|-----------|--------------------------------------------------------|------------------------------------------|
| Precision | Of those flagged as phishing, how many really are?     | Low precision = too many false alarms    |
| Recall    | Of all actual phishing, how many did we catch?         | Low recall = threats getting through     |
| F1        | Harmonic mean of precision and recall                  | Balanced overall score                   |
| Accuracy  | Total correct / total predictions                      | Misleading when classes are imbalanced   |
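The formulas behind this table are simple enough to check by hand. A sketch with hypothetical counts (TP=40, FP=5, FN=10, TN=45 are invented for illustration, not results from the lab):

```python
tp, fp, fn, tn = 40, 5, 10, 45  # hypothetical confusion-matrix counts

precision = tp / (tp + fp)               # of those flagged, how many really are
recall    = tp / (tp + fn)               # of all actual phishing, how many caught
f1        = 2 * precision * recall / (precision + recall)  # harmonic mean
accuracy  = (tp + tn) / (tp + fp + fn + tn)

print(f"precision={precision:.3f} recall={recall:.3f} "
      f"f1={f1:.3f} accuracy={accuracy:.3f}")
# → precision=0.889 recall=0.800 f1=0.842 accuracy=0.850
```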

The Confusion Matrix

A 2×2 table showing every possible outcome:

|                    | Predicted: Legit                  | Predicted: Phishing               |
|--------------------|-----------------------------------|-----------------------------------|
| Actually Legit     | True Negative (TN)                | False Positive (FP) — false alarm |
| Actually Phishing  | False Negative (FN) — missed threat | True Positive (TP) — caught it  |

In security, False Negatives are usually worse than False Positives. A missed phishing email can lead to a breach; a false alarm just wastes an analyst's time.
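sklearn's confusion_matrix returns exactly this layout (rows = actual, columns = predicted), so the four cells can be unpacked with .ravel(). A quick sketch on hand-made labels (0 = legitimate, 1 = phishing):

```python
from sklearn.metrics import confusion_matrix

# Hand-made labels for illustration: 0 = legitimate, 1 = phishing
y_true = [0, 0, 0, 1, 1, 1, 1, 0]
y_pred = [0, 1, 0, 1, 1, 0, 1, 0]

# .ravel() flattens the 2x2 matrix in row order: TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TN={tn} FP={fp} FN={fn} TP={tp}")
# → TN=3 FP=1 FN=1 TP=3
```

In the lab, the same unpacking on y_test and y_pred gives the four counts to report.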


Think Deeper

Your model has 95% accuracy but only 60% recall on phishing. Your boss says '95% is great'. What do you tell them?

60% recall means 40% of phishing URLs get through to users. If 100 phishing emails arrive daily, 40 reach inboxes. Accuracy is misleading when classes are imbalanced — the model gets credit for correctly labelling the easy majority class. Recall is the metric that matters when missing a positive is dangerous.
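The exact scenario is easy to reproduce with numbers. A sketch of an imbalanced 1000-URL test set (the counts below are constructed to hit 95% accuracy with 60% recall, not taken from the lab):

```python
from sklearn.metrics import accuracy_score, recall_score

# Imbalanced test set: 950 legitimate (0), 50 phishing (1)
y_true = [0] * 950 + [1] * 50

# A model with 30 false positives and 20 false negatives
y_pred = [0] * 920 + [1] * 30 + [1] * 30 + [0] * 20

print(accuracy_score(y_true, y_pred))  # → 0.95  sounds impressive
print(recall_score(y_true, y_pred))    # → 0.6   40% of phishing slips through
```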
Cybersecurity tie-in: The confusion matrix is the language of security ML. When a vendor claims "99% detection rate", ask: what's the false positive rate? And at what threshold? Without these numbers, the claim is meaningless.
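The threshold question is concrete too: .predict() uses a 0.5 cutoff on predict_proba by default, and lowering it trades more false positives for higher recall. A sketch on synthetic data (features and labels invented for illustration):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic two-feature data where feature 0 drives the label
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + rng.normal(scale=0.5, size=200) > 0).astype(int)

model = LogisticRegression().fit(X, y)
proba = model.predict_proba(X)[:, 1]  # P(phishing) per sample

# Lowering the threshold below the 0.5 default raises recall
# at the cost of more false positives
for t in (0.5, 0.3, 0.1):
    pred = (proba >= t).astype(int)
    tp = ((pred == 1) & (y == 1)).sum()
    fp = ((pred == 1) & (y == 0)).sum()
    print(f"threshold={t}: recall={tp / (y == 1).sum():.2f}, "
          f"false positives={fp}")
```

This is why a detection-rate claim without a threshold and a false positive rate tells you almost nothing.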
