Feature Scaling
Logistic regression uses gradient descent to find the best weights. If features are on very different scales (e.g., url_length ranges 10–250 while has_at_symbol is 0/1), the large-scale features dominate the gradient updates and the optimiser converges slowly.
StandardScaler transforms each feature to have mean=0 and std=1:
```python
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # fit AND transform
X_test_scaled = scaler.transform(X_test)        # transform only!
```
Critical rule: fit the scaler on training data only. If you fit on the full dataset, you leak test distribution information into the model.
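To see what the scaler is doing under the hood, here is a minimal plain-Python sketch of standardisation (no sklearn); the url_length values are invented for illustration. The key point is that the mean and std are learned from the training column only and then reused on the test column:

```python
def fit_scaler(column):
    """Learn mean and std from the TRAINING column only."""
    mean = sum(column) / len(column)
    var = sum((x - mean) ** 2 for x in column) / len(column)
    return mean, var ** 0.5

def transform(column, mean, std):
    """Apply previously learned parameters to any column."""
    return [(x - mean) / std for x in column]

train_url_length = [10, 50, 120, 250]   # hypothetical training values
test_url_length = [30, 200]             # hypothetical test values

mu, sigma = fit_scaler(train_url_length)              # fit on train only
train_scaled = transform(train_url_length, mu, sigma)  # mean 0, std 1
test_scaled = transform(test_url_length, mu, sigma)    # reuses train stats
```

After this, `train_scaled` has mean 0 and std 1, while `test_scaled` is whatever the training statistics make it — exactly the behaviour of `fit_transform` versus `transform`.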
Training the Classifier
```python
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix

model = LogisticRegression(random_state=42)  # 42 keeps the run reproducible (see Lesson 1.2)
model.fit(X_train_scaled, y_train)

y_pred = model.predict(X_test_scaled)
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred,
                            target_names=['Legitimate', 'Phishing']))
```
Reading the Classification Report
| Metric | What it measures | Security meaning |
|---|---|---|
| Precision | Of those flagged as phishing, how many really are? | Low precision = too many false alarms |
| Recall | Of all actual phishing, how many did we catch? | Low recall = threats getting through |
| F1 | Harmonic mean of precision and recall | Balanced overall score |
| Accuracy | Total correct / total predictions | Misleading when classes are imbalanced |
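Each of these metrics is just arithmetic on the four confusion-matrix counts. A short sketch, using made-up counts for a hypothetical 1,000-email test set, also shows how accuracy can look good while recall is poor:

```python
# Hypothetical counts for 1,000 emails (50 of them phishing)
tp, fp, fn, tn = 30, 30, 20, 920

precision = tp / (tp + fp)                   # 30/60  = 0.50
recall = tp / (tp + fn)                      # 30/50  = 0.60
f1 = 2 * precision * recall / (precision + recall)
accuracy = (tp + tn) / (tp + tn + fp + fn)   # 950/1000 = 0.95
```

Here accuracy is 95% even though the model misses 40% of the phishing — the class imbalance (only 50 positives) hides the weak recall.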
The Confusion Matrix
A 2×2 table showing every possible outcome:
| | Predicted: Legit | Predicted: Phishing |
|---|---|---|
| Actually Legit | True Negative (TN) | False Positive (FP) — false alarm |
| Actually Phishing | False Negative (FN) — missed threat | True Positive (TP) — caught it |
In security, False Negatives are usually worse than False Positives. A missed phishing email can lead to a breach; a false alarm just wastes an analyst's time.
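The four cells fall out of a simple comparison between true and predicted labels. A plain-Python sketch with toy labels (1 = phishing, 0 = legitimate; the lists are made up):

```python
y_true = [0, 0, 1, 1, 1, 0, 1, 0]  # hypothetical ground truth
y_pred = [0, 1, 1, 0, 1, 0, 1, 0]  # hypothetical predictions

tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)  # correctly cleared
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # false alarms
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # missed threats
tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # caught
```

For these toy lists the counts are TN=3, FP=1, FN=1, TP=3 — the single FN (a phishing email predicted legitimate) is the outcome a security team cares most about reducing.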
Think Deeper
Your model has 95% accuracy but only 60% recall on phishing. Your boss says '95% is great'. What do you tell them?