Step 5: Evaluate and Detect

MSE, RMSE, R² — build a security baseline


Error Metrics for Regression

After fitting a model, you need to quantify how wrong it is. Four metrics are standard:

| Metric | Units / Range | Interpretation |
|--------|---------------|----------------|
| MSE    | ms²  | Penalises large errors heavily; hard to interpret directly |
| RMSE   | ms   | Same units as target; "average error magnitude" |
| MAE    | ms   | Robust to outliers; typical error size |
| R²     | 0–1  | Fraction of variance explained; 1.0 = perfect |

For our server model: RMSE ≈ 15 ms means predictions are off by ~15 ms on average. R² ≈ 0.97 means the model explains 97% of response time variance.

Key Code Pattern

```python
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
import numpy as np

y_pred = model.predict(X_test)

mse  = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
mae  = mean_absolute_error(y_test, y_pred)
r2   = r2_score(y_test, y_pred)

print(f"MSE:  {mse:.1f} ms²")
print(f"RMSE: {rmse:.1f} ms")
print(f"MAE:  {mae:.1f} ms")
print(f"R²:   {r2:.3f}")
```
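The difference between RMSE and MAE is easiest to see with an outlier. A toy sketch (invented response times, not the lesson's server data): four predictions are within a few ms, one misses by 390 ms. Squaring lets that single miss dominate RMSE, while MAE grows only linearly.

```python
import numpy as np

# Toy response times (ms); the last point is a single large outlier
y_true = np.array([100.0, 110.0, 105.0, 98.0, 500.0])
y_pred = np.array([102.0, 108.0, 104.0, 100.0, 110.0])

errors = y_true - y_pred
rmse = np.sqrt(np.mean(errors ** 2))   # squaring makes the 390 ms miss dominate
mae  = np.mean(np.abs(errors))         # each error contributes linearly
print(f"RMSE: {rmse:.1f} ms")
print(f"MAE:  {mae:.1f} ms")
```

Here RMSE comes out more than twice the MAE, driven almost entirely by the one outlier. That gap is itself a diagnostic: when RMSE is much larger than MAE, a few big misses are skewing the picture.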

Residual Analysis

A residual is actual − predicted. Examining residuals reveals model quality:

| Pattern | Meaning | Action |
|---------|---------|--------|
| Random scatter around 0 | Good fit | None needed |
| Systematic curve | Non-linear relationship | Try polynomial features |
| Fan shape | Heteroscedasticity | Transform target (e.g. log) |
| Large individual residuals | Outliers | Investigate those rows |

In security, large positive residuals (actual >> predicted) are the most interesting — the server is slower than expected.
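A quick numerical version of this check: compare residual spread across the low and high halves of the prediction range (a growing spread suggests the fan shape), and list the rows with the largest positive residuals. A minimal sketch on synthetic stand-in data — in practice you would use your own `y_test` and `y_pred`:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-ins for y_test / model predictions (constant-noise case)
y_pred = rng.uniform(100, 300, size=200)
y_test = y_pred + rng.normal(0, 15, size=200)

residuals = y_test - y_pred

# Fan-shape check: does residual spread grow with the predicted value?
order = np.argsort(y_pred)
low, high = residuals[order[:100]], residuals[order[100:]]
print(f"residual std, low predictions:  {np.std(low):.1f} ms")
print(f"residual std, high predictions: {np.std(high):.1f} ms")

# Rows with the largest positive residuals: slower than expected
suspects = np.argsort(residuals)[-3:]
print("rows to investigate:", suspects)
```

With constant noise the two standard deviations come out close; a high-half spread well above the low-half spread would point at heteroscedasticity.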

Building a Security Baseline

Turn the regression model into an anomaly detector in 4 steps:

  1. Fit the model on historical normal traffic
  2. Compute residuals on new observations
  3. Calculate σ (standard deviation of training residuals)
  4. Alert when residual > kσ (typically k=3)
```python
# Compute training residuals and threshold
train_residuals = y_train - model.predict(X_train)
sigma = np.std(train_residuals)
threshold = 3 * sigma

# Flag anomalies in test set
test_residuals = y_test - y_pred
anomalies = test_residuals > threshold
print(f"Threshold (3σ): {threshold:.1f} ms")
print(f"Anomalies: {anomalies.sum()} / {len(y_test)}")
```

Under a normal distribution, only about 0.3% of observations fall beyond ±3σ. Since this detector alerts only on positive residuals, the expected false-alarm rate is roughly half that: about 1–2 false alarms per 1,000 legitimate observations.
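You can sanity-check that rate by simulation. A sketch assuming the legitimate residuals really are Gaussian with σ = 15 ms:

```python
import numpy as np

rng = np.random.default_rng(42)
sigma = 15.0
# 100,000 simulated residuals from legitimate traffic
residuals = rng.normal(0.0, sigma, size=100_000)

# One-sided rule: alert only when a residual exceeds +3σ
false_alarms = np.mean(residuals > 3 * sigma)
print(f"false-alarm rate beyond +3σ: {false_alarms:.4%}")
```

The simulated rate lands near the theoretical one-sided value of ~0.13%.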


Think Deeper

Under a Gaussian baseline, k=2σ flags ~2.3% of observations (one-sided) and k=3σ flags ~0.13%. A SOC analyst can handle 10 alerts per day from this system. Which threshold should you choose?

If normal volume is 1440 observations/day (one per minute), k=2σ produces ~33 alerts (too many) and k=3σ produces ~2 (well under budget, but attacks close to the baseline slip through). To use the full 10-alert budget you'd need k≈2.5σ. This is threshold tuning: balancing detection rate against analyst capacity. Too many alerts means alert fatigue, and real threats get ignored.
Cybersecurity tie-in: This is exactly how statistical anomaly detection works in production. Your IDS/IPS baselines network behaviour, then flags deviations. The kσ threshold controls the tradeoff between catching real attacks and drowning analysts in false positives.
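Assuming Gaussian residuals, the tuning can be done in closed form: pick the acceptable false-alarm fraction and invert the normal tail with `scipy.stats.norm.ppf`. A sketch using the volumes above:

```python
from scipy.stats import norm

obs_per_day  = 1440   # one observation per minute
alert_budget = 10     # alerts the SOC can triage per day

tail_prob = alert_budget / obs_per_day   # tolerable false-alarm fraction
k = norm.ppf(1 - tail_prob)              # one-sided z-score for that tail
print(f"k ≈ {k:.2f} for ~{alert_budget} alerts/day")
```

In production the residuals are rarely exactly Gaussian, so an empirical quantile of the training residuals (e.g. `np.quantile`) is a common alternative to the parametric formula.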
