Step 5: Evaluate and Detect

MSE, RMSE, R² — build a security baseline


Error Metrics for Regression

After fitting a model, you need to quantify how wrong it is. Four metrics are standard:

| Metric | Units / Range | Interpretation |
|--------|---------------|----------------|
| MSE    | ms²  | Penalises large errors heavily; hard to interpret directly |
| RMSE   | ms   | Same units as target; "average error magnitude" |
| MAE    | ms   | Robust to outliers; typical error size |
| R²     | 0–1  | Fraction of variance explained; 1.0 = perfect |

For our server model: RMSE ≈ 15 ms means predictions are off by ~15 ms on average. R² ≈ 0.97 means the model explains 97% of response time variance.

Key Code Pattern

```python
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
import numpy as np

y_pred = model.predict(X_test)

mse  = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
mae  = mean_absolute_error(y_test, y_pred)
r2   = r2_score(y_test, y_pred)

print(f"MSE:  {mse:.1f} ms²")
print(f"RMSE: {rmse:.1f} ms")
print(f"MAE:  {mae:.1f} ms")
print(f"R²:   {r2:.3f}")
```
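The difference between RMSE and MAE is easiest to see with an outlier. A toy sketch (invented response times, not the lesson's server data): four predictions are within a few ms, one misses by 390 ms. Squaring lets that single miss dominate RMSE, while MAE grows only linearly.

```python
import numpy as np

# Toy response times (ms); the last point is a single large outlier
y_true = np.array([100.0, 110.0, 105.0, 98.0, 500.0])
y_pred = np.array([102.0, 108.0, 104.0, 100.0, 110.0])

errors = y_true - y_pred
rmse = np.sqrt(np.mean(errors ** 2))   # squaring makes the 390 ms miss dominate
mae  = np.mean(np.abs(errors))         # each error contributes linearly
print(f"RMSE: {rmse:.1f} ms")
print(f"MAE:  {mae:.1f} ms")
```

Here RMSE comes out more than twice the MAE, driven almost entirely by the one outlier. That gap is itself a diagnostic: when RMSE is much larger than MAE, a few big misses are skewing the picture.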

Residual Analysis

A residual is actual − predicted. Examining residuals reveals model quality:

| Pattern | Meaning | Action |
|---------|---------|--------|
| Random scatter around 0 | Good fit | None needed |
| Systematic curve | Non-linear relationship | Try polynomial features |
| Fan shape | Heteroscedasticity | Transform target (e.g. log) |
| Large individual residuals | Outliers | Investigate those rows |

In security, large positive residuals (actual >> predicted) are the most interesting — the server is slower than expected.
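A quick numerical version of this check: compare residual spread across the low and high halves of the prediction range (a growing spread suggests the fan shape), and list the rows with the largest positive residuals. A minimal sketch on synthetic stand-in data — in practice you would use your own `y_test` and `y_pred`:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-ins for y_test / model predictions (constant-noise case)
y_pred = rng.uniform(100, 300, size=200)
y_test = y_pred + rng.normal(0, 15, size=200)

residuals = y_test - y_pred

# Fan-shape check: does residual spread grow with the predicted value?
order = np.argsort(y_pred)
low, high = residuals[order[:100]], residuals[order[100:]]
print(f"residual std, low predictions:  {np.std(low):.1f} ms")
print(f"residual std, high predictions: {np.std(high):.1f} ms")

# Rows with the largest positive residuals: slower than expected
suspects = np.argsort(residuals)[-3:]
print("rows to investigate:", suspects)
```

With constant noise the two standard deviations come out close; a high-half spread well above the low-half spread would point at heteroscedasticity.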

Building a Security Baseline

Turn the regression model into an anomaly detector in 4 steps:

  1. Fit the model on historical normal traffic
  2. Compute residuals on new observations
  3. Calculate σ (standard deviation of training residuals)
  4. Alert when residual > kσ (typically k=3)
```python
# Compute training residuals and threshold
train_residuals = y_train - model.predict(X_train)
sigma = np.std(train_residuals)
threshold = 3 * sigma

# Flag anomalies in test set
test_residuals = y_test - y_pred
anomalies = test_residuals > threshold
print(f"Threshold (3σ): {threshold:.1f} ms")
print(f"Anomalies: {anomalies.sum()} / {len(y_test)}")
```

Under a normal distribution, only about 0.3% of observations fall beyond ±3σ. Since this detector alerts only on positive residuals, the expected false-alarm rate is roughly half that: about 1–2 false alarms per 1,000 legitimate observations.
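You can sanity-check that rate by simulation. A sketch assuming the legitimate residuals really are Gaussian with σ = 15 ms:

```python
import numpy as np

rng = np.random.default_rng(42)
sigma = 15.0
# 100,000 simulated residuals from legitimate traffic
residuals = rng.normal(0.0, sigma, size=100_000)

# One-sided rule: alert only when a residual exceeds +3σ
false_alarms = np.mean(residuals > 3 * sigma)
print(f"false-alarm rate beyond +3σ: {false_alarms:.4%}")
```

The simulated rate lands near the theoretical one-sided value of ~0.13%.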


Think Deeper

Under a Gaussian baseline, k=2σ flags ~2.3% of observations (one-sided) and k=3σ flags ~0.13%. A SOC analyst can handle 10 alerts per day from this system. Which threshold should you choose?

If normal volume is 1440 observations/day (one per minute), k=2σ produces ~33 alerts (too many) and k=3σ produces ~2 (well under budget, but attacks close to the baseline slip through). To use the full 10-alert budget you'd need k≈2.5σ. This is threshold tuning: balancing detection rate against analyst capacity. Too many alerts means alert fatigue, and real threats get ignored.
Cybersecurity tie-in: This is exactly how statistical anomaly detection works in production. Your IDS/IPS baselines network behaviour, then flags deviations. The kσ threshold controls the tradeoff between catching real attacks and drowning analysts in false positives.
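Assuming Gaussian residuals, the tuning can be done in closed form: pick the acceptable false-alarm fraction and invert the normal tail with `scipy.stats.norm.ppf`. A sketch using the volumes above:

```python
from scipy.stats import norm

obs_per_day  = 1440   # one observation per minute
alert_budget = 10     # alerts the SOC can triage per day

tail_prob = alert_budget / obs_per_day   # tolerable false-alarm fraction
k = norm.ppf(1 - tail_prob)              # one-sided z-score for that tail
print(f"k ≈ {k:.2f} for ~{alert_budget} alerts/day")
```

In production the residuals are rarely exactly Gaussian, so an empirical quantile of the training residuals (e.g. `np.quantile`) is a common alternative to the parametric formula.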
