Error Metrics for Regression
After fitting a model, you need to quantify how wrong it is. Four metrics are standard:
| Metric | Units | Interpretation |
|---|---|---|
| MSE | ms² | Penalises large errors heavily; hard to interpret directly |
| RMSE | ms | Same units as target; "average error magnitude" |
| MAE | ms | Robust to outliers; typical error size |
| R² | unitless | Fraction of variance explained; 1.0 = perfect, and it can go negative for very poor fits |
For our server model: RMSE ≈ 15 ms means predictions are off by ~15 ms on average. R² ≈ 0.97 means the model explains 97% of response time variance.
Key Code Pattern
```python
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
import numpy as np

# Evaluate the fitted model on the held-out test set
y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)  # back in the target's units (ms)
mae = mean_absolute_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print(f"MSE:  {mse:.1f} ms²")
print(f"RMSE: {rmse:.1f} ms")
print(f"MAE:  {mae:.1f} ms")
print(f"R²:   {r2:.3f}")
```
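To see why RMSE "penalises large errors heavily" while MAE reflects the typical error, here is a small self-contained illustration with made-up response times (the arrays are hypothetical, not from the server model in the text): four predictions off by 1 ms and one off by 50 ms.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error

# Hypothetical response times (ms): four near-misses and one 50 ms outlier
y_true = np.array([100.0, 102.0, 98.0, 101.0, 100.0])
y_pred = np.array([101.0, 101.0, 99.0, 100.0, 150.0])

rmse = np.sqrt(mean_squared_error(y_true, y_pred))
mae = mean_absolute_error(y_true, y_pred)
print(f"RMSE: {rmse:.1f} ms")  # → 22.4 ms: dominated by the single 50 ms miss
print(f"MAE:  {mae:.1f} ms")   # → 10.8 ms: closer to the typical error size
```

The squaring inside MSE makes the one 50 ms error contribute 2500 ms² versus 1 ms² for each near-miss, which is why RMSE is roughly double MAE here.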
Residual Analysis
A residual is actual − predicted. Examining residuals reveals model quality:
| Pattern | Meaning | Action |
|---|---|---|
| Random around 0 | Good fit | None needed |
| Systematic curve | Non-linear relationship | Try polynomial features |
| Fan shape | Heteroscedasticity | Transform target (log) |
| Large individual residuals | Outliers | Investigate those rows |
In security, large positive residuals (actual >> predicted) are the most interesting — the server is slower than expected.
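The patterns in the table above are easiest to spot visually. A minimal sketch of a residuals-vs-predicted plot, using synthetic data in place of the model's real `y_test` and `y_pred` (the random arrays here are stand-ins, not the server dataset):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs headless
import matplotlib.pyplot as plt

# Stand-in data: a well-behaved model leaves noise scattered around zero
rng = np.random.default_rng(0)
y_pred = np.linspace(50, 250, 200)            # predicted response times (ms)
residuals = rng.normal(0, 15, size=200)       # actual - predicted

plt.scatter(y_pred, residuals, s=10)
plt.axhline(0, color="red", linestyle="--")   # reference line: perfect prediction
plt.xlabel("Predicted response time (ms)")
plt.ylabel("Residual (ms)")
plt.title("Residuals vs predicted")
plt.savefig("residuals.png")
```

With real data, replace the synthetic arrays with `model.predict(X_test)` and `y_test - y_pred`; a curve, fan, or isolated extreme points in this plot maps directly onto the table's rows.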
Building a Security Baseline
Turn the regression model into an anomaly detector in 4 steps:
- Fit the model on historical normal traffic
- Compute residuals on new observations
- Calculate σ (standard deviation of training residuals)
- Alert when residual > kσ (typically k=3)
```python
# Compute training residuals and derive the alert threshold
train_residuals = y_train - model.predict(X_train)
sigma = np.std(train_residuals)
threshold = 3 * sigma  # k = 3

# Flag anomalies in the test set (positive residuals: slower than expected)
test_residuals = y_test - y_pred
anomalies = test_residuals > threshold
print(f"Threshold (3σ): {threshold:.1f} ms")
print(f"Anomalies: {anomalies.sum()} / {len(y_test)}")
Under a normal distribution, about 0.3% of observations fall beyond ±3σ. Since this detector only alerts on positive residuals, the expected false-alarm rate is roughly 0.13% — one or two false alarms per 1,000 legitimate observations.
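These tail probabilities are easy to verify directly (this sketch assumes SciPy is available; `norm.sf` is the standard-normal survival function, P(Z > k)):

```python
from scipy.stats import norm

# One-sided tail probability beyond k standard deviations
for k in (2, 3):
    p = norm.sf(k)  # P(Z > k) for a standard normal
    print(f"k={k}: {p:.4%} of normal observations exceed +{k}σ "
          f"(~{p * 1000:.1f} per 1,000)")
```

This prints about 2.28% for k=2 and about 0.13% for k=3, which is where the false-alarm estimates come from.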
Think Deeper
Under a normal distribution, alerting at k=2 flags ~2.3% of legitimate observations; at k=3, ~0.13%. A SOC analyst can handle 10 alerts per day from this system. Which threshold should you choose?