Step 4: Fit and Predict

Slope, intercept, and the regression line


What model.fit() Does

When you call LinearRegression().fit(X_train, y_train), sklearn finds the slope and intercept that minimise the sum of squared differences between each actual value and the model's prediction.

y_predicted = slope * x + intercept

After .fit(), the model stores:

  • model.coef_[0] — the slope (weight)
  • model.intercept_ — the intercept (bias)
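For a single feature, the least-squares solution that .fit() finds can be checked against the closed form slope = cov(x, y) / var(x), intercept = mean(y) − slope × mean(x). A minimal sketch on synthetic data (the values below are illustrative, not the lesson's dataset):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic server data: ~1.8 ms/rps plus ~30 ms baseline, with noise
rng = np.random.default_rng(0)
x = rng.uniform(0, 200, size=100)            # requests per second
y = 1.8 * x + 30 + rng.normal(0, 5, 100)     # response times in ms

model = LinearRegression().fit(x.reshape(-1, 1), y)

# Closed-form ordinary least squares for one feature
# (bias=True so cov and var use the same 1/n normalisation)
slope = np.cov(x, y, bias=True)[0, 1] / np.var(x)
intercept = y.mean() - slope * x.mean()

print(np.isclose(model.coef_[0], slope))        # True
print(np.isclose(model.intercept_, intercept))  # True
```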

Physical Meaning

For our server model: response_time_ms = slope × requests_per_second + intercept

Parameter    Value          Physical meaning
Intercept    ~29.5 ms       Baseline response time at 0 rps (network stack overhead)
Slope        ~1.82 ms/rps   Additional milliseconds per extra request per second

Example: at 100 rps, the model predicts 1.82 × 100 + 29.5 = 211.5 ms.

This interpretability is why linear regression remains a valuable baseline — you can explain to a SOC analyst why an alert fired.

Key Code Pattern

from sklearn.linear_model import LinearRegression
import numpy as np

model = LinearRegression()
model.fit(X_train, y_train)

print(f"Slope:     {model.coef_[0]:.2f} ms per rps")
print(f"Intercept: {model.intercept_:.2f} ms")

# Predict for specific loads
new_loads = np.array([[50], [100], [150]])
predictions = model.predict(new_loads)
for load, pred in zip([50, 100, 150], predictions):
    print(f"At {load} rps: {pred:.1f} ms")

# Predict for the full test set
y_pred = model.predict(X_test)

Visualising the Regression Line

Overlay the model's predictions on the scatter plot to see the fit:

import matplotlib.pyplot as plt

x_line = np.linspace(X.min(), X.max(), 200).reshape(-1, 1)
y_line = model.predict(x_line)

plt.scatter(X_test, y_test, alpha=0.4, label="Actual")
plt.plot(x_line, y_line, color="red", label="Predicted")
plt.xlabel("Requests per second")
plt.ylabel("Response time (ms)")
plt.legend()
plt.show()

Points above the line have longer-than-predicted response times — worth investigating.
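One way to find those points programmatically is to compute the residuals (actual minus predicted) and flag the largest ones. A sketch using illustrative stand-in values for y_test and y_pred, flagging points more than two standard deviations above the prediction:

```python
import numpy as np

# Stand-ins for y_test / y_pred from the earlier steps (illustrative values)
y_test = np.array([120.0, 95.0, 210.0, 150.0, 400.0])
y_pred = np.array([118.0, 97.0, 205.0, 152.0, 300.0])

residuals = y_test - y_pred          # positive = slower than predicted
threshold = 2 * residuals.std()      # flag residuals beyond 2 standard deviations

for i in np.where(residuals > threshold)[0]:
    print(f"point {i}: actual {y_test[i]:.1f} ms, "
          f"predicted {y_pred[i]:.1f} ms (+{residuals[i]:.1f} ms over)")
```

With the fitted model from this lesson, the same pattern applies directly to the real y_test and y_pred arrays.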


Think Deeper

The slope is ~1.82 ms per rps. If you doubled the server's CPU, what would you expect to happen to the slope? What about the intercept?

A faster server would likely have a lower slope (less added latency per extra request) and possibly a lower intercept (less baseline overhead). The model's parameters have physical meaning; that is the power of interpretable models in security.

Cybersecurity tie-in: the slope and intercept translate into statements a non-technical stakeholder can act on. "Each extra 10 rps adds about 18 ms of latency" is concrete and actionable. When the actual response time deviates from the prediction, something changed, and in security, unexplained changes deserve investigation.
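That idea can be sketched as a simple latency check: alert when an observed response time exceeds the model's prediction by some margin. The function name and the 50 ms margin below are hypothetical choices; the slope and intercept are the lesson's fitted values:

```python
def check_latency(rps, observed_ms, slope=1.82, intercept=29.5, margin_ms=50.0):
    """Return an alert string if observed latency exceeds prediction + margin.

    slope/intercept are the fitted values from this lesson; margin_ms is
    an illustrative tolerance, not a tuned threshold.
    """
    predicted = slope * rps + intercept
    if observed_ms - predicted > margin_ms:
        return f"ALERT: {rps} rps expected ~{predicted:.0f} ms, saw {observed_ms:.0f} ms"
    return None

print(check_latency(100, 215.0))   # within margin of the ~211.5 ms prediction -> None
print(check_latency(100, 320.0))   # far above prediction -> alert string
```

In practice the margin would be derived from the residual spread on the test set rather than picked by hand.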
