What model.fit() Does
When you call LinearRegression().fit(X_train, y_train), sklearn finds the slope and intercept that minimise the sum of squared differences between each actual value and the model's prediction (ordinary least squares).
```
y_predicted = slope * x + intercept
```
After .fit(), the model stores:
- `model.coef_[0]` — the slope (weight)
- `model.intercept_` — the intercept (bias)
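For simple (one-feature) regression, the least-squares solution has a closed form: slope = cov(x, y) / var(x) and intercept = mean(y) − slope × mean(x). The sketch below checks that against sklearn on synthetic data; the variable names and values here are illustrative stand-ins, not the tutorial's server dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Illustrative synthetic data (assumed, not the tutorial's dataset)
rng = np.random.default_rng(0)
X = rng.uniform(10, 200, size=(100, 1))            # requests per second
y = 1.8 * X.ravel() + 30 + rng.normal(0, 5, 100)   # response time in ms

model = LinearRegression().fit(X, y)

# Closed-form least squares for one feature:
#   slope = cov(x, y) / var(x), intercept = mean(y) - slope * mean(x)
x = X.ravel()
slope = np.cov(x, y, bias=True)[0, 1] / np.var(x)
intercept = y.mean() - slope * x.mean()

print(slope, model.coef_[0])          # essentially identical
print(intercept, model.intercept_)    # essentially identical
```

The two computations agree to floating-point precision, which is a handy sanity check that .fit() is doing nothing more mysterious than minimising squared error.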
Physical Meaning
For our server model: response_time_ms = slope × requests_per_second + intercept
| Parameter | Value | Physical meaning |
|---|---|---|
| Intercept | ~29.5 ms | Baseline response time at 0 rps (network stack overhead) |
| Slope | ~1.82 ms/rps | Additional milliseconds per extra request per second |
Example: At 100 rps: 1.82 × 100 + 29.5 = 211.5 ms predicted.
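The worked example can be verified directly in a couple of lines (the slope and intercept values are the approximate ones from the table above):

```python
slope, intercept = 1.82, 29.5   # approximate values from the table above
rps = 100
predicted_ms = slope * rps + intercept
print(round(predicted_ms, 1))   # 211.5
```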
This interpretability is why linear regression remains a valuable baseline — you can explain to a SOC analyst why an alert fired.
Key Code Pattern
```python
from sklearn.linear_model import LinearRegression
import numpy as np

model = LinearRegression()
model.fit(X_train, y_train)
print(f"Slope: {model.coef_[0]:.2f} ms per rps")
print(f"Intercept: {model.intercept_:.2f} ms")

# Predict for specific loads
new_loads = np.array([[50], [100], [150]])
predictions = model.predict(new_loads)
for load, pred in zip([50, 100, 150], predictions):
    print(f"At {load} rps: {pred:.1f} ms")

# Predict for the full test set
y_pred = model.predict(X_test)
```
Visualising the Regression Line
Overlay the model's predictions on the scatter plot to see the fit:
```python
import matplotlib.pyplot as plt

# Evenly spaced loads spanning the full feature range
x_line = np.linspace(X.min(), X.max(), 200).reshape(-1, 1)
y_line = model.predict(x_line)

plt.scatter(X_test, y_test, alpha=0.4, label="Actual")
plt.plot(x_line, y_line, color="red", label="Predicted")
plt.xlabel("Requests per second")
plt.ylabel("Response time (ms)")
plt.legend()
plt.show()
```
Points above the line have longer-than-predicted response times — worth investigating.
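"Above the line" is just a positive residual (actual minus predicted), so the eyeball check can be automated. A minimal sketch, using synthetic stand-ins for the tutorial's model and test set (the variable names and the 2-standard-deviation threshold are assumptions for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic stand-ins for the tutorial's variables (assumed)
rng = np.random.default_rng(1)
X_test = rng.uniform(10, 200, size=(50, 1))
y_test = 1.8 * X_test.ravel() + 30 + rng.normal(0, 5, 50)
model = LinearRegression().fit(X_test, y_test)

# Residual = actual - predicted; positive residuals sit above the line
residuals = y_test - model.predict(X_test)

# Flag points more than 2 standard deviations above the fit
threshold = 2 * residuals.std()
slow_loads = X_test.ravel()[residuals > threshold]
print(f"{len(slow_loads)} requests slower than expected")
```

A fixed standard-deviation cutoff is the simplest possible rule; in practice you would tune the threshold to your tolerance for false alarms.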
Think Deeper
The slope is ~1.82 ms per rps. If you doubled the server's CPU, what would you expect to happen to the slope? What about the intercept?