Step 6: The Scaling Problem

StandardScaler vs MinMaxScaler

The scaling problem

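bytes_per_second ranges from 0 to 100,000+, while port_risk_score ranges from 1 to 5. Without scaling, distance-based models such as k-NN, k-means, and SVMs effectively treat bytes_per_second as about 20,000x more important, simply because its numbers are about 20,000x larger.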
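A minimal sketch of the effect, assuming two hypothetical connections (the values are invented for illustration) and plain Euclidean distance, as a k-NN or k-means model would compute it on raw features:

```python
import math

# Two hypothetical connections: (bytes_per_second, port_risk_score)
conn_a = (12_000.0, 1.0)  # lowest-risk port
conn_b = (13_000.0, 5.0)  # highest-risk port, slightly more traffic


def euclidean(p, q):
    """Plain Euclidean distance over all features, with no scaling applied."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))


dist = euclidean(conn_a, conn_b)
bytes_only = abs(conn_a[0] - conn_b[0])  # distance if port_risk_score were ignored
print(f"distance on raw features:          {dist:.3f}")
print(f"distance from bytes_per_second only: {bytes_only:.3f}")
```

Even though port_risk_score moves across its entire 1-to-5 range, it changes the distance by less than 0.01; the 1,000 bytes/sec difference decides almost everything.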

StandardScaler: z = (x - mean) / std

MinMaxScaler: x' = (x - min) / (max - min)
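Both formulas fit in a few lines of plain Python. This is a sketch of the arithmetic, not scikit-learn's implementation; it assumes the population standard deviation (ddof=0), which is what scikit-learn's StandardScaler uses, and it does not guard against constant columns.

```python
import statistics


def standard_scale(values):
    """z = (x - mean) / std, using the population std (ddof=0).

    Sketch only: raises ZeroDivisionError if every value is identical.
    """
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(x - mean) / std for x in values]


def min_max_scale(values):
    """x' = (x - min) / (max - min), mapping the column onto [0, 1].

    Sketch only: raises ZeroDivisionError if every value is identical.
    """
    lo, hi = min(values), max(values)
    return [(x - lo) / (hi - lo) for x in values]


port_risk_score = [1, 2, 3, 4, 5]
print(standard_scale(port_risk_score))  # centered on 0, unit std, unbounded
print(min_max_scale(port_risk_score))   # always inside [0, 1]
```

StandardScaler output has mean 0 and standard deviation 1 but no fixed bounds; MinMaxScaler output is pinned to [0, 1] by whichever values happen to be the smallest and largest.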

Think Deeper

You have one connection that transferred 100,000 bytes/sec while all others are under 5,000. What happens with MinMaxScaler?

MinMaxScaler maps min→0 and max→1. That one outlier becomes 1.0, and all normal traffic (under 5,000 bytes/sec) is compressed into 0.00–0.05, so the model can barely distinguish normal connections from each other. StandardScaler copes better: the outlier gets a large z-score, while normal data stays spread around 0. It is not immune, though, because the outlier still inflates the mean and standard deviation.
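A minimal sketch of this scenario, assuming hypothetical traffic of 99 normal connections between 50 and 4,950 bytes/sec plus one 100,000 bytes/sec outlier. The scalers are re-implemented from the formulas above rather than imported from scikit-learn, so the block runs on the standard library alone.

```python
import statistics


def standard_scale(values):
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population std (ddof=0)
    return [(x - mean) / std for x in values]


def min_max_scale(values):
    lo, hi = min(values), max(values)
    return [(x - lo) / (hi - lo) for x in values]


# Hypothetical traffic: 99 normal connections (50, 100, ..., 4,950 bytes/sec)
# followed by a single outlier at 100,000 bytes/sec.
bytes_per_second = [50.0 * i for i in range(1, 100)] + [100_000.0]

mm = min_max_scale(bytes_per_second)
z = standard_scale(bytes_per_second)

normal_mm, normal_z = mm[:-1], z[:-1]  # everything except the outlier
print(f"MinMax:   normal {min(normal_mm):.3f}..{max(normal_mm):.3f}, outlier {mm[-1]:.3f}")
print(f"Standard: normal {min(normal_z):.3f}..{max(normal_z):.3f}, outlier {z[-1]:.3f}")
```

Both scalers are linear, so relative to the normal band the outlier sits equally far away under either one. The practical difference is absolute spread: here the normal connections occupy a band about 0.05 wide under MinMaxScaler but roughly 0.5 z-units wide under StandardScaler, about ten times more room next to other unit-variance features.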
Cybersecurity tie-in: Network data has extreme outliers. A single data exfiltration event can transfer millions of bytes while normal traffic stays under 10 KB. StandardScaler degrades more gracefully in this situation; MinMaxScaler squeezes all normal traffic toward 0 to make room for the outlier.
