Step 1: The Raw Log

What a firewall export looks like


A firewall just exported this log

200 connections from the last few hours. Each row is one network connection. Can you spot which columns a machine learning model can use?

timestamp            src_ip         dst_ip           src_port  dst_port  protocol  bytes_sent  bytes_recv  packets  duration_str  action
2024-01-15 08:00:00  192.168.6.180  39.220.169.247   63069     53        TCP       264         1011        36       15.98s        ALLOW
2024-01-15 08:02:00  192.168.7.189  26.98.49.153     51703     443       TCP       679         6160        47       71.95s        ALLOW
2024-01-15 08:04:00  192.168.4.103  152.12.59.250    63461     3389      TCP       2764        9974        43       1.13s         ALLOW
2024-01-15 08:06:00  192.168.9.211  135.56.35.173    62386     8080      TCP       2470        478         40       1.20s         BLOCK
2024-01-15 08:08:00  192.168.6.75   20.64.7.238      64061     443       UDP       2043        2758        35       6.24s         ALLOW
2024-01-15 08:10:00  192.168.7.117  144.141.203.115  54712     80        TCP       518         4003        46       30.19s        ALLOW
2024-01-15 08:12:00  192.168.3.104  215.142.91.98    53441     443       TCP       2703        3927        48       3.64s         ALLOW
2024-01-15 08:14:00  192.168.7.131  66.31.214.191    49998     80        TCP       3541        3828        48       6.70s         ALLOW
2024-01-15 08:16:00  192.168.5.53   86.50.152.186    60106     22        TCP       476         1630        37       1.06s         ALLOW
2024-01-15 08:18:00  192.168.1.88   63.189.124.150   62855     8080      TCP       1043        4530        41       21.61s        ALLOW

Think Deeper

Look at the 'duration_str' column. What would happen if you passed '2.34s' to a multiplication operation?

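It depends on the other operand. Multiplying by a float, as in '2.34s' * 1.5, raises a TypeError, because Python can't multiply a string by a non-integer. Multiplying by an integer is worse: '2.34s' * 2 silently returns the repeated string '2.34s2.34s' with no error at all. The 's' suffix makes the value a string, not a float. This is why parsing is the first step: strip the suffix, convert to float, and then you can compute features such as bytes_per_second.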
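The parsing step can be sketched in a few lines of plain Python. The helper names `parse_duration` and `bytes_per_second` are illustrative assumptions, not functions from the lesson, and the zero-duration fallback of 0.0 is one possible choice among several.

```python
def parse_duration(duration_str):
    """Convert a value such as '15.98s' into seconds as a float."""
    text = duration_str.strip()
    if text.endswith("s"):
        text = text[:-1]  # drop the unit suffix
    return float(text)


def bytes_per_second(bytes_sent, duration_str):
    """Throughput for one connection; returns 0.0 for a zero-length duration."""
    seconds = parse_duration(duration_str)
    return bytes_sent / seconds if seconds > 0 else 0.0


# The raw string does not behave like a number.
repeated = "2.34s" * 2  # string repetition, not arithmetic
try:
    "2.34s" * 1.5
    float_multiply_failed = False
except TypeError:
    float_multiply_failed = True

# First row of the sample log: 264 bytes sent over "15.98s".
rate = bytes_per_second(264, "15.98s")
print(repeated, float_multiply_failed, round(rate, 2))
```

Guarding against a zero duration matters in practice, because very short connections can be logged as 0.00s and would otherwise raise a ZeroDivisionError.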
Cybersecurity tie-in: SIEMs and firewalls routinely export logs like this. The raw export has IP addresses, timestamps, and protocol strings, none of which scikit-learn estimators can consume directly, because they expect numeric arrays. Feature engineering is the bridge between SOC data and ML models.
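As a rough illustration of that bridge, the sketch below turns the first row of the sample log into a purely numeric feature list using only the standard library. The function name `row_to_features`, the choice of features (hour of day, private-address flags, one-hot protocol), and the BLOCK = 1 label encoding are all assumptions made for this example, not the lesson's prescribed feature set.

```python
import ipaddress
from datetime import datetime


def row_to_features(row):
    """Map one raw log row (a dict of strings) to a list of floats."""
    ts = datetime.strptime(row["timestamp"], "%Y-%m-%d %H:%M:%S")
    src = ipaddress.ip_address(row["src_ip"])
    dst = ipaddress.ip_address(row["dst_ip"])
    duration = float(row["duration_str"].rstrip("s"))
    return [
        float(ts.hour),                         # hour of day
        float(src.is_private),                  # 1.0 if source is an internal address
        float(dst.is_private),                  # 1.0 if destination is internal
        float(row["dst_port"]),                 # kept numeric here for simplicity
        float(row["protocol"] == "TCP"),        # one-hot: TCP
        float(row["protocol"] == "UDP"),        # one-hot: UDP
        float(row["bytes_sent"]),
        float(row["bytes_recv"]),
        float(row["packets"]),
        duration,                               # seconds, suffix removed
    ]


# First row of the sample log, as it might arrive from a CSV reader.
first_row = {
    "timestamp": "2024-01-15 08:00:00",
    "src_ip": "192.168.6.180",
    "dst_ip": "39.220.169.247",
    "src_port": "63069",
    "dst_port": "53",
    "protocol": "TCP",
    "bytes_sent": "264",
    "bytes_recv": "1011",
    "packets": "36",
    "duration_str": "15.98s",
    "action": "ALLOW",
}

features = row_to_features(first_row)
label = 1 if first_row["action"] == "BLOCK" else 0  # assumed target encoding
print(features, label)
```

The ephemeral src_port is left out on purpose, since it is effectively random per connection. Treating dst_port as a plain number is a simplification: ports are really categories, so a real pipeline might bucket or one-hot encode the common ones instead.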
