Step 2: Transformation Plan

Which columns need work?

Build the transformation plan

Every column needs a strategy. Classify each column as one of three: use directly, transform, or drop.

timestamp (string), e.g. 2024-01-15 08:00:00
src_ip (string), e.g. 192.168.3.42
dst_ip (string), e.g. 185.23.44.102
src_port (int), e.g. 52481
dst_port (int), e.g. 443
protocol (string), e.g. TCP
bytes_sent (int), e.g. 3421
bytes_recv (int), e.g. 15230
packets (int), e.g. 42
duration_str (string), e.g. 2.34s
action (string), e.g. ALLOW
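
One way to keep track of the plan is a plain mapping from column name to strategy, plus a check that nothing is left unclassified. The sketch below is only an illustration: the names COLUMNS, PLAN, and unclassified are hypothetical, and the four assignments shown are assumptions made for the example, not the lab's reference solution.

```python
# Columns and types as listed in this lesson.
COLUMNS = {
    "timestamp": "string", "src_ip": "string", "dst_ip": "string",
    "src_port": "int", "dst_port": "int", "protocol": "string",
    "bytes_sent": "int", "bytes_recv": "int", "packets": "int",
    "duration_str": "string", "action": "string",
}

# The three strategies named in the lesson.
STRATEGIES = {"use", "transform", "drop"}

# Partially filled plan. These assignments are assumptions for the
# example, not the lab's reference solution.
PLAN = {
    "bytes_sent": "use",          # already numeric
    "duration_str": "transform",  # "2.34s" has to be parsed to a number
    "timestamp": "transform",     # string has to be parsed to a datetime
    "action": "drop",             # the label: kept out of the features
}

def unclassified(columns, plan):
    """Return the columns that do not yet have a valid strategy."""
    return [name for name in columns if plan.get(name) not in STRATEGIES]

remaining = unclassified(COLUMNS, PLAN)
print(f"{len(COLUMNS) - len(remaining)} / {len(COLUMNS)} classified")
print("Still to classify:", remaining)
```

Filling in the remaining entries until the list is empty is the same exercise as classifying every column above.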

Think Deeper

Which column should you NEVER use as a feature? Why?

The action column (ALLOW/BLOCK). It is the label, the value you are trying to predict. Using it as a feature is called data leakage: the model receives the answer as part of its input. In production, the action is not known until after the model has decided.
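
A minimal sketch of keeping the label out of the features, assuming each log record is a Python dict keyed by the column names from this lesson. The sample rows and the helper split_features_and_label are hypothetical, made up for illustration.

```python
# Hypothetical sample rows using column names from this lesson.
rows = [
    {"dst_port": 443, "protocol": "TCP", "bytes_sent": 3421, "action": "ALLOW"},
    {"dst_port": 23, "protocol": "TCP", "bytes_sent": 120, "action": "BLOCK"},
]

LABEL = "action"

def split_features_and_label(row, label=LABEL):
    """Return (features, label_value); the label is removed from the features."""
    features = {key: value for key, value in row.items() if key != label}
    return features, row[label]

X, y = [], []
for row in rows:
    features, target = split_features_and_label(row)
    X.append(features)
    y.append(target)

# The model would be trained on X to predict y; "action" never appears in X.
print(X)
print(y)
```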
Cybersecurity tie-in: A transformation plan is like an incident response playbook: you decide before the data arrives how each field will be handled. In production ML pipelines, this plan becomes a preprocessing module that runs on every new batch of logs.
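
As a rough sketch of what such a preprocessing module could look like, the function below applies the same transformations to every row in a batch. It assumes rows arrive as dicts and that the timestamp and duration_str formats match the examples in this lesson. The function names and the derived hour feature are assumptions made for illustration, not part of the lab.

```python
from datetime import datetime

def parse_duration(value):
    """Convert a duration string such as '2.34s' to seconds as a float."""
    return float(value.strip().rstrip("s"))

def parse_timestamp(value):
    """Parse a timestamp in the format shown in the lesson example."""
    return datetime.strptime(value, "%Y-%m-%d %H:%M:%S")

def preprocess_batch(rows):
    """Apply the same plan to every row in a batch of log records."""
    processed = []
    for row in rows:
        ts = parse_timestamp(row["timestamp"])
        processed.append({
            "hour": ts.hour,                                     # assumed derived feature
            "duration_s": parse_duration(row["duration_str"]),   # transformed
            "bytes_sent": row["bytes_sent"],                     # used directly
            # "action" is deliberately not copied: it is the label.
        })
    return processed

# Hypothetical one-row batch built from the example values in this lesson.
batch = [{
    "timestamp": "2024-01-15 08:00:00",
    "duration_str": "2.34s",
    "bytes_sent": 3421,
    "action": "ALLOW",
}]
features = preprocess_batch(batch)
print(features)
```

Because the plan lives in one function, every new batch is handled the same way, which is the point of deciding the plan up front.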
