Step 2: Transformation Plan

Which columns need work?

Build the transformation plan

Every column needs a strategy. Classify each column as one of: use directly, needs transformation, or drop.

Column	Type	Example
timestamp	string	2024-01-15 08:00:00
src_ip	string	192.168.3.42
dst_ip	string	185.23.44.102
src_port	int	52481
dst_port	int	443
protocol	string	TCP
bytes_sent	int	3421
bytes_recv	int	15230
packets	int	42
duration_str	string	2.34s
action	string	ALLOW

The transformation plan

Column	Type	Action
timestamp	string	Extract: hour_of_day, is_business_hours
src_ip	string	Extract: is_private, subnet
dst_ip	string	Extract: is_private, known-bad lookup
src_port	int	Use directly (or drop — ephemeral)
dst_port	int	Map to port_risk_score
protocol	string	One-hot encode (TCP/UDP/ICMP)
bytes_sent	int	Use directly
bytes_recv	int	Use directly
packets	int	Use directly
duration_str	string	Strip 's' suffix → float
action	string	Drop — this is the label
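
The plan above can also be written down as data, which makes it easy to check that no column was missed. The following is a minimal sketch, assuming a plain dictionary is an acceptable representation. The names `PLAN` and `columns_by_action` and the three action labels ("use", "transform", "drop") are illustrative and not part of the lab. `src_port` is listed as "use" here, although the table notes that dropping it is also reasonable.

```python
from collections import defaultdict

# Hypothetical encoding of the plan table: column -> (action, note).
PLAN = {
    "timestamp": ("transform", "extract hour_of_day, is_business_hours"),
    "src_ip": ("transform", "extract is_private, subnet"),
    "dst_ip": ("transform", "extract is_private, known-bad lookup"),
    "src_port": ("use", "ephemeral, could also be dropped"),
    "dst_port": ("transform", "map to port_risk_score"),
    "protocol": ("transform", "one-hot encode TCP/UDP/ICMP"),
    "bytes_sent": ("use", ""),
    "bytes_recv": ("use", ""),
    "packets": ("use", ""),
    "duration_str": ("transform", "strip 's' suffix, convert to float"),
    "action": ("drop", "this is the label"),
}


def columns_by_action(plan):
    """Group column names by their planned action."""
    groups = defaultdict(list)
    for column, (action, _note) in plan.items():
        groups[action].append(column)
    return dict(groups)


groups = columns_by_action(PLAN)
for action, columns in groups.items():
    print(f"{action}: {len(columns)} -> {', '.join(columns)}")
```

Keeping the plan in one place like this means a new log field shows up as a missing key rather than silently flowing into the model.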

5 columns need no transformation (4 are used directly and action is dropped as the label). 6 need transformation. The next steps will show you how to transform each one.
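
As a preview, three of the string transformations from the plan can be sketched with the Python standard library alone. This is a rough illustration under stated assumptions, not the lab's solution: the function names are hypothetical, business hours are assumed to be 09:00 to 17:00 on weekdays, and the subnet is assumed to be a /24.

```python
from datetime import datetime
import ipaddress


def parse_duration(duration_str):
    """Strip a trailing 's' and return the duration in seconds as a float."""
    text = duration_str.strip()
    if text.endswith("s"):
        text = text[:-1]
    return float(text)


def timestamp_features(timestamp, start_hour=9, end_hour=17):
    """Extract hour_of_day and is_business_hours (assumed: Mon-Fri, 09:00-17:00)."""
    ts = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S")
    is_weekday = ts.weekday() < 5  # Monday is 0, Sunday is 6
    return {
        "hour_of_day": ts.hour,
        "is_business_hours": is_weekday and start_hour <= ts.hour < end_hour,
    }


def ip_features(ip, prefix_len=24):
    """Extract is_private and the enclosing subnet (assumed prefix length: /24)."""
    addr = ipaddress.ip_address(ip)
    subnet = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
    return {"is_private": addr.is_private, "subnet": str(subnet)}


print(parse_duration("2.34s"))
print(timestamp_features("2024-01-15 08:00:00"))
print(ip_features("192.168.3.42"))
print(ip_features("185.23.44.102"))
```

The example values are the ones shown for each column earlier in this step. The known-bad lookup for dst_ip is left out because it depends on a threat list that this page does not specify.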

Think Deeper

Try this:

Which column should you NEVER use as a feature? Why?

The action column (ALLOW/BLOCK). That's the label — what you're trying to predict. Using it as a feature is called data leakage: the model gets the answer as input. In production, you wouldn't know the action before the model decides.
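
One simple way to guard against this kind of leakage is to separate the label from the features before any model code sees the record. A minimal sketch, assuming each log record is a dictionary keyed by column name; `split_features_and_label` is a hypothetical helper name.

```python
def split_features_and_label(record, label_column="action"):
    """Return (features, label) with the label column removed from the features."""
    features = {key: value for key, value in record.items() if key != label_column}
    return features, record[label_column]


record = {"dst_port": 443, "protocol": "TCP", "bytes_sent": 3421, "action": "ALLOW"}
features, label = split_features_and_label(record)
print(features)
print(label)
```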

Cybersecurity tie-in: A transformation plan is like an incident response playbook — you decide before the data arrives how each field will be handled. In production ML pipelines, this plan becomes a preprocessing module that runs on every new batch of logs.
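
To make the idea of a preprocessing module concrete, here is a small sketch that applies part of the plan to a batch of records: numeric columns pass through, dst_port is mapped to a risk score, protocol is one-hot encoded, and action is left out. Everything named here is an assumption for illustration. In particular, the `PORT_RISK` values are invented placeholders, not real risk ratings, and the record layout (one dictionary per log line) is assumed.

```python
PROTOCOLS = ("TCP", "UDP", "ICMP")

# Hypothetical risk scores, for illustration only. A real table would come
# from your own threat model or security policy.
PORT_RISK = {22: 0.6, 23: 0.9, 443: 0.1, 3389: 0.8}
DEFAULT_PORT_RISK = 0.5  # assumed score for ports not in the table


def preprocess_record(record):
    """Build a feature dict for one log record; the action label is not copied."""
    features = {name: record[name] for name in ("bytes_sent", "bytes_recv", "packets")}
    features["port_risk_score"] = PORT_RISK.get(record["dst_port"], DEFAULT_PORT_RISK)
    for proto in PROTOCOLS:
        # One-hot encoding: exactly one of these is 1 for a known protocol.
        features[f"protocol_{proto}"] = int(record["protocol"].upper() == proto)
    return features


def preprocess_batch(records):
    """Apply the same plan to every record in a batch of logs."""
    return [preprocess_record(record) for record in records]


batch = [
    {"dst_port": 443, "protocol": "TCP", "bytes_sent": 3421,
     "bytes_recv": 15230, "packets": 42, "action": "ALLOW"},
]
print(preprocess_batch(batch))
```

Because the same function runs on every batch, training data and production data are guaranteed to go through identical handling, which is the point of deciding the plan in advance.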
