What Dropout does
During each forward pass in training, Dropout(rate) randomly sets a fraction of neuron outputs to zero. With rate=0.3, roughly 30% of the outputs are silenced, and a fresh random mask is drawn every time -- different neurons each pass. This forces the network to build redundant representations instead of relying on any single neuron.
| Property | Detail |
|---|---|
| Training behaviour | Randomly zero a `rate` fraction of outputs on each forward pass |
| Inference behaviour | All neurons active (no dropout) |
| Output scaling | Surviving outputs scaled by 1/(1 - rate) during training (inverted dropout), so no rescaling is needed at inference |
| Typical values | 0.2 -- 0.5 (higher = stronger regularisation) |
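The training-time behaviour in the table can be sketched in a few lines of NumPy. This is a minimal illustration of inverted dropout, not the Keras implementation -- the function and variable names are made up for the demo:

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(x, rate, training):
    """Inverted dropout: zero a `rate` fraction of outputs during
    training and scale survivors by 1/(1 - rate); at inference,
    pass everything through unchanged."""
    if not training:
        return x                           # inference: all neurons active
    mask = rng.random(x.shape) >= rate     # keep each output with prob 1 - rate
    return x * mask / (1.0 - rate)         # rescale to preserve expected magnitude

x = np.ones(100_000)
y = dropout(x, rate=0.3, training=True)

print((y == 0).mean())   # close to 0.3 -- the dropped fraction
print(y.mean())          # close to 1.0 -- expected magnitude preserved
```

The rescaling is why training and inference outputs have the same expected magnitude even though dropout is only active during training.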
Dropout as implicit ensemble
Each training step with dropout uses a different random subset of neurons -- effectively training a different sub-network each time. Over thousands of steps, the model trains an exponential number of overlapping sub-networks. At inference time, using all neurons approximates the ensemble average of all these sub-networks.
| Concept | Without Dropout | With Dropout(0.3) |
|---|---|---|
| Active neurons | All neurons every step | ~70% random subset each step |
| Co-adaptation | Neurons can become co-dependent | Each neuron must be independently useful |
| Effective models trained | 1 model | Exponentially many sub-networks |
| Overfitting risk | High with excess capacity | Reduced significantly |
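For a single linear layer, the ensemble-average claim can be checked numerically: averaging the outputs of many random dropout sub-networks converges to the full-network output. This is a toy sketch with made-up weights; for nonlinear networks the full-network pass is only an approximation of the ensemble average, not an exact equality:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy linear output layer: y = w . h, where h is a hidden activation vector.
h = rng.random(256)   # hidden activations
w = rng.random(256)   # output weights
rate = 0.3

full = w @ h          # inference: all neurons active, no dropout

# Average the outputs of many random sub-networks (inverted dropout on h).
samples = []
for _ in range(5000):
    mask = rng.random(h.size) >= rate
    samples.append(w @ (h * mask / (1 - rate)))
mc = np.mean(samples)

print(full, mc)   # the Monte Carlo ensemble average lands close to `full`
```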
Adding Dropout to the overfit model
Place a Dropout layer after each hidden Dense layer. The model architecture stays the same width, but dropout prevents it from memorising.
```python
from tensorflow import keras

model = keras.Sequential([
    keras.layers.Dense(256, activation='relu', input_shape=(10,)),
    keras.layers.Dropout(0.3),   # drop 30% after first hidden layer
    keras.layers.Dense(256, activation='relu'),
    keras.layers.Dropout(0.3),   # drop 30% after second hidden layer
    keras.layers.Dense(256, activation='relu'),
    keras.layers.Dropout(0.3),   # drop 30% after third hidden layer
    keras.layers.Dense(1, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='binary_crossentropy',
              metrics=['accuracy'])
```
Comparing dropout rates
Different rates trade off regularisation strength against model capacity:
| Dropout rate | Effect | When to use |
|---|---|---|
| 0.1 | Light regularisation | Small models, large datasets |
| 0.3 | Standard regularisation | Good default starting point |
| 0.5 | Strong regularisation | Very large models, limited data |
| 0.8 | Aggressive -- may underfit | Rarely used; only for extreme overfitting |
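One reason very high rates are rarely used: the inverted-dropout scale factor 1/(1 - rate) grows quickly with the rate, so at 0.8 each surviving activation is amplified five-fold:

```python
# Scale factor applied to surviving outputs for each rate in the table above.
for rate in (0.1, 0.3, 0.5, 0.8):
    print(f"rate={rate}: survivors scaled by {1 / (1 - rate):.2f}x")
```

This prints factors of roughly 1.11x, 1.43x, 2.00x, and 5.00x, so at rate 0.8 the network sees only 20% of its neurons, each heavily amplified -- a very noisy training signal that easily underfits.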
Think Deeper
A SOC deploys a model with Dropout(0.5). During inference on live traffic, are neurons still being dropped? What would happen if they were?
No. Keras applies dropout only during training (controlled by the layer's `training` flag); `model.predict()` runs with training=False, so all neurons stay active. If neurons were still dropped during inference, predictions would be random and inconsistent -- the same packet could be classified as malicious one second and benign the next. That is unacceptable for production alerting.