Step 2: Add Dropout

Randomly silence neurons to regularise


What Dropout does

During each training step, Dropout(rate) randomly sets a fraction of neuron outputs to zero. With rate=0.3, 30% of neurons are silenced each batch -- different neurons each time. This forces the network to build redundant representations instead of relying on any single neuron.

| Property | Detail |
| --- | --- |
| Training behaviour | Randomly zeroes out a `rate` fraction of outputs per batch |
| Inference behaviour | All neurons active (no dropout) |
| Output scaling | Remaining outputs are scaled by 1/(1 - rate) during training ("inverted dropout"), preserving the expected magnitude |
| Typical values | 0.2 -- 0.5 (higher = stronger regularisation) |
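The two training-time behaviours in the table (random masking plus 1/(1 - rate) rescaling) can be sketched in a few lines of NumPy. This is an illustrative stand-in, not Keras's actual implementation; the function name `inverted_dropout` is hypothetical.

```python
import numpy as np

def inverted_dropout(x, rate, rng):
    """Minimal sketch of inverted dropout (training-time behaviour)."""
    keep_prob = 1.0 - rate
    # Bernoulli mask: keep each element with probability keep_prob
    mask = rng.random(x.shape) < keep_prob
    # Scale survivors by 1/keep_prob so the expected output equals x
    return x * mask / keep_prob

rng = np.random.default_rng(0)
x = np.ones(100_000)
out = inverted_dropout(x, rate=0.3, rng=rng)

print(np.mean(out == 0))  # ~0.3 of outputs silenced
print(out.mean())         # ~1.0: expected magnitude preserved
```

Because of the rescaling, the layer's expected output is unchanged, which is why no correction is needed at inference time.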

Dropout as implicit ensemble

Each training step with dropout uses a different random subset of neurons -- effectively training a different sub-network each time. Over thousands of steps, the model trains an exponential number of overlapping sub-networks. At inference time, using all neurons approximates the ensemble average of all these sub-networks.

| Concept | Without Dropout | With Dropout(0.3) |
| --- | --- | --- |
| Active neurons | All neurons every step | ~70% random subset each step |
| Co-adaptation | Neurons can become co-dependent | Each neuron must be independently useful |
| Effective models trained | 1 model | Exponentially many sub-networks |
| Overfitting risk | High with excess capacity | Reduced significantly |
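The "ensemble average" claim can be checked numerically for a single linear layer. The sketch below (a hypothetical weight matrix, not from the lab) averages the outputs of many random sub-networks and compares them to the full network's output.

```python
import numpy as np

rng = np.random.default_rng(42)
W = rng.normal(size=(10, 4))   # hypothetical dense-layer weights
x = rng.normal(size=10)        # a single input vector
rate = 0.3

full = x @ W                   # inference: all neurons active, no dropout

# Average the outputs of many random sub-networks (inverted dropout on x)
samples = []
for _ in range(20_000):
    mask = rng.random(10) < (1 - rate)
    samples.append((x * mask / (1 - rate)) @ W)
ensemble_avg = np.mean(samples, axis=0)

print(np.max(np.abs(ensemble_avg - full)))  # small: ensemble average ~ full network
```

For a purely linear map the match is exact in expectation; once nonlinearities like ReLU sit between layers, "all neurons at inference approximates the ensemble" is a well-motivated heuristic rather than an identity.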

Adding Dropout to the overfit model

Place a Dropout layer after each hidden Dense layer. The model architecture stays the same width, but dropout prevents it from memorising.

from tensorflow import keras

model = keras.Sequential([
    keras.Input(shape=(10,)),
    keras.layers.Dense(256, activation='relu'),
    keras.layers.Dropout(0.3),       # drop 30% after first hidden layer
    keras.layers.Dense(256, activation='relu'),
    keras.layers.Dropout(0.3),       # drop 30% after second hidden layer
    keras.layers.Dense(256, activation='relu'),
    keras.layers.Dropout(0.3),       # drop 30% after third hidden layer
    keras.layers.Dense(1, activation='sigmoid'),
])

model.compile(optimizer='adam', loss='binary_crossentropy',
              metrics=['accuracy'])
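To see dropout in a full training loop, here is a smaller end-to-end sketch. The synthetic data and the slimmed-down one-hidden-layer architecture are stand-ins for the lab's dataset and model; it assumes TensorFlow/Keras is installed.

```python
import numpy as np
from tensorflow import keras

# Hypothetical synthetic data standing in for the lab's dataset
rng = np.random.default_rng(0)
X = rng.random((600, 10)).astype("float32")
y = (X.sum(axis=1) > 5).astype("float32")  # simple separable label rule

model = keras.Sequential([
    keras.Input(shape=(10,)),
    keras.layers.Dense(256, activation="relu"),
    keras.layers.Dropout(0.3),   # drop 30% during training only
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])

history = model.fit(X, y, epochs=10, batch_size=32,
                    validation_split=0.2, verbose=0)
print(history.history["loss"][-1])
```

Watching the gap between training and validation loss in `history.history` is how the lab's overfitting comparison is typically made.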

Comparing dropout rates

Different rates trade off regularisation strength against model capacity:

| Dropout rate | Effect | When to use |
| --- | --- | --- |
| 0.1 | Light regularisation | Small models, large datasets |
| 0.3 | Standard regularisation | Good default starting point |
| 0.5 | Strong regularisation | Very large models, limited data |
| 0.8 | Aggressive -- may underfit | Rarely used; only for extreme overfitting |

Think Deeper

A SOC deploys a model with Dropout(0.5). During inference on live traffic, are neurons still being dropped? What would happen if they were?

No. Keras disables dropout automatically at inference (the layer receives `training=False` and becomes a no-op). If neurons were still dropped at inference, predictions would be stochastic and inconsistent: the same packet could be classified as malicious one second and benign the next, which is unacceptable for production alerting.
Cybersecurity tie-in: Dropout acts like training a security team where random analysts sit out each exercise -- everyone must learn every role. In ML terms, no single neuron becomes a "single point of failure" for detecting a specific attack type. If one neuron pathway is missing, others compensate. This makes the model more robust to adversarial evasion attempts that target specific features.
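The training/inference distinction discussed above can be verified directly by calling a `Dropout` layer with the `training` flag set explicitly (assumes TensorFlow/Keras is installed):

```python
import numpy as np
from tensorflow import keras

layer = keras.layers.Dropout(0.5)
x = np.ones((1, 8), dtype="float32")

inference_out = np.array(layer(x, training=False))  # what a deployed model sees
training_out = np.array(layer(x, training=True))

print(inference_out)  # all ones: dropout is a no-op at inference
print(training_out)   # each entry is 0.0 or 2.0 (survivors scaled by 1/0.5)
```

Keras sets `training=False` for you inside `model.predict`, which is why the deployed SOC model gives deterministic outputs.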
