## Complete CNN architecture
The standard pattern for image classification stacks Conv+Pool blocks (feature extraction) followed by Dense layers (classification):
| Layer | Output shape | Role |
|---|---|---|
| Input | (28, 28, 1) | Greyscale image |
| Conv2D(32, (3,3), relu) | (26, 26, 32) | Detect low-level features: edges, corners |
| MaxPooling2D(2,2) | (13, 13, 32) | Downsample, retain strongest activations |
| Conv2D(64, (3,3), relu) | (11, 11, 64) | Combine features: curves, digit parts |
| MaxPooling2D(2,2) | (5, 5, 64) | Further downsample |
| Flatten() | (1600,) | Convert 2D maps to 1D for Dense layers |
| Dense(128, relu) | (128,) | Classification reasoning |
| Dense(10, softmax) | (10,) | One probability per digit class (0-9) |
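The output shapes in the table can be checked with a few lines of arithmetic. This is a quick sketch in plain Python (no framework needed) that walks a 28x28 input through the same stack:

```python
def conv_out(n: int, k: int) -> int:
    """Spatial size after a 'valid' (no padding) convolution with a k x k kernel."""
    return n - k + 1

def pool_out(n: int, p: int) -> int:
    """Spatial size after non-overlapping p x p max pooling (floor division)."""
    return n // p

side = 28                  # greyscale MNIST input: 28 x 28 x 1
side = conv_out(side, 3)   # Conv2D(32, (3,3))  -> 26 x 26 x 32
side = pool_out(side, 2)   # MaxPooling2D(2,2)  -> 13 x 13 x 32
side = conv_out(side, 3)   # Conv2D(64, (3,3))  -> 11 x 11 x 64
side = pool_out(side, 2)   # MaxPooling2D(2,2)  ->  5 x  5 x 64
flat = side * side * 64    # Flatten()          -> 1600
print(side, flat)          # 5 1600
```

The 11 -> 5 step shows why pooling uses floor division: `MaxPooling2D(2,2)` simply drops the odd trailing row and column.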
### Building the CNN in Keras
```python
from tensorflow import keras

model = keras.Sequential([
    keras.layers.Conv2D(32, (3, 3), activation='relu',
                        input_shape=(28, 28, 1)),
    keras.layers.MaxPooling2D((2, 2)),
    keras.layers.Conv2D(64, (3, 3), activation='relu'),
    keras.layers.MaxPooling2D((2, 2)),
    keras.layers.Flatten(),
    keras.layers.Dense(128, activation='relu'),
    keras.layers.Dense(10, activation='softmax'),
])

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

model.summary()  # check the parameter counts per layer
```
### How Conv layers stack: the feature hierarchy
Each successive Conv layer detects higher-level patterns by combining the features from the layer below:
| Layer depth | Detects | Receptive field |
|---|---|---|
| Conv layer 1 | Edges, lines, simple textures | 3x3 pixels (local) |
| Conv layer 2 | Corners, curves, shapes | ~8x8 pixels (layer 1 features, pooled and recombined) |
| Conv layer 3+ | Object parts, complex structures | Increasingly larger regions |
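The receptive-field figures above follow from a standard recursion: each layer widens the field by (kernel - 1) times the product of all strides below it. A minimal sketch for this document's Conv(3x3)/Pool(2x2) stack (the layer list is just the architecture above, not library code):

```python
# (kernel, stride) for Conv2D(3x3, stride 1) and MaxPooling2D(2x2, stride 2)
layers = [(3, 1), (2, 2), (3, 1), (2, 2)]

rf, jump = 1, 1           # receptive field (in input pixels) and stride product
fields = []
for k, s in layers:
    rf += (k - 1) * jump  # each layer widens the field by (k-1) input-pixel steps
    jump *= s             # each stride-2 pool doubles the spacing between samples
    fields.append(rf)

print(fields)             # [3, 4, 8, 10]
```

So units in the first Conv layer see 3x3 input patches, while units in the second Conv layer (after pooling) see 8x8 patches; a third Conv layer would see 18x18, and so on.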
### CNN vs Dense baseline on MNIST
Training both on the same data with similar training time:
| Model | Parameters | Test accuracy | Spatial awareness |
|---|---|---|---|
| Dense (784 -> 128 -> 10) | ~101,770 | ~97.8% | None |
| CNN (Conv32 -> Conv64 -> Dense128) | ~225,034 | ~99.2% | Full |
The CNN achieves higher accuracy because weight sharing and spatial awareness make its feature extractor remarkably cheap: the two Conv layers contribute only ~18,816 parameters between them. Note that the CNN's total is actually larger than the Dense baseline's, with almost all of it in the Flatten -> Dense(128) transition (~204,928 parameters); the accuracy gain comes from better features, not from extra capacity in the convolutional layers.
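Both parameter totals can be verified by hand with the standard per-layer formulas (weights plus one bias per output unit or filter):

```python
def conv_params(k: int, c_in: int, c_out: int) -> int:
    """k x k kernel over c_in input channels, c_out filters, one bias per filter."""
    return k * k * c_in * c_out + c_out

def dense_params(n_in: int, n_out: int) -> int:
    """Fully connected layer: one weight per input-output pair, one bias per output."""
    return n_in * n_out + n_out

dense_model = dense_params(784, 128) + dense_params(128, 10)
cnn_model = (conv_params(3, 1, 32)            # 320
             + conv_params(3, 32, 64)         # 18,496
             + dense_params(5 * 5 * 64, 128)  # 204,928 (the bulk of the model)
             + dense_params(128, 10))         # 1,290

print(dense_model, cnn_model)                 # 101770 225034
```

These are exactly the per-layer figures `model.summary()` reports for the Keras model built above.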
### Think Deeper
Try this:
Your CNN hits 99.2% on MNIST but your Dense baseline hits 97.8%. Is the 1.4% gap worth the added complexity? When would it be?
For MNIST digits, probably not: 97.8% is already excellent. In security, though, that gap compounds at scale. If your NIDS scores 1 million events/day, a 1.4% accuracy gap is roughly 14,000 additional misclassifications per day, and every one of those that is a missed threat (a false negative) is costly. The gap matters when the cost of misclassification is high.
Cybersecurity tie-in: The same Conv-Pool-Dense architecture used for MNIST applies directly to malware image classification. Convert binary files to greyscale images, then train a CNN to distinguish malware families. The Conv layers learn structural patterns (packed sections, code caves, import tables) that are visually consistent within a family even when the binary is slightly modified.
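As an illustration of that conversion step, here is a minimal sketch (the function name `bytes_to_greyscale` and the fixed row width are assumptions for this example, not any specific tool's API): each byte of the binary becomes one 0-255 pixel, rows fill left to right, and the tail is zero-padded to a rectangle.

```python
import numpy as np

def bytes_to_greyscale(data: bytes, width: int = 64) -> np.ndarray:
    """Illustrative sketch: map raw bytes to a 2D uint8 image, one byte per pixel."""
    buf = np.frombuffer(data, dtype=np.uint8)
    rows = -(-len(buf) // width)               # ceiling division
    padded = np.zeros(rows * width, dtype=np.uint8)
    padded[:len(buf)] = buf                    # zero-pad the final partial row
    return padded.reshape(rows, width)

img = bytes_to_greyscale(bytes(range(256)), width=16)
print(img.shape)  # (16, 16), pixel values running 0..255 across the image
```

The resulting array can then be resized to a fixed input shape (e.g. 64x64 greyscale) and fed to the same Conv-Pool-Dense stack shown above, with one output unit per malware family instead of per digit.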