Step 3: Build & Train a CNN

Full CNN on MNIST digit classification


Complete CNN architecture

The standard pattern for image classification stacks Conv+Pool blocks (feature extraction) followed by Dense layers (classification):

| Layer | Output shape | Role |
| --- | --- | --- |
| Input | (28, 28, 1) | Greyscale image |
| Conv2D(32, (3,3), relu) | (26, 26, 32) | Detect low-level features: edges, corners |
| MaxPooling2D(2,2) | (13, 13, 32) | Downsample, retain strongest activations |
| Conv2D(64, (3,3), relu) | (11, 11, 64) | Combine features: curves, digit parts |
| MaxPooling2D(2,2) | (5, 5, 64) | Further downsample |
| Flatten() | (1600,) | Convert 2D maps to 1D for Dense layers |
| Dense(128, relu) | (128,) | Classification reasoning |
| Dense(10, softmax) | (10,) | One probability per digit class (0-9) |
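The output shapes in the table follow from simple arithmetic: a "valid" 3x3 convolution shrinks each spatial dimension by 2, and a non-overlapping 2x2 max-pool halves it (rounding down). A quick sketch of that bookkeeping in plain Python, no framework needed:

```python
def conv_out(n, k):
    # 'valid' convolution: output size = n - k + 1
    return n - k + 1

def pool_out(n, k):
    # non-overlapping pooling: floor(n / k)
    return n // k

n = 28               # input is 28x28
n = conv_out(n, 3)   # Conv2D(32, (3,3)) -> 26
n = pool_out(n, 2)   # MaxPooling2D(2,2) -> 13
n = conv_out(n, 3)   # Conv2D(64, (3,3)) -> 11
n = pool_out(n, 2)   # MaxPooling2D(2,2) -> 5
flat = n * n * 64    # Flatten -> 5 * 5 * 64 = 1600
print(n, flat)       # 5 1600
```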

Building the CNN in Keras

from tensorflow import keras

model = keras.Sequential([
    keras.layers.Conv2D(32, (3, 3), activation='relu',
                        input_shape=(28, 28, 1)),
    keras.layers.MaxPooling2D((2, 2)),

    keras.layers.Conv2D(64, (3, 3), activation='relu'),
    keras.layers.MaxPooling2D((2, 2)),

    keras.layers.Flatten(),
    keras.layers.Dense(128, activation='relu'),
    keras.layers.Dense(10, activation='softmax'),
])

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

model.summary()  # check the parameter counts per layer
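To actually train and evaluate the model, load MNIST, scale the pixels, and call fit. This is a sketch with illustrative defaults (5 epochs, batch size 64, a 10% validation split are not values prescribed by this course); the model is rebuilt here so the snippet runs standalone:

```python
import numpy as np
from tensorflow import keras

# Same architecture as above, duplicated so this snippet is self-contained
model = keras.Sequential([
    keras.layers.Conv2D(32, (3, 3), activation='relu',
                        input_shape=(28, 28, 1)),
    keras.layers.MaxPooling2D((2, 2)),
    keras.layers.Conv2D(64, (3, 3), activation='relu'),
    keras.layers.MaxPooling2D((2, 2)),
    keras.layers.Flatten(),
    keras.layers.Dense(128, activation='relu'),
    keras.layers.Dense(10, activation='softmax'),
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Load MNIST, scale pixels to [0, 1], and add the channel dimension
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train = x_train.astype("float32")[..., np.newaxis] / 255.0
x_test = x_test.astype("float32")[..., np.newaxis] / 255.0

model.fit(x_train, y_train,
          epochs=5,            # illustrative; tune as needed
          batch_size=64,
          validation_split=0.1)

test_loss, test_acc = model.evaluate(x_test, y_test, verbose=0)
print(f"Test accuracy: {test_acc:.4f}")
```

Note that `sparse_categorical_crossentropy` expects integer labels (0-9), which is exactly what `mnist.load_data()` returns, so no one-hot encoding is needed.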

How Conv layers stack: the feature hierarchy

Each successive Conv layer detects higher-level patterns by combining the features from the layer below:

| Layer depth | Detects | Receptive field |
| --- | --- | --- |
| Conv layer 1 | Edges, lines, simple textures | 3x3 pixels (local) |
| Conv layer 2 | Corners, curves, shapes | ~8x8 pixels (combining layer-1 patterns through pooling) |
| Conv layer 3+ | Object parts, complex structures | Increasingly larger regions |
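The receptive-field sizes above come from a standard recurrence: each layer widens the field by (kernel_size - 1) times the current "jump" between adjacent taps, and a strided layer multiplies that jump by its stride. A minimal sketch applied to the conv-pool-conv stack of this model:

```python
def receptive_field(layers):
    """layers: list of (kernel_size, stride) tuples, input-first order."""
    rf, jump = 1, 1
    for k, s in layers:
        rf += (k - 1) * jump  # each extra tap widens the field by `jump` pixels
        jump *= s             # stride spreads out the taps of later layers
    return rf

print(receptive_field([(3, 1)]))                  # first conv sees 3x3
print(receptive_field([(3, 1), (2, 2), (3, 1)]))  # conv -> pool -> conv sees 8x8
```

This is why pooling matters for the hierarchy: without the stride-2 pool in between, two stacked 3x3 convs would only see 5x5.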

CNN vs Dense baseline on MNIST

Training both on the same data with similar training time:

| Model | Parameters | Test accuracy | Spatial awareness |
| --- | --- | --- | --- |
| Dense (784 -> 128 -> 10) | ~101,770 | ~97.8% | None |
| CNN (Conv32 -> Conv64 -> Dense128) | ~225,034 | ~99.2% | Full |

The CNN achieves higher accuracy because weight sharing and spatial awareness let it learn more efficiently from image data. Note where its parameters live: the two conv layers that do the feature extraction use fewer than 19,000 between them, while the Flatten -> Dense(128) connection alone accounts for 204,928. The convolutional feature extractor is extremely parameter-efficient; the dense classification head is what dominates the count.
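The parameter counts follow directly from the layer shapes (and can be checked against `model.summary()`). Here is the arithmetic spelled out in plain Python:

```python
def conv_params(kernel_h, kernel_w, in_ch, out_ch):
    # one (kernel_h x kernel_w x in_ch) filter per output channel, plus a bias each
    return (kernel_h * kernel_w * in_ch + 1) * out_ch

def dense_params(n_in, n_out):
    # full weight matrix plus one bias per output unit
    return (n_in + 1) * n_out

dense_baseline = dense_params(784, 128) + dense_params(128, 10)

cnn = (conv_params(3, 3, 1, 32)          # 320
       + conv_params(3, 3, 32, 64)       # 18,496
       + dense_params(5 * 5 * 64, 128)   # 204,928
       + dense_params(128, 10))          # 1,290

print(dense_baseline)  # 101770
print(cnn)             # 225034
```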


Think Deeper

Your CNN hits 99.2% on MNIST but your Dense baseline hits 97.8%. Is the 1.4% gap worth the added complexity? When would it be?

For MNIST digits, probably not: 97.8% is already excellent. But in security, that gap can matter enormously at scale. If your NIDS processes 1 million events per day, a 1.4-percentage-point error reduction is roughly 14,000 additional events classified correctly every day. The gap is worth the complexity whenever the cost of each misclassification is high.
Cybersecurity tie-in: The same Conv-Pool-Dense architecture used for MNIST applies directly to malware image classification. Convert binary files to greyscale images, then train a CNN to distinguish malware families. The Conv layers learn structural patterns (packed sections, code caves, import tables) that are visually consistent within a family even when the binary is slightly modified.
