## Complete CNN architecture
The standard pattern for image classification stacks Conv+Pool blocks (feature extraction) followed by Dense layers (classification):
| Layer | Output shape | Role |
|---|---|---|
| Input | (28, 28, 1) | Greyscale image |
| Conv2D(32, (3,3), relu) | (26, 26, 32) | Detect low-level features: edges, corners |
| MaxPooling2D(2,2) | (13, 13, 32) | Downsample, retain strongest activations |
| Conv2D(64, (3,3), relu) | (11, 11, 64) | Combine features: curves, digit parts |
| MaxPooling2D(2,2) | (5, 5, 64) | Further downsample |
| Flatten() | (1600,) | Convert 2D maps to 1D for Dense layers |
| Dense(128, relu) | (128,) | Classification reasoning |
| Dense(10, softmax) | (10,) | One probability per digit class (0-9) |
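The output shapes in the table can be checked with a few lines of arithmetic. This is a quick sketch in plain Python (no framework needed) that walks a 28x28 input through the same stack:

```python
def conv_out(n: int, k: int) -> int:
    """Spatial size after a 'valid' (no padding) convolution with a k x k kernel."""
    return n - k + 1

def pool_out(n: int, p: int) -> int:
    """Spatial size after non-overlapping p x p max pooling (floor division)."""
    return n // p

side = 28                  # greyscale MNIST input: 28 x 28 x 1
side = conv_out(side, 3)   # Conv2D(32, (3,3))  -> 26 x 26 x 32
side = pool_out(side, 2)   # MaxPooling2D(2,2)  -> 13 x 13 x 32
side = conv_out(side, 3)   # Conv2D(64, (3,3))  -> 11 x 11 x 64
side = pool_out(side, 2)   # MaxPooling2D(2,2)  ->  5 x  5 x 64
flat = side * side * 64    # Flatten()          -> 1600
print(side, flat)          # 5 1600
```

The 11 -> 5 step shows why pooling uses floor division: `MaxPooling2D(2,2)` simply drops the odd trailing row and column.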
### Building the CNN in Keras
```python
from tensorflow import keras

model = keras.Sequential([
    keras.layers.Conv2D(32, (3, 3), activation='relu',
                        input_shape=(28, 28, 1)),
    keras.layers.MaxPooling2D((2, 2)),
    keras.layers.Conv2D(64, (3, 3), activation='relu'),
    keras.layers.MaxPooling2D((2, 2)),
    keras.layers.Flatten(),
    keras.layers.Dense(128, activation='relu'),
    keras.layers.Dense(10, activation='softmax'),
])

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

model.summary()  # check the parameter counts per layer
```
### How Conv layers stack: the feature hierarchy
Each successive Conv layer detects higher-level patterns by combining the features from the layer below:
| Layer depth | Detects | Receptive field |
|---|---|---|
| Conv layer 1 | Edges, lines, simple textures | 3x3 pixels (local) |
| Conv layer 2 | Corners, curves, shapes | ~8x8 pixels (layer 1 features, pooled and recombined) |
| Conv layer 3+ | Object parts, complex structures | Increasingly larger regions |
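The receptive-field figures above follow from a standard recursion: each layer widens the field by (kernel - 1) times the product of all strides below it. A minimal sketch for this document's Conv(3x3)/Pool(2x2) stack (the layer list is just the architecture above, not library code):

```python
# (kernel, stride) for Conv2D(3x3, stride 1) and MaxPooling2D(2x2, stride 2)
layers = [(3, 1), (2, 2), (3, 1), (2, 2)]

rf, jump = 1, 1           # receptive field (in input pixels) and stride product
fields = []
for k, s in layers:
    rf += (k - 1) * jump  # each layer widens the field by (k-1) input-pixel steps
    jump *= s             # each stride-2 pool doubles the spacing between samples
    fields.append(rf)

print(fields)             # [3, 4, 8, 10]
```

So units in the first Conv layer see 3x3 input patches, while units in the second Conv layer (after pooling) see 8x8 patches; a third Conv layer would see 18x18, and so on.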
### CNN vs Dense baseline on MNIST
Training both on the same data with similar training time:
| Model | Parameters | Test accuracy | Spatial awareness |
|---|---|---|---|
| Dense (784 -> 128 -> 10) | ~101,770 | ~97.8% | None |
| CNN (Conv32 -> Conv64 -> Dense128) | ~225,034 | ~99.2% | Full |
The CNN achieves higher accuracy because weight sharing and spatial awareness make its feature extractor remarkably cheap: the two Conv layers contribute only ~18,816 parameters between them. Note that the CNN's total is actually larger than the Dense baseline's, with almost all of it in the Flatten -> Dense(128) transition (~204,928 parameters); the accuracy gain comes from better features, not from extra capacity in the convolutional layers.
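Both parameter totals can be verified by hand with the standard per-layer formulas (weights plus one bias per output unit or filter):

```python
def conv_params(k: int, c_in: int, c_out: int) -> int:
    """k x k kernel over c_in input channels, c_out filters, one bias per filter."""
    return k * k * c_in * c_out + c_out

def dense_params(n_in: int, n_out: int) -> int:
    """Fully connected layer: one weight per input-output pair, one bias per output."""
    return n_in * n_out + n_out

dense_model = dense_params(784, 128) + dense_params(128, 10)
cnn_model = (conv_params(3, 1, 32)            # 320
             + conv_params(3, 32, 64)         # 18,496
             + dense_params(5 * 5 * 64, 128)  # 204,928 (the bulk of the model)
             + dense_params(128, 10))         # 1,290

print(dense_model, cnn_model)                 # 101770 225034
```

These are exactly the per-layer figures `model.summary()` reports for the Keras model built above.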
### Think Deeper
Try this:
Your CNN hits 99.2% on MNIST but your Dense baseline hits 97.8%. Is the 1.4% gap worth the added complexity? When would it be?
For MNIST digits, probably not: 97.8% is already excellent. In security, though, that gap compounds at scale. If your NIDS scores 1 million events/day, a 1.4% accuracy gap is roughly 14,000 additional misclassifications per day, and every one of those that is a missed threat (a false negative) is costly. The gap matters when the cost of misclassification is high.
Cybersecurity tie-in: The same Conv-Pool-Dense architecture used for MNIST applies directly to malware image classification. Convert binary files to greyscale images, then train a CNN to distinguish malware families. The Conv layers learn structural patterns (packed sections, code caves, import tables) that are visually consistent within a family even when the binary is slightly modified.
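As an illustration of that conversion step, here is a minimal sketch (the function name `bytes_to_greyscale` and the fixed row width are assumptions for this example, not any specific tool's API): each byte of the binary becomes one 0-255 pixel, rows fill left to right, and the tail is zero-padded to a rectangle.

```python
import numpy as np

def bytes_to_greyscale(data: bytes, width: int = 64) -> np.ndarray:
    """Illustrative sketch: map raw bytes to a 2D uint8 image, one byte per pixel."""
    buf = np.frombuffer(data, dtype=np.uint8)
    rows = -(-len(buf) // width)               # ceiling division
    padded = np.zeros(rows * width, dtype=np.uint8)
    padded[:len(buf)] = buf                    # zero-pad the final partial row
    return padded.reshape(rows, width)

img = bytes_to_greyscale(bytes(range(256)), width=16)
print(img.shape)  # (16, 16), pixel values running 0..255 across the image
```

The resulting array can then be resized to a fixed input shape (e.g. 64x64 greyscale) and fed to the same Conv-Pool-Dense stack shown above, with one output unit per malware family instead of per digit.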