Step 1: What Are Hyperparameters?

Parameters vs hyperparameters


These two categories look similar but are fundamentally different in how they are determined:

| Property | Parameters (weights) | Hyperparameters |
|---|---|---|
| Set by | Training (gradient descent) | You, before training starts |
| Stored in | `layer.weights` | Code / config file |
| Initial values | Small random numbers | Your deliberate choice |
| Updated during training | Yes -- every batch | No -- fixed for the entire run |
| Examples | Dense layer weights, biases | Learning rate, batch size, number of layers |
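The "stored in" row is easy to verify in code. In the sketch below, the values 32, `relu`, and `sigmoid` are hyperparameters we chose by hand, while the counted parameters live inside the layers and will be updated by the optimiser (the layer sizes are illustrative):

```python
from tensorflow import keras

# Hyperparameters: 32 units, relu/sigmoid activations -- chosen by us
model = keras.Sequential([
    keras.layers.Dense(32, activation='relu', input_shape=(10,)),
    keras.layers.Dense(1, activation='sigmoid'),
])

# Parameters: stored inside the layers, learned by gradient descent
# (10*32 + 32) weights+biases in layer 1, (32*1 + 1) in layer 2 = 385
print(model.count_params())
```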

A hyperparameter inventory

Every Keras model has dozens of hyperparameters. Here are the most impactful ones, grouped by category:

| Category | Hyperparameter | Typical values |
|---|---|---|
| Architecture | Number of hidden layers | 1-5 |
| Architecture | Units per layer | 16, 32, 64, 128, 256 |
| Architecture | Activation function | `relu`, `tanh`, `leaky_relu` |
| Training | Learning rate | 0.0001, 0.001, 0.01 |
| Training | Batch size | 32, 64, 128, 256 |
| Training | Optimizer | `adam`, `sgd`, `rmsprop` |
| Regularisation | Dropout rate | 0.1 - 0.5 |
| Regularisation | Early stopping patience | 5 - 20 |
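One common pattern is to gather these choices into a single dict and build the model from it, so every hyperparameter in the table has exactly one home. The `build_model` function and config keys below are illustrative names, not a Keras API:

```python
from tensorflow import keras

# Every value here is a hyperparameter from the inventory above
config = {
    'n_layers': 2,
    'units': 64,
    'activation': 'relu',
    'dropout_rate': 0.3,
    'learning_rate': 0.001,
}

def build_model(cfg):
    model = keras.Sequential()
    model.add(keras.layers.Dense(cfg['units'], activation=cfg['activation'],
                                 input_shape=(10,)))
    model.add(keras.layers.Dropout(cfg['dropout_rate']))
    for _ in range(cfg['n_layers'] - 1):
        model.add(keras.layers.Dense(cfg['units'], activation=cfg['activation']))
        model.add(keras.layers.Dropout(cfg['dropout_rate']))
    model.add(keras.layers.Dense(1, activation='sigmoid'))
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=cfg['learning_rate']),
                  loss='binary_crossentropy')
    return model

model = build_model(config)
```

Changing an experiment now means editing one dict, which also makes the hyperparameters trivial to log.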

Inspecting weights before and after training

To see the distinction concretely, examine the actual weight values:

```python
from tensorflow import keras
import tensorflow as tf
import numpy as np

# Seed both RNGs so the initial weights are reproducible
# (see the "Why reproducibility matters" card below)
np.random.seed(42)
tf.random.set_seed(42)

model = keras.Sequential([
    keras.layers.Dense(64, activation='relu', input_shape=(10,)),
    keras.layers.Dense(1, activation='sigmoid'),
])
model.compile(optimizer=keras.optimizers.Adam(learning_rate=0.001),
              loss='binary_crossentropy')

# Synthetic data so the example runs end to end
X_train = np.random.rand(200, 10).astype('float32')
y_train = (np.random.rand(200) > 0.5).astype('float32')

# BEFORE training: weights are small random numbers
w_before = model.layers[0].get_weights()[0]
print(f"Before training - mean: {w_before.mean():.4f}, std: {w_before.std():.4f}")

# Train...
model.fit(X_train, y_train, epochs=20, batch_size=32, verbose=0)

# AFTER training: weights have shifted to encode learned patterns
w_after = model.layers[0].get_weights()[0]
print(f"After training  - mean: {w_after.mean():.4f}, std: {w_after.std():.4f}")
print(f"Weight change   - mean shift: {(w_after - w_before).mean():.4f}")
```
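The weights move during `fit()`; the hyperparameters do not. A quick sanity check makes the contrast explicit (tiny synthetic dataset, sizes are illustrative):

```python
import numpy as np
from tensorflow import keras

rng = np.random.default_rng(42)
X = rng.random((100, 10)).astype('float32')
y = (rng.random(100) > 0.5).astype('float32')

model = keras.Sequential([
    keras.layers.Dense(16, activation='relu', input_shape=(10,)),
    keras.layers.Dense(1, activation='sigmoid'),
])
model.compile(optimizer=keras.optimizers.Adam(learning_rate=0.001),
              loss='binary_crossentropy')

# The learning rate is read back before and after training
lr_before = float(model.optimizer.learning_rate.numpy())
model.fit(X, y, epochs=3, batch_size=32, verbose=0)
lr_after = float(model.optimizer.learning_rate.numpy())

# With a fixed (non-scheduled) learning rate, both values are 0.001:
# the hyperparameter stayed exactly where we set it
print(lr_before, lr_after)
```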

Why reproducibility matters

Random seeds control the initial parameter values, and hyperparameters must be recorded alongside the seed for a run to be reproducible:

Why two seeds? NumPy and TensorFlow have independent random number generators. Keras uses both: NumPy for things like train/test shuffling, TensorFlow for weight initialisation and dropout masks. You must seed each one separately to get a fully reproducible run.

```python
import numpy as np
import tensorflow as tf

# Set seeds for reproducible results
np.random.seed(42)        # seeds NumPy's RNG (used by sklearn, data shuffling, etc.)
tf.random.set_seed(42)    # seeds TensorFlow's RNG (used by weight init, dropout)

# Document your hyperparameters
config = {
    'units': 64,
    'learning_rate': 0.001,
    'batch_size': 32,
    'epochs': 50,
    'dropout_rate': 0.3,
    'random_seed': 42,
}
print("Experiment config:", config)
```
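A lightweight way to version-control that config is to write it to a JSON file next to the code and commit it with the experiment (the filename here is illustrative):

```python
import json

config = {
    'units': 64,
    'learning_rate': 0.001,
    'batch_size': 32,
    'epochs': 50,
    'dropout_rate': 0.3,
    'random_seed': 42,
}

# Write the config to disk so it can be committed alongside the code
with open('experiment_001.json', 'w') as f:
    json.dump(config, f, indent=2)

# Later, or on another machine, reload it to rebuild the exact same run
with open('experiment_001.json') as f:
    loaded = json.load(f)
print(loaded == config)  # True
```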

Think Deeper

Your colleague says 'the model learned a learning rate of 0.001.' Why is this statement wrong, and why does the distinction matter for reproducible security ML?

Learning rate is a hyperparameter — set by the engineer before training, not learned by gradient descent. The model learns weights and biases (parameters). This matters because hyperparameters must be documented and version-controlled for reproducibility. If a security model's detection rate drops after retraining, you need to know exactly which hyperparameters changed.
Cybersecurity tie-in: In a regulated security environment, you must be able to reproduce model training exactly. If a fraud-detection model flags a transaction and the decision is challenged, you need to show the exact hyperparameters, random seed, and training data that produced that model. Version-controlling hyperparameters is as important as version-controlling code.
