Parameters vs hyperparameters
These two categories look similar but are fundamentally different in how they are determined:
| Property | Parameters (weights) | Hyperparameters |
|---|---|---|
| Set by | Training (gradient descent) | You, before training starts |
| Stored in | layer.weights | Code / config file |
| Initial values | Small random numbers | Your deliberate choice |
| Updated during training | Yes -- every batch | No -- fixed for the entire run |
| Examples | Dense layer weights, biases | Learning rate, batch size, number of layers |
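The split is easy to make concrete: hyperparameters fix *how many* parameters exist, and training then fills in their values. A minimal sketch in plain Python, assuming a fully connected (Dense) layer with one weight per input-unit pair plus a bias per unit:

```python
def dense_param_count(n_inputs: int, n_units: int) -> int:
    """Trainable parameters in a Dense layer:
    a weight matrix of shape (n_inputs, n_units) plus n_units biases."""
    return n_inputs * n_units + n_units

# Two hyperparameter choices (10 input features, 64 units) determine
# the parameter count; gradient descent determines the parameter values.
print(dense_param_count(10, 64))  # 10*64 + 64 = 704
```

Changing a hyperparameter (say, 128 units instead of 64) changes the parameter count, which is one reason the two categories must never be conflated.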
A hyperparameter inventory
Every Keras model has dozens of hyperparameters. Here are the most impactful ones, grouped by category:
| Category | Hyperparameter | Typical values |
|---|---|---|
| Architecture | Number of hidden layers | 1-5 |
| Architecture | Units per layer | 16, 32, 64, 128, 256 |
| Architecture | Activation function | relu, tanh, leaky_relu |
| Training | Learning rate | 0.0001, 0.001, 0.01 |
| Training | Batch size | 32, 64, 128, 256 |
| Training | Optimizer | adam, sgd, rmsprop |
| Regularisation | Dropout rate | 0.1 - 0.5 |
| Regularisation | Early stopping patience | 5 - 20 |
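One consequence of this inventory is combinatorial explosion: even a few values per hyperparameter yields a large search space. A sketch using a hypothetical grid drawn from the table above (the specific values are illustrative, not a recommendation):

```python
from itertools import product

# Hypothetical search grid; each key is a hyperparameter from the table
grid = {
    'units': [16, 32, 64, 128, 256],
    'learning_rate': [0.0001, 0.001, 0.01],
    'batch_size': [32, 64, 128, 256],
    'dropout_rate': [0.1, 0.3, 0.5],
}

# Every combination of one value per hyperparameter is a candidate model
combos = list(product(*grid.values()))
print(f"{len(combos)} configurations to evaluate")  # 5 * 3 * 4 * 3 = 180
```

This is why practitioners rarely grid-search everything at once: learning rate and architecture usually matter most and are tuned first.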
Inspecting weights before and after training
To see the distinction concretely, examine the actual weight values:
```python
from tensorflow import keras
import numpy as np

np.random.seed(42)  # seeds the initial random weights (see "Why reproducibility matters" below)

model = keras.Sequential([
    keras.layers.Dense(64, activation='relu', input_shape=(10,)),
    keras.layers.Dense(1, activation='sigmoid'),
])
model.compile(optimizer=keras.optimizers.Adam(learning_rate=0.001),
              loss='binary_crossentropy')

# Synthetic data so the example runs end to end; substitute your own X_train / y_train
X_train = np.random.rand(500, 10)
y_train = np.random.randint(0, 2, size=500)

# BEFORE training: weights are small random numbers
w_before = model.layers[0].get_weights()[0]
print(f"Before training - mean: {w_before.mean():.4f}, std: {w_before.std():.4f}")

# Train...
model.fit(X_train, y_train, epochs=20, batch_size=32, verbose=0)

# AFTER training: weights have shifted to encode learned patterns
w_after = model.layers[0].get_weights()[0]
print(f"After training - mean: {w_after.mean():.4f}, std: {w_after.std():.4f}")
print(f"Weight change - mean shift: {(w_after - w_before).mean():.4f}")
```
Why reproducibility matters
Random seeds control the initial parameter values; recording both the seeds and the hyperparameters is what makes a training run reproducible:
Why two seeds? NumPy and TensorFlow have independent random number generators. Keras uses both: NumPy for things like train/test shuffling, TensorFlow for weight initialisation and dropout masks. You must seed each one separately to get a fully reproducible run.
```python
import numpy as np
import tensorflow as tf

# Set seeds for reproducible results
np.random.seed(42)       # seeds NumPy's RNG (used by sklearn, data shuffling, etc.)
tf.random.set_seed(42)   # seeds TensorFlow's RNG (used by weight init, dropout)

# Document your hyperparameters
config = {
    'units': 64,
    'learning_rate': 0.001,
    'batch_size': 32,
    'epochs': 50,
    'dropout_rate': 0.3,
    'random_seed': 42,
}
print("Experiment config:", config)
```
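To see what seeding actually buys you, here is a small NumPy-only sketch that simulates seeded weight initialisation (the `init_weights` helper is hypothetical, standing in for what a framework does internally):

```python
import numpy as np

def init_weights(seed: int, shape=(10, 64)) -> np.ndarray:
    """Simulate seeded weight initialisation:
    the same seed always produces the same 'random' draw."""
    rng = np.random.default_rng(seed)
    return rng.normal(loc=0.0, scale=0.05, size=shape)

a = init_weights(42)
b = init_weights(42)  # same seed: identical initial weights
c = init_weights(43)  # different seed: different initial weights

print(np.array_equal(a, b))  # True
print(np.array_equal(a, c))  # False
```

Identical seeds plus identical hyperparameters give identical starting points, which is why both belong in the experiment record.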
Think Deeper
Your colleague says 'the model learned a learning rate of 0.001.' Why is this statement wrong, and why does the distinction matter for reproducible security ML?