Parameters vs hyperparameters
These two categories look similar but are fundamentally different in how they are determined:
| Property | Parameters (weights) | Hyperparameters |
|---|---|---|
| Set by | Training (gradient descent) | You, before training starts |
| Stored in | layer.weights | Code / config file |
| Initial values | Small random numbers | Your deliberate choice |
| Updated during training | Yes -- every batch | No -- fixed for the entire run |
| Examples | Dense layer weights, biases | Learning rate, batch size, number of layers |
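The split is easy to make concrete: hyperparameters fix *how many* parameters exist, and training then fills in their values. A minimal sketch in plain Python, assuming a fully connected (Dense) layer with one weight per input-unit pair plus a bias per unit:

```python
def dense_param_count(n_inputs: int, n_units: int) -> int:
    """Trainable parameters in a Dense layer:
    a weight matrix of shape (n_inputs, n_units) plus n_units biases."""
    return n_inputs * n_units + n_units

# Two hyperparameter choices (10 input features, 64 units) determine
# the parameter count; gradient descent determines the parameter values.
print(dense_param_count(10, 64))  # 10*64 + 64 = 704
```

Changing a hyperparameter (say, 128 units instead of 64) changes the parameter count, which is one reason the two categories must never be conflated.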
A hyperparameter inventory
Every Keras model has dozens of hyperparameters. Here are the most impactful ones, grouped by category:
| Category | Hyperparameter | Typical values |
|---|---|---|
| Architecture | Number of hidden layers | 1-5 |
| Architecture | Units per layer | 16, 32, 64, 128, 256 |
| Architecture | Activation function | relu, tanh, leaky_relu |
| Training | Learning rate | 0.0001, 0.001, 0.01 |
| Training | Batch size | 32, 64, 128, 256 |
| Training | Optimizer | adam, sgd, rmsprop |
| Regularisation | Dropout rate | 0.1 - 0.5 |
| Regularisation | Early stopping patience | 5 - 20 |
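One consequence of this inventory is combinatorial explosion: even a few values per hyperparameter yields a large search space. A sketch using a hypothetical grid drawn from the table above (the specific values are illustrative, not a recommendation):

```python
from itertools import product

# Hypothetical search grid; each key is a hyperparameter from the table
grid = {
    'units': [16, 32, 64, 128, 256],
    'learning_rate': [0.0001, 0.001, 0.01],
    'batch_size': [32, 64, 128, 256],
    'dropout_rate': [0.1, 0.3, 0.5],
}

# Every combination of one value per hyperparameter is a candidate model
combos = list(product(*grid.values()))
print(f"{len(combos)} configurations to evaluate")  # 5 * 3 * 4 * 3 = 180
```

This is why practitioners rarely grid-search everything at once: learning rate and architecture usually matter most and are tuned first.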
Inspecting weights before and after training
To see the distinction concretely, examine the actual weight values:
```python
from tensorflow import keras
import numpy as np

np.random.seed(42)  # seeds the initial random weights (see "Why reproducibility matters" below)

model = keras.Sequential([
    keras.layers.Dense(64, activation='relu', input_shape=(10,)),
    keras.layers.Dense(1, activation='sigmoid'),
])
model.compile(optimizer=keras.optimizers.Adam(learning_rate=0.001),
              loss='binary_crossentropy')

# Synthetic data so the example runs end to end; substitute your own X_train / y_train
X_train = np.random.rand(500, 10)
y_train = np.random.randint(0, 2, size=500)

# BEFORE training: weights are small random numbers
w_before = model.layers[0].get_weights()[0]
print(f"Before training - mean: {w_before.mean():.4f}, std: {w_before.std():.4f}")

# Train...
model.fit(X_train, y_train, epochs=20, batch_size=32, verbose=0)

# AFTER training: weights have shifted to encode learned patterns
w_after = model.layers[0].get_weights()[0]
print(f"After training - mean: {w_after.mean():.4f}, std: {w_after.std():.4f}")
print(f"Weight change - mean shift: {(w_after - w_before).mean():.4f}")
```
Why reproducibility matters
Random seeds control the initial parameter values; recording both the seeds and the hyperparameters is what makes a training run reproducible:
Why two seeds? NumPy and TensorFlow have independent random number generators. Keras uses both: NumPy for things like train/test shuffling, TensorFlow for weight initialisation and dropout masks. You must seed each one separately to get a fully reproducible run.
```python
import numpy as np
import tensorflow as tf

# Set seeds for reproducible results
np.random.seed(42)       # seeds NumPy's RNG (used by sklearn, data shuffling, etc.)
tf.random.set_seed(42)   # seeds TensorFlow's RNG (used by weight init, dropout)

# Document your hyperparameters
config = {
    'units': 64,
    'learning_rate': 0.001,
    'batch_size': 32,
    'epochs': 50,
    'dropout_rate': 0.3,
    'random_seed': 42,
}
print("Experiment config:", config)
```
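To see what seeding actually buys you, here is a small NumPy-only sketch that simulates seeded weight initialisation (the `init_weights` helper is hypothetical, standing in for what a framework does internally):

```python
import numpy as np

def init_weights(seed: int, shape=(10, 64)) -> np.ndarray:
    """Simulate seeded weight initialisation:
    the same seed always produces the same 'random' draw."""
    rng = np.random.default_rng(seed)
    return rng.normal(loc=0.0, scale=0.05, size=shape)

a = init_weights(42)
b = init_weights(42)  # same seed: identical initial weights
c = init_weights(43)  # different seed: different initial weights

print(np.array_equal(a, b))  # True
print(np.array_equal(a, c))  # False
```

Identical seeds plus identical hyperparameters give identical starting points, which is why both belong in the experiment record.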
Think Deeper
Your colleague says 'the model learned a learning rate of 0.001.' Why is this statement wrong, and why does the distinction matter for reproducible security ML?