AI Fundamentals
intermediate

Parameters & Model Size

The Notebook with a Billion Pages

6 min read

The Analogy

A parameter is one tiny adjustable knob inside the model — 70 billion of them working together create intelligence.

Imagine a massive control panel with 70 billion dials. During training, each dial is turned up or down millions of times until the panel produces correct answers. After training, the dials are locked. Those locked values — the parameters — are the model's knowledge. More dials generally means more nuance, but also more compute and cost.

In Plain English

Parameters are the learned numerical values inside an AI model, adjusted during training to minimise errors. Model size (7B, 70B, 405B) refers to how many parameters the model has. More parameters generally mean a more capable model, but also one that is slower and more expensive to run.
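To make "adjusted during training to minimise errors" concrete, here is a minimal sketch of turning a single dial: gradient descent on one parameter. The numbers and setup are illustrative, not taken from any real model.

```python
# A minimal sketch of "turning one dial": gradient descent on a single
# parameter w so that w * x approximates a target output.
# All values here are illustrative.

x, target = 3.0, 6.0   # we want w * x == target, i.e. w should end up at 2.0
w = 0.0                # the "dial", starting untuned
lr = 0.01              # learning rate: how far each adjustment turns the dial

for _ in range(1000):
    error = w * x - target    # how wrong the current dial setting is
    gradient = 2 * error * x  # derivative of the squared error w.r.t. w
    w -= lr * gradient        # turn the dial slightly downhill

print(round(w, 3))  # converges to 2.0
```

A real model does exactly this, except with billions of dials adjusted simultaneously on every training step.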


The Technical Picture

Parameters are the weight matrices and bias vectors in a neural network. During training, gradients update parameters via backpropagation to minimise a loss function. Parameter count determines model capacity, memory footprint (roughly 2 bytes per parameter in float16), and inference compute requirements.
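As a rough sketch of the point above, the parameter count of a fully connected network is just the total number of entries in its weight matrices and bias vectors, and the float16 footprint follows from the 2-bytes-per-parameter rule. The layer sizes below are made up for illustration.

```python
# Count parameters in a toy fully connected network and estimate its
# float16 memory footprint (~2 bytes per parameter).
# Layer sizes are illustrative, not from any real model.

layer_sizes = [512, 1024, 1024, 256]  # input -> hidden -> hidden -> output

def count_parameters(sizes):
    total = 0
    for fan_in, fan_out in zip(sizes, sizes[1:]):
        total += fan_in * fan_out  # entries in the weight matrix
        total += fan_out           # entries in the bias vector
    return total

params = count_parameters(layer_sizes)
fp16_bytes = params * 2  # roughly 2 bytes per parameter in float16
print(f"{params:,} parameters ≈ {fp16_bytes / 1e6:.1f} MB in float16")
```

The same arithmetic, scaled up, is why a 70B-parameter model needs on the order of 140 GB of memory just to hold its weights in float16.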

Real-World Examples

  • Llama 3 8B runs on a laptop; Llama 3 405B needs a server cluster
  • GPT-4 is estimated at over 1 trillion parameters
  • Smaller models like Gemma 2B are fast and cheap — good for specific tasks
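The hardware differences in the examples above follow directly from the 2-bytes-per-parameter rule. This back-of-the-envelope estimate covers weights only and ignores activations, KV cache, and runtime overhead, so real requirements are somewhat higher.

```python
# Rough float16 memory estimate for the model sizes mentioned above
# (weights only; real deployments also need memory for activations,
# KV cache, and framework overhead).

def fp16_gb(num_params_billions):
    # ~2 bytes per parameter in float16
    return num_params_billions * 1e9 * 2 / 1e9

for name, billions in [("Gemma 2B", 2), ("Llama 3 8B", 8),
                       ("Llama 3 70B", 70), ("Llama 3 405B", 405)]:
    print(f"{name}: ~{fp16_gb(billions):.0f} GB of weights in float16")
```

An 8B model's ~16 GB of weights fits on a single high-end laptop or GPU, while a 405B model's ~810 GB must be split across many GPUs, which is why it needs a server cluster.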

Key Takeaway

"7B parameters" means 7 billion learned values — that's the compressed knowledge of the model.
