1. LLM Definitions
In the standard mathematical definition of a Large Language Model (LLM), $\theta$ (theta) denotes the parameters (weights and biases) of the neural network.
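The definition usually meant here is the autoregressive factorization of the sequence probability; written out (a standard form, supplied for reference rather than taken from the original note):

$$P_\theta(x_1, \dots, x_T) = \prod_{t=1}^{T} P_\theta(x_t \mid x_1, \dots, x_{t-1})$$

where each conditional next-token distribution is computed by the network parameterized by $\theta$.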
Without $\theta$, the equation is just an empty shell: $\theta$ contains all the compressed knowledge, grammar rules, and facts the model has learned from the internet. When we say a model has "70 billion parameters," we are talking about the size of $\theta$.
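As a concrete illustration, here is a minimal PyTorch sketch that counts the size of $\theta$ for a toy model; the architecture is an arbitrary placeholder, not a real LLM:

```python
import torch.nn as nn

# A toy stand-in for an LLM; real models have billions of parameters.
model = nn.Sequential(
    nn.Embedding(num_embeddings=50_000, embedding_dim=512),  # token embeddings
    nn.Linear(512, 512),                                     # hidden layer
    nn.Linear(512, 50_000),                                  # output head over the vocabulary
)

# |theta| = total number of trainable weights and biases
num_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"theta has {num_params:,} parameters")
```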
2. Step 1 - Pretraining
Done only once because it is time-consuming and resource-intensive
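To make the step concrete, here is a minimal sketch of the usual pretraining objective, assuming a causal (next-token prediction) setup; `model` and `token_ids` are placeholders:

```python
import torch.nn.functional as F

def pretraining_loss(model, token_ids):
    """Next-token prediction: shift the sequence by one and apply cross-entropy.

    token_ids: LongTensor of shape (batch, seq_len)
    model: any module mapping token ids -> logits of shape (batch, seq_len, vocab)
    """
    inputs, targets = token_ids[:, :-1], token_ids[:, 1:]
    logits = model(inputs)                       # (batch, seq_len - 1, vocab)
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),     # flatten to (N, vocab)
        targets.reshape(-1),                     # flatten to (N,)
    )
```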
3. Step 2 - Finetuning
Necessary if the model after step 1 does not have good performance or if the task is non-trivial
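A hedged sketch of supervised fine-tuning: the objective is typically the same next-token loss as in pretraining (the `pretraining_loss` sketch above is reused), but run on a small task-specific dataset with a much lower learning rate. All names and hyperparameters below are illustrative assumptions:

```python
import torch

def finetune(model, task_batches, epochs=3, lr=1e-5):
    """Continue training theta on labeled, task-specific sequences."""
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    for _ in range(epochs):
        for token_ids in task_batches:                 # task-specific data
            loss = pretraining_loss(model, token_ids)  # same objective as pretraining
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```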
4. Step 3 - Alignment
An optional step, done only if certain unwanted behaviors appear in the model after step 2 and need to be penalized
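The note does not name a method, but a common way to penalize unwanted behaviors is RLHF, which begins by training a reward model on human preference pairs. A minimal sketch of that preference (Bradley-Terry) loss, with placeholder inputs:

```python
import torch.nn.functional as F

def reward_model_loss(reward_chosen, reward_rejected):
    """Push the reward-model score of the human-preferred response
    above that of the rejected one; inputs are score tensors."""
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()
```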
5. Emergent Abilities - refers to a significant, often abrupt improvement in performance that a language model acquires only beyond a certain model size, rather than improving smoothly with scale
6. Greedy search - a decoding strategy that, at each step, selects the single most probable next token; it is fast and deterministic, but can get stuck in repetitive or globally suboptimal text
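A minimal sketch of greedy decoding, assuming a causal model that maps token ids to logits; all names are placeholders:

```python
import torch

@torch.no_grad()
def greedy_decode(model, token_ids, max_new_tokens=50, eos_id=None):
    """At each step, append the single highest-probability next token."""
    for _ in range(max_new_tokens):
        logits = model(token_ids)                 # (batch, seq_len, vocab)
        next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)  # greedy pick
        token_ids = torch.cat([token_ids, next_id], dim=-1)
        if eos_id is not None and (next_id == eos_id).all():
            break
    return token_ids
```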