1. LLM Definitions
In the standard mathematical definition of a Large Language Model (LLM), $\theta$ (theta) denotes the parameters (weights and biases) of the neural network.
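The definition usually meant here is the autoregressive factorization of the sequence probability; written out (a standard form, supplied for reference rather than taken from the original note):

$$P_\theta(x_1, \dots, x_T) = \prod_{t=1}^{T} P_\theta(x_t \mid x_1, \dots, x_{t-1})$$

where each conditional next-token distribution is computed by the network parameterized by $\theta$.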
Without $\theta$, the equation is just an empty shell: $\theta$ contains all the compressed knowledge, grammar rules, and facts the model has learned from the internet. When we say a model has "70 billion parameters," we are talking about the size of $\theta$.
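As a concrete illustration, here is a minimal PyTorch sketch that counts the size of $\theta$ for a toy model; the architecture is an arbitrary placeholder, not a real LLM:

```python
import torch.nn as nn

# A toy stand-in for an LLM; real models have billions of parameters.
model = nn.Sequential(
    nn.Embedding(num_embeddings=50_000, embedding_dim=512),  # token embeddings
    nn.Linear(512, 512),                                     # hidden layer
    nn.Linear(512, 50_000),                                  # output head over the vocabulary
)

# |theta| = total number of trainable weights and biases
num_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"theta has {num_params:,} parameters")
```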
2. Step 1 - Pretraining
Done only once because it is time-consuming and resource-intensive
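To make the step concrete, here is a minimal sketch of the usual pretraining objective, assuming a causal (next-token prediction) setup; `model` and `token_ids` are placeholders:

```python
import torch.nn.functional as F

def pretraining_loss(model, token_ids):
    """Next-token prediction: shift the sequence by one and apply cross-entropy.

    token_ids: LongTensor of shape (batch, seq_len)
    model: any module mapping token ids -> logits of shape (batch, seq_len, vocab)
    """
    inputs, targets = token_ids[:, :-1], token_ids[:, 1:]
    logits = model(inputs)                       # (batch, seq_len - 1, vocab)
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),     # flatten to (N, vocab)
        targets.reshape(-1),                     # flatten to (N,)
    )
```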
3. Step 2 - Finetuning
Necessary if the model after step 1 does not have good performance or if the task is non-trivial
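A hedged sketch of supervised fine-tuning: the objective is typically the same next-token loss as in pretraining (the `pretraining_loss` sketch above is reused), but run on a small task-specific dataset with a much lower learning rate. All names and hyperparameters below are illustrative assumptions:

```python
import torch

def finetune(model, task_batches, epochs=3, lr=1e-5):
    """Continue training theta on labeled, task-specific sequences."""
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    for _ in range(epochs):
        for token_ids in task_batches:                 # task-specific data
            loss = pretraining_loss(model, token_ids)  # same objective as pretraining
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```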
4. Step 3 - Alignment
An optional step, done only if certain unwanted behaviors appear in the model after step 2 and need to be penalized
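The note does not name a method, but a common way to penalize unwanted behaviors is RLHF, which begins by training a reward model on human preference pairs. A minimal sketch of that preference (Bradley-Terry) loss, with placeholder inputs:

```python
import torch.nn.functional as F

def reward_model_loss(reward_chosen, reward_rejected):
    """Push the reward-model score of the human-preferred response
    above that of the rejected one; inputs are score tensors."""
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()
```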
5. Emergent Abilities - refers to a significant, often abrupt improvement in performance that a language model acquires only beyond a certain model size, rather than improving smoothly with scale
6. Greedy search - a decoding strategy that, at each step, selects the single most probable next token; it is fast and deterministic, but can get stuck in repetitive or globally suboptimal text
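A minimal sketch of greedy decoding, assuming a causal model that maps token ids to logits; all names are placeholders:

```python
import torch

@torch.no_grad()
def greedy_decode(model, token_ids, max_new_tokens=50, eos_id=None):
    """At each step, append the single highest-probability next token."""
    for _ in range(max_new_tokens):
        logits = model(token_ids)                 # (batch, seq_len, vocab)
        next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)  # greedy pick
        token_ids = torch.cat([token_ids, next_id], dim=-1)
        if eos_id is not None and (next_id == eos_id).all():
            break
    return token_ids
```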