Large Language Models

1.  LLM Definitions 



In the standard mathematical definition of a Large Language Model (LLM), θ (theta) represents the parameters (weights and biases) of the neural network.

Without θ, the equation is just an empty shell: θ contains all the compressed knowledge, grammar rules, and facts the model has learned from the internet. When we say a model has "70 billion parameters," we are talking about the size of θ.
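
A minimal sketch of that definition (the standard autoregressive factorization) is

$$P_\theta(x_1, \dots, x_T) = \prod_{t=1}^{T} P_\theta(x_t \mid x_1, \dots, x_{t-1})$$

where each factor is the probability the network assigns to the next token given all previous tokens, and every one of those probabilities is computed using the parameters θ.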


2. Step 1 - Pretraining

Done only once because it is time-consuming and resource-intensive.
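
A minimal sketch of one pretraining step, assuming a PyTorch-style causal LM that returns next-token logits (the model and optimizer here are placeholders, not a specific library API):

```python
import torch
import torch.nn.functional as F

def pretraining_step(model, batch, optimizer):
    """One next-token-prediction step on a batch of token IDs.

    batch: LongTensor of shape (batch_size, seq_len).
    model: any causal LM returning logits of shape (batch, seq_len, vocab_size).
    """
    inputs = batch[:, :-1]          # tokens the model conditions on
    targets = batch[:, 1:]          # the same tokens shifted left by one
    logits = model(inputs)          # (batch_size, seq_len - 1, vocab_size)

    # Cross-entropy between the predicted next-token distribution and the
    # token that actually came next, averaged over every position.
    loss = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        targets.reshape(-1),
    )

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Pretraining is just this step repeated over a very large text corpus, which is why it is so expensive.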





3. Step 2 - Finetuning

Necessary if the model after Step 1 does not perform well or if the task is non-trivial.

Techniques - SFT, PEFT (LoRA, prefix tuning, adapters)
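
As one concrete illustration of a PEFT technique, here is a minimal LoRA-style layer sketched in PyTorch (the rank, scaling convention, and initialization are illustrative assumptions, not a specific library's implementation):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen linear layer plus a trainable low-rank update: W x + (alpha/r) * B A x."""

    def __init__(self, in_features, out_features, r=8, alpha=16):
        super().__init__()
        # Pretrained projection: kept frozen during finetuning.
        self.base = nn.Linear(in_features, out_features)
        for p in self.base.parameters():
            p.requires_grad = False

        # Low-rank adapters: only r*(in_features + out_features) parameters
        # are trained, instead of the full in_features*out_features matrix.
        self.lora_A = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, r))
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.lora_A.T @ self.lora_B.T)
```

Because lora_B starts at zero, the layer initially behaves exactly like the pretrained one, and only the small A/B matrices are updated during finetuning.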




4. Step 3 - Preference Tuning

Optional step, done only if certain undesirable behaviors appear in the model after Step 2 and need to be penalized.
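
One common preference-tuning objective is DPO (Direct Preference Optimization). A minimal sketch, assuming you already have per-sequence log-probabilities from the policy being tuned and from a frozen reference model (the tensor names are placeholders):

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss on a batch of (chosen, rejected) response pairs.

    Each argument is a tensor of summed token log-probabilities, one per pair.
    The loss pushes the policy to prefer the chosen response over the rejected
    one relative to the frozen reference model, penalizing unwanted behavior.
    """
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    return -F.logsigmoid(chosen_reward - rejected_reward).mean()
```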




5. Emergent Abilities - refers to a significant improvement in performance that a language model acquires only beyond a certain model size



6. Response Generation - decide how to select the next token from the model's predicted distribution

Greedy search - at each step, pick the single highest-probability token and append it to the sequence
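
A minimal sketch of greedy decoding, assuming a causal LM that returns next-token logits (the model and token IDs here are placeholders):

```python
import torch

def greedy_decode(model, input_ids, max_new_tokens=50, eos_token_id=None):
    """Generate tokens one at a time, always taking the argmax token.

    input_ids: LongTensor of shape (1, prompt_len) holding the prompt.
    model: any causal LM returning logits of shape (1, seq_len, vocab_size).
    """
    for _ in range(max_new_tokens):
        logits = model(input_ids)                                  # (1, seq_len, vocab)
        next_token = logits[:, -1, :].argmax(dim=-1, keepdim=True) # most likely next token
        input_ids = torch.cat([input_ids, next_token], dim=-1)
        if eos_token_id is not None and next_token.item() == eos_token_id:
            break
    return input_ids
```

Greedy search is fast and deterministic, but because it never reconsiders earlier choices it can produce repetitive or suboptimal sequences compared to sampling or beam search.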