Super Study Guide - Foundations

Activation Functions

An activation function g is applied to the result of the weight multiplication and bias addition, giving a = g(Wx + b), and introduces nonlinearity into the model.


1. Sigmoid: σ(x) = 1 / (1 + e^(-x)). Squashes its input into the range (0, 1).

2. Tanh - Hyperbolic Tangent: tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x)). Squashes its input into the range (-1, 1) and is zero-centered.

3. ReLU - Rectified Linear Unit: ReLU(x) = max(0, x). Passes positive inputs through unchanged and zeroes out negative inputs.

4. Leaky ReLU: LeakyReLU(x) = max(αx, x) for a small slope α (e.g., 0.01). Keeps a small nonzero gradient for negative inputs, mitigating the dying-ReLU problem.

5. GeLU - Gaussian Error Linear Unit: GELU(x) = x · Φ(x), where Φ is the standard Gaussian CDF. A smooth ReLU-like activation commonly used in transformer models.

Note: ReLU-based activations are not differentiable at 0; in practice, frameworks use a convention such as setting the derivative at 0 to 0.
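
As an illustrative sketch (not from the guide), the five activations above can be written in a few lines of NumPy, with the ReLU convention at 0 noted in a comment:

```python
import numpy as np

def sigmoid(x):
    # Squashes inputs into (0, 1).
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # Zero-centered, squashes inputs into (-1, 1).
    return np.tanh(x)

def relu(x):
    # max(0, x); frameworks conventionally set the derivative at 0 to 0.
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    # Keeps a small slope alpha for negative inputs.
    return np.where(x > 0, x, alpha * x)

def gelu(x):
    # x * Phi(x), written here with the common tanh approximation.
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))
```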


Two types of predictions - Classification and Regression

Classification - A softmax layer is typically applied to convert the network's raw outputs (logits) into a probability distribution over classes: softmax(z)_i = e^(z_i) / Σ_j e^(z_j).

Regression - The output is a continuous value, typically produced directly by the final layer without a squashing activation.
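
A minimal NumPy sketch of a numerically stable softmax (subtracting the max logit before exponentiating avoids overflow and does not change the result, since softmax is invariant to shifting all logits by a constant):

```python
import numpy as np

def softmax(z):
    # Shift by the max logit for numerical stability.
    z = z - np.max(z)
    exp_z = np.exp(z)
    return exp_z / np.sum(exp_z)

logits = np.array([2.0, 1.0, 0.1])
print(softmax(logits))  # probabilities that sum to 1
```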




Training

Pre-Training - The model is first trained on a large, broad dataset, often with a self-supervised objective such as next-token prediction, to learn general-purpose representations.




Supervised Fine-Tuning - The pretrained model is then trained further on a smaller labeled dataset so that it specializes to the downstream task.
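
A minimal sketch of this idea, assuming PyTorch and a generic pretrained backbone with a freshly initialized task head (the module names, shapes, and the freeze-the-backbone choice are illustrative assumptions, not from the guide):

```python
import torch
import torch.nn as nn

# Hypothetical pretrained backbone plus a fresh head for the downstream task.
backbone = nn.Sequential(nn.Linear(512, 256), nn.ReLU())  # assume pretrained weights
head = nn.Linear(256, 10)                                 # new classification head

# Freeze the backbone so only the head is updated during fine-tuning.
for p in backbone.parameters():
    p.requires_grad = False

model = nn.Sequential(backbone, head)
optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# One supervised fine-tuning step on a labeled batch (x, y).
x, y = torch.randn(32, 512), torch.randint(0, 10, (32,))
optimizer.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()
optimizer.step()
```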



Weight Initialization - Weights are typically initialized to small random values whose scale depends on layer size (e.g., Xavier/Glorot or He initialization) so that signal variance stays stable across layers.

Epoch - One full pass of the training procedure over the entire training set.

Loss Function - A function quantifying the discrepancy between the model's predictions and the true targets; training minimizes it (e.g., cross-entropy for classification, mean squared error for regression).
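
To tie these three terms together, here is a toy NumPy linear-regression loop (data and hyperparameters are illustrative): He-style scaling sets the initial weight magnitude, each iteration of the outer loop is one epoch, and mean squared error is the loss being minimized.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data (illustrative, not from the guide).
X = rng.normal(size=(100, 4))
true_w = np.array([1.0, -2.0, 0.5, 3.0])
y = X @ true_w + 0.1 * rng.normal(size=100)

# Weight initialization: He-style scaling, sqrt(2 / fan_in).
fan_in = X.shape[1]
w = rng.normal(size=fan_in) * np.sqrt(2.0 / fan_in)
b = 0.0

lr = 0.1
for epoch in range(50):                        # one epoch = one full pass over the data
    y_hat = X @ w + b                          # forward pass
    loss = np.mean((y_hat - y) ** 2)           # loss function: mean squared error
    grad_w = 2.0 * X.T @ (y_hat - y) / len(y)  # gradient of the loss w.r.t. w
    grad_b = 2.0 * np.mean(y_hat - y)          # gradient of the loss w.r.t. b
    w -= lr * grad_w                           # gradient-descent update
    b -= lr * grad_b

print(f"final MSE: {loss:.4f}")
```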