Super Study Guide - Foundations

Activation Functions

An activation function g is applied to the result of the weight multiplication and bias addition, giving a = g(Wx + b), and introduces nonlinearity into the model.


1. Sigmoid: σ(x) = 1 / (1 + e^(-x)). Squashes its input into the range (0, 1).

2. Tanh - Hyperbolic Tangent: tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x)). Squashes its input into the range (-1, 1) and is zero-centered.

3. ReLU - Rectified Linear Unit: ReLU(x) = max(0, x). Passes positive inputs through unchanged and zeroes out negative inputs.

4. Leaky ReLU: LeakyReLU(x) = max(αx, x) for a small slope α (e.g., 0.01). Keeps a small nonzero gradient for negative inputs, mitigating the dying-ReLU problem.

5. GeLU - Gaussian Error Linear Unit: GELU(x) = x · Φ(x), where Φ is the standard Gaussian CDF. A smooth ReLU-like activation commonly used in transformer models.

Note: ReLU-based activations are not differentiable at 0; in practice, frameworks use a convention such as setting the derivative at 0 to 0.
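
As an illustrative sketch (not from the guide), the five activations above can be written in a few lines of NumPy, with the ReLU convention at 0 noted in a comment:

```python
import numpy as np

def sigmoid(x):
    # Squashes inputs into (0, 1).
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # Zero-centered, squashes inputs into (-1, 1).
    return np.tanh(x)

def relu(x):
    # max(0, x); frameworks conventionally set the derivative at 0 to 0.
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    # Keeps a small slope alpha for negative inputs.
    return np.where(x > 0, x, alpha * x)

def gelu(x):
    # x * Phi(x), written here with the common tanh approximation.
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))
```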


Two types of predictions - Classification and Regression

Classification - A softmax layer is typically applied to convert the network's raw outputs (logits) into a probability distribution over classes: softmax(z)_i = e^(z_i) / Σ_j e^(z_j).

Regression - The output is a continuous value, typically produced directly by the final layer without a squashing activation.
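
A minimal NumPy sketch of a numerically stable softmax (subtracting the max logit before exponentiating avoids overflow and does not change the result, since softmax is invariant to shifting all logits by a constant):

```python
import numpy as np

def softmax(z):
    # Shift by the max logit for numerical stability.
    z = z - np.max(z)
    exp_z = np.exp(z)
    return exp_z / np.sum(exp_z)

logits = np.array([2.0, 1.0, 0.1])
print(softmax(logits))  # probabilities that sum to 1
```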




Training

Pre-Training - The model is first trained on a large, broad dataset, often with a self-supervised objective such as next-token prediction, to learn general-purpose representations.




Supervised Fine-Tuning - The pretrained model is then trained further on a smaller labeled dataset so that it specializes to the downstream task.
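
A minimal sketch of this idea, assuming PyTorch and a generic pretrained backbone with a freshly initialized task head (the module names, shapes, and the freeze-the-backbone choice are illustrative assumptions, not from the guide):

```python
import torch
import torch.nn as nn

# Hypothetical pretrained backbone plus a fresh head for the downstream task.
backbone = nn.Sequential(nn.Linear(512, 256), nn.ReLU())  # assume pretrained weights
head = nn.Linear(256, 10)                                 # new classification head

# Freeze the backbone so only the head is updated during fine-tuning.
for p in backbone.parameters():
    p.requires_grad = False

model = nn.Sequential(backbone, head)
optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# One supervised fine-tuning step on a labeled batch (x, y).
x, y = torch.randn(32, 512), torch.randint(0, 10, (32,))
optimizer.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()
optimizer.step()
```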



Weight Initialization - Weights are typically initialized to small random values whose scale depends on layer size (e.g., Xavier/Glorot or He initialization) so that signal variance stays stable across layers.

Epoch - One full pass of the training procedure over the entire training set.

Loss Function - A function quantifying the discrepancy between the model's predictions and the true targets; training minimizes it (e.g., cross-entropy for classification, mean squared error for regression).
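
To tie these three terms together, here is a toy NumPy linear-regression loop (data and hyperparameters are illustrative): He-style scaling sets the initial weight magnitude, each iteration of the outer loop is one epoch, and mean squared error is the loss being minimized.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data (illustrative, not from the guide).
X = rng.normal(size=(100, 4))
true_w = np.array([1.0, -2.0, 0.5, 3.0])
y = X @ true_w + 0.1 * rng.normal(size=100)

# Weight initialization: He-style scaling, sqrt(2 / fan_in).
fan_in = X.shape[1]
w = rng.normal(size=fan_in) * np.sqrt(2.0 / fan_in)
b = 0.0

lr = 0.1
for epoch in range(50):                        # one epoch = one full pass over the data
    y_hat = X @ w + b                          # forward pass
    loss = np.mean((y_hat - y) ** 2)           # loss function: mean squared error
    grad_w = 2.0 * X.T @ (y_hat - y) / len(y)  # gradient of the loss w.r.t. w
    grad_b = 2.0 * np.mean(y_hat - y)          # gradient of the loss w.r.t. b
    w -= lr * grad_w                           # gradient-descent update
    b -= lr * grad_b

print(f"final MSE: {loss:.4f}")
```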