1. Autoencoders are an unsupervised ML method framed as a reconstruction problem.
2. Training an autoencoder involves minimizing the error from reconstructing its inputs x, in two steps (see the sketch after this list):
- The encoding function f transforms the input x into encodings z (latent representations of the data).
- The decoding function g transforms the encodings z back into a reconstruction of the input, x̂.
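As a minimal sketch of these two steps (the random input, the tanh encoder, and the linear decoder are illustrative assumptions, not part of the notes):

```python
import numpy as np

rng = np.random.default_rng(0)

n, k = 4, 2                        # input dimension n, encoding dimension k
x = rng.normal(size=n)             # a single (hypothetical) input vector

W_enc = rng.normal(size=(k, n))    # untrained encoder weights
W_dec = rng.normal(size=(n, k))    # untrained decoder weights

z = np.tanh(W_enc @ x)             # step 1: encoding z = f(x)
x_hat = W_dec @ z                  # step 2: reconstruction x_hat = g(z), with g the identity
print(np.mean((x - x_hat) ** 2))   # reconstruction error for this single input
```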
3. Building an autoencoder model may be useful for several tasks, such as:
- Dimensionality reduction.
- Denoising.
- Pre-processing inputs before classification (“whitening”).
- Outlier detection.
4. Autoencoders as neural networks
Autoencoders can be represented as an NN with one or more fully connected hidden layers.
1. The output layer has the same dimension as the input layer and is trained to reproduce the input, x.
2. Training the neural network amounts to learning an (approximate) identity mapping between inputs and outputs.
The hidden activations are called encodings, latent features, or “factors,” z.
▶ The encodings capture the dependencies between the different observed variables included in x.
We can represent a single-layer autoencoder as follows, where f and g are activation functions (a code sketch checking the dimensions follows the definitions below):

Z = f(W1 X + b1 ι′p),   Xˆ = g(W2 Z + b2 ι′p)

where:
▶ Encoding weights: W1, k × n matrix. Decoding weights: W2, n × k matrix.
▶ Bias terms: b1, k × 1 vector, and b2, n × 1 vector.
▶ Encodings: Z, k × p matrix.
▶ ιp, p × 1 vector of ones.
▶ Xˆ, n × p reconstructed input.
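The dimensions above can be checked with a short sketch (random placeholder data; f = tanh and a linear g are assumed activation choices):

```python
import numpy as np

rng = np.random.default_rng(1)

n, k, p = 5, 2, 100                                          # n inputs, k encodings, p observations
X = rng.normal(size=(n, p))                                  # data matrix, n x p

W1, b1 = rng.normal(size=(k, n)), rng.normal(size=(k, 1))    # encoder: k x n and k x 1
W2, b2 = rng.normal(size=(n, k)), rng.normal(size=(n, 1))    # decoder: n x k and n x 1
iota_p = np.ones((p, 1))                                     # p x 1 vector of ones

Z = np.tanh(W1 @ X + b1 @ iota_p.T)                          # encodings, k x p
X_hat = W2 @ Z + b2 @ iota_p.T                               # reconstruction, n x p

assert Z.shape == (k, p) and X_hat.shape == (n, p)
```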
5. Given a choice of activation functions, f and g, and encoding dimension, k, we can train the autoencoder to minimize the reconstruction error computed, say, with the MSE:

min{W1, b1, W2, b2}  (1/p) ‖X − Xˆ‖²F ,

where ‖·‖F denotes the Frobenius norm.
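One way to carry out this minimization in practice is with an automatic-differentiation library; the sketch below assumes PyTorch, a tanh encoder, a linear decoder, and arbitrary choices of k, learning rate, and number of steps.

```python
import torch
from torch import nn

torch.manual_seed(0)
n, k, p = 5, 2, 200
X = torch.randn(p, n)                    # p observations as rows (PyTorch convention)

autoencoder = nn.Sequential(
    nn.Linear(n, k), nn.Tanh(),          # encoder f: x -> z
    nn.Linear(k, n),                     # decoder g: z -> x_hat (linear)
)
optimizer = torch.optim.Adam(autoencoder.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()                   # mean squared reconstruction error

for step in range(1000):
    optimizer.zero_grad()
    loss = loss_fn(autoencoder(X), X)    # proportional to ||X - X_hat||_F^2
    loss.backward()
    optimizer.step()

Z = torch.tanh(autoencoder[0](X))        # learned encodings, p x k
```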
6. Principal Components Analysis (PCA): a linear transformation of the data to a new coordinate system whose axes are ordered by the amount of variation they explain.
PCA finds a low-dimensional (linear) representation of a data set that retains as much of its variation as possible.
For instance, the first principal component of our data X is the linear combination of the uncentered features that has the largest sample variance, with the combination weights normalized to unit length.
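As a concrete illustration, the loadings of the first principal component can be obtained from the singular value decomposition of the centered data matrix; the random data and the SVD route below are assumptions for illustration, not part of the notes:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 5, 200
X = rng.normal(size=(n, p))                       # data matrix, n features x p observations

Xc = X - X.mean(axis=1, keepdims=True)            # center each feature before the SVD
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

w1 = U[:, 0]                                      # unit-length loadings of the first PC
pc1 = w1 @ X                                      # first PC: a linear combination of the features
explained_share = S[0] ** 2 / np.sum(S ** 2)      # share of total variation explained by PC1
print(explained_share)
```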