ML - Introduction

ML is about constructing models from observed examples (the rows of data matrices) and using those models to predict the missing entries of previously unseen examples.

Classical problems in machine learning include classification, anomaly detection, and clustering.

An alternative view of machine learning expresses predictions as computational graphs; this idea forms the basis of deep learning.

Principal Categories

1. Supervised Learning

    - You are given both inputs and outputs

    - The input is typically a vector, i.e. it has more than one variable

    - We can use supervised learning for regression, predicting numerical output values for any new data points we input - e.g. input the temperature of the day and predict power consumption

    - Or we can use it for classification

2. Unsupervised Learning

    - The data is not labelled

    - We only have inputs, no outputs

    - The algorithm finds relationships or patterns for you.

3. Reinforcement Learning

    - An algorithm learns to perform a task by trial and error

    - It is rewarded for successful ‘behavior’ and punished for unsuccessful behavior
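The reward-and-punishment idea can be sketched with a multi-armed bandit, one of the simplest reinforcement-learning settings. Everything below (the arm payouts, the epsilon-greedy strategy, the function name) is an illustrative assumption, not something from the notes:

```python
import random

def run_bandit(true_means, steps=5000, eps=0.1, seed=0):
    """Epsilon-greedy agent on a multi-armed bandit.

    The agent is 'rewarded' with a noisy payout for each arm it pulls
    and gradually learns which arm pays best.
    """
    rng = random.Random(seed)
    n = len(true_means)
    counts = [0] * n            # pulls per arm
    estimates = [0.0] * n       # running mean reward per arm

    for _ in range(steps):
        if rng.random() < eps:                      # explore a random arm
            arm = rng.randrange(n)
        else:                                       # exploit the best estimate
            arm = max(range(n), key=lambda a: estimates[a])
        reward = true_means[arm] + rng.gauss(0, 1)  # noisy reward signal
        counts[arm] += 1
        # incremental update of the running mean for this arm
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return estimates

est = run_bandit([0.1, 0.5, 0.9])
print(max(range(3), key=lambda a: est[a]))  # the arm it learned to prefer
```

Successful behaviour (pulling the high-payout arm) is reinforced through the reward signal; there is no labelled dataset anywhere.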

Principal Techniques

1. K Nearest Neighbors: KNN is a supervised-learning technique in which we measure the distance between a new data point (in any number of dimensions) and our already-classified data points; the K nearest of those points then determine, by majority vote, to which class our new data point belongs. It can also be used for regression.
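A minimal from-scratch sketch of that voting idea (the toy data and function name are made up for illustration):

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (feature_vector, label) pairs.
    """
    # sort training points by Euclidean distance to the query
    nearest = sorted(train, key=lambda p: math.dist(p[0], query))
    # count the class labels of the k closest points
    votes = Counter(label for _, label in nearest[:k])
    return votes.most_common(1)[0][0]

# toy 2-D data: two well-separated clusters
train = [((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"),
         ((8, 8), "B"), ((8, 9), "B"), ((9, 8), "B")]
print(knn_predict(train, (2, 2)))   # lands near the "A" cluster
```

For regression, the vote would simply be replaced by the mean of the k neighbours' numerical outputs.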

2. K Means Clustering: K-means clustering (KMC) is an example of an unsupervised-learning technique. We have lots of data in vector form (representing many dimensions of information about each point) and we measure the distance of each data point from K centroids. We then find the optimal positions for these centroids, giving the best division of the data into K classes.
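The standard way to find those centroid positions is Lloyd's algorithm: alternate between assigning points to their nearest centroid and moving each centroid to the mean of its cluster. A minimal sketch, with made-up toy data:

```python
import math
import random

def kmeans(points, k, iters=20, seed=0):
    """Lloyd's algorithm: assign points to the nearest centroid,
    then move each centroid to the mean of its cluster, and repeat."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)   # initialize from random data points
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # assignment step: nearest centroid wins the point
            i = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[i].append(p)
        for i, cl in enumerate(clusters):
            if cl:  # update step: centroid moves to its cluster's mean
                centroids[i] = tuple(sum(xs) / len(cl) for xs in zip(*cl))
    return centroids

pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(sorted(kmeans(pts, 2)))   # one centroid per cluster
```

Note that no labels appear anywhere: the algorithm finds the grouping on its own, which is what makes it unsupervised.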

3. Naïve Bayes Classifier: NBC is a supervised-learning technique that uses Bayes' Theorem to calculate the probability of new data points being in different classes. The 'naïve' bit refers to the independence assumptions made, which are rarely true in practice but that doesn't usually seem to matter.
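A sketch of a Gaussian naïve Bayes classifier, where the 'naïve' assumption is visible as treating each feature's likelihood as independent of the others (the height/weight toy data and function names are illustrative assumptions):

```python
import math
from collections import defaultdict

def fit_gaussian_nb(samples):
    """samples: list of (feature_tuple, label).  Fits an independent
    Gaussian per feature per class -- the 'naive' independence assumption."""
    by_class = defaultdict(list)
    for x, y in samples:
        by_class[y].append(x)
    model = {}
    for y, rows in by_class.items():
        stats = []
        for feature in zip(*rows):   # one column of values per feature
            mu = sum(feature) / len(feature)
            var = sum((v - mu) ** 2 for v in feature) / len(feature) + 1e-9
            stats.append((mu, var))
        model[y] = (len(rows) / len(samples), stats)  # prior + per-feature stats
    return model

def predict(model, x):
    def log_score(prior, stats):
        # log prior plus a sum of independent log Gaussian likelihoods
        s = math.log(prior)
        for v, (mu, var) in zip(x, stats):
            s += -0.5 * math.log(2 * math.pi * var) - (v - mu) ** 2 / (2 * var)
        return s
    return max(model, key=lambda y: log_score(*model[y]))

# toy (height_cm, weight_kg) -> label data
data = [((150, 40), "child"), ((155, 45), "child"), ((160, 50), "child"),
        ((175, 70), "adult"), ((180, 80), "adult"), ((185, 90), "adult")]
model = fit_gaussian_nb(data)
print(predict(model, (158, 48)), predict(model, (182, 85)))
```

Height and weight are clearly not independent, yet the classifier still works well here, which is exactly the point made above.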

4. Regression Methods: Regression methods are supervised-learning techniques that try to explain a numerical dependent variable in terms of independent variables.
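The simplest case, ordinary least squares with one independent variable, has a closed-form solution. A sketch with made-up data:

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b (one independent variable)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
            / sum((x - mx) ** 2 for x in xs)
    intercept = my - slope * mx   # the line passes through the mean point
    return slope, intercept

# toy data generated from y = 2x + 1 exactly
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]
a, b = fit_line(xs, ys)
print(a, b)   # recovers slope 2 and intercept 1
```

Once fitted, the line predicts a numerical output for any new input, e.g. power consumption from temperature in the earlier example.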

5. Support Vector Machines: SVM is a supervised-learning technique, one that divides data into classes according to which side of a hyperplane in feature space each data point lies on.
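A minimal sketch of a linear SVM trained by sub-gradient descent on the hinge loss (one common way to fit one; the toy data, learning rate, and function names are illustrative assumptions):

```python
import random

def train_linear_svm(data, epochs=500, lr=0.01, lam=0.01, seed=0):
    """Tiny linear SVM via sub-gradient descent on the hinge loss.

    data: list of (feature_tuple, label) pairs with labels in {-1, +1}.
    """
    rng = random.Random(seed)
    dim = len(data[0][0])
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:   # inside the margin: hinge loss is active
                w = [wi + lr * (y * xi - lam * wi) for wi, xi in zip(w, x)]
                b += lr * y
            else:            # safely classified: only shrink w (regularize)
                w = [wi - lr * lam * wi for wi in w]
    return w, b

def classify(w, b, x):
    # which side of the hyperplane w.x + b = 0 the point lies on
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1

data = [((0.0, 0.0), -1), ((1.0, 0.5), -1), ((0.5, 1.0), -1),
        ((3.0, 3.0), 1), ((3.5, 2.5), 1), ((2.5, 3.5), 1)]
w, b = train_linear_svm(data)
print(classify(w, b, (0.5, 0.5)), classify(w, b, (3.0, 2.8)))
```

The hinge loss pushes the hyperplane so that points are not just on the correct side but beyond a margin, which is what distinguishes an SVM from a plain linear classifier.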

6. Self-organizing Maps: SOM is an unsupervised-learning technique that turns data in many dimensions into nice, typically two-dimensional, pictures for visualizing relationships between data points.
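A minimal sketch of the SOM training loop: a small 2-D grid of weight vectors is repeatedly pulled toward the data, so that nearby grid cells end up representing similar data points (grid size, schedules, and toy data are all illustrative assumptions):

```python
import math
import random

def train_som(data, grid=4, epochs=100, seed=0):
    """Minimal self-organizing map: a grid x grid sheet of weight
    vectors is stretched over high-dimensional data."""
    rng = random.Random(seed)
    dim = len(data[0])
    # one weight vector per grid cell, randomly initialized
    w = {(i, j): [rng.random() for _ in range(dim)]
         for i in range(grid) for j in range(grid)}
    for t in range(epochs):
        lr = 0.5 * (1 - t / epochs)               # learning rate decays
        radius = grid / 2 * (1 - t / epochs) + 0.5  # neighbourhood shrinks
        for x in data:
            # best-matching unit: grid cell whose weights are closest to x
            bmu = min(w, key=lambda c: math.dist(w[c], x))
            for c, wc in w.items():
                d = math.dist(c, bmu)             # distance on the grid
                if d <= radius:                   # pull neighbours toward x
                    h = math.exp(-d * d / (2 * radius * radius))
                    w[c] = [wi + lr * h * (xi - wi) for wi, xi in zip(wc, x)]
    return w

# two well-separated 3-D clusters should land on different parts of the map
data = [(0.1, 0.1, 0.1), (0.2, 0.1, 0.2), (0.9, 0.8, 0.9), (0.8, 0.9, 0.8)]
som = train_som(data)
a = min(som, key=lambda c: math.dist(som[c], data[0]))
b = min(som, key=lambda c: math.dist(som[c], data[2]))
print(a, b)   # best-matching grid cells for the two clusters
```

Plotting which grid cell each data point maps to gives the two-dimensional picture mentioned above: points that are similar in the original high-dimensional space land on nearby cells.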