Traditional CV techniques were based on hand-crafted algorithms. As ML methods began gaining traction, attempts appeared to apply them to computer vision tasks. The most common ML task in CV is classification (supervised learning), in which we are given a set of input values together with their target labels and learn a mapping from one to the other.
Supervised Learning
One of the most common applications of classification in CV is semantic image classification, where we simply label the entire image with some class. While tackling this task, we often don't have access to the true probability distribution over the inputs (let alone the joint probability of outputs given inputs). Therefore, we use the training data distribution as a proxy for the real-world distribution. This is known as empirical risk minimization, where the expected risk is estimated with:

$$\hat{R}(f) = \frac{1}{N} \sum_{i=1}^{N} L\big(f(x_i), y_i\big)$$
where $L$ is the loss function and $(x_i, y_i)$ for $i = 1, \dots, N$ are the training samples.
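As a minimal sketch, the empirical risk is just the average loss over the training set (the function names below are illustrative, not from any particular library):

```python
import numpy as np

def empirical_risk(predict, X, y, loss):
    """Average loss over the training samples -- a proxy for the expected risk."""
    return np.mean([loss(predict(x), y_i) for x, y_i in zip(X, y)])

# 0-1 loss, the usual choice for classification
def zero_one(y_hat, y):
    return float(y_hat != y)
```

For example, a classifier that mislabels one of four samples under the 0-1 loss has an empirical risk of 0.25.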
Preprocessing
It is often a good idea to prepare the data for classification. The most common preprocessing steps are:
- centering - subtracting the mean from each feature.
- standardizing - scaling each feature so its variance is equal to 1.
- whitening - computing the SVD and rotating the feature space so the final dimensions are uncorrelated and have unit variance.
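The three steps above can be sketched with NumPy as follows; this is an illustrative PCA-style whitening via SVD, with an `eps` guard (an assumption here) against division by zero:

```python
import numpy as np

def center(X):
    """Subtract the per-feature mean."""
    return X - X.mean(axis=0)

def standardize(X, eps=1e-8):
    """Center, then scale each feature to unit variance."""
    Xc = center(X)
    return Xc / (Xc.std(axis=0) + eps)

def whiten(X, eps=1e-8):
    """Rotate centered data onto its principal axes (via SVD), scale to unit variance."""
    Xc = center(X)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    # Xc @ Vt.T projects onto the principal axes;
    # S / sqrt(n - 1) is the standard deviation along each axis.
    return Xc @ Vt.T / (S / np.sqrt(len(X) - 1) + eps)
```

After whitening, the empirical covariance of the transformed data is (approximately) the identity matrix.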
Nearest Neighbors
A very simple "brute-force" method: to classify a sample, we take the $k$ closest training samples under some distance metric (e.g. Euclidean) and assign the label that the majority of them share.
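A minimal k-NN sketch under the assumptions above (Euclidean distance, majority vote; names are illustrative):

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    """Classify x by majority vote among its k nearest training samples."""
    dists = np.linalg.norm(X_train - x, axis=1)   # Euclidean distance to every sample
    nearest = np.argsort(dists)[:k]               # indices of the k closest
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]              # most frequent label wins
```

Note that there is no training phase at all: the entire training set is stored and scanned at query time, which is why the method is called "brute-force".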
Bayesian Classification
If we are able to come up with an analytic model of feature construction and noise, or if we can gather enough samples, we can determine the probability distribution of the feature vectors for each class, $p(\mathbf{x} \mid C_k)$. Bayes' theorem then gives the posterior class probabilities:

$$p(C_k \mid \mathbf{x}) = \frac{p(\mathbf{x} \mid C_k)\, p(C_k)}{\sum_j p(\mathbf{x} \mid C_j)\, p(C_j)} = \frac{e^{a_k}}{\sum_j e^{a_j}},$$
where the second form is known as the normalized exponential, or softmax. The quantity:

$$a_k = \ln\big(p(\mathbf{x} \mid C_k)\, p(C_k)\big)$$
is the (prior-weighted) log-likelihood of a sample $\mathbf{x}$ under class $C_k$.
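Computing the softmax naively from the $a_k$ can overflow, since the log-scores are exponentiated. A standard numerically stable sketch (illustrative, exploiting the fact that the softmax is invariant to shifting all inputs by a constant):

```python
import numpy as np

def softmax(a):
    """Normalized exponential over log-scores a_k, shifted for numerical stability."""
    a = a - np.max(a)      # softmax(a) == softmax(a - c) for any constant c
    e = np.exp(a)
    return e / e.sum()
```

The outputs are non-negative and sum to 1, so they can be read as posterior class probabilities.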
If we additionally assume that the features are conditionally independent given the class, i.e. $p(\mathbf{x} \mid C_k) = \prod_i p(x_i \mid C_k)$, the resulting technique is called the Naive Bayes Classifier. For a binary classification task, we can rewrite the posterior as

$$p(C_1 \mid \mathbf{x}) = \sigma(a),$$
where $a = \ln \dfrac{p(\mathbf{x} \mid C_1)\, p(C_1)}{p(\mathbf{x} \mid C_2)\, p(C_2)}$ and $\sigma(a) = \dfrac{1}{1 + e^{-a}}$ is the logistic sigmoid.
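A minimal Naive Bayes sketch under the independence assumption, choosing Gaussian per-feature likelihoods for illustration (the Gaussian choice and all function names are assumptions, not from the notes):

```python
import numpy as np

def fit_gaussian_nb(X, y):
    """Estimate class priors and per-feature Gaussian parameters for each class."""
    classes = np.unique(y)
    priors = np.array([np.mean(y == c) for c in classes])
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    stds = np.array([X[y == c].std(axis=0) + 1e-9 for c in classes])  # avoid /0
    return classes, priors, means, stds

def predict_nb(x, classes, priors, means, stds):
    """Pick the class maximizing a_k = ln p(x|C_k) + ln p(C_k)."""
    # Sum of per-feature Gaussian log-densities = log-likelihood under independence
    log_lik = -0.5 * np.sum(((x - means) / stds) ** 2
                            + np.log(2 * np.pi * stds ** 2), axis=1)
    return classes[np.argmax(log_lik + np.log(priors))]
```

Working in log space sidesteps underflow from multiplying many small per-feature densities, and the normalizing constant can be dropped entirely since it is shared by all classes.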
Papers: