Traditional CV techniques were based on hand-crafted algorithms. As ML methods began gaining traction, attempts were made to apply them to computer vision tasks. The most common ML task in CV is classification (supervised learning), in which we are given a set of input values (often feature vectors or entire images) paired with corresponding class labels drawn from a predefined set of classes. Despite the dominance of supervised learning, we can also describe some methods for unsupervised or semi-supervised learning in CV.

Supervised Learning

One of the most common applications of classification in CV is semantic image classification, where we simply label the entire image with a single class. While tackling this task, we usually do not have access to the true probability distribution over the inputs (let alone the joint probability of outputs given inputs). Therefore, we use the training data distribution as a proxy for the real-world distribution. This is known as empirical risk minimization, where the expected risk is estimated with:

$$E_{\mathrm{emp}}(\theta) = \frac{1}{N} \sum_{i=1}^{N} L(f(x_i; \theta), y_i),$$

where $L(f(x_i; \theta), y_i)$ measures the loss of predicting an output $f(x_i; \theta)$ for input $x_i$ and model parameters $\theta$ when the expected label is $y_i$.
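The estimate above can be sketched in a few lines of numpy. This is a minimal illustration, not any particular library's API: the linear scoring model `f`, the 0–1 loss, and all variable names are hypothetical choices for the example.

```python
import numpy as np

# Hypothetical toy model f(x; theta): linear scores over 3 classes,
# prediction is the argmax. Illustrative only.
def f(x, theta):
    return np.argmax(theta @ x)

def zero_one_loss(pred, label):
    # L(f(x; theta), y): 1 if the prediction is wrong, 0 if correct.
    return float(pred != label)

def empirical_risk(xs, ys, theta):
    # Average loss over the training sample -- the ERM estimate of expected risk.
    return sum(zero_one_loss(f(x, theta), y) for x, y in zip(xs, ys)) / len(xs)

rng = np.random.default_rng(0)
theta = rng.normal(size=(3, 4))      # 3 classes, 4 features
xs = rng.normal(size=(10, 4))        # 10 toy inputs
ys = rng.integers(0, 3, size=10)     # 10 toy labels
risk = empirical_risk(xs, ys, theta) # value in [0, 1] for the 0-1 loss
```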

Preprocessing

It is often a good idea to prepare the data for classification. The most common preprocessing steps are:

  • centering - subtracting the mean from each feature.
  • standardizing - scaling each feature so that its variance is equal to $1$.
  • whitening - computing the SVD and rotating the feature space so that the final dimensions are uncorrelated and have unit variance.
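The three steps above can be sketched with numpy on a toy feature matrix (the data and the mixing matrix are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
# Toy feature matrix: 200 samples x 3 deliberately correlated features.
X = rng.normal(size=(200, 3)) @ np.array([[2.0, 0.5, 0.0],
                                          [0.0, 1.0, 0.3],
                                          [0.0, 0.0, 0.2]])

# Centering: subtract the per-feature mean.
Xc = X - X.mean(axis=0)

# Standardizing: scale each (centered) feature to unit variance.
Xs = Xc / Xc.std(axis=0)

# Whitening: use the SVD of the centered data to rotate the feature space
# so that the resulting dimensions are uncorrelated with unit variance.
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
Xw = Xc @ Vt.T / S * np.sqrt(len(X))   # sample covariance of Xw is identity
```

Note that standardizing removes per-feature scale but leaves cross-feature correlations intact; only whitening removes those as well.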

Nearest Neighbors

A very simple “brute-force” method. For a given input we take the $k$ closest neighbors from the training data and return the label that occurs most often among them. Low values of $k$ make the method behave more abruptly (we only sample a few neighbors), resulting in overfitting, while large values of $k$ make it prone to underfitting.
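A brute-force sketch of the idea, assuming Euclidean distance and a majority vote (the two toy clusters are invented for the example):

```python
import numpy as np
from collections import Counter

def knn_predict(train_x, train_y, query, k):
    # Brute force: distances from the query to every training point.
    dists = np.linalg.norm(train_x - query, axis=1)
    nearest = np.argsort(dists)[:k]               # indices of the k closest
    votes = Counter(train_y[i] for i in nearest)  # majority vote on labels
    return votes.most_common(1)[0][0]

# Two well-separated 2-D clusters, labeled 0 and 1.
rng = np.random.default_rng(2)
train_x = np.vstack([rng.normal(0, 0.5, (20, 2)),
                     rng.normal(5, 0.5, (20, 2))])
train_y = np.array([0] * 20 + [1] * 20)

label = knn_predict(train_x, train_y, np.array([4.8, 5.1]), k=3)  # → 1
```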

Bayesian Classification

If we are able to come up with an analytic model of feature construction and noising, or if we can gather enough samples, we can determine the probability distributions $p(x|c)$ of the feature vectors $x$ for each class $c$, as well as the prior class likelihoods $p(c)$. According to Bayes’ rule, the likelihood of $c$ given $x$ is given by:

$$p(c|x) = \frac{p(x|c)\,p(c)}{\sum_{k} p(x|k)\,p(k)} = \frac{e^{l_c}}{\sum_{k} e^{l_k}},$$

where the second form is known as the normalized exponential or a softmax. The quantity:

$$l_c = \log p(x|c) + \log p(c)$$

is the log-likelihood of a sample $x$ being from class $c$. The process of applying the formula above to find the likelihood of class $c$ given $x$ is known as Bayesian classification. In case the components of the feature vector are statistically independent, i.e.

$$p(x|c) = \prod_{j} p(x_j|c),$$
the resulting technique is called the Naive Bayes classifier. For a binary classification task, we can rewrite the softmax as

$$p(1|x) = \frac{e^{l_1}}{e^{l_0} + e^{l_1}} = \frac{1}{1 + e^{-d}},$$

where $d = l_1 - l_0$ is the difference between the two class log-likelihoods and is known as the logit.
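The binary case can be sketched as a Gaussian Naive Bayes classifier: each feature is modeled as an independent 1-D Gaussian per class, and the prediction is a sigmoid of the logit $d = l_1 - l_0$. The Gaussian model, the toy data, and all names here are illustrative assumptions, not from the text.

```python
import numpy as np

def fit_gaussian_nb(X, y):
    # Per class: feature means, feature variances, and the prior p(c).
    params = {}
    for c in (0, 1):
        Xc = X[y == c]
        params[c] = (Xc.mean(axis=0), Xc.var(axis=0), len(Xc) / len(X))
    return params

def log_likelihood(x, mean, var, prior):
    # l_c = log p(x|c) + log p(c), with p(x|c) a product of 1-D Gaussians
    # (the naive independence assumption).
    return (-0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var)
            + np.log(prior))

def predict_proba(params, x):
    l0 = log_likelihood(x, *params[0])
    l1 = log_likelihood(x, *params[1])
    d = l1 - l0                        # the logit
    return 1.0 / (1.0 + np.exp(-d))    # p(c=1|x): sigmoid of the logit

# Toy binary data: class 0 around (0, 0), class 1 around (4, 4).
rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(4, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
params = fit_gaussian_nb(X, y)
p1 = predict_proba(params, np.array([4.0, 4.0]))  # close to 1
```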


Papers: