Part 4: Naive Bayes
Naive Bayes methods are a family of supervised learning algorithms that apply Bayes’ theorem with the “naive” assumption of conditional independence between every pair of features, given the value of the class variable. There is no single algorithm for training these classifiers; what they share is the assumption that the value of any particular feature is independent of the value of every other feature once the class is known. Given a sample of input data, a Naive Bayes classifier predicts a probability distribution over a set of classes, i.e. the probability that the target categorical variable belongs to each class. When used for text classification, the Naive Bayes classifier often achieves a high success rate, handling multi-class problems well despite the strength of the independence assumption.
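As a minimal sketch of what this looks like in practice, the snippet below trains scikit-learn's MultinomialNB on a tiny invented corpus (the documents, labels, and test sentence are made up purely for illustration, and scikit-learn is assumed to be installed):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Tiny invented corpus: each document is labelled "sport" or "tech".
docs = [
    "the team won the match",
    "a great goal in the final minute",
    "new laptop with a faster processor",
    "the update improves battery life",
]
labels = ["sport", "sport", "tech", "tech"]

# Word frequencies serve as the features/predictors.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)

model = MultinomialNB()
model.fit(X, labels)

# Probability distribution over the classes for a new document.
new_doc = vectorizer.transform(["the processor in this laptop is fast"])
print(model.predict(new_doc))        # e.g. ['tech']
print(model.predict_proba(new_doc))  # probability for each class
```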
The Naive Bayes classifier is designed for the case where the predictors are independent of each other within each class, but in practice it often performs well even when this independence assumption does not hold. The assumption of conditional independence allows the Naive Bayes classifier to estimate the parameters needed for accurate classification from less training data than many other classifiers require. Under the naive assumption that the features are mutually independent, the conditional probability of a class is computed as a simple product of the individual per-feature conditional probabilities. One strategy for handling continuous data in Naive Bayes classification is to discretize the features into separate categories; another is to model the class-conditional probabilities directly, for example with a Gaussian distribution for each feature within each class.
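The latter option corresponds to Gaussian Naive Bayes. A hedged sketch with synthetic numeric data (the class means, sample sizes, and test points are invented for the example):

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)

# Synthetic continuous features: two classes centred at different means.
X = np.vstack([rng.normal(0.0, 1.0, size=(50, 2)),
               rng.normal(3.0, 1.0, size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)

# GaussianNB estimates a per-class mean and variance for every feature
# and combines them under the conditional-independence assumption.
model = GaussianNB()
model.fit(X, y)

print(model.predict([[0.2, -0.1], [2.8, 3.1]]))  # expected: [0 1]
```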
The probabilistic model behind naive classifiers is based on the well-known Bayes’ theorem, and the adjective “naive” comes from the assumption that the features of a data set are mutually independent. Many practical applications estimate the parameters of Naive Bayes models by maximum likelihood; in other words, one can work with a Naive Bayes model without adopting Bayesian probability or using Bayesian methods. The key point of Bayes’ theorem is that the probability of an event can be updated as new data becomes available.
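As a small numeric illustration of this updating (every number below is invented for the example), the snippet applies Bayes’ theorem to a rare condition and an imperfect test:

```python
# Bayes' theorem: P(A | B) = P(B | A) * P(A) / P(B)
# Invented numbers: a rare condition and an imperfect test for it.
p_condition = 0.01            # prior P(A)
p_pos_given_condition = 0.95  # likelihood P(B | A)
p_pos_given_healthy = 0.05    # false-positive rate P(B | not A)

# Total probability of observing a positive test, P(B).
p_pos = (p_pos_given_condition * p_condition
         + p_pos_given_healthy * (1 - p_condition))

# Posterior: the prior adjusted after observing the data.
p_condition_given_pos = p_pos_given_condition * p_condition / p_pos
print(round(p_condition_given_pos, 3))  # ~0.161
```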
Bayes’ theorem describes the probability of an event based on prior knowledge of conditions that may be associated with that event. Technically speaking, it computes a conditional probability: the probability of a hypothesis (here, a class) given the observed data, by combining the prior probability of the hypothesis with the probability of the data under that hypothesis. Because Naive Bayes is a generative model, a fitted classifier can also be used to generate a new synthetic dataset with the same probability distribution.
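A minimal sketch of that generative idea, assuming the per-class priors, means, and variances of a (hypothetical) already-fitted Gaussian Naive Bayes model are given explicitly:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical fitted parameters: a prior for each class plus an
# independent mean and variance per feature (two features here).
params = {
    0: {"prior": 0.6, "mean": np.array([0.0, 0.0]), "var": np.array([1.0, 1.0])},
    1: {"prior": 0.4, "mean": np.array([3.0, 2.0]), "var": np.array([1.5, 0.5])},
}

def sample(n):
    """Draw n labelled points from the generative model defined by params."""
    classes = list(params)
    priors = [params[c]["prior"] for c in classes]
    ys = rng.choice(classes, size=n, p=priors)            # pick a class first
    xs = np.array([rng.normal(params[c]["mean"],          # then sample every
                              np.sqrt(params[c]["var"]))  # feature independently
                   for c in ys])
    return xs, ys

X_new, y_new = sample(5)
print(X_new)
print(y_new)
```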
Missing values can simply be ignored, because the algorithm handles each input feature separately both when the model is built and when predictions are made. As a concrete example, suppose the purpose of a classifier is to predict whether a given fruit is a banana, an orange, or something else when only three characteristics are known (long, sweet, and yellow). In text classification, by contrast, the features/predictors are the frequencies of words in the document.
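A hand-rolled sketch of that fruit example (all counts are invented) shows how the class prior is multiplied by the per-feature conditional probabilities; a feature whose value is missing could simply be left out of the product:

```python
# Invented training counts per class: how many fruits of each class were
# observed in total and how many of them were long, sweet, or yellow.
counts = {
    "banana": {"total": 500, "long": 400, "sweet": 350, "yellow": 450},
    "orange": {"total": 300, "long": 0,   "sweet": 150, "yellow": 300},
    "other":  {"total": 200, "long": 100, "sweet": 150, "yellow": 50},
}
n = sum(c["total"] for c in counts.values())

def score(cls, features):
    """Unnormalised posterior: class prior times product of P(feature | class)."""
    c = counts[cls]
    p = c["total"] / n  # prior estimated as a relative frequency (maximum likelihood)
    for f in features:
        # Laplace smoothing avoids zero probabilities for unseen combinations.
        p *= (c[f] + 1) / (c["total"] + 2)
    return p

scores = {cls: score(cls, ["long", "sweet", "yellow"]) for cls in counts}
total = sum(scores.values())
for cls, s in scores.items():
    print(cls, round(s / total, 3))  # normalised class probabilities
```

Under these made-up counts the classifier assigns by far the highest probability to "banana", matching the intuition that a long, sweet, yellow fruit is most likely a banana.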
The Naive Bayes classifier is one of the simplest and most efficient classification algorithms, and it helps you build machine learning models that train quickly and make fast predictions.