In the classification process, we use two classification methods: naive Bayes and Bayesian decision. We train the classifiers with features from three color models, respectively, and then compare the resulting classifiers to determine which color system best discriminates skin from non-skin pixels in images.

In the naive Bayes method, the features are assumed conditionally independent given the class. To obtain the posterior probability \( P(w_i \mid X) \), where \( X = [x_1, x_2, x_3] \) denotes the three channels of a color model, we apply Bayes' rule \( P(w_i \mid X) = \frac{P(X \mid w_i)\, P(w_i)}{P(X)} \); under the independence assumption this reduces to \( P(w_i \mid X) \propto P(w_i) \prod_{j=1}^{3} P(x_j \mid w_i) \), which suffices for classification since \( P(X) \) is the same for both classes. Let \( w_1 \) denote skin and \( w_2 \) denote non-skin. For each pixel \( X \), we compare \( P(w_1 \mid X) \) with \( P(w_2 \mid X) \) and assign \( X \) to the class with the larger posterior.
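A minimal sketch of this decision rule follows. The source does not specify how the class-conditional likelihoods \( P(x_j \mid w_i) \) are estimated, so the sketch assumes per-channel Gaussian likelihoods fitted from labeled pixels; a histogram-based estimate would change only the fitting step.

```python
import numpy as np

def fit_gaussian_nb(X, y):
    """Fit per-class, per-channel Gaussian likelihoods P(x_j | w_i).

    X: (n, 3) pixel values in some color model; y: (n,) labels
    (1 = skin, 0 = non-skin). Returns prior, mean, variance per class.
    """
    params = {}
    for c in (0, 1):
        Xc = X[y == c]
        params[c] = (len(Xc) / len(X),       # prior P(w_i)
                     Xc.mean(axis=0),        # per-channel mean
                     Xc.var(axis=0) + 1e-6)  # per-channel variance (regularized)
    return params

def log_posterior(x, prior, mean, var):
    # log P(w_i) + sum_j log P(x_j | w_i), using the independence assumption
    return np.log(prior) - 0.5 * np.sum(np.log(2 * np.pi * var)
                                        + (x - mean) ** 2 / var)

def classify(x, params):
    # Assign x to the class with the larger posterior
    scores = {c: log_posterior(x, *params[c]) for c in params}
    return max(scores, key=scores.get)
```

Working in log space avoids underflow when multiplying small probabilities.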

Three classifiers are compared:

- 1-Nearest Neighbor classifier (1-NN)
- Back-propagation (BP) network
- Decision tree
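Of these, 1-NN is the simplest: a test sample receives the label of its single closest training sample. A minimal sketch, assuming Euclidean distance (the metric is not stated in the source):

```python
import numpy as np

def predict_1nn(X_train, y_train, x):
    """Return the label of the training sample nearest to x (Euclidean)."""
    dists = np.linalg.norm(X_train - x, axis=1)
    return y_train[np.argmin(dists)]
```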

For the training set, we plan to generate 1000 samples; for the testing set, 200 samples. We will try three different values of p to simulate different error probabilities.
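Sample generation can be sketched as below, assuming each digit is a binary LED dot matrix and each LED is flipped independently with error probability p. The 7x5 template for the digit "1" is a hypothetical example; a real generator would hold templates for all ten digits.

```python
import numpy as np

# Hypothetical 7x5 template for digit "1" (1 = LED on). Illustrative only.
DIGIT_1 = np.array([
    [0, 0, 1, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 1, 1, 0],
])

def noisy_sample(template, p, rng):
    """Flip each LED independently with error probability p."""
    flips = rng.random(template.shape) < p
    return np.bitwise_xor(template, flips.astype(template.dtype))
```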

The following aspects are addressed in the project:
- Two types of features are considered. The first type uses the raw dot status directly. The second type is based on rows and columns: for each matrix, we count the number of on-status LEDs in each row and in each column.
- Recognition of rotated digits is considered; Zernike moments are proposed as rotation-invariant features.
- The PCA method is considered for dimensionality reduction.
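The second feature type above can be sketched as follows. Assuming a 7x5 dot matrix (consistent with the 35 raw dimensions mentioned later), the row and column counts give 7 + 5 = 12 features:

```python
import numpy as np

def row_col_features(matrix):
    """Count on-status LEDs per row and per column of a binary dot matrix.

    For a 7x5 matrix this yields 7 + 5 = 12 features, versus 35 for the
    raw dot-status features.
    """
    return np.concatenate([matrix.sum(axis=1), matrix.sum(axis=0)])
```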

The results show that the BP network with 12 hidden nodes achieves the best precision, while the decision tree performs worst. Another insight is the trend of precision as the error probability p increases: the precision of every classifier decreases as p grows.

We then use the PCA method to reduce the original 35 dimensions to 10. In the figure below, we plot each pair of the first three principal dimensions; each subfigure shows the samples projected onto one such pair. From the scatter plots, some patterns can be identified easily because they form isolated clusters, while others are mixed in the space and are difficult to discriminate.
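The 35-to-10 reduction can be sketched via an SVD of the centered data; the right singular vectors are the principal axes (the source does not state which PCA implementation was used, so this is one standard way to realize it):

```python
import numpy as np

def pca_reduce(X, k=10):
    """Project samples onto the top-k principal components.

    X: (n, 35) raw feature matrix. Returns the (n, k) projected scores,
    plus the mean and components so test samples can be projected the
    same way.
    """
    mean = X.mean(axis=0)
    Xc = X - mean
    # Right singular vectors of the centered data are the principal axes,
    # ordered by explained variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:k]
    return Xc @ components.T, mean, components
```

A test sample x is then reduced with `(x - mean) @ components.T` before classification.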

The precisions of the PCA-based classifiers are shown in the figure below. The results show that the PCA-based classifiers outperform the non-PCA classifiers when the LED defect probability is low. As p increases, the BP network is the most robust, showing the best performance at high error probabilities.