Fig. 9. Illustration of the centroids extracted by FCM and BPCM.
Average and optimal performance of basic classifiers on data (entries given as Average/Optimal; columns: Mean, FCM, BPCM).
unlikely to be infected by Ko 143 and vice versa. Such patterns, however, are not observed in the centroids extracted by FCM; as shown in Table 4, FCM tends to yield intermediate centroid values that carry less decisive information for Boolean attributes.
To verify the patterns discovered by BPCM, we examined the dataset and computed the following conditional probabilities:
where x(m) denotes the mth component of x. Eq. (18) indicates that the data distribution is consistent with the patterns discovered by BPCM.
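Conditional probabilities such as those in (18) are empirical frequencies over the records. The sketch below shows one way to compute them for Boolean attributes. It assumes that records are lists with `None` marking a missing value and that records missing either attribute are skipped. The function name `cond_prob` and the toy records are hypothetical, not taken from the paper.

```python
from typing import Optional, Sequence


def cond_prob(data: Sequence[Sequence[Optional[int]]],
              m: int, a: int, n: int, b: int) -> Optional[float]:
    """Estimate P(x(m) = a | x(n) = b) from the records in `data`.

    Records in which x(m) or x(n) is missing (None) are skipped
    (an assumed convention). Returns None when no record satisfies
    the condition x(n) = b.
    """
    matches = 0  # records with x(n) = b and x(m) = a
    given = 0    # records with x(n) = b
    for x in data:
        if x[m] is None or x[n] is None:
            continue
        if x[n] == b:
            given += 1
            if x[m] == a:
                matches += 1
    return matches / given if given else None


# Hypothetical Boolean records (not the paper's dataset).
records = [
    [1, 0, 1],
    [1, 0, None],
    [0, 1, 0],
    [0, 1, 1],
    [None, 1, 0],
]
print(cond_prob(records, m=1, a=0, n=0, b=1))  # P(x(1)=0 | x(0)=1) → 1.0
```

Skipping incomplete records keeps the estimate on observed values only. An alternative would be to compute the same frequencies on the completed data, which would then reflect the estimator's choices as well.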
5.3. Classification performance by the weak learners
The performance of the weak learners was measured using a ten-fold cross-validation procedure. The dataset with the completed attributes was first partitioned into ten subsets. In each fold, nine subsets were merged and sampled into a balanced training set, and the classification result on the remaining test subset was recorded. To account for possible bias in the selection of the training samples, the entire procedure was repeated five times, giving 5 × 10 = 50 experiments in total.
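The repeated ten-fold procedure can be sketched as an index generator. The paper does not state how the balanced training set is sampled; this sketch assumes random undersampling of every class down to the size of the smallest class, with the test fold left untouched. The function name `balanced_cv_splits` and the toy labels are hypothetical.

```python
import random
from typing import Iterator, List, Sequence, Tuple


def balanced_cv_splits(labels: Sequence[int], n_folds: int = 10,
                       n_repeats: int = 5, seed: int = 0
                       ) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) pairs for repeated k-fold CV.

    In every split the training indices are balanced by randomly
    undersampling each class to the size of the smallest class
    (an assumed balancing scheme); the test fold is not resampled.
    """
    rng = random.Random(seed)
    idx = list(range(len(labels)))
    for _ in range(n_repeats):
        rng.shuffle(idx)  # new random partition for each repetition
        folds = [idx[f::n_folds] for f in range(n_folds)]
        for f in range(n_folds):
            test = folds[f]
            # Merge the other n_folds - 1 subsets into a training pool.
            pool = [i for g in range(n_folds) if g != f for i in folds[g]]
            by_class = {}
            for i in pool:
                by_class.setdefault(labels[i], []).append(i)
            size = min(len(v) for v in by_class.values())
            train = [i for v in by_class.values() for i in rng.sample(v, size)]
            yield train, list(test)


labels = [1] * 12 + [0] * 88  # hypothetical: 12 positive, 88 negative samples
splits = list(balanced_cv_splits(labels))
print(len(splits))  # 5 repeats x 10 folds → 50
```

Reshuffling before each repetition is what makes the five runs differ, so averaging over the 50 splits reduces the dependence on any single draw of training samples.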
In each test, three weak learners (K = 3) were adopted: the naive Bayes classifier with continuous variables, the k-nearest-neighbor (kNN) classifier with k = 5, and logistic regression. Since the training sets at this stage were balanced, only accuracy was used for evaluation.
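For illustration, the kNN learner with k = 5 and the accuracy metric can be written in a few lines of plain Python. This is a generic textbook kNN (Euclidean distance, majority vote), assumed here rather than taken from the paper, and the toy data are hypothetical. The naive Bayes and logistic regression learners could equally come from a library such as scikit-learn (`GaussianNB`, `LogisticRegression`).

```python
import math
from collections import Counter
from typing import Sequence


def knn_predict(train_x: Sequence[Sequence[float]], train_y: Sequence[int],
                query: Sequence[float], k: int = 5) -> int:
    """Predict the label of `query` by majority vote of its k nearest
    training samples under Euclidean distance."""
    nearest = sorted(range(len(train_x)),
                     key=lambda i: math.dist(train_x[i], query))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical, balanced two-class toy data.
train_x = [[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5],
           [5, 5], [5, 6], [6, 5], [6, 6], [5.5, 5.5]]
train_y = [0] * 5 + [1] * 5
test_x = [[0.2, 0.3], [5.8, 5.1]]
test_y = [0, 1]
preds = [knn_predict(train_x, train_y, q, k=5) for q in test_x]
print(accuracy(test_y, preds))  # → 1.0
```

With two classes and an odd k, the majority vote can never tie, which is one practical reason for choosing k = 5.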
Table 5 shows the classification results of the different weak learners on the data completed by the various missing attribute estimation approaches. Each entry in Table 5 is the average over fifty experiments. From the table, it is observed that for the naive Bayes classifier, the mean substitution approach achieves the best performance. This is mainly because the naive Bayes algorithm classifies a sample by comparing its distance to the centers of the two classes. With the clustering-based missing attribute estimation approaches, the completed data are gathered around multiple (four) centroids, which may confuse the naive Bayes classifier. Even so, the classification accuracy obtained with BPCM is comparable to that obtained with mean substitution. For kNN and logistic regression, on the other hand, the performance on the data completed by BPCM is usually higher than that on the data completed by the other two approaches. The highest classification accuracy on the class-balanced subsets was obtained by kNN on the BPCM-completed dataset.
Fig. 10. Classification performance by the fuzzy ensemble learner using various selections of H.
5.4. Classification performance by the fuzzy ensemble learning algorithm
In this subsection, the classification performance of the fuzzy ensemble learning algorithm is evaluated. Five-fold cross-validation is adopted, i.e., 80% of the samples in the dataset are used to train the classifier and the remaining 20% are used for testing. The experiment is repeated over the five folds and the average results are recorded. To select an appropriate number of rules, various choices of H were examined; the classification performance in terms of accuracy and positive-sensitivity is given in Fig. 10(a) and (b), respectively. From the figure, the following observations can be made:
1. There is a trade-off between accuracy and positive-sensitivity, so different numbers of rules should be adopted for different kinds of applications. In general, high positive-sensitivity is preferred because potential cervical cancer patients should be detected as early as possible; for this purpose, H = 4 is the most appropriate setting;