Improving the K value in clustering with the K-NN algorithm by incorporating the expectation maximization algorithm.
M Poornima
Abstract
Data stands as the cornerstone of any study, and research outcomes correlate directly with the quality of the data employed. Missing data, i.e., the absence of a value for a specific attribute in the dataset, poses a significant challenge. Researchers commonly turn to the k-nearest neighbor (KNN) method to address this issue. However, KNN has drawbacks, particularly in selecting an appropriate value for k, which can affect classification performance.

The accuracy of KNN classification is influenced by parameters such as the choice of k. When k is greater than one, majority voting among the k nearest neighbors determines the classification outcome. A k value of 1 yields tightly bound results, relying solely on the single nearest neighbor, whereas a higher k value produces more diffuse classification outcomes.

This study aims to optimize the k parameter in UN tax clustering using the Expectation Maximization (EM) algorithm. The research reports clustering results both with and without k optimization. Analysis of the clustered data shows that optimizing k with the EM algorithm improves cluster quality, reducing the error rate from 66% to 64%. Although this does not reach the highest measurement accuracy, the improvement represents a notable advance in cluster outcome quality.
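As an illustrative aside (not the paper's reported procedure), the sketch below shows one way an EM-based Gaussian mixture fit could inform the choice of k for KNN: candidate k values are scored by cross-validated majority voting, and the mixture component count with the lowest BIC is taken as a data-driven hint for k. The Iris dataset, the BIC criterion, and the "use the best component count as k" rule are all assumptions made for demonstration.

```python
# Minimal sketch, assuming scikit-learn; dataset and the k-selection rule are illustrative.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Baseline: cross-validated accuracy of KNN (majority voting among k neighbors)
# for several candidate k values.
for k in (1, 3, 5, 7, 9):
    acc = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5).mean()
    print(f"k={k}: accuracy={acc:.3f}")

# EM step: fit Gaussian mixtures with different component counts and pick the
# count with the lowest BIC; use it as a hint for k (an assumed heuristic).
bics = {n: GaussianMixture(n_components=n, random_state=0).fit(X).bic(X)
        for n in range(1, 10)}
k_hint = min(bics, key=bics.get)
print("EM-suggested k:", k_hint)
```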
Copyright
Copyright © 2024 M Poornima. This is an open access article distributed under the Creative Commons Attribution License.