The bootstrap procedure for selecting the number of principal components in PCA
Abstract
The initial step in determining the number of principal components for both classification and regression involves evaluating how much each component contributes to the total variance in the data. Based on this analysis, a subset of components that explains the highest percentage of variance is typically selected. However, multiple valid combinations may exist, and the final choice is often made manually by the researcher. This study introduces a novel yet straightforward algorithm for the automatic selection of the number of principal components. By integrating ANOVA and bootstrapping with principal component analysis (PCA), the proposed method enables automatic component selection in classification tasks. The algorithm is evaluated on three publicly available datasets with both decision tree and support vector machine (SVM) classifiers. Results indicate that this automated procedure not only eliminates researcher bias in selecting components but also improves classification accuracy. Unlike traditional methods, it selects a single optimal combination of principal components without manual intervention, offering a new and efficient approach to PCA-based model development.
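The abstract names the ingredients (PCA, ANOVA, bootstrapping) but not the exact steps, so the following is only an illustrative sketch of how such a procedure might be assembled, not the author's algorithm. It assumes that PCA is fitted once on the full data, that rows of the component scores are resampled with replacement, and that a component is kept when its one-way ANOVA F-statistic across classes exceeds a cutoff in most resamples. The function names and the parameters `f_crit`, `min_share`, and `n_boot` are hypothetical. The conventional cumulative-variance rule is included for comparison.

```python
import numpy as np

def pca_scores(X):
    """PCA via SVD of the centered data: returns scores and variance ratios."""
    Xc = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    scores = U * S                      # projections onto the principal axes
    ratios = S**2 / np.sum(S**2)        # share of total variance per component
    return scores, ratios

def n_for_variance(ratios, target=0.9):
    """Conventional rule: smallest number of components reaching `target`."""
    n = int(np.searchsorted(np.cumsum(ratios), target)) + 1
    return min(n, len(ratios))

def anova_f(scores, y):
    """One-way ANOVA F-statistic of every component across the classes in y."""
    classes = np.unique(y)
    n, k = scores.shape[0], classes.size
    if k < 2:                           # resample lost all but one class
        return np.zeros(scores.shape[1])
    grand = scores.mean(axis=0)
    ss_between = np.zeros(scores.shape[1])
    ss_within = np.zeros(scores.shape[1])
    for c in classes:
        grp = scores[y == c]
        ss_between += grp.shape[0] * (grp.mean(axis=0) - grand) ** 2
        ss_within += ((grp - grp.mean(axis=0)) ** 2).sum(axis=0)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def select_components(X, y, n_boot=200, f_crit=4.0, min_share=0.95, seed=0):
    """Keep components whose F-statistic exceeds f_crit in most resamples.

    ASSUMPTION: f_crit and min_share are illustrative cutoffs, not values
    taken from the paper; f_crit would normally come from the F-distribution
    with (k - 1, n - k) degrees of freedom.
    """
    y = np.asarray(y)
    scores, ratios = pca_scores(np.asarray(X, dtype=float))
    rng = np.random.default_rng(seed)
    n = scores.shape[0]
    hits = np.zeros(scores.shape[1])
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)        # rows drawn with replacement
        hits += anova_f(scores[idx], y[idx]) > f_crit
    keep = np.flatnonzero(hits / n_boot >= min_share)
    return keep, ratios
```

Under these assumptions, `select_components(X, y)` returns the zero-based indices of the retained components, which could then be passed to a decision tree or SVM classifier, while `n_for_variance` reproduces the manual variance-threshold choice that the abstract contrasts against.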
Keywords
ANOVA; Bootstrap; Decision trees; Principal component analysis; Support vector machines
DOI: http://doi.org/10.11591/ijict.v14i3.pp1136-1145
Copyright (c) 2025 Borislava Petrova Toleva
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
The International Journal of Informatics and Communication Technology (IJ-ICT)
p-ISSN 2252-8776, e-ISSN 2722-2616
This journal is published by the Institute of Advanced Engineering and Science (IAES) in collaboration with Intelektual Pustaka Media Utama (IPMU).