TY - JOUR
T1 - PFA-Nipals
T2 - An Unsupervised Principal Feature Selection Based on Nonlinear Estimation by Iterative Partial Least Squares
AU - Castillo-Ibarra, Emilio
AU - Alsina, Marco A.
AU - Astudillo, Cesar A.
AU - Fuenzalida-Henríquez, Ignacio
N1 - Publisher Copyright: © 2023 by the authors.
PY - 2023/10
Y1 - 2023/10
N2 - Unsupervised feature selection (UFS) has received great interest in various areas of research that require dimensionality reduction, including machine learning, data mining, and statistical analysis. However, UFS algorithms are known to perform poorly on datasets with missing data, exhibiting a significant computational load and learning bias. In this work, we propose a novel and robust UFS method, designated PFA-Nipals, that works with missing data without the need for deletion or imputation. This is achieved by considering an iterative nonlinear estimation of principal components by partial least squares, while the relevant features are selected through minibatch K-means clustering. The proposed method is successfully applied to select the relevant features of a robust health dataset with missing data, outperforming other UFS methods in terms of computational load and learning bias. Furthermore, the proposed method is capable of finding a consistent set of relevant features without biasing the explained variability, even under increasing missing data. Finally, it is expected that the proposed method could be used in several areas, such as machine learning and big data with applications in different areas of the medical and engineering sciences.
KW - Nipals
KW - clustering
KW - interpretability
KW - missing data
KW - unsupervised feature selection
UR - http://www.scopus.com/inward/record.url?scp=85176395196&partnerID=8YFLogxK
U2 - 10.3390/math11194154
DO - 10.3390/math11194154
M3 - Article
AN - SCOPUS:85176395196
SN - 2227-7390
VL - 11
JO - Mathematics
JF - Mathematics
IS - 19
M1 - 4154
ER -