PFA-Nipals: An Unsupervised Principal Feature Selection Based on Nonlinear Estimation by Iterative Partial Least Squares

Emilio Castillo-Ibarra; Marco A. Alsina; Cesar A. Astudillo; Ignacio Fuenzalida-Henríquez

doi:10.3390/math11194154

Emilio Castillo-Ibarra, Marco A. Alsina, Cesar A. Astudillo, Ignacio Fuenzalida-Henríquez^*

^*Corresponding author for this work

Universidad de Talca

Research output: Contribution to journal › Article › peer-review

1 Citation (Scopus)

Abstract

Unsupervised feature selection (UFS) has received great interest in various areas of research that require dimensionality reduction, including machine learning, data mining, and statistical analysis. However, UFS algorithms are known to perform poorly on datasets with missing data, exhibiting a significant computational load and learning bias. In this work, we propose a novel and robust UFS method, designated PFA-Nipals, that works with missing data without the need for deletion or imputation. This is achieved by considering an iterative nonlinear estimation of principal components by partial least squares, while the relevant features are selected through minibatch K-means clustering. The proposed method is successfully applied to select the relevant features of a robust health dataset with missing data, outperforming other UFS methods in terms of computational load and learning bias. Furthermore, the proposed method is capable of finding a consistent set of relevant features without biasing the explained variability, even under increasing missing data. Finally, it is expected that the proposed method could be used in several areas, such as machine learning and big data with applications in different areas of the medical and engineering sciences.
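The abstract's first step, estimating principal components from data with gaps without deleting or imputing anything, can be illustrated with a minimal NumPy sketch of the classical NIPALS iteration. This is not the authors' implementation: the function name `nipals_pca_missing`, its parameters, and the toy data are assumptions made for illustration, and the feature-selection step (clustering the loadings with minibatch K-means) is not shown. Each regression step simply skips cells marked as `NaN`.

```python
import numpy as np


def nipals_pca_missing(X, n_components, max_iter=500, tol=1e-8):
    """Estimate principal components of X (NaN = missing) with NIPALS.

    Hypothetical helper: missing cells are skipped in every regression
    step instead of being deleted or imputed.
    Returns (scores T, loadings P, column means).
    """
    X = np.asarray(X, dtype=float)
    mask = ~np.isnan(X)                    # True where a value is observed
    means = np.nanmean(X, axis=0)          # column means over observed cells
    R = np.where(mask, X - means, 0.0)     # centered residual, 0 at the gaps
    n, p = X.shape
    T = np.zeros((n, n_components))
    P = np.zeros((p, n_components))
    for k in range(n_components):
        # Start from the residual column with the largest sum of squares.
        t = R[:, np.argmax((R ** 2).sum(axis=0))].copy()
        for _ in range(max_iter):
            # Loadings: regress each column on t over its observed rows.
            denom_p = mask.T @ (t ** 2)
            p_k = (R.T @ t) / np.where(denom_p > 0, denom_p, 1.0)
            p_k /= np.linalg.norm(p_k)
            # Scores: regress each row on p_k over its observed columns.
            denom_t = mask @ (p_k ** 2)
            t_new = (R @ p_k) / np.where(denom_t > 0, denom_t, 1.0)
            converged = np.linalg.norm(t_new - t) <= tol * np.linalg.norm(t_new)
            t = t_new
            if converged:
                break
        T[:, k], P[:, k] = t, p_k
        # Deflate the residual on observed cells only.
        R = np.where(mask, R - np.outer(t, p_k), 0.0)
    return T, P, means


if __name__ == "__main__":
    # Toy data (made up): 6 samples, 3 features, two missing cells.
    X = np.array([
        [1.0, 2.0, 0.5],
        [2.0, np.nan, 1.1],
        [3.0, 6.1, 1.4],
        [4.0, 8.2, np.nan],
        [5.0, 9.9, 2.6],
        [6.0, 12.1, 2.9],
    ])
    T, P, means = nipals_pca_missing(X, n_components=2)
    print("loadings:\n", P)
```

With complete data this iteration converges to the ordinary PCA loadings up to sign. With missing cells the components are no longer guaranteed to be exactly orthogonal, which is a known property of this variant of NIPALS.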

Original language	English
Article number	4154
Journal	Mathematics
Volume	11
Issue number	19
DOI	https://doi.org/10.3390/math11194154
Publication status	Published - 2023

Bibliographical note

Publisher Copyright:
© 2023 by the authors.

ASJC Scopus subject areas

Computer Science (miscellaneous)
General Mathematics
Engineering (miscellaneous)

Access to document

10.3390/math11194154

Other files and links

Link to publication in Scopus

Cite this

@article{cb29883b044b4743963ae50642b2f276,

title = "PFA-Nipals: An Unsupervised Principal Feature Selection Based on Nonlinear Estimation by Iterative Partial Least Squares",

abstract = "Unsupervised feature selection (UFS) has received great interest in various areas of research that require dimensionality reduction, including machine learning, data mining, and statistical analysis. However, UFS algorithms are known to perform poorly on datasets with missing data, exhibiting a significant computational load and learning bias. In this work, we propose a novel and robust UFS method, designated PFA-Nipals, that works with missing data without the need for deletion or imputation. This is achieved by considering an iterative nonlinear estimation of principal components by partial least squares, while the relevant features are selected through minibatch K-means clustering. The proposed method is successfully applied to select the relevant features of a robust health dataset with missing data, outperforming other UFS methods in terms of computational load and learning bias. Furthermore, the proposed method is capable of finding a consistent set of relevant features without biasing the explained variability, even under increasing missing data. Finally, it is expected that the proposed method could be used in several areas, such as machine learning and big data with applications in different areas of the medical and engineering sciences.",

keywords = "Nipals, clustering, interpretability, missing data, unsupervised feature selection",

author = "Castillo-Ibarra, Emilio and Alsina, {Marco A.} and Astudillo, {Cesar A.} and Fuenzalida-Henr{\'i}quez, Ignacio",

note = "Publisher Copyright: {\textcopyright} 2023 by the authors.",

year = "2023",

month = oct,

doi = "10.3390/math11194154",

language = "English",

volume = "11",

journal = "Mathematics",

issn = "2227-7390",

publisher = "MDPI AG",

number = "19",

}

TY - JOUR

T1 - PFA-Nipals: An Unsupervised Principal Feature Selection Based on Nonlinear Estimation by Iterative Partial Least Squares

AU - Castillo-Ibarra, Emilio

AU - Alsina, Marco A.

AU - Astudillo, Cesar A.

AU - Fuenzalida-Henríquez, Ignacio

PY - 2023/10

Y1 - 2023/10

N2 - Unsupervised feature selection (UFS) has received great interest in various areas of research that require dimensionality reduction, including machine learning, data mining, and statistical analysis. However, UFS algorithms are known to perform poorly on datasets with missing data, exhibiting a significant computational load and learning bias. In this work, we propose a novel and robust UFS method, designated PFA-Nipals, that works with missing data without the need for deletion or imputation. This is achieved by considering an iterative nonlinear estimation of principal components by partial least squares, while the relevant features are selected through minibatch K-means clustering. The proposed method is successfully applied to select the relevant features of a robust health dataset with missing data, outperforming other UFS methods in terms of computational load and learning bias. Furthermore, the proposed method is capable of finding a consistent set of relevant features without biasing the explained variability, even under increasing missing data. Finally, it is expected that the proposed method could be used in several areas, such as machine learning and big data with applications in different areas of the medical and engineering sciences.

KW - Nipals

KW - clustering

KW - interpretability

KW - missing data

KW - unsupervised feature selection

UR - http://www.scopus.com/inward/record.url?scp=85176395196&partnerID=8YFLogxK

U2 - 10.3390/math11194154

DO - 10.3390/math11194154

M3 - Article

AN - SCOPUS:85176395196

SN - 2227-7390

VL - 11

JO - Mathematics

JF - Mathematics

IS - 19

M1 - 4154

ER -
