Unsupervised approach for an optimal representation of the latent space of a failure analysis dataset

Failure analysis in microelectronics production is an important step in improving product quality and development: understanding the failure mechanisms, and therefore implementing corrective actions on the cause of a failure, depends on the results of this analysis. These results are stored as textual features, so the data must first be preprocessed and vectorized (converted to numeric form). Then, to overcome the curse of dimensionality caused by vectorization, dimension reduction is applied. A two-stage approach, variable selection followed by feature extraction, is used to reduce the high dimensionality of the feature space. We first study the potential of an unsupervised variable selection technique, the genetic algorithm (GA), to identify the variables that best discriminate groups of textual data in terms of separation and compactness. The genetic algorithm uses, as its fitness function, a combination of K-means or Gaussian Mixture Model clustering and validity indices; optimizing this function improves both compactness and class separation. The second part of the work examines the feasibility of applying a feature extraction technique. The adopted methodology is a deep learning algorithm based on a variational autoencoder (VAE) for latent space disentanglement, with a Gaussian Mixture Model clustering the latent space to identify clusters. The final objective of this paper is to propose a new methodology, called VAE-GA, that combines a variational autoencoder for latent space disentanglement with a genetic algorithm to find, in an unsupervised way, the latent variables that best discriminate clusters of failure analysis data.
Experiments on textual failure analysis datasets demonstrate the effectiveness of the proposed VAE-GA method, which discriminates textual classes better than GA or VAE used separately, the combination of PCA with GA (PCA-GA), or a simple autoencoder with GA (AE-GA).
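
To make the variable selection stage more concrete, the sketch below shows one way a genetic algorithm can search over feature subsets using a clustering validity index as its fitness. It is an illustration under stated assumptions, not the authors' implementation: the silhouette width stands in for the unspecified validity indices, a tiny hand-written K-means replaces a library clusterer, and all function names and parameters (`kmeans`, `silhouette`, `fitness`, `select_features`, `pop_size`, `p_mut`) are hypothetical. It expects numeric rows with at least two features, such as vectorized text or VAE latent codes.

```python
import math
import random


def kmeans(points, k, rng, iters=20):
    """Tiny Lloyd's K-means; returns one cluster label per point."""
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous center if a cluster empties
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def silhouette(points, labels):
    """Mean silhouette width: high for compact, well-separated clusters."""
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    if len(clusters) < 2:
        return -1.0
    total = 0.0
    for i, p in enumerate(points):
        own = clusters[labels[i]]
        if len(own) == 1:
            continue  # a singleton contributes a silhouette of 0
        a = sum(math.dist(p, points[j]) for j in own if j != i) / (len(own) - 1)
        b = min(sum(math.dist(p, points[j]) for j in idx) / len(idx)
                for lab, idx in clusters.items() if lab != labels[i])
        total += (b - a) / max(a, b) if max(a, b) > 0 else 0.0
    return total / len(points)


def fitness(mask, data, k, seed=0):
    """Cluster the data restricted to the selected columns and score it."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return -1.0  # an empty selection is the worst possible candidate
    sub = [[row[j] for j in cols] for row in data]
    return silhouette(sub, kmeans(sub, k, random.Random(seed)))


def select_features(data, k=2, pop_size=12, generations=15, p_mut=0.1, seed=0):
    """GA over binary feature masks; returns (best_mask, best_fitness)."""
    rng = random.Random(seed)
    n = len(data[0])
    cache = {}

    def score(mask):
        if mask not in cache:
            cache[mask] = fitness(mask, data, k)
        return cache[mask]

    # Start from the all-features mask plus random masks.
    pop = [tuple([1] * n)]
    pop += [tuple(rng.randint(0, 1) for _ in range(n)) for _ in range(pop_size - 1)]
    for _ in range(generations):
        ranked = sorted(pop, key=score, reverse=True)
        nxt = ranked[:2]  # elitism: the two best masks survive unchanged
        while len(nxt) < pop_size:
            pa = max(rng.sample(ranked, 3), key=score)  # tournament selection
            pb = max(rng.sample(ranked, 3), key=score)
            cut = rng.randrange(1, n)                   # single-point crossover
            child = pa[:cut] + pb[cut:]
            child = tuple(b ^ 1 if rng.random() < p_mut else b for b in child)
            nxt.append(child)
        pop = nxt
    best = max(pop, key=score)
    return best, score(best)
```

Because the two best masks are carried over unchanged in every generation and the initial population contains the all-features mask, the returned fitness can never be worse than clustering on all variables. Applied to latent codes produced by an autoencoder, the same search would select latent variables rather than raw features.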

Keywords

Variational auto-encoder; Natural language processing; Artificial intelligence; Unsupervised variable selection; Failure analysis; Genetic algorithm; Feature extraction

Domains

Engineering Sciences [physics]; Mathematics [math]

Main file

Unsupervised approach for an optimal representation of the latent space of a failure analysis dataset.pdf (1.76 MB)

Origin	Files produced by the author(s)
License	Attribution

Contributor: Florent Breuil

https://hal-emse.ccsd.cnrs.fr/emse-04243857

Submitted on: Monday, October 16, 2023, 13:18:55

Last modified on: Thursday, November 21, 2024, 15:26:33

Long-term archiving on: Wednesday, January 17, 2024, 19:58:18

Dates and versions

emse-04243857, version 1 (16-10-2023)

Identifiers

HAL Id: emse-04243857, version 1
DOI: 10.1007/s11227-023-05634-0

Cite

Abbas Rammal, Kenneth Ezukwoke, Anis Hoayek, Mireille Batton-Hubert. Unsupervised approach for an optimal representation of the latent space of a failure analysis dataset. Journal of Supercomputing, 2024, ⟨10.1007/s11227-023-05634-0⟩. ⟨emse-04243857⟩

Collections

EMSE PRES_CLERMONT CNRS FAYOL-ENSMSE LIMOS DEMO-ENSMSE CLERMONT-AUVERGNE-INP INSTITUT-MINES-TELECOM
