A Lagrangian-based score for assessing the quality of pairwise constraints in semi-supervised clustering

Rodrigo Randel, Daniel Aloise, Simon J. Blanchard and Alain Hertz

Journal article (2021)

Open-access document in PolyPublie
Open access to the full text of this document
Final version before publication
Terms of use: All rights reserved

Abstract

Clustering algorithms help identify homogeneous subgroups from data. In some cases, additional information about the relationship among some subsets of the data exists. When using a semi-supervised clustering algorithm, an expert may provide additional information to constrain the solution based on that knowledge and, in doing so, guide the algorithm to a more useful and meaningful solution. Such additional information often takes the form of a cannot-link constraint (i.e., two data points cannot be part of the same cluster) or a must-link constraint (i.e., two data points must be part of the same cluster). A key challenge for users of such constraints in semi-supervised learning algorithms, however, is that the addition of inaccurate or conflicting constraints can decrease accuracy and little is known about how to detect whether expert-imposed constraints are likely incorrect. In the present work, we introduce a method to score each must-link and cannot-link pairwise constraint as likely incorrect. Using synthetic experimental examples and real data, we show that the resulting impact score can successfully identify individual constraints that should be removed or revised.
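To make the two constraint types in the abstract concrete, the following is a minimal Python sketch that checks which must-link and cannot-link pairs a given cluster assignment violates. It is not the Lagrangian-based impact score proposed in the article; the function name `violated_constraints`, the index-pair representation of constraints, and the toy labels are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

# A pairwise constraint is represented here as a pair of data-point indices
# (an illustrative convention, not taken from the article).
Pair = Tuple[int, int]


def violated_constraints(
    labels: Sequence[int],
    must_link: List[Pair],
    cannot_link: List[Pair],
) -> Tuple[List[Pair], List[Pair]]:
    """Return the must-link and cannot-link pairs broken by a cluster assignment."""
    # A must-link pair is violated when its two points land in different clusters.
    broken_ml = [(i, j) for i, j in must_link if labels[i] != labels[j]]
    # A cannot-link pair is violated when its two points share a cluster.
    broken_cl = [(i, j) for i, j in cannot_link if labels[i] == labels[j]]
    return broken_ml, broken_cl


# Toy assignment of five points to two clusters (hypothetical data).
labels = [0, 0, 1, 1, 1]
must_link = [(0, 1), (1, 2)]
cannot_link = [(0, 4), (2, 3)]

broken_ml, broken_cl = violated_constraints(labels, must_link, cannot_link)
print(broken_ml)  # [(1, 2)]
print(broken_cl)  # [(2, 3)]
```

A violation count of this kind only reports which constraints disagree with one particular clustering; it does not by itself indicate whether the constraint or the clustering is at fault, which is the question the article's score is meant to address.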

Keywords

Clustering, Semi-supervised, Pairwise constraints, Constraint selection, Lagrangian duality

Subject(s): 2700 Information technology > 2706 Software engineering
2700 Information technology > 2713 Algorithms
Department: Department of Computer Engineering and Software Engineering
Department of Mathematics and Industrial Engineering
Research center: GERAD - Groupe d'études et de recherche en analyse des décisions
Funding agencies: CRSNG / NSERC
Grant number: 2017-05617, 2017-05688
PolyPublie URL: https://publications.polymtl.ca/10831/
Journal title: Data Mining and Knowledge Discovery (vol. 35, no. 6)
Publisher: Springer Nature
DOI: 10.1007/s10618-021-00794-0
Official URL: https://doi.org/10.1007/s10618-021-00794-0
Deposit date: March 14, 2023, 11:41
Last modified: April 6, 2024, 07:27
Cite in APA 7: Randel, R., Aloise, D., Blanchard, S. J., & Hertz, A. (2021). A Lagrangian-based score for assessing the quality of pairwise constraints in semi-supervised clustering. Data Mining and Knowledge Discovery, 35(6), 2341-2368. https://doi.org/10.1007/s10618-021-00794-0