<  Back to the Polytechnique Montréal portal

A Lagrangian-based score for assessing the quality of pairwise constraints in semi-supervised clustering

Rodrigo Randel, Daniel Aloise, Simon J. Blanchard, Alain Hertz

Article (2021)

Open Access document in PolyPublie
[img]
Preview
Open Access to the full text of this document
Accepted Version
Terms of Use: All rights reserved
Download (1MB)
Show abstract
Hide abstract

Abstract

Clustering algorithms help identify homogeneous subgroups from data. In some cases, additional information about the relationship among some subsets of the data exists. When using a semi-supervised clustering algorithm, an expert may provide additional information to constrain the solution based on that knowledge and, in doing so, guide the algorithm to a more useful and meaningful solution. Such additional information often takes the form of a cannot-link constraint (i.e., two data points cannot be part of the same cluster) or a must-link constraint (i.e., two data points must be part of the same cluster). A key challenge for users of such constraints in semi-supervised learning algorithms, however, is that the addition of inaccurate or conflicting constraints can decrease accuracy and little is known about how to detect whether expert-imposed constraints are likely incorrect. In the present work, we introduce a method to score each must-link and cannot-link pairwise constraint as likely incorrect. Using synthetic experimental examples and real data, we show that the resulting impact score can successfully identify individual constraints that should be removed or revised.

Uncontrolled Keywords

Clustering, Semi-supervised, Pairwise constraints, Constraint selection, Lagrangian duality

Subjects: 2700 Information technology > 2706 Software engineering
2700 Information technology > 2713 Algorithms
Department: Department of Computer Engineering and Software Engineering
Department of Mathematics and Industrial Engineering
Research Center: GERAD - Research Group in Decision Analysis
Funders: GRSNG / NSERC
Grant number: 2017-05617, 2017-05688
PolyPublie URL: https://publications.polymtl.ca/10831/
Journal Title: Data Mining and Knowledge Discovery (vol. 35, no. 6)
Publisher: Springer Nature
DOI: 10.1007/s10618-021-00794-0
Official URL: https://doi.org/10.1007/s10618-021-00794-0
Date Deposited: 14 Mar 2023 11:41
Last Modified: 15 May 2023 06:21
Cite in APA 7: Randel, R., Aloise, D., Blanchard, S. J., & Hertz, A. (2021). A Lagrangian-based score for assessing the quality of pairwise constraints in semi-supervised clustering. Data Mining and Knowledge Discovery, 35(6), 2341-2368. https://doi.org/10.1007/s10618-021-00794-0

Statistics

Total downloads

Downloads per month in the last year

Origin of downloads

Dimensions

Repository Staff Only

View Item View Item