
An insight into racial bias in dermoscopy repositories: A HAM10000 data set analysis

Andrés Morales-Forero, Lili Rueda Jaime, Sebastian Ramiro Gil‐Quiñones, Marlon Yesid Barrera Montañez, Samuel Bassetto and Éric Coatanéa

Journal article (2024)

Open-access document in PolyPublie and at the official publisher
Open access to the full text of this document
Publisher's official version
Terms of use: Creative Commons: Attribution-NonCommercial-NoDerivatives (CC BY-NC-ND)

Abstract

Background Studies have revealed a lack of representation of patients with skin of colour in academic sources on dermatologic diseases, including databases. This visual racism has left specialists less comfortable and less confident in caring for this group, and has reduced these patients' chances of being correctly diagnosed.

Objectives To investigate and uncover potential racial biases in the HAM10000 data set through an exploratory analysis of the representation of dark skin tones, the identification of inaccuracies in its documentation, the recognition of relevant skin conditions that are absent for darker skin and the lack of ethnic diversity variables crucial for validating diagnosis across different skin tones.

Methods An exploratory examination was conducted to investigate the occurrence of dark skin within the HAM10000 database (housed in a Harvard Dataverse repository), which consists of 10,015 dermoscopic images of skin lesions. A visual depiction of the full range of skin tones was generated by sampling four crucial data points from each image and applying the Gray World Algorithm for colour normalization. To confirm the accuracy of the graphical representation, dermatologists validated the pixel sampling process by analysing a randomly selected 10% of the images for each type of skin lesion. This visual representation was produced for the entire data set as well as for each skin lesion type. The study was further enhanced by comparing the skin lesion representation within the HAM10000 data set against documented prevalences of relevant conditions affecting dark skin.
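
As a rough illustration of the colour normalization and pixel sampling mentioned in the Methods, the following minimal Python sketch applies a basic Gray World correction and then samples four points per image. It is not the authors' implementation. The abstract does not say where the four points are taken, so the near-corner positions used here, the `margin` parameter, and the function names `gray_world`, `sample_four_points` and `mean_colour` are all assumptions made for this example. Images are represented as plain nested lists of (r, g, b) tuples to keep the sketch free of external dependencies.

```python
from statistics import mean


def gray_world(image):
    """Basic Gray World normalization.

    `image` is a list of rows, each row a list of (r, g, b) values in 0-255.
    Each channel is rescaled so that its mean equals the mean of the three
    channel means, i.e. the average colour of the image becomes gray.
    """
    pixels = [px for row in image for px in row]
    channel_means = [mean(px[c] for px in pixels) for c in range(3)]
    gray = mean(channel_means)
    # One gain per channel; a channel whose mean is zero is left unchanged.
    gains = [gray / m if m > 0 else 1.0 for m in channel_means]
    return [
        [tuple(min(255.0, px[c] * gains[c]) for c in range(3)) for px in row]
        for row in image
    ]


def sample_four_points(image, margin=0.1):
    """Return four pixels taken near the image corners.

    ASSUMPTION: the sampling positions are hypothetical. They are inset from
    each corner by `margin` (a fraction of the image size), on the premise
    that the lesion is roughly centred and the corners show surrounding skin.
    """
    height, width = len(image), len(image[0])
    dy, dx = int(height * margin), int(width * margin)
    rows = (dy, height - 1 - dy)
    cols = (dx, width - 1 - dx)
    return [image[r][c] for r in rows for c in cols]


def mean_colour(points):
    """Average a list of (r, g, b) samples into one representative colour."""
    return tuple(mean(p[c] for p in points) for c in range(3))


# Small synthetic example: a uniform image with a reddish-brown cast.
demo = [[(200, 100, 50) for _ in range(4)] for _ in range(4)]
balanced = gray_world(demo)
skin_estimate = mean_colour(sample_four_points(balanced))
print(skin_estimate)
```

The usage lines at the end show one limitation of this kind of correction: on a uniformly tinted image the three channels are pulled to the same value, so the normalization alone can alter the apparent skin tone.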

Results Less than 5% of the images came from dark-skinned patients. Nevertheless, in about 4.9% of cases, our pixel sampling method might inadvertently capture shadows or dark spots resulting from the imaging device or the lesion itself rather than the individual's actual skin tone. In addition, there are inaccuracies in the data set's claims of diversity and comprehensive coverage, notably the underrepresentation of conditions prevalent in darker skin and the absence of ethnic diversity variables.

Conclusions Visual racism is an issue that needs to be addressed in medical sources of information and education. Image databases and artificial intelligence models need to be supplied with information covering all skin types to guarantee equal access to opportunities. Furthermore, any instances where conditions affecting people of colour are underrepresented must be meticulously documented and reported to highlight and address these disparities effectively. This is particularly important in dermoscopy imaging, where relying solely on image-based racial bias analysis has limitations: the dermatoscope's lighting alters the patient's actual skin tone, which complicates the accurate assessment of racial bias.

Keywords

dermatology data sets; diagnostic equity; fairness‐aware machine learning; inclusive artificial intelligence; racial bias

Subject(s): 1900 Biomedical engineering > 1900 Biomedical engineering
2950 Applied mathematics > 2950 Applied mathematics
Department: Department of Mathematics and Industrial Engineering
PolyPublie URL: https://publications.polymtl.ca/58718/
Journal title: JEADV Clinical Practice
Publisher: Wiley
DOI: 10.1002/jvc2.477
Official URL: https://doi.org/10.1002/jvc2.477
Deposit date: 17 Jul. 2024 10:12
Last modified: 18 Jul. 2024 07:10
Cite in APA 7: Morales-Forero, A., Jaime, L. R., Gil‐Quiñones, S. R., Montañez, M. Y. B., Bassetto, S., & Coatanéa, É. (2024). An insight into racial bias in dermoscopy repositories: A HAM10000 data set analysis. JEADV Clinical Practice, 477 (8 pages). https://doi.org/10.1002/jvc2.477
