An insight into racial bias in dermoscopy repositories: A HAM10000 data set analysis

Andrés Morales-Forero, Lili Rueda Jaime, Sebastian Ramiro Gil‐Quiñones, Marlon Yesid Barrera Montañez, Samuel Bassetto and Éric Coatanéa

Article (2024)

Open Access document in PolyPublie and at the official publisher
Published Version
Terms of Use: Creative Commons Attribution Non-commercial No Derivatives

Abstract

Background Studies have revealed a lack of representation of patients with skin of colour in academic sources on dermatologic diseases, including databases. This visual racism has left specialists less comfortable and less confident in caring for these patients, and has reduced the patients' chances of being correctly diagnosed.

Objectives To investigate and uncover potential racial biases in the HAM10000 data set through an exploratory analysis of how dark skin tones are represented, the identification of inaccuracies in its documentation, and the recognition of two gaps: relevant skin conditions that are absent for darker skin, and missing ethnic diversity variables that are crucial for validating diagnosis across different skin tones.

Methods An exploratory examination was conducted to investigate the occurrence of dark skin within the HAM10000 database (housed in a Harvard Dataverse repository), which consists of 10,015 dermoscopic images of skin lesions. A visual depiction of the full range of skin tones was generated by sampling four key data points from each image and applying the Gray World Algorithm for colour normalization. To confirm the accuracy of the graphical representation, dermatologists validated the pixel sampling process by analysing a randomly selected 10% of the images for each type of skin lesion. This visual representation was produced for the entire data set as well as for each skin lesion type. The study was further enhanced by comparing the skin lesion representation within the HAM10000 data set against documented prevalences of relevant conditions affecting dark skin.
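As an illustration of the colour normalization step named in the Methods, the following is a minimal sketch of a gray-world correction followed by a four-point pixel sample. It is not the authors' implementation. The abstract does not say where the four points are taken, so the corner positions used by the hypothetical `sample_four_points` helper are an assumption, as are the function names and the plain nested-list image format (a real pipeline would more likely use an array library).

```python
from statistics import mean

# An image is a list of rows; each pixel is an (R, G, B) tuple in 0..255.
Pixel = tuple[float, float, float]
Image = list[list[Pixel]]


def gray_world(image: Image) -> Image:
    """Scale each channel so that its mean equals the mean of all channels.

    This is the standard gray-world assumption: the average colour of a
    scene is taken to be a neutral gray, and any deviation is treated as
    a colour cast from the illumination.
    """
    pixels = [px for row in image for px in row]
    channel_means = [mean(px[c] for px in pixels) for c in range(3)]
    gray = mean(channel_means)
    # Guard against an all-zero channel to avoid dividing by zero.
    gains = [gray / m if m > 0 else 1.0 for m in channel_means]
    return [
        [tuple(min(255.0, px[c] * gains[c]) for c in range(3)) for px in row]
        for row in image
    ]


def sample_four_points(image: Image, margin: int = 0) -> list[Pixel]:
    """Return the pixels at four positions near the image corners.

    ASSUMPTION: corner sampling is chosen here only because the lesion is
    usually centred in a dermoscopic image; the source does not specify
    the sampling positions.
    """
    h, w = len(image), len(image[0])
    coords = [
        (margin, margin),
        (margin, w - 1 - margin),
        (h - 1 - margin, margin),
        (h - 1 - margin, w - 1 - margin),
    ]
    return [image[r][c] for r, c in coords]


if __name__ == "__main__":
    # A uniformly orange-tinted 4x4 test image.
    tinted = [[(200.0, 100.0, 50.0)] * 4 for _ in range(4)]
    balanced = gray_world(tinted)
    print(sample_four_points(balanced))
```

On a uniformly tinted image such as the one in the usage example, the correction maps every pixel to a neutral gray, which shows why the method removes a global colour cast but cannot by itself distinguish a dark skin tone from a shadow cast by the imaging device.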

Results Less than 5% of the images came from dark-skinned patients. Nevertheless, in about 4.9% of cases, our pixel sampling method might inadvertently capture shadows or dark spots resulting from the imaging device or the lesion itself rather than the individual's actual skin tone. In addition, there are inaccuracies in the data set's claims of diversity and comprehensive coverage, notably the underrepresentation of conditions prevalent in darker skin and the absence of ethnic diversity variables.

Conclusions Visual racism is an issue that needs to be addressed in medical sources of information and education. Image databases and artificial intelligence models need to be supplied with information covering all skin types to guarantee equal access to opportunities. Furthermore, any instances where conditions affecting people of colour are underrepresented must be meticulously documented and reported to highlight and address these disparities effectively. This is particularly important in dermoscopy imaging, where relying solely on image-based racial bias analysis has limitations: the dermatoscope's lighting alters the patient's actual skin tone in the image, which complicates the accurate assessment of racial bias.

Uncontrolled Keywords

dermatology data sets; diagnostic equity; fairness‐aware machine learning; inclusive artificial intelligence; racial bias

Subjects: 1900 Biomedical engineering > 1900 Biomedical engineering
2950 Applied mathematics > 2950 Applied mathematics
Department: Department of Mathematics and Industrial Engineering
PolyPublie URL: https://publications.polymtl.ca/58718/
Journal Title: JEADV Clinical Practice
Publisher: Wiley
DOI: 10.1002/jvc2.477
Official URL: https://doi.org/10.1002/jvc2.477
Date Deposited: 17 Jul 2024 10:12
Last Modified: 18 Jul 2024 07:10
Cite in APA 7: Morales-Forero, A., Jaime, L. R., Gil‐Quiñones, S. R., Montañez, M. Y. B., Bassetto, S., & Coatanéa, É. (2024). An insight into racial bias in dermoscopy repositories: A HAM10000 data set analysis. JEADV Clinical Practice, 477 (8 pages). https://doi.org/10.1002/jvc2.477
