
Automatic derivation of concepts based on the analysis of identifiers

Latifa Guerrouj

Technical report (2010)

Open-access document in PolyPublie and at the official publisher
Full text of this document: open access
Version: official publisher's version
Terms of use: All rights reserved

Abstract

The existing software engineering literature has empirically shown that a proper choice of identifiers influences software understandability and maintainability. Indeed, identifiers are developers' main up-to-date source of information, and they guide developers' cognitive processes during program understanding when high-level documentation is scarce or outdated and the source code is insufficiently commented. Using high-level and domain concepts to derive domain terms from identifiers is not an easy task when naming conventions (e.g., camelCase) are not used or not strictly followed, and/or when the words composing identifiers have been abbreviated or otherwise transformed. Our thesis aims to develop a contextual approach that overcomes the shortcomings of existing approaches and maps identifiers to domain concepts even in the absence of naming conventions and/or in the presence of abbreviations. We also aim to use our approach to improve the prediction of overall system quality by taking identifiers into account when assessing software quality. The key components of our approach are: the dynamic time warping (DTW) algorithm, used to recognize words in continuous speech; the string-edit distance between terms and words, as a proxy for the distance between the terms and the concepts they represent; and word transformation rules that attempt to mimic the cognitive processes of developers when they compose identifiers with abbreviated forms. To validate our approach, we apply it to identifiers extracted from several open-source applications to show that it can map identifiers to domain terms, and we compare it, with respect to an oracle that we built manually, against the two families of approaches that, to the best of our knowledge, exist in the literature. We also enrich our technique with domain knowledge and context-aware dictionaries to analyze how sensitive its performance is to the use of contextual information and specialized knowledge.
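The core idea of combining dynamic programming with string-edit distance to split an identifier into dictionary words can be illustrated with a minimal sketch. This is not the report's algorithm: it assumes a small hypothetical dictionary, replaces DTW with a plain dynamic program over candidate split points, uses Levenshtein distance as the term-to-word distance, and omits the word transformation rules and context-aware dictionaries. The function names `edit_distance` and `split_identifier` are illustrative only.

```python
from __future__ import annotations

from typing import Sequence


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: unit cost for insertion, deletion and substitution."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # match (cost 0) or substitute (cost 1)
            ))
        prev = curr
    return prev[-1]


def split_identifier(
    identifier: str, dictionary: Sequence[str]
) -> tuple[float, list[tuple[str, str]]]:
    """Segment `identifier` and map each segment to its closest dictionary word,
    choosing the segmentation with the lowest total edit distance.

    Hypothetical helper for illustration; `dictionary` is an assumed word list.
    """
    if not dictionary:
        raise ValueError("dictionary must not be empty")
    s = identifier.lower()  # case is ignored, so camelCase and run-together names are treated alike
    n = len(s)
    inf = float("inf")
    # best[i] holds (cost, mapping) for the cheapest segmentation of the prefix s[:i].
    best: list[tuple[float, list[tuple[str, str]]]] = [(0, [])] + [(inf, [])] * n
    for i in range(1, n + 1):
        for j in range(i):
            prefix_cost, prefix_map = best[j]
            if prefix_cost == inf:
                continue
            seg = s[j:i]
            # Closest dictionary word for this candidate segment (first one wins on ties).
            word, dist = min(
                ((w, edit_distance(seg, w)) for w in dictionary),
                key=lambda pair: pair[1],
            )
            if prefix_cost + dist < best[i][0]:
                best[i] = (prefix_cost + dist, prefix_map + [(seg, word)])
    return best[n]


cost, mapping = split_identifier("getUsrName", ["get", "user", "name"])
print(cost, mapping)
```

Here the abbreviation `usr` is mapped to `user` at a cost of one insertion, while `get` and `name` match exactly. The sketch tries every split point and every dictionary word, so it is quadratic in identifier length times dictionary size; it is meant only to show how an edit distance can act as a proxy for the distance between a term and the word it abbreviates.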

Keywords

Identifier Splitting, Program Comprehension, Linguistic Analysis, Software Quality

Subject(s): 2700 Information technology > 2700 Information technology
2700 Information technology > 2705 Software and development
2700 Information technology > 2706 Software engineering
Department: Département de génie informatique et génie logiciel (Department of Computer and Software Engineering)
PolyPublie URL: https://publications.polymtl.ca/2658/
Report number: EPM-RT-2010-09
Deposit date: 6 Oct. 2017 14:07
Last modified: 25 Sept. 2024 19:08
Cite in APA 7: Guerrouj, L. (2010). Automatic derivation of concepts based on the analysis of identifiers. (Technical Report No. EPM-RT-2010-09). https://publications.polymtl.ca/2658/
