Technical report (2010)
Open-access document in PolyPublie and at the official publisher
Open access to the full text of this document. Official publisher's version. Terms of use: All rights reserved
Abstract
In software engineering, maintenance accounts for 60% of the overall lifecycle cost of a software product. Program comprehension is a substantial part of maintenance and evolution cost; thus, any advance in maintenance, evolution, and program understanding can greatly reduce the total cost of ownership of a software product. Identifiers are an important source of information during program understanding and maintenance. Programmers often use identifiers to build their mental models of software artifacts. Accordingly, poorly chosen identifiers have been reported in the literature to mislead developers and increase program comprehension effort. Identifiers are composed of terms, which can be dictionary words, acronyms, contractions, or simple strings. We conjecture that the use of identical terms in different contexts may increase the risk of faults, and hence maintenance effort. We investigate our conjecture using a measure that combines term entropy and term context-coverage to study whether certain terms increase the odds of methods being fault-prone. We compute the entropy and context-coverage of terms extracted from identifiers in Rhino 1.4R3 and ArgoUML 0.16. We show statistically that methods containing terms with high entropy and context-coverage are more fault-prone than others, and that the new measure is only partially correlated with size. We will build on this study by applying summarization techniques to extract linguistic information from methods and classes. Using this information, we will extract domain concepts from source code and propose linguistic-based refactorings.
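The abstract does not give the formulas behind term entropy or context-coverage, so the following is only a minimal sketch of one plausible reading of the entropy part. It assumes a term's entropy is the Shannon entropy of how its occurrences are spread across methods. The helper names `split_identifier` and `term_entropy`, the camelCase/snake_case splitting rule, and the toy input are hypothetical and are not taken from the report. Context-coverage is not sketched here.

```python
import math
import re
from collections import Counter, defaultdict


def split_identifier(identifier):
    """Split a camelCase or snake_case identifier into lowercase terms.

    Assumption: this splitting rule is illustrative only; the report may
    use a different term-extraction procedure.
    """
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", identifier)
    return [part.lower() for part in parts]


def term_entropy(methods):
    """Return a dict mapping each term to its entropy in bits.

    `methods` maps a method name to the list of identifiers it contains.
    Assumption: entropy is computed over the distribution of a term's
    occurrences across methods; a term used in a single method scores 0.
    """
    occurrences = defaultdict(Counter)  # term -> Counter(method -> count)
    for method, identifiers in methods.items():
        for identifier in identifiers:
            for term in split_identifier(identifier):
                occurrences[term][method] += 1

    entropy = {}
    for term, per_method in occurrences.items():
        total = sum(per_method.values())
        entropy[term] = sum(
            (count / total) * math.log2(total / count)
            for count in per_method.values()
        )
    return entropy


# Hypothetical toy input: three methods and the identifiers they contain.
example_methods = {
    "parseFile": ["fileName", "fileSize"],
    "openSocket": ["socketName"],
    "closeSocket": ["socketId"],
}
example_scores = term_entropy(example_methods)
```

Under this reading, a term such as `name` that appears in two different methods receives a higher score than `file`, which appears only in `parseFile`. That matches the intuition in the abstract that terms reused across contexts are the ones of interest.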
Subject(s): | 2700 Information technology > 2700 Information technology; 2700 Information technology > 2705 Software and development; 2700 Information technology > 2706 Software engineering |
---|---|
Department: | Department of Computer Engineering and Software Engineering |
PolyPublie URL: | https://publications.polymtl.ca/2660/ |
Report number: | EPM-RT-2010-11 |
Deposit date: | 06 Oct. 2017 14:12 |
Last modified: | 03 Oct. 2024 19:03 |
Cite in APA 7: | Eshkevari, L. M. (2010). Restructuring source code identifiers. (Technical Report No. EPM-RT-2010-11). https://publications.polymtl.ca/2660/ |
---|---|