<  Retour au portail Polytechnique Montréal

Restructuring source code identifiers

Laleh Mousavi Eshkevari

Rapport technique (2010)

Document en libre accès dans PolyPublie et chez l'éditeur officiel
[img]
Affichage préliminaire
Libre accès au plein texte de ce document
Version officielle de l'éditeur
Conditions d'utilisation: Tous droits réservés
Télécharger (1MB)
Afficher le résumé
Cacher le résumé

Abstract

In software engineering, maintenance cost 60% of overall project lifecycle costs of any software product. Program comprehension is a substantial part of maintenance and evolution cost and, thus, any advancement in maintenance, evolution, and program understanding will potentially greatly reduce the total cost of ownership of any software products. Identifiers are an important source of information during program understanding and maintenance. Programmers often use identifiers to build their mental models of the software artifacts. Thus, poorly-chosen identifiers have been reported in the literature as misleading and increasing the program comprehension effort. Identifiers are composed of terms, which can be dictionary words, acronyms, contractions, or simple strings. We conjecture that the use of identical terms in different contexts may increase the risk of faults, and hence maintenance effort. We investigate our conjecture using a measure combining term entropy and term context-coverage to study whether certain terms increase the odds ratios of methods to be fault-prone. We compute term entropy and context-coverage of terms extracted from identifiers in Rhino 1.4R3 and ArgoUML 0.16. We show statistically that methods containing terms with high entropy and context-coverage are more fault-prone than others, and that the new measure is only partially correlated with size. We will build on this study, and will apply summarization technique for extracting linguistic information form methods and classes. Using this information, we will extract domain concepts from source code, and propose linguistic based refactoring.

Sujet(s): 2700 Technologie de l'information > 2700 Technologie de l'information
2700 Technologie de l'information > 2705 Logiciels et développement
2700 Technologie de l'information > 2706 Génie logiciel
Département: Département de génie informatique et génie logiciel
URL de PolyPublie: https://publications.polymtl.ca/2660/
Numéro du rapport: EPM-RT-2010-11
Date du dépôt: 06 oct. 2017 14:12
Dernière modification: 05 avr. 2024 10:35
Citer en APA 7: Eshkevari, L. M. (2010). Restructuring source code identifiers. (Rapport technique n° EPM-RT-2010-11). https://publications.polymtl.ca/2660/

Statistiques

Total des téléchargements à partir de PolyPublie

Téléchargements par année

Provenance des téléchargements

Actions réservées au personnel

Afficher document Afficher document