
Restructuring source code identifiers

Laleh Mousavi Eshkevari

Technical Report (2010)

Published Version
Terms of Use: All rights reserved.
Cite this document: Eshkevari, L. M. (2010). Restructuring source code identifiers (Technical Report n° EPM-RT-2010-11).


In software engineering, maintenance accounts for 60% of the overall lifecycle cost of a software product. Program comprehension is a substantial part of maintenance and evolution cost; thus, any advance in maintenance, evolution, and program understanding could greatly reduce the total cost of ownership of software products. Identifiers are an important source of information during program understanding and maintenance, and programmers often use them to build their mental models of software artifacts. Poorly chosen identifiers have accordingly been reported in the literature as misleading and as increasing program comprehension effort. Identifiers are composed of terms, which can be dictionary words, acronyms, contractions, or simple strings. We conjecture that the use of identical terms in different contexts may increase the risk of faults, and hence maintenance effort. We investigate this conjecture using a measure that combines term entropy and term context-coverage to study whether certain terms increase the odds of methods being fault-prone. We compute the entropy and context-coverage of terms extracted from identifiers in Rhino 1.4R3 and ArgoUML 0.16. We show statistically that methods containing terms with high entropy and high context-coverage are more fault-prone than others, and that the new measure is only partially correlated with size. We will build on this study by applying summarization techniques to extract linguistic information from methods and classes. Using this information, we will extract domain concepts from source code and propose linguistic-based refactorings.
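To make the notion of term entropy more concrete, the following is a minimal sketch, not the report's implementation. It assumes that a "context" is a method, that identifiers are split into terms on underscores and camelCase boundaries, and that entropy is the Shannon entropy of a term's occurrence distribution across contexts; the abstract does not state the exact definition, so all three are assumptions. The names `split_identifier` and `term_entropy` are hypothetical, and context-coverage is not sketched here.

```python
import math
import re
from collections import Counter, defaultdict


def split_identifier(identifier):
    """Split an identifier into lowercase terms (assumed splitting rule:
    underscores, non-word characters, and camelCase boundaries)."""
    terms = []
    for part in re.split(r"[_\W]+", identifier):
        # Acronym runs (e.g. XML), capitalized or lowercase words, digit runs.
        terms.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", part))
    return [t.lower() for t in terms if t]


def term_entropy(identifiers_by_context):
    """Return {term: Shannon entropy in bits} over the contexts in which
    each term occurs. Assumption: one context per method."""
    counts = defaultdict(Counter)
    for context, identifiers in identifiers_by_context.items():
        for identifier in identifiers:
            for term in split_identifier(identifier):
                counts[term][context] += 1

    entropy = {}
    for term, per_context in counts.items():
        total = sum(per_context.values())
        # H = sum over contexts of p * log2(1 / p), with p = n / total.
        entropy[term] = sum(
            (n / total) * math.log2(total / n) for n in per_context.values()
        )
    return entropy


# Hypothetical toy input: method name -> identifiers appearing in it.
example = {
    "A.parse": ["nodeCount", "parseNode"],
    "B.render": ["nodeList"],
    "C.init": ["maxCount"],
}
entropy = term_entropy(example)
```

Under these assumptions, a term confined to a single method has entropy 0, while a term spread evenly over many methods has high entropy, which is the kind of scattering the conjecture above associates with fault-proneness.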

Open Access document in PolyPublie
Subjects: 2700 Information technology > 2700 Information technology
2700 Information technology > 2705 Software and development
2700 Information technology > 2706 Software engineering
Department: Département de génie informatique et génie logiciel
Research Center: Not applicable
Date Deposited: 06 Oct 2017 14:12
Last Modified: 16 Jun 2021 17:09
PolyPublie URL: https://publications.polymtl.ca/2660/
Document issued by the official publisher
Report number: EPM-RT-2010-11

