
Automatic derivation of concepts based on the analysis of identifiers

Latifa Guerrouj

Technical Report (2010)

Open Access document in PolyPublie and at official publisher
Open Access to the full text of this document
Published Version
Terms of Use: All rights reserved
Abstract


The existing software engineering literature has empirically shown that a proper choice of identifiers influences software understandability and maintainability. Indeed, identifiers are developers' main up-to-date source of information and guide their cognitive processes during program understanding when the high-level documentation is scarce or outdated and when the source code is not sufficiently commented. Deriving domain terms from identifiers using high-level and domain concepts is not an easy task when naming conventions (e.g., Camel Case) are not used or strictly followed and/or when these words have been abbreviated or otherwise transformed. Our thesis aims at developing a contextual approach that overcomes the shortcomings of the existing approaches and maps identifiers to domain concepts even in the absence of naming conventions and/or the presence of abbreviations. We also aim to take advantage of our approach to enhance the predictability of the overall system quality by using identifiers when assessing software quality. The key components of our approach are: the dynamic time warping (DTW) algorithm, used to recognize words in continuous speech; string-edit distance between terms and words as a proxy for the distance between the terms and the concepts they represent; and word transformation rules that attempt to mimic the cognitive processes of developers when composing identifiers with abbreviated forms. To validate our approach, we apply it to identifiers extracted from different open source applications to show that our method is able to provide a mapping of identifiers to domain terms, and we compare it, with respect to an oracle that we have manually built, with the two families of approaches that, to the best of our knowledge, exist in the literature. We also enrich our technique by using domain knowledge and context-aware dictionaries to analyze how sensitive the performance of our approach is to the use of contextual information and specialized knowledge.
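To make the string-edit distance component concrete, the following is a minimal sketch, not the report's implementation. It assumes plain Levenshtein distance with unit costs and a tiny hand-written word list; the function names `edit_distance` and `closest_word` and the example dictionary are hypothetical. The DTW matching and the word transformation rules mentioned in the abstract are not shown.

```python
def edit_distance(a, b):
    """Levenshtein distance: insertions, deletions and substitutions each cost 1."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def closest_word(term, dictionary):
    """Return (word, distance) for the dictionary word nearest to term.

    Ties are resolved in favour of the earliest dictionary entry.
    """
    scored = ((word, edit_distance(term.lower(), word)) for word in dictionary)
    return min(scored, key=lambda pair: pair[1])


# Hypothetical dictionary, for illustration only.
words = ["pointer", "counter"]
print(closest_word("cntr", words))
```

Under these assumptions the abbreviated term `cntr` is closer to `counter` than to `pointer`. Adding `center` to the word list would change the result, since `cntr` is only two insertions away from it. Ambiguity of this kind is one reason the abstract emphasizes contextual information and domain-specific dictionaries.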

Uncontrolled Keywords

Identifier Splitting, Program Comprehension, Linguistic Analysis, Software Quality

Subjects: 2700 Information technology > 2700 Information technology
2700 Information technology > 2705 Software and development
2700 Information technology > 2706 Software engineering
Department: Department of Computer Engineering and Software Engineering
PolyPublie URL: https://publications.polymtl.ca/2658/
Report number: EPM-RT-2010-09
Date Deposited: 06 Oct 2017 14:07
Last Modified: 11 Nov 2022 14:12
Cite in APA 7: Guerrouj, L. (2010). Automatic derivation of concepts based on the analysis of identifiers (Technical Report n° EPM-RT-2010-09). https://publications.polymtl.ca/2658/

