[isabelle] 1 year fellowship for a doctoral researcher on program analysis

(apologies for multiple postings)


The University of Namur, Belgium, is seeking a young researcher
(no PhD) for a 1 year research project entitled

   "Automatic program analysis for refactoring logic programs".

A short description follows. Interested candidates should send their CV before September 15, 2008 to Wim Vanhoof (wva at info.fundp.ac.be).


Program refactoring [3] is the process of systematically changing the structure of a program without changing its semantics. The goal of refactoring is to improve the design of the code after it has been written, in order to facilitate maintenance (including further development) of the software. Emerged from the OO and XP communities, program refactoring has recently gained attention in the fields of functional [8] and logic
programming [11]. Within the software engineering community, the process of
refactoring is considered important and has been identified as central to software
development and maintenance [3].

At the basis of the refactoring process is a catalogue of available source-to-source transformations - the so-called refactorings. For each refactoring, a set of conditions is specified under which the transformation is correct in the sense that it preserves the semantics of the program. The activity of refactoring consists then in repeatedly searching through the source code, looking for a code fragment of which the design could be improved by a particular refactoring from the catalogue. The refactoring is subsequently applied, and the whole process is repeated. Although each transformation can have an impact (positive or negative) on the performance of the program, the primary aim of each transformation is to improve the readability and maintainability of the code. Even if refactoring is basically a manual process that is performed by the programmer, the need for automation is recognized due to the time-consuming and error-prone nature of the refactoring activity [3, 8, 10]. Although tools do exist that aid the developer with performing a particular refactoring on a selected fragment of source code - an example being the Refactoring Browser that was developed for Smalltalk [10] - identifying where to perform (a particular) refactoring in the code remains essentially a non-trivial, creative
and therefore manual process.

Although a substantial number of different refactorings has been identified in the literature, the primary indication for where to perform refactoring seems to be the presence of duplicated code [3]. The notion of duplicated code is not limited to literally copied code, but refers to fragments of code that have the same input/output behaviour. The goal of refactoring is then to transform the source code in such a way that the duplication is removed. This generally requires to extract the duplicated input/output behaviour into a new subroutine (be it a method, function or predicate), which may require a generalisation of the concerned code fragments and a possible reorganisation of
the code as a whole [3, 11].

Code duplication can be caused by a number of reasons. First of all, unfamiliarity of the developer with the existing code body may result in reimplementation of routines that already exist in the application under development. Second, the “copy and paste” technique is commonly used when the existing functionality has to be slightly adapted. Even if in this case one usually does not end up literally duplicating input/output behaviour, the changes introduced by adaptation are usually relatively minor and a generalization of the original and the adapted fragments could be often proposed. Finally,
code duplication might result from a polyvariant program analysis.

The problem of deciding whether two code fragments implement the same functionality is well-known to be undecidable. Nevertheless, automatic program analysis techniques have been developed, notably in the context of imperative and object-oriented languages, that are capable to detect such equivalence under particular circumstances and within a certain error margin, e.g. [1, 5, 6, 7, 9]. Also related is the work on plagiarism detection for
programs written in such languages, e.g. [4, 14, 13].

In this project, we will investigate the automatic detection of duplicated code in declarative programming languages, in particular logic programming languages such as Prolog and Mercury. Declarative languages allow the developer to program at a much higher level of abstraction than it is the case with most imperative languages. In a declarative language, one describes properties of the desired solution rather than the
actual algorithm that should be used to find the solution.

We aim to develop an analysis that allows to find, without user intervention, duplicated code fragments into a logic program and we will study if and how the results of this analysis can be used to drive a number of refactorings to remove the unwanted duplication from the program. These refactorings include predicate extraction (replacing duplicated code fragments by a call to a newly defined predicate), the removal of predicates implementing the same relation as another predicate and the generalization of duplicated code fragments into calls to newly generated higher-order predicate [11, 12]. The development of such a duplicated code analysis for logic programs, and its integration with refactoring, presents some interesting research opportunities. Firstly, the declarative nature of logic programs makes it not straightforward to adapt the methods developed in an imperative (or object-oriented) setting. Secondly, the fact that logic programs have a small and formally well-defined semantics and use an explicit symbolic data representation makes the use of advanced analyses possible. Therefore it might well be possible to obtain more fine-grained results than is the case for imperative languages. Finally, it might be worthwhile to investigate how the developed analysis could be used in
the context of plagiarism detection for logic programs.

[1] B. S. Baker. On finding duplication and near-duplication in large software systems. In Proc. Second
IEEE Working Conference on Reverse Engineering, pages 86-95, July 1995.
[2] F. Degrave and W. Vanhoof. Towards a normal form for Mercury programs. In A. King, ed. LOPSTR 2007, volume 4915 of Lecture notes in computer science, pages 43-58. Springer-Verlag, 2007. [3] M. Fowler, K. Beck, J. Brant, W. Opdyke, and D. Roberts. Refactoring : Improving the Design of
Existing Code. Object Technology Series. Addison-Wesley, 1999.
[4] S. Horwitz. Identifying the semantic and textual differences between two versions of a program. ACM
SIGPLAN Notices, 25(6) :234-245, 1990.
[5] T. Kamiya, S. Kusumoto, and K. Inoue. Ccfinder : A multilinguistic token-based code clone detection system for large scale source code. IEEE Trans. Software Eng., 28(7): 654-670, 2002. [6] R. Komondoor and S. Horwitz. Using slicing to identify duplication in source code. In Static Analysis
Symposium, pages 40-56, 2001.
[7] K. Kontogiannis, R. de Mori, E. Merlo, M. Galler, and M. Bernstein. Pattern matching for clone and
concept detection. Autom. Softw. Eng., 3(1/2) :77-108, 1996.
[8] H. Li, C. Reinke, and S. Thompson. Tool support for refactoring functional programs. In J. Jeuring,
editor, ACM SIGPLAN 2003 Haskell Workshop. ACM 2003.
[9] J. Mayrand, C. Leblanc, and E. Merlo. Experiment on the automatic detection of function clones in a software system using metrics. In Intl. Conf. on Software Maintenance, pages 244-253, 1996. [10] D. Roberts, J. Brant, and R. E. Johnson. A refactoring tool for Smalltalk. Theory and Practice of Object
Systems (TAPOS), 3(4) :253-263, 1997.
[11] A. Serebrenik, T. Schrijvers and B. Demoen. Improving Prolog programs: Refactoring for Prolog.
Theory and practice of logic programming (Accepted) 2008.
[12] W. Vanhoof. Searching semantically equivalent code fragments in logic programs. In S. Etalle, editor, LOPSTR 2004, volume 3573 of Lecture notes in computer science, pages 1-18. Springer-Verlag, 2005. [13] J. Winstead and D. Evans. Towards differential program analysis. In Proceedings of the 2003
Workshop on Dynamic Analysis, 2003.
[14] W. Yang. Identifying syntactic differences between two programs. Software Practice and Experience,
21(7): 739-755, 1991.

This archive was generated by a fusion of Pipermail (Mailman edition) and MHonArc.