Software engineering is expensive, accounting for over one half of
one percent of the US GDP annually. Software maintenance accounts for
over two-thirds of that life cycle cost, and a key aspect of maintenance is
fixing bugs in existing programs. Unfortunately, the number of reported
bugs far outstrips available development resources: it is common for
a popular project to have hundreds of new bug reports filed every day.
Software maintenance is expensive; GenProg reduces those costs by
automatically producing patches (repairs) for program defects.
Many bugs can be fixed with just a few changes to a program's source code.
Human repairs often involve inserting new code and deleting
or moving existing code. GenProg uses those same building blocks to search
for repairs automatically.
GenProg uses genetic programming to search for repairs.
Our evolutionary computation represents candidate repairs as sequences of
edits to software source code. Each candidate in a large population is applied
to the original program to produce a new program, which is evaluated using test
suites. Those candidates that pass more tests are said to have a
higher fitness and are iteratively subjected to computational
analogs of the biological processes of mutation and
crossover. This process terminates when a candidate repair is found
that retains all required functionality and fixes the bug. GenProg does not
require special code annotations or formal specifications, and applies to
unmodified legacy software.
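The search loop described above can be sketched in miniature. The sketch below is illustrative only, not GenProg itself: it models a toy "program" as a list of statements, a candidate repair as a sequence of delete and copy edits (like GenProg, it only reuses existing code rather than inventing new statements), and fitness as the number of test cases passed. All names here (`BUGGY`, `TESTS`, `repair`, and so on) are hypothetical.

```python
import random

# Toy buggy "program": the final statement clobbers the result, so the
# function returns 0 instead of max(x, y). Deleting that line repairs it.
BUGGY = [
    "r = x",
    "if y > r: r = y",
    "r = 0",
]

# Test suite: ((x, y) inputs, expected value of r).
TESTS = [((2, 5), 5), ((7, 3), 7), ((1, 1), 1), ((0, 9), 9)]

def apply_patch(program, patch):
    """Apply a sequence of edits: ('delete', i) or ('copy', src, dst)."""
    prog = list(program)
    for edit in patch:
        if not prog:
            continue
        if edit[0] == "delete":
            del prog[edit[1] % len(prog)]
        else:  # copy an existing statement to a new position
            src = edit[1] % len(prog)
            dst = edit[2] % (len(prog) + 1)
            prog.insert(dst, prog[src])
    return prog

def fitness(patch):
    """Fitness = number of test cases the patched program passes."""
    prog = apply_patch(BUGGY, patch)
    passed = 0
    for (x, y), want in TESTS:
        env = {"x": x, "y": y}
        try:
            exec("\n".join(prog), {}, env)
            passed += env.get("r") == want
        except Exception:
            pass  # a crashing variant simply fails the test
    return passed

def random_edit(rng):
    if rng.random() < 0.5:
        return ("delete", rng.randrange(4))
    return ("copy", rng.randrange(4), rng.randrange(4))

def repair(seed=0, pop_size=20, generations=200):
    """Evolve edit sequences until one passes the whole test suite."""
    rng = random.Random(seed)
    pop = [[random_edit(rng)] for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(pop, key=fitness, reverse=True)
        if fitness(scored[0]) == len(TESTS):
            return scored[0]  # full repair found
        survivors = scored[: pop_size // 2]  # selection
        children = []
        while len(children) < pop_size - len(survivors):
            a, b = rng.sample(survivors, 2)
            child = a[: rng.randrange(len(a)) + 1] + b[len(b) // 2:]  # crossover
            if rng.random() < 0.3:
                child = child + [random_edit(rng)]  # mutation
            children.append(child)
        pop = survivors + children
    return None

patch = repair()
print("repair:", patch)
```

With this toy defect, the population quickly discovers that deleting the clobbering statement restores all required behavior, mirroring the way GenProg terminates once a variant passes the full test suite.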
GenProg has repaired many kinds of defects, including infinite loops,
segmentation violations, heap buffer overflows, non-overflow
denial-of-service vulnerabilities, stack buffer overflows, format string
vulnerabilities, integer overflows — as well as the ubiquitous
"my program produces the incorrect output" bug. While
we evaluate primarily on C programs, GenProg can also repair software in
other representations, such as x86 or ARM assembly files and ELF binaries.
In a recent systematic study covering multiple programs, totaling over
5 million lines of code and supported by over 10,000 test cases, GenProg
repaired 55 of 105 bugs for $8 each.
GenProg has won multiple academic best paper awards (ICSE 2009,
GECCO 2009, SBST 2009). In addition, this work was awarded the
IFIP TC2
Manfred Paul Award "for excellence in software: theory and practice."
Finally, GenProg won gold (in 2009) and bronze (in 2012) honors at the
"Humies"
Awards for human-competitive results produced by genetic and
evolutionary computation.
Claire Le Goues, Stephanie Forrest, Westley Weimer:
The Case for Software Evolution. Workshop on the Future of
Software Engineering Research (FoSER) 2010: 205-209
This paper describes the relationship between human developers and GenProg-generated patches:
Zachary P. Fry, Bryan Landau, Westley Weimer:
A Human Study of Patch Maintainability. International Symposium on Software Testing and Analysis (ISSTA) 2012.
Eric Schulte, Jonathan Dorn, Stephen Harding, Stephanie Forrest, Westley
Weimer:
Post-compiler Software Optimization for
Reducing Energy. Architectural Support for Programming Languages and
Operating Systems (ASPLOS) 2014: 639-652
ThanhVu Nguyen,
Westley Weimer,
Claire Le Goues,
Stephanie Forrest.
Using Execution Paths to Evolve Software
Patches. Workshop on Search-Based Software Testing (SBST) 2009
(best short paper award)
(best presentation award)
Westley Weimer, ThanhVu Nguyen, Claire Le Goues, Stephanie Forrest:
Automatically Finding Patches Using
Genetic Programming. International Conference on Software Engineering (ICSE) 2009:
364-374
(distinguished paper award)
(IFIP TC2 Manfred Paul award)
We are committed to reproducible research and to making our tools
freely available. Since our tools, techniques and benchmark
suites evolve and improve over time, there is no single all-inclusive
package. Instead, for major papers, we try to make available a snapshot
of the programs and benchmarks used in that paper, so that others can
reproduce or extend those experiments. All archives contain READMEs with
instructions on how to use the included tools or benchmarks.
Source code and instructions for compiling and running our prototype
tool are available. READMEs explain how to satisfy
dependencies, build, and run the tools; benchmark-specific
instructions are archived separately.
Code snapshot, October 2015
This October 2015 release is a snapshot of svn version 1688. It is not associated with
any particular paper or set of experiments, but is a bit more recent (and should have a few additional bugfixes)
than what's been available to date. It is intended as a stopgap as we move to a public (github?)
hosting solution.
GenProg Source v3.0.
This October 2013 release was used for the
experiments in the ASE 2013 paper.
GenProg Source v1.0.
This deprecated release was used for experiments appearing in ICSE 2009,
GECCO 2009, GECCO 2010, and TSE 2012.
Software
Evolution Library.
This common interface abstracts over multiple types of software objects
including abstract syntax trees parsed from source code, LLVM IR,
compiled assembler, and linked ELF binaries. Mutation and evaluation
methods are implemented on top of this interface. This release was used
for experiments appearing in ASPLOS 2014.
The manual is available
here.
A
master backup is available in case GitHub is not available.
Virtual Machine Images are available. Follow the instructions
carefully; you will likely have to replace the default outdated version of
GenProg that appears in these images with a more recent one.
GenProg ICSE 2012 virtual machine images with instructions for use with VirtualBox.
Includes binary and source for the version of GenProg 2.0 used for the ICSE 2012 experiments. Also supports recent versions of GenProg.
Experimental Data Sets are available. These benchmarks allow for the
replication of our research or the comparative evaluation of repair
techniques.
TSE 2015 (accepted) benchmarks, including two sets of defects in C programs, baseline results for three automated techniques, and virtual machine images. Note that these supersede (and partially extend) the defect scenarios from ICSE 2012, below.
105 GenProg ICSE 2012 benchmarks (program bugs), for use with the ICSE 2012 virtual machine images. Note that these are deprecated. We provide them for completeness but discourage their use in future work. Instead, we encourage you to refer to the more recent extension here, which includes important corrections.
ASPLOS 2013 Benchmark
Programs. These benchmarks were used to study GenProg approaches to
assembly and binary repairs in embedded systems.
ASE 2013 Experimental
Results. These results relate to the Adaptive Equality repair algorithm,
which uses an approximation to program equivalence to reduce the search
space and introduces on-line learning strategies to order test cases and repairs.
GPEM 2013
Experimental Results. These results relate to neutral mutants and
software mutational robustness, the notion that mutations (at various
levels of representation) can result in programs that still adhere to their
original specification and/or pass all test cases. Additional results for
higher-order
neutral mutants are also available.
ICSE 2012 Experimental Results, referencing the ICSE 2012 benchmarks (program bugs). This includes all of the output for the repair trials conducted on Amazon's EC2 cloud computing service. In addition, it lists every defect for which at least one repair was found and every (defect, random seed) pair that led to a repair, gives an internal description of each repair, and includes each repaired .c file that passed all tests. Note that we have improved and expanded upon our experimental setup for these experiments as part of our TSE 2015 article, linked above, and encourage the use of those baseline results for future analysis.
GenProg is primarily a collaboration between
Westley Weimer
at the University of Virginia,
Stephanie Forrest
at the University of New Mexico,
and
Claire Le Goues
at Carnegie Mellon University.
In addition, we are indebted to a number of
graduate and undergraduate researchers, without whom this project
would not be possible: