Software engineering is expensive, summing to over one half of
one percent of the US GDP annually. Software maintenance accounts for
over two-thirds of that life cycle cost, and a key aspect of maintenance is
fixing bugs in existing programs. Unfortunately, the number of reported
bugs far outstrips available development resources. It is common for
a popular project to have hundreds of new bug reports filed every day.
GenProg reduces these maintenance costs by
automatically producing patches (repairs) for program defects.
Many bugs can be fixed with just a few changes to a program's source code.
Human repairs often involve inserting new code and deleting
or moving existing code. GenProg uses those same building blocks to search
for repairs automatically.
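To make the edit-based view concrete, a candidate repair can be pictured as a small sequence of edits over program statements. The `Edit` class, operation names, and statement IDs below are illustrative assumptions, not GenProg's actual data structures.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical representation of a single edit to a program's statements.
# GenProg's real implementation differs; this only illustrates the idea
# that a repair is a short sequence of insert/delete/swap operations.
@dataclass(frozen=True)
class Edit:
    op: str                       # "delete", "insert", or "swap"
    target: int                   # statement ID being modified
    source: Optional[int] = None  # statement copied from (insert/swap only)

# A candidate repair is simply a sequence of such edits.
candidate = [
    Edit("delete", target=42),             # remove a faulty statement
    Edit("insert", target=17, source=88),  # copy stmt 88 after stmt 17
]
print(len(candidate))  # 2
```

Representing repairs as edit sequences (rather than whole program copies) keeps candidates small, which matters when a population of them must be mutated and recombined.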
GenProg uses genetic programming to search for repairs.
Our evolutionary computation represents candidate repairs as sequences of
edits to software source code. Each candidate in a large population is applied
to the original program to produce a new program, which is evaluated using test
suites. Those candidates that pass more tests are said to have a
higher fitness and are iteratively subjected to computational
analogs of the biological processes of mutation and
crossover. This process terminates when a candidate repair is found
that retains all required functionality and fixes the bug. GenProg does not
require special code annotations or formal specifications, and applies to
unmodified legacy software.
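The search loop described above can be sketched as follows. This is a toy stand-in, not GenProg itself: a bit string matched against a hidden target plays the role of "passes all tests", and the population size, selection scheme, and operators are simplifying assumptions. GenProg actually mutates C statements and evaluates fitness by running real test suites.

```python
import random

random.seed(0)
TARGET = [1, 0, 1, 1, 0, 1]  # stands in for "the program passes every test"

def fitness(candidate):
    # Fraction of "tests" passed: positions that match the target.
    return sum(a == b for a, b in zip(candidate, TARGET)) / len(TARGET)

def mutate(candidate):
    c = candidate[:]
    c[random.randrange(len(c))] ^= 1  # computational analog of mutation
    return c

def crossover(a, b):
    cut = random.randrange(1, len(a))  # one-point crossover
    return a[:cut] + b[cut:]

def repair(pop_size=20, generations=100):
    pop = [[random.randint(0, 1) for _ in TARGET] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        if fitness(pop[0]) == 1.0:      # a full "repair" has been found
            return pop[0]
        survivors = pop[: pop_size // 2]  # fitter candidates survive
        children = [mutate(crossover(random.choice(survivors),
                                     random.choice(survivors)))
                    for _ in range(pop_size - len(survivors))]
        pop = survivors + children
    return pop[0]

best = repair()
print(fitness(best))  # prints 1.0 once the search finds a full "repair"
```

The termination condition mirrors the one in the text: the loop stops as soon as some candidate passes every test, rather than running for a fixed budget.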
GenProg has repaired many kinds of defects, including infinite loops,
segmentation violations, heap buffer overflows, non-overflow
denial-of-service vulnerabilities, stack buffer overflows, format string
vulnerabilities, and integer overflows, as well as the ubiquitous
"my program produces the incorrect output" bug. While
we evaluate primarily on C programs, GenProg can also repair software
at other levels of representation, such as x86 or ARM assembly files or ELF binaries.
In a recent systematic study spanning multiple programs, totaling over
5 million lines of code and supported by over 10,000 test cases,
GenProg repaired 55 out of 105 bugs for $8 each.
GenProg has won multiple academic best paper awards (ICSE 2009,
GECCO 2009, SBST 2009). In addition, this work was awarded the
Manfred Paul Award "for excellence in software: theory and practice."
Finally, GenProg won gold (in 2009) and bronze (in 2012) honors at the
annual "Humies" Awards for human-competitive results produced by genetic
and evolutionary computation.
We are committed to reproducible research and to making our tools
freely available. Since our tools, techniques and benchmark
suites evolve and improve over time, there is no single all-inclusive
package. Instead, for major papers, we try to make available a snapshot
of the programs and benchmarks used in that paper, so that others can
reproduce or extend those experiments. All archives contain READMEs with
instructions on how to use the included tools or benchmarks.
Source Code and instructions for compiling and running our prototype
tool are available. READMEs contain instructions for fulfilling
dependencies, building, and running the tools; benchmark-specific
instructions are archived separately.
Code snapshot, October 2015
This October 2015 release is a snapshot of svn version 1688. It is not
associated with any particular paper or set of experiments, but it is more
recent (and should include a few additional bugfixes) than what has been
available to date. It is intended as a stopgap while we move to a public
(github?) repository.
GenProg Source v3.0.
This October 2013 release was used for the
experiments in the ASE 2013 paper.
Experimental Data Sets are available. These benchmarks allow for the
replication of our research or the comparative evaluation of repair
techniques.
TSE 2015 (accepted) benchmarks, including two sets of defects in C programs, baseline results for three automated techniques, and virtual machine images. Note that these supersede (and partially extend) the ICSE 2012 defect scenarios below.
105 GenProg ICSE 2012 benchmarks (program bugs), for use with the ICSE 2012 virtual machine images. Note that these are deprecated; we provide them for completeness but discourage their use in future work. Instead, we encourage you to use the more recent TSE 2015 extension above, which includes important corrections.
ASPLOS 2013 Benchmark
Programs. These benchmarks were used to study GenProg approaches to
assembly and binary repairs in embedded systems.
ASE 2013 Experimental
Results. These results relate to the Adaptive Equality repair algorithm,
which uses an approximation to program equivalence to reduce the search
space and introduces on-line learning strategies to order test cases and repairs.
Experimental Results. These results relate to neutral mutants and
software mutational robustness, the notion that mutations (at various
levels of representation) can result in programs that still adhere to their
original specification and/or pass all test cases. Additional results for
neutral mutants are also available.
ICSE 2012 Experimental Results, referencing the ICSE 2012 benchmarks (program bugs). This includes all of the output for the repair trials conducted on Amazon's EC2 cloud computing service. In addition, it lists every defect for which at least one repair was found and every (defect, random seed) pair that led to a repair, gives an internal description of each repair, and includes each repaired .c file that passed all tests. Note that we have improved and expanded upon our experimental setup for these experiments as part of our TSE 2015 article, linked above, and encourage the use of those baseline results for future analysis.