The Problem

Software engineering is expensive, amounting to over half of one percent of US GDP annually. Software maintenance accounts for over two-thirds of that life cycle cost, and a key aspect of maintenance is fixing bugs in existing programs. Unfortunately, the number of reported bugs far outstrips available development resources: it is common for a popular project to receive hundreds of new bug reports every day.

GenProg reduces these maintenance costs by automatically producing patches (repairs) for program defects.

Our Approach

Many bugs can be fixed with just a few changes to a program's source code. Human repairs often involve inserting new code and deleting or moving existing code. GenProg uses those same building blocks to search for repairs automatically.
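Those building blocks can be made concrete with a small sketch. Here a program is represented as a list of statements and a repair edit as an `(op, src, dst)` tuple; the operator names and tuple layout are illustrative, not GenProg's actual internal representation.

```python
# Hypothetical sketch of GenProg-style edit operators applied to a
# program represented as a list of statements. The (op, src, dst)
# encoding is illustrative, not the tool's actual representation.
def apply_edit(stmts, edit):
    op, src, dst = edit
    out = list(stmts)
    if op == "delete":                 # remove an existing statement
        del out[dst]
    elif op == "insert":               # copy an existing statement elsewhere
        out.insert(dst, stmts[src])
    elif op == "swap":                 # exchange two existing statements
        out[src], out[dst] = out[dst], out[src]
    return out

program = ["a = read()", "b = a / 0", "print(b)"]
patched = apply_edit(program, ("delete", None, 1))
# patched == ["a = read()", "print(b)"]
```

Note that insertion reuses code already present in the program rather than synthesizing new statements, mirroring the observation that human repairs largely rearrange existing code.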

GenProg uses genetic programming to search for repairs. Our evolutionary computation represents candidate repairs as sequences of edits to software source code. Each candidate in a large population is applied to the original program to produce a new program, which is evaluated using test suites. Those candidates that pass more tests are said to have a higher fitness and are iteratively subjected to computational analogs of the biological processes of mutation and crossover. This process terminates when a candidate repair is found that retains all required functionality and fixes the bug. GenProg does not require special code annotations or formal specifications, and applies to unmodified legacy software.
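The loop above can be sketched end to end. This is a minimal, self-contained toy, not GenProg itself: candidates are statement lists, fitness counts passing tests, selection is simple truncation, and the mutation operator, population size, and generation count are all illustrative assumptions.

```python
import random

# Minimal sketch of a GenProg-style evolutionary repair loop.
# All parameters and the toy mutation operator are illustrative.
def evolve(seed_program, tests, mutate, pop_size=20, generations=50, rng=None):
    rng = rng or random.Random(0)
    def fitness(prog):
        return sum(1 for t in tests if t(prog))       # tests passed
    population = [mutate(list(seed_program), rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        if fitness(population[0]) == len(tests):
            return population[0]                      # repair found
        parents = population[: pop_size // 2]         # truncation selection
        children = []
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randint(0, min(len(a), len(b)))
            child = a[:cut] + b[cut:]                 # one-point crossover
            children.append(mutate(child, rng))
        population = children
    return None                                       # no repair found

# Toy instance: the only mutation is a GenProg-style statement deletion,
# and the "bug" is an extra faulty statement.
def mutate(prog, rng):
    if prog and rng.random() < 0.5:
        del prog[rng.randrange(len(prog))]
    return prog

buggy = ["open()", "crash()", "close()"]
tests = [lambda p: "open()" in p,      # required functionality
         lambda p: "close()" in p,     # required functionality
         lambda p: "crash()" not in p] # the failing (bug-revealing) test
repaired = evolve(buggy, tests, mutate)
```

The search terminates exactly as described above: as soon as some candidate retains all required functionality (the first two tests) while fixing the bug (the third), it is returned as the repair.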

Our Results

GenProg has repaired many kinds of defects, including infinite loops, segmentation violations, heap buffer overflows, non-overflow denial-of-service vulnerabilities, stack buffer overflows, format string vulnerabilities, integer overflows — as well as the ubiquitous "my program produces the incorrect output" bug. While we evaluate primarily on C programs, GenProg can also repair programs in other representations, such as x86 or ARM assembly files and ELF binaries.

In a recent systematic study spanning multiple programs, totaling over 5 million lines of code and supported by over 10,000 test cases, GenProg repaired 55 out of 105 bugs at a cost of $8 each.

GenProg has won multiple academic best paper awards (ICSE 2009, GECCO 2009, SBST 2009). In addition, this work was awarded the IFIP TC2 Manfred Paul Award "for excellence in software: theory and practice." Finally, GenProg won gold (in 2009) and bronze (in 2012) honors at the "Humies" Awards for human-competitive results produced by genetic and evolutionary computation.


This Summer 2012 teaser video highlights GenProg repairing 55 out of 105 bugs for $8 each:

This Summer 2012 teaser video highlights a human study of GenProg patch maintainability:

This December 2011 video demonstrates GenProg running on smartphone-class embedded devices and discusses the associated challenges:

Research Papers

We recommend that you start with this paper:

Note that we have released new results on an extended and improved version of that dataset and new baseline results as part of the following:

These papers provide broad overviews or high-level arguments:

This paper describes the relationship between human developers and GenProg-generated patches:

This paper investigates properties of software that may relate to GenProg's success:

  • Eric Schulte, Zachary P. Fry, Ethan Fast, Westley Weimer, Stephanie Forrest: Software Mutational Robustness. Genetic Programming and Evolvable Machines, 2013.

These papers highlight the application of GenProg to new subject areas and domains:

These papers describe optimizations of or parameter choices for GenProg:

Finally, these papers detail older versions of, or historical antecedents to, the GenProg system:

Data Sets & Experiments

We are committed to reproducible research and to making our tools freely available. Since our tools, techniques and benchmark suites evolve and improve over time, there is no single all-inclusive package. Instead, for major papers, we try to make available a snapshot of the programs and benchmarks used in that paper, so that others can reproduce or extend those experiments. All archives contain READMEs with instructions on how to use the included tools or benchmarks.

Source Code and instructions for compiling and running our prototype tool are available. READMEs contain instructions for fulfilling dependencies, building, and running the tools; benchmark-specific instructions are archived separately.

Virtual Machine Images are available. Follow the instructions carefully; you will likely have to replace the default outdated version of GenProg that appears in these images with a more recent one.

Experimental Data Sets are available. These benchmarks allow for the replication of our research or the comparative evaluation of repair techniques.

Experimental Results are available. These are archives of the raw output of our tool (e.g., debug logs, repairs found, etc.) for various experiments.

  • TSE 2015 Main Website, including links to baseline results for three generate-and-validate repair techniques.
  • ASPLOS 2014 Experimental Results. These experimental results describe using GenProg-like approaches to reduce the power consumption of software.
  • ASPLOS 2013 Experimental Results. These experimental results relate to assembly and binary repairs of embedded systems.
  • ASE 2013 Experimental Results. These results relate to the Adaptive Equality repair algorithm, which uses an approximation to program equivalence to reduce the search space and introduces on-line learning strategies to order test cases and repairs.
  • GPEM 2013 Experimental Results. These results relate to neutral mutants and software mutational robustness, the notion that mutations (at various levels of representation) can result in programs that still adhere to their original specification and/or pass all test cases. Additional results for higher-order neutral mutants are also available.
  • GECCO 2012 Experimental Results, including repair results for various genetic algorithm parameter values.
  • ICSE 2012 Experimental Results, referencing the ICSE 2012 benchmarks (program bugs). This includes all of the output for the repair trials conducted on Amazon's EC2 cloud computing service. In addition, it lists every defect for which at least one repair was found and every (defect, random seed) pair that led to a repair, gives an internal description of each repair, and includes each repaired .c file that passed all tests. Note that we have improved and expanded upon our experimental setup for these experiments as part of our TSE 2015 article, linked above, and encourage the use of those baseline results for future analysis.
  • ICSE and GECCO 2009 experimental results.

Our human study data sets are available here:

Case Studies are available. These files allow for the replication of our research or the comparative evaluation of repair techniques.


GenProg is primarily a collaboration between Westley Weimer at the University of Virginia, Stephanie Forrest at the University of New Mexico, and Claire Le Goues at Carnegie Mellon University.

In addition, we are indebted to a number of graduate and undergraduate researchers, without whom this project would not be possible:

Current graduate researchers:

Past researchers:

  • Dr. Eric Schulte, moved on to GrammaTech
  • Dr. ThanhVu Nguyen, moved on to the University of Maryland
  • Dr. Zachary P. Fry, moved on to GrammaTech
  • Brianna Satchell, moved on to Microsoft
  • Bryan Alex Landau, moved on to Microsoft
  • Ethan Fast, moved on to Stanford
  • Jonathan DiLorenzo, moved on to Cornell
  • Michael Dewey-Vogt, University of Virginia
  • Nicholas Jalbert, moved on to Berkeley
  • Nicholas Modly, University of Virginia
  • Shirley Park, moved on to Carnegie Mellon