Home

Learn more

Buy the book

For readers

Publishers' sites

Paperback · 423 pages
ISBN 978-0-12-374515-6 [US]
ISBN 978-3-89864-620-8 [DE]
     

Project 1: Simplifying Input

In this project, you write a simple tool that simplifies failure-inducing input. The idea is to use delta debugging to find a minimal input that causes an XML parser to fail.

This project requires you to program in Python. Learning Python is not hard, though; and even if you include the time it takes you to learn Python, you will still complete your work faster than in most other languages.

Read this first

The failure

XMLProc is a small XML parser written in Python. It has a small defect. To exhibit the defect, follow these steps:
  1. Download the XMLProc parser. It should unpack into a directory called xmlproc.
  2. In the xmlproc dictionary, the file xpcmd.py is a command-line interface to the XML parser. To have the parser parse the XML file demo/nstest1.xml, type
    $ cd xmlproc
    $ python xpcmd.py demo/nstest1.xml
    xmlproc version 0.70
    
    Parsing 'demo/nstest1.xml'
    Parse complete, 0 error(s) and 0 warning(s)
    $ _
    
  3. The input file demo/urls.xml causes the parser to fail:
    $ python xpcmd.py demo/urls.xml
    xmlproc version 0.70
    
    Traceback (most recent call last):
      File "xpcmd.py", line 112, in ?
        p.parse_resource(sysid) [...]
      File "xmlproc/xml/parsers/xmlproc/xmlproc.py", line 401, in parse_charref
        if digs==None: return
    UnboundLocalError: local variable 'digs' referenced before assignment
    $ _
    

Your task

Use Delta Debugging to simplify the failure-inducing input. Proceed in eight steps:

Step 1: Write a testing function.

Start with a Python program that invokes the parser, as described above, and assesses the result. To avoid complications, you may wish to conduct this work in the xmlproc dictionary.

To invoke the parser, you have two options:

  • Invoke the parser as a separate program. Use Python's os module to execute the parser. You can use fork(), execv(), and wait() functions, as in Figure 5.11; however, for this particular task, system() or popen() are far easier to use. Python's built-in File Objects are useful in communicating with the parser via files.
  • Invoke the parser directly from Python. Take a look at the xpcmd.py source to see how this works. The essential lines are
    from xml.parsers.xmlproc import xmlproc
    	
    app = xmlproc.Application()
    p = xmlproc.XMLProcessor()
    p.parse_resource(file)
    In order to catch the exception thrown by the parser, be sure to read and understand exception handling in Python.
Take care to differentiate three outcomes:
  • The parser completely parses the file (PASS), without errors or warnings;
  • The parser fails as in the original failure (FAIL)
  • The parser has another outcome, in particular syntax errors (UNRESOLVED)

Step 2: Write a splitting function.

You can start with the split() function as described in the book (Figure 5.9).

In the long term, you may want to split the input along token delimiters, or even use syntactic simplification (Section 5.8.3).

Step 3: Attach Delta Debugging.

To complete Delta Debugging, you need an implementation of the listminus() function as described in the book (Figure 5.10). The listsets module gives you exactly this.

Finally, you need the ddmin() function from Figure 5.7. This ddmin module even comes with a self-contained test function.

Step 4: Choose a representation.

How do you represent the configuration that is to be minimized?
  • If you want to simplify input, each delta might be a single character—thus, the string "defect" becomes the list ['d', 'e', 'f', 'e', 'c', 't']. However, this will cause trouble as you cannot distinguish between the same character at different locations. For instance, listminus(['d', 'e', 'f', 'e', 'c', 't'], ['e']) = ['d', 'f', 'c', 't']—which is not what you want.
  • A better solution is to store characters with their indices, as in [('d', 1), ('e', 2), ('f', 3), ('e', 4), ('c', 5), ('t', 6)]—this way, each circumstance can be uniquely identified.
  • For a more general solution, you may wish to set up Circumstance or Delta classes and to store lists of these objects.

Step 5: Run it.

What is the simplified failure-inducing input in demo/urls.xml? Record the number of tests that Delta Debugging takes.

Step 6: Document it.

Be sure to have docstrings for every function which describe its purpose.

Have a README file that describes how to invoke the simplification program.

Step 7 (optional): Make it more general.

Convert your testing function into Python's unittest framework.

Generalize the delta debugging function such that it can be parameterized with an arbitrary unit test and an arbitrary input—that is, it becomes independent from XMLProc and can easily be adapted to other input minimization tasks. (As examples of such tasks, see the paper Simplifying and Isolating Failure-Inducing Input.) Eventually, you will have a general stand-alone framework; to have it solve the XMLProc problem, instantiate it with the XMLProc unit test and the failing input.

Step 8 (optional): Improve efficiency.

In the book, Section 5.8.3 sketches how to implement syntactic simplification. Using a Python XML parser such as Expat, split the input along the XML structure to narrow down the failure cause more rapidly. Validate the success by counting the number of tests.

Get the book at Amazon.com · Amazon.de
Comments? Write to Andreas Zeller <zeller@whyprogramsfail.com>.