DART and CUTE reviews

Page history last edited by Koushik Sen 14 years, 7 months ago

Add your reviews as comments to this page ...

 

Comments (21)

Ali Sinan Köksal said

at 12:03 am on Sep 14, 2009

DART is a tool for automated testing that combines three techniques to achieve high coverage of the possible execution paths of a given program. It first extracts the external interface of a program to determine and simulate the "most general environment" it can run in (while this environment needs to be more constrained in some cases, e.g. for detecting security attacks based on a realistic model of an intruder). It then generates a test driver which repeatedly invokes the program with random input, and the execution on this input lets the tool perform dynamic analysis to keep track of the execution tree of the program, by collecting the predicate at each branching point encountered. DART can then force the execution along an alternative path by solving an appropriate set of constraints and performing concrete execution with the solution. It uses a solver for linear integer arithmetic, and when the constraints fall outside this theory, the system falls back to the concrete values of variables to continue execution.
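The directed-search loop described above (run on a concrete input, record branch predicates, negate one, solve, re-run) can be sketched in a few lines. This is a minimal illustration, not DART's actual implementation: the target, the trace encoding, and the one-variable linear "solver" are all invented for the example.

```python
import random

def target(x, trace):
    # Toy unit under test: returns "BUG" on exactly one input.
    # Each branch appends ((a, b), taken), where the predicate is a*x + b == 0.
    trace.append(((2, -10), 2 * x - 10 == 0))   # branch: x == 5
    if 2 * x - 10 == 0:
        trace.append(((1, -5), x - 5 == 0))     # redundant inner check
        if x - 5 == 0:
            return "BUG"
    return "ok"

def solve_eq(a, b):
    """Solve a*x + b == 0 over the integers, or return None."""
    return -b // a if (-b) % a == 0 else None

def directed_search(max_iters=50):
    x = random.randint(-10**6, 10**6)           # start from a random input
    for _ in range(max_iters):
        trace = []
        if target(x, trace) == "BUG":
            return x
        # Negate the last predicate that evaluated false and solve it,
        # forcing the next run down the unexplored side of that branch.
        for (a, b), taken in reversed(trace):
            if not taken:
                nxt = solve_eq(a, b)
                if nxt is not None:
                    x = nxt
                    break
        else:
            x = random.randint(-10**6, 10**6)   # fall back to a random restart

print(directed_search())   # 5: the input triggering the bug, which plain
                           # random testing would almost never hit
```

Plain random testing finds this bug with probability roughly one in two million per run; the directed search finds it after a single constraint negation.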

By combining concrete execution with symbolic analysis, DART overcomes the issue of false alarms resulting from the imprecision of the latter. Every error reported by the tool is sound, because it is based on a real execution of the program. The approach also proves useful with library functions whose source code is not available for analysis. While static analysis cannot assume anything about their use, DART can benefit from inspecting the actual value returned by such a function. Meanwhile, the tool assumes that these calls have no side effects on the analyzed program, and the process of dealing with these issues by "inserting interface code" contrasts with the automated aspect of the method; it could perhaps be facilitated by analyzing the specifications of the functions in question and taking the appropriate measures in a less user-dependent way.

Ali Sinan Köksal said

at 12:04 am on Sep 14, 2009

(DART continued)

Taking into account specifications for the tested functions themselves also seems essential to reveal only those bugs that are "interesting" to report.

As it is pointed out in the CUTE paper, the random generation of recursive data structures may not terminate in DART. Another deficiency of DART (addressed in CUTE) is the lack of constraint solving involving pointers.

Ali Sinan Köksal said

at 12:04 am on Sep 14, 2009

CUTE, the Concolic Unit Testing Engine, is a tool building on the same work, that is, combining concrete and symbolic execution to explore alternative execution paths in order to reveal possible assertion violations. While using lp_solve like DART for solving linear constraints, CUTE introduces optimizations: syntactically detecting when a new sub-constraint is the negation of another, eliminating common sub-constraints, and solving incrementally, considering only those sub-constraints that depend on the newly introduced one. CUTE also handles predicates involving pointer variables, by inspecting the equivalence classes of the pointers involved.
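The equivalence-class handling of pointer predicates mentioned above can be sketched with a union-find structure: equalities merge classes, and the constraint set is satisfiable as long as no recorded disequality has collapsed into a single class. This is only an illustration of the idea; CUTE's actual solver also deals with NULL and with the logical input map.

```python
class PointerConstraints:
    """Decide conjunctions of pointer constraints p == q / p != q."""

    def __init__(self):
        self.parent = {}     # union-find parent pointers
        self.diseq = []      # recorded p != q constraints

    def _find(self, p):
        self.parent.setdefault(p, p)
        while self.parent[p] != p:
            self.parent[p] = self.parent[self.parent[p]]  # path halving
            p = self.parent[p]
        return p

    def add_eq(self, p, q):
        # p == q: merge the two equivalence classes.
        self.parent[self._find(p)] = self._find(q)

    def add_neq(self, p, q):
        self.diseq.append((p, q))

    def satisfiable(self):
        # Unsatisfiable iff some p != q pair ended up in one class.
        return all(self._find(p) != self._find(q) for p, q in self.diseq)

pc = PointerConstraints()
pc.add_eq("p", "q")
pc.add_neq("q", "r")
print(pc.satisfiable())   # True: classes {p, q} and {r} are distinct
pc.add_eq("p", "r")
print(pc.satisfiable())   # False: p == q == r contradicts q != r
```

Because only equality and disequality are tracked, each new predicate costs nearly constant time, which is what keeps the pointer reasoning lightweight.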

Of the two ways to construct input data structures, the first, which consists of using sequences of function calls, has the disadvantage of relying on the correctness of those functions. Solving the class invariant for data structures should therefore be the primary choice for this process.

CUTE doesn't perform exact pointer analysis, and its precision depends on the predicates encountered so far in the program. Trade-offs between performance (within the limits of tractability) and higher path coverage could be discussed in the context of this aspect of the tool.

Another possible investigation could involve the "fallback" process to concrete values. I wonder whether we could, for instance, make a better choice between two integer variables multiplied together, by means of an analysis of their behavior so far. A performance comparison of strategies for choosing the variable to replace by its concrete value in a predicate could be considered.

Régis Blanc said

at 4:26 am on Sep 14, 2009

DART is a very useful, practical tool for testing software written in C. Since
it does not require additional effort from the programmer, it is an even less
costly method for doing a "quick check" of the program than writing a few
tests by hand just to exercise the general functionality.

I think the main contribution of DART is the idea (and algorithms) of directed
search through all the possible execution paths of the program. It is a really
interesting solution to the problem that random testing has a low probability
of hitting all execution paths.

The other great contribution is the working tool. I do not see any reason not to
use it in a project, if you are using C of course. Maybe it is not well adapted
to current software engineering, because C is no longer used in most new
projects (except for systems or embedded work).

I think the section of the paper on the implementation in C was not really
needed. It actually explains too much about interface extraction and random
generation, which are in my opinion not very complicated to implement, and does
not give any details on the main problem, which is the directed search. I would
like to know more about how to translate from the theoretical model (with only
two kinds of statements) to the actual C syntax.

Maybe there is another approach to the implementation of the tool: define a new
language (like the one from the model), implement the algorithms for this
language, and then write a front end that translates C into it. The tool would
then be easy to extend to additional languages by just writing a front end for
each new language.

CUTE uses a similar idea to DART. It also generates random input (except for
pointers) and then does a directed search. It does not do automatic interface
extraction and requires more user intervention, but it can also successfully
test more complicated data structures.

Régis Blanc said

at 4:26 am on Sep 14, 2009


Maybe the main contributions of this tool are the way it approximates symbolic
execution and the generation of dynamic data structures. The directed search is
still a very useful idea, but not a new one. CUTE also learned from a few flaws
of DART and improved on these points.

Himanshu Sharma said

at 7:52 am on Sep 14, 2009

Some of the important features of DART:
* Complete automation of testing: automatic interface extraction + random testing.
* Effective code coverage using dynamic test generation techniques; it also handles function calls (with unavailable source code), and cases where the constraint solver is incapable, by approximating with concrete values.
* Executions leading to errors are sound (no false alarms).

Criticism:
* Limited Completeness (does not handle pointers well)

CUTE takes care of pointers by replacing symbolic expressions over them with symbolic pointer variables. But there does not seem to be much insight into how to choose the depth bound for the execution paths.

Devdatta said

at 7:52 am on Sep 14, 2009

Key Ideas/Points

* Fall back on concrete values for constraints that fall outside of available decision procedures
* Ability to simulate erroneous library/system methods by marking them as black-box external methods (which it doesn't execute but only gives random return values for)

Limitations
* Address locations aren't allowed to be symbolic, unlike, e.g., in EXE
* A limitation of this particular method is that it only finds memory faults, or errors which a programmer has marked out as sanity checks via assert conditions
* More info about search heuristics and their effect on program execution time and effectiveness would have been useful
* In the last part of the paper the authors note that they randomize malloc return values. Why? Isn't malloc a library function, to be called instead of randomized? What would you achieve by randomizing malloc return values: segfaults?
* Not clear how they handle predicates they can't reason about in conditionals. Do they keep trying random values until both branches are taken? With nested predicates, this could be really slow.
* More details on how it implements the nitty-gritty would help, like following symbolic expressions across function boundaries: how?
* Unlike CUTE, it doesn't do symbolic execution on complex types and their fields. [Although this wasn't clear to me reading the paper and only became clear on reading the CUTE paper.]

Devdatta said

at 7:54 am on Sep 14, 2009

CUTE

* A lot of the work is similar to DART, so the comments particular to this paper are limited
* A key addition over DART was the caching of past constraints and the breaking up of constraints to help improve caching
* Another was adding support for pointers and recursive data structures

Questions
* What exactly is OPT2, common sub-constraint elimination? It could have been explained a little more
* How did CUTE find the memory leak? It's not an error condition. Or was CUTE run on the test program under Valgrind, which would treat a memory leak as an error condition, but would have required Valgrind's source to be instrumented?
----------

* Is a general review of a large amount of C code possible, to create trees and figure out statistically the best way to reach error conditions? DFS or BFS or (probably) a combination, and what are the heuristics that work best? Or is there a paper on this already?

Shaon Barman said

at 9:55 am on Sep 14, 2009

DART provides an automated way to test functions written in C. The goal of DART is to run all possible paths through the function. DART performs 3 steps: extracting the test interface, generating random inputs, and dynamic analysis of the program to generate a new input which will explore a new path. More specifically, the new input is created by negating the last predicate of the current path constraint. When the constraints cannot be solved, DART substitutes in the concrete value to continue the search. One drawback of DART is its inability to create data structures with aliasing, since pointers are only assigned NULL or a pointer to new memory. In addition, the theorem prover used seems to solve only a limited class of constraints. It would be interesting to see how a more powerful theorem prover would interact with the system. Lastly, I did not fully understand how the completeness flags worked with the system.

CUTE seems much like DART, but the symbolic model being used is more powerful (it can contain pointers to already created nodes) and the constraint solver is both more powerful and built to be efficient in this system. CUTE is also used to test more than one function, unlike DART. CUTE uses two symbolic maps, one mapping memory locations to symbolic arithmetic expressions and one mapping memory locations to symbolic pointer expressions. One aspect of the paper that was helpful was the example of where CUTE wouldn't work "correctly." This showed exactly what approximations CUTE was using and how it traded off correctness vs. speed. But I could not understand how the data structure testing section fit into CUTE and how exactly segmentation faults were caught by CUTE. More explanation of these tests would have been helpful.

Cynthia said

at 10:10 am on Sep 14, 2009

The DART paper makes two main contributions: the use of static analysis to automatically generate a test driver that interfaces correctly with the program; and random testing combined with dynamic analysis to direct test inputs so that all conditional branch paths get exercised. DART combines concrete and symbolic execution for its directed test generation: it starts with a random input vector, tracks the path the program takes with that input, and generates symbolic constraints that will force a different path through the branch statements. The symbolic constraints can then be evaluated to form a new concrete input vector for the next run. The next run is checked to verify that the predicted alternate path is indeed the path taken. The combination of symbolic execution with concrete execution provides more accurate results, with fewer false positives and more complete code coverage than either achieves alone. A program bug is found if DART encounters an assertion failure or segmentation fault, or times out, indicating non-termination.

DART relies (mostly) on assertion failures to find bugs. However, it is not obvious to me that including all the appropriate assertions in the source code is an easy task. It would have been interesting to see what sort of assertions were included in the Needham-Schroeder code in order to find the bugs. Would it be more difficult to write assertions for a protocol for which the bug is not already known? Of course the task of coming up with appropriate assertions might force the programmer to think through any assumptions s/he may be making which is not a bad thing.

DART achieves branch coverage, but that is not the same as full state coverage, and therefore it cannot be used to definitively say a program is free of bugs. The AC-controller and Needham-Schroeder examples are both instances where running DART a single time would cover all branches but not all states, and program bugs would be missed.

Cynthia said

at 10:36 am on Sep 14, 2009

CUTE is a follow-up paper to DART. It uses many of the same techniques as DART for concolic testing, but it can handle pointers in the inputs to the unit under test. In addition CUTE allows pre-conditions to be given by the programmer. An input that violates the pre-condition is not considered for testing. This reduces the number of false positives generated. One downside to CUTE is that the programmer must specify the pre-conditions and the interfaces of the functions under test.

When the input to a unit under test is a complex data structure CUTE takes steps to avoid testing the function with invalid inputs. This may make sense when testing for unit correctness, but it is not enough from a security standpoint. There is usually no guarantee (unless some formal verification has been done) that, during runtime, a function's input will be valid. It is important to test how the function will behave given invalid inputs; if a function fails, will it always "fail safe"?

Seth Fowler said

at 12:07 pm on Sep 14, 2009

DART is a tool for automated testing of a program given only its source code and some guidelines from the programmer as to what to test. It is a reaction to unit testing, which is effective if done properly but requires a huge amount of work from the programmer; the insight of the DART team is that testing is more likely to be used if it requires as little time and effort as possible. DART determines every way in which a program may interact with the external world and generates a random input for it. It then observes the program as it runs with this input, trying to determine the constraints on the input that lead to the same execution path. Once it has those constraints, it is able to explore the space of possible execution paths of the program by negating one constraint at a time and repeating the process. DART's underlying constraint solver can only handle linear constraints, but it works around this by substituting observed values for subexpressions it cannot evaluate. If this does not "work" in the sense that it does not enable DART to find a new execution path, DART tries to find one by generating a new random input. In practice, DART's code coverage is quite good (although not perfect) and it is effective at finding crashes, infinite loops, and assertion failures. It is clearly a superior alternative to random testing, which is quite unlikely to find bugs if the inputs that trigger them are distributed very sparsely in the input space; it is also better than static analysis tools because it does not generate a host of false positives.
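The fallback described above (substituting observed values for subexpressions the solver cannot handle) can be illustrated on a nonlinear predicate. This is a hedged sketch: the function names and the single-unknown linear solver are invented for the example, and the assumption is a solver restricted to linear constraints in one variable.

```python
def solve_linear(a, b, target):
    """Solve a * x + b == target for integer x, or return None."""
    return (target - b) // a if (target - b) % a == 0 else None

def next_input_for(x_concrete, y_concrete, target=24):
    # The branch condition is x * y == target: nonlinear, so it falls
    # outside the supported theory. Fallback: freeze y at its observed
    # concrete value, leaving the linear constraint y_concrete * x == target.
    return solve_linear(y_concrete, 0, target)

# Observed run: x = 7, y = 3 took the "false" side of (x * y == 24).
x_new = next_input_for(7, 3)
print(x_new)             # 8: with y frozen at 3, x = 8 satisfies 3 * x == 24
print(x_new * 3 == 24)   # True: the negated branch is taken on the next run
```

The price of concretization is completeness: solutions where both x and y differ from the observed run (say x = 4, y = 6) are never considered, which is exactly the incompleteness the paper accepts in exchange for always making progress.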

Seth Fowler said

at 12:07 pm on Sep 14, 2009

CUTE builds on the DART work and improves on it in several respects. CUTE has more sophisticated handling of pointers; it is able to reason about them using constraints involving their equality or inequality. (DART, it should be noted, cannot generate two pointers to the same memory location within the same input.) It is also able to ensure data structure invariants and preconditions for functions; it is able to do this either by generating random data structures that conform to a repOk() function, or by constructing a random sequence of calls to data structure manipulation functions. These two features together mean that CUTE is better suited for checking programs involving complicated data structures than DART, which will find itself relying on random input generation in such situations. CUTE also introduces several optimizations for the constraint solving process. One optimization discards many unsatisfiable constraints before they reach the solver; since the solver can take a much longer time to report unsatisfiability than to solve a constraint, this is a very useful optimization. Another uses the structure of the problem to speed up the process: CUTE, like DART, only negates one constraint out of the set of path constraints at a time, so only that constraint and those that involve the same variables need to be solved again. Overall, CUTE is a useful tool in the fight against software faults that eliminates several of DART's shortcomings; it is somewhat less automatic than DART because of the configuration needed to handle data structure invariants, but one assumes that this configuration can be ignored and CUTE will fall back to random input generation, like DART.

tbh said

at 12:21 pm on Sep 14, 2009

The authors present a testing framework able to generate test cases for programs. First, the external interfaces of the program are extracted. Then the program is instrumented so that the paths taken are recorded. Finally, the program is tested by generating a set of inputs and running the program on it. The trace containing the paths taken by the program is analyzed and a new set of inputs is generated. The new inputs are forged so that the execution of the program will explore a new path. The authors rely on forward symbolic execution as defined by P. Cousot to collect the path constraints along the trace. These are solved by a standard SMT solver. Unlike DART, CUTE handles linked data structures well. However, it requires the user to specify an input memory map.

The advantages of this technique are:
- Coverage of all paths (although this is limited in practice to path constraints in Presburger arithmetic).
- No false positives, because the testing is done by running the program itself.
- No need for inaccurate pointer analysis. All the reasoning is done with memory locations resolved at run-time. My own experience has been that pointer analysis is not only inaccurate but also does not scale well.
- This technique seems to scale better than completely static approaches of comparable accuracy.
- It can handle library calls as black boxes, simply by executing them.

The weaknesses are:
- Exploring all paths exhaustively can take long (exponential blow-up)

Possible further work:
- What about using WS1S (weak monadic second-order logic of one successor) solvers like MONA to solve the constraints on linked data structures?

manojo said

at 12:25 pm on Sep 14, 2009

The CUTE and DART papers cover work done on automatic test generation. CUTE is somewhat an improvement on DART. The most interesting points of both tools are the following:

- They combine concrete and symbolic testing of a program (coining the new term concolic testing). On a given program, they perform a static analysis (parsing) to insert symbolic testing code at key parts of the program. This symbolic testing is very useful for testing conditional branches (bound by different constraints). These constraints are very often linear, and they are solved using lp_solve.
- Another interesting feature of CUTE is the ability to solve not only linear constraints but also pointer constraints, by having three possible domain values for a pointer: NULL, non-NULL, and a representation of the internal structures.
- CUTE optimises constraint solving considerably by detecting dependencies between different constraints, and eliminates certain execution paths by detecting inconsistencies in them.
- CUTE also performs testing of data structures, and uses some ideas from Korat (such as the repOk method). For testing data structures, CUTE generates inputs which pass the repOk test.
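The repOk idea in the last point can be sketched as generate-and-filter: a candidate input only reaches the unit under test if it satisfies the invariant. The sorted-list invariant and all names below are illustrative, and the naive filtering shown is exactly what CUTE improves on by solving the invariant's branch constraints directly.

```python
import random

def rep_ok(xs):
    """Class invariant of the unit under test: the list must be sorted."""
    return all(a <= b for a, b in zip(xs, xs[1:]))

def generate_valid_input(max_tries=10000):
    # Naive generate-and-filter over small random lists; CUTE instead
    # drives generation with the invariant's own path constraints.
    for _ in range(max_tries):
        xs = sorted_candidate = [random.randint(0, 9)
                                 for _ in range(random.randint(0, 4))]
        if rep_ok(xs):
            return xs
    raise RuntimeError("no valid input found")

def binary_contains(xs, key):
    # Unit under test: only correct when rep_ok(xs) holds.
    lo, hi = 0, len(xs)
    while lo < hi:
        mid = (lo + hi) // 2
        if xs[mid] < key:
            lo = mid + 1
        elif xs[mid] > key:
            hi = mid
        else:
            return True
    return False

xs = generate_valid_input()
print(rep_ok(xs))                     # True by construction
print(binary_contains([1, 3, 5], 3))  # True
```

Testing binary_contains on an unsorted list would report spurious "failures", which is why filtering through rep_ok matters: it separates genuine bugs from invariant violations in the input.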

Concolic testing is a very interesting testing mechanism, because it provides a good trade-off between concrete testing methods, which need to generate all possible inputs, and symbolic testing, which is computationally expensive with regard to solving all constraints. The assumption that programming libraries usually have functions that verify consistency is valid, though it unfortunately cannot be enforced. A possible way of enforcing this is to provide a tool which checks whether such methods are implemented.

Dai Bui said

at 12:46 pm on Sep 14, 2009

DART is a tool that automatically extracts the interface of a program, generates a test driver for the interface, and dynamically analyzes the behavior of the program to systematically direct the execution along alternative program paths, using random testing and automatic input generation.

DART is able to find errors that static analysis tools cannot. However, DART cannot prove the soundness of a system the way static analysis tools can. DART does not address the pointer analysis problem. The paper also does not address the issue of DART's scalability: if the execution paths of a program are long, will DART be able to test all the paths?

CUTE uses symbolic execution for pointer analysis besides concrete values, which is an improvement over DART. To do that, CUTE separates pointer constraints from integer constraints and keeps the pointer constraints simple to make the symbolic execution lightweight.

CUTE uses bounded depth-first search to avoid infinite or very long execution paths; however, the CUTE paper does not show the scalability of this approach. Neither DART nor CUTE seems to handle the case of infinite loops.

Sagar Jain said

at 12:47 pm on Sep 14, 2009

In DART, the authors present an automated technique for unit testing of software. The key contributions of this paper are:
1. Automating the unit testing framework. It makes use of static program analysis to extract the interface of the program, automatically generates a test driver for this interface, and simulates a general environment by random testing; lastly, it guides the random testing as described in the next point.
2. Directed random testing, which makes use of the last conditional not explored in the previous run to generate the input vector for the next run. This is a nice search strategy for directing random testing to guarantee path coverage.
3. The other advantage is its handling of library functions (by concretely executing them) and external functions (by capturing their nondeterministic behavior).
4. It presents a nice combination of concrete and symbolic execution to reduce false alarms, and it also falls back to concrete execution in case its underlying theorem prover fails to reason about certain path constraints.

Limitations:
1. The biggest limitation I feel is the inability of DART to handle loops. It can instrument only conditionals and assignment statements. Loops often occur in real code bases, and it is not clear what techniques they employ in such cases. I think this is an important limiting factor for the practical applicability of this tool.
2. DART relies on the user's assertions for testing functionality beyond segfaults. I think declarative annotations like those employed in Korat are preferable to writing additional C code for testing functionality.

Sagar Jain said

at 12:47 pm on Sep 14, 2009

CUTE also serves the purpose of unit testing of software and has a similar approach to DART. The key difference is that CUTE maintains two symbolic states at runtime: one maps memory locations to symbolic arithmetic expressions, and the other maps memory locations to symbolic pointer expressions. CUTE excels over DART in allowing pointers to already created objects (instead of only NULL or a new object), which was absent in DART. It also avoids expensive symbolic execution over pointers and deals in simpler pointer constraints (relations: == or !=), but maintains a precise relationship between the pointers through the logical input map. It also optimizes constraint solving through incremental solving. The major downside is the inability to handle loops, which is present in DART as well.
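The split symbolic state described above (one map for arithmetic expressions, one for pointer expressions, both keyed by memory location) can be sketched as follows. The class, the linear-expression encoding as coefficient tuples, and the update operation are all illustrative inventions, not CUTE's actual representation.

```python
class SymbolicState:
    """Two symbolic maps over memory locations, as the review describes."""

    def __init__(self):
        self.arith = {}   # location -> linear expression (name, coeff, const)
        self.ptr = {}     # location -> symbolic pointer name

    def new_int_input(self, loc, name):
        self.arith[loc] = (name, 1, 0)        # the expression 1*name + 0

    def new_ptr_input(self, loc, name):
        self.ptr[loc] = name                  # pointers live in their own map

    def add_const(self, loc, c):
        # Symbolic effect of "*loc += c"; only arithmetic values are updated,
        # so pointer reasoning never has to track arithmetic on addresses.
        name, a, b = self.arith[loc]
        self.arith[loc] = (name, a, b + c)

s = SymbolicState()
s.new_int_input(0x10, "x")
s.new_ptr_input(0x18, "p")
s.add_const(0x10, 5)
print(s.arith[0x10])   # ('x', 1, 5): location 0x10 holds x + 5
print(s.ptr[0x18])     # 'p'
```

Keeping the two maps separate is what lets the pointer side stay restricted to cheap ==/!= constraints while the arithmetic side feeds the linear solver.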

Noah Johnson-Walls said

at 12:57 pm on Sep 14, 2009

DART is a system for automatically finding bugs in software. It detects interfaces to functions (including argument data types) using static analysis, and creates wrapper code to simulate a "general environment" for the tested program to run in. DART's dynamic component runs the tested code and initially assigns random values to variables and the return values of external functions. In contrast with standard fuzzing techniques, DART has several advantages: the directed search in DART, enabled by the symbolic execution engine, allows for much greater code coverage. The dynamic component in DART helps with pointer analysis and other limitations of purely static analysis. One important feature of the DART system compared to other unit testing techniques is that developers can use DART without needing to write any test driver code. Additionally, DART is not tied to any specific constraint solver, so improvements in constraint solver technology can be easily incorporated into DART. A limitation of DART is that it makes several assumptions about the tested code which may not be valid - for example, that external functions have no side effects, and that program variables are properly initialized.

CUTE builds on DART and helps with DART's limited support for pointer inputs by generating and solving what they call "approximate pointer constraints". The technique, which helps with the alias analysis problem, uses an execution of the code to generate path constraints which are then solved to create logical input maps -- these input maps allow CUTE to treat arbitrary pointer locations as symbolic variables. CUTE also adds optimizations to speed up constraint solving, and additional techniques to deal with the general case of data structures as inputs. Most of the other underlying ideas in CUTE are based on DART.

(account deleted) said

at 1:00 pm on Sep 14, 2009

DART is an approach that extracts the interface of a program and then combines randomized testing with symbolic execution in order to achieve better code coverage. The main benefit of interface extraction is that the user does not need to specify what should be tested. The approach is sound in that all errors found correspond to reachable failed abort statements, and it is complete in the sense that if it halts without printing a bug found, then there is no reachable failed abort statement. It avoids decidability concerns by not guaranteeing termination.

I was worried that in real programs it may not be feasible to exhaustively test all possible paths, especially if only linear constraints are checked. If non-termination is a common result of running the program, its soundness and completeness properties may not be useful.

CUTE uses similar techniques, but lets the user specify the component to be tested and generates unit tests. Preconditions can be specified by either building up the data type with a user-specified call sequence, or including defensive validity checks in the code. CUTE also includes additional optimizations for constraint solving.

I think that the biggest drawback of CUTE is the extra requirement on user specification. Since it does not run the entire program, it is important to have preconditions on the functions in order to generate valid test inputs. I would have liked to see a more declarative specification of the preconditions, as in Korat, rather than changing the code to check preconditions (or build data structures in a specific way).

Derrick Coetzee said

at 1:01 pm on Sep 14, 2009

This 2005 work by Godefroid, Klarlund, and Sen describes DART, an early system combining symbolic and concrete execution to generate program inputs that provide good branch coverage. As a white-box extension of random testing, it begins with random inputs and monitors the result of each predicate that is evaluated; each of these represents a path constraint, a predicate regarding the input that must hold for that program path to be taken. By inverting one of these predicates and solving the constraints, the program can be forced down a different path; eventually all feasible program paths are traversed in this manner. Unlike static analysis tools, which also typically consider all paths, any errors detected by DART are real errors and not false positives.

As compared to later works, DART used a less powerful solver and relied more heavily on concretization (conversion of symbolic values to concrete values), limiting scalability and the number of issues detected. In particular, all pointer dereferences must be concretized due to a lack of ability to symbolically reason about pointers.

CUTE's most important contribution over DART is the ability to reason about pointers. It adds a system of logical addresses, a set of simple pointer equality and inequality constraints, and a pointer constraint solver based on an equivalence graph. This allows it to test new types of programs, such as data structure libraries.

Later papers such as "EXE: Automatically Generating Inputs of Death" in 2006 improved on the concrete/symbolic execution methodology primarily by substituting a more powerful solver based on reduction to the boolean satisfiability problem, using a state-of-the-art SAT solver and further improving scalability.
