Fuzzing is one of the most powerful strategies for identifying security issues in real-world software. Unfortunately, it also offers fairly shallow coverage: it is impractical to exhaustively cycle through all possible inputs, so even something as simple as setting three separate bytes to a specific value to reach a chunk of unsafe code can be an insurmountable obstacle to a typical fuzzer.
There have been numerous attempts to solve this problem by augmenting the process with additional information about the behavior of the tested code. These techniques can be divided into three broad groups:
- Simple coverage maximization. This approach boils down to trying to isolate initial test cases that offer diverse code coverage in the targeted application - and them fuzzing them using conventional techniques.
- Control flow analysis. A more sophisticated technique that leverages instrumented binaries to focus the fuzzing efforts on mutations that generate distinctive sequences of conditional branches within the instrumented binary.
- Static analysis. An approach that attempts to reason about potentially interesting states within the tested program and then make educated guesses about the input values that could possibly trigger them.
The first technique is surprisingly powerful when used to pre-select initial test cases from a massive corpus of valid data - say, the result of a large-scale web crawl. Unfortunately, coverage measurements provide only a very simplistic view of the internal state of the program, making them less suited for creatively guiding the fuzzing process later on.
The latter two techniques are extremely promising in experimental settings. That said, in real-world applications, they are not only very slow, but frequently lead to irreducible complexity: most of the high-value targets will have a vast number of internal states and possible execution paths, and deciding which ones are interesting and substantially different from the rest is an extremely difficult challenge that, if not solved, usually causes the "smart" fuzzer to perform no better than a traditional one.
American fuzzy lop tries to find a reasonable middle ground between sophistication and practical utility. In essence, it's a fuzzer that relies on a form of edge coverage measurements to detect subtle, local-scale changes to program control flow without having to perform complex global-scale comparisons between series of long and winding execution traces - a common failure point for similar tools.
In almost-plain English, the fuzzer does this by instrumenting every effective line of C or C++ code (or any other GCC-supported language) to record a tuple in the following format:
[ID of current code location], [ID of previously-executed code location]
The ordering information for tuples is discarded; the primary signal used by the fuzzer is the appearance of a previously-unseen tuple in the output dataset; this is also coupled with coarse magnitude count for tuple hit rate. This method combines the self-limiting nature of simple coverage measurements with the sensitivity of control flow analysis. It detects both explicit conditional branches, and indirect variations in the behavior of the tested app.
The output from this instrumentation is used as a part of a simple, vaguely "genetic" algorithm:
- Load user-supplied initial test cases into the queue,
- Take input file from the queue,
- Repeatedly mutate the file using a balanced variety of traditional fuzzing strategies (see later),
- If any of the generated mutations resulted in a new tuple being recorded by the instrumentation, add mutated output as a new entry in the queue.
- Go to 2.
The discovered test cases are also periodically culled to eliminate ones that have been made obsolete by more inclusive finds discovered later in the fuzzing process. Because of this, the fuzzer is useful not only for identifying crashes, but is exceptionally effective at turning a single valid input file into a reasonably-sized corpus of interesting test cases that can be manually investigated for non-crashing problems, handed over to valgrind, or used to stress-test applications that are harder to instrument or too slow to fuzz efficiently. In particular, it can be extremely useful for generating small test sets that may be programatically or manually examined for anomalies in a browser environment.
(For a quick partial demo, click here.)
Of course, there are countless "smart" fuzzer designs that look good on paper, but fail in real-world applications. I tried to make sure that this is not the case here: for example, afl can easily tackle security-relevant and tough targets such as gzip, xz, lzo, libjpeg, libpng, giflib, libtiff, or webp - all with absolutely no fine-tuning and while running at blazing speeds. The control flow information is also extremely useful for accurately de-duping crashes, so the tool does that for you.
In fact, I spent some time running it on a single machine against libjpeg, giflib, and libpng - some of the most robust best-tested image parsing libraries out there. So far, the tool found:
- CVE-2013-6629: JPEG SOS component uninitialized memory disclosure in jpeg6b and libjpeg-turbo,
- CVE-2013-6630: JPEG DHT uninitialized memory disclosure in libjpeg-turbo,
- MSRC 0380191: A separate JPEG DHT uninitialized memory disclosure in Internet Explorer,
- CVE-2014-1564: Uninitialized memory disclosure via GIF images in Firefox,
- CVE-2014-1580: Uninitialized memory disclosure via <canvas> in Firefox,
- Chromium bug #398235, Mozilla bug #1050342: Probable library-related JPEG security issues in Chrome and Firefox (pending),
- PNG zlib API misuse bug in MSIE (DoS-only),
- Several browser-crashing images in WebKit browsers (DoS-only).
More is probably to come. In other words, you should probably try it out. The most significant limitation today is that the current fuzzing strategies are optimized for binary files; the fuzzer does:
- Walking bitflips - 1, 2, and 4 bits,
- Walking byte flips - 1, 2, and 4 bytes,
- Walking addition and subtraction of small integers - byte, word, dword (both endians),
- Walking insertion of interesting integers (-1, MAX_INT, etc) - byte, word, dword (both endians),
- Random stacked flips, arithmetics, block cloning, insertion, deletion, etc,
- Random splicing of synthetized test cases - pretty unique!
All these strategies have been specifically selected for an optimal balance between fuzzing cost and yields measured in terms of the number of discovered execution paths with binary formats; for highly-redundant text-based formats such as HTML or XML, syntax-aware strategies (template- or ABNF-based) will obviously yield better results. Plugging them into AFL would not be hard, but requires work.
0 nhận xét:
Đăng nhận xét