Build a simple fuzzer with libFuzzer

libFuzzer is coverage-guided, evolutionary fuzzing engine. And it’s a wonderful tool to help with bug hunting.

To quote the libFuzzer page:

LibFuzzer is linked with the library under test, and feeds fuzzed inputs to the library via a specific fuzzing entrypoint (aka “target function”); the fuzzer then tracks which areas of the code are reached, and generates mutations on the corpus of input data in order to maximize the code coverage. The code coverage information for libFuzzer is provided by LLVM’s SanitizerCoverage instrumentation.

So basically once it gets linked with your library it provides an easy way to feed mutated input to a target function at each iteration and the mutation is done in a way that tries to maximize the code coverage. All clear.

What we want to accomplish here is to write a simple fuzzer for libclamav, the library at the core of ClamAV antivirus.

So the first step is understanding how we’re going to link libFuzzer to libclamav when building ClamAV and its components.

To quote libFuzzer documentation:

If modifying CFLAGS of a large project, which also compiles executables requiring their own main symbol, it may be desirable to request just the instrumentation without linking:

clang -fsanitize=fuzzer-no-link mytarget.c

And this is exactly our case.

So these are the steps:

1. tar zxvf clamav-0.104.2.tar.gz && cd clamav-0.104.2
2. mkdir build && build
3. CC=clang CXX=clang++ CFLAGS="-fsanitize=fuzzer-no-link,address" cmake ../
4. cmake --build .

The important points here are the the choice of the compiler, which has to be of course clang and setting the CFLAGS in order to add to the code the fuzzing instrumentation and ASAN. The fuzzer-no-link specifies that we add the fuzzing instrumentation to the components being compiled, but we will provide the entry point in another application, which will be the fuzzer itself.

In order to fuzz a function of our choice the entry point of the fuzzer application needs to be declared as the following one:

int LLVMFuzzerTestOneInput(const uint8_t *Data, size_t Size) {
  DoSomethingInterestingWithMyAPI(Data, Size);
  return 0;  // Non-zero return values are reserved for future use.
}

Data and Size are the current mutated input derived from the corpus (or in case of a missing corpus it will be generated by libFuzzer itself) and its size in bytes.

So the question now is: how do we provide such data to libclamav? We choose a target function of the library, which in this case is:

int cl_scanfile(
    const char *filename,
    const char **virname,
    unsigned long int *scanned,
    const struct cl_engine *engine,
    struct cl_scan_options *options);

This function receives a filename to be scanned, a pointer to virname where the virus name will be stored in case the supplied file has been identified as a malicious one, the scanned data size til now (actually optional parameter, it can be NULL), the engine which is the AV engine struct and options which is a struct containing a series of options that control the behavior of the engine.

Lets first write a function which initializes the engine and returns a pointer to it:

static const char *byteCodePath = "/var/lib/clamav/bytecode.cvd";
struct cl_engine *init_clam() {
    unsigned int signo;
    cl_error_t status = cl_init(CL_INIT_DEFAULT);
    if (status != CL_SUCCESS) {
        fprintf(stderr, "cl_init error: %s\n", cl_strerror(status));
        return NULL;
    }
    struct cl_engine *engine = cl_engine_new();
    if (engine == NULL) {
        fprintf(stderr, "cl_engine_new error\n");
        return NULL;
    }
    status = cl_load(byteCodePath, engine, &signo, CL_DB_BYTECODE);
    if (status != CL_SUCCESS) {
        fprintf(stderr, "cl_load error: %s\n", cl_strerror(status));
        goto cleanup_engine_with_error;
    }
    status = cl_engine_compile(engine);
    if (status != CL_SUCCESS) {
        fprintf(stderr, "cl_engine_compile: %s\n", cl_strerror(status));
        goto cleanup_engine_with_error;
    }
    return engine;
cleanup_engine_with_error:
    status = cl_engine_free(engine);
    if (status != CL_SUCCESS) {
        fprintf(stderr, "cl_engine_free: %s\n", cl_strerror(status));
    }
    return NULL;
}

We call cl_init() to initialize the library, the we cl_engine_new() in order to get a new engine then cl_load() to load bytecode based signatures. Notice here that I’ve used specifically this kind of signature since it is small and quick to be loaded. We do so in order to speed up the fuzzer initialization process, if you use huge signatures database it will take time for the fuzzer to load and execute each time (for example if you’re going to fuzz clamscan binary with AFL++ or hongfuzz you will likely experience timeouts). We then finalize the process calling cl_engine_compile(). In case of errors we just return NULL and free the engine.

Our target function cl_scanfile() accepts a filename, while libFuzzer provides us with pointer to a series of bytes. So in order to provide this input to cl_scanfile() we need to first drop the content into a file and then invoke it passing the filename. So we write an utility function, which accepts a pointer to the data and its size and writes the content into a file which name suffix is “randomized”.

static char *create_fuzz_file(const uint8_t *data, size_t size) {
    char path[] = "/tmp/fuzz-XXXXXX";
    int fd = mkstemp(path);
    if (fd == -1) {
        return NULL;
    }
 
    int status = write(fd, data, size);
    if (status == -1) {
        close(fd);
        unlink(path);
        return NULL;
    }
 
    close(fd);
    char *f = strndup(path, strlen(path));
    if (f == NULL) {
        unlink(path);
        return NULL;
    }
 
    return f;
}

The function is very simple, it creates a temporary file name with the /tmp/fuzz-XXXXXX template passed to mkstemp, writes the data in and returns a copy of its name allocated on the heap, so we’ve to remember to free it at some point.

Now last piece of the, the fuzzer entry point function:

int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    struct cl_engine *engine = init_clam();
    if (engine == NULL) {
        fprintf(stderr, "error while initializing clam\n");
        return 0;
    }
     
    char *fname = create_fuzz_file(data, size);
    if (fname == NULL) {
        fprintf(stderr, "failed to create fuzz file\n");
        goto cleanup_engine;
    }
 
    const char *virname;
    struct cl_scan_options options;
 
    options.parse = CL_SCAN_PARSE_ELF | CL_SCAN_PARSE_ARCHIVE |
        CL_SCAN_PARSE_HTML | CL_SCAN_PARSE_HWP3 | CL_SCAN_PARSE_MAIL |
        CL_SCAN_PARSE_OLE2 | CL_SCAN_PARSE_PDF | CL_SCAN_PARSE_PE |
        CL_SCAN_PARSE_SWF | CL_SCAN_PARSE_XMLDOCS | CL_SCAN_MAIL_PARTIAL_MESSAGE;
 
    options.general = CL_SCAN_GENERAL_ALLMATCHES | CL_SCAN_GENERAL_HEURISTICS;
 
    options.heuristic = CL_SCAN_HEURISTIC_BROKEN | CL_SCAN_HEURISTIC_MACROS |
        CL_SCAN_HEURISTIC_STRUCTURED;
 
    cl_error_t status = cl_scanfile(fname, &virname, NULL, engine, &options);
    if (status == CL_VIRUS) {
        //fprintf(stdout, "detected virus: %s\n", virname);
    } else if (status != CL_CLEAN) {
        //fprintf(stderr, "cl_scanfile error: %s\n", cl_strerror(status));
        goto cleanup_file;
    }
cleanup_file:
    unlink(fname);
    free(fname);
 
cleanup_engine:
    status = cl_engine_free(engine);
    if (status != CL_SUCCESS) {
        fprintf(stderr, "cl_engine_free: %s\n", cl_strerror(status));
    }
 
    return 0;
}

Here the steps performed by this function:

Call our init_clam() function and get a pointer to a new engine instance
Call create_fuzz_file() and get a pointer to the filename
Create a struct cl_options and initialize it with various file formats the engine has to scan, then we specify CL_SCAN_GENERAL_ALLMATCHES, which tells the scan function to continue to scan the file after it finds a first match and CL_SCAN_GENERAL_HEURISTICS which enables heuristics alerts to be shown (you can disable it if you want). Then we enable some heuristics and that’s it
We finally invoke the cl_scanfile() and at the end of the function we just cleanup freeing the engine and removing the fuzzed file
Before we build the fuzzer we need another step, to just make our life a little bit easier. We copy the instrumented shared libraries that have been built before in our current fuzzer directory:

find clamav-0.104.2/build -name "*.so.*" -exec cp {} . \;

Now we just build our fuzzer:

clang -o fuzz_libclamav fuzz_libclamav.c libclamav.so.9.1.0 libclammspack.so.0.8.0 -fsanitize=fuzzer,address -I ./clamav-0.104.2/libclamav/ -I ./clamav-0.104.2/build/

Now we need to have some corpus in order to aid the fuzzer with its job, for example we can create a CORPUS directory and put inside an ELF binary like echo or whatever we like. Or can try with archives, PDF and any other file type supported by ClamAV.

OK! Finally we can run our fuzzer (we need to use LD_LIBRARY_PATH to instruct the dynamic linker at runtime that necessary libraries have to be searched into our current fuzzer directory):

LD_LIBRARY_PATH=(pwd) ./fuzz_libclamav CORPUS/
Fuzzing session starting up

libfuzzer

If we hit CTRL-C and interrupt the execution we can always restart it later just rerunning the command above.

Don’t forget to check the libFuzzer documentation in order to dig deep into its details since we just scratched the surface.