# DeepXplore: Automated Whitebox Testing of Deep Learning Systems

### Background and Motivation

Safety- and security-critical DL systems must be tested systematically so that, ideally, every potential flaw or undesired behavior is detected and fixed. Existing DNN testing techniques fall short for two reasons: they require expensive human effort to provide correct labels and, more importantly, they achieve very low test coverage, which leaves many behaviors of the DNN unexplored.

### Solution Overview

DeepXplore rests on two key ideas. The first is **neuron coverage**, which measures how much of a DL system's logic a set of test inputs exercises, based on the number of neurons those inputs activate. The second is **differential testing**, in which multiple models are compared against each other to identify erroneous corner cases without manual checks.
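To make the neuron coverage idea concrete, here is a minimal sketch of how such a metric could be computed. It assumes the activations of every neuron have already been recorded for each test input, and that a neuron counts as "activated" when its output exceeds a chosen threshold. The function name `neuron_coverage`, the threshold value, and the toy activations are illustrative assumptions, not taken from the DeepXplore code.

```python
def neuron_coverage(activations, threshold=0.0):
    """Fraction of neurons activated by at least one test input.

    activations: list of layers; each layer is a list of rows, one row per
    test input, and each row holds that layer's neuron outputs.
    """
    covered = 0
    total = 0
    for layer in activations:
        num_neurons = len(layer[0])
        for n in range(num_neurons):
            total += 1
            # A neuron is covered if any test input pushes it above the threshold.
            if any(row[n] > threshold for row in layer):
                covered += 1
    return covered / total


# Toy example (assumed values): two test inputs, a 3-neuron layer and a 2-neuron layer.
layer_1 = [[0.9, 0.0, 0.0],
           [0.2, 0.0, 0.7]]
layer_2 = [[0.0, 0.5],
           [0.0, 0.1]]

coverage = neuron_coverage([layer_1, layer_2], threshold=0.25)
print(coverage)  # 3 of the 5 neurons exceed the threshold for some input
```

A test set with low coverage under this measure leaves many neurons, and therefore many learned rules, unexercised, which is the gap DeepXplore tries to close.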

![](/files/-M2vnYk39TCXOvF0zwzz)

DeepXplore takes unlabeled test inputs as seeds and generates new tests from them. While generating tests, it tries to maximize both neuron coverage and the likelihood that the tests cause the DNN models to behave differently (i.e., output different labels). Both goals are necessary for thorough testing that exposes erroneous corner cases.
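The differential-testing check itself is simple: run the same input through several models and flag it when they disagree. The sketch below illustrates only that check, using three hypothetical toy classifiers (`model_a`, `model_b`, `model_c`) in place of real DNNs; none of these names come from the paper.

```python
def find_disagreements(models, inputs):
    """Return (input, labels) pairs for inputs on which the models disagree."""
    disagreements = []
    for x in inputs:
        labels = [model(x) for model in models]
        if len(set(labels)) > 1:  # at least one model predicts a different label
            disagreements.append((x, labels))
    return disagreements


# Hypothetical classifiers that differ only at the boundary x == 0.
def model_a(x):
    return "positive" if x >= 0 else "negative"

def model_b(x):
    return "positive" if x > 0 else "negative"

def model_c(x):
    return "positive" if x >= 0 else "negative"


suspicious = find_disagreements([model_a, model_b, model_c], [-2, 0, 3])
print(suspicious)  # [(0, ['positive', 'negative', 'positive'])]
```

No ground-truth label is needed: the disagreement alone marks the input as worth inspecting, since at least one of the models must be wrong on it.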

DeepXplore solves this joint optimization problem, which maximizes neuron coverage and differential behavior together, using gradient ascent. (See the paper for the detailed algorithm.)
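The following toy sketch shows one plausible shape of that gradient-ascent loop. It is not the paper's algorithm. The two "models" and the "neuron" are hypothetical one-line functions, the weights `lambda_1` and `lambda_2` and the step size are arbitrary choices, and finite differences stand in for the backpropagated gradients a real DNN framework would provide.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical toy "models": each maps a 2-D input to the probability of class 1.
def model_1(x):
    return sigmoid(x[0] + x[1])

def model_2(x):
    return sigmoid(x[0] - 0.5 * x[1] + 1.0)

# Hypothetical hidden neuron that is only weakly active on the seed input.
def neuron(x):
    return sigmoid(x[1] - 3.0)

def label(model, x):
    return 1 if model(x) >= 0.5 else 0

def joint_objective(x, lambda_1=1.0, lambda_2=0.5):
    # Keep model_1 confident in class 1, push model_2 away from it,
    # and reward a higher output from the weakly active neuron.
    return model_1(x) - lambda_1 * model_2(x) + lambda_2 * neuron(x)

def numerical_gradient(f, x, eps=1e-5):
    # Central finite differences; a stand-in for backpropagated gradients.
    grad = []
    for i in range(len(x)):
        hi = list(x)
        lo = list(x)
        hi[i] += eps
        lo[i] -= eps
        grad.append((f(hi) - f(lo)) / (2 * eps))
    return grad

def generate_test(seed, step=1.0, max_iters=100):
    x = list(seed)
    for _ in range(max_iters):
        if label(model_1, x) != label(model_2, x):
            return x  # difference-inducing input found
        grad = numerical_gradient(joint_objective, x)
        # Gradient ascent: move the input in the direction that raises the objective.
        x = [xi + step * gi for xi, gi in zip(x, grad)]
    return x if label(model_1, x) != label(model_2, x) else None

seed = [1.0, 1.0]  # both toy models label the seed as class 1
generated = generate_test(seed)
print(generated, label(model_1, generated), label(model_2, generated))
```

Note that the gradient is taken with respect to the input, not the model weights: the models stay fixed while the seed is nudged until they disagree.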

### Comments

One major limitation of DeepXplore, in my opinion, is its lack of completeness. Although neuron coverage is clearly a better guide for testing than code coverage or random testing, it is still far from complete verification of the model.

In particular, in traditional software testing, if we have an if statement (e.g., `if (x < 0) ... else ...`), it's easy to generate tests that cover both branches. With neuron coverage, however, it's hard to generate inputs that cover all possible behaviors. In addition, DeepXplore requires users to provide input seeds and then makes minor changes to those seeds to obtain difference-inducing inputs, but how should these seeds be picked? For an image classification model, if the input seeds do not contain any cat images, how can we find bugs that are triggered only by cat images?

Finally, the paper focuses largely on DNNs for image recognition tasks. Is the approach also applicable to other applications (e.g., NLP tasks)? If so, how can we make sure that the tests DeepXplore generates are valid sentences?


