Defuse: Debugging Classifiers Through Distilling Unrestricted Adversarial Examples
Dylan Slack, Nathalie Rauschmayr, and Krishnaram Kenthapadi
With the growing proliferation of machine learning models, the need to diagnose and correct bugs in models has become increasingly clear. As a route to better discover and fix model bugs, we propose failure scenarios: regions on the data manifold that are incorrectly classified by a model. We propose an end-to-end debugging framework called Defuse that uses these regions to fix faulty classifier predictions. The Defuse framework works in three steps. First, Defuse identifies many unrestricted adversarial examples—naturally occurring instances that are misclassified—using a generative model. Next, the procedure distills the misclassified data using clustering. This step both reveals sources of model error and helps facilitate labeling. Last, the method corrects model behavior on the distilled scenarios through an optimization-based approach. We illustrate the utility of our framework on a variety of image datasets. We find that Defuse identifies and resolves concerning predictions while maintaining model generalization.
Links coming soon!