Explaining Black-Box Classifiers by Implicitly Learning Decision Trees
Author(s)
Lange, Jane
Advisor
Rubinfeld, Ronitt
Abstract
We present algorithms for finding two types of objects that explain the classification of a black-box model f : {±1}^d → {±1} on an instance x ∈ {±1}^d. The first is a certificate: a small set of x’s features that in conjunction essentially determines f(x). The second is a counterfactual: a nearest instance x′ for which f(x′) ≠ f(x). We obtain both algorithms via a connection to the problem of implicitly learning decision trees. The implicit nature of this learning task allows for efficient algorithms even when the complexity of f necessitates an intractably large surrogate decision tree. We solve the implicit learning task by bringing together techniques from learning theory, local computation algorithms, and complexity theory. Our approach of “explaining by implicit learning” shares elements of two previously disparate methods for post-hoc explanations, global and local explanations, and we make the case that it enjoys advantages of both. Our certification algorithm runs in time poly(d, C(f)) and outputs a certificate of size poly(C(f)), where C(f) is the “average certificate complexity” of f. Our counterfactual algorithm runs in time S(f)^{O(Δ_f(x))} · log d, where S(f) is the sensitivity of f (a discrete analogue of the Lipschitz constant) and Δ_f(x) is the distance from x to its nearest counterfactual. We further prove a lower bound of S(f)^{Ω(Δ_f(x))} + Ω(log d) for finding counterfactuals, thereby showing that the guarantees of our algorithm are essentially optimal.
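To make the two explanation objects concrete, here is a minimal brute-force sketch in Python. The helper names (find_certificate, find_counterfactual) and the toy majority function are illustrative assumptions, not from the thesis, and exhaustive enumeration like this runs in time exponential in d; the thesis achieves the poly(d, C(f)) and S(f)^{O(Δ_f(x))} · log d guarantees above via implicit decision-tree learning instead.

```python
# Brute-force illustrations of the two objects defined in the abstract.
# NOT the thesis's algorithms: these enumerate subsets exhaustively.
from itertools import combinations, product

def find_certificate(f, x):
    """Smallest set of x's coordinates that, held fixed, determines f(x)."""
    d, fx = len(x), f(x)
    for k in range(d + 1):                      # try certificate sizes 0, 1, ..., d
        for S in combinations(range(d), k):
            free = [i for i in range(d) if i not in S]
            pinned = True
            for vals in product((-1, 1), repeat=len(free)):
                y = list(x)
                for i, v in zip(free, vals):
                    y[i] = v                    # vary the coordinates outside S
                if f(tuple(y)) != fx:
                    pinned = False
                    break
            if pinned:
                return S                        # x's values on S force f to output f(x)

def find_counterfactual(f, x):
    """Nearest x' in Hamming distance with f(x') != f(x); None if f is constant."""
    d, fx = len(x), f(x)
    for dist in range(1, d + 1):                # grow the search radius Δ_f(x)
        for S in combinations(range(d), dist):
            y = list(x)
            for i in S:
                y[i] = -y[i]                    # flip the coordinates in S
            if f(tuple(y)) != fx:
                return tuple(y)
    return None

# Example: a toy black-box f computing the majority of the first three coordinates.
f = lambda z: 1 if z[0] + z[1] + z[2] > 0 else -1
x = (1, 1, -1, -1, 1)
print(find_certificate(f, x))      # (0, 1): x_0 = x_1 = +1 already forces f(x) = +1
print(find_counterfactual(f, x))   # (-1, 1, -1, -1, 1): flipping x_0 flips f
```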
Date issued
2025-05
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology