dc.contributor.advisor: Rubinfeld, Ronitt
dc.contributor.author: Lange, Jane
dc.date.accessioned: 2025-11-17T19:07:02Z
dc.date.available: 2025-11-17T19:07:02Z
dc.date.issued: 2025-05
dc.date.submitted: 2025-08-14T19:32:12.772Z
dc.identifier.uri: https://hdl.handle.net/1721.1/163682
dc.description.abstract: We present algorithms for finding two types of objects that explain the classification of a black-box model f : {±1}^d → {±1} on an instance x ∈ {±1}^d. The first is a certificate: a small set of x’s features that in conjunction essentially determines f(x). The second is a counterfactual: a nearest instance x′ for which f(x′) ≠ f(x). We obtain both algorithms via a connection to the problem of implicitly learning decision trees. The implicit nature of this learning task allows for efficient algorithms even when the complexity of f necessitates an intractably large surrogate decision tree. We solve the implicit learning task by bringing together techniques from learning theory, local computation algorithms, and complexity theory. Our approach of “explaining by implicit learning” shares elements of two previously disparate methods for post-hoc explanations, global and local explanations, and we make the case that it enjoys advantages of both. Our certification algorithm runs in time poly(d, C(f)) and outputs a certificate of size poly(C(f)), where C(f) is the “average certificate complexity” of f. Our counterfactual algorithm runs in time S(f)^O(∆_f(x)) · log d, where S(f) is the sensitivity of f (a discrete analogue of the Lipschitz constant) and ∆_f(x) is the distance from x to its nearest counterfactual. We further prove a lower bound of S(f)^Ω(∆_f(x)) + Ω(log d) for finding counterfactuals, thereby showing that the guarantees of our algorithm are essentially optimal.
dc.publisher: Massachusetts Institute of Technology
dc.rights: In Copyright - Educational Use Permitted
dc.rights: Copyright retained by author(s)
dc.rights.uri: https://rightsstatements.org/page/InC-EDU/1.0/
dc.title: Explaining Black-Box Classifiers by Implicitly Learning Decision Trees
dc.type: Thesis
dc.description.degree: S.M.
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
mit.thesis.degree: Master
thesis.degree.name: Master of Science in Electrical Engineering and Computer Science
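
To make the abstract's two notions concrete, the following is a minimal brute-force sketch in Python, not the thesis's algorithms: it enumerates all inputs and so runs in time exponential in d, whereas the thesis gives poly(d, C(f)) and S(f)^O(∆_f(x)) · log d algorithms. The function names (is_certificate, smallest_certificate, nearest_counterfactual) and the toy AND function are illustrative assumptions, not identifiers from the thesis.

    # Illustrative brute-force baselines for "certificate" and "counterfactual".
    # NOT the thesis's algorithms: these enumerate all 2^d inputs and are only
    # meant to make the definitions concrete for tiny d.
    from itertools import combinations, product

    def is_certificate(f, x, S):
        """Check whether the feature set S certifies f(x): every input that
        agrees with x on the coordinates in S gets the same label as x."""
        d = len(x)
        free = [i for i in range(d) if i not in S]
        for bits in product([-1, 1], repeat=len(free)):
            y = list(x)
            for i, b in zip(free, bits):
                y[i] = b
            if f(tuple(y)) != f(tuple(x)):
                return False
        return True

    def smallest_certificate(f, x):
        """Return a smallest certifying feature set for f on x (exponential time)."""
        d = len(x)
        for k in range(d + 1):
            for S in combinations(range(d), k):
                if is_certificate(f, x, set(S)):
                    return set(S)
        return set(range(d))

    def nearest_counterfactual(f, x):
        """Return a nearest x' (in Hamming distance) with f(x') != f(x), if any."""
        d = len(x)
        for k in range(1, d + 1):
            for S in combinations(range(d), k):
                y = list(x)
                for i in S:
                    y[i] = -y[i]
                if f(tuple(y)) != f(tuple(x)):
                    return tuple(y)
        return None

    # Toy example: f is the AND of the first two features (in +/-1 encoding).
    f = lambda z: 1 if (z[0] == 1 and z[1] == 1) else -1
    x = (1, 1, -1, 1)
    print(smallest_certificate(f, x))    # {0, 1}: fixing x_1 = x_2 = +1 forces f = +1
    print(nearest_counterfactual(f, x))  # (-1, 1, -1, 1), at Hamming distance 1

On this toy function the certificate {x_1, x_2} determines f(x) regardless of the remaining features, and flipping a single certified feature already yields a counterfactual, so ∆_f(x) = 1.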

