Abstract
The growing field of explainable AI (XAI) develops methods that help users better understand ML model predictions. While SHapley Additive exPlanations (SHAP) is a widely used, model-agnostic method for explaining predictions, its use comes with a substantial computational burden, particularly for complex models and large datasets with many features. The key challenge, so far unaddressed, lies in efficiently scaling these computations without compromising accuracy. In this paper, we present a scalable, model-agnostic SHAP sampling framework on top of Apache SystemDS. We leverage Antithetic Permutation Sampling for its efficiency and optimization potential, and we devise a carefully vectorized and parallelized implementation for local and distributed operations. Compared with the state-of-the-art Python SHAP package, our solutions yield similar accuracy but achieve substantial speedups of up to 14× for multi-threaded single-node operations as well as up to 35× for distributed Spark operations (on a small 8-node cluster).
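For readers unfamiliar with the sampling scheme named above, the following is a minimal NumPy sketch of antithetic permutation sampling for a single instance. It is not the SystemDS implementation described in the paper: the function name `antithetic_permutation_shap`, its parameters, and the choice of replacing absent features with values from one randomly drawn background row are assumptions made for illustration only. Each sampled permutation is evaluated together with its reverse, and all masked inputs for one permutation are scored in a single batched `predict` call.

```python
import numpy as np

def antithetic_permutation_shap(predict, x, background, n_pairs=50, seed=0):
    """Estimate SHAP values for one instance x (hypothetical helper).

    predict:    callable mapping a 2-D array (rows = inputs) to a 1-D array
    x:          1-D array, the instance to explain
    background: 2-D array, rows used to fill in "absent" features
    n_pairs:    number of (permutation, reversed permutation) pairs
    """
    rng = np.random.default_rng(seed)
    n_features = x.shape[0]
    phi = np.zeros(n_features)
    for _ in range(n_pairs):
        perm = rng.permutation(n_features)
        z = background[rng.integers(len(background))]
        # Antithetic pair: the permutation and its reverse.
        for order in (perm, perm[::-1]):
            # Row k holds the first k features of `order` taken from x,
            # and all remaining features taken from the background row z.
            masked = np.tile(z, (n_features + 1, 1)).astype(float)
            for k, j in enumerate(order):
                masked[k + 1:, j] = x[j]
            preds = predict(masked)  # one batched model call
            # Consecutive differences are the marginal contributions.
            phi[order] += np.diff(preds)
    return phi / (2 * n_pairs)
```

With a single background row, the estimates sum exactly to `predict(x) - predict(z)` because the marginal contributions along each permutation telescope; with several background rows, the result is a Monte Carlo estimate whose variance depends on `n_pairs`.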
Citation
Le Page, L., Dionysio, C. and Boehm, M., Scalable Computation of Shapley Additive Explanations. Datenbanksysteme für Business, Technologie und Web (BTW 2025)
@article{le2025scalable,
title={Scalable Computation of Shapley Additive Explanations},
author={Le Page, Louis and Dionysio, Christina and Boehm, Matthias},
journal={Datenbanksysteme f{\"u}r Business, Technologie und Web (BTW 2025)},
pages={355},
year={2025}
}