Abstract
Model induction is one of the most popular methods for better understanding AI decisions: it estimates the contribution of each input feature to a class of interest. However, we identify a potential issue: most model induction methods, especially those that compute class activation maps, rely on arbitrary thresholding to mute some of their computed attribution scores, which can severely degrade the quality of model induction. We therefore propose a new threshold fine-tuning (TFT) procedure to enhance the quality of input attribution based on model induction. TFT replaces arbitrary thresholding with an iterative search for the optimal cut-off threshold on the input attribution scores, guided by a new quality metric. Furthermore, to remove the burden of computing optimal threshold values on a per-input basis, we propose an activation fine-tuning (AFT) framework that attaches a tuner network to the original convolutional neural network (CNN) and retrains the tuner-attached network with auxiliary data produced by TFT. The tuner network makes the activations of the original CNN less noisy and thus better suited for computing input attribution scores from class activation maps derived from those activations. Our experiments show that per-input optimal thresholding of attribution scores with TFT significantly improves the quality of input attribution, and that CNNs fine-tuned with AFT produce improved input attribution matching the quality of TFT-tuned attribution without requiring costly per-input threshold optimization.
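To illustrate the core idea behind TFT, the following is a minimal sketch of a per-input threshold search over attribution scores. The function name `tft_threshold_search`, the candidate threshold grid, and the `quality_metric` callable are all hypothetical stand-ins, since the abstract does not specify the actual quality metric or search procedure; this is a sketch of the general technique, not the paper's implementation.

```python
import numpy as np

def tft_threshold_search(attribution, quality_metric, num_candidates=50):
    """Hypothetical per-input threshold search in the spirit of TFT.

    attribution    : 2-D array of raw attribution scores (e.g., a class
                     activation map for one input image).
    quality_metric : callable scoring a thresholded attribution map;
                     a stand-in for the paper's quality metric.
    num_candidates : number of cut-off values to sweep.
    """
    # Sweep candidate cut-offs over the range of the attribution scores,
    # instead of muting scores at one arbitrary fixed threshold.
    thresholds = np.linspace(attribution.min(), attribution.max(), num_candidates)

    best_t, best_score = None, -np.inf
    for t in thresholds:
        # Mute (zero out) attribution scores below the candidate threshold.
        muted = np.where(attribution >= t, attribution, 0.0)
        # Keep the cut-off that maximizes the quality metric for this input.
        score = quality_metric(muted)
        if score > best_score:
            best_t, best_score = t, score
    return best_t, best_score
```

In this reading, AFT amortizes the cost of the loop above: the per-input optimal thresholds found by a search like this serve as auxiliary training data for the tuner-attached CNN, so that a single retrained network yields comparably clean attributions without running the search for every new input.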