Resumen
This paper provides new models of the attention-based random forests called LARF (leaf attention-based random forest). The first idea behind the models is to introduce a two-level attention, where one of the levels is the ?leaf? attention, and the attention mechanism is applied to every leaf of trees. The second level is the tree attention depending on the ?leaf? attention. The second idea is to replace the softmax operation in the attention with the weighted sum of the softmax operations with different parameters. It is implemented by applying a mixture of Huber?s contamination models and can be regarded as an analog of the multi-head attention, with ?heads? defined by selecting a value of the softmax parameter. Attention parameters are simply trained by solving the quadratic optimization problem. To simplify the tuning process of the models, it is proposed to convert the tuning contamination parameters into trainable parameters and to compute them by solving the quadratic optimization problem. Many numerical experiments with real datasets are performed for studying LARFs. The code of the proposed algorithms is available.