Algorithms, Vol. 15, Issue 2 (2022)

Pruning Adapters with Lottery Ticket

Jiarun Wu and Qingliang Chen    

Abstract

Massively pre-trained transformer models such as BERT have achieved great success in many downstream NLP tasks. However, they are computationally expensive to fine-tune, slow at inference, and have large storage requirements. Transfer learning with adapter modules has therefore been introduced and has become a remarkable solution to these problems. Nevertheless, recent studies reveal that the parameters in adapters are still quite redundant, which can slow down inference when multiple adapters are fused for a specific downstream task, and they can therefore be reduced further. To address this issue, we propose three novel ways to prune the adapter modules iteratively based on the prestigious Lottery Ticket Hypothesis. Extensive experiments on the GLUE datasets show that the pruned adapters can achieve state-of-the-art results, with sizes reduced significantly while performance remains unchanged, and some pruned adapters even outperform ones of the same size that are fine-tuned alone without pruning.
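
To make the general idea concrete, the following is a minimal sketch of Lottery-Ticket-style iterative magnitude pruning applied to a bottleneck adapter module. It is not the authors' released code or their three specific pruning strategies: the Adapter architecture, the bottleneck size, the per-round pruning fraction, and the train_one_round training hook are all illustrative assumptions.

import copy
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter: down-projection -> nonlinearity -> up-projection (assumed layout)."""
    def __init__(self, hidden_size: int = 768, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)
        self.up = nn.Linear(bottleneck, hidden_size)
        self.act = nn.ReLU()

    def forward(self, x):
        # Residual connection keeps the adapter close to identity at initialization.
        return x + self.up(self.act(self.down(x)))

def magnitude_mask(weight: torch.Tensor, mask: torch.Tensor, prune_frac: float) -> torch.Tensor:
    """Zero out the smallest-magnitude weights that are still surviving in the mask."""
    surviving = weight[mask.bool()].abs()
    k = int(prune_frac * surviving.numel())
    if k == 0:
        return mask
    threshold = surviving.sort().values[k - 1]
    new_mask = mask.clone()
    new_mask[(weight.abs() <= threshold) & mask.bool()] = 0.0
    return new_mask

def iterative_lottery_ticket_pruning(adapter, train_one_round, rounds=5, prune_frac=0.2):
    """Train -> prune the smallest surviving weights -> rewind to initial values -> repeat."""
    init_state = copy.deepcopy(adapter.state_dict())           # theta_0, kept for rewinding
    masks = {n: torch.ones_like(p) for n, p in adapter.named_parameters()
             if "weight" in n}
    for _ in range(rounds):
        train_one_round(adapter, masks)                        # task-specific fine-tuning (user-supplied)
        for name, param in adapter.named_parameters():
            if name in masks:
                masks[name] = magnitude_mask(param.data, masks[name], prune_frac)
        adapter.load_state_dict(init_state)                    # rewind weights (lottery ticket)
        for name, param in adapter.named_parameters():
            if name in masks:
                param.data.mul_(masks[name])                   # re-apply the sparsity mask
    return adapter, masks

The loop follows the standard Lottery Ticket recipe: after each training round, a fraction of the smallest-magnitude surviving adapter weights is pruned, the remaining weights are rewound to their initial values, and training resumes on the sparser subnetwork.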