Jiarun Wu and Qingliang Chen
Massively pre-trained transformer models such as BERT have achieved great success in many downstream NLP tasks. However, they are computationally expensive to fine-tune, slow for inference, and have large storage requirements. Consequently, transfer learning with ad...