Abstract
Single-task models (STMs) struggle to learn sophisticated representations from a finite set of annotated data. Multitask learning approaches overcome this constraint by training several related tasks simultaneously, sharing some layers of the neural network architecture so that the model learns representations that generalize across tasks. As a result, multitask models (MTMs) generalize better than single-task models. This generalization ability can, in turn, be used to improve other models: through knowledge distillation, in which one model supervises another during training using its learned generalizations, an STM can exploit the knowledge extracted by an MTM to learn richer representations. This paper proposes a knowledge distillation technique in which different MTMs serve as teacher models supervising different student models, and distillation is applied to different representations of the teacher. We also investigate the effect of the conditional random field (CRF) and the softmax function in token-level knowledge distillation, and find that the softmax function improves student performance compared to the CRF. The analysis of the results is further supported by statistical testing with the Friedman test.
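To make the token-level, softmax-based distillation concrete, the following is a minimal sketch, not the paper's exact formulation: it assumes both the MTM teacher and the STM student expose per-token logits of shape (batch, seq_len, num_labels), and the function name, temperature, and reduction are illustrative choices.

```python
import torch
import torch.nn.functional as F


def token_level_kd_loss(student_logits: torch.Tensor,
                        teacher_logits: torch.Tensor,
                        temperature: float = 2.0) -> torch.Tensor:
    """KL divergence between softened teacher and student token-label
    distributions (hypothetical helper, not from the paper).

    Both tensors have shape (batch, seq_len, num_labels).
    """
    num_labels = student_logits.size(-1)
    t = temperature
    # Flatten to (batch * seq_len, num_labels) so the loss averages per token.
    student_log_probs = F.log_softmax(student_logits.reshape(-1, num_labels) / t, dim=-1)
    teacher_probs = F.softmax(teacher_logits.reshape(-1, num_labels) / t, dim=-1)
    # Scale by T^2, the usual correction when distilling with a temperature.
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * (t * t)
```

In practice this distillation term would typically be combined with the student's ordinary hard-label loss, e.g. `loss = ce_loss + lam * token_level_kd_loss(student_logits, teacher_logits)`, where the mixing weight `lam` is a tunable hyperparameter.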