Abstract
Relation extraction (RE) is a fundamental NLP task that aims to identify relations between entities mentioned in a given text. RE forms the basis for many advanced NLP tasks, such as question answering and text summarization, so its quality is critical to downstream applications. However, evaluating the quality of RE models is non-trivial. On the one hand, obtaining ground-truth labels for individual test inputs is tedious and often difficult. On the other hand, there is an increasing need to understand the characteristics of RE models from various aspects. To mitigate these issues, this study proposes evaluating RE models by applying metamorphic testing (MT). A total of eight metamorphic relations (MRs) are identified based on three categories of transformation operations, namely replacement, swap, and combination. These MRs encode expected properties of RE behavior across different aspects. We further apply MT to three popular RE models. Our experiments reveal a large number of prediction failures in the subject RE models, confirming that MT is effective for evaluating RE models. Further analysis of the experimental results reveals the strengths and weaknesses of the subject models and uncovers several typical issues in RE models.
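As an illustration of the idea (not code from the paper), the sketch below shows how a replacement-based MR might be checked: replacing an entity with another of the same type should leave the predicted relation unchanged, and a mismatch signals a potential prediction failure. The `extract_relation` stub and all names are hypothetical stand-ins, not the authors' API.

```python
from typing import Callable

def extract_relation(sentence: str, head: str, tail: str) -> str:
    """Hypothetical placeholder for an RE model: returns a relation
    label predicted for the (head, tail) entity pair in sentence."""
    # A real implementation would invoke a trained RE model here.
    return "works_for" if "works for" in sentence else "no_relation"

def mr_same_type_replacement(
    model: Callable[[str, str, str], str],
    sentence: str,
    head: str,
    tail: str,
    replacement: str,
) -> bool:
    """MR (assumed form): substituting the head entity with another
    entity of the same type should not change the predicted relation."""
    source_label = model(sentence, head, tail)
    follow_up_sentence = sentence.replace(head, replacement)
    follow_up_label = model(follow_up_sentence, replacement, tail)
    # The MR is violated when source and follow-up predictions differ.
    return source_label == follow_up_label

if __name__ == "__main__":
    sent = "Alice works for Acme Corp."
    ok = mr_same_type_replacement(extract_relation, sent,
                                  "Alice", "Acme Corp.", "Bob")
    print("MR satisfied" if ok else "MR violated: potential prediction failure")
```

Swap- and combination-based MRs would follow the same pattern, differing only in how the follow-up input is derived from the source input.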