Resumen
For all research data collected, data descriptions and information about the corresponding variables are essential for data analysis and reuse. To enable cross-study comparisons and analyses, semantic interoperability of metadata is one of the most important requirements. In the area of clinical and epidemiological studies, data collection instruments such as case report forms (CRFs), data dictionaries and questionnaires are critical for metadata collection. Even though data collection instruments are often created in a digital form, they are mostly not machine readable; i.e., they are not semantically coded. As a result, the comparison between data collection instruments is complex. The German project NFDI4Health is dedicated to the development of national research data infrastructure for personal health data, and as such searches for ways to enhance semantic interoperability. Retrospective integration of semantic codes into study metadata is important, as ongoing or completed studies contain valuable information. However, this is labor intensive and should be eased by software. To understand the market and find out what techniques and technologies support retrospective semantic annotation/enrichment of metadata, we conducted a literature review. In NFDI4Health, we identified basic requirements for semantic metadata annotation software in the biomedical field and in the context of the FAIR principles. Ten relevant software systems were summarized and aligned with those requirements. We concluded that despite active research on semantic annotation systems, no system meets all requirements. Consequently, further research and software development in this area is needed, as interoperability of data dictionaries, questionnaires and data collection tools is key to reusing and combining results from independent research studies.