Resumen
In order to solve the current problems in domain long text classification tasks, namely, the long length of a document, which makes it difficult for the model to capture key information, and the lack of expert domain knowledge, which leads to insufficient classification accuracy, a domain long text classification model based on a knowledge graph and a graph convolutional neural network is proposed. BERT is used to encode the text, and each word?s corresponding vector is used as a node for the graph convolutional neural network so that the initialized vector contains rich semantic information. Using the trained entity?relationship extraction model, the entity-to-entity?relationships in the document are extracted and used as the edges of the graph convolutional neural network, together with syntactic dependency information. The graph structure mask is used to learn about edge relationships and edge types to further enhance the learning ability of the model for semantic dependencies between words. The method further improves the accuracy of domain long text classification by fusing knowledge features and data features. Experiments on three long text classification datasets?IFLYTEK, THUCNews, and the Chinese corpus of Fudan University?show accuracy improvements of 8.8%, 3.6%, and 2.6%, respectively, relative to the BERT model.