JOURNAL
Algorithms

ARTICLE

TITLE

Knowledge Distillation-Based Multilingual Code Retrieval

Wen Li, Junfei Xu and Qi Chen

Abstract

Semantic code retrieval is the task of retrieving relevant code based on natural language queries. Although it is related to other information retrieval tasks, it must bridge the gap between the language used in code (which is usually syntax-specific and logic-specific) and natural language, which is better suited to describing ambiguous concepts and ideas. Existing approaches study natural-language code retrieval for a single programming language; in multilingual scenarios this is unwieldy and often requires a large corpus for each language. This paper proposes a method to support multi-programming-language code search: knowledge distillation from six existing monolingual Teacher Models is used to train one Student Model, MPLCS (Multi-Programming Language Code Search). MPLCS can incorporate multiple languages into one model with low corpus requirements. It can learn the commonality between different programming languages and improve recall accuracy for languages with small datasets. For Ruby, as used in this paper, MPLCS improved the MRR score by 20 to 25%. In addition, MPLCS can compensate for the low recall accuracy of monolingual models when they perform retrieval on other programming languages. In some cases, MPLCS's recall accuracy can even outperform that of monolingual models performing retrieval on their own language.
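The MRR score mentioned in the abstract is the mean reciprocal rank: for each query, take 1 divided by the rank at which the correct code snippet appears, then average over all queries. The following is a minimal Python sketch of that standard definition, not the authors' evaluation code. The function names and the toy snippet identifiers are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def reciprocal_rank(ranked_ids: Sequence[str], relevant_id: str) -> float:
    """Return 1/rank of the relevant item, or 0.0 if it was not retrieved."""
    for position, candidate in enumerate(ranked_ids, start=1):
        if candidate == relevant_id:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(
    results: Sequence[Sequence[str]], relevant: Sequence[str]
) -> float:
    """Average the reciprocal rank over all queries."""
    if len(results) != len(relevant):
        raise ValueError("results and relevant must have the same length")
    if not results:
        return 0.0
    total = sum(reciprocal_rank(r, t) for r, t in zip(results, relevant))
    return total / len(results)


# Hypothetical toy data: one ranked result list per natural language query.
rankings = [
    ["snippet_a", "snippet_b", "snippet_c"],  # correct snippet at rank 1
    ["snippet_b", "snippet_d", "snippet_a"],  # correct snippet at rank 2
    ["snippet_c", "snippet_a", "snippet_b"],  # correct snippet not retrieved
]
targets = ["snippet_a", "snippet_d", "snippet_e"]

mrr = mean_reciprocal_rank(rankings, targets)
print(f"MRR = {mrr:.3f}")
```

On this toy data the three reciprocal ranks are 1, 0.5 and 0, so the script reports an MRR of 0.500. Whether the 20 to 25% improvement reported for Ruby is relative or absolute is not stated in the abstract.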

Keywords

multilingualities - code search - knowledge distillation

ISSUE

Volume: 15 Issue: 1 (2022)

SUBJECTS

CIVIL ENGINEERING AND CONSTRUCTION
TECHNOLOGY

DOI

https://doi.org/10.3390/a15010025
