- Title
Using Natural Language Processing for Programming Language Code Classification with Multinomial Naive Bayes.
- Authors
Odeh, Ayman Hussein; Odeh, Munther; Odeh, Hussein; Odeh, Nada
- Abstract
Classifying programming language scripts is an important task for several reasons, including automated analysis, code maintenance, code search, quality assurance, and code understanding. The process resembles natural language processing, especially for high-level languages such as Python, Java, C#, C, C++, PHP, JavaScript, and others. Leveraging natural language processing (NLP) concepts, this research explores the application of the Multinomial Naïve Bayes (MNB) algorithm to identify and classify the programming languages used in source code files. MNB is a relatively simple and fast algorithm for text classification. The study uses a dataset covering 12 programming languages and consisting of 12,003 samples, totaling 396,090 lines of code. The MNB algorithm is trained on this dataset, and its performance in classifying programming language source code is evaluated. The results show an accuracy of 95.09% in identifying and classifying programming languages, which highlights the effectiveness of the applied NLP techniques, specifically the MNB algorithm, in this classification task. The findings have implications for multi-language editors such as Visual Studio Code and Notepad++, or any programming editor. With automatic language recognition enabled by this approach, users can paste source code into these editors, and the system will identify the programming language being used. This functionality improves the user experience and streamlines the coding process, particularly in multi-language development environments.
- Subjects
NAIVE Bayes classification; PROGRAMMING languages; JAVASCRIPT programming language; NATURAL language processing; SOURCE code
- Publication
Revue d'Intelligence Artificielle, 2023, Vol 37, Issue 5, p1229
- ISSN
0992-499X
- Publication type
Article
- DOI
10.18280/ria.370515
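
The approach summarized in the abstract, treating source code as text and classifying it with Multinomial Naive Bayes, can be illustrated with a minimal sketch. This is not the authors' implementation: the tokenizer, the Laplace smoothing value `alpha=1.0`, the class name `MultinomialNB`, and the six-snippet toy corpus are all assumptions made for illustration, and the record does not describe the paper's actual preprocessing or dataset format.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(code):
    # Hypothetical tokenizer (an assumption, not from the paper):
    # identifiers/keywords, plus each punctuation symbol as its own token.
    return re.findall(r"[A-Za-z_]\w*|[^\sA-Za-z_0-9]", code)


class MultinomialNB:
    """Multinomial Naive Bayes over token counts with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing strength (assumed value)

    def fit(self, samples, labels):
        self.class_counts = Counter(labels)
        self.token_counts = defaultdict(Counter)
        for code, label in zip(samples, labels):
            self.token_counts[label].update(tokenize(code))
        self.vocab = {t for c in self.token_counts.values() for t in c}
        self.total = {lb: sum(c.values()) for lb, c in self.token_counts.items()}
        return self

    def predict(self, code):
        n_samples = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        # Tokens never seen in training carry no evidence, so skip them.
        tokens = [t for t in tokenize(code) if t in self.vocab]
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # log P(class) + sum of log P(token | class)
            score = math.log(count / n_samples)
            denom = self.total[label] + self.alpha * vocab_size
            for t in tokens:
                score += math.log((self.token_counts[label][t] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data (illustrative only; the study used 12,003 samples).
TRAIN = [
    ("def add(a, b):\n    return a + b", "Python"),
    ("import os\nfor name in os.listdir('.'):\n    print(name)", "Python"),
    ("public class Main { public static void main(String[] args) "
     "{ System.out.println(1); } }", "Java"),
    ("private int count; public int getCount() { return this.count; }", "Java"),
    ("function add(a, b) { return a + b; }", "JavaScript"),
    ("const items = [1, 2]; items.forEach(function (x) { console.log(x); });",
     "JavaScript"),
]

model = MultinomialNB().fit([code for code, _ in TRAIN],
                            [label for _, label in TRAIN])
print(model.predict("def greet(name):\n    print(name)"))
```

With a corpus this small the sketch only shows the mechanics: per-language token frequencies are counted at training time, and a pasted snippet is assigned to the language whose smoothed token distribution gives it the highest log-probability. It says nothing about the accuracy reported in the abstract.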