- Title
Evaluating Topic Modeling of Saudi Newspaper Texts Using the Latent Dirichlet Allocation (LDA) Algorithm: A Computational Linguistic Study
- Authors
أفراح عبد العزيز التميمي
- Abstract
This paper belongs to the field of natural language processing. It applies an unsupervised machine learning approach to identify the latent topics in Saudi newspapers, using one of the most important unsupervised topic modeling algorithms, Latent Dirichlet Allocation (LDA). I built a corpus from Saudi newspapers; after preprocessing, it contained 4,781 texts totaling 649,734 tokens. Training 20 models, with ten words per topic, showed that the optimal number of topics for these texts is 7. The 7-topic model achieved a good coherence score of 0.6723. Each topic was inferred from its ten highest-probability words. I interpreted the seven topics, in order, as: surveillance and awareness, development and improvement, sports, health, economics, domestic affairs, and international politics. The 7-topic model was evaluated qualitatively by manually reviewing the coherence of the words in each topic. I also reviewed the first fifty texts in each topic to confirm that each belongs to the topic LDA assigned it to. The qualitative evaluation was supplemented by running the algorithm again on the texts of each of the seven topics separately, to obtain more detail on each topic. Although the topic modeling results have some shortcomings, they can be optimized and then used in discourse analysis as an alternative to traditional approaches.
- Subjects
Natural language processing; Machine learning; Discourse analysis; Algorithms; Newspapers
- Publication
Umm Al-Qura University Journal for Languages & Literature, 2022, Issue 29, p24
- ISSN
1658-4694
- Publication type
Article
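
The abstract describes two selection steps: choosing the number of topics by coherence, and reading each topic through its ten highest-probability words. The sketch below illustrates only that selection logic in plain Python. It is not the author's code; the function names `top_words` and `best_topic_count` are hypothetical, and all toy values are invented for illustration except the coherence of 0.6723 for the 7-topic model, which comes from the abstract.

```python
from typing import Dict, List


def top_words(topic_word_probs: Dict[str, float], n: int = 10) -> List[str]:
    """Return the n words with the highest probability in one topic."""
    # Sort by descending probability; break ties alphabetically for stability.
    ranked = sorted(topic_word_probs.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:n]]


def best_topic_count(coherence_by_k: Dict[int, float]) -> int:
    """Return the topic count whose model has the highest coherence score."""
    return max(coherence_by_k, key=lambda k: coherence_by_k[k])


# Hypothetical toy topic (word -> probability); not data from the paper.
toy_topic = {
    "match": 0.09, "team": 0.08, "league": 0.07,
    "player": 0.05, "goal": 0.04, "club": 0.03,
}

# Hypothetical coherence scores per topic count.
# Only the value for k=7 (0.6723) is reported in the abstract.
toy_coherence = {5: 0.61, 6: 0.64, 7: 0.6723, 8: 0.65}

print(top_words(toy_topic, n=3))
print(best_topic_count(toy_coherence))
```

In practice the word probabilities and coherence scores would come from a trained LDA model and a coherence measure; which library and which coherence measure the paper used is not stated in this record.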