- Title
Double-weight LDA extracting keywords for financial fraud detection system.
- Authors
Cheng, Ching-Hsue; Cai, Wen-Hong
- Abstract
The impact of financial fraud is widespread, from everyday life to the financial industry: it reduces confidence in the industry and destabilizes national economies. It is therefore important to develop an intelligent financial fraud detection system for early warning and prevention. This study proposes a double-weight latent Dirichlet allocation (DW-LDA) to extract keywords from financial fraud data, and then uses five intelligent classifiers to build an intelligent text fraud detection model. Because financial fraud datasets usually contain more non-fraud cases than fraud cases (that is, they are imbalanced), this study uses the synthetic minority oversampling technique (SMOTE) and random undersampling to handle the imbalance. For verification, this study collected the Enron email and MD&A datasets and, after SMOTE handling, compared the performance of the proposed DW-LDA with related topic models and weighted LDA variants (TFIDF+LDA and PMI+LDA). Model performance is evaluated with accuracy, recall, precision, F-score, and AUC, and the results show that the proposed DW-LDA (TFIDF+PMI+LDA) outperforms the listed topic models. For visual representation, graphs such as word clouds of fraudulent emails and keywords are used to present the key results. The research results and the resulting intelligent text fraud detection model can serve as a reference for investors and stakeholders.
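The abstract does not detail the resampling step, so the following is only a minimal sketch, assuming each document has already been turned into a numeric feature vector (for example, keyword or topic weights). Basic SMOTE creates synthetic fraud cases by interpolating between a minority sample and one of its k nearest minority neighbours; random undersampling simply keeps a random subset of the majority (non-fraud) class. The function names `smote` and `random_undersample` are hypothetical and this is not the authors' implementation. This variant picks base samples at random, and in practice a library such as imbalanced-learn (`SMOTE`, `RandomUnderSampler`) would typically be used instead.

```python
import random
from typing import List, Sequence

Vector = List[float]


def smote(minority: Sequence[Sequence[float]], n_new: int, k: int = 5,
          seed: int = 0) -> List[Vector]:
    """Generate n_new synthetic minority samples (hypothetical basic SMOTE).

    Each synthetic point lies on the line segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than samples

    def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
        # Squared Euclidean distance; enough for ranking neighbours.
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic: List[Vector] = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by distance to `base`.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: sq_dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # uniform in [0, 1)
        # Interpolate: base + gap * (neighbour - base)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def random_undersample(majority: Sequence[Sequence[float]], n_keep: int,
                       seed: int = 0) -> List[Sequence[float]]:
    """Keep a random subset of n_keep majority samples, without replacement."""
    rng = random.Random(seed)
    return rng.sample(list(majority), n_keep)
```

To balance a training set with oversampling alone, one would request `n_new = len(non_fraud) - len(fraud)` synthetic fraud vectors; combining both techniques means undersampling the non-fraud class part of the way and oversampling the fraud class for the remainder. Resampling should be applied to the training split only, so that the evaluation metrics are computed on real, unmodified cases.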
- Subjects
FRAUD investigation; SPAM email; FRAUD; CLOUD storage; NATURAL language processing; INVESTORS
- Publication
Multimedia Tools & Applications, 2024, Vol 83, Issue 17, p50757
- ISSN
1380-7501
- Publication type
Article
- DOI
10.1007/s11042-023-17334-1