- Title
On Term Frequency Factor in Supervised Term Weighting Schemes for Text Classification.
- Authors
Dogan, Turgut; Uysal, Alper Kursat
- Abstract
The performance of text classification can be affected by the choice of an appropriate term weighting scheme as well as by other parameters. The term "supervised term weighting scheme" has become popular in recent years, as such schemes may provide a discriminative vector space representation for text documents belonging to different classes. A term weighting scheme generally consists of three factors: a term frequency factor, a collection frequency factor, and a length normalization factor. Researchers have mostly focused on developing new collection frequency factors in term weighting studies. However, the term frequency factor has an important role, especially in supervised term weighting. In this study, we extensively analyzed the effects of using different term frequency factors on seven supervised term weighting schemes. While six of these supervised term weighting schemes were applied in previous studies in the literature, we derived the seventh from an existing feature selection method that had not been used as a weighting method before. The analysis was performed using SVM and Rocchio classifiers on two widely known benchmark datasets with different characteristics. Experimental results showed that modifying the term frequency factor in supervised term weighting schemes increased the performance of almost all weighting schemes. Also, term weighting schemes using the square root function-based term frequency factor (SQRT_TF) were more successful than the ones using the term frequency (TF) and logarithmic function-based term frequency (LOG_TF) factors. The TF factor appears to be the least effective of the three term frequency factors according to the experimental results and statistical analysis.
- Subjects
VECTOR spaces; FEATURE selection; SQUARE root; TEXT processing (Computer science); TERMS & phrases
- Publication
Arabian Journal for Science & Engineering (Springer Science & Business Media B.V.), 2019, Vol 44, Issue 11, p9545
- ISSN
2193-567X
- Publication type
Article
- DOI
10.1007/s13369-019-03920-9
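
The abstract describes a term weight as the product of a term frequency factor and a collection frequency factor, followed by length normalization, and compares three term frequency factors (TF, LOG_TF, SQRT_TF). The sketch below illustrates that structure only. The exact formulas are assumptions, since the abstract does not state them: LOG_TF is taken here as log(1 + tf), SQRT_TF as sqrt(tf), and length normalization as cosine (L2) normalization. The names `tf_factor`, `weight_document`, and the `collection_factor` dictionary are hypothetical and stand in for whichever supervised collection frequency factor is being studied.

```python
import math
from collections import Counter


def tf_factor(count, scheme="TF"):
    """Return the term frequency factor for a raw term count.

    The formulas are illustrative assumptions, not taken from the paper.
    """
    if scheme == "TF":
        return float(count)            # raw count
    if scheme == "LOG_TF":
        return math.log(1.0 + count)   # assumed form: log(1 + tf)
    if scheme == "SQRT_TF":
        return math.sqrt(count)        # assumed form: sqrt(tf)
    raise ValueError(f"unknown scheme: {scheme}")


def weight_document(tokens, collection_factor, scheme="SQRT_TF"):
    """Weight one tokenized document.

    collection_factor maps each term to a precomputed (hypothetical)
    collection frequency factor; terms missing from it get weight 0.
    """
    counts = Counter(tokens)
    weights = {
        term: tf_factor(count, scheme) * collection_factor.get(term, 0.0)
        for term, count in counts.items()
    }
    # Length normalization (assumed here to be cosine / L2).
    norm = math.sqrt(sum(w * w for w in weights.values()))
    if norm == 0.0:
        return weights
    return {term: w / norm for term, w in weights.items()}


if __name__ == "__main__":
    doc = ["goal", "goal", "goal", "goal", "match", "market"]
    # Made-up collection frequency factors, for illustration only.
    cf = {"goal": 2.0, "match": 1.5, "market": 0.5}
    for scheme in ("TF", "LOG_TF", "SQRT_TF"):
        print(scheme, weight_document(doc, cf, scheme))
```

Under these assumptions, LOG_TF and SQRT_TF both dampen the influence of terms that repeat many times within one document, which is the kind of modification to the term frequency factor that the abstract reports as beneficial.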