- Title
Automatic content extraction and time-aware topic clustering for large-scale social network on cloud platform.
- Authors
Li, Chunlin; Bai, Jingpan
- Abstract
In recent years, as the number of social network users has grown, social networks have taken on the characteristics of big data, and large-scale social networks have become an indispensable part of people's lives. However, traditional data mining techniques are not suited to large-scale social networks, so a more suitable mining technology is urgently needed. In this paper, a crawler model based on semantic analysis and spatial clustering is first proposed. Then, a content extraction model based on the document object model (DOM) tree is built to extract the target text from the links fetched by the proposed crawler. The similarities between the textual information in different regions are computed to select the important information. Moreover, a two-stage topic clustering model based on time information is presented, in which time information is introduced into the similarity computation between two posts or clusters. The single-pass algorithm is improved and applied in the different clustering stages to improve clustering accuracy. Finally, the proposed algorithms are evaluated on the Hadoop platform. The Hadoop platform can effectively reduce computing time and improve service quality for users of large-scale social networks. The experiments also demonstrate that the proposed algorithms are suitable for data processing in large-scale social networks.
- Subjects
SOCIAL networks; DATA mining; CLOUD computing; CLUSTER analysis (Statistics); ALGORITHMS; ACCURACY
- Publication
Journal of Supercomputing, 2019, Vol 75, Issue 5, p2890
- ISSN
0920-8542
- Publication type
Academic Journal
- DOI
10.1007/s11227-018-2704-z