Title

Automatic content extraction and time-aware topic clustering for large-scale social network on cloud platform.

Authors

Li, Chunlin; Bai, Jingpan

Abstract

In recent years, as the number of social network users has grown, social network data has taken on the characteristics of big data, and large-scale social networks have become an indispensable part of people's lives. However, traditional data mining techniques are not well suited to large-scale social networks, so more suitable mining techniques are urgently needed. In this paper, a crawler model based on semantic analysis and spatial clustering is first proposed. Then, a content extraction model based on the document object model (DOM) tree is built to extract the target text from the links fetched by the proposed crawler; the similarities between textual information in different regions are computed to select the important information. Moreover, a two-stage topic clustering model based on time information is presented, in which time information is introduced into the similarity computation between two posts or clusters. The single-pass algorithm is improved and applied in the different clustering stages to improve clustering accuracy. Finally, the proposed algorithms are evaluated on the Hadoop platform. The Hadoop platform can effectively reduce computing time and improve service quality for users of large-scale social networks. Meanwhile, the experiments demonstrate that the proposed algorithms are suitable for data processing in large-scale social networks.
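
The abstract does not give the similarity formula or the specific improvements made to the single-pass algorithm, so the following is only a minimal sketch of the general idea of time-aware single-pass clustering, not the authors' method. It assumes bag-of-words vectors, cosine similarity, and an exponential time decay; the function names, the `threshold` and `half_life_hours` parameters, and the decay form are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def time_aware_similarity(vec_a, t_a, vec_b, t_b, half_life_hours=24.0):
    """Text similarity damped by the time gap (assumed exponential decay)."""
    gap = abs(t_a - t_b)
    decay = 0.5 ** (gap / half_life_hours)
    return cosine(vec_a, vec_b) * decay


def single_pass(posts, threshold=0.3, half_life_hours=24.0):
    """Assign each (text, hour) post to its best cluster or open a new one.

    Each post is visited exactly once, in the order given.
    """
    clusters = []
    for text, t in posts:
        vec = Counter(text.lower().split())
        best, best_sim = None, 0.0
        for cluster in clusters:
            sim = time_aware_similarity(
                vec, t, cluster["centroid"], cluster["time"], half_life_hours
            )
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= threshold:
            # Merge: accumulate term counts and update the mean timestamp.
            n = len(best["members"])
            best["centroid"].update(vec)
            best["time"] = (best["time"] * n + t) / (n + 1)
            best["members"].append(text)
        else:
            clusters.append({"centroid": Counter(vec), "time": t, "members": [text]})
    return clusters


posts = [
    ("hadoop cluster outage", 0.0),
    ("hadoop cluster outage fixed", 1.0),
    ("football match tonight", 2.0),
    ("hadoop cluster outage", 500.0),  # same words, but weeks later
]
clusters = single_pass(posts)
```

In this sketch, two posts with identical wording still land in different clusters when they are far apart in time, which is the intuition behind introducing time information into the similarity between posts or clusters.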

Subjects

SOCIAL networks; DATA mining; CLOUD computing; CLUSTER analysis (Statistics); ALGORITHMS; ACCURACY

Publication

Journal of Supercomputing, 2019, Vol 75, Issue 5, p2890

ISSN

0920-8542

Publication type

Academic Journal

DOI

10.1007/s11227-018-2704-z
