- Title
Research of classification analysis for distributed data warehouse.
- Authors
LI Wei-wei; LI Mei; ZHANG Yang; SHEN Ai-li
- Abstract
The GAC-RDB classification algorithm was designed for a stand-alone data warehouse, which limits where it can be applied. To make data mining on a cloud computing platform more convenient and efficient, this paper proposes a distribution strategy based on HBase, a distributed data warehouse, and on the implementation mechanism of the GAC-RDB classification algorithm, and presents a distributed GAC-RDB classification algorithm written in native HiveQL. Experiments show that the algorithm's running time declines steadily as the number of nodes in the cluster increases. The results indicate that running GAC-RDB on a distributed data warehouse improves its efficiency and extends its scalability. Compared with the MapReduce framework, HiveQL lowers the technical requirements for data mining practitioners and shortens the development time of the algorithm.
- Subjects
COMPUTER algorithms; DISTRIBUTED databases; CLASSIFICATION algorithms; CLUSTER analysis (Statistics); DATA mining; PROGRAMMING languages
- Publication
Application Research of Computers / Jisuanji Yingyong Yanjiu, 2013, Vol 30, Issue 10, p2936
- ISSN
1001-3695
- Publication type
Article
- DOI
10.3969/j.issn.1001-3695.2013.10.013