Hierarchical Document Clustering

Author:

Fung, Benjamin C.M.

Place:

Hershey

Publisher:

IGI Global

Date published:

2008

Responsibility:

Wang, Ke, jt.author

Ester, Martin, jt.author

Editor:

Wang, John

Journal Title:

Encyclopedia of Data Warehousing and Mining, Second Edition

Source:

Encyclopedia of Data Warehousing and Mining, Second Edition

Abstract:

Document clustering is an automatic grouping of text documents into clusters so that documents within a cluster have high similarity in comparison to one another, but are dissimilar to documents in other clusters. Unlike document classification (Wang, Zhou, & He, 2001), no labeled documents are provided in clustering; hence, clustering is also known as unsupervised learning. Hierarchical document clustering organizes clusters into a tree or a hierarchy that facilitates browsing. The parent-child relationship among the nodes in the tree can be viewed as a topic-subtopic relationship in a subject hierarchy such as the Yahoo! directory. This chapter discusses several special challenges in hierarchical document clustering: high dimensionality, high volume of data, ease of browsing, and meaningful cluster labels. State-of-the-art document clustering algorithms are reviewed: the partitioning method (Steinbach, Karypis, & Kumar, 2000), agglomerative and divisive hierarchical clustering (Kaufman & Rousseeuw, 1990), and frequent itemset-based hierarchical clustering (Fung, Wang, & Ester, 2003). The last one, which was developed by the authors, is further elaborated since it has been specially designed to address the hierarchical document clustering problem.

Category URL:

http://services.igi-global.com/resolvedoi/resolve.aspx?doi=10.4018/978-1-60566-0…

DOI:

https://library.au.int/10.4018/978-1-60566-010-3.ch150

CITATION: Fung, Benjamin C.M. Hierarchical Document Clustering, edited by Wang, John. Hershey: IGI Global, 2008. Encyclopedia of Data Warehousing and Mining, Second Edition - Available at: https://library.au.int/hierarchical-document-clustering