Clustering Categorical Data with k-Modes

Author:

Huang, Joshua Zhexue

Place:

Hershey

Publisher:

IGI Global

Date published:

2008

Editor:

Wang, John

Journal Title:

Encyclopedia of Data Warehousing and Mining, Second Edition

Source:

Encyclopedia of Data Warehousing and Mining, Second Edition

Abstract:

Much of the data in real-world databases is categorical. For example, the gender, profession, position, and hobby of customers are usually defined as categorical attributes in the CUSTOMER table. Each categorical attribute is represented by a small set of unique categorical values, such as {Female, Male} for the gender attribute. Unlike numeric data, categorical values are discrete and unordered. Therefore, clustering algorithms designed for numeric data cannot be used to cluster the categorical data that exist in many real-world applications. In data mining research, much effort has been devoted to developing new techniques for clustering categorical data (Huang, 1997b; Huang, 1998; Gibson, Kleinberg, & Raghavan, 1998; Ganti, Gehrke, & Ramakrishnan, 1999; Guha, Rastogi, & Shim, 1999; Chaturvedi, Green, Carroll, & Foods, 2001; Barbara, Li, & Couto, 2002; Andritsos, Tsaparas, Miller, & Sevcik, 2003; Li, Ma, & Ogihara, 2004; Chen & Liu, 2005; Parmar, Wu, & Blackhurst, 2007). The k-modes clustering algorithm (Huang, 1997b; Huang, 1998) is one of the first algorithms for clustering large categorical data sets. In the past decade, this algorithm has been well studied and widely used in various applications. It has also been adopted in commercial software (e.g., Daylight Chemical Information Systems, Inc., http://www.daylight.com/).

Category URL:

http://services.igi-global.com/resolvedoi/resolve.aspx?doi=10.4018/978-1-60566-0…

DOI:

https://library.au.int/10.4018/978-1-60566-010-3.ch040

CITATION: Huang, Joshua Zhexue. Clustering Categorical Data with k-Modes, edited by Wang, John. Hershey: IGI Global, 2008. Encyclopedia of Data Warehousing and Mining, Second Edition. Available at: https://library.au.int/clustering-categorical-data-k-modes