Data Mining and the Text Categorization Framework
Data Mining and the Text Categorization Framework
The aim of this contribution is to show one of the most important application of text mining. According to a wide part of the literature regarding the aforementioned field, great relevance is given to the classification task (Drucker et al., 1999, Nigam et al., 2000). The application contexts are several and multitask, from text filtering (Belkin & Croft, 1992) to word sense disambiguation (Gale et al., 1993) and author identification ( Elliot and Valenza, 1991), trough anti spam and recently also anti terrorism. As a consequence in the last decade the scientific community that is working on this task, has profuse a big effort in order to solve the different problems in the more efficient way. The pioneering studies on text categorization (TC, a.k.a. topic spotting) date back to 1961 (Maron) and are deeply rooted in the Information Retrieval context, so declaring the engineering origin of the field under discussion. Text categorization task can be briefly defined as the problem of assigning every single textual document into the relative class or category on the basis of the content and employing a classifier properly trained. In the following parts of this contribution we will formalize the classification problem detailing the main issues related.
CITATION: Cerchiello, Paola. Data Mining and the Text Categorization Framework edited by Wang, John . Hershey : IGI Global , 2008. Encyclopedia of Data Warehousing and Mining, Second Edition - Available at: https://library.au.int/frdata-mining-and-text-categorization-framework