Text Representation with WordNet Synsets using Soft Sense Disambiguation
Lecture Notes in Informatics (LNI), Proceedings - Series of the Gesellschaft für Informatik (GI)
Ramakrishnan, Ganesh
Bhattacharyya, Pushpak
Text information processing depends critically on the proper representation of texts. A common but naive way of representing a text is as a bag of its component words. This representation suffers primarily from two drawbacks, polysemy and synonymy, which arise from the ambiguity of words and the lack of information about the relations between them. This paper presents a model for representing a text in terms of the synsets in WordNet, the lexical knowledge base of English words along with their semantic relations. These synsets stand for the concepts that correspond to the words of the text. In particular, a soft sense disambiguation approach is proposed: WordNet relations with other words in the sentence are exploited to disambiguate the senses. The text representation so obtained is found to convey the key ideas that the texts deal with. The scheme has been evaluated using a goodness measure based on the information content of the representation of the text. As an actual application, the problem of text classification has been taken up, and the results are encouraging.
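
The idea of soft sense disambiguation described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' model: the tiny lexicon `SENSES`, the relation set `RELATED`, the synset names, and the scoring rule (count related senses among the other words of the sentence, add a smoothing constant, normalize) are all hypothetical stand-ins for WordNet and for the paper's actual method. The point is only to show how a text can be mapped to weighted synsets instead of a bag of words, with every sense keeping some weight rather than one sense being chosen outright.

```python
from collections import defaultdict

# Hypothetical toy lexicon standing in for WordNet: word -> candidate synsets.
SENSES = {
    "bank": ["bank.finance", "bank.river"],
    "money": ["money.currency"],
    "deposit": ["deposit.finance", "deposit.sediment"],
}

# Hypothetical undirected semantic relations between synsets.
RELATED = {
    frozenset(("bank.finance", "money.currency")),
    frozenset(("bank.finance", "deposit.finance")),
    frozenset(("deposit.finance", "money.currency")),
    frozenset(("bank.river", "deposit.sediment")),
}


def soft_disambiguate(words, smoothing=0.1):
    """Return one (word, {synset: weight}) pair per known token.

    Each candidate sense is scored by how many senses of the *other*
    words in the sentence it is related to; scores are smoothed and
    normalized so that the weights of a token sum to 1 (a "soft" choice).
    """
    result = []
    for i, word in enumerate(words):
        candidates = SENSES.get(word, [])
        if not candidates:
            continue  # words outside the toy lexicon are skipped
        scores = {}
        for sense in candidates:
            support = 0
            for j, other in enumerate(words):
                if j == i:
                    continue
                for other_sense in SENSES.get(other, []):
                    if frozenset((sense, other_sense)) in RELATED:
                        support += 1
            scores[sense] = support + smoothing
        total = sum(scores.values())
        result.append((word, {s: v / total for s, v in scores.items()}))
    return result


def represent_text(words):
    """Represent a sentence as accumulated synset weights."""
    vector = defaultdict(float)
    for _, distribution in soft_disambiguate(words):
        for sense, weight in distribution.items():
            vector[sense] += weight
    return dict(vector)


sentence = ["bank", "deposit", "money"]
representation = represent_text(sentence)
```

In this toy sentence the financial senses of "bank" and "deposit" receive more weight than the river and sediment senses because they are linked to more senses of the neighbouring words, yet the minority senses are not discarded.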