An Investigation of Word Sense Disambiguation for Improving Lexical Chaining

Speaker: Mattt Enns

For my Master's thesis I researched the effects of word sense disambiguation on lexical chains. A lexical chain is a sequence of words in a document that are semantically related (i.e., related in meaning). Lexical chains indicate where certain topics or subjects are being discussed in a document. The chains can therefore provide context and be used to determine where topic changes occur. Other applications include document summarization and keyword generation for documents.

My thesis examines how polysemous words and word sense disambiguation affect lexical chaining. Polysemous words are words with more than one meaning or sense. For example, "bat" can refer to a baseball bat or a flying mammal. Two polysemous words should only be chained together if their intended meanings are related: "bat" should only be linked to "wing" if, in the context of the document, it actually refers to a flying mammal. Word sense disambiguation is the process of choosing a single sense for each polysemous word in a document.

Lexical chaining algorithms to date have performed word sense disambiguation as part of building lexical chains. In my thesis I propose that word sense disambiguation be performed prior to creating lexical chains, using the most accurate disambiguation algorithm available. The centrepiece of my thesis is an experiment that observes the effects of performing word sense disambiguation accurately, prior to creating lexical chains. I show that performing word sense disambiguation accurately significantly increases the correctness of the lexical chains produced. I also present an unexpected result: in some cases, incorrect word sense disambiguation allows for better lexical chains than correct word sense disambiguation does.
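
To make the "disambiguate first, then chain" idea concrete, here is a minimal toy sketch in Python. It is not the algorithm or the data used in the thesis: the sense labels, the RELATED table and the build_chains function are all hypothetical, and the greedy chaining rule (join the first chain that contains a related sense) is only one simple possibility. The input is assumed to be already disambiguated, so each word arrives with exactly one sense.

```python
from typing import List, Tuple

# Hypothetical relatedness table: unordered pairs of sense labels that this
# toy example treats as semantically related. A real system would derive
# this from a lexical resource rather than a hand-written set.
RELATED = {
    frozenset({"bat.animal", "wing.animal"}),
    frozenset({"bat.animal", "cave.place"}),
    frozenset({"bat.baseball", "pitcher.baseball"}),
}


def related(sense_a: str, sense_b: str) -> bool:
    """Two senses are related if they are identical or listed in RELATED."""
    return sense_a == sense_b or frozenset({sense_a, sense_b}) in RELATED


def build_chains(tagged: List[Tuple[str, str]]) -> List[List[str]]:
    """Greedily chain (word, sense) pairs that were disambiguated beforehand.

    Each word joins the first existing chain that contains a related sense;
    otherwise it starts a new chain. Returns the chains as lists of words.
    """
    chains: List[List[Tuple[str, str]]] = []
    for word, sense in tagged:
        for chain in chains:
            if any(related(sense, member_sense) for _, member_sense in chain):
                chain.append((word, sense))
                break
        else:
            # No existing chain is related to this sense.
            chains.append([(word, sense)])
    return [[word for word, _ in chain] for chain in chains]


# "bat" tagged as the flying mammal: it chains with "wing" and "cave".
animal_doc = [("bat", "bat.animal"), ("wing", "wing.animal"),
              ("pitcher", "pitcher.jug"), ("cave", "cave.place")]
animal_chains = build_chains(animal_doc)

# "bat" tagged as the baseball sense: it is no longer linked to "wing".
baseball_doc = [("bat", "bat.baseball"), ("wing", "wing.animal")]
baseball_chains = build_chains(baseball_doc)
```

In this sketch the chains depend entirely on which sense was chosen for "bat" before chaining began, which is the dependency the experiment examines.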

My talk will start with an overview of lexical chaining, word sense disambiguation and semantic relatedness. I will then present my experiment and its results, followed by analyses of the more interesting observations.