An Investigation of Word Sense Disambiguation for Improving Lexical Chaining
Speaker: Matt Enns
For my Master's thesis I researched the effects of word sense
disambiguation on lexical chains. A lexical chain is a sequence of
words in a document that are semantically related (i.e., related in
meaning). Lexical chains indicate where certain topics or subjects
are being discussed in a document. The chains therefore can provide
context and be used to determine where topic changes occur. Other
applications include document summarization and keyword generation for
documents.
My thesis examines how polysemous words and word sense disambiguation
affect lexical chaining. Polysemous words are words with more than
one meaning or sense. For example, "bat" can refer to a baseball bat
or a flying mammal. Two polysemous words should only be chained
together if their intended meanings are related: "bat" should only be
linked to "wing" if, in the context of the document, it actually refers to
a flying mammal. Word sense disambiguation is the process of choosing
a single sense for each polysemous word in a document.
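
The linking constraint described above can be illustrated with a
minimal sketch. It assumes a tiny hand-made sense inventory; the sense
labels, the RELATED_SENSES table and the can_link helper are all
hypothetical and are not the lexical resource or relatedness measure
used in the thesis.

```python
# Toy data: every sense label and relation below is invented for illustration.
RELATED_SENSES = {
    frozenset({"bat/animal", "wing/animal"}),
    frozenset({"bat/sports", "ball/sports"}),
}


def can_link(sense_a, sense_b):
    """Return True if two already-disambiguated senses may share a chain."""
    # Identical senses are trivially related; otherwise consult the toy table.
    return sense_a == sense_b or frozenset({sense_a, sense_b}) in RELATED_SENSES


# "bat" is linked to "wing" only when it was resolved to the animal sense.
print(can_link("bat/animal", "wing/animal"))
print(can_link("bat/sports", "wing/animal"))
```
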
Lexical chaining algorithms to date have performed word sense
disambiguation as part of building lexical chains. In my thesis I
propose that word sense disambiguation be performed prior to creating
lexical chains, using the most accurate disambiguation algorithm
available. The centrepiece of my thesis is an experiment that
observes the effects of performing word sense disambiguation
accurately, prior to creating lexical chains. I show that performing
word sense disambiguation accurately does significantly increase the
correctness of the lexical chains produced. I also present an
unexpected result: in some cases, incorrect word sense disambiguation
allows for better lexical chains than correct word sense
disambiguation does.
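
The ordering proposed above (disambiguate first, then chain) might be
sketched as follows. This is an illustration under stated assumptions,
not the algorithm evaluated in the thesis: the greedy chainer, the
function names and the toy sense data are all hypothetical, and the
disambiguator is a simple lookup standing in for whichever
disambiguation algorithm is available.

```python
def build_chains(senses, related):
    """Greedily group already-disambiguated words into chains.

    senses:  list of (word, sense) pairs in document order.
    related: function (sense_a, sense_b) -> bool.
    """
    chains = []
    for word, sense in senses:
        for chain in chains:
            # Join the first chain containing a related sense.
            if any(related(sense, other) for _, other in chain):
                chain.append((word, sense))
                break
        else:
            # No related chain found: start a new one.
            chains.append([(word, sense)])
    return chains


def chain_after_disambiguation(words, disambiguate, related):
    """Fix one sense per word first, then build chains over those senses."""
    senses = [(w, disambiguate(w, words)) for w in words]
    return build_chains(senses, related)


# Toy stand-ins (hypothetical data, for illustration only).
TOY_SENSES = {"bat": "bat/animal", "wing": "wing/animal",
              "pitcher": "pitcher/sports"}
TOY_RELATED = {frozenset({"bat/animal", "wing/animal"})}


def toy_disambiguate(word, context):
    # A real disambiguator would use the context; this one just looks up.
    return TOY_SENSES[word]


def toy_related(a, b):
    return a == b or frozenset({a, b}) in TOY_RELATED


chains = chain_after_disambiguation(
    ["bat", "pitcher", "wing"], toy_disambiguate, toy_related)
print(chains)
```

Because the chainer only ever sees one sense per word, any error made
by the disambiguator is carried directly into the chains, which is the
effect the experiment sets out to measure.
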
My talk will start with an overview of lexical chaining, word sense
disambiguation and semantic relatedness. I will then present my
experiment and its results, followed by analyses of the more
interesting observations.