Paragraph Context Determination Through Rhetorical Figures - A
Practical Approach Using Epanaphora
Speaker: Claus Strommer
In this study we examine the applications of rhetorical figures in
natural language processing. We shall focus on rhetorical anaphora
(epanaphora). Since epanaphora is simply the repetition of words at
the beginning of sentences, phrases, or paragraphs, it can be parsed
with minimal machine error.

For the purpose of this research we use the
TREC '06 Blog corpus, chosen over a pre-parsed corpus like Penn
Treebank because its larger size encompasses a greater variety of
styles. Furthermore, because it is more recent and because of the
sources used, it is more representative of modern, everyday prose.

Detection and recording of epanaphora involves only finding
well-delimited repetitions. Classification, however, is more
challenging. We created three main categories into which we aggregate
the found instances of epanaphora: accidental, designed-intentional,
and brute-intentional. We show how to identify and use the markers
that let us classify instances of epanaphora into these
categories.

Once the found instances of epanaphora have been
classified, we demonstrate how the classification criteria correlate
to the context of the sentences. We show that distinct author goals
influence the type of epanaphora used, and that different types of
epanaphora may be used as indicators of writing style and author
intention.

We conclude that epanaphora are a cost-effective method for
classifying the context of paragraphs and that this method can be used
to complement other natural language processing techniques. We
furthermore infer that our classification methods can be applied to
other rhetorical figures, and that they are potentially
language-agnostic.
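
The detection step described in the abstract (finding well-delimited repetitions at the beginning of consecutive sentences) could look roughly like the sketch below. This is only an illustration under stated assumptions, not the method used in the study: the function names, the naive punctuation-based sentence splitter, and the parameters `n` (number of leading words compared) and `min_run` (minimum number of consecutive sentences) are all hypothetical. It does not attempt the classification into accidental, designed-intentional, and brute-intentional instances.

```python
import re
from typing import List, Tuple


def split_sentences(text: str) -> List[str]:
    # Assumption: sentences end in ., ! or ? followed by whitespace.
    # Real blog text would need a more robust splitter.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def leading_words(sentence: str, n: int) -> Tuple[str, ...]:
    # Lowercase the sentence and keep its first n alphabetic tokens.
    words = re.findall(r"[A-Za-z']+", sentence.lower())
    return tuple(words[:n])


def find_epanaphora(
    text: str, n: int = 2, min_run: int = 2
) -> List[Tuple[str, List[str]]]:
    """Return (shared prefix, sentences) for each run of at least
    min_run consecutive sentences that begin with the same n words."""
    sentences = split_sentences(text)
    runs = []
    i = 0
    while i < len(sentences):
        prefix = leading_words(sentences[i], n)
        j = i + 1
        # Only sentences with at least n words can start a run.
        if len(prefix) == n:
            while j < len(sentences) and leading_words(sentences[j], n) == prefix:
                j += 1
            if j - i >= min_run:
                runs.append((" ".join(prefix), sentences[i:j]))
        i = j
    return runs


if __name__ == "__main__":
    sample = (
        "We shall fight on the beaches. We shall fight on the landing "
        "grounds. We shall never surrender. Then it rained."
    )
    for prefix, group in find_epanaphora(sample):
        print(prefix, len(group))
```

Comparing a fixed number of leading words keeps the sketch simple; matching at the phrase or paragraph level, which the abstract also mentions, would need a different unit of segmentation.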