Summarizing Speech

Speaker: Gerald Penn, University of Toronto

Speech is arguably the most basic, most natural form of communication that we engage in, so it should come as no surprise that there has been consistent pressure to deliver spoken audio content on web pages that, in principle, can be searched. Even once the search problem is solved, however, the low-bandwidth, non-visual, traditional delivery of spoken audio makes it much more difficult to browse. This makes the automated summarization of speech particularly attractive: given a number N, prepare a summary of a spoken "document" that is N seconds long, or N utterances long, or N percent of the original document's length, and that contains the most important or salient content.

This talk will present a (human-prepared) summary of our research on summarizing speech. We'll talk about how speech summarization is usually evaluated, including some of the appropriate baselines in this area, the dependence of the performance and tuning of summarizers on genre, the role of automated speech transcription in summarization, and the usefulness of some of the acoustic, untranscribed features of the speech signal.

Gerald Penn is an Associate Professor of Computer Science at the University of Toronto. He received his Ph.D. in 2000 from the School of Computer Science at Carnegie Mellon University. From 1999 to 2001, he was a Member of Technical Staff in the Multimedia Communications Research Laboratory at Bell Labs in the United States. His other research interests include mathematical linguistics, parsing in freer word-order languages, spoken language processing and programming languages.