Greetings! This is the web site of the Stat-NLP group (Statistical Natural Language Processing group) at the University of Waterloo. The group was founded in September 1999 by Prof. Dale Schuurmans, with the enthusiastic participation of Prof. Nick Cercone, Prof. Frank Tompa, Prof. Chrysanne DiMarco, Prof. Charlie Clarke, and their graduate students.

We meet weekly to learn about and discuss research in many areas of statistical NLP. The group's current main interests include statistical language modeling, information extraction, information retrieval, question answering, parsing, text mining, data mining, text categorization, document clustering, and speech recognition.

Group Meetings

Next meeting:
Time: 12:00 noon, April 4th, 2003
Place: DC2306 C
Speaker: Yingwei Wang
Topic: The Design and Analysis of the Binary One Algorithm
Abstract: The Binary One Algorithm is a time-efficient and space-efficient algorithm that retrieves high-frequency words from a document collection. This algorithm can be used to explore topics of a document collection. Its time cost is O(n*E); its space cost is O(log(n)*E), where n is the number of words (excluding stopwords) in the document collection and E is the maximal word length. The Binary One Algorithm uses a special strategy. As its name suggests, the algorithm's behavior is related to the number of ONEs in a binary representation. Inevitably the following questions need to be answered: Why do we devise such an algorithm? Why does this algorithm work in this way? Why is the number of ONEs in a binary representation helpful? The aim of this seminar is to introduce the Binary One Algorithm and analyze its behavior.
Meeting archive

Tentative Future Speaker List

Date Speaker Topic

Group E-mail List

To subscribe: send a message to majordomo@MailList.math.uwaterloo.ca with the following line in the message body:
subscribe cs-stat-nlp

To send a message to the list use the address: cs-stat-nlp@MailList.math.uwaterloo.ca

Current Members

(New graduate students interested in statistical NLP are most welcome to join us! Just send a message to Fuchun Peng to have your name added here.)

Professors: Dale Schuurmans, Frank Tompa, Chrysanne DiMarco, Nick Cercone, Charlie Clarke
Students: Fuchun Peng, ZhenMei Gu, Steve Engels, Michael Fleming, Peter Vanderheyden, Jiye Li, Alfred Renaud, Daming Yao, Tony Abou-Assaleh, Xiangdong An, Robert H. Warren, Yingwei Wang
Post Docs: Shaojun Wang, Jimmy Huang
Visitors: Kanlaya Naruedomkul, Naruemon Wattanapongsakorn, Lalita Hathai

Former Members

Vlado Keselj: Ph.D. (2002), now assistant professor in Computer Science at Dalhousie University

Reem Al-Halimi: Ph.D. (2002)

Trausti Kristjansson: Ph.D. (2002), now postdoctoral fellow at the University of Toronto

Amit Dubey: MMath (2001), now Ph.D. student at Universität des Saarlandes

Jianchao Han: Ph.D. (2001), now assistant professor at California State University

Books

Courses

Available slides

Chapter 3: Linguistic Essentials

Chapter 4: Corpus-based work

Chapter 5: Collocation part 1, part 2

Chapter 6: Statistical inference

Chapter 7: Word Sense Disambiguation

Chapter 8: Lexicalization, Part 2

Resources

Data, Readme

Conferences: A collection of interesting conferences

Bibliography: A bibliography of research papers on IE, TM, word segmentation, POS tagging, semi-supervised learning, language modeling, and maximum entropy models

Papers

Corpus-based reference papers

1: David D. Palmer, Marti A. Hearst, Adaptive Sentence Boundary Disambiguation, UC Berkeley Technical Report

2: Xianping Ge, Wanda Pratt, Padhraic Smyth, Discovering Chinese Words from Unsegmented Text

3: Gregory Grefenstette, Pasi Tapanainen, What is a word, What is a sentence? Problems of Tokenization

4: Stephen Della Pietra, Vincent Della Pietra, John Lafferty, Inducing Features of Random Fields

Chapter 6

Good-Turing Smoothing Without Tears

Precise n-gram Probabilities from Stochastic Context-free Grammars

Chapter 6: Statistical estimation: n-gram models over sparse data

More Papers

Useful Links

Statistical natural language processing and corpus-based computational linguistics: An annotated list of resources

Oxford English Dictionary at Waterloo

Corpus Linguistics

Software Tools for NLP

NLP Resources@NUS: Includes some free corpora

Natural Language Software Registry


LPAIG Group | CS Department | UWaterloo

Website maintained by Fuchun Peng, last updated: March 31st, 2003