Greetings! This is the website of the Stat-NLP group (Statistical Natural
Language Processing group) at the University of Waterloo.
The group was founded in September 1999 by
Prof. Dale Schuurmans,
with the enthusiastic participation of
Prof. Nick Cercone,
Prof. Frank Tompa,
Prof. Chrysanne DiMarco,
Prof. Charlie Clarke,
and their graduate students.
We meet weekly to
learn about and discuss research in many areas of statistical NLP.
The group's current main interests include
statistical language modeling, information extraction, information retrieval, question answering,
parsing, text mining, data mining, text categorization, document clustering, and speech recognition.
| Next meeting: | |
|---|---|
| Time: | 12:00 noon, April 4th, 2003 |
| Place: | DC2306 C |
| Speaker: | Yingwei Wang |
| Topic: | The Design and Analysis of the Binary One Algorithm |
| Abstract: | The Binary One Algorithm is a time-efficient and space-efficient algorithm that retrieves high-frequency words from a document collection. This algorithm can be used to explore topics of a document collection. Its time cost is O(n*E); its space cost is O(log(n)*E), where n is the number of words (excluding stopwords) in the document collection and E is the maximal word length. The Binary One Algorithm uses a special strategy. As its name suggests, the algorithm's behavior is related to the number of ONEs in a binary representation. Inevitably the following questions need to be answered: Why do we devise such an algorithm? Why does this algorithm work in this way? Why is the number of ONEs in a binary representation helpful? The aim of this seminar is to introduce the Binary One Algorithm and analyze its behavior. |

Meeting archive

| Date | Speaker | Topic |
|---|---|---|
To subscribe: send a message to majordomo@MailList.math.uwaterloo.ca with the following line in the message body:
subscribe cs-stat-nlp
To send a message to the list, use the address: cs-stat-nlp@MailList.math.uwaterloo.ca
(New graduate students interested in statistical NLP are most welcome to join us! Just send a message to Fuchun Peng to have your name added here.)
Vlado Keselj: Ph.D. (2002), assistant professor of computer science at Dalhousie University
Reem Al-Halimi: Ph.D. (2002)
Trausti Kristjansson: Ph.D. (2002), now postdoctoral fellow at the University of Toronto
Amit Dubey: MMath (2001), now Ph.D. student at Universität des Saarlandes
Jianchao Han: Ph.D. (2001), assistant professor at California State University
Chapter 3
Chapter 4
Chapter 5: Collocation, part 1
Chapter 6
Chapter 8: Lexicalization
Conferences: a collection of interesting conferences
Bibliography: research papers on IE, TM, word segmentation, POS tagging, semi-supervised learning, language modeling, and maximum entropy models
Corpus-based reference papers
1: David D. Palmer, Marti A. Hearst, Adaptive Sentence Boundary Disambiguation, UC Berkeley Technical Report
2: Xianping Ge, Wanda Pratt, Padhraic Smyth, Discovering Chinese Words from Unsegmented Text
3: Gregory Grefenstette, Pasi Tapanainen, What is a word, What is a sentence? Problems of Tokenization
4: DDL (Della Pietra, Della Pietra, Lafferty), Inducing Features of Random Fields
Chapter 6
Good-Turing Smoothing Without Tears
Precise n-gram Probabilities from Stochastic Context-free Grammars
Chapter 6: Statistical estimation: n-gram models over sparse data
Oxford English Dictionary at Waterloo
NLP Resources@NUS: includes some free corpora
Natural Language Software Registry
LPAIG Group | CS Department | UWaterloo
Website maintained by Fuchun Peng. Last updated: March 31st, 2003