Semi-supervised Multi-view HMMs Learning for Information Extraction

Speaker: Zhenmei Gu

In information extraction (IE), training statistical models such as HMMs (hidden Markov models) usually requires a large set of labeled data. Such a requirement may not be easily met in a practical IE application. In this talk, we investigate how to adapt a fully supervised IE learner into a semi-supervised one, so that the learner can make use of unlabeled data to train a more robust model from very limited labeled data. In particular, we consider applying the Co-EM learning strategy for this purpose. We then propose a semi-supervised two-view algorithm that trains HMMs to learn extraction patterns from both the term sequences and the part-of-speech (POS) sequences of texts. Our initial experimental results show that the proposed algorithm yields better, in some cases significantly better, extraction performance than a learner trained on the labeled data alone.
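The Co-EM idea described above can be illustrated with a minimal sketch. This is not the speaker's implementation: instead of HMMs over sequences, it uses simple per-token Naive Bayes models (with soft counts and Laplace smoothing) as stand-ins for the two view models, and the two label classes (`entity` / `other`), the toy data, and all function names are hypothetical. Each model is trained on one view (terms or POS tags), and the models alternately assign probabilistic labels to the unlabeled data for each other:

```python
from collections import defaultdict

# Hypothetical label set for a toy extraction task.
LABELS = ("entity", "other")


def train_view(values, soft_labels, alpha=0.5):
    """Fit a soft-count Naive Bayes model for one view.

    values      -- the view's observation for each example (term or POS tag)
    soft_labels -- per-example probability distributions over LABELS
    Returns a function mapping an observation to a posterior over LABELS.
    """
    prior = dict.fromkeys(LABELS, 0.0)
    counts = {lab: defaultdict(float) for lab in LABELS}
    for v, dist in zip(values, soft_labels):
        for lab, p in dist.items():
            prior[lab] += p          # expected class count
            counts[lab][v] += p      # expected (class, value) count
    vocab_size = len(set(values))

    def posterior(v):
        scores = {}
        for lab in LABELS:
            # Laplace-smoothed likelihood times (smoothed) class prior.
            like = (counts[lab][v] + alpha) / (prior[lab] + alpha * vocab_size)
            scores[lab] = (prior[lab] + alpha) * like
        z = sum(scores.values())
        return {lab: s / z for lab, s in scores.items()}

    return posterior


def co_em(labeled, unlabeled, iterations=5):
    """Co-EM over two views.

    labeled   -- list of ((term, pos), label) pairs
    unlabeled -- list of (term, pos) pairs
    """
    hard = [{lab: 1.0 if lab == y else 0.0 for lab in LABELS}
            for _, y in labeled]
    terms_l = [t for (t, _), _ in labeled]
    pos_l = [p for (_, p), _ in labeled]
    terms_u = [t for t, _ in unlabeled]
    pos_u = [p for _, p in unlabeled]

    # Initialize the term-view model from the labeled data only.
    model_a = train_view(terms_l, hard)
    model_b = train_view(pos_l, hard)
    for _ in range(iterations):
        # View A probabilistically labels the unlabeled data for view B ...
        soft = [model_a(t) for t in terms_u]
        model_b = train_view(pos_l + pos_u, hard + soft)
        # ... and view B does the same for view A.
        soft = [model_b(p) for p in pos_u]
        model_a = train_view(terms_l + terms_u, hard + soft)

    def classify(term, pos):
        # Combine the two views' posteriors by multiplication.
        pa, pb = model_a(term), model_b(pos)
        return max(LABELS, key=lambda lab: pa[lab] * pb[lab])

    return classify
```

On a toy corpus, the POS view lets the model generalize to terms never seen in the labeled set, which is the point of adding the second view:

```python
labeled = [(("Gu", "NNP"), "entity"), (("talk", "NN"), "other")]
unlabeled = [("Zhenmei", "NNP"), ("seminar", "NN")]
classify = co_em(labeled, unlabeled)
```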