Semi-Supervised Conditional Random Fields for Improved Sequence
Segmentation and Labeling
Speaker: Feng Jiao
We present a new semi-supervised training
procedure for conditional random fields (CRFs) that can be used to
train sequence segmentors and labelers from a combination of labeled
and unlabeled training data. Our approach is based on extending the
minimum entropy regularization framework to the structured prediction
case, yielding a training objective that combines unlabeled
conditional entropy with labeled conditional likelihood. Although the
training objective is no longer concave, it can still be used to
improve an initial model (e.g., one obtained from supervised training) by
iterative ascent. We apply our new training algorithm to the problem
of identifying gene and protein mentions in biological texts, and show
that incorporating unlabeled data improves the performance of the
standard supervised CRF in this case.
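To make the training objective concrete, here is a minimal sketch of labeled conditional log-likelihood minus gamma times unlabeled conditional entropy, followed by iterative ascent starting from a supervised model. It rests on several assumptions that do not come from the talk. The model is a toy linear-chain CRF over integer token ids with only emission weights `W[label, token]` and transition weights `T[prev, label]`. Label sequences are enumerated by brute force, which only works for very short sequences; a practical implementation would compute the partition function and the entropy with dynamic programming. Weight regularization is omitted. The gradient is approximated by finite differences instead of an analytic gradient. All names (`objective`, `ascend`, `gamma`, and so on) are hypothetical. Because the entropy term makes the objective non-concave, the ascent loop accepts a step only if the objective does not decrease, so it can only improve on its starting point.

```python
import itertools

import numpy as np

# Hypothetical toy CRF (an assumption, not the talk's model): tokens are integer ids,
# labels are 0..n_labels-1, parameters are emission W[label, token] and transition T[prev, label].


def unpack(theta, n_labels, n_tokens):
    """Split a flat parameter vector into emission and transition weight matrices."""
    W = theta[: n_labels * n_tokens].reshape(n_labels, n_tokens)
    T = theta[n_labels * n_tokens :].reshape(n_labels, n_labels)
    return W, T


def score(W, T, x, y):
    """Unnormalized linear-chain score of label sequence y for token sequence x."""
    s = sum(W[y_t, x_t] for y_t, x_t in zip(y, x))
    s += sum(T[y[t - 1], y[t]] for t in range(1, len(y)))
    return s


def log_probs(theta, x, n_labels, n_tokens):
    """Enumerate every label sequence and return it with log p(y | x) (brute force)."""
    W, T = unpack(theta, n_labels, n_tokens)
    ys = list(itertools.product(range(n_labels), repeat=len(x)))
    scores = np.array([score(W, T, x, y) for y in ys])
    log_z = np.logaddexp.reduce(scores)  # log partition function
    return ys, scores - log_z


def objective(theta, labeled, unlabeled, gamma, n_labels, n_tokens):
    """Sum of labeled log p(y|x) minus gamma times the summed unlabeled entropy H(Y|x)."""
    total = 0.0
    for x, y in labeled:
        ys, lp = log_probs(theta, x, n_labels, n_tokens)
        total += lp[ys.index(tuple(y))]
    for x in unlabeled:
        _, lp = log_probs(theta, x, n_labels, n_tokens)
        entropy = -np.sum(np.exp(lp) * lp)
        total -= gamma * entropy  # minimum entropy regularization
    return total


def ascend(theta, labeled, unlabeled, gamma, n_labels, n_tokens, lr=0.1, steps=50, eps=1e-5):
    """Iterative ascent with finite-difference gradients; a step is kept only if it does not lower the objective."""

    def f(th):
        return objective(th, labeled, unlabeled, gamma, n_labels, n_tokens)

    theta = np.asarray(theta, dtype=float).copy()
    current = f(theta)
    for _ in range(steps):
        grad = np.zeros_like(theta)
        for i in range(theta.size):
            step = np.zeros_like(theta)
            step[i] = eps
            grad[i] = (f(theta + step) - f(theta - step)) / (2 * eps)  # central difference
        candidate = theta + lr * grad
        value = f(candidate)
        if value >= current:
            theta, current = candidate, value
        else:
            lr /= 2  # backtrack: shrink the step size and retry from the same point
    return theta


# Toy data (hypothetical): 2 labels, 3 token ids, so theta has 2*3 + 2*2 = 10 entries.
labeled = [([0, 1, 2], [0, 1, 1])]
unlabeled = [[0, 2, 2], [1, 1, 0]]
theta_sup = ascend(np.zeros(10), labeled, [], 0.0, 2, 3)  # supervised initial model
theta_semi = ascend(theta_sup, labeled, unlabeled, 0.5, 2, 3)  # add the entropy term
print(objective(theta_sup, labeled, unlabeled, 0.5, 2, 3), objective(theta_semi, labeled, unlabeled, 0.5, 2, 3))
```

The two-stage call follows the abstract's recipe: fit a supervised model first, then continue ascent with the unlabeled entropy term switched on. Since steps are accepted only when the objective does not decrease, the second stage never scores below its supervised starting point on the combined objective.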