Technion
 
         Computer Science Colloquium
 
Time+Place : Monday 09/01/2012 14:30 room 337-8 Taub Bld.
Speaker    : Roi Reichart  NOTE UNUSUAL DAY 
Affiliation: CS and AI Lab - MIT
Host       : Johann Makowsky
Title      : Efficient and Exact Inter-Sentence Decoding for
             Natural Language Processing
 
Abstract   :
 
A fundamental task in Natural Language Processing (NLP) is learning the
syntax of human languages from text.
The task is defined both at the sentence level ("syntactic parsing"), where a
syntactic tree describing the head-argument structure is to be created, and
at the word level ("part-of-speech tagging"), where every word is assigned a
syntactic category such as noun, verb, or adjective.
This syntactic analysis is an important building block in NLP applications
such as machine translation and information extraction.
 
While supervised learning algorithms perform very well on these tasks when
large collections of manually annotated text (corpora) exist, creating
manually annotated corpora is costly and error-prone due to the complex
nature of annotation. Since most languages and text genres do not have large
syntactically annotated corpora, developing algorithms that learn syntax
with little human supervision is of crucial importance.
 
The work I will describe is focused on learning better parsing and tagging
models from limited amounts of manually annotated training data.
Our key observation is that existing models for these tasks are defined at
the sentence level, keeping inference tractable at the cost of discarding
inter-sentence information.
 
In this work we use Markov random fields to augment sentence-level models
for parsing and part-of-speech tagging with inter-sentence constraints.
To handle the resulting inference problem, we present a dual decomposition
algorithm for efficient, exact decoding of such global objectives. We apply
our model to the lightly supervised setting and show significant
improvements over strong sentence-level models across six languages.
 
Our technique is general and can be applied to other structured prediction
problems in natural language processing and in other fields, to enable
inference over large collections of data.
 
Joint work with Alexander Rush, Amir Globerson and Michael Collins.
 
Short bio:
 
Roi Reichart is a post-doctoral associate at the Computer Science and
Artificial Intelligence Laboratory at the Massachusetts Institute of
Technology (MIT).
He is a member of the natural language processing group of Professor Regina
Barzilay. Before that, he completed his PhD (June 2010) at the Hebrew
University under the supervision of Prof. Ari Rappoport.
 
His main research interests are unsupervised and semi-supervised learning in
NLP, especially for syntactic acquisition tasks. His paper on active
learning for syntactic parsing (together with Ari Rappoport) won the
best paper award at CoNLL 2009. He is a recipient of the ISF Bikura
fellowship for outstanding Israeli post-docs.
 
----------------------------------------------------------
Visit our home page: <http://www.cs.technion.ac.il/~colloq>
----------------------------------------------------------
Technion Math. Net (TECHMATH)
Editor: Michael Cwikel   <techm@math.technion.ac.il> 
Announcement from: Hadas Heier   <heier@cs.technion.ac.il>