Technion
 
         Computer Science Colloquium
 
Time+Place : Sunday 20/11/2011 14:30 room 337-8 Taub Bld.
 
Speaker    : Reut Tsarfaty
Affiliation: Uppsala University, Sweden
 
Host       : Johann Makowsky
 
Title      : Statistical Parsing in the Face of Language Diversity
 
Abstract   :
 
Syntactic parsing, the task of automatically analyzing the structure of
natural language sentences, is considered a core Computational
Linguistics/Natural Language Processing (CL/NLP) task as it provides the
first step towards utterance understanding, text summarization, machine
translation and other applications. Statistical parsers are designed to
automatically discover a set of relations between language-independent
elements such as a subject, a predicate, an object, etc., based on the
language-specific realization patterns observed in language data. The
diversity in the realization of grammatical relations across languages has
dramatic effects on parsing accuracy. A subject in English, for example, is
realized in syntax using word order. In German, in contrast, it is realized
in morphology, using word affixes. Existing statistical parsing models
demonstrate excellent performance on English, but when trained on data from
other languages they often fail to yield comparable results. A research
question thus emerges, namely, what kind of models are suitable for parsing
these different languages?
 
In this talk I present the motivation, design and application of a
Relational-Realizational (RR) parsing model which is designed to cope with
cross-linguistic diversity by mapping grammatical relations to
morphosyntactic realization in a non-rigid, language-independent fashion.
The model is defined over a formal grammar that inter-relates function,
syntax and morphology. The model parameters encode complex interactions,
which, for particular languages, are estimated based on corpus statistics,
thus capturing their language-specific behavior. I demonstrate the
application of the RR model to parsing Hebrew and Swedish, showing
significant improvements over previous results. I further use these
results to instantiate an explicit link between language technology and
linguistic typology, whereby the search for a "universal grammar" for
cross-linguistic description is equated with the development of a processing
engine that learns different probability distributions from data in
different domains. I suggest that exploring this link further will lead to
advances at both the technological and the scientific CL/NLP fronts, from
better models for machine translation to modeling human language
acquisition.
 
SHORT BIO:
 
Reut Tsarfaty is a Post-Doctoral Research Fellow at the Computational
Linguistics lab at Uppsala University in Sweden. She received her Ph.D. and
M.Sc. from the Institute for Logic, Language and Computation (ILLC) at the
University of Amsterdam, and her B.Sc. from the Computer Science department
at the Technion.  Reut's research focuses on models and methodologies for
cross-linguistic and cross-framework natural language parsing. Beyond
syntactic parsing, Reut has also worked on modeling the morphological,
syntactic and semantic interactions in natural language processing and on
applying formal logic to natural language semantics. Reut is a recipient of
the Dutch Science Foundation's prestigious MOSAIC award, and she is an
internationally renowned expert on Parsing Morphologically Rich Languages
(PMRL), a topic on which she now serves as a guest editor on the editorial
board of the journal Computational Linguistics.
 
----------------------------------------------------------
Visit our home page: <http://www.cs.technion.ac.il/~colloq>
 
---------------------------------------------------------
Technion Math. Net (TECHMATH)
Editor: Michael Cwikel   <techm@math.technion.ac.il> 
Announcement from: Hadas Heier   <heier@cs.technion.ac.il>