Computer Science Colloquium
Time+Place : Wednesday 11/01/2012 14:30 room 337-8 Taub Bld.
Speaker    : Joseph Keshet
Affiliation: TTI-Chicago
Host       : Johann Makowsky
Title      : Making Computers Good Listeners
Abstract   :
A typical problem in speech and language processing has a very large number
of training examples, is sequential, highly structured, and has a unique
measure of performance, such as the word error rate in speech recognition,
or the BLEU score in machine translation. The simple binary classification
problem typically explored in machine learning is no longer adequate for the
complex decision problems encountered in speech and language applications.
Binary classifiers cannot handle the sequential nature of these problems,
and are designed to minimize the zero-one loss, i.e., correct or incorrect,
rather than the desired measure of performance.
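To illustrate the gap between the zero-one loss and a task-specific measure,
here is a minimal sketch in Python. It is not taken from the talk; the function
names and the example sentences are hypothetical. It assumes the common
definition of word error rate as the word-level edit distance divided by the
number of reference words.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hyp_words) / len(ref_words)


def zero_one_loss(reference, hypothesis):
    """0 if the whole word sequence is correct, 1 otherwise."""
    return 0 if reference.split() == hypothesis.split() else 1


reference = "the cat sat on the mat"
near_miss = "the cat sat on a mat"   # one substituted word
far_miss = "a dog ran"               # no word in common with the reference

for hyp in (near_miss, far_miss):
    print(hyp, zero_one_loss(reference, hyp), word_error_rate(reference, hyp))
```

Both hypotheses receive the same zero-one loss, while the word error rate
separates the near miss from the far miss. This is the kind of distinction a
classifier trained on the zero-one loss cannot exploit.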
In addition, the current state-of-the-art models in speech and language
processing are generative models that capture some temporal dependencies,
such as Hidden Markov Models (HMMs). While such models have been immensely
important in the development of accurate large-scale speech processing
applications, and in speech recognition in particular, theoretical and
experimental evidence has led to a widespread belief that such models have
nearly reached a performance ceiling.
In this talk, I will first present a new theorem stating that a general
learning update rule directly corresponds to the gradient of the desired
measure of performance. I will present a new algorithm for phoneme-to-speech
alignment based on this update rule, which surpasses all previously reported
results on a standard benchmark. I will also show a generalization of the
theorem to training non-linear models such as HMMs, and present empirical
results on a phoneme recognition task that surpass results from HMMs trained
with all other training techniques.
I will then present the problem of automatic voice onset time (VOT)
measurement, one of the most important variables measured in phonetic
research and medical speech analysis. I will present a learning algorithm
for VOT measurement which outperforms previous work and approaches human
inter-judge reliability. I will discuss the algorithm's implications for
tele-monitoring of Parkinson's disease, and for predicting the effectiveness
of chemo-radiotherapy treatment of head and neck cancer.
Short Bio:
Joseph Keshet received his B.Sc. and M.Sc. degrees in Electrical Engineering
in 1994 and 2002, respectively, from Tel Aviv University. He received his
Ph.D. in Computer Science from The School of Computer Science and
Engineering at The Hebrew University of Jerusalem in 2007. From 1995 to 2002
he was a researcher at the IDF, and won the prestigious Israel Defense
Prize for outstanding research and development achievements. From
2007 to 2009 he was a post-doctoral researcher at IDIAP Research Institute
in Switzerland. Since 2009 he has been a research assistant professor at
TTI-Chicago, a philanthropically endowed academic computer science institute
on the campus of the University of Chicago. Dr. Keshet's research interests
are in speech and language processing and machine learning. His current
research focuses on the design, analysis and implementation of machine
learning algorithms for the domain of speech and language processing.
Technion Math. Net (TECHMATH)
Editor: Michael Cwikel
Announcement from: Hadas Heier