Technion, IEM faculty - Statistics Seminar
 
Speaker: Richard Olshen, Stanford University
 
Title: Successive normalization of rectangular sets of data
 
Date: 03/07/2011
 
Time: 14:30
 
Place: Bloomfield-527
 
Abstract: 
 <http://ie.technion.ac.il/seminar_files/1307369931_Olshen.pdf>
 
Or read this:
 
Standard statistical techniques often require transforming data to have mean 0 and standard
deviation 1. Typically, this process of “standardization” or “normalization” is applied across
subjects when each subject produces a single number. High throughput genomic data often come
as a rectangular array, where each coordinate in one direction concerns a subject with, for
example, case or control status; and each coordinate in the other designates “outcome” for a
specific feature, for example, a “gene” or “gene fragment.” When analyzing data that come as
rectangular arrays, it may be helpful if both subjects and features are “on the same footing.” This
entails a need to standardize across rows and columns. We have investigated convergence of what
seems to us a natural approach to successive normalization that we learned from Bradley Efron.
The process applies when the array has at least three rows and at least three columns. It involves
successive polishing, by row (say), and then by column. A polish of rows (columns) means first
subtracting off means by row (column) and then dividing the resultant rows (columns) by
respective row (column) standard deviations. The process is iterated, beginning by rows, then
columns, then rows; alternatively beginning with columns and proceeding analogously. Not only
does this process converge for (Lebesgue) almost all initial arrays, but also convergence is very
fast in the number of iterations. Limiting arrays have row and column mean values 0, row and
column standard deviations 1.
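
The polishing procedure described above can be illustrated with a short NumPy sketch. This is not
the speakers' code, only a minimal reading of the description. The function names, the stopping
tolerance, the iteration cap, and the use of the population standard deviation (divisor n) are all
assumptions made for the illustration; the abstract does not specify them.

```python
import numpy as np


def polish(x, axis):
    """Subtract the means along `axis`, then divide by the standard deviations.

    axis=1 polishes rows, axis=0 polishes columns.
    Assumption: population standard deviation (NumPy default, ddof=0).
    """
    centered = x - x.mean(axis=axis, keepdims=True)
    return centered / centered.std(axis=axis, keepdims=True)


def successive_normalization(x, tol=1e-10, max_iter=1000):
    """Alternate row and column polishes, beginning with rows.

    Stops when a full row-then-column sweep changes no entry by more
    than `tol` (a hypothetical stopping rule), or after `max_iter` sweeps.
    Returns the final array and the number of sweeps performed.
    """
    x = np.asarray(x, dtype=float)
    if x.ndim != 2 or min(x.shape) < 3:
        raise ValueError("need a 2-D array with at least three rows and three columns")
    sweeps = 0
    for sweeps in range(1, max_iter + 1):
        previous = x
        x = polish(x, axis=1)  # polish rows
        x = polish(x, axis=0)  # polish columns
        if np.max(np.abs(x - previous)) < tol:
            break
    return x, sweeps


# Usage on a randomly generated array (illustrative data only).
rng = np.random.default_rng(0)
limit, sweeps = successive_normalization(rng.normal(size=(6, 8)))
print("sweeps:", sweeps)
print("largest |row mean|:", np.abs(limit.mean(axis=1)).max())
print("largest |row sd - 1|:", np.abs(limit.std(axis=1) - 1).max())
```

Because the last step of every sweep is a column polish, the columns of the returned array are
standardized up to rounding error, while the rows are standardized only up to the tolerance. The
sketch does not guard against a row or column that becomes constant, where the division is
undefined.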
 
Research thus far will be summarized, and plots that make clear the rapidity of convergence will
be shown. Particularly illustrative graphics will be given for the 3×3 case. Extensions of
conclusions will be given.
 
All research in the area of this talk is collaborative with Stanford colleague Bala Rajaratnam.
 
---------------------------------------------------------
Technion Math. Net (TECHMATH)
Editor: Michael Cwikel   <techm@math.technion.ac.il> 
Announcement from:  <ynardi@ie.technion.ac.il>