Computer Science Colloquium
Time+Place : Tuesday 31/01/2012  15:30  room 337-8 Taub  Bld.
             NOTE UNUSUAL HOUR
Speaker    : Ran Ginosar
Affiliation: EE & CS, Technion
Host       : Johann Makowsky
Title      : The Plural Architecture: Shared Memory Many-cores with
             Hardware Scheduling
Abstract   :
The Plural many-core architecture combines hundreds of simple cache-less
cores, many shared cache banks, a hardware scheduler, and two custom active
networks-on-chip: cores-to-shared-caches and cores-to-scheduler. A
theoretical model (almost) justifies increasing the number of cores while
making them smaller and slower, maximizing performance-to-power ratio.
Several benchmark simulations are demonstrated, showing close to linear
speedup and high performance-to-power ratio.
A de-synchronized PRAM-like task-based non-CSP programming model for shared
memory enables fine-grain parallelism. Plural tasks are sequential.
Precedence relations among tasks are described by a task map, which is
executed by the hardware scheduler. Duplicable tasks are described once and
executed as multiple instances, under control of the hardware scheduler.
Tasks are not functions: they neither receive inputs nor generate outputs;
data are shared only through shared memory. Control tasks (join, fork,
condition) contain no code, and are executed only by the scheduler.
There are no locking mechanisms: all synchronizations are formulated as
inter-task dependencies and managed by the scheduler.
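
The task-map idea above can be illustrated in software. The following is a
minimal Python sketch, not the Plural toolchain or its task-map syntax: the
function run_task_map, the dictionary layout, and the example task names are
all hypothetical. It assumes that each instance of a duplicable task can read
its own instance number, and it passes the shared memory as an argument purely
for the sake of simulation. Instances run sequentially here, whereas the
hardware scheduler would dispatch them to cores in parallel.

```python
from collections import deque


def run_task_map(tasks, deps, shared):
    """Software stand-in for a scheduler that executes a task map.

    tasks:  name -> (body, instance_count); body is None for a control task.
    deps:   name -> list of predecessor task names.
    shared: the only channel through which tasks exchange data.
    Returns the order in which tasks completed.
    """
    # Count unfinished predecessors and record successors of every task.
    remaining = {name: len(deps.get(name, [])) for name in tasks}
    successors = {name: [] for name in tasks}
    for name, preds in deps.items():
        for pred in preds:
            successors[pred].append(name)

    ready = deque(name for name in tasks if remaining[name] == 0)
    order = []
    while ready:
        name = ready.popleft()
        body, count = tasks[name]
        if body is not None:
            # A duplicable task is described once and run as many instances.
            for instance in range(count):
                body(instance, shared)
        order.append(name)
        # Completing a task may release its successors; no locks are used.
        for succ in successors[name]:
            remaining[succ] -= 1
            if remaining[succ] == 0:
                ready.append(succ)
    return order


def square(instance, shared):
    # Each instance writes only its own slot of shared memory.
    shared["sq"][instance] = shared["data"][instance] ** 2


def reduce_sum(instance, shared):
    shared["total"] = sum(shared["sq"])


shared = {"data": [1, 2, 3, 4], "sq": [0, 0, 0, 0], "total": 0}
tasks = {
    "square": (square, 4),      # duplicable task, four instances
    "join": (None, 1),          # control task: no code, scheduler only
    "reduce": (reduce_sum, 1),  # ordinary sequential task
}
deps = {"join": ["square"], "reduce": ["join"]}

order = run_task_map(tasks, deps, shared)
print(order, shared["total"])
```

In this sketch the only synchronization is the dependency count kept by the
scheduler loop: "reduce" cannot start until "join" has fired, and "join" fires
only after every instance of "square" has finished.
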
The shared memory is organized as many banks, allowing all or most cores
simultaneous access. A multistage interconnection network resolves address
conflicts and may include a fetch-and-op facility to enhance PRAM-like
concurrent read-and-write as well as unique indexing operations. Addresses
are interleaved to reduce conflicts. The entire shared memory is organized
as a shared L1 cache. The architecture supports an optional L2 cache.
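
To see why interleaving reduces conflicts, consider the following small Python
sketch. It assumes simple low-order interleaving (bank = address mod number of
banks) at word granularity; the announcement does not state the actual mapping
or granularity used by Plural, and the names bank_of and count_conflicts are
hypothetical.

```python
from collections import Counter


def bank_of(address, num_banks):
    """Map a word address to (bank, offset within bank).

    Assumes low-order interleaving: consecutive addresses fall in
    consecutive banks.
    """
    return address % num_banks, address // num_banks


def count_conflicts(addresses, num_banks):
    """Count accesses that would have to wait in one access round.

    Each bank is assumed to serve one request per round, so every
    additional request to the same bank counts as one conflict.
    """
    per_bank = Counter(bank_of(a, num_banks)[0] for a in addresses)
    return sum(n - 1 for n in per_bank.values())


# Eight cores reading eight consecutive words hit eight different banks.
sequential = count_conflicts(list(range(8)), 8)

# Four cores striding by the number of banks all collide on bank 0.
strided = count_conflicts([0, 8, 16, 24], 8)

print(sequential, strided)
```

Under these assumptions, unit-stride access spreads evenly over the banks,
while a stride equal to the number of banks is the worst case; the network
must then serialize the colliding requests.
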
The Plural architecture employs standard processors; we have tried Sparc,
Microblaze and some proprietary ones. Cores contain a small private
scratch-pad memory for unshared variables. Shared co-processors include FPU
and collective support. DMA processors provide for data pre-fetching.
The Plural architecture is intended for one-job-at-a-time accelerators; it
is not a multitasking multicore, and there should be no OS. The architecture
has been implemented as an IP core for mobile SoC and as an FPGA accelerator.
It has yet to be demonstrated as a standalone IC. During the talk we will
also contrast it with other many-core architectures including Tiles, Rigel
and XMT.
Short Bio:
Prof. Ran Ginosar received his BSc from the Technion and his PhD from Princeton
University in 1982. He has conducted research at Bell Laboratories, at the
University of Utah and at Intel Research Laboratories in Oregon, USA.
He is a member of the faculty of the EE and CS departments at the Technion, and
heads the VLSI Systems Research Center. He has also co-founded several
start-up companies in the area of VLSI and parallel processing. His research
interests focus on VLSI and parallel processing architectures.
Fri Jan 13 10:24:10 IST 2012
Technion Math. Net (TECHMATH)
Editor: Michael Cwikel
Announcement from: Hadas Heier