VSCSE - Virtual School of Computational Science and Engineering

Proven Algorithmic Techniques for Many-core Processors

August 2–6, 2010

Course Archive

Visit the public course archive for the course schedule, course content, a discussion forum, and more.


Participating sites:

Center for Computation & Technology, Louisiana State University, Baton Rouge

Institute for Data and High Performance Computing, Georgia Institute of Technology, Atlanta

Institute for Digital Research and Education, University of California, Los Angeles

National Center for Supercomputing Applications, Urbana, Illinois

Northwestern University, Evanston, Illinois

Ohio Supercomputer Center, Ohio State University, Columbus

RENCI, Chapel Hill, North Carolina

University of Iowa, Iowa City

University of Michigan, Ann Arbor

University of Tennessee, Knoxville


Prerequisites:

  • Experience working in a Unix environment
  • Experience developing and running scientific codes written in C or C++
  • Basic knowledge of CUDA (a short online course, Introduction to CUDA, is available to registered on-site students who need help meeting this prerequisite)

Students who took the course Many-core Processors in 2009 are encouraged to take this follow-on course, which includes new topics and lab exercises.

Instructors:

Wen-mei W. Hwu, professor of electrical and computer engineering and principal investigator of the CUDA Center of Excellence, University of Illinois at Urbana-Champaign

David Kirk, NVIDIA fellow

Course outline:

  • Introduction
    • why problem formulation and algorithm design choices can have a dramatic effect on performance
    • common algorithmic strategies for high performance
  • Increasing locality in dense arrays
    • tiling of data access and layout
  • Improving efficiency and vectorization in dense arrays
    • granularity coarsening
  • Reducing output interference
    • conversion from scatter to gather
    • parallelizing reductions and histograms
  • Dealing with non-uniform data
    • data sorting and binning
  • Dealing with sparse data
    • sorting and packing
  • Dealing with dynamic data
    • parallel queue-based algorithms
  • Improving data efficiency in large data traversal
    • stencil and other grid-based computation
  • Extending beyond many-core processors
    • MPI+CUDA
    • MPI+OpenCL
  • Overview of use of techniques in application domains
    • molecular dynamics
    • computational fluid dynamics
    • medical imaging
    • computer vision
    • gene sequencing
  • Case studies:
    • molecular dynamics (NAMD/VMD, MPI, use of algorithm strategies)
    • medical imaging
    • gene sequencing, financial analysis, etc.
  • Hands-on Lab
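
To give a flavor of one outline topic, "conversion from scatter to gather," here is a minimal sketch in plain C. It is not taken from the course materials: the function names `neighbor_sum_scatter` and `neighbor_sum_gather` and the three-point neighbor sum are assumptions chosen only for illustration. Both functions compute the same result, but the gather form has each output element written by exactly one loop iteration, which is the property that lets a parallel version avoid conflicting updates.

```c
#include <assert.h>
#include <stddef.h>

/* Scatter form (hypothetical example): loop over INPUTS. Each input adds
   itself to up to three outputs. If the i-loop ran in parallel, neighboring
   iterations would update the same out[] element and would need atomics. */
void neighbor_sum_scatter(const float *in, float *out, size_t n)
{
    assert(in != NULL && out != NULL);
    for (size_t j = 0; j < n; ++j)
        out[j] = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        if (i > 0)
            out[i - 1] += in[i];
        out[i] += in[i];
        if (i + 1 < n)
            out[i + 1] += in[i];
    }
}

/* Gather form (hypothetical example): loop over OUTPUTS. Each output reads
   the inputs that affect it and is written once, by its own iteration. */
void neighbor_sum_gather(const float *in, float *out, size_t n)
{
    assert(in != NULL && out != NULL);
    for (size_t j = 0; j < n; ++j) {
        float sum = in[j];
        if (j > 0)
            sum += in[j - 1];
        if (j + 1 < n)
            sum += in[j + 1];
        out[j] = sum;
    }
}
```

In a CUDA version of the gather form, each thread would typically own one output index `j`, so no two threads write the same location.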

NOTE: Students are required to provide their own laptops.