VSCSE - Virtual School of Computational Science and Engineering

2014 Summer School Courses

Data Intensive Summer School (June 30 - July 2, 2014)

All speakers from San Diego Supercomputer Center unless noted otherwise.

----- Monday, June 30 -----
PACIFIC TIME
VIDEO OF COURSE BELOW AGENDA

8:00-8:15
Robert Sinkovits
Introduction and overview

8:15-9:15
Rachana Ananthakrishnan
Computation Institute, University of Chicago
Globus for Research Data Management Slides
Globus for Research Data Management Presentation

9:15-10:00
Ilkay Altintas
- Overview of workflows and some workflow use cases
- How are workflows being used in science today
Workflow Driven Science Presentation
Overview of Workflows and Some Workflow Use Cases

10:00-10:15
Break

10:15-10:45
Ilkay Altintas
- Introduction to Kepler and its features

10:45-11:15
Shweta Purawat
- Demo session for getting started with Kepler, Kepler's UI, and how to build a workflow.
- Demos of introductory and advanced Kepler workflow examples.

11:15-11:45
Lunch

11:45-12:15
Shweta Purawat
- Provenance framework in Kepler and reproducibility
- Demo of Kepler's provenance, run manager, and reporting modules

12:15-2:00
Ilkay Altintas
- Distributed computing on XSEDE and commodity clusters (SGE, GPU) in Kepler
- Introduction to Hadoop, Spark and Flink engines in Kepler, system demonstration
- Scalable bioinformatics using bioKepler

----- Tuesday, July 1 -----

8:00-9:00
Rick Wagner
File systems, hardware and the nuts and bolts of storage

9:00-10:00
Amarnath Gupta and Bill West
Working with Big Data
Code and Data
https://www.dropbox.com/s/zsbxv7spmvahll0/map_reduce_data.tar.gz

10:00-10:15
Break

10:15-11:15
Amarnath Gupta and Bill West
Working with Big Data (continued)

11:15-11:45
Lunch

11:45-2:00
Amarnath Gupta and Bill West
Working with Big Data (continued, with optional break at ~1:00)

----- Wednesday, July 2 -----

8:00-9:00
Natasha Balac
Introduction to predictive analytics and data mining

9:00-9:30
Nicole Wolter
Overview of data mining tools

9:30-9:45
Break

9:45-11:15
Paul Rodriguez
Unsupervised learning (PCA and clustering)
https://www.dropbox.com/s/ei5bmfntm425pkd/AHW_1.csv
https://www.dropbox.com/s/pyjtpafwz8vk2dz/AHW_1_filteredsports.csv
https://www.dropbox.com/s/4zdi2yvnsrb0f6x/QUEST_out_data.csv
https://www.dropbox.com/s/77kvf4l4vhvtpbi/core_interactions.txt

11:15-11:45
Lunch

11:45-12:45
Nicole Wolter and Natasha Balac
Supervised learning (decision trees)

12:45-1:00
Break

1:00-2:00
Paul Rodriguez
Techniques and Strategies for Big Data

VIDEOS FROM DATA INTENSIVE COURSE
Dealing with Data: Choosing a Good Storage Technology for Your Application, Rick Wagner
Big Data, What Does it Mean For Me? Amarnath Gupta, Bill West
Session 1: Workflow-Driven Science, Dr. Ilkay Altintas
Globus Introduction, Rachana Ananthakrishnan
Session 2: Kepler Workflow System and Features, Dr. Ilkay Altintas
Shweta Purawat Lecture
Session 5: Distributed Computing in Kepler, Dr. Ilkay Altintas
Benefits and Burdens, Amarnath Gupta
Introduction to Predictive Analytics and Data Mining, Natasha Balac, Ph.D.
Introduction to Predictive Analytics Tools: Weka, R
Supervised Learning
Strategies for Big Data
Downloading R and RStudio
Data Exploration

Harness the Power of GPUs: Introduction to GPGPU Programming (June 16 - 20, 2014)

Harness the Power of GPUs: Introduction to GPGPU Programming is a mixture of lectures and labs. It introduces all levels of parallelism as well as common approaches to parallelization, with the following goals: better utilization of GPUs by enabling more scientists to use them, a better understanding of GPU utilization efficiency among application developers, and higher job throughput by enabling more resources and shortening job runtimes. In addition, participants will understand and avoid the common pitfalls of parallel computing, learn CUDA and OpenACC, understand the basic principles of data-parallel computing, tap into enormous computing power, even on a laptop, and speed up their research.

Monday - Lecture 1: Introduction
Monday - Lab 1: Introduction
Monday - Lecture 2: Kernels, Threads, Blocks and Grids
Monday - Lab 2: Kernels, Threads, Blocks and Grids
Tuesday - Lecture 3: GPU Architecture
Tuesday - Lab 3: GPU Architecture
Tuesday - Lecture 4: GPU Memory Hierarchies and Management
Tuesday - Lab 4: Matrix-Matrix Multiplication
Wednesday - Lecture 5: Shared Memory
Wednesday - Lab 5: Advanced Matrix-Matrix Multiplication
Wednesday - Lecture 6: Streams and Dynamic Parallelism
Wednesday - Lab 6: Quicksort
Thursday - Lab 7: First OpenACC Examples
Thursday - Lecture 8: Advanced OpenACC
Thursday - Lab 8: Using Data Regions
Friday - Lecture 9: Parallelization Techniques and Optimization
Friday - Lab 9: Body Simulation

Science Visualization Course (August 25-26, 2014)

This two-day training will cover all aspects of visualizing data from a broad variety of domains. We will kick off the training with an introduction to visualization. We will then teach best practices for dealing with diverse data (abstract and spatial), demonstrate a variety of methods and techniques on those data sets, and demonstrate a range of freely available software. Finally, we will take real-world problems for which visualization is needed and walk attendees through the process of visualizing the data and gaining insight.

Registration is open for the Data Intensive and GPGPU courses! Please go to the XSEDE online calendar to register.