Lecture number for WS 2019/20: L.079.05722
All materials (slides, exercises, programming exercises) and current information will be provided via the PANDA lecture management system; see the PANDA page for this lecture.
The course comprises three components: lecture, theoretical exercises, and programming exercises/challenges, which will be held during the following time slots:
- Fridays 14:15-15:45, lecture hall O2
- Wednesdays 13:15-15:45, lecture hall SP2.0.201 (changed!)
The lecture will typically be held on Fridays and the exercises on Wednesdays. However, to accommodate the lecturer's availability and public holidays, some lectures will be held on Wednesday. Please consult the schedule on the PANDA lecture website for up-to-date information.
The first lecture will take place on 11 Oct 2019.
The exercises will begin on 16 Oct 2019.
Goals and Contents of the Lecture
The goal of this course is to teach the fundamentals of high-performance computing. That is, we will discuss programming models, languages, and frameworks for efficiently using parallel computer systems. The lecture will be complemented by a considerable number of practical programming exercises that allow the students to gain hands-on experience with programming, performance optimization, and debugging of parallel computer systems. To this end, the students will get access to the HPC clusters operated by the Paderborn Center for Parallel Computing (PC²).
Parts of the lecture and exercises will be loosely based on the textbook Peter S. Pacheco, An Introduction to Parallel Programming, Morgan Kaufmann Publishers, 2011. The book is available online within the Paderborn University network (use VPN or DFN-AAI for access from outside). The book also comprises a number of code excerpts from programs that illustrate the use of the parallel programming techniques introduced in the book. The source code for these examples is available here.
The tentative list of lecture topics is as follows:
- Patterns of parallel programming (map, reduce, gather, Berkeley dwarfs, ...)
- Relevant fundamentals of computer architecture (SIMD, caches)
- Performance models (roofline, weak/strong scaling, work-span, LogP); a worked roofline example follows this list
- Applications (examples that will be used throughout the lecture: n-body, stencils, dense and sparse linear algebra, conjugate gradient/matrix multiplication)
- Single node optimization (cache blocking, memory access pattern, vectorization)
- MPI basics (SPMD, communicators, messages, network, blocking two-sided communication); a minimal code sketch follows this list
- MPI next steps (collectives, one-sided, non-blocking, derived data types)
- OpenMP basics (threading, work-sharing, parallel for, scheduling); a minimal code sketch follows this list
- OpenMP tasking (task, dependencies, taskloops, ...)
- Libraries (BLAS, LAPACK)
- Performance engineering (profiling, bottleneck analysis)
- Hardware accelerators (application acceleration with GPUs and/or FPGAs)
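
As a first taste of the performance-model topic listed above, here is a worked roofline calculation. This is a minimal sketch: the peak compute rate, memory bandwidth, and arithmetic intensity below are illustrative assumptions, not measurements of any particular machine.

```latex
% Roofline model: attainable performance P is bounded either by the
% peak compute rate P_peak or by the memory bandwidth B times the
% arithmetic intensity I (FLOP per byte moved to/from memory).
\[
  P(I) = \min\bigl( P_{\mathrm{peak}},\; B \cdot I \bigr)
\]
% Worked example with assumed numbers: P_peak = 100 GFLOP/s,
% B = 20 GB/s, and a stencil-like kernel with I = 0.25 FLOP/byte:
\[
  P = \min(100,\; 20 \cdot 0.25)\ \mathrm{GFLOP/s} = 5\ \mathrm{GFLOP/s}
\]
% The kernel is memory-bound: it reaches only 5% of the compute roof,
% so optimization effort should target data movement, not arithmetic.
```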
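For the MPI basics topic, the following minimal sketch shows blocking two-sided communication in the SPMD style; the payload value, message tag, and rank pairing are illustrative assumptions.

```c
/* Minimal sketch of blocking two-sided MPI communication (SPMD style).
 * Compile with: mpicc mpi_sketch.c   Run with, e.g.: mpirun -np 2 ./a.out */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* this process's rank */
    MPI_Comm_size(MPI_COMM_WORLD, &size);  /* total number of processes */

    if (size >= 2) {
        int value = 42;  /* illustrative payload */
        if (rank == 0) {
            /* Blocking send: returns once the buffer may be reused. */
            MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            /* Blocking receive: returns once the message has arrived. */
            MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("rank 1 received %d from rank 0\n", value);
        }
    }
    MPI_Finalize();
    return 0;
}
```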
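Likewise, for the OpenMP basics topic, a minimal sketch of a work-sharing loop with an explicit schedule clause; the saxpy-style loop body and problem size are illustrative assumptions.

```c
/* Minimal sketch of an OpenMP work-sharing loop.
 * Compile with, e.g.: gcc -fopenmp omp_sketch.c */
#include <stdio.h>
#include <omp.h>

#define N 1000000  /* illustrative problem size */

int main(void) {
    static float x[N], y[N];
    float a = 2.0f;  /* illustrative scalar */

    for (int i = 0; i < N; i++) { x[i] = 1.0f; y[i] = 2.0f; }

    /* Loop iterations are divided among the threads; static scheduling
     * assigns contiguous chunks, which suits regular loops like this. */
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < N; i++)
        y[i] = a * x[i] + y[i];

    printf("y[0] = %f (up to %d threads)\n", y[0], omp_get_max_threads());
    return 0;
}
```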