Lecture number for WS 2019/20: L.079.05722

All materials (slides, exercises, programming exercises) and current information will be provided via the PANDA lecture management system, see PANDA page for this lecture.

## Organization

The course comprises three components: lecture, theoretical exercises, and programming exercises/challenges, which will be held during the following time slots:

- Fridays 14:15-15:45, lecture hall O2
- Wednesdays 13:15-15:45, **lecture hall SP2.0.201 (changed!)**

The lecture will typically be held on Fridays and the exercises on Wednesdays. However, to accommodate the availability of the lecturer and public holidays, some lectures will be held on Wednesdays. Please consult the schedule on the PANDA lecture website for up-to-date information.

The **first lecture** will take place on **11 Oct 2019**.

The **exercises will begin on 16 Oct 2019**.

## Goals and Contents of the Lecture

The goal of this course is to teach the fundamentals of high-performance computing. That is, we will discuss programming models, languages, and frameworks for efficiently using parallel computer systems. The lecture will be complemented by a considerable number of practical programming exercises that allow students to gain hands-on experience with programming, performance optimization, and debugging on parallel computer systems. To this end, students will get access to the HPC clusters operated by the Paderborn Center for Parallel Computing (PC²).

Parts of the lecture and exercises will be loosely based on the textbook Peter S. Pacheco, *An Introduction to Parallel Programming*, Morgan Kaufmann Publishers, 2011. The book is available online within the Paderborn University network (use VPN or DFN-AAI for access from outside). The book also contains a number of code excerpts from programs that illustrate the parallel programming techniques it introduces. The source code for these examples is available here.

The tentative list of lecture topics is as follows:

- Introduction
- Patterns of parallel programming (map, reduce, gather, Berkeley dwarfs, ...)
- Relevant fundamentals of computer architecture (SIMD, caches)
- Models for performance (roofline, weak/strong scaling, work-span, LogP)
- Applications (examples that will be used throughout the lecture: n-body, stencils, dense and sparse linear algebra, conjugate gradient/matrix multiplication)
- Single node optimization (cache blocking, memory access pattern, vectorization)
- MPI basics (SPMD, communicator, messages, network, blocking two-sided communication)
- MPI next steps (collectives, one-sided, non-blocking, derived data types)
- OpenMP basics (threading, work-sharing, parallel for, scheduling)
- OpenMP tasking (task, dependencies, taskloops, ...)
- Libraries (BLAS, LAPACK)
- Performance engineering (profiling, bottleneck analysis)
- HW Accelerators (application acceleration with GPUs and/or FPGAs)

## Theoretical and Programming Exercises

This course aims to teach both the foundations of and practical skills in high-performance computing. Hence, the lecture will be complemented with theoretical exercises and programming exercises for all topics covered in class.

The theoretical exercises will be discussed in the exercise sessions, but they will not be corrected or graded.

The programming exercises will also be discussed in the exercise sessions, and teaching assistants will support students in solving the programming tasks. Additionally, we will run 2-3 competitions in which teams of up to three students can submit solutions to a more complex programming problem and compete for the best performance.

Participation in the exercises is highly encouraged, but it is neither required for admission to the exam nor does it affect the exam grade.

## Prerequisites

This lecture assumes that you are familiar with the foundations of modern computer architecture and with performance-oriented, low-level programming in C and basic C++. You can check whether you meet these prerequisites in an online self-assessment test.

## Exam

The exam will cover all topics treated in the lecture (slides, classroom examples) and the exercises. Due to the high number of participants, it will be held as a written exam of 90 minutes.