Systementwurfs-Teamprojekt
You can find general information on the Systementwurfs-Teamprojekt (SET) at https://cs.uni-paderborn.de/ceg/teaching/courses/ws-201920/systementwurf-teamprojekt/.
Number Crunching with GPU Tensor Cores
Algorithms in the area of machine learning (ML), especially Deep Learning, require large numbers of linear-algebra operations on vectors, matrices, and higher-dimensional tensors. In order to accelerate these computations and increase their efficiency, NVidia has introduced so-called tensor cores with their Volta architecture (see the NVidia YouTube video about tensor cores). This new kind of compute unit is tailored specifically to matrix-matrix multiplications. One of these GPUs, an NVidia RTX 2080 Ti for example, can perform up to 100 TFlops (10^14 floating-point operations per second in mixed half/single precision) with tensor cores while consuming only 250 watts of power.
At a comparable power consumption, state-of-the-art CPU-based compute nodes only reach around 6 TFlops (6*10^12 floating-point operations per second in single precision).
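To make concrete what a tensor core computes: each operation is a small fused matrix multiply-accumulate, D = A*B + C, on 16x16 tiles with half-precision inputs and (optionally) single-precision accumulation. The following minimal CUDA sketch, written against the WMMA API (nvcuda::wmma in <mma.h>), multiplies one such tile pair with a single warp; the kernel name and layouts are illustrative only, and it requires compute capability 7.0 or higher (compile with -arch=sm_70).

    #include <mma.h>
    #include <cuda_fp16.h>

    using namespace nvcuda;

    // One warp computes C = A * B for a single 16x16x16 tile:
    // A and B in half precision, the accumulator in single precision.
    __global__ void wmma_tile_gemm(const half *A, const half *B, float *C) {
        wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a_frag;
        wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> b_frag;
        wmma::fragment<wmma::accumulator, 16, 16, 16, float> acc_frag;

        wmma::fill_fragment(acc_frag, 0.0f);      // start from C = 0
        wmma::load_matrix_sync(a_frag, A, 16);    // leading dimension 16
        wmma::load_matrix_sync(b_frag, B, 16);
        wmma::mma_sync(acc_frag, a_frag, b_frag, acc_frag); // one tensor-core MMA
        wmma::store_matrix_sync(C, acc_frag, 16, wmma::mem_row_major);
    }

    // Launch with exactly one warp: wmma_tile_gemm<<<1, 32>>>(dA, dB, dC);

Larger multiplications are built by tiling the input matrices into such fragments, which is essentially what cuBLAS and CUTLASS do internally.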
This project will try to harness the computational power and efficiency of tensor cores for scientific programs from computational chemistry. Two computational hot-spots have been identified in the computational-chemistry code CP2K (https://www.cp2k.org/) that are expected to be suitable for acceleration with tensor cores.
The project will be advised by an expert from computational chemistry who will handle all chemistry-related details and the integration into the scientific code.
No knowledge of chemistry is required for this project. Several Nvidia GPUs (RTX 2080 Ti) are already available.
The language for this project can be either German or English, depending on the participants.
Interests:
- high-performance computing
- GPU programming
- programming in general (C or C++ experience is helpful)
- acceleration of scientific applications
- linear algebra
Goals:
- Use NVidia tensor cores on GPUs to accelerate the quantum-chemistry code CP2K. Two hot-spots:
  - matrix-matrix multiplications for small matrices in the underlying library DBCSR (https://www.cp2k.org/dbcsr)
  - computation of the matrix sign function in the submatrix method (https://arxiv.org/abs/1710.10899); a minimal sketch of one possible iteration follows this list
- Performance modelling, implementation, optimization, and testing of the linear-algebra methods
- Publication of the results as open-source code
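For orientation on the matrix-sign-function hot-spot: sign(A) can be computed with iterations built almost entirely from matrix-matrix multiplications, which is exactly the workload tensor cores accelerate. Below is a minimal host-side C++ sketch of the Newton-Schulz iteration X_{k+1} = 0.5 * X_k * (3I - X_k^2) for a symmetric matrix without zero eigenvalues; the scheme actually used in the submatrix method may differ, and all names are illustrative, not taken from CP2K or DBCSR.

    #include <cmath>
    #include <vector>

    using Matrix = std::vector<double>; // dense row-major n x n matrix

    // Naive O(n^3) multiply; in the project this is exactly the step
    // that would be offloaded to tensor cores.
    static void matmul(int n, const Matrix &A, const Matrix &B, Matrix &C) {
        for (int i = 0; i < n; ++i)
            for (int j = 0; j < n; ++j) {
                double s = 0.0;
                for (int k = 0; k < n; ++k) s += A[i*n+k] * B[k*n+j];
                C[i*n+j] = s;
            }
    }

    // Newton-Schulz iteration for sign(A), assuming A is symmetric and
    // has no zero eigenvalues. Scaling by the Frobenius norm puts all
    // eigenvalues into the iteration's region of convergence.
    Matrix sign_newton_schulz(int n, Matrix X, int iters = 30) {
        double norm = 0.0;
        for (double x : X) norm += x * x;
        norm = std::sqrt(norm);
        for (double &x : X) x /= norm;

        Matrix X2(n*n), T(n*n), Xn(n*n);
        for (int it = 0; it < iters; ++it) {
            matmul(n, X, X, X2);                     // X^2
            for (int i = 0; i < n; ++i)
                for (int j = 0; j < n; ++j)
                    T[i*n+j] = (i == j ? 3.0 : 0.0) - X2[i*n+j]; // 3I - X^2
            matmul(n, X, T, Xn);                     // X * (3I - X^2)
            for (int i = 0; i < n*n; ++i) X[i] = 0.5 * Xn[i];
        }
        return X;
    }

Each iteration costs two matrix-matrix multiplications, so the overall runtime is dominated by GEMM throughput.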
Challenges:
- programming tensor cores with cuBLAS, CUTLASS, or CUDA (a minimal cuBLAS sketch follows this list)
- understanding some basic concepts from linear algebra (eigenvalue problems, matrix sign function, submatrix method)
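As a starting point for the cuBLAS route mentioned above, the sketch below requests tensor-core execution for a GEMM with half-precision inputs and single-precision accumulation via cublasGemmEx. It uses the CUDA 10 era API (newer toolkits replace the compute-type argument with cublasComputeType_t); the matrix size is a placeholder and error handling is kept to a minimum.

    #include <cublas_v2.h>
    #include <cuda_runtime.h>
    #include <cuda_fp16.h>
    #include <cstdio>

    int main() {
        const int n = 1024;                 // placeholder size; C = A * B, all n x n
        half *dA, *dB; float *dC;
        cudaMalloc(&dA, n * n * sizeof(half));
        cudaMalloc(&dB, n * n * sizeof(half));
        cudaMalloc(&dC, n * n * sizeof(float));
        // ... fill dA and dB with input data ...

        cublasHandle_t handle;
        cublasCreate(&handle);
        // Allow cuBLAS to dispatch this GEMM to tensor cores.
        cublasSetMathMode(handle, CUBLAS_TENSOR_OP_MATH);

        const float alpha = 1.0f, beta = 0.0f;
        // Half-precision A and B, single-precision accumulation and output.
        cublasStatus_t st = cublasGemmEx(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                                         n, n, n, &alpha,
                                         dA, CUDA_R_16F, n,
                                         dB, CUDA_R_16F, n, &beta,
                                         dC, CUDA_R_32F, n,
                                         CUDA_R_32F, CUBLAS_GEMM_DEFAULT_TENSOR_OP);
        if (st != CUBLAS_STATUS_SUCCESS) fprintf(stderr, "GEMM failed: %d\n", st);

        cublasDestroy(handle);
        cudaFree(dA); cudaFree(dB); cudaFree(dC);
        return 0;
    }

For the many small matrices in DBCSR, batched variants such as cublasGemmBatchedEx would be the natural analogue of this single large call.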
Applications:
- NVidia CUDA (https://developer.nvidia.com/cuda-zone)
- NVidia CUTLASS (https://github.com/NVIDIA/cutlass)
- CP2K (https://www.cp2k.org/)
- compiler