This lesson is still being designed and assembled (Pre-Alpha version)

Introduction to High Performance Computing for astronomical software development: Glossary

Key Points

Setting the Scene
  • Astronomical research requires large computing resources, which are not always available in a PC form factor.

  • In order to run your code in a High-Performance Computing setting, special tools and techniques are needed.

Section 1: HPC basics
  • This section covers an introduction to HPC, the Bura HPC facility, and the Slurm workload manager.

HPC Intro
  • Communication between different computer components, such as the memory and the arithmetic logic unit (ALU), and between different nodes (computers) in a cluster, is often the main bottleneck of the system.

  • Modern supercomputers are usually assembled from the same parts as personal computers; the difference is in the number of CPUs, GPUs, and memory units, and in how they are connected to one another.

  • Data storage organization varies from one HPC facility to another, so it is necessary to consult the documentation when starting work on a new supercomputer or cluster.

  • Login nodes must not be used for computationally heavy tasks, as this slows down the work of all the cluster's users.

LSST HPC facilities and opportunities
  • Most of the LSST in-kind contributions are IDACs, whose primary function is to provide access to the data products, not to run HPC workloads.

  • Several of the IDACs are built on existing computational facilities that do have multiple CPU cores and occasionally GPUs, which may be accessible to LSST data rights holders.

  • The choice of an HPC facility for your project depends on which datasets you need, whether you can benefit from utilising GPUs, and whether the facility provides, or allows easy installation of, the necessary dependencies.

Bura access
  • Bura is a powerful supercomputer with CPU, GPGPU, and SMP components.

  • Bura can be accessed via a portal in a web browser, or by installing VPN software and an SSH client.
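
  For reference, command-line access over SSH typically looks like the sketch below; the hostname and username are placeholders, not Bura's actual address.

      # Connect to the cluster's login node (hostname is a placeholder)
      $ ssh username@bura.example.org

      # Copy files to the cluster with scp
      $ scp results.tar.gz username@bura.example.org:~/data/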

Command line basics
  • Shell skills enable efficient navigation and manipulation of local and remote file systems.

  • The shell can be used to identify who you are and what you have access to.

  • The shell can be used to determine what is happening on a system and how you are using it.
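
  A minimal sketch of the standard Linux commands behind these points:

      # Who am I, and which groups (and therefore permissions) do I have?
      $ whoami
      $ id
      $ groups

      # What is happening on the system right now?
      $ top            # interactive view of running processes
      $ ps aux | less  # one-off snapshot of all processes

      # How much disk space am I using?
      $ df -h          # free space per file system
      $ du -sh ~       # total size of your home directory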

Bura Setup
  • HPC systems use environment modules to manage shared software.

  • Use module avail and module spider to find software.

  • Use module load to add software to your environment and module purge to remove it.

  • Loaded modules are temporary and reset when you log out.

  • Python virtual environments (venv) isolate your project’s dependencies.

  • Always activate a virtual environment before installing packages with pip; a combined example session is sketched below.
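
  A typical setup session, assuming an Lmod-style module system; the module names and versions are illustrative, since the software tree differs between clusters.

      # Discover available software
      $ module avail
      $ module spider python      # search, including module dependencies

      # Load a module (version is illustrative), then verify
      $ module load python/3.11
      $ module list

      # Create and activate a per-project virtual environment
      $ python -m venv ~/venvs/myproject
      $ source ~/venvs/myproject/bin/activate

      # Install packages only once the environment is active
      (myproject) $ pip install numpy astropy

      # Unload all modules when done (they reset at logout anyway)
      $ module purge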

Introduction to Slurm workload manager
  • Slurm is a system for managing computing clusters and scheduling computing jobs.

  • Slurm provides a set of commands which can configure, submit, and control jobs on the cluster from the command line.

  • Jobs can be parallelized using batch scripts which provide the configuration and the commands to be run, as sketched below.
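
  A minimal batch script and the commands to manage it; the partition name is a placeholder to be replaced with one that exists on your cluster.

      #!/bin/bash
      #SBATCH --job-name=hello         # name shown in the queue
      #SBATCH --partition=compute      # placeholder partition name
      #SBATCH --ntasks=1               # one task on one CPU core
      #SBATCH --time=00:05:00          # wall-clock limit
      #SBATCH --output=hello_%j.log    # %j expands to the job ID

      echo "Running on $(hostname)"

  Submit, inspect, and, if needed, cancel the job:

      $ sbatch hello.sh
      $ squeue -u $USER
      $ scancel <jobid>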

Section 2: Running code on Bura
  • We will rely on practical exercises to learn what different modes of program execution look like in real life and which tools we can use for performance analysis.

Intro code examples
  • Serial code is limited to a single thread of execution, while parallel code uses multiple cores or nodes.
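
  As a toy illustration in the shell: a serial run executes tasks one after another, while background jobs let them run at the same time (process_chunk.sh is a hypothetical per-task script).

      # Serial: each task starts only after the previous one finishes
      for i in 1 2 3 4; do
          ./process_chunk.sh "$i"
      done

      # Parallel: all four tasks run at once, each on its own core
      for i in 1 2 3 4; do
          ./process_chunk.sh "$i" &
      done
      wait   # block until all background tasks complete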

Parallelising our code for CPU
  • Serial code is limited to a single thread of execution, while parallel code uses multiple cores or nodes.

  • OpenMP and MPI are popular for parallel CPU programming; CUDA is used for GPU programming.

  • High-level libraries like Numba and CuPy make GPU acceleration accessible from Python.
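
  The typical build-and-run commands for the two CPU models, sketched for a C program (source file names are illustrative):

      # OpenMP: shared-memory threads within a single node
      $ gcc -fopenmp heat_omp.c -o heat_omp
      $ OMP_NUM_THREADS=8 ./heat_omp    # run with 8 threads

      # MPI: separate processes, possibly spread across nodes
      $ mpicc heat_mpi.c -o heat_mpi
      $ mpirun -np 8 ./heat_mpi         # run with 8 ranks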

Implementing code examples for running on GPU
  • Serial code is limited to a single thread of execution, while parallel code uses multiple cores or nodes.

  • OpenMP and MPI are popular for parallel CPU programming; CUDA is used for GPU programming.

  • High-level libraries like Numba and CuPy make GPU acceleration accessible from Python.
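
  A sketch of a Slurm script for a GPU run; the partition, module name, and Python script are placeholders.

      #!/bin/bash
      #SBATCH --job-name=gpu-demo
      #SBATCH --partition=gpu          # placeholder GPU partition
      #SBATCH --gres=gpu:1             # request one GPU
      #SBATCH --time=00:10:00

      module load cuda                 # illustrative module name
      nvidia-smi                       # confirm the GPU is visible
      python gpu_kernel.py             # hypothetical CuPy/Numba script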

Resource requirements
  • Different computational models (sequential, parallel, GPU) significantly impact runtime and efficiency.

  • Sequential CPU execution is simple but inefficient for large parameter spaces.

  • Parallel CPU (e.g., MPI or OpenMP) reduces runtime by distributing tasks but is limited by CPU core counts and communication overhead.

  • GPU computing can drastically accelerate tasks with massively parallel workloads like grid-based simulations.

  • Choosing the right computational model depends on the problem structure, resource availability, and cost-efficiency.

  • Effective Slurm job scripts should match the workload to the hardware: CPUs for serial/parallel, GPUs for highly parallelizable tasks.

  • Monitoring tools (like nvidia-smi, seff, top) help validate whether the resource request matches the actual usage; see the sketch after this list.

  • Optimizing resource usage minimizes wait times in shared environments and improves overall throughput.
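
  How these monitoring tools are typically invoked (the job ID is a placeholder):

      # After a job finishes: CPU and memory efficiency summary
      $ seff 123456

      # Accounting records: requested versus consumed resources
      $ sacct -j 123456 --format=JobID,Elapsed,MaxRSS,TotalCPU

      # While a job runs on a GPU node: utilisation and memory
      $ nvidia-smi

      # While a job runs on a CPU node: live per-process usage
      $ top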

Resource optimization and monitoring for Serial Jobs
  • Sequential jobs only use a single CPU core, so requesting multiple cores wastes resources.

  • Monitoring resource usage helps match allocation to actual requirements.

  • htop provides a quick, interactive way to view CPU and memory consumption.

  • Always start with small test runs and scale resources based on profiling results.
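
  A serial job therefore needs only a single core; a minimal sketch (script name and limits are illustrative):

      #!/bin/bash
      #SBATCH --job-name=serial-run
      #SBATCH --ntasks=1               # one process...
      #SBATCH --cpus-per-task=1        # ...on one core
      #SBATCH --mem=2G                 # start small, adjust after profiling
      #SBATCH --time=00:30:00

      python analysis.py               # hypothetical serial script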

Resource optimization and monitoring for Parallel Jobs
  • Match --nodes, --ntasks, and --cpus-per-task to the parallelism strategy (MPI vs OpenMP); see the sketch after this list.

  • Avoid over-requesting resources: asking for more cores than you actually use wastes allocations.

  • Monitor CPU and memory usage during job execution to guide resource tuning.
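
  The two strategies map onto different Slurm requests; a sketch showing two alternative header fragments, with illustrative core counts and program names:

      # MPI: many single-core tasks, possibly spanning nodes
      #SBATCH --nodes=2
      #SBATCH --ntasks=16
      #SBATCH --cpus-per-task=1
      srun ./heat_mpi

      # OpenMP: one task with many cores on a single node
      #SBATCH --nodes=1
      #SBATCH --ntasks=1
      #SBATCH --cpus-per-task=8
      export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
      ./heat_omp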

Wrap-up
  • This course teaches the basics of HPC; the topic itself is vast and may take a long time to master.

Glossary