Resource Optimization and Monitoring for Serial Jobs
Overview
Teaching: 30 min
Exercises: 10 min
Questions
How do we optimize and monitor resource usage for sequential jobs on an HPC system?
What tools can we use to profile CPU and memory usage for single-core jobs?
What are the best practices and common pitfalls when submitting sequential scripts?
Objectives
Understand how to allocate appropriate resources for sequential (single-core) jobs.
Learn how to monitor CPU and memory usage of sequential jobs on HPC systems.
Use both custom scripts and tools like htop to profile and optimize job performance.
Example
To understand how we can use the different resources available on an HPC system for the same computational task, we take the example of a Python code that calculates the gravitational deflection angle, defined as follows:
Deflection Angle Formula
For light passing near a massive object, the deflection angle (α) in the weak-field approximation is given by:
α = 4GM / (c²b)
Where:
- G = Gravitational constant (6.67430 × 10⁻¹¹ m³ kg⁻¹ s⁻²)
- M = Mass of the lensing object (in kilograms)
- c = Speed of light (299792458 m/s)
- b = Impact parameter (the closest approach distance of the light ray to the mass, in meters)
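As a quick sanity check of the formula, the short Python snippet below evaluates α for a single pair of values: one solar mass and an impact parameter equal to the solar radius (about 6.96 × 10⁸ m, used here purely as an illustrative value). It should reproduce the classic result of roughly 1.75 arcseconds for light grazing the Sun.
# Quick check of the deflection-angle formula for a single (M, b) pair
G = 6.67430e-11       # gravitational constant, m^3 kg^-1 s^-2
c = 299792458         # speed of light, m/s
M = 1.98847e30        # one solar mass, kg
b = 6.957e8           # solar radius in m, used as an illustrative impact parameter

alpha = 4 * G * M / (c**2 * b)                                     # deflection angle in radians
print(f"alpha = {alpha:.3e} rad = {alpha * 206265:.2f} arcsec")    # ~1.75 arcsec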
Computational Task Description
Compute the deflection angle over a grid of:
- Mass values: From 1 to 1000 solar masses (about 2 × 10³⁰ to 2 × 10³³ kg)
- Impact parameters: From 10⁹ to 10¹² meters
Generate a 2D array where each entry corresponds to the deflection angle for a specific pair of mass and impact parameter. We will now look at how to implement this using the different resources available on the HPC system.
Sequential Job Optimization
Sequential jobs run on a single CPU core and are suitable for tasks that cannot be parallelized. Before writing our job script, let us remind ourselves of the structure of a Slurm script for a sequential job.
Structure of a Slurm Script for a Sequential Job
#!/bin/bash
#SBATCH -J jobname # Job name for identification
#SBATCH -o outfile.%J # Standard output file (%J = job ID)
#SBATCH -e errorfile.%J # Standard error file (%J = job ID)
#SBATCH --partition=computes_thin # Use serial queue for single-core jobs
#SBATCH --nodes=1 # Serial jobs only require 1 node
#SBATCH --ntasks=1 # Serial jobs will also require only 1 core
./[programme executable name] # Execute your program
Example: Gravitational Deflection Angle Sequential CPU
# File Name - example_serial.py
# This script computes the gravitational deflection angle of light around a massive object
# using a nested loop (sequential CPU calculation). It explores a parameter grid of masses
# and impact parameters, saves the computed results to disk, and generates a color plot
# of the deflection angles on a logarithmic scale.
# Import NumPy for numerical array operations, time for measuring execution time, os for creating the output directory, and matplotlib for plotting the results
import numpy as np
import time
import matplotlib.pyplot as plt
import os
import matplotlib.colors as colors
# Physical constants
G = 6.67430e-11        # gravitational constant, m^3 kg^-1 s^-2
c = 299792458          # speed of light, m/s
M_sun = 1.98847e30     # solar mass, kg
# Parameter grid
mass_grid = np.linspace(1, 1000, 10000) # Solar masses
impact_grid = np.linspace(1e9, 1e12, 10000) # meters
result = np.zeros((len(mass_grid), len(impact_grid)))
# Timing
start = time.time()
# Sequential computation
for i, M in enumerate(mass_grid):
    for j, b in enumerate(impact_grid):
        result[i, j] = (4 * G * M * M_sun) / (c**2 * b)
end = time.time()
print(f"CPU Sequential time: {end - start:.3f} seconds")
# Save results to disk for reuse (np.save returns None, so we do not assign its result)
np.save("result_cpu.npy", result)
np.save("mass_grid_cpu.npy", mass_grid)
np.save("impact_grid_cpu.npy", impact_grid)
# Load data
result = np.load("result_cpu.npy")
mass_grid = np.load("mass_grid_cpu.npy")
impact_grid = np.load("impact_grid_cpu.npy")
# Create meshgrid for plotting (mass_grid is already in solar masses; convert the impact parameter to Gm)
M, B = np.meshgrid(mass_grid, impact_grid / 1e9, indexing='ij')
# Create output directory
os.makedirs("plots", exist_ok=True)
plt.figure(figsize=(8,6))
pcm = plt.pcolormesh(B, M, result,
norm=colors.LogNorm(vmin=result[result > 0].min(), vmax=result.max()),
shading='auto', cmap='plasma')
plt.colorbar(pcm, label='Deflection Angle (radians, log scale)')
plt.xlabel('Impact Parameter (Gm)')
plt.ylabel('Mass (Solar Masses)')
plt.title('Gravitational Deflection Angle - CPU')
plt.tight_layout()
plt.savefig("plots/deflection_angle_cpu.png", dpi=300)
plt.close()
print("CPU plot saved in 'plots/deflection_angle_cpu.png'")
CPU Sequential time: 153.965 seconds
CPU plot saved in 'plots/deflection_angle_cpu.png'
This code simulates gravitational lensing by computing how much light bends when passing near massive objects. It first defines key physical constants, then creates two grids: one for object masses (in solar masses) and one for impact parameters (the distance of closest approach). For every combination of mass and impact parameter, it calculates the deflection angle using the gravitational lensing formula and stores the results in a 2D array. The code measures and prints the runtime to highlight sequential execution speed, saves the computed data for reuse, and finally generates a log-scaled color plot showing how deflection varies with mass and distance, which is stored as an image for visualization.
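Because this episode is about optimization, it is worth noting that the nested Python loop is not the only way to fill the grid on a single core. The sketch below (assuming the same constants and grids as in example_serial.py) uses NumPy broadcasting to compute the identical array in one vectorized expression, which typically runs far faster than the explicit loops while still using only one CPU core.
# Vectorized alternative to the nested loop (still a single core, no explicit Python loops)
# Assumes G, c, M_sun, mass_grid, impact_grid and result are defined as in example_serial.py
mass_col = mass_grid[:, np.newaxis]                              # shape (10000, 1)
result_vec = (4 * G * mass_col * M_sun) / (c**2 * impact_grid)   # broadcasts to (10000, 10000)
print(np.allclose(result_vec, result))                           # same values as the loop version
Profiling both versions with the monitoring script introduced below is a good exercise in seeing how algorithmic choices change resource usage.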
Job Monitoring and Profiling
We also want to monitor the resources the job actually uses while it runs, so that we can decide whether we allocated the right amount of resources for this job type. For this we will create a shell script that logs CPU and memory usage every five seconds. We can create that file using the code below.
#!/bin/bash
# File: monitor_resources.sh
# Monitor CPU% and Memory usage of Python processes for the user (you)
# Saves results in a log file
OUTFILE="resource_usage_${SLURM_JOB_ID}.log"
# Create a header row for the log file
echo "Timestamp | CPU% | Memory(MB)" > "$OUTFILE"
# Repeat until stopped
while true
do
    # ps: shows running processes
    # -u $USER : only show processes owned by you
    # -o %cpu,rss,comm : output CPU%, memory (RSS in KB), and command name
    ps -u $USER -o %cpu,rss,comm \
    | awk '
        $3=="python" {   # Only lines where command is "python"
            # strftime formats current date/time
            # $1 is CPU%, $2 is memory in KB; divide by 1024 for MB
            print strftime("%Y-%m-%d %H:%M:%S"), "|", $1, "|", $2/1024
        }
    ' >> "$OUTFILE"
    # sleep: pause for 5 seconds before checking again
    sleep 5
done
New Commands and Operators Introduced in this Script
We are using three shell commands and two shell operators:
- ps (process status): The ps command lists processes running on the system. Here, ps -u $USER restricts the list to processes started by the current user. The option -o %cpu,rss,comm customizes the output to show only CPU usage percentage (%cpu), resident memory size in kilobytes (rss), and the command name (comm), such as "python".
- awk: awk is a text-processing tool that reads each line of input and lets us filter or reformat it. In this script, we tell awk to process only lines where the third field (the command name) is "python". It then prints the current timestamp (strftime), the CPU percentage, and the memory converted from kilobytes to megabytes ($2/1024).
- sleep: The sleep command pauses execution for a given number of seconds. Here, sleep 5 makes the script wait 5 seconds before checking the processes again, so we do not overload the system with constant checks and we get a readable sampling interval.
- > (redirect output, overwrite): > creates or overwrites a file with the command's output. In this script, it is used once to create the log file and write the header line, replacing any existing file with the same name.
- >> (redirect output, append): >> appends output to an existing file instead of overwriting it. In this script, it is used inside the loop to append each new measurement below the header, so the log grows over time without losing previous entries.
We can now include a command that runs this script in the Slurm job script we will use to run the sequential example on BURA.
Sequential Job Script for the Example
#!/bin/bash
#SBATCH --job-name=example_serial # Name of the Job
#SBATCH --output=serial_%j.out # Name of the output file for the Job
#SBATCH --error=serial_%j.err # Name of the error file for the Job
#SBATCH --partition=computes_thin # Request the appropriate partition for the job
#SBATCH --nodes=1 # Request the appropriate number of computing nodes required for the job
#SBATCH --ntasks=1 # This specifies how many processes will run across the nodes
#SBATCH --time=00:10:00 # This specifies the maximum amount of time that the job will run for
#SBATCH --mem=16G # This specifies the amount of memory which will be allocated for the job
# List currently loaded modules (a sanity check in case jobs are not running as expected)
module list
# Activate your virtual environment (we already activated it in the terminal, so this is again a sanity check)
source interpython/bin/activate
# Start the resource monitor in the background.
# The "&" symbol is used so the monitor runs simultaneously with the main job instead of blocking it
# The monitor_resources.sh script must be in the same directory as the python file and the slurm script.
bash monitor_resources.sh &
# Run the main sequential job.
python example_serial.py
# Stop the resource monitor after the job finishes.
# "kill %1" is a terminal command which terminates the first background process started in this script.
# Which in our case is monitor_resources.sh.
kill %1
# Print the date and time when the job completed.
echo "Job completed at $(date)"
# Print the name of the log file which was prepared by the resource monitor script
echo "Resource usage saved to resource_usage_${SLURM_JOB_ID}.log"
After the job finishes, we can run cat on the resource_usage_<job_id>.log file to view the logged CPU and memory usage over time.
Viewing the Results
To quickly view the contents of the log file (replace <job_id> with the ID Slurm assigned to your job, since the SLURM_JOB_ID variable is only defined inside the job):
cat resource_usage_<job_id>.log
Timestamp | CPU% | Memory(MB)
2025-08-17 15:32:01 | 95.0 | 1200
2025-08-17 15:32:06 | 99.2 | 1250
2025-08-17 15:32:11 | 100 | 1268
...
Interpreting the Results
CPU%
- For sequential jobs, this should ideally be close to 100%, indicating that the single CPU core is being fully utilized.
- If CPU% is much lower, the program may be waiting on I/O (e.g., reading/writing files) or could benefit from algorithmic optimization.
Memory (MB)
- Shows how much RAM your job is using at each interval.
- Compare the peak memory usage to the memory you requested with --mem (a short sketch after this list shows one way to extract the peak and average values from the log).
- If memory usage is consistently much lower than the allocation, you may safely reduce --mem in future runs.
- If usage is close to or exceeds your allocation, increase --mem to prevent job crashes.
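For longer runs it can be tedious to scan the log by eye, so here is a minimal Python sketch for summarizing it. It assumes the log format produced by monitor_resources.sh above; the file name resource_usage_123456.log is a placeholder for your actual job ID.
# Summarize a log produced by monitor_resources.sh
# "resource_usage_123456.log" is a placeholder file name; use your actual job ID
cpu_vals, mem_vals = [], []
with open("resource_usage_123456.log") as log:
    next(log)                                   # skip the header row
    for line in log:
        parts = [p.strip() for p in line.split("|")]
        if len(parts) == 3:                     # Timestamp | CPU% | Memory(MB)
            cpu_vals.append(float(parts[1]))
            mem_vals.append(float(parts[2]))
if cpu_vals:
    print(f"Average CPU%: {sum(cpu_vals) / len(cpu_vals):.1f}")
    print(f"Peak memory:  {max(mem_vals):.1f} MB")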
Quick Reference: Interpreting Resource Usage Patterns
Pattern | Meaning | Action to Take |
---|---|---|
High CPU% (~100%), Low Memory | Code is compute-bound (CPU is the bottleneck, memory not heavily used). | Keep memory request low, focus on algorithmic optimizations or parallelization. |
Low CPU%, Low Memory | Code is I/O-bound (waiting on file reads/writes or network communication). | Optimize data access, use faster storage, or reduce unnecessary file operations. |
High CPU%, High Memory | Code is both compute- and memory-intensive. | Ensure enough memory is requested (--mem), and consider algorithm/data structure optimizations. |
Low CPU%, High Memory | Code is memory-bound (spending more time managing memory than doing compute). | Increase memory allocation, or optimize memory usage in the code (e.g., chunking large arrays). |
Fluctuating CPU%, Stable Memory | Workload alternates between compute and idle states. | Check for inefficient loops or waiting on external processes; consider restructuring workload. |
Stable CPU%, Growing Memory | Memory leak (usage increases steadily without bound). | Debug the code, check for objects/arrays not being freed, or optimize memory handling. |
Why This Matters
- Efficient allocation: Avoid over-requesting (slower queue times) or under-requesting (job failures).
- System fairness: Using only what you need helps the scheduler place your jobs more efficiently.
- Debugging: Sudden spikes in memory or drops in CPU can reveal inefficiencies or bugs in your program.
Best Practices and Common Pitfalls for Resource Allocation for Sequential Scripts
Resource Allocation Best Practices
- Request only 1 core
  - Sequential jobs run on a single core, so always set --cpus-per-task=1.
  - Requesting more cores will not speed up the job and only wastes resources.
- Request memory proportional to workload
  - Estimate memory usage (for data arrays, grids, etc.) and add a small safety margin; a short estimate for this lesson's arrays is sketched after this list.
  - Example: if a job needs ~10 GB, request --mem=12G, not --mem=64G.
- Use appropriate partitions/queues
  - Submit sequential jobs to the serial or thin partitions if available, instead of compute-intensive queues.
- Start with test runs
  - Run with smaller problem sizes or shorter times first.
  - Check logs and resource usage before scaling to full workloads.
- Monitor and refine
  - Use tools like htop, time, or resource monitoring scripts to profile performance.
  - Adjust memory and runtime allocations based on measured usage.
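As an illustration of the memory-estimation step above, here is a rough back-of-the-envelope sketch for this lesson's example, assuming the 10000 × 10000 float64 grid used in example_serial.py.
# Rough memory estimate for the arrays in example_serial.py
import numpy as np

n_mass, n_impact = 10000, 10000
bytes_per_value = np.dtype(np.float64).itemsize            # 8 bytes per float64

result_gb = n_mass * n_impact * bytes_per_value / 1e9       # ~0.8 GB for the result grid
meshgrid_gb = 2 * result_gb                                  # the M and B plotting grids are the same size
print(f"result array:    ~{result_gb:.2f} GB")
print(f"meshgrid arrays: ~{meshgrid_gb:.2f} GB")
# Together with plotting overhead and temporary copies, a request of --mem=16G
# (as in the job script above) leaves a comfortable safety margin.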
Common Pitfalls for Sequential Jobs
Over-requesting resources
# Bad: Requesting 32 cores for sequential code
#SBATCH --cpus-per-task=32
./sequential_program
# Good: Match core count to parallelization
#SBATCH --cpus-per-task=1
./sequential_program
Key Points
Sequential jobs only use a single CPU core, so requesting multiple cores wastes resources.
Monitoring resource usage helps match allocation to actual requirements.
htop provides a quick, interactive way to view CPU and memory consumption.
Always start with small test runs and scale resources based on profiling results.