Resource Optimization and Monitoring for Serial Jobs
Overview
Teaching: 30 min
Exercises: 10 min
Questions
How do we optimize and monitor resource usage for sequential jobs on an HPC system?
What tools can we use to profile CPU and memory usage for single-core jobs?
What are the best practices and common pitfalls when submitting sequential scripts?
Objectives
Understand how to allocate appropriate resources for sequential (single-core) jobs.
Learn how to monitor CPU and memory usage of sequential jobs on HPC systems.
Use both custom scripts and tools like htop to profile and optimize job performance.
Example
To understand how we can use the different resources available on an HPC system for the same computational task, we take the example of a Python code that calculates the gravitational deflection angle, defined as follows:
Deflection Angle Formula
For light passing near a massive object, the deflection angle (α) in the weak-field approximation is given by:
α = 4GM / (c²b)
Where:
- G = Gravitational constant (6.67430 × 10⁻¹¹ m³ kg⁻¹ s⁻²)
- M = Mass of the lensing object (in kilograms)
- c = Speed of light (299792458 m/s)
- b = Impact parameter (the closest approach distance of the light ray to the mass, in meters)
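As a quick sanity check of the formula, the short Python snippet below evaluates α for a single pair of values: one solar mass and an impact parameter equal to the solar radius (about 6.96 × 10⁸ m, used here purely as an illustrative value). It should reproduce the classic result of roughly 1.75 arcseconds for light grazing the Sun.
# Quick check of the deflection-angle formula for a single (M, b) pair
G = 6.67430e-11       # gravitational constant, m^3 kg^-1 s^-2
c = 299792458         # speed of light, m/s
M = 1.98847e30        # one solar mass, kg
b = 6.957e8           # solar radius in m, used as an illustrative impact parameter

alpha = 4 * G * M / (c**2 * b)                                     # deflection angle in radians
print(f"alpha = {alpha:.3e} rad = {alpha * 206265:.2f} arcsec")    # ~1.75 arcsec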
Computational Task Description
Compute the deflection angle over a grid of:
- Mass values: From 1 to 1000 solar masses (about 2 × 10³⁰ to 2 × 10³³ kg)
- Impact parameters: From 10⁹ to 10¹² meters
Generate a 2D array where each entry corresponds to the deflection angle for a specific pair of mass and impact parameter. We will now look at how to implement this using the different resources available on the HPC system.
Sequential Job Optimization
Sequential jobs run on a single CPU core and are suitable for tasks that cannot be parallelized. Before writing our job script, let us remind ourselves of the structure of a Slurm script for a sequential job.
Structure of a Slurm Script for a Sequential Job
#!/bin/bash
#SBATCH -J jobname # Job name for identification
#SBATCH -o outfile.%J # Standard output file (%J = job ID)
#SBATCH -e errorfile.%J # Standard error file (%J = job ID)
#SBATCH --partition=computes_thin # Use serial queue for single-core jobs
#SBATCH --nodes=1 # Serial jobs only require 1 node
#SBATCH --ntasks=1 # Serial jobs will also require only 1 core
./[programme executable name] # Execute your program
Example: Gravitational Deflection Angle Sequential CPU
# File Name - example_serial.py
# This script computes the gravitational deflection angle of light around a massive object
# using a nested loop (sequential CPU calculation). It explores a parameter grid of masses
# and impact parameters, saves the computed results to disk, and generates a color plot
# of the deflection angles on a logarithmic scale.
# Import NumPy for numerical array operations, time for measuring execution time, os for creating the output directory, and matplotlib for plotting the results
import numpy as np
import time
import matplotlib.pyplot as plt
import os
import matplotlib.colors as colors
# Physical constants
G = 6.67430e-11        # gravitational constant, m^3 kg^-1 s^-2
c = 299792458          # speed of light, m/s
M_sun = 1.98847e30     # solar mass, kg
# Parameter grid
mass_grid = np.linspace(1, 1000, 10000) # Solar masses
impact_grid = np.linspace(1e9, 1e12, 10000) # meters
result = np.zeros((len(mass_grid), len(impact_grid)))
# Timing
start = time.time()
# Sequential computation
for i, M in enumerate(mass_grid):
    for j, b in enumerate(impact_grid):
        result[i, j] = (4 * G * M * M_sun) / (c**2 * b)
end = time.time()
print(f"CPU Sequential time: {end - start:.3f} seconds")
# Save results to disk for reuse (np.save returns None, so we do not assign its result)
np.save("result_cpu.npy", result)
np.save("mass_grid_cpu.npy", mass_grid)
np.save("impact_grid_cpu.npy", impact_grid)
# Load data
result = np.load("result_cpu.npy")
mass_grid = np.load("mass_grid_cpu.npy")
impact_grid = np.load("impact_grid_cpu.npy")
# Create meshgrid for plotting (mass_grid is already in solar masses; convert the impact parameter to Gm)
M, B = np.meshgrid(mass_grid, impact_grid / 1e9, indexing='ij')
# Create output directory
os.makedirs("plots", exist_ok=True)
plt.figure(figsize=(8,6))
pcm = plt.pcolormesh(B, M, result,
norm=colors.LogNorm(vmin=result[result > 0].min(), vmax=result.max()),
shading='auto', cmap='plasma')
plt.colorbar(pcm, label='Deflection Angle (radians, log scale)')
plt.xlabel('Impact Parameter (Gm)')
plt.ylabel('Mass (Solar Masses)')
plt.title('Gravitational Deflection Angle - CPU')
plt.tight_layout()
plt.savefig("plots/deflection_angle_cpu.png", dpi=300)
plt.close()
print("CPU plot saved in 'plots/deflection_angle_cpu.png'")
CPU Sequential time: 153.965 seconds
CPU plot saved in 'plots/deflection_angle_cpu.png'
This code simulates gravitational lensing by computing how much light bends when passing near massive objects. It first defines key physical constants, then creates two grids: one for object masses (in solar masses) and one for impact parameters (the distance of closest approach). For every combination of mass and impact parameter, it calculates the deflection angle using the gravitational lensing formula and stores the results in a 2D array. The code measures and prints the runtime to highlight sequential execution speed, saves the computed data for reuse, and finally generates a log-scaled color plot showing how deflection varies with mass and distance, which is stored as an image for visualization.
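Because this episode is about optimization, it is worth noting that the nested Python loop is not the only way to fill the grid on a single core. The sketch below (assuming the same constants and grids as in example_serial.py) uses NumPy broadcasting to compute the identical array in one vectorized expression, which typically runs far faster than the explicit loops while still using only one CPU core.
# Vectorized alternative to the nested loop (still a single core, no explicit Python loops)
# Assumes G, c, M_sun, mass_grid, impact_grid and result are defined as in example_serial.py
mass_col = mass_grid[:, np.newaxis]                              # shape (10000, 1)
result_vec = (4 * G * mass_col * M_sun) / (c**2 * impact_grid)   # broadcasts to (10000, 10000)
print(np.allclose(result_vec, result))                           # same values as the loop version
Profiling both versions with the monitoring script introduced below is a good exercise in seeing how algorithmic choices change resource usage.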
Job Monitoring and Profiling
We also want to monitor the resources the job actually uses while it runs, so that we can decide whether we allocated the right amount of resources for this job type. For this we will create a shell script that logs CPU and memory usage every five seconds. We can create that file using the code below.
#!/bin/bash
# File: monitor_resources.sh
# Monitor CPU% and Memory usage of Python processes for the user (you)
# Saves results in a log file
OUTFILE="resource_usage_${SLURM_JOB_ID}.log"
# Create a header row for the log file
echo "Timestamp | CPU% | Memory(MB)" > "$OUTFILE"
# Repeat until stopped
while true
do
    # ps: shows running processes
    # -u $USER : only show processes owned by you
    # -o %cpu,rss,comm : output CPU%, memory (RSS in KB), and command name
    ps -u $USER -o %cpu,rss,comm \
    | awk '
        $3=="python" {   # Only lines where command is "python"
            # strftime formats current date/time
            # $1 is CPU%, $2 is memory in KB; divide by 1024 for MB
            print strftime("%Y-%m-%d %H:%M:%S"), "|", $1, "|", $2/1024
        }
    ' >> "$OUTFILE"
    # sleep: pause for 5 seconds before checking again
    sleep 5
done
New Commands and Operators Introduced in this Script
We are using three shell commands and two shell operators:
- ps (process status): The ps command lists processes running on the system. Here, ps -u $USER restricts the list to processes started by the current user. The option -o %cpu,rss,comm customizes the output to show only CPU usage percentage (%cpu), resident memory size in kilobytes (rss), and the command name (comm), such as "python".
- awk: awk is a text-processing tool that reads each line of input and lets us filter or reformat it. In this script, we tell awk to process only lines where the third field (the command name) is "python". It then prints the current timestamp (strftime), the CPU percentage, and the memory converted from kilobytes to megabytes ($2/1024).
- sleep: The sleep command pauses execution for a given number of seconds. Here, sleep 5 makes the script wait 5 seconds before checking the processes again, so we do not overload the system with constant checks and we get a readable sampling interval.
- > (redirect output, overwrite): > creates or overwrites a file with the command's output. In this script, it is used once to create the log file and write the header line, replacing any existing file with the same name.
- >> (redirect output, append): >> appends output to an existing file instead of overwriting it. In this script, it is used inside the loop to append each new measurement below the header, so the log grows over time without losing previous entries.
We can now include a command that runs this script in the Slurm job script we will use to run the sequential example on BURA.
Sequential Job Script for the Example
#!/bin/bash
#SBATCH --job-name=example_serial # Name of the Job
#SBATCH --output=serial_%j.out # Name of the output file for the Job
#SBATCH --error=serial_%j.err # Name of the error file for the Job
#SBATCH --partition=computes_thin # Request the appropriate partition for the job
#SBATCH --nodes=1 # Request the appropriate number of computing nodes required for the job
#SBATCH --ntasks=1 # This specifies how many processes will run across the nodes
#SBATCH --time=00:10:00 # This specifies the maximum amount of time that the job will run for
#SBATCH --mem=16G # This specifies the amount of memory which will be allocated for the job
# List currently loaded modules (a sanity check in case jobs are not running as expected)
module list
# Activate your virtual environment (we already activated it in the terminal, so this is again a sanity check)
source interpython/bin/activate
# Start the resource monitor in the background.
# The "&" symbol is used so the monitor runs simultaneously with the main job instead of blocking it
# The monitor_resources.sh script must be in the same directory as the python file and the slurm script.
bash monitor_resources.sh &
# Run the main sequential job.
python example_serial.py
# Stop the resource monitor after the job finishes.
# "kill %1" is a terminal command which terminates the first background process started in this script.
# Which in our case is monitor_resources.sh.
kill %1
# Print the date and time when the job completed.
echo "Job completed at $(date)"
# Print the name of the log file which was prepared by the resource monitor script
echo "Resource usage saved to resource_usage_${SLURM_JOB_ID}.log"
After the job finishes, we can run cat on the resource_usage_<job_id>.log file to view the logged CPU and memory usage over time.
Viewing the Results
To quickly view the contents of the log file (replace <job_id> with the ID Slurm assigned to your job, since the SLURM_JOB_ID variable is only defined inside the job):
cat resource_usage_<job_id>.log
Timestamp | CPU% | Memory(MB)
2025-08-17 15:32:01 | 95.0 | 1200
2025-08-17 15:32:06 | 99.2 | 1250
2025-08-17 15:32:11 | 100 | 1268
...
Interpreting the Results
CPU%
- For sequential jobs, this should ideally be close to 100%, indicating that the single CPU core is being fully utilized.
- If CPU% is much lower, the program may be waiting on I/O (e.g., reading/writing files) or could benefit from algorithmic optimization.
Memory (MB)
- Shows how much RAM your job is using at each interval.
- Compare the peak memory usage to the memory you requested with --mem (a short sketch after this list shows one way to extract the peak and average values from the log).
- If memory usage is consistently much lower than the allocation, you may safely reduce --mem in future runs.
- If usage is close to or exceeds your allocation, increase --mem to prevent job crashes.
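For longer runs it can be tedious to scan the log by eye, so here is a minimal Python sketch for summarizing it. It assumes the log format produced by monitor_resources.sh above; the file name resource_usage_123456.log is a placeholder for your actual job ID.
# Summarize a log produced by monitor_resources.sh
# "resource_usage_123456.log" is a placeholder file name; use your actual job ID
cpu_vals, mem_vals = [], []
with open("resource_usage_123456.log") as log:
    next(log)                                   # skip the header row
    for line in log:
        parts = [p.strip() for p in line.split("|")]
        if len(parts) == 3:                     # Timestamp | CPU% | Memory(MB)
            cpu_vals.append(float(parts[1]))
            mem_vals.append(float(parts[2]))
if cpu_vals:
    print(f"Average CPU%: {sum(cpu_vals) / len(cpu_vals):.1f}")
    print(f"Peak memory:  {max(mem_vals):.1f} MB")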
Quick Reference: Interpreting Resource Usage Patterns
Pattern | Meaning | Action to Take |
---|---|---|
High CPU% (~100%), Low Memory | Code is compute-bound (CPU is the bottleneck, memory not heavily used). | Keep memory request low, focus on algorithmic optimizations or parallelization. |
Low CPU%, Low Memory | Code is I/O-bound (waiting on file reads/writes or network communication). | Optimize data access, use faster storage, or reduce unnecessary file operations. |
High CPU%, High Memory | Code is both compute- and memory-intensive. | Ensure enough memory is requested (--mem), and consider algorithm/data structure optimizations. |
Low CPU%, High Memory | Code is memory-bound (spending more time managing memory than doing compute). | Increase memory allocation, or optimize memory usage in the code (e.g., chunking large arrays). |
Fluctuating CPU%, Stable Memory | Workload alternates between compute and idle states. | Check for inefficient loops or waiting on external processes; consider restructuring workload. |
Stable CPU%, Growing Memory | Memory leak (usage increases steadily without bound). | Debug the code, check for objects/arrays not being freed, or optimize memory handling. |
Why This Matters
- Efficient allocation: Avoid over-requesting (slower queue times) or under-requesting (job failures).
- System fairness: Using only what you need helps the scheduler place your jobs more efficiently.
- Debugging: Sudden spikes in memory or drops in CPU can reveal inefficiencies or bugs in your program.
Best Practices and Common Pitfalls for Resource Allocation for Sequential Scripts
Resource Allocation Best Practices
- Request only 1 core
  - Sequential jobs run on a single core, so always set --cpus-per-task=1.
  - Requesting more cores will not speed up the job and only wastes resources.
- Request memory proportional to workload
  - Estimate memory usage (for data arrays, grids, etc.) and add a small safety margin; a short estimate for this lesson's arrays is sketched after this list.
  - Example: if a job needs ~10 GB, request --mem=12G, not --mem=64G.
- Use appropriate partitions/queues
  - Submit sequential jobs to the serial or thin partitions if available, instead of compute-intensive queues.
- Start with test runs
  - Run with smaller problem sizes or shorter times first.
  - Check logs and resource usage before scaling to full workloads.
- Monitor and refine
  - Use tools like htop, time, or resource monitoring scripts to profile performance.
  - Adjust memory and runtime allocations based on measured usage.
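As an illustration of the memory-estimation step above, here is a rough back-of-the-envelope sketch for this lesson's example, assuming the 10000 × 10000 float64 grid used in example_serial.py.
# Rough memory estimate for the arrays in example_serial.py
import numpy as np

n_mass, n_impact = 10000, 10000
bytes_per_value = np.dtype(np.float64).itemsize            # 8 bytes per float64

result_gb = n_mass * n_impact * bytes_per_value / 1e9       # ~0.8 GB for the result grid
meshgrid_gb = 2 * result_gb                                  # the M and B plotting grids are the same size
print(f"result array:    ~{result_gb:.2f} GB")
print(f"meshgrid arrays: ~{meshgrid_gb:.2f} GB")
# Together with plotting overhead and temporary copies, a request of --mem=16G
# (as in the job script above) leaves a comfortable safety margin.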
Common Pitfalls for Sequential Jobs
Over-requesting resources
# Bad: Requesting 32 cores for sequential code
#SBATCH --cpus-per-task=32
./sequential_program
# Good: Match core count to parallelization
#SBATCH --cpus-per-task=1
./sequential_program
Key Points
Sequential jobs only use a single CPU core, so requesting multiple cores wastes resources.
Monitoring resource usage helps match allocation to actual requirements.
htop provides a quick, interactive way to view CPU and memory consumption.
Always start with small test runs and scale resources based on profiling results.