This lesson is still being designed and assembled (Pre-Alpha version)

Command line basics

Overview

Teaching: XX min
Exercises: YY min
Questions
  • What command line skills do I need to work with data on High Performance Computing (HPC) systems?

Objectives
  • Learn essential CLI commands used in data management and processing on HPC

The top 10 basic commands to learn

CLI stands for Command Line Interface.

It is a way to interact with a computer program by typing text commands into a terminal or console window, instead of using a graphical user interface (GUI) with buttons and menus.

When working with large datasets, pipeline logs, and configuration files, mastering the command line is essential. Whether you’re navigating a data repository on a High Performance Computing (HPC) system, inspecting files, or debugging processing failures, these Unix commands will be indispensable.

The following are general-purpose commands, and we may add LSST-specific notes where applicable.

Working with LSST data often involves accessing large-scale datasets stored in hierarchical directories, using symbolic links for shared data, and scripting reproducible data analysis pipelines. These are the fundamental commands every LSST astronomer should know.

File Preparation: run these commands now; they are needed for later exercises

# Make a dummy data directory and populate it
mkdir -p 1.IntroHPC/1.CLI
echo "dummy input" > 1.IntroHPC/1.CLI/test.in
echo "file list" > 1.IntroHPC/1.CLI/test.files
touch 1.IntroHPC/1.CLI/14si.pspnc

Directory and File Operations

Setup (run once before these examples):

```bash
mkdir -p lsst_data/raw
cd lsst_data
touch image01.fits
echo "instrument: LATISS" > config.yaml
echo -e "INFO: Init\nFATAL: Calibration failed" > job.log
```

ls

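List contents of a directory. Useful flags are -a (include hidden entries), -l (long format), and -F (mark directories with / and symbolic links with @):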

$ ls -alF

pwd, cd

To check and change the current directory (cd .. returns to the parent directory):

$ pwd
$ cd raw
$ cd ..

mkdir, tree

Create directories and visualize structure:

$ mkdir -p repo/gen3/raw/20240101
$ tree repo/gen3
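
The tree utility is not installed on every system. As a minimal sketch, assuming only standard tools are available, find can list the same kind of hierarchy. The directory name demo_repo is a placeholder chosen for this illustration.

```bash
# Create a nested directory in one step; -p also creates missing parents
# and does not complain if the directory already exists.
mkdir -p demo_repo/gen3/raw/20240101

# Fallback for systems without tree: list every directory below demo_repo.
find demo_repo -type d | sort
```

If tree is available, `tree demo_repo` shows the same directories as an indented diagram.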

File Manipulation

cp, mv, rm

Basic operations:

$ cp image01.fits image02.fits
$ mv image02.fits image_raw.fits
$ rm image_raw.fits

ln

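Create symbolic links to avoid data duplication. The source path below is an example of a shared location, and the link needs a name that does not already exist in the current directory:

$ ln -s /datasets/lsst/raw/image01.fits ./image01_shared.fits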

Viewing and Extracting Data

cat, less, grep

View and search YAML config or log files:

$ cat config.yaml
$ less job.log
$ grep "FATAL" job.log

Permissions and Metadata

chmod, chown, stat

Manage and inspect file attributes:

$ chmod 644 config.yaml
$ stat image01.fits

LSST-Specific Use Cases

Familiarity with bash, grep, find, and awk will accelerate your workflow.
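
As a rough illustration of how these tools combine, the sketch below uses find to locate log files and awk to count lines per severity level. The directory name logs_demo, the file names, and the "LEVEL: message" log format are assumptions made for this example, modeled on the job.log file from the setup step; real pipeline logs may be formatted differently.

```bash
# Build a small scratch area with two hypothetical log files.
mkdir -p logs_demo/visit001 logs_demo/visit002
printf 'INFO: Init\nFATAL: Calibration failed\n' > logs_demo/visit001/job.log
printf 'INFO: Init\nWARNING: Low signal\nINFO: Done\n' > logs_demo/visit002/job.log

# find locates every *.log file; awk splits each line at ":" and
# counts how many times each level (the first field) appears.
find logs_demo -name '*.log' -exec cat {} + |
  awk -F: '{ count[$1]++ } END { for (level in count) print level, count[level] }' |
  sort > logs_demo/summary.txt

cat logs_demo/summary.txt
```

Writing the summary to a file keeps the result available for later inspection; dropping the redirection prints it directly instead.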


Exercises

Exercise 1: Set up LSST-style directory

  1. Create a folder structure:
    lsst_cli/
    ├── visit001/
    │   ├── raw/
    │   ├── calexp/
    │   └── logs/
    ├── visit002/
    │   ├── raw/
    │   ├── calexp/
    │   └── logs/
    
  2. Populate each raw/ with an image01.fits file, and in each calexp/ create a symbolic link named calexp.fits that points to it.

  3. Add a process.yaml and a job.log file to each logs/.

Use tree to verify.
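
One possible way to build this layout is sketched below. It reads the task as asking for a link named calexp.fits inside calexp/ that points at the raw image. The contents written to process.yaml and job.log are placeholders, and the empty image01.fits files stand in for real images.

```bash
for visit in visit001 visit002; do
  # Create the three subdirectories for this visit.
  mkdir -p "lsst_cli/$visit/raw" "lsst_cli/$visit/calexp" "lsst_cli/$visit/logs"

  # An empty placeholder stands in for a real FITS image.
  touch "lsst_cli/$visit/raw/image01.fits"

  # The link target is relative to the calexp/ directory that holds the link.
  ln -sf ../raw/image01.fits "lsst_cli/$visit/calexp/calexp.fits"

  # Placeholder configuration and log contents.
  echo "visit: $visit" > "lsst_cli/$visit/logs/process.yaml"
  printf 'INFO: Init\nWARNING: placeholder entry\n' > "lsst_cli/$visit/logs/job.log"
done

# List the result (use "tree lsst_cli" instead if tree is installed).
find lsst_cli | sort
```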

Exercise 2: Analyze Logs

Using grep and less, identify all lines with “WARNING” or “FATAL” in the log files across visits.


Further Learning

Explore additional CLI tools:

ls

List all the files in a directory. Linux, like many operating systems, organizes data in files and directories (also called folders).

$ ls
file0a  file0b  folder1  folder2 link0a  link2a

Some terminals offer color output so you can differentiate normal files from folders. You can make the difference clearer with:

$ ls -aCF
./  ../  file0a  file0b  folder1/  folder2/ link0a@  link2a@

You will see two extra directories, "." and "..". These are special folders that refer to the current folder and to the folder one level up in the tree. Directories have the suffix "/". Symbolic links, which are a kind of shortcut to other files or directories, are indicated with the symbol "@".

Another option to get more information about the files in the system is:

$ ls -al
total 16
drwxr-xr-x    5 andjelka  staff   160 Jun 16 08:53 .
drwxr-xr-x+ 273 andjelka  staff  8736 Jun 16 08:52 ..
-rw-r--r--    1 andjelka  staff    19 Jun 16 08:53 config.yaml
-rw-r--r--    1 andjelka  staff     0 Jun 16 08:53 image01.fits
-rw-r--r--    1 andjelka  staff    37 Jun 16 08:53 job.log

The characters in the first column indicate the permissions. The first character is “d” for directories, “l” for symbolic links and “-” for normal files. The next 3 characters are the “read”, “write” and “execute” permissions for the owner. The next 3 are for the group, and the final 3 are for others. For a file, “execute” indicates that the file can be run as a script or binary executable. For a directory, it means that you can enter the directory and access the files inside it; listing its contents requires the “read” permission.
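
To make the permission string concrete, the following sketch changes the mode of a scratch file with chmod and prints the first ten characters of the ls -l output after each change. The file name perm_demo.txt is arbitrary.

```bash
touch perm_demo.txt

# 644 = rw- for the owner, r-- for the group, r-- for others.
chmod 644 perm_demo.txt
ls -l perm_demo.txt | cut -c1-10

# 750 = rwx for the owner, r-x for the group, --- for others.
chmod 750 perm_demo.txt
ls -l perm_demo.txt | cut -c1-10
```

The two lines printed should be -rw-r--r-- and -rwxr-x---. Each octal digit is the sum of 4 (read), 2 (write) and 1 (execute).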

cp

This command copies the contents of one file into another file. For example

$ cp file0b file0c

rm

This command deletes a file. For example

$ rm file0c

There is no such thing as a trash folder on an HPC system. Deleting a file should be considered an irreversible operation.

Recursive deletes can be done with

$ rm -rf folder_to_delete

Be extremely cautious when deleting files recursively. You cannot damage the system, because you cannot delete files that you do not own. However, you can delete all of your own files forever.
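
Because a recursive delete cannot be undone, it can help to look before deleting. The sketch below is one cautious pattern, not a site policy: it lists what would be removed first, and it uses the shell's ${variable:?} expansion so that the command aborts if the variable is empty. The directory name scratch_to_delete is made up for the example.

```bash
# Create a throwaway directory with a couple of files in it.
mkdir -p scratch_to_delete/sub
touch scratch_to_delete/a.tmp scratch_to_delete/sub/b.tmp

target=scratch_to_delete

# Step 1: preview exactly what a recursive delete would remove.
find "$target"

# Step 2: delete. "${target:?}" stops with an error if target is unset
# or empty, and "--" ends option parsing so the name is never mistaken
# for a flag.
rm -r -- "${target:?}"
```

Leaving out -f means rm still reports problems such as missing files or write-protected entries instead of silently ignoring them.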

mv

This command moves a file from one directory to another. It can also be used to rename files or directories.

$ mv file0b file0c

pwd

It is easy to get lost when you move in complex directory structures. pwd will tell you the current directory.

$ pwd
/Users/andjelka/Documents/LSST/interpython/interpython_hpc

cd

This command moves you to the directory indicated as an argument. If no argument is given, it returns you to your home directory.

$ cd folder1

cat and tac

When you want to see the contents of a text file, the command cat displays the contents on the screen. It is also useful when you want to concatenate the contents of several files.

$ cat star_A_lc.csv
time,brightness
0.0,90.5
0.5,91.1
1.0,88.9
1.5,92.2
2.0,89.3
2.5,90.8
3.0,87.7...

To concatenate files you need to use the symbol ">" indicating that you want to redirect the output of a command into a file

$ cat file1 file2 file3 > file_all
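
Note that ">" overwrites the destination file, while ">>" appends to it. A small sketch with made-up file names shows the difference:

```bash
echo "first"  > part1.txt
echo "second" > part2.txt

# ">" creates all_parts.txt, replacing any previous contents.
cat part1.txt part2.txt > all_parts.txt

# ">>" appends to the end of the existing file instead of replacing it.
cat part1.txt >> all_parts.txt

# all_parts.txt now holds three lines: first, second, first.
cat all_parts.txt
```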

The command tac shows the files in reverse starting from the last line back to the first one.

more and less

Sometimes text files, such as those produced by simulations, are too large to be seen on one screen. The command “more” shows a file one screen at a time. The command “less” offers more functionality and should be the tool of choice for viewing large text files.

$ less OUT

ln

This command creates links between files. Used wisely, links can save you time when you travel frequently to deep directories. By default, ln creates hard links. Hard links look like copies, but they refer to the same data on disk. Symbolic links are better in many cases because they can cross file systems and partitions. To create a symbolic link:

$ ln -s file1 link_to_file1
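
The difference between the two kinds of link can be observed directly. In this sketch (file names are illustrative), the hard link keeps the data reachable after the original name is removed, while the symbolic link is left dangling.

```bash
mkdir -p link_demo
echo "data" > link_demo/original.txt

# A hard link is a second name for the same data on disk.
ln -f link_demo/original.txt link_demo/hard.txt

# A symbolic link stores a path; a relative path is resolved from the
# directory that contains the link.
ln -sf original.txt link_demo/soft.txt

# ls -i prints the inode number: original.txt and hard.txt share one.
ls -i link_demo/original.txt link_demo/hard.txt

# Remove the original name.
rm link_demo/original.txt

cat link_demo/hard.txt                            # still prints: data
[ -e link_demo/soft.txt ] || echo "soft.txt is now a dangling link"
```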

grep

The grep command extracts from its input the lines containing a specified string or regular expression. It is a powerful command for extracting specific information from large files. Consider for example:

$ grep time  star_A_lc.csv
time,brightness
$ grep 88.9  star_A_lc.csv
1.0,88.9
  ...
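
In a regular expression the dot matches any single character, so the pattern 88.9 matches more than the literal text 88.9. When a literal match is intended, grep -F treats the pattern as a fixed string. The file dots_demo.csv and its values are invented for this illustration.

```bash
# Two rows: one contains the literal text 88.9, the other contains 8819.
printf '1.0,88.9\n1.5,8819\n' > dots_demo.csv

# As a regular expression, "." matches any character: both rows match.
grep -c '88.9' dots_demo.csv

# With -F the pattern is a fixed string: only the first row matches.
grep -cF '88.9' dots_demo.csv
```

The first command prints 2 and the second prints 1.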

Create a light curve directory with empty CSV files using the touch command (or use the provided CSV files):

mkdir -p lightcurves
cd lightcurves
touch star_A_lc.csv star_B_lc.csv star_C_lc.csv
ln -s star_A_lc.csv brightest_star.csv

ls – List Light Curve Files

List files:

$ ls
brightest_star.csv  star_A_lc.csv  star_B_lc.csv  star_C_lc.csv

Use -F and -a for extra detail:

$ ls -aF
./  ../  brightest_star.csv@  star_A_lc.csv  star_B_lc.csv  star_C_lc.csv

Long format with metadata (output abridged):

$ ls -al
-rw-r--r--  1 user  staff  1024 Jun 16 09:00 star_A_lc.csv
lrwxr-xr-x  1 user  staff    13 Jun 16 09:01 brightest_star.csv -> star_A_lc.csv

cp – Copy a Light Curve File

$ cp star_B_lc.csv backup_star_B.csv

rm – Delete a Corrupted Light Curve

$ rm star_C_lc.csv

mv – Rename Light Curve

$ mv star_B_lc.csv star_B_epoch1.csv

pwd – Show Working Directory

$ pwd
/home/user/...../lightcurves

cd – Move between directories

$ cd ../images

cat and tac – Inspect or Reverse Light Curve

cat star_A_lc.csv
tac star_A_lc.csv

Combine curves:

cat star_A_lc.csv star_B_epoch1.csv > merged_lc.csv

more and less – View Long Curves

$ less star_A_lc.csv

ln – Create Alias for Light Curve

ln -s star_B_epoch1.csv variable_star.csv

grep – Extract Brightness Above Threshold

grep ',[89][0-9]\.[0-9]*' star_A_lc.csv

Regular expressions offer ways to specify text strings that can vary in several ways, and allow commands such as grep to extract those strings efficiently. We will see more about regular expressions on our third day, devoted to data processing.
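
A regular expression matches text, not numbers, so the grep pattern above selects values by their digits (a leading 8 or 9 followed by one more digit) rather than by numeric comparison. For a true numeric threshold, awk is one alternative. The sketch below is only an illustration: it recreates the first rows of the light curve shown earlier in a file called lc_demo.csv and uses 90 as an arbitrary threshold.

```bash
# Recreate a few rows of the example light curve.
printf 'time,brightness\n0.0,90.5\n0.5,91.1\n1.0,88.9\n1.5,92.2\n' > lc_demo.csv

# -F, sets the field separator to a comma. NR > 1 skips the header row,
# and $2 > 90 compares the second column as a number.
awk -F, 'NR > 1 && $2 > 90' lc_demo.csv
```

This prints the rows with brightness 90.5, 91.1 and 92.2, and skips the 88.9 row.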

More commands

The 10 commands above will give you enough tools to move files around and travel the directory tree. The GNU Core Utilities (coreutils) are the basic file, shell and text manipulation utilities of the GNU operating system. These are the core utilities which are expected to exist on every operating system.

If you want to know about the whole set of coreutils execute:

info coreutils

Each command has its own manual. You can access those manuals with

man <COMMAND>

Output of entire files

cat                    Concatenate and write files
tac                    Concatenate and write files in reverse
nl                     Number lines and write files
od                     Write files in octal or other formats
base64                 Transform data into printable data

Formatting file contents

fmt                    Reformat paragraph text
numfmt                 Reformat numbers
pr                     Paginate or columnate files for printing
fold                   Wrap input lines to fit in specified width

Output of parts of files

head                   Output the first part of files
tail                   Output the last part of files
split                  Split a file into fixed-size pieces
csplit                 Split a file into context-determined pieces

Summarizing files

wc                     Print newline, word, and byte counts
sum                    Print checksum and block counts
cksum                  Print CRC checksum and byte counts
md5sum                 Print or check MD5 digests
sha1sum                Print or check SHA-1 digests
sha2 utilities         Print or check SHA-2 digests

Operating on sorted files

sort                   Sort text files
shuf                   Shuffle text files
uniq                   Uniquify files
comm                   Compare two sorted files line by line
ptx                    Produce a permuted index of file contents
tsort                  Topological sort

Operating on fields

cut                    Print selected parts of lines
paste                  Merge lines of files
join                   Join lines on a common field

Operating on characters

tr                     Translate, squeeze, and/or delete characters
expand                 Convert tabs to spaces
unexpand               Convert spaces to tabs

Directory listing

ls                     List directory contents
dir                    Briefly list directory contents
vdir                   Verbosely list directory contents
dircolors              Color setup for 'ls'

Basic operations

cp                     Copy files and directories
dd                     Convert and copy a file
install                Copy files and set attributes
mv                     Move (rename) files
rm                     Remove files or directories
shred                  Remove files more securely

Special file types

link                   Make a hard link via the link syscall
ln                     Make links between files
mkdir                  Make directories
mkfifo                 Make FIFOs (named pipes)
mknod                  Make block or character special files
readlink               Print value of a symlink or canonical file name
rmdir                  Remove empty directories
unlink                 Remove files via unlink syscall

Changing file attributes

chown                  Change file owner and group
chgrp                  Change group ownership
chmod                  Change access permissions
touch                  Change file timestamps

Disk usage

df                     Report file system disk space usage
du                     Estimate file space usage
stat                   Report file or file system status
sync                   Synchronize data on disk with memory
truncate               Shrink or extend the size of a file

Printing text

echo                   Print a line of text
printf                 Format and print data
yes                    Print a string until interrupted

Conditions

false                  Do nothing, unsuccessfully
true                   Do nothing, successfully
test                   Check file types and compare values
expr                   Evaluate expressions
tee                    Redirect output to multiple files or processes

File name manipulation

basename               Strip directory and suffix from a file name
dirname                Strip last file name component
pathchk                Check file name validity and portability
mktemp                 Create temporary file or directory
realpath               Print resolved file names

Working context

pwd                    Print working directory
stty                   Print or change terminal characteristics
printenv               Print all or some environment variables
tty                    Print file name of terminal on standard input

User information

id                     Print user identity
logname                Print current login name
whoami                 Print effective user ID
groups                 Print group names a user is in
users                  Print login names of users currently logged in
who                    Print who is currently logged in

System context

arch                   Print machine hardware name
date                   Print or set system date and time
nproc                  Print the number of processors
uname                  Print system information
hostname               Print or set system name
hostid                 Print numeric host identifier
uptime                 Print system uptime and load

Modified command invocation

chroot                 Run a command with a different root directory
env                    Run a command in a modified environment
nice                   Run a command with modified niceness
nohup                  Run a command immune to hangups
stdbuf                 Run a command with modified I/O buffering
timeout                Run a command with a time limit

Process control

kill                   Send a signal to processes

Delaying

sleep                  Delay for a specified time

Numeric operations

factor                 Print prime factors
seq                    Print numeric sequences
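
These utilities are designed to be chained with pipes. As a small sketch, the commands below combine cut, sort, uniq, and head to find the most frequent value in a column. The file obs_demo.csv and its contents are invented for the example.

```bash
# A made-up table: one observation per line, as filter,exposure_seconds.
printf 'g,30\nr,30\ng,15\ni,30\nr,30\ng,30\n' > obs_demo.csv

# cut extracts the first comma-separated field, sort groups equal values
# together, uniq -c counts each group, and sort -rn orders by count.
cut -d, -f1 obs_demo.csv | sort | uniq -c | sort -rn | head -n 1
```

The output is the count followed by the value, here 3 and g. The first sort is needed because uniq only merges adjacent identical lines.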

Exercise: Using the Command Line Interface

  1. Create 4 folders A, B, C, D and inside each of them create three more: X, Y and Z. At the end you should have 12 subfolders. Use the command tree to check that you created the correct tree.

Solution

You should get:

$ tree
.
├── A
│   ├── X
│   ├── Y
│   └── Z
├── B
│   ├── X
│   ├── Y
│   └── Z
├── C
│   ├── X
│   ├── Y
│   └── Z
└── D
    ├── X
    ├── Y
    └── Z
  2. Let's copy some files into those folders. The data folder 1.IntroHPC/1.CLI (created in the File Preparation step) contains 3 files: test.in, test.files and 14si.pspnc. Using the command line tools, create copies of “test.in” and “test.files” inside each of those folders and a symbolic link to 14si.pspnc. Both “test.in” and “test.files” are text files that we want to edit, but 14si.pspnc is a relatively big file that we only need to read for the simulation, so we do not want to make copies of it, just symbolic links that save disk space.

Solution

Step-by-step CLI commands:

# Step 1: Create the main folders
mkdir -p A/X A/Y A/Z B/X B/Y B/Z C/X C/Y C/Z D/X D/Y D/Z

# Step 2: Confirm structure
tree

Output should be:

.
├── A
│   ├── X
│   ├── Y
│   └── Z
├── B
│   ├── X
│   ├── Y
│   └── Z
├── C
│   ├── X
│   ├── Y
│   └── Z
└── D
    ├── X
    ├── Y
    └── Z

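File preparation and copy commands:

# Make a dummy data directory and populate it
mkdir -p 1.IntroHPC/1.CLI
echo "dummy input" > 1.IntroHPC/1.CLI/test.in
echo "file list" > 1.IntroHPC/1.CLI/test.files
touch 1.IntroHPC/1.CLI/14si.pspnc

# Copy the two text files into every subfolder and link the large file.
# A relative link target is resolved from the folder that holds the link,
# so from A/X two levels up (../../) reaches the starting directory.
for folder in A B C D; do
  for sub in X Y Z; do
    cp 1.IntroHPC/1.CLI/test.in "$folder/$sub/"
    cp 1.IntroHPC/1.CLI/test.files "$folder/$sub/"
    ln -s ../../1.IntroHPC/1.CLI/14si.pspnc "$folder/$sub/14si.pspnc"
  done
done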

Verify

tree A
cat A/X/test.in
ls -l A/X/14si.pspnc
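
A relative symbolic link is resolved from the directory that contains the link, not from the directory where ln was run, so it is easy to create a link that points nowhere. The sketch below, using made-up names under links_check/, shows one way to detect such dangling links with [ -e ], which follows the link.

```bash
mkdir -p links_check/pseudo_data links_check/A/X
touch links_check/pseudo_data/big.file

# Correct: from links_check/A/X, two levels up reaches links_check/.
ln -sf ../../pseudo_data/big.file links_check/A/X/good.link

# Wrong on purpose: three levels up leaves links_check/ entirely.
ln -sf ../../../pseudo_data/big.file links_check/A/X/bad.link

# [ -e ] follows the link, so it fails when the target does not exist.
for link in links_check/A/X/*.link; do
  if [ -e "$link" ]; then
    echo "ok:       $link"
  else
    echo "dangling: $link"
  fi
done
```

The same loop can be pointed at A/*/14si.pspnc and the other folders to check the links created in this exercise.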

Midnight Commander

GNU Midnight Commander is a visual file manager. mc is a feature-rich, full-screen, text-mode application that allows you to copy, move and delete files and whole directory trees. A text-based user interface is sometimes convenient; to use mc, just enter the command in the terminal:

mc

Several keystrokes are used to work with mc; most of them are the function keys F1 to F10. On a Mac you need to hold the “fn” key; in GNOME (Linux), you need to disable the interpretation of the function keys by gnome-terminal.

Exercise: Using the Command Line Interface

Use mc to create a folder E and subfolders X, Y and Z, copy the same files as we did for the previous exercise.

Exercise: Create LSST-style Visit Directory Structure

Use the CLI to create the following:

lsst_cli/
├── visit001/
│   ├── raw/
│   ├── calexp/
│   └── logs/
├── visit002/
│   ├── raw/
│   ├── calexp/
│   └── logs/

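Then populate each raw/ with image01.fits, create a symbolic link calexp.fits in each calexp/, add a process.yaml and a log file to each logs/, and use tree to verify.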

Exercise: Analyze Simulated Pipeline Logs

Use grep to find all lines in all job.log files containing “FATAL” or “WARNING”.

$ grep -rE 'FATAL|WARNING' lsst_cli/
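
A few grep options are often useful alongside this search. The sketch below assumes a made-up directory grep_demo/ with two small log files: -n adds line numbers, -c counts matching lines per file, and -l prints only the names of files that contain a match.

```bash
mkdir -p grep_demo/visit001/logs grep_demo/visit002/logs
printf 'INFO: Init\nFATAL: Calibration failed\n' > grep_demo/visit001/logs/job.log
printf 'INFO: Init\nINFO: Done\n' > grep_demo/visit002/logs/job.log

# -r searches recursively, -E enables "|" alternation, -n shows line numbers.
grep -rnE 'FATAL|WARNING' grep_demo/

# -c prints the number of matching lines for each file searched.
grep -rcE 'FATAL|WARNING' grep_demo/

# -l prints only the names of files with at least one match.
grep -rlE 'FATAL|WARNING' grep_demo/
```

Here only the visit001 log contains a match, so -l lists a single file while -c reports a count of 0 for the visit002 log.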

Key Points

  • Basic CLI skills enable efficient navigation and manipulation of data repositories

  • Use man to explore arguments for command-line tools