Command line basics
Overview
Teaching: XX min
Exercises: YY min
Questions
What command line skills do I need to work with data on High Performance Computing (HPC) systems?
Objectives
Learn essential CLI commands used in data management and processing on HPC
The top 10 basic commands to learn
CLI stands for Command Line Interface.
It is a way to interact with a computer program by typing text commands into a terminal or console window, instead of using a graphical user interface (GUI) with buttons and menus.
When working with large datasets, pipeline logs, and configuration files, mastering the command line is essential. Whether you are navigating a data repository on an HPC system, inspecting files, or debugging processing failures, these Unix commands will be indispensable.
The following are general-purpose commands, and we may add LSST-specific notes where applicable.
Working with LSST data often involves accessing large-scale datasets stored in hierarchical directories, using symbolic links for shared data, and scripting reproducible data analysis pipelines. These are the fundamental commands every LSST astronomer should know.
File preparation (needed for later exercises):
# Make a dummy data directory and populate it
mkdir -p 1.IntroHPC/1.CLI
echo "dummy input" > 1.IntroHPC/1.CLI/test.in
echo "file list" > 1.IntroHPC/1.CLI/test.files
touch 1.IntroHPC/1.CLI/14si.pspnc
Directory and File Operations
Setup (run once before these examples):
```bash
mkdir -p lsst_data/raw
cd lsst_data
touch image01.fits
echo "instrument: LATISS" > config.yaml
echo -e "INFO: Init\nFATAL: Calibration failed" > job.log
```
ls
List contents of a directory. Useful flags:
-l : long format
-a : show all files, including hidden ones (those starting with ., like .bashrc)
-C : display in columns
-F : append file type indicators: / for directories, @ for symbolic links, * for executables
$ ls -alF
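To see the -a and -F indicators side by side, the sketch below builds a small throwaway directory (created with mktemp -d, so none of your real files are touched) containing one of each file type. All file names are hypothetical.

```bash
# Work in a throwaway directory so your real files are untouched
cd "$(mktemp -d)"
mkdir results                       # a directory
touch notes.txt .hidden             # a regular file and a hidden file
printf '#!/bin/sh\necho hi\n' > run.sh
chmod +x run.sh                     # an executable file
ln -s notes.txt latest.txt          # a symbolic link
ls -F     # each name gets its indicator: latest.txt@ results/ run.sh* (.hidden not shown)
ls -aF    # now also lists ./ ../ and .hidden
```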
pwd, cd
To check and change the current directory (cd .. moves back up one level):
$ pwd
$ cd raw
$ cd ..
mkdir, tree
Create directories and visualize structure:
$ mkdir -p repo/gen3/raw/20240101
$ tree repo/gen3
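mkdir -p creates every missing parent directory in one go and does not complain if the directory already exists. tree is a separate package rather than part of coreutils, so it may not be installed on every cluster; in that case find can print the same structure as a plain list. A minimal sketch in a throwaway directory:

```bash
cd "$(mktemp -d)"
mkdir -p repo/gen3/raw/20240101    # creates repo, gen3, raw and 20240101 at once
mkdir -p repo/gen3/raw/20240101    # running it again is harmless with -p
# Fallback when tree is unavailable: list every directory, sorted
find repo/gen3 -type d | sort
```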
File Manipulation
cp, mv, rm
Basic operations:
$ cp image01.fits image02.fits
$ mv image02.fits image_raw.fits
$ rm image_raw.fits
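cp and rm work on single files by default; to copy or delete a whole directory you need the -r (recursive) flag, while mv renames directories without any flag. The directory names below are hypothetical, and the sketch runs in a throwaway directory:

```bash
cd "$(mktemp -d)"
mkdir -p night1/raw
touch night1/raw/image01.fits
# Without -r, cp refuses to copy a directory and returns an error
cp night1 night1_backup 2>/dev/null || echo "cp needs -r to copy a directory"
cp -r night1 night1_backup     # copies the directory and everything inside it
mv night1_backup night1_copy   # mv renames directories as well as files
rm -r night1_copy              # -r is also required to delete a non-empty directory
```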
ln
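Create symbolic links to avoid duplicating data. The source path below is an example; replace it with the location of the shared data on your system:
$ ln -s /datasets/lsst/raw/image01.fits raw/image01.fits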
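A symbolic link only stores a path, so it is worth checking where it points and whether the target still exists. A minimal sketch with hypothetical file names, using readlink and the shell tests -L (is a symlink) and -e (target exists):

```bash
cd "$(mktemp -d)"
mkdir shared
echo "pixels" > shared/image01.fits
ln -s shared/image01.fits my_image.fits
readlink my_image.fits     # prints the stored target: shared/image01.fits
cat my_image.fits          # reading the link reads the shared file: pixels
rm shared/image01.fits     # remove the target...
# ...the link itself remains (-L is true) but now points nowhere (-e is false)
if [ -L my_image.fits ] && [ ! -e my_image.fits ]; then
  echo "my_image.fits is a dangling link"
fi
```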
Viewing and Extracting Data
cat, less, grep
View and search YAML config or log files:
$ cat config.yaml
$ less job.log
$ grep "FATAL" job.log
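A few grep flags are especially handy for logs: -n shows line numbers, -c counts matching lines, -i ignores case and -E enables extended regular expressions such as WARNING|FATAL. The log content below is a hypothetical stand-in for a pipeline log:

```bash
cd "$(mktemp -d)"
printf 'INFO: Init\nWARNING: low counts\nFATAL: Calibration failed\n' > job.log
grep -n "FATAL" job.log           # 3:FATAL: Calibration failed
grep -c "INFO" job.log            # 1
grep -i "fatal" job.log           # case-insensitive: still finds FATAL
grep -E "WARNING|FATAL" job.log   # lines containing either word
```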
Permissions and Metadata
chmod, chown, stat
Manage and inspect file attributes:
$ chmod 644 config.yaml
$ stat image01.fits
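chmod accepts either an octal number or a symbolic form. In 644, each digit is the sum of read (4), write (2) and execute (1) for the owner, the group and others, so 644 means rw-r--r--. The symbolic form adds (+) or removes (-) permissions for the user (u), group (g) or others (o). A sketch with a hypothetical script run.sh:

```bash
cd "$(mktemp -d)"
echo "instrument: LATISS" > config.yaml
printf '#!/bin/sh\necho ok\n' > run.sh
chmod 644 config.yaml      # 6 = 4+2 (rw-) for owner, 4 (r--) for group and others
chmod go-rwx config.yaml   # remove every permission for group and others: rw-------
chmod u+x run.sh           # let the owner execute the script
ls -l config.yaml run.sh   # the first column shows the resulting permissions
```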
LSST-Specific Use Cases
Familiarity with bash, grep, find, and awk will accelerate your workflow.
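As a small taste of find and awk on an LSST-style layout, the sketch below builds two hypothetical visit folders, finds every FITS file, and counts log lines per level. The folder names and log messages are made up for illustration:

```bash
cd "$(mktemp -d)"
mkdir -p lsst_data/visit001/raw lsst_data/visit002/raw
touch lsst_data/visit001/raw/image01.fits lsst_data/visit002/raw/image01.fits
printf 'INFO: Init\nFATAL: Calibration failed\n' > lsst_data/visit001/job.log
printf 'INFO: Init\nWARNING: low counts\n' > lsst_data/visit002/job.log
# find searches the whole tree; quote the pattern so the shell does not expand it
find lsst_data -name '*.fits' | sort
# awk splits each line at ":" and counts how often each level ($1) appears
awk -F: '{ count[$1]++ } END { for (level in count) print level, count[level] }' \
  lsst_data/*/job.log | sort
```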
Exercises
Exercise 1: Set up LSST-style directory
- Create a folder structure:
lsst_cli/
├── visit001/
│ ├── raw/
│ ├── calexp/
│ └── logs/
├── visit002/
│ ├── raw/
│ ├── calexp/
│ └── logs/
- Populate each raw/ with image01.fits, and add a symbolic link calexp.fits in calexp/ that points to ../raw/image01.fits.
- Add a process.yaml and a log file in each logs/.
Use tree to verify.
Exercise 2: Analyze Logs
Using grep and less, identify all lines with “WARNING” or “FATAL” in the log files across visits.
Further Learning
Explore additional CLI tools:
- awk, cut, xargs
- eups, conda for environment setup
ls
List all the files in a directory. Linux, like many operating systems, organizes data in files and directories (also called folders).
$ ls
file0a file0b folder1 folder2 link0a link2a
Some terminals offer color output so you can tell normal files from folders. You can make the difference clearer with:
$ ls -aCF
./ ../ file0a file0b folder1/ folder2/ link0a@ link2a@
You will see two extra directories, "." and "..". These are special entries that refer to the current folder and to its parent (the folder one level up in the tree).
Directories have the suffix "/". Symbolic links, a kind of shortcut to other files or directories, are indicated with the symbol "@".
Another option that gives more information about each file is:
$ ls -al
total 16
drwxr-xr-x 5 andjelka staff 160 Jun 16 08:53 .
drwxr-xr-x+ 273 andjelka staff 8736 Jun 16 08:52 ..
-rw-r--r-- 1 andjelka staff 19 Jun 16 08:53 config.yaml
-rw-r--r-- 1 andjelka staff 0 Jun 16 08:53 image01.fits
-rw-r--r-- 1 andjelka staff 37 Jun 16 08:53 job.log
The characters in the first column indicate the file type and permissions. The first character is "d" for directories, "l" for symbolic links and "-" for normal files. The next 3 characters are the "read", "write" and "execute" permissions for the owner, the next 3 are for the group, and the final 3 are for everyone else. For a file, "execute" means the file can be run as a script or binary executable. For a directory, "execute" means you can enter it (for example with cd) and access the files inside it; listing its contents requires "read" permission.
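The octal numbers used with chmod (such as 755 or 644) are a compact encoding of these 9 characters: within each group of three, r counts 4, w counts 2 and x counts 1. The small bash function below, perm_to_octal (a hypothetical helper, not a standard command), makes the arithmetic explicit; it ignores special bits such as setuid or the sticky bit:

```bash
# Hypothetical helper: convert a 9-character permission string such as rwxr-xr-x into octal (755)
perm_to_octal() {
  local perms=$1 result="" i digit
  for i in 0 3 6; do                           # owner, group, others
    digit=0
    [ "${perms:i:1}" = r ] && digit=$((digit + 4))
    [ "${perms:i+1:1}" = w ] && digit=$((digit + 2))
    [ "${perms:i+2:1}" = x ] && digit=$((digit + 1))
    result="$result$digit"
  done
  echo "$result"
}

perm_to_octal rwxr-xr-x   # 755: typical for directories and scripts
perm_to_octal rw-r--r--   # 644: typical for data and config files
```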
cp
This command copies the contents of one file into another file. For example
$ cp file0b file0c
rm
This command deletes a file. For example
$ rm file0c
There is no trash folder on an HPC system. Deleting a file should be considered an irreversible operation.
Recursive deletes can be done with
$ rm -rf folder_to_delete
Be extremely cautious when deleting files recursively. You cannot damage the system itself, because regular users do not have permission to delete system files. However, you can delete all of your own files forever.
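One cheap safeguard before a wildcard delete is to put echo in front of the command: the shell expands the pattern and prints exactly which files rm would receive, without deleting anything. (rm -i, which asks for confirmation before each deletion, is another option.) A sketch with hypothetical file names in a throwaway directory:

```bash
cd "$(mktemp -d)"
touch run1.tmp run2.tmp image01.fits
echo rm *.tmp    # dry run: prints "rm run1.tmp run2.tmp", deletes nothing
rm *.tmp         # the real delete, once the list looks right
ls               # only image01.fits remains
```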
mv
This command moves files from one directory to another. It can also be used to rename files or directories.
$ mv file0b file0c
pwd
It is easy to get lost when you move in complex directory structures. pwd will tell you the current directory.
$ pwd
/Users/andjelka/Documents/LSST/interpython/interpython_hpc
cd
This command moves you to the directory given as an argument. If no argument is given, it returns you to your home directory.
$ cd folder1
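Besides a folder name, cd understands a few shortcuts: .. is the parent directory, no argument means your home directory, and - returns to the previous directory (bash prints its path). A sketch in a throwaway directory:

```bash
cd "$(mktemp -d)"
start=$(pwd)
mkdir -p folder1/deeper
cd folder1/deeper   # relative path: two levels down
cd ..               # up one level, back in folder1
cd                  # no argument: your home directory
pwd                 # prints the value of $HOME
cd -                # back to the previous directory (folder1); bash prints its path
```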
cat and tac
When you want to see the contents of a text file, the command cat displays the contents on the screen. It is also useful when you want to concatenate the contents of several files.
$ cat star_A_lc.csv
time,brightness
0.0,90.5
0.5,91.1
1.0,88.9
1.5,92.2
2.0,89.3
2.5,90.8
3.0,87.7
...
To concatenate files into a new file, use the symbol ">", which redirects the output of a command into a file:
$ cat file1 file2 file3 > file_all
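Be careful with the difference between > and >>: a single > creates the file or silently overwrites it, while >> appends to its end. A sketch with hypothetical file names:

```bash
cd "$(mktemp -d)"
echo "night 1" > file1
echo "night 2" > file2
cat file1 file2 > file_all   # creates (or overwrites) file_all
cat file1 >> file_all        # appends file1 again instead of overwriting
cat file_all                 # night 1, night 2, night 1 (one per line)
```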
The command tac prints a file in reverse, from the last line back to the first.
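tac is part of GNU coreutils, so it is normally present on Linux clusters but not on a stock macOS laptop. If it is missing, awk can produce the same reversed output; a minimal sketch using the first rows of the light curve shown above:

```bash
cd "$(mktemp -d)"
printf 'time,brightness\n0.0,90.5\n0.5,91.1\n' > star_A_lc.csv
# Remember every line by its number (NR), then print from the last to the first
awk '{ line[NR] = $0 } END { for (i = NR; i > 0; i--) print line[i] }' star_A_lc.csv
```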
more and less
Sometimes text files, such as those produced by simulations, are too large to fit on one screen. The command "more" shows a file one screen at a time. The command "less" offers more functionality (for example scrolling backwards and searching) and should be your tool of choice for viewing large text files.
$ less OUT
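When you only need the beginning or end of a large file, head and tail are quicker than opening a pager. Here seq stands in for a hypothetical 1000-line output file named OUT:

```bash
cd "$(mktemp -d)"
seq 1 1000 > OUT   # hypothetical 1000-line output file
head -n 3 OUT      # first 3 lines: 1, 2, 3
tail -n 2 OUT      # last 2 lines: 999, 1000
# tail -f OUT keeps printing new lines as a running job appends them (Ctrl-C stops it)
```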
ln
This command creates links between files. Used wisely, links can save you time when you frequently travel to deeply nested directories. By default, ln creates hard links. A hard link behaves like a copy, but both names refer to the same data on disk. Symbolic links are better in many cases because they can cross file systems and partitions. To create a symbolic link:
$ ln -s file1 link_to_file1
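The difference between hard and symbolic links shows up when the original file is deleted. In this sketch (hypothetical file names, throwaway directory), ls -i prints inode numbers: the hard link shares the original's inode, so its data survives the rm, while the symbolic link is left pointing at a name that no longer exists:

```bash
cd "$(mktemp -d)"
echo "calibrated" > file1
ln file1 hard_to_file1          # hard link: a second name for the same data
ln -s file1 link_to_file1       # symbolic link: stores the path "file1"
ls -i file1 hard_to_file1       # both names show the same inode number
rm file1
cat hard_to_file1               # still prints: calibrated
cat link_to_file1 2>/dev/null || echo "link_to_file1 is now broken"
```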
grep
The grep command extracts from its input the lines containing a specified string or regular expression. It is a powerful command for extracting specific information from large files. Consider for example:
$ grep time star_A_lc.csv
time,brightness
$ grep 88.9 star_A_lc.csv
1.0,88.9
...
Create a light curve directory containing empty CSV files made with the touch command (or use the provided CSV files):
mkdir -p lightcurves
cd lightcurves
touch star_A_lc.csv star_B_lc.csv star_C_lc.csv
ln -s star_A_lc.csv brightest_star.csv
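Files created with touch are empty, so grep will not find anything in them. To follow the cat and grep examples, one option is to fill star_A_lc.csv with the rows shown earlier in this episode (only the rows visible there; the full light curve is longer). Run it inside the lightcurves directory:

```bash
# Overwrite the empty star_A_lc.csv with sample rows (quoted 'EOF': no shell expansion)
cat > star_A_lc.csv <<'EOF'
time,brightness
0.0,90.5
0.5,91.1
1.0,88.9
1.5,92.2
2.0,89.3
2.5,90.8
3.0,87.7
EOF
grep 88.9 star_A_lc.csv   # 1.0,88.9
```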
ls – List Light Curve Files
List files:
$ ls
brightest_star.csv star_A_lc.csv star_B_lc.csv star_C_lc.csv
Use -F and -a for extra detail:
$ ls -aF
./ ../ brightest_star.csv@ star_A_lc.csv star_B_lc.csv star_C_lc.csv
Long format with metadata:
$ ls -al
-rw-r--r-- 1 user staff 1024 Jun 16 09:00 star_A_lc.csv
lrwxr-xr-x 1 user staff 13 Jun 16 09:01 brightest_star.csv -> star_A_lc.csv
cp – Copy a Light Curve File
$ cp star_B_lc.csv backup_star_B.csv
rm – Delete a Corrupted Light Curve
$ rm star_C_lc.csv
mv – Rename Light Curve
$ mv star_B_lc.csv star_B_epoch1.csv
pwd – Show Working Directory
$ pwd
/home/user/...../lightcurves
cd – Move Between Directories
$ cd ../images
cat and tac – Inspect or Reverse Light Curve
cat star_A_lc.csv
tac star_A_lc.csv
Combine curves:
cat star_A_lc.csv star_B_epoch1.csv > merged_lc.csv
more and less – View Long Curves
$ less star_A_lc.csv
ln – Create Alias for Light Curve
ln -s star_B_epoch1.csv variable_star.csv
grep – Extract Brightness Above Threshold
grep ',[89][0-9]\.[0-9]*' star_A_lc.csv
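Regular expressions offer a way to specify text patterns that can vary in several ways, and they allow commands such as grep to extract matching strings efficiently. In the pattern above, [89] matches a single 8 or 9, [0-9] matches any digit, \. matches a literal dot and [0-9]* matches zero or more digits, so grep prints the lines whose brightness is written as 80.x to 99.x. We will see more about regular expressions on our third day, devoted to data processing.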
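A regular expression matches text, not numbers. When you want a true numeric comparison, awk is often simpler. A sketch, assuming a comma-separated file with a header line and brightness in the second column; the threshold of 90 is an arbitrary example:

```bash
cd "$(mktemp -d)"
printf 'time,brightness\n0.0,90.5\n0.5,91.1\n1.0,88.9\n1.5,92.2\n' > star_A_lc.csv
# -F, splits on commas; NR > 1 skips the header; $2 is the brightness column
awk -F, 'NR > 1 && $2 > 90' star_A_lc.csv   # prints 0.0,90.5  0.5,91.1  1.5,92.2
```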
More commands
The 10 commands above will give you enough tools to move files around and travel the directory tree. The GNU Core Utilities are the basic file, shell and text manipulation utilities of the GNU operating system. These are the core utilities which are expected to exist on every operating system.
To learn about the whole set of coreutils, run:
info coreutils
Each command has its own manual. You can access those manuals with
man <COMMAND>
Output of entire files
cat – Concatenate and write files
tac – Concatenate and write files in reverse
nl – Number lines and write files
od – Write files in octal or other formats
base64 – Transform data into printable data
Formatting file contents
fmt – Reformat paragraph text
numfmt – Reformat numbers
pr – Paginate or columnate files for printing
fold – Wrap input lines to fit in specified width
Output of parts of files
head – Output the first part of files
tail – Output the last part of files
split – Split a file into fixed-size pieces
csplit – Split a file into context-determined pieces
Summarizing files
wc – Print newline, word, and byte counts
sum – Print checksum and block counts
cksum – Print CRC checksum and byte counts
md5sum – Print or check MD5 digests
sha1sum – Print or check SHA-1 digests
sha2 utilities – Print or check SHA-2 digests
Operating on sorted files
sort – Sort text files
shuf – Shuffle text files
uniq – Uniquify files
comm – Compare two sorted files line by line
ptx – Produce a permuted index of file contents
tsort – Topological sort
Operating on fields
cut – Print selected parts of lines
paste – Merge lines of files
join – Join lines on a common field
Operating on characters
tr – Translate, squeeze, and/or delete characters
expand – Convert tabs to spaces
unexpand – Convert spaces to tabs
Directory listing
ls – List directory contents
dir – Briefly list directory contents
vdir – Verbosely list directory contents
dircolors – Color setup for 'ls'
Basic operations
cp – Copy files and directories
dd – Convert and copy a file
install – Copy files and set attributes
mv – Move (rename) files
rm – Remove files or directories
shred – Remove files more securely
Special file types
link – Make a hard link via the link syscall
ln – Make links between files
mkdir – Make directories
mkfifo – Make FIFOs (named pipes)
mknod – Make block or character special files
readlink – Print value of a symlink or canonical file name
rmdir – Remove empty directories
unlink – Remove files via the unlink syscall
Changing file attributes
chown – Change file owner and group
chgrp – Change group ownership
chmod – Change access permissions
touch – Change file timestamps
Disk usage
df – Report file system disk space usage
du – Estimate file space usage
stat – Report file or file system status
sync – Synchronize data on disk with memory
truncate – Shrink or extend the size of a file
Printing text
echo – Print a line of text
printf – Format and print data
yes – Print a string until interrupted
Conditions
false – Do nothing, unsuccessfully
true – Do nothing, successfully
test – Check file types and compare values
expr – Evaluate expressions
tee – Redirect output to multiple files or processes
File name manipulation
basename – Strip directory and suffix from a file name
dirname – Strip last file name component
pathchk – Check file name validity and portability
mktemp – Create temporary file or directory
realpath – Print resolved file names
Working context
pwd – Print working directory
stty – Print or change terminal characteristics
printenv – Print all or some environment variables
tty – Print file name of terminal on standard input
User information
id – Print user identity
logname – Print current login name
whoami – Print effective user ID
groups – Print group names a user is in
users – Print login names of users currently logged in
who – Print who is currently logged in
System context
arch – Print machine hardware name
date – Print or set system date and time
nproc – Print the number of processors
uname – Print system information
hostname – Print or set system name
hostid – Print numeric host identifier
uptime – Print system uptime and load
Modified command
chroot – Run a command with a different root directory
env – Run a command in a modified environment
nice – Run a command with modified niceness
nohup – Run a command immune to hangups
stdbuf – Run a command with modified I/O buffering
timeout – Run a command with a time limit
Process control
kill – Send a signal to processes
Delaying
sleep – Delay for a specified time
Numeric operations
factor – Print prime factors
seq – Print numeric sequences
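Many of these utilities are most useful chained together with the pipe symbol |, which feeds one command's output into the next. A sketch in a throwaway directory with a hypothetical log and data file: cut, sort and uniq count log levels (uniq only merges adjacent identical lines, hence the sort first), and sha256sum, one of the GNU SHA-2 utilities (macOS offers shasum -a 256 instead), records and then verifies a checksum, as you might before and after copying data between systems:

```bash
cd "$(mktemp -d)"
printf 'INFO: Init\nWARNING: low counts\nINFO: Reading\nFATAL: Calibration failed\nINFO: Done\n' > job.log
# cut keeps the text before the first ":"; uniq -c prefixes each group with its count
cut -d: -f1 job.log | sort | uniq -c | sort -rn   # the first line reports 3 INFO

echo "pixels" > image01.fits
sha256sum image01.fits > image01.fits.sha256   # record the digest
sha256sum -c image01.fits.sha256               # image01.fits: OK
```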
Exercise: Using the Command Line Interface
- Create 4 folders A, B, C and D, and inside each of them create three more: X, Y and Z. At the end you should have 12 subfolders. Use the command tree to make sure you created the correct tree.
Solution
You should get:
$ tree
.
├── A
│   ├── X
│   ├── Y
│   └── Z
├── B
│   ├── X
│   ├── Y
│   └── Z
├── C
│   ├── X
│   ├── Y
│   └── Z
└── D
    ├── X
    ├── Y
    └── Z
- Let's copy some files into those folders. The data folder 1.IntroHPC/1.CLI contains 3 files: test.in, test.files and 14si.pspnc. Using the command line tools, create copies of "test.in" and "test.files" inside each of those folders, and a symbolic link to 14si.pspnc. Both "test.in" and "test.files" are text files that we want to edit, but 14si.pspnc is a relatively big file that we only need to use for the simulation; we do not want to make copies of it, just symbolic links, to save disk space.
Solution
Step-by-step CLI commands:
# Step 1: Create the main folders
mkdir -p A/X A/Y A/Z B/X B/Y B/Z C/X C/Y C/Z D/X D/Y D/Z
# Step 2: Confirm structure
tree
Output should be:
.
├── A
│   ├── X
│   ├── Y
│   └── Z
├── B
│   ├── X
│   ├── Y
│   └── Z
├── C
│   ├── X
│   ├── Y
│   └── Z
└── D
    ├── X
    ├── Y
    └── Z
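In bash, the 12 paths can also be generated with brace expansion, which writes out every combination of the listed names before mkdir runs. Brace expansion is a bash feature rather than a POSIX sh one. A sketch in a throwaway directory:

```bash
cd "$(mktemp -d)"
echo {A,B,C,D}/{X,Y,Z}      # shows the expansion: A/X A/Y A/Z B/X ... D/Z
mkdir -p {A,B,C,D}/{X,Y,Z}  # creates the same 12 subfolders in one command
```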
File preparation (if you have not already run it):
# Make a dummy data directory and populate it
mkdir -p 1.IntroHPC/1.CLI
echo "dummy input" > 1.IntroHPC/1.CLI/test.in
echo "file list" > 1.IntroHPC/1.CLI/test.files
touch 1.IntroHPC/1.CLI/14si.pspnc
Copy and link files
for folder in A B C D; do
  for sub in X Y Z; do
    cp 1.IntroHPC/1.CLI/test.in "$folder/$sub/"
    cp 1.IntroHPC/1.CLI/test.files "$folder/$sub/"
    # The link target is resolved relative to $folder/$sub/, so ../../ reaches the top folder
    ln -s ../../1.IntroHPC/1.CLI/14si.pspnc "$folder/$sub/14si.pspnc"
  done
done
Verify
tree A
cat A/X/test.in
ls -l A/X/14si.pspnc
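Because a relative link target is resolved from the directory that contains the link, one ../ too many or too few produces links that exist but point nowhere. The sketch below rebuilds the layout in a throwaway directory and checks all 12 links with the shell tests -L (is a link) and -e (target exists):

```bash
cd "$(mktemp -d)"
mkdir -p 1.IntroHPC/1.CLI {A,B,C,D}/{X,Y,Z}
touch 1.IntroHPC/1.CLI/14si.pspnc
for dir in {A,B,C,D}/{X,Y,Z}; do
  # From inside A/X, ../../ leads back to the top folder
  ln -s ../../1.IntroHPC/1.CLI/14si.pspnc "$dir/14si.pspnc"
done
broken=0
for link in [A-D]/[XYZ]/14si.pspnc; do
  if [ -L "$link" ] && [ -e "$link" ]; then
    echo "ok: $link"
  else
    echo "BROKEN: $link"
    broken=$((broken + 1))
  fi
done
echo "broken links: $broken"
```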
Midnight Commander
GNU Midnight Commander is a visual file manager. mc is a rich full-screen text-mode application that allows you to copy, move and delete files and whole directory trees. Sometimes a text-based user interface is convenient; to use mc, just enter the command in the terminal:
mc
Most of mc's actions are triggered with the function keys F1 to F10. On a Mac you need to press the “fn” key together with the function key; in GNOME (Linux), you need to disable the interpretation of function keys in gnome-terminal.
Exercise: Using the Command Line Interface
Use mc to create a folder E and subfolders X, Y and Z, copy the same files as we did for the previous exercise.
Exercise: Create LSST-style Visit Directory Structure
Use the CLI to create the following:
lsst_cli/
├── visit001/
│ ├── raw/
│ ├── calexp/
│ └── logs/
├── visit002/
│ ├── raw/
│ ├── calexp/
│ └── logs/
Then:
- Add dummy files image01.fits into each raw/ folder.
- Create symbolic links from calexp/calexp.fits to ../raw/image01.fits.
- Create YAML files in each logs/ folder with config info, and dummy job.log files containing WARNING and FATAL strings.
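One possible solution, sketched in a throwaway directory. The YAML key and the log messages are placeholders; any content containing WARNING and FATAL will do:

```bash
cd "$(mktemp -d)"
for visit in visit001 visit002; do
  mkdir -p "lsst_cli/$visit"/{raw,calexp,logs}
  touch "lsst_cli/$visit/raw/image01.fits"
  # Resolved relative to calexp/, so ../raw reaches the sibling raw/ folder
  ln -s ../raw/image01.fits "lsst_cli/$visit/calexp/calexp.fits"
  echo "instrument: LATISS" > "lsst_cli/$visit/logs/process.yaml"
  printf 'INFO: Init\nWARNING: low counts\nFATAL: Calibration failed\n' \
    > "lsst_cli/$visit/logs/job.log"
done
ls -l lsst_cli/visit001/calexp   # calexp.fits -> ../raw/image01.fits
```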
Exercise: Analyze Simulated Pipeline Logs
Use grep to find all lines in all job.log files containing “FATAL” or “WARNING”.
$ grep -rE 'FATAL|WARNING' lsst_cli/
Key Points
Basic CLI skills enable efficient navigation and manipulation of data repositories
Use man to explore arguments for command-line tools