Bash programming and Linux command-line tools
In our course IN3110 – Problemløsning med høynivå-språk (Problem solving with high-level languages) we have so far used mostly Python (and some Cython). In this lecture we add to the family of languages by discussing shell (Bash) scripting. The take-home message is that (simple) scripts combined with other command-line utilities can provide elegant solutions and powerful pipelines for pre- and postprocessing data.
A bit of history: there were (and are) many shells
1978: C shell (csh), later extended into the TC shell (tcsh)
1979: Bourne shell (sh)
1989: Bourne Again shell (bash)
Other shells in the Bourne family:
1983: Korn shell (ksh)
1990: Z shell (zsh)
2002: Dash (dash)
Why learn Bash?
Learning Bash means learning the roots of scripting on Unix
Bash scripts are frequently encountered on Unix systems
Bash is one of the most widely used command interpreters and scripting languages
Shell scripts evolve naturally from a workflow:
A sequence of commands you use often is placed in a file
Command-line arguments are introduced so that different options can be passed to the commands
Variables, if-tests and loops are introduced to enable more complex program flow
At some point pre- and postprocessing becomes too advanced for bash, at which point (parts of) the script should be ported to Python or other tools
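As an illustration of this evolution, the sketch below shows what a once hand-typed copy task might look like after arguments, a variable and a loop have been added. The function name backup_logs and the .log/backup naming are hypothetical and were chosen only for this example.

```shell
#!/bin/bash
# backup_logs is a hypothetical helper: a task that started as a few commands
# typed by hand, now with command-line arguments, a counter and a loop.
backup_logs() {
    local src="${1:?usage: backup_logs SOURCE_DIR [DEST_DIR]}"  # required argument
    local dest="${2:-backup}"                                   # optional, default ./backup
    local count=0 f
    mkdir -p "$dest"
    for f in "$src"/*.log; do
        [ -e "$f" ] || continue        # no match: the glob stays a literal pattern
        cp "$f" "$dest"/
        count=$((count + 1))
    done
    echo "copied $count log file(s) to $dest"
}

# Usage example in a scratch directory that is removed on exit
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
touch "$demo/a.log" "$demo/b.log" "$demo/notes.txt"
backup_logs "$demo" "$demo/backup"
```

When a script like this outgrows simple copying and counting, that is the point where porting parts of it to Python becomes attractive.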
In this lecture we imagine that we are working on, e.g., a Linux cluster where we cannot get the admin permissions needed to install the Python modules or text editors that we have available on our own machines. We will try to get things done with utilities that are commonly installed by default.
What Bash is good for
File and directory management
Systems management (build scripts)
Combining other scripts and commands
Rapid prototyping of more advanced scripts
Very simple output processing, plotting etc.
Some common tasks in Bash
file writing and managing files and directories (creation, deletion, renaming)
for-loops
running an application
combining applications (pipes)
file globbing, testing file types
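The tasks listed above can be sketched in a few lines. This is an illustrative example under simple assumptions: it works in a throwaway directory created with mktemp, and the names data/, results/ and run_N.txt are made up for the demonstration.

```shell
#!/bin/bash
# Work in a throwaway directory so that no real files are touched.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

# File writing and directory management (creation, renaming, deletion)
mkdir -p data results
for i in 1 2 3; do                       # a for-loop writing three files
    echo "sample $i" > "data/run_$i.txt"
done
mv data/run_3.txt data/run_03.txt        # renaming
rm data/run_2.txt                        # deletion

# File globbing and testing file types
for f in data/*.txt; do
    if [ -f "$f" ]; then
        echo "regular file: $f"
    fi
done
[ -d results ] && echo "results is a directory"

# Running and combining applications with a pipe: count the .txt files
ls data/*.txt | wc -l
```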
What Bash is not good for
Cross-platform portability
Graphics, GUIs
Interfacing with libraries or legacy code
More advanced postprocessing and plotting
Calculations, math etc.
Installation
All our examples can be run under Bash, and many of them also in the Bourne shell.
Differences in operating systems:
Mac OSX: /bin/sh is effectively Bash (/bin/bash).
Ubuntu: /bin/sh is a link to Dash, a minimal but much faster shell than Bash. Bash itself is available as /bin/bash.
Windows: Bash is available through Cygwin or the Windows Subsystem for Linux (Windows 10 and later).
Use within Jupyter notebooks: we will use the line magic ! or the cell magic %%bash to run shell commands in the notebook.
Alternatively, we can install a Bash kernel and use it within the notebook (Kernel > Set kernel).
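To check what a given machine actually provides, a few standard commands can help. This is a sketch: the exact output differs between systems (on Ubuntu, /bin/sh shows up as a symbolic link to dash, while on macOS it is a regular file).

```shell
#!/bin/bash
# Show what /bin/sh is on this machine (a symbolic link on Ubuntu).
ls -l /bin/sh

# Locate the Bash executable on the PATH and print its version line.
command -v bash
bash --version | head -n 1

# BASH_VERSION is set when the current script is interpreted by Bash.
echo "running under Bash ${BASH_VERSION:-?}"
```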
Bash tutorial
You will see a number of Bash/Unix commands in this lecture. The new commands will be highlighted with a ⚠️.
!echo "Hello from bash"
Hello from bash
A command is called by giving its name followed by its arguments. ⚠️ echo prints text to the screen.
We could also write the above source code into a source file, here ./scripts/hello_world.sh.
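Without a graphical editor, one way to create such a file is a here-document fed to cat. A minimal sketch, assuming the scripts directory may not exist yet; quoting the delimiter ('EOF') keeps the text literal, so nothing inside is expanded while the file is written.

```shell
#!/bin/bash
# Make sure the target directory exists.
mkdir -p ./scripts

# Everything up to the line containing only EOF is written to the file.
# Quoting the delimiter ('EOF') disables expansion inside the here-document.
cat > ./scripts/hello_world.sh <<'EOF'
#!/bin/bash
# This is a regular comment line
echo "hello world!"
EOF

# Run it by passing the file to bash explicitly (no execute permission needed).
bash ./scripts/hello_world.sh
```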
VIM intermezzo
To stick to our scenario of being stuck on a cluster without VS Code, Sublime Text and the like, let us use VIM for editing. VIM (Vi IMproved) is a powerful text editor that improves on its predecessor, the vi editor. Here we will only scratch its surface (no macros or advanced search). In some sense, the philosophy behind VIM is that a painter first picks an instrument (mode selection) and places it on the canvas (navigation) before starting to draw (editing).
Navigation
Press ESC to leave the current mode. Then press
0 to jump to the beginning of the line
$ to jump to the end of the line
h, l, j, k to move left, right, down or up
gg to jump to the start of the file, or G to jump to the end
w to jump forward a word, or b to move back a word
Manipulation/Editing
Pressing i enters insert (edit) mode, where you can type as you want
Pressing x, dw or dd deletes respectively a character, a word or the entire line
Pressing a NUMBER before a command in general repeats it NUMBER times
Pressing . repeats the previous action
A (shift+a) jumps to the end of the line and enters insert mode
s (substitute) deletes the character under the cursor and enters insert mode
u undoes the last change
v enters visual mode
:w saves the buffer to file
:q quits VIM, while :q! quits and discards unsaved changes
Search
/ enters search mode. After typing the pattern and pressing Enter, pressing n moves forward to the next match, while N searches backward
A great reference to learn more about VIM is the book Practical Vim: Edit Text at the Speed of Thought.
Back to Bash
! cat ./scripts/hello_world.sh
#!/bin/bash
# This is a regular comment line
echo "hello world!"
Here, lines starting with a hash (#) are interpreted as comments.
Above we have used the ⚠️ cat command to view the file content. Later we will see that it can be used for reading and writing files too.
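The name cat comes from "concatenate": given several files, it prints them one after another, and with output redirection (>) the result can be written to a new file. A small sketch with made-up file names in a temporary directory:

```shell
#!/bin/bash
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Create two small input files (the names are arbitrary).
echo "first part"  > "$tmp/part1.txt"
echo "second part" > "$tmp/part2.txt"

# Reading: print a file to the screen.
cat "$tmp/part1.txt"

# Concatenating and writing: join both files into a new one with >.
cat "$tmp/part1.txt" "$tmp/part2.txt" > "$tmp/joined.txt"

# Appending with >> adds to the file instead of overwriting it.
echo "third part" >> "$tmp/joined.txt"
cat "$tmp/joined.txt"
```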
Now we could try to run the script, only to find that we get a "Permission denied" error if the file is not executable:
! ./scripts/hello_world.sh
hello world!
The issue is that the file is not executable (newly created files typically are not). We can check this with the ⚠️ ls command (where we specify the "-l" flag to get a long listing):
! ls -l ./scripts/hello_world.sh
-rwxrw-r-- 1 mirok mirok 65 sep. 5 16:42 ./scripts/hello_world.sh
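The permissions are r(ead), w(rite) and x (execute), and they are specified for three classes of users: the owner/user (u), the group (g) and others (o). In the listing above, the first character marks the file type (- for a regular file), followed by one rwx triplet each for u, g and o; a - in place of a letter means that permission is not granted.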
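Each rwx triplet can also be read as an octal digit (r = 4, w = 2, x = 1), which is how numeric modes such as 764 are formed. The helper below, perm_to_octal, is a hypothetical function written only to illustrate the arithmetic; it expects the nine permission characters without the leading file-type character.

```shell
#!/bin/bash
# perm_to_octal (hypothetical helper): convert a 9-character symbolic
# permission string such as rwxrw-r-- into its octal form.
perm_to_octal() {
    local perms=$1 octal="" i digit
    for i in 0 3 6; do                          # one rwx triplet each for u, g, o
        local triplet=${perms:i:3}
        digit=0
        [[ ${triplet:0:1} == r ]] && (( digit += 4 ))
        [[ ${triplet:1:1} == w ]] && (( digit += 2 ))
        [[ ${triplet:2:1} == x ]] && (( digit += 1 ))
        octal+=$digit
    done
    echo "$octal"
}

perm_to_octal rwxrw-r--   # owner rwx=7, group rw-=6, others r--=4: prints 764
perm_to_octal rwxrwxr-x   # prints 775
```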
To fix this we use the ⚠️ chmod command. Below, we add execute permission for the user (owner) with u+x:
%%bash
chmod u+x ./scripts/hello_world.sh
ls -l ./scripts/hello_world.sh
-rwxrwxr-x 1 mirok mirok 65 sep. 5 16:42 ./scripts/hello_world.sh
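chmod accepts both symbolic modes such as u+x and numeric (octal) modes. The sketch below works on a scratch file in a temporary directory (the name demo.sh is arbitrary); the test [ -x FILE ] checks whether the file is executable for the current user.

```shell
#!/bin/bash
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
script="$tmpdir/demo.sh"
printf '#!/bin/bash\necho "demo ran"\n' > "$script"

# Symbolic mode: add execute permission for the user (owner).
chmod u+x "$script"
[ -x "$script" ] && echo "executable after u+x"

# Numeric mode: 754 means rwx for u, r-x for g and r-- for o.
chmod 754 "$script"
ls -l "$script"

# Symbolic mode again: remove execute permission for all (a = u, g and o).
chmod a-x "$script"
[ -x "$script" ] || echo "not executable after a-x"
```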
Now we can finally execute the script:
! ./scripts/hello_world.sh
hello world!
Now that the code runs, we could ask who actually ran (interpreted) it. If no interpreter is specified, Bash uses itself as the interpreter. We can be explicit about the interpreter:
! cat scripts/hello_world_bang.sh
#!/bin/bash
# This is a regular comment line
echo "hello world!"
Observe that the first line, starting with the shebang #!, specifies the interpreter to use for the script. The second line, starting with a hash (#), is a comment.
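The shebang is only consulted when a script is executed directly; passing the file to an interpreter explicitly, as in bash file.sh, treats that line as an ordinary comment. The sketch below writes a throwaway script with a #!/bin/sh shebang and checks BASH_VERSION, a variable Bash sets for itself, to report which shell ran it. The result depends on the system: on Ubuntu the direct run should report a non-Bash shell (Dash), while on macOS /bin/sh is Bash.

```shell
#!/bin/bash
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT

# A throwaway script that asks for /bin/sh in its shebang line.
cat > "$tmpdir/which_shell.sh" <<'EOF'
#!/bin/sh
# BASH_VERSION is set by Bash itself (by default it is not exported to children).
if [ -n "$BASH_VERSION" ]; then
    echo "interpreted by Bash $BASH_VERSION"
else
    echo "interpreted by a non-Bash sh"
fi
EOF
chmod u+x "$tmpdir/which_shell.sh"

"$tmpdir/which_shell.sh"        # direct execution: the shebang selects /bin/sh
bash "$tmpdir/which_shell.sh"   # explicit interpreter: the shebang is just a comment
```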
Note: we could have specified a different interpreter/shell by instead giving #!/usr/bin/sh as the first line. Let's see what sort of shell that is:
! man sh
DASH(1) BSD General Commands Manual DASH(1)
NAME
dash — command interpreter (shell)
SYNOPSIS
dash [-aCefnuvxIimqVEbp] [+aCefnuvxIimqVEbp] [-o option_name]
[+o option_name] [command_file [argument ...]]
dash -c [-aCefnuvxIimqVEbp] [+aCefnuvxIimqVEbp] [-o option_name]
[+o option_name] command_string [command_name [argument ...]]
dash -s [-aCefnuvxIimqVEbp] [+aCefnuvxIimqVEbp] [-o option_name]
[+o option_name] [argument ...]
DESCRIPTION
dash is the standard command interpreter for the system. The current
version of dash is in the process of being changed to conform with the
POSIX 1003.2 and 1003.2a specifications for the shell. This version has
many features which make it appear similar in some respects to the Korn
shell, but it is not a Korn shell clone (see ksh(1)). Only features des‐
ignated by POSIX, plus a few Berkeley extensions, are being incorporated
into this shell. This man page is not intended to be a tutorial or a
complete specification of the shell.
Overview
The shell is a command that reads lines from either a file or the termi‐
nal, interprets them, and generally executes other commands. It is the
program that is running when a user logs into the system (although a user
can select a different shell with the chsh(1) command). The shell imple‐
ments a language that has flow control constructs, a macro facility that
provides a variety of features in addition to data storage, along with
built in history and line editing capabilities. It incorporates many
features to aid interactive use and has the advantage that the interpre‐
tative language is common to both interactive and non-interactive use
(shell scripts). That is, commands can be typed directly to the running
shell or can be put into a file and the file can be executed directly by
the shell.
Invocation
If no args are present and if the standard input of the shell is con‐
nected to a terminal (or if the -i flag is set), and the -c option is not
present, the shell is considered an interactive shell. An interactive
shell generally prompts before each command and handles programming and
command errors differently (as described below). When first starting,
the shell inspects argument 0, and if it begins with a dash ‘-’, the
shell is also considered a login shell. This is normally done automati‐
cally by the system when the user first logs in. A login shell first
reads commands from the files /etc/profile and .profile if they exist.
If the environment variable ENV is set on entry to an interactive shell,
or is set in the .profile of a login shell, the shell next reads commands
from the file named in ENV. Therefore, a user should place commands that
are to be executed only at login time in the .profile file, and commands
that are executed for every interactive shell inside the ENV file. To
set the ENV variable to some file, place the following line in your
.profile of your home directory
ENV=$HOME/.shinit; export ENV
substituting for “.shinit” any filename you wish.
If command line arguments besides the options have been specified, then
the shell treats the first argument as the name of a file from which to
read commands (a shell script), and the remaining arguments are set as
the positional parameters of the shell ($1, $2, etc). Otherwise, the
shell reads commands from its standard input.
Argument List Processing
All of the single letter options that have a corresponding name can be
used as an argument to the -o option. The set -o name is provided next
to the single letter option in the description below. Specifying a dash
“-” turns the option on, while using a plus “+” disables the option. The
following options can be set from the command line or with the set
builtin (described later).
-a allexport Export all variables assigned to.
-c Read commands from the command_string operand in‐
stead of from the standard input. Special parame‐
ter 0 will be set from the command_name operand
and the positional parameters ($1, $2, etc.) set
from the remaining argument operands.
-C noclobber Don't overwrite existing files with “>”.
-e errexit If not interactive, exit immediately if any
untested command fails. The exit status of a com‐
mand is considered to be explicitly tested if the
command is used to control an if, elif, while, or
until; or if the command is the left hand operand
of an “&&” or “||” operator.
-f noglob Disable pathname expansion.
-n noexec If not interactive, read commands but do not exe‐
cute them. This is useful for checking the syntax
of shell scripts.
-u nounset Write a message to standard error when attempting
to expand a variable that is not set, and if the
shell is not interactive, exit immediately.
-v verbose The shell writes its input to standard error as it
is read. Useful for debugging.
-x xtrace Write each command to standard error (preceded by
a ‘+ ’) before it is executed. Useful for debug‐
ging.
-I ignoreeof Ignore EOF's from input when interactive.
-i interactive Force the shell to behave interactively.
-l Make dash act as if it had been invoked as a login
shell.
-m monitor Turn on job control (set automatically when inter‐
active).
-s stdin Read commands from standard input (set automati‐
cally if no file arguments are present). This op‐
tion has no effect when set after the shell has
already started running (i.e. with set).
-V vi Enable the built-in vi(1) command line editor
(disables -E if it has been set).
-E emacs Enable the built-in emacs(1) command line editor
(disables -V if it has been set).
-b notify Enable asynchronous notification of background job
completion. (UNIMPLEMENTED for 4.4alpha)
-p priviliged Do not attempt to reset effective uid if it does
not match uid. This is not set by default to help
avoid incorrect usage by setuid root programs via
system(3) or popen(3).
Lexical Structure
The shell reads input in terms of lines from a file and breaks it up into
words at whitespace (blanks and tabs), and at certain sequences of char‐
acters that are special to the shell called “operators”. There are two
types of operators: control operators and redirection operators (their
meaning is discussed later). Following is a list of operators:
Control operators:
& && ( ) ; ;; | || <newline>
Redirection operators:
< > >| << >> <& >& <<- <>
Quoting
Quoting is used to remove the special meaning of certain characters or
words to the shell, such as operators, whitespace, or keywords. There
are three types of quoting: matched single quotes, matched double quotes,
and backslash.
Backslash
A backslash preserves the literal meaning of the following character,
with the exception of ⟨newline⟩. A backslash preceding a ⟨newline⟩ is
treated as a line continuation.
Single Quotes
Enclosing characters in single quotes preserves the literal meaning of
all the characters (except single quotes, making it impossible to put
single-quotes in a single-quoted string).
Double Quotes
Enclosing characters within double quotes preserves the literal meaning
of all characters except dollarsign ($), backquote (`), and backslash
(\). The backslash inside double quotes is historically weird, and
serves to quote only the following characters:
$ ` " \ <newline>.
Otherwise it remains literal.
Reserved Words
Reserved words are words that have special meaning to the shell and are
recognized at the beginning of a line and after a control operator. The
following are reserved words:
! elif fi while case
else for then { }
do done until if esac
Their meaning is discussed later.
Aliases
An alias is a name and corresponding value set using the alias(1) builtin
command. Whenever a reserved word may occur (see above), and after
checking for reserved words, the shell checks the word to see if it
matches an alias. If it does, it replaces it in the input stream with
its value. For example, if there is an alias called “lf” with the value
“ls -F”, then the input:
lf foobar ⟨return⟩
would become
ls -F foobar ⟨return⟩
Aliases provide a convenient way for naive users to create shorthands for
commands without having to learn how to create functions with arguments.
They can also be used to create lexically obscure code. This use is dis‐
couraged.
Commands
The shell interprets the words it reads according to a language, the
specification of which is outside the scope of this man page (refer to
the BNF in the POSIX 1003.2 document). Essentially though, a line is
read and if the first word of the line (or after a control operator) is
not a reserved word, then the shell has recognized a simple command.
Otherwise, a complex command or some other special construct may have
been recognized.
Simple Commands
If a simple command has been recognized, the shell performs the following
actions:
1. Leading words of the form “name=value” are stripped off and
assigned to the environment of the simple command. Redirect‐
ion operators and their arguments (as described below) are
stripped off and saved for processing.
2. The remaining words are expanded as described in the section
called “Expansions”, and the first remaining word is consid‐
ered the command name and the command is located. The remain‐
ing words are considered the arguments of the command. If no
command name resulted, then the “name=value” variable assign‐
ments recognized in item 1 affect the current shell.
3. Redirections are performed as described in the next section.
Redirections
Redirections are used to change where a command reads its input or sends
its output. In general, redirections open, close, or duplicate an exist‐
ing reference to a file. The overall format used for redirection is:
[n] redir-op file
where redir-op is one of the redirection operators mentioned previously.
Following is a list of the possible redirections. The [n] is an optional
number between 0 and 9, as in ‘3’ (not ‘[3]’), that refers to a file de‐
scriptor.
[n]> file Redirect standard output (or n) to file.
[n]>| file Same, but override the -C option.
[n]>> file Append standard output (or n) to file.
[n]< file Redirect standard input (or n) from file.
[n1]<&n2 Copy file descriptor n2 as stdout (or fd n1). fd n2.
[n]<&- Close standard input (or n).
[n1]>&n2 Copy file descriptor n2 as stdin (or fd n1). fd n2.
[n]>&- Close standard output (or n).
[n]<> file Open file for reading and writing on standard input (or
n).
The following redirection is often called a “here-document”.
[n]<< delimiter
here-doc-text ...
delimiter
All the text on successive lines up to the delimiter is saved away and
made available to the command on standard input, or file descriptor n if
it is specified. If the delimiter as specified on the initial line is
quoted, then the here-doc-text is treated literally, otherwise the text
is subjected to parameter expansion, command substitution, and arithmetic
expansion (as described in the section on “Expansions”). If the operator
is “<<-” instead of “<<”, then leading tabs in the here-doc-text are
stripped.
Search and Execution
There are three types of commands: shell functions, builtin commands, and
normal programs – and the command is searched for (by name) in that or‐
der. They each are executed in a different way.
When a shell function is executed, all of the shell positional parameters
(except $0, which remains unchanged) are set to the arguments of the
shell function. The variables which are explicitly placed in the envi‐
ronment of the command (by placing assignments to them before the func‐
tion name) are made local to the function and are set to the values
given. Then the command given in the function definition is executed.
The positional parameters are restored to their original values when the
command completes. This all occurs within the current shell.
Shell builtins are executed internally to the shell, without spawning a
new process.
Otherwise, if the command name doesn't match a function or builtin, the
command is searched for as a normal program in the file system (as de‐
scribed in the next section). When a normal program is executed, the
shell runs the program, passing the arguments and the environment to the
program. If the program is not a normal executable file (i.e., if it
does not begin with the "magic number" whose ASCII representation is
"#!", so execve(2) returns ENOEXEC then) the shell will interpret the
program in a subshell. The child shell will reinitialize itself in this
case, so that the effect will be as if a new shell had been invoked to
handle the ad-hoc shell script, except that the location of hashed com‐
mands located in the parent shell will be remembered by the child.
Note that previous versions of this document and the source code itself
misleadingly and sporadically refer to a shell script without a magic
number as a "shell procedure".
Path Search
When locating a command, the shell first looks to see if it has a shell
function by that name. Then it looks for a builtin command by that name.
If a builtin command is not found, one of two things happen:
1. Command names containing a slash are simply executed without per‐
forming any searches.
2. The shell searches each entry in PATH in turn for the command. The
value of the PATH variable should be a series of entries separated
by colons. Each entry consists of a directory name. The current
directory may be indicated implicitly by an empty directory name, or
explicitly by a single period.
Command Exit Status
Each command has an exit status that can influence the behaviour of other
shell commands. The paradigm is that a command exits with zero for nor‐
mal or success, and non-zero for failure, error, or a false indication.
The man page for each command should indicate the various exit codes and
what they mean. Additionally, the builtin commands return exit codes, as
does an executed shell function.
If a command consists entirely of variable assignments then the exit sta‐
tus of the command is that of the last command substitution if any, oth‐
erwise 0.
Complex Commands
Complex commands are combinations of simple commands with control opera‐
tors or reserved words, together creating a larger complex command. More
generally, a command is one of the following:
• simple command
• pipeline
• list or compound-list
• compound command
• function definition
Unless otherwise stated, the exit status of a command is that of the last
simple command executed by the command.
Pipelines
A pipeline is a sequence of one or more commands separated by the control
operator |. The standard output of all but the last command is connected
to the standard input of the next command. The standard output of the
last command is inherited from the shell, as usual.
The format for a pipeline is:
[!] command1 [| command2 ...]
The standard output of command1 is connected to the standard input of
command2. The standard input, standard output, or both of a command is
considered to be assigned by the pipeline before any redirection speci‐
fied by redirection operators that are part of the command.
If the pipeline is not in the background (discussed later), the shell
waits for all commands to complete.
If the reserved word ! does not precede the pipeline, the exit status is
the exit status of the last command specified in the pipeline. Other‐
wise, the exit status is the logical NOT of the exit status of the last
command. That is, if the last command returns zero, the exit status is
1; if the last command returns greater than zero, the exit status is
zero.
Because pipeline assignment of standard input or standard output or both
takes place before redirection, it can be modified by redirection. For
example:
$ command1 2>&1 | command2
sends both the standard output and standard error of command1 to the
standard input of command2.
A ; or ⟨newline⟩ terminator causes the preceding AND-OR-list (described
next) to be executed sequentially; a & causes asynchronous execution of
the preceding AND-OR-list.
Note that unlike some other shells, each process in the pipeline is a
child of the invoking shell (unless it is a shell builtin, in which case
it executes in the current shell – but any effect it has on the environ‐
ment is wiped).
Background Commands – &
If a command is terminated by the control operator ampersand (&), the
shell executes the command asynchronously – that is, the shell does not
wait for the command to finish before executing the next command.
The format for running a command in background is:
command1 & [command2 & ...]
If the shell is not interactive, the standard input of an asynchronous
command is set to /dev/null.
Lists – Generally Speaking
A list is a sequence of zero or more commands separated by newlines,
semicolons, or ampersands, and optionally terminated by one of these
three characters. The commands in a list are executed in the order they
are written. If command is followed by an ampersand, the shell starts
the command and immediately proceeds onto the next command; otherwise it
waits for the command to terminate before proceeding to the next one.
Short-Circuit List Operators
“&&” and “||” are AND-OR list operators. “&&” executes the first com‐
mand, and then executes the second command if and only if the exit status
of the first command is zero. “||” is similar, but executes the second
command if and only if the exit status of the first command is nonzero.
“&&” and “||” both have the same priority.
Flow-Control Constructs – if, while, for, case
The syntax of the if command is
if list
then list
[ elif list
then list ] ...
[ else list ]
fi
The syntax of the while command is
while list
do list
done
The two lists are executed repeatedly while the exit status of the first
list is zero. The until command is similar, but has the word until in
place of while, which causes it to repeat until the exit status of the
first list is zero.
The syntax of the for command is
for variable [ in [ word ... ] ]
do list
done
The words following in are expanded, and then the list is executed re‐
peatedly with the variable set to each word in turn. Omitting in word
... is equivalent to in "$@".
The syntax of the break and continue command is
break [ num ]
continue [ num ]
Break terminates the num innermost for or while loops. Continue contin‐
ues with the next iteration of the innermost loop. These are implemented
as builtin commands.
The syntax of the case command is
case word in
[(]pattern) list ;;
...
esac
The pattern can actually be one or more patterns (see Shell Patterns de‐
scribed later), separated by “|” characters. The “(” character before
the pattern is optional.
Grouping Commands Together
Commands may be grouped by writing either
(list)
or
{ list; }
The first of these executes the commands in a subshell. Builtin commands
grouped into a (list) will not affect the current shell. The second form
does not fork another shell so is slightly more efficient. Grouping com‐
mands together this way allows you to redirect their output as though
they were one program:
{ printf " hello " ; printf " world\n" ; } > greeting
Note that “}” must follow a control operator (here, “;”) so that it is
recognized as a reserved word and not as another command argument.
Functions
The syntax of a function definition is
name () command
A function definition is an executable statement; when executed it in‐
stalls a function named name and returns an exit status of zero. The
command is normally a list enclosed between “{” and “}”.
Variables may be declared to be local to a function by using a local com‐
mand. This should appear as the first statement of a function, and the
syntax is
local [variable | -] ...
Local is implemented as a builtin command.
When a variable is made local, it inherits the initial value and exported
and readonly flags from the variable with the same name in the surround‐
ing scope, if there is one. Otherwise, the variable is initially unset.
The shell uses dynamic scoping, so that if you make the variable x local
to function f, which then calls function g, references to the variable x
made inside g will refer to the variable x declared inside f, not to the
global variable named x.
The only special parameter that can be made local is “-”. Making “-” lo‐
cal any shell options that are changed via the set command inside the
function to be restored to their original values when the function re‐
turns.
The syntax of the return command is
return [exitstatus]
It terminates the currently executing function. Return is implemented as
a builtin command.
Variables and Parameters
The shell maintains a set of parameters. A parameter denoted by a name
is called a variable. When starting up, the shell turns all the environ‐
ment variables into shell variables. New variables can be set using the
form
name=value
Variables set by the user must have a name consisting solely of alphabet‐
ics, numerics, and underscores - the first of which must not be numeric.
A parameter can also be denoted by a number or a special character as ex‐
plained below.
Positional Parameters
A positional parameter is a parameter denoted by a number (n > 0). The
shell sets these initially to the values of its command line arguments
that follow the name of the shell script. The set builtin can also be
used to set or reset them.
Special Parameters
A special parameter is a parameter denoted by one of the following spe‐
cial characters. The value of the parameter is listed next to its char‐
acter.
* Expands to the positional parameters, starting from one.
When the expansion occurs within a double-quoted string it
expands to a single field with the value of each parameter
separated by the first character of the IFS variable, or by
a ⟨space⟩ if IFS is unset.
@ Expands to the positional parameters, starting from one.
When the expansion occurs within double-quotes, each posi‐
tional parameter expands as a separate argument. If there
are no positional parameters, the expansion of @ generates
zero arguments, even when @ is double-quoted. What this ba‐
sically means, for example, is if $1 is “abc” and $2 is “def
ghi”, then "$@" expands to the two arguments:
"abc" "def ghi"
# Expands to the number of positional parameters.
? Expands to the exit status of the most recent pipeline.
- (Hyphen.) Expands to the current option flags (the single-letter op‐
tion names concatenated into a string) as specified on invo‐
cation, by the set builtin command, or implicitly by the
shell.
$ Expands to the process ID of the invoked shell. A subshell
retains the same value of $ as its parent.
! Expands to the process ID of the most recent background com‐
mand executed from the current shell. For a pipeline, the
process ID is that of the last command in the pipeline.
0 (Zero.) Expands to the name of the shell or shell script.
Word Expansions
This clause describes the various expansions that are performed on words.
Not all expansions are performed on every word, as explained later.
Tilde expansions, parameter expansions, command substitutions, arithmetic
expansions, and quote removals that occur within a single word expand to
a single field. It is only field splitting or pathname expansion that
can create multiple fields from a single word. The single exception to
this rule is the expansion of the special parameter @ within double-
quotes, as was described above.
The order of word expansion is:
1. Tilde Expansion, Parameter Expansion, Command Substitution, Arith‐
metic Expansion (these all occur at the same time).
2. Field Splitting is performed on fields generated by step (1) unless
the IFS variable is null.
3. Pathname Expansion (unless set -f is in effect).
4. Quote Removal.
The $ character is used to introduce parameter expansion, command substi‐
tution, or arithmetic evaluation.
Tilde Expansion (substituting a user's home directory)
A word beginning with an unquoted tilde character (~) is subjected to
tilde expansion. All the characters up to a slash (/) or the end of the
word are treated as a username and are replaced with the user's home
directory. If the username is missing (as in ~/foobar), the tilde is
replaced with the value of the HOME variable (the current user's home
directory).
Parameter Expansion
The format for parameter expansion is as follows:
${expression}
where expression consists of all characters until the matching “}”. Any
“}” escaped by a backslash or within a quoted string, and characters in
embedded arithmetic expansions, command substitutions, and variable ex‐
pansions, are not examined in determining the matching “}”.
The simplest form for parameter expansion is:
${parameter}
The value, if any, of parameter is substituted.
The parameter name or symbol can be enclosed in braces, which are op‐
tional except for positional parameters with more than one digit or when
parameter is followed by a character that could be interpreted as part of
the name. If a parameter expansion occurs inside double-quotes:
1. Pathname expansion is not performed on the results of the expansion.
2. Field splitting is not performed on the results of the expansion,
with the exception of @.
In addition, a parameter expansion can be modified by using one of the following formats.
${parameter:-word}  Use Default Values. If parameter is unset or null, the expansion of word is substituted; otherwise, the value of parameter is substituted.
${parameter:=word}  Assign Default Values. If parameter is unset or null, the expansion of word is assigned to parameter. In all cases, the final value of parameter is substituted. Only variables, not positional parameters or special parameters, can be assigned in this way.
${parameter:?[word]}  Indicate Error if Null or Unset. If parameter is unset or null, the expansion of word (or a message indicating it is unset if word is omitted) is written to standard error and the shell exits with a nonzero exit status. Otherwise, the value of parameter is substituted. An interactive shell need not exit.
${parameter:+word}  Use Alternative Value. If parameter is unset or null, null is substituted; otherwise, the expansion of word is substituted.
In the parameter expansions shown previously, use of the colon in the format results in a test for a parameter that is unset or null; omission of the colon results in a test for a parameter that is only unset.
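The difference between the colon and no-colon forms is easiest to see side by side. A minimal POSIX sh sketch (it should behave the same under dash and bash); the variable names are hypothetical:

```sh
unset name outdir missing   # start from a known state
empty=""                    # set, but null

a=${name:-guest}            # unset -> "guest"
b=${empty:-guest}           # null  -> "guest" (colon form tests unset OR null)
c=${empty-guest}            # null  -> ""      (no colon: tests only unset)
d=${name:+present}          # unset -> ""      (alternative only if set and non-null)

# ':' is a no-op command; the expansion's side effect assigns outdir.
: "${outdir:=/tmp/results}"

# ${var:?msg} aborts a non-interactive shell; run it in a subshell to observe that.
if ( : "${missing:?is required}" ) 2>/dev/null; then
  status=ok
else
  status=error
fi
echo "a=$a b=$b c=$c d=$d outdir=$outdir status=$status"
```

In a script, `: "${1:?usage: $0 FILE}"` is a common one-line way to insist on a required argument.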
${#parameter}  String Length. The length in characters of the value of parameter.
The following four varieties of parameter expansion provide for substring processing. In each case, pattern matching notation (see Shell Patterns), rather than regular expression notation, is used to evaluate the patterns. If parameter is * or @, the result of the expansion is unspecified. Enclosing the full parameter expansion string in double-quotes does not cause the following four varieties of pattern characters to be quoted, whereas quoting characters within the braces has this effect.
${parameter%word}  Remove Smallest Suffix Pattern. The word is expanded to produce a pattern. The parameter expansion then results in parameter, with the smallest portion of the suffix matched by the pattern deleted.
${parameter%%word}  Remove Largest Suffix Pattern. The word is expanded to produce a pattern. The parameter expansion then results in parameter, with the largest portion of the suffix matched by the pattern deleted.
${parameter#word}  Remove Smallest Prefix Pattern. The word is expanded to produce a pattern. The parameter expansion then results in parameter, with the smallest portion of the prefix matched by the pattern deleted.
${parameter##word}  Remove Largest Prefix Pattern. The word is expanded to produce a pattern. The parameter expansion then results in parameter, with the largest portion of the prefix matched by the pattern deleted.
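These four forms, plus ${#parameter}, cover most file-name manipulation that would otherwise need basename, dirname or sed. A small sketch on a made-up path:

```sh
path=/data/run1/output.tar.gz   # hypothetical path, for illustration only

base=${path##*/}   # remove largest prefix matching "*/"  -> output.tar.gz
dir=${path%/*}     # remove smallest suffix matching "/*" -> /data/run1
rel=${path#*/}     # remove smallest prefix matching "*/" -> data/run1/output.tar.gz
noext=${base%.*}   # remove smallest suffix matching ".*" -> output.tar
stem=${base%%.*}   # remove largest suffix matching ".*"  -> output
len=${#base}       # length of "output.tar.gz"            -> 13

printf '%s\n' "$base" "$dir" "$rel" "$noext" "$stem" "$len"
```

The same idea renames files in a loop, e.g. `for f in *.txt; do mv -- "$f" "${f%.txt}.csv"; done`, assuming the current directory holds .txt files you want renamed.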
Command Substitution
Command substitution allows the output of a command to be substituted in place of the command name itself. Command substitution occurs when the command is enclosed as follows:
$(command)
or (“backquoted” version):
`command`
The shell expands the command substitution by executing command in a subshell environment and replacing the command substitution with the standard output of the command, removing sequences of one or more ⟨newline⟩s at the end of the substitution. (Embedded ⟨newline⟩s before the end of the output are not removed; however, during field splitting, they may be translated into ⟨space⟩s, depending on the value of IFS and quoting that is in effect.)
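A short sketch of the newline handling described above, using printf to produce output with a known number of trailing newlines:

```sh
# printf emits "a", newline, "b", then three trailing newlines.
out=$(printf 'a\nb\n\n\n')
# All trailing newlines are stripped; the embedded one survives.
expected=$(printf 'a\nb')
nchars=${#out}          # 3 characters: a, newline, b

# Unquoted, the embedded newline is an IFS character, so splitting yields two words.
set -- $out
nwords=$#

# $(...) nests without extra escaping, unlike backquotes.
here=$(basename "$(pwd)")
echo "nchars=$nchars nwords=$nwords here=$here"
```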
Arithmetic Expansion
Arithmetic expansion provides a mechanism for evaluating an arithmetic expression and substituting its value. The format for arithmetic expansion is as follows:
$((expression))
The expression is treated as if it were in double-quotes, except that a double-quote inside the expression is not treated specially. The shell expands all tokens in the expression for parameter expansion, command substitution, and quote removal.
Next, the shell treats this as an arithmetic expression and substitutes the value of the expression.
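Shell arithmetic is integer-only, as this sketch shows; for a fractional result it hands off to awk, which the lecture's premise assumes is installed by default:

```sh
n=7
a=$((n * 3 + 1))   # 22; names may be used without '$' inside $(( ))
q=$((n / 2))       # 3: integer division truncates
r=$((n % 2))       # 1: remainder
h=$((0x1F))        # 31: C-style hexadecimal constants are accepted

# The usual POSIX counter idiom.
i=0
while [ "$i" -lt 3 ]; do
  i=$((i + 1))
done

# Floating point has to come from another tool.
half=$(awk 'BEGIN { print 7 / 2 }')   # 3.5
echo "a=$a q=$q r=$r h=$h i=$i half=$half"
```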
White Space Splitting (Field Splitting)
After parameter expansion, command substitution, and arithmetic expansion, the shell scans the results of expansions and substitutions that did not occur in double-quotes for field splitting, and multiple fields can result.
The shell treats each character of the IFS as a delimiter and uses the delimiters to split the results of parameter expansion and command substitution into fields.
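A sketch of splitting a colon-separated record by temporarily changing IFS. The record is invented for illustration, and `set -f` is added as a precaution so that any glob characters in the data are not expanded:

```sh
record="alice:x:1001::/home/alice"   # hypothetical passwd-style record

set -- "$record"      # quoted: no splitting, one field
quoted=$#

old_ifs=$IFS
IFS=:
set -f                # disable pathname expansion while splitting data
set -- $record        # unquoted: split on ':'
IFS=$old_ifs
set +f
nfields=$#            # 5: the empty field between '::' is kept
user=$1
f4=$4
home=$5

# With the default IFS, runs of blanks collapse and edge blanks are dropped.
ws="  one   two  "
set -- $ws
nws=$#                # 2
echo "quoted=$quoted nfields=$nfields user=$user home=$home nws=$nws"
```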
Pathname Expansion (File Name Generation)
Unless the -f flag is set, file name generation is performed after word splitting is complete. Each word is viewed as a series of patterns, separated by slashes. The process of expansion replaces the word with the names of all existing files whose names can be formed by replacing each pattern with a string that matches the specified pattern. There are two restrictions on this: first, a pattern cannot match a string containing a slash, and second, a pattern cannot match a string starting with a period unless the first character of the pattern is a period. The next section describes the patterns used for both Pathname Expansion and the case command.
Shell Patterns
A pattern consists of normal characters, which match themselves, and meta-characters. The meta-characters are “!”, “*”, “?”, and “[”. These characters lose their special meanings if they are quoted. When command or variable substitution is performed and the dollar sign or back quotes are not double quoted, the value of the variable or the output of the command is scanned for these characters and they are turned into meta-characters.
An asterisk (“*”) matches any string of characters. A question mark matches any single character. A left bracket (“[”) introduces a character class. The end of the character class is indicated by a “]”; if the “]” is missing then the “[” matches a “[” rather than introducing a character class. A character class matches any of the characters between the square brackets. A range of characters may be specified using a minus sign. The character class may be complemented by making an exclamation point the first character of the character class.
To include a “]” in a character class, make it the first character listed (after the “!”, if any). To include a minus sign, make it the first or last character listed.
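A sketch exercising these pattern rules on a throwaway directory. It assumes `mktemp -d` is available (true on GNU and BSD systems, though POSIX does not require it):

```sh
dir=$(mktemp -d)
touch "$dir/a.txt" "$dir/b.txt" "$dir/c.log" "$dir/.hidden.txt"

set -- "$dir"/*.txt     # a.txt b.txt; the leading-period file is not matched
ntxt=$#
set -- "$dir"/[!a]*     # complemented class: b.txt c.log
nnota=$#
set -- "$dir"/?.log     # '?' matches exactly one character: c.log
nlog=$#

# The same notation drives 'case', matching a string rather than file names.
kind=unknown
case "report.tar.gz" in
  *.tar.gz | *.tgz) kind=tarball ;;
  *.txt)            kind=text ;;
esac

rm -r "$dir"
echo "ntxt=$ntxt nnota=$nnota nlog=$nlog kind=$kind"
```

If a pattern matches nothing, the word is left unchanged, so a loop over `*.txt` in a directory without such files sees the literal string `*.txt`; testing `[ -e "$f" ]` inside the loop guards against that.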
Builtins
This section lists the builtin commands which are builtin because they need to perform some operation that can't be performed by a separate process. In addition to these, there are several other commands that may be builtin for efficiency (e.g. printf(1), echo(1), test(1), etc).
:
true A null command that returns a 0 (true) exit value.
. file
The commands in the specified file are read and executed by the
shell.
alias [name[=string ...]]
If name=string is specified, the shell defines the alias name with value string. If just name is specified, the value of the alias name is printed. With no arguments, the alias builtin prints the names and values of all defined aliases (see unalias).
bg [job] ...
Continue the specified jobs (or the current job if no jobs are
given) in the background.
command [-p] [-v] [-V] command [arg ...]
Execute the specified command but ignore shell functions when searching for it. (This is useful when you have a shell function with the same name as a builtin command.)
-p  Search for command using a PATH that guarantees to find all the standard utilities.
-V  Do not execute the command but search for the command and print the resolution of the command search. This is the same as the type builtin.
-v  Do not execute the command but search for the command and print the absolute pathname of utilities, the name for builtins or the expansion of aliases.
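On a machine where you cannot install software, `command -v` is a portable way to check which tools are present before a script depends on them. A sketch; the tool list is only an example and `no-such-tool-xyz` is deliberately fictitious:

```sh
missing=""
for tool in awk sed sort grep no-such-tool-xyz; do
  if command -v "$tool" >/dev/null 2>&1; then
    # -v prints the absolute path for utilities found in PATH.
    printf '%-18s %s\n' "$tool" "$(command -v "$tool")"
  else
    missing="$missing $tool"
  fi
done
[ -n "$missing" ] && echo "not found:$missing" >&2
```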
cd -
cd [-LP] [directory]
Switch to the specified directory (default HOME). If an entry for CDPATH appears in the environment of the cd command or the shell variable CDPATH is set and the directory name does not begin with a slash, then the directories listed in CDPATH will be searched for the specified directory. The format of CDPATH is the same as that of PATH. If a single dash is specified as the argument, it will be replaced by the value of OLDPWD. The cd command will print out the name of the directory that it actually switched to if this is different from the name that the user gave. These may be different either because the CDPATH mechanism was used or because the argument is a single dash. The -P option causes the physical directory structure to be used, that is, all symbolic links are resolved to their respective values. The -L option turns off the effect of any preceding -P options.
echo [-n] args...
Print the arguments on the standard output, separated by spaces. Unless the -n option is present, a newline is output following the arguments.
If any of the following sequences of characters is encountered during output, the sequence is not output. Instead, the specified action is performed:
\b  A backspace character is output.
\c  Subsequent output is suppressed. This is normally used at the end of the last argument to suppress the trailing newline that echo would otherwise output.
\e  Outputs an escape character (ESC).
\f  Output a form feed.
\n  Output a newline character.
\r  Output a carriage return.
\t  Output a (horizontal) tab character.
\v  Output a vertical tab.
\0digits  Output the character whose value is given by zero to three octal digits. If there are zero digits, a nul character is output.
\\  Output a backslash.
All other backslash sequences elicit undefined behaviour.
eval string ...
Concatenate all the arguments with spaces. Then re-parse and execute the command.
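Because the arguments are parsed twice, eval can construct variable names at run time, which POSIX sh otherwise lacks (it has no arrays and no `${!name}`). A sketch with hypothetical `counter_*` variables; never pass untrusted input to eval, since it will be executed as code:

```sh
for key in a b; do
  eval "counter_$key=0"                        # runs: counter_a=0, counter_b=0
done

key=b
# The escaped \$ survives the first parse, so the second parse sees:
#   counter_b=$((counter_b + 5))
eval "counter_$key=\$((counter_$key + 5))"
eval "value=\$counter_$key"                    # runs: value=$counter_b
echo "counter_a=$counter_a counter_b=$counter_b value=$value"
```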
exec [command arg ...]
Unless command is omitted, the shell process is replaced with the specified program (which must be a real program, not a shell builtin or function). Any redirections on the exec command are marked as permanent, so that they are not undone when the exec command finishes.
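The permanent-redirection behaviour is what makes `exec` useful for logging: a descriptor is opened once and written to repeatedly. A sketch using a temporary file (assumes `mktemp`):

```sh
log=$(mktemp)
exec 3>"$log"          # open descriptor 3; it stays open after exec finishes
echo "step 1 done" >&3
echo "step 2 done" >&3
exec 3>&-              # close descriptor 3

first=$(head -n 1 "$log")
nlines=$(wc -l < "$log")   # some wc versions pad the number with spaces
rm -f "$log"
echo "first=$first nlines=$nlines"
```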
exit [exitstatus]
Terminate the shell process. If exitstatus is given it is used as the exit status of the shell; otherwise the exit status of the preceding command is used.
export name ...
export -p
The specified names are exported so that they will appear in the environment of subsequent commands. The only way to un-export a variable is to unset it. The shell allows the value of a variable to be set at the same time it is exported by writing
export name=value
With no arguments the export command lists the names of all exported variables. With the -p option specified the output will be formatted suitably for non-interactive use.
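A sketch of the difference between a shell variable and an exported one, checked by starting a child `sh`; `GREETING` and `TARGET` are arbitrary names chosen for illustration:

```sh
unset GREETING TARGET
GREETING=hello                                    # plain shell variable
before=$(sh -c 'echo "${GREETING:-<not set>}"')   # child does not see it
export GREETING                                   # now in the child's environment
after=$(sh -c 'echo "${GREETING:-<not set>}"')

# A prefix assignment is exported to that one command only.
once=$(TARGET=demo sh -c 'echo "$TARGET"')
echo "before=$before after=$after once=$once"
```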
fc [-e editor] [first [last]]
fc -l [-nr] [first [last]]
fc -s [old=new] [first]
The fc builtin lists, or edits and re-executes, commands previously entered to an interactive shell.
-e editor
Use the editor named by editor to edit the commands. The editor string is a command name, subject to search via the PATH variable. The value in the FCEDIT variable is used as a default when -e is not specified. If FCEDIT is null or unset, the value of the EDITOR variable is used. If EDITOR is null or unset, ed(1) is used as the editor.
-l (ell)
List the commands rather than invoking an editor on them. The commands are written in the sequence indicated by the first and last operands, as affected by -r, with each command preceded by the command number.
-n  Suppress command numbers when listing with -l.
-r  Reverse the order of the commands listed (with -l) or edited (with neither -l nor -s).
-s  Re-execute the command without invoking an editor.
first
last  Select the commands to list or edit. The number of previous commands that can be accessed is determined by the value of the HISTSIZE variable. The value of first or last or both is one of the following:
[+]number
A positive number representing a command number; command numbers can be displayed with the -l option.
-number
A negative decimal number representing the command that was executed number of commands previously. For example, -1 is the immediately previous command.
string
A string indicating the most recently entered command that begins with that string. If the old=new operand is not also specified with -s, the string form of the first operand cannot contain an embedded equal sign.
The following environment variables affect the execution of fc:
FCEDIT  Name of the editor to use.
HISTSIZE  The number of previous commands that are accessible.
fg [job]
Move the specified job or the current job to the foreground.
getopts optstring var
The POSIX getopts command, not to be confused with the Bell Labs-derived getopt(1).
The first argument should be a series of letters, each of which may be optionally followed by a colon to indicate that the option requires an argument. The variable specified is set to the parsed option.
The getopts command deprecates the older getopt(1) utility due to its handling of arguments containing whitespace.
The getopts builtin may be used to obtain options and their arguments from a list of parameters. When invoked, getopts places the value of the next option from the option string in the list in the shell variable specified by var and its index in the shell variable OPTIND. When the shell is invoked, OPTIND is initialized to 1. For each option that requires an argument, the getopts builtin will place it in the shell variable OPTARG. If an option is not allowed for in the optstring, then OPTARG will be unset.
optstring is a string of recognized option letters (see getopt(3)). If a letter is followed by a colon, the option is expected to have an argument which may or may not be separated from it by white space. If an option character is not found where expected, getopts will set the variable var to a “?”; getopts will then unset OPTARG and write output to standard error. By specifying a colon as the first character of optstring all errors will be ignored.
After the last option getopts will return a non-zero value and set var to “?”.
The following code fragment shows how one might process the arguments for a command that can take the options [a] and [b], and the option [c], which requires an argument.
while getopts abc: f
do
    case $f in
    a | b) flag=$f;;                  # remember which of -a / -b was given
    c) carg=$OPTARG;;                 # -c takes an argument, delivered in OPTARG
    \?) echo "$USAGE" >&2; exit 1;;   # unknown option: print usage (USAGE is set elsewhere)
    esac
done
shift $((OPTIND - 1))                 # discard the parsed options, leaving the operands in "$@"
This code will accept any of the following as equivalent:
cmd -acarg file file
cmd -a -c arg file file
cmd -carg -a file file
cmd -a -carg -- file file
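To try the fragment above without a real command line, `set --` can supply the arguments. A self-contained sketch; the usage message is a placeholder:

```sh
# Simulate: cmd -a -carg file1 file2
set -- -a -carg file1 file2
flag= carg=
OPTIND=1                       # reset in case getopts already ran in this shell
while getopts abc: f; do
  case $f in
    a | b) flag=$f ;;
    c) carg=$OPTARG ;;
    \?) echo "usage: cmd [-a | -b] [-c arg] file ..." >&2; exit 1 ;;
  esac
done
shift $((OPTIND - 1))          # OPTIND is 3 here, so two words are dropped
nfiles=$#
first=$1
echo "flag=$flag carg=$carg nfiles=$nfiles first=$first"
```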
hash -rv command ...
The shell maintains a hash table which remembers the locations of commands. With no arguments whatsoever, the hash command prints out the contents of this table. Entries which have not been looked at since the last cd command are marked with an asterisk; it is possible for these entries to be invalid.
With arguments, the hash command removes the specified commands from the hash table (unless they are functions) and then locates them. With the -v option, hash prints the locations of the commands as it finds them. The -r option causes the hash command to delete all the entries in the hash table except for functions.
pwd [-LP]
Print the current working directory. The builtin command remembers what the current directory is rather than recomputing it each time. This makes it faster. However, if the current directory is renamed, the builtin version of pwd will continue to print the old name for the directory. The -P option causes the physical value of the current working directory to be shown, that is, all symbolic links are resolved to their respective values. The -L option turns off the effect of any preceding -P options.
read [-p prompt] [-r] variable [...]
The prompt is printed if the -p option is specified and the standard input is a terminal. Then a line is read from the standard input. The trailing newline is deleted from the line and the line is split as described in the section on word splitting above, and the pieces are assigned to the variables in order. At least one variable must be specified. If there are more pieces than variables, the remaining pieces (along with the characters in IFS that separated them) are assigned to the last variable. If there are more variables than pieces, the remaining variables are assigned the null string. The read builtin will indicate success unless EOF is encountered on input, in which case failure is returned.
By default, unless the -r option is specified, the backslash “\” acts as an escape character, causing the following character to be treated literally. If a backslash is followed by a newline, the backslash and the newline will be deleted.
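A sketch of these splitting rules: extra pieces go to the last variable, and a one-off `IFS=:` prefix changes the separator for that `read` only. The input lines are invented:

```sh
read -r name age rest <<EOF
alice 42 likes shell scripting
EOF
# name=alice age=42 rest="likes shell scripting"

IFS=: read -r user pw uid rest2 <<EOF
root:x:0:0:root:/root:/bin/sh
EOF
# rest2 keeps the remaining pieces and their ':' separators

# Line-by-line loop; IFS= keeps leading blanks, -r keeps backslashes.
count=0
while IFS= read -r line; do
  count=$((count + 1))
done <<EOF
first line
second line
EOF
echo "name=$name rest=$rest user=$user rest2=$rest2 count=$count"
```

Redirecting into the loop, rather than piping into it, keeps `count` in the current shell; in a pipeline such as `cat file | while read ...`, dash and bash run the loop in a subshell and the updated variable is lost afterwards.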
readonly name ...
readonly -p
The specified names are marked as read only, so that they cannot be subsequently modified or unset. The shell allows the value of a variable to be set at the same time it is marked read only by writing
readonly name=value
With no arguments the readonly command lists the names of all read only variables. With the -p option specified the output will be formatted suitably for non-interactive use.
printf format [arguments ...]
printf formats and prints its arguments, after the first, under control of the format. The format is a character string which contains three types of objects: plain characters, which are simply copied to standard output, character escape sequences which are converted and copied to the standard output, and format specifications, each of which causes printing of the next successive argument.
The arguments after the first are treated as strings if the corresponding format is either b, c or s; otherwise it is evaluated as a C constant, with the following extensions:
• A leading plus or minus sign is allowed.
• If the leading character is a single or double quote, the value is the ASCII code of the next character.
The format string is reused as often as necessary to satisfy the arguments. Any extra format specifications are evaluated with zero or the null string.
Character escape sequences are in backslash notation as defined in ANSI X3.159-1989 (“ANSI C89”). The characters and their meanings are as follows:
\a  Write a <bell> character.
\b  Write a <backspace> character.
\e  Write an <escape> (ESC) character.
\f  Write a <form-feed> character.
\n  Write a <new-line> character.
\r  Write a <carriage return> character.
\t  Write a <tab> character.
\v  Write a <vertical tab> character.
\\  Write a backslash character.
\num  Write an 8-bit character whose ASCII value is the 1-, 2-, or 3-digit octal number num.
Each format specification is introduced by the percent character (“%”). The remainder of the format specification includes, in the following order:
Zero or more of the following flags:
#  A ‘#’ character specifying that the value should be printed in an “alternative form”. For b, c, d, and s formats, this option has no effect. For the o format the precision of the number is increased to force the first character of the output string to a zero. For the x (X) format, a non-zero result has the string 0x (0X) prepended to it. For e, E, f, g, and G formats, the result will always contain a decimal point, even if no digits follow the point (normally, a decimal point only appears in the results of those formats if a digit follows the decimal point). For g and G formats, trailing zeros are not removed from the result as they would otherwise be.
-  A minus sign ‘-’ which specifies left adjustment of the output in the indicated field;
+  A ‘+’ character specifying that there should always be a sign placed before the number when using signed formats.
‘ ’  A space specifying that a blank should be left before a positive number for a signed format. A ‘+’ overrides a space if both are used;
0  A zero ‘0’ character indicating that zero-padding should be used rather than blank-padding. A ‘-’ overrides a ‘0’ if both are used;
Field Width:
An optional digit string specifying a field width; if the output string has fewer characters than the field width it will be blank-padded on the left (or right, if the left-adjustment indicator has been given) to make up the field width (note that a leading zero is a flag, but an embedded zero is part of a field width);
Precision:
An optional period, ‘.’, followed by an optional digit string giving a precision which specifies the number of digits to appear after the decimal point, for e and f formats, or the maximum number of bytes to be printed from a string (b and s formats); if the digit string is missing, the precision is treated as zero;
Format:
A character which indicates the type of format to use (one of diouxXfwEgGbcs).
A field width or precision may be ‘*’ instead of a digit string. In this case an argument supplies the field width or precision.
The format characters and their meanings are:
diouXx  The argument is printed as a signed decimal (d or i), unsigned octal, unsigned decimal, or unsigned hexadecimal (X or x), respectively.
f  The argument is printed in the style [-]ddd.ddd where the number of d's after the decimal point is equal to the precision specification for the argument. If the precision is missing, 6 digits are given; if the precision is explicitly 0, no digits and no decimal point are printed.
eE  The argument is printed in the style [-]d.ddde±dd where there is one digit before the decimal point and the number after is equal to the precision specification for the argument; when the precision is missing, 6 digits are produced. An upper-case E is used for an ‘E’ format.
gG  The argument is printed in style f or in style e (E) whichever gives full precision in minimum space.
b  Characters from the string argument are printed with backslash-escape sequences expanded.
The following additional backslash-escape sequences are supported:
\c  Causes dash to ignore any remaining characters in the string operand containing it, any remaining string operands, and any additional characters in the format operand.
\0num  Write an 8-bit character whose ASCII value is the 1-, 2-, or 3-digit octal number num.
c  The first character of argument is printed.
s  Characters from the string argument are printed until the end is reached or until the number of bytes indicated by the precision specification is reached; if the precision is omitted, all characters in the string are printed.
%  Print a ‘%’; no argument is used.
In no case does a non-existent or small field width cause truncation of a field; padding takes place only if the specified field width exceeds the actual width.
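A sketch of common format specifications. `LC_ALL=C` is set on the floating-point call as a precaution, since the decimal separator can depend on the locale. The last lines show why `printf '%s'` is safer than echo for arbitrary data: dash's echo interprets the backslash sequences listed earlier, bash's does not unless given -e, while printf treats its arguments the same way in both:

```sh
# Left-justified string, width-5 float with 2 decimals, zero-padded integer.
row=$(LC_ALL=C printf '%-10s|%5.2f|%03d' "name" 3.14159 7)
# The format is reused until the arguments run out.
pairs=$(printf '%s=%s;' a 1 b 2)
hex=$(printf '%#x' 255)        # alternative form adds the 0x prefix
code=$(printf '%d' "'A")       # leading quote: character code of A

# Arbitrary data passes through %s untouched (no option parsing, no escapes).
msg='-n \t not an option'
safe=$(printf '%s' "$msg")
printf '%s\n' "$row" "$pairs" "$hex" "$code" "$safe"
```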
set [{ -options | +options | -- }] arg ...
The set command performs three different functions.
With no arguments, it lists the values of all shell variables.
If options are given, it sets the specified option flags, or clears them as described in the section called Argument List Processing. As a special case, if the option is -o or +o and no argument is supplied, the shell prints the settings of all its options. If the option is -o, the settings are printed in a human-readable format; if the option is +o, the settings are printed in a format suitable for reinput to the shell to affect the same option settings.
The third use of the set command is to set the values of the shell's positional parameters to the specified args. To change the positional parameters without changing any options, use “--” as the first argument to set. If no args are present, the set command will clear all the positional parameters (equivalent to executing “shift $#”.)
shift [n]
Shift the positional parameters n times. A shift sets the value of $1 to the value of $2, the value of $2 to the value of $3, and so on, decreasing the value of $# by one. If n is greater than the number of positional parameters, shift will issue an error message, and exit with return status 2.
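A sketch of `set --` and `shift` working together, as in a script that consumes its arguments:

```sh
set -- one "two words" three four   # replace the positional parameters
n_before=$#                          # 4
second=$2                            # "two words": the quotes kept it one parameter
shift 2                              # drop $1 and $2
n_after=$#                           # 2
new_first=$1                         # three

# "$@" expands to each remaining parameter as a separate word.
joined=""
for arg in "$@"; do
  joined="$joined[$arg]"
done
echo "n_before=$n_before n_after=$n_after joined=$joined"
```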
test expression
[ expression ]
The test utility evaluates the expression and, if it evaluates to true, returns a zero (true) exit status; otherwise it returns 1 (false). If there is no expression, test also returns 1 (false).
All operators and flags are separate arguments to the test utility.
The following primaries are used to construct expression:
-b file  True if file exists and is a block special file.
-c file  True if file exists and is a character special file.
-d file  True if file exists and is a directory.
-e file  True if file exists (regardless of type).
-f file  True if file exists and is a regular file.
-g file  True if file exists and its set group ID flag is set.
-h file  True if file exists and is a symbolic link.
-k file  True if file exists and its sticky bit is set.
-n string  True if the length of string is nonzero.
-p file  True if file is a named pipe (FIFO).
-r file  True if file exists and is readable.
-s file  True if file exists and has a size greater than zero.
-t file_descriptor  True if the file whose file descriptor number is file_descriptor is open and is associated with a terminal.
-u file  True if file exists and its set user ID flag is set.
-w file  True if file exists and is writable. True indicates only that the write flag is on. The file is not writable on a read-only file system even if this test indicates true.
-x file  True if file exists and is executable. True indicates only that the execute flag is on. If file is a directory, true indicates that file can be searched.
-z string  True if the length of string is zero.
-L file  True if file exists and is a symbolic link. This operator is retained for compatibility with previous versions of this program. Do not rely on its existence; use -h instead.
-O file  True if file exists and its owner matches the effective user id of this process.
-G file  True if file exists and its group matches the effective group id of this process.
-S file  True if file exists and is a socket.
file1 -nt file2  True if file1 and file2 exist and file1 is newer than file2.
file1 -ot file2  True if file1 and file2 exist and file1 is older than file2.
file1 -ef file2  True if file1 and file2 exist and refer to the same file.
string  True if string is not the null string.
s1 = s2  True if the strings s1 and s2 are identical.
s1 != s2  True if the strings s1 and s2 are not identical.
s1 < s2  True if string s1 comes before s2 based on the ASCII value of their characters.
s1 > s2  True if string s1 comes after s2 based on the ASCII value of their characters.
n1 -eq n2  True if the integers n1 and n2 are algebraically equal.
n1 -ne n2  True if the integers n1 and n2 are not algebraically equal.
n1 -gt n2  True if the integer n1 is algebraically greater than the integer n2.
n1 -ge n2  True if the integer n1 is algebraically greater than or equal to the integer n2.
n1 -lt n2  True if the integer n1 is algebraically less than the integer n2.
n1 -le n2  True if the integer n1 is algebraically less than or equal to the integer n2.
These primaries can be combined with the following operators:
! expression  True if expression is false.
expression1 -a expression2  True if both expression1 and expression2 are true.
expression1 -o expression2  True if either expression1 or expression2 is true.
(expression)  True if expression is true.
The -a operator has higher precedence than the -o operator.
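A sketch of common file and comparison tests (assumes `mktemp`). POSIX marks `-a` and `-o` as obsolescent, so the sketch combines separate tests with `&&` instead:

```sh
tmp=$(mktemp)
printf 'data\n' > "$tmp"

[ -f "$tmp" ] && is_file=yes || is_file=no      # regular file
[ -s "$tmp" ] && non_empty=yes || non_empty=no  # size greater than zero
[ -d "$tmp" ] && is_dir=yes || is_dir=no        # not a directory

# '=' compares strings; '-eq' compares decimal integers.
[ 010 = 10 ] && str_eq=yes || str_eq=no         # different strings
[ 010 -eq 10 ] && int_eq=yes || int_eq=no       # equal integers

# Quote variables: an unquoted empty $x would vanish and leave [ with a missing operand.
x=""
[ -z "$x" ] && empty=yes || empty=no

if [ -r "$tmp" ] && [ -w "$tmp" ]; then rw=yes; else rw=no; fi
rm -f "$tmp"
echo "is_file=$is_file non_empty=$non_empty is_dir=$is_dir str_eq=$str_eq int_eq=$int_eq"
```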
times Print the accumulated user and system times for the shell and for
processes run from the shell. The return status is 0.
trap [action signal ...]
Cause the shell to parse and execute action when any of the specified signals are received. The signals are specified by signal number or as the name of the signal. If signal is 0 or EXIT, the action is executed when the shell exits. action may be empty (''), which causes the specified signals to be ignored. With action omitted or set to ‘-’ the specified signals are set to their default action. When the shell forks off a subshell, it resets trapped (but not ignored) signals to the default action. The trap command has no effect on signals that were ignored on entry to the shell. Invoked without any arguments, trap writes a list of signals and their associated action to the standard output in a format that is suitable as an input to the shell that achieves the same trapping results.
Examples:
trap
List trapped signals and their corresponding action
trap '' INT QUIT tstp 30
Ignore signals INT QUIT TSTP USR1 (signal number 30 is USR1 on BSD-derived systems; on most Linux systems USR1 is 10, so signal names are the more portable choice)
trap date INT
Print date upon receiving signal INT
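A common use of trap is removing temporary files however the script ends. A sketch, assuming `mktemp`: the trap is set inside a subshell only so that the effect can be checked afterwards; in a real script it would sit at the top level, often alongside `trap 'exit 1' INT TERM` so that an interrupt also runs the EXIT action:

```sh
marker=$(mktemp)           # records the work file's name for checking later
(
  work=$(mktemp)
  printf '%s\n' "$work" > "$marker"
  trap 'rm -f "$work"' EXIT   # runs when this subshell exits
  echo "intermediate results" > "$work"
  exit 0
)
work_path=$(cat "$marker")
rm -f "$marker"
[ -e "$work_path" ] && echo "still there" || echo "cleaned up"
```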
type [name ...]
Interpret each name as a command and print the resolution of the
command search. Possible resolutions are: shell keyword, alias,
shell builtin, command, tracked alias and not found. For aliases
the alias expansion is printed; for commands and tracked aliases
the complete pathname of the command is printed.
ulimit [-H | -S] [-a | -tfdscmlpnv [value]]
Inquire about or set the hard or soft limits on processes or set new limits. The choice between hard limit (which no process is allowed to violate, and which may not be raised once it has been lowered) and soft limit (which causes processes to be signaled but not necessarily killed, and which may be raised) is made with these flags:
-H  set or inquire about hard limits
-S  set or inquire about soft limits. If neither -H nor -S is specified, the soft limit is displayed or both limits are set. If both are specified, the last one wins.
The limit to be interrogated or set, then, is chosen by specifying any one of these flags:
-a  show all the current limits
-t  show or set the limit on CPU time (in seconds)
-f  show or set the limit on the largest file that can be created (in 512-byte blocks)
-d  show or set the limit on the data segment size of a process (in kilobytes)
-s  show or set the limit on the stack size of a process (in kilobytes)
-c  show or set the limit on the largest core dump size that can be produced (in 512-byte blocks)
-m  show or set the limit on the total physical memory that can be in use by a process (in kilobytes)
-l  show or set the limit on how much memory a process can lock with mlock(2) (in kilobytes)
-p  show or set the limit on the number of processes this user can have at one time
-n  show or set the limit on the number of files a process can have open at once
-v  show or set the limit on the total virtual memory that can be in use by a process (in kilobytes)
-r  show or set the limit on the real-time scheduling priority of a process
If none of these is specified, it is the limit on file size that is shown or set. If value is specified, the limit is set to that number; otherwise the current limit is displayed.
Limits of an arbitrary process can be displayed or set using the sysctl(8) utility.
umask [mask]
Set the value of umask (see umask(2)) to the specified octal
value. If the argument is omitted, the umask value is printed.
unalias [-a] [name]
If name is specified, the shell removes that alias. If -a is specified, all aliases are removed.
unset [-fv] name ...
The specified variables and functions are unset and unexported. If -f or -v is specified, the corresponding function or variable is unset, respectively. If a given name corresponds to both a variable and a function, and no options are given, only the variable is unset.
wait [job]
Wait for the specified job to complete and return the exit status
of the last process in the job. If the argument is omitted, wait
for all jobs to complete and return an exit status of zero.
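A sketch of running work in the background and collecting exit statuses with `$!` and wait:

```sh
( exit 3 ) &          # a background job that fails with status 3
pid=$!                # PID of the most recent background job
echo "doing other work while $pid runs"
wait "$pid"
status=$?             # exit status of that job: 3

# Start several jobs in parallel, then wait for all of them.
for n in 1 2 3; do
  ( sleep 1 ) &
done
wait                  # with no argument, returns 0 once every job has finished
all=$?
echo "status=$status all=$all"
```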
Command Line Editing
When dash is being used interactively from a terminal, the current command and the command history (see fc in Builtins) can be edited using vi-mode command-line editing. This mode uses commands, described below, similar to a subset of those described in the vi man page. The command ‘set -o vi’ enables vi-mode editing and places sh into vi insert mode. With vi-mode enabled, sh can be switched between insert mode and command mode. It is similar to vi: typing ⟨ESC⟩ enters vi command mode. Hitting ⟨return⟩ while in command mode will pass the line to the shell.
EEXXIITT SSTTAATTUUSS
Errors that are detected by the shell, such as a syntax error, will cause
the shell to exit with a non-zero exit status. If the shell is not an
interactive shell, the execution of the shell file will be aborted. Oth‐
erwise the shell will return the exit status of the last command exe‐
cuted, or if the exit builtin is used with a numeric argument, it will
return the argument.
ENVIRONMENT
HOME Set automatically by login(1) from the user's login directory
in the password file (passwd(4)). This environment variable
also functions as the default argument for the cd builtin.
PATH The default search path for executables. See the above
section Path Search.
CDPATH The search path used with the cd builtin.
MAIL The name of a mail file, that will be checked for the arrival
of new mail. Overridden by MAILPATH.
MAILCHECK The frequency in seconds that the shell checks for the arrival
of mail in the files specified by the MAILPATH or the MAIL
file. If set to 0, the check will occur at each prompt.
MAILPATH A colon “:” separated list of file names, for the shell to
check for incoming mail. This environment setting overrides
the MAIL setting. There is a maximum of 10 mailboxes that can
be monitored at once.
PS1 The primary prompt string, which defaults to “$ ”, unless you
are the superuser, in which case it defaults to “# ”.
PS2 The secondary prompt string, which defaults to “> ”.
PS4 Output before each line when execution trace (set -x) is
enabled, defaults to “+ ”.
IFS Input Field Separators. This is normally set to ⟨space⟩,
⟨tab⟩, and ⟨newline⟩. See the White Space Splitting section
for more details.
TERM The default terminal setting for the shell. This is inherited
by children of the shell, and is used in the history editing
modes.
HISTSIZE The number of lines in the history buffer for the shell.
PWD The logical value of the current working directory. This is
set by the cd command.
OLDPWD The previous logical value of the current working directory.
This is set by the cd command.
PPID The process ID of the parent process of the shell.
FILES
$HOME/.profile
/etc/profile
SEE ALSO
csh(1), echo(1), getopt(1), ksh(1), login(1), printf(1), test(1),
getopt(3), passwd(5), environ(7), sysctl(8)
HISTORY
dash is a POSIX-compliant implementation of /bin/sh that aims to be as
small as possible. dash is a direct descendant of the NetBSD version of
ash (the Almquist SHell), ported to Linux in early 1997. It was renamed
to dash in 2002.
BUGS
Setuid shell scripts should be avoided at all costs, as they are a
significant security risk.
PS1, PS2, and PS4 should be subject to parameter expansion before being
displayed.
BSD January 19, 2003 BSD
Using the man (manual) command we see that on this machine sh points to the dash shell.
For convenience, we will use the %%bash cell magic in the rest of the lecture to write our scripts.
To figure out which executable a command is bound to (you can have several Python interpreters alongside each other on your system), use which:
!which python
/home/mirok/miniconda3/envs/in3110/bin/python
Variables#
Assign a variable by var=value (NOTE: no spaces around =!). Retrieve the value of the variable by ${var} or $var
%%bash
#!/usr/bin/bash
cmd=echo # Command names can be stored in variables and run later
greet="Hi"
${cmd} ${greet} world $greet!
# Undefined variables result in empty string
${cmd} ${greet} ${world}!
Hi world Hi!
Hi !
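The rule "no spaces around =" is easy to break, so here is a small sketch of what goes wrong. With spaces, bash does not see an assignment at all: it tries to run a command named after the variable (here the made-up name greet, assumed not to exist on the system) and the variable keeps its old value.

```bash
#!/usr/bin/env bash
greet="Hi"        # correct assignment: no spaces around =
status=0
# With spaces, "greet" is looked up as a command and "=" and "Hello" become
# its arguments; no such command exists, so bash reports exit status 127.
greet = "Hello" 2>/dev/null || status=$?
echo "greet is still: $greet (exit status $status)"
```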
There are also special variables defined in the environment. By convention their names are all uppercase. As an example, recall that when running hello_world.sh above we specified the path to the script:
! ./scripts/hello_world.sh
hello world!
Calling the script by its name alone, i.e. just hello_world.sh, would instead give a "command not found" error. To fix this problem, recall the role of PYTHONPATH in looking up Python modules by the Python interpreter. In fact, PYTHONPATH is an environment variable:
A similar role is played by the environment variable PATH, which specifies the directories (separated by colons and searched in order) in which to look for program executables.
! echo $PATH
/home/mirok/Documents/Teaching/UiO-IN3110.github.io/lectures/command-line/scripts/scripts:/home/mirok/Documents/Software/AnaMorph/bin:/home/mirok/Documents/Software/visit3_3_3.linux-x86_64/bin:/home/mirok/Documents/Software/Fiji.app:/home/mirok/Documents/Software/julia:/home/mirok/Documents/Software/ParaView-5.11.0-MPI-Linux-Python3.9-x86_64/bin:/home/mirok/miniconda3/envs/in3110/bin:/home/mirok/miniconda3/condabin:/home/mirok/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/snap/bin
To run our script simply as hello_world.sh, we need to modify the environment. Consider the following
%%bash
new_PATH="$PWD/scripts:$PATH"
echo $new_PATH
/home/mirok/Documents/Teaching/UiO-IN3110.github.io/lectures/command-line/scripts:/home/mirok/Documents/Teaching/UiO-IN3110.github.io/lectures/command-line/scripts/scripts:/home/mirok/Documents/Software/AnaMorph/bin:/home/mirok/Documents/Software/visit3_3_3.linux-x86_64/bin:/home/mirok/Documents/Software/Fiji.app:/home/mirok/Documents/Software/julia:/home/mirok/Documents/Software/ParaView-5.11.0-MPI-Linux-Python3.9-x86_64/bin:/home/mirok/miniconda3/envs/in3110/bin:/home/mirok/miniconda3/condabin:/home/mirok/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/snap/bin
Here we have computed the value assigned to new_PATH by building up a string from the PWD environment variable (the current working directory, which the pwd command also prints) and the old PATH. Note that we prepend our directory to the list so that it takes precedence. To update PATH we could continue as follows
%%bash
new_PATH="$PWD/scripts:$PATH"
export PATH=$new_PATH # PATH is set
echo $PATH
# Navigate somewhere else so that we don't get lucky
cd $HOME
echo "Now at" $PWD
# Call
echo
hello_world.sh
/home/mirok/Documents/Teaching/UiO-IN3110.github.io/lectures/command-line/scripts:/home/mirok/Documents/Teaching/UiO-IN3110.github.io/lectures/command-line/scripts/scripts:/home/mirok/Documents/Software/AnaMorph/bin:/home/mirok/Documents/Software/visit3_3_3.linux-x86_64/bin:/home/mirok/Documents/Software/Fiji.app:/home/mirok/Documents/Software/julia:/home/mirok/Documents/Software/ParaView-5.11.0-MPI-Linux-Python3.9-x86_64/bin:/home/mirok/miniconda3/envs/in3110/bin:/home/mirok/miniconda3/condabin:/home/mirok/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/snap/bin
Now at /home/mirok
hello world!
Here we have used the command cd to change the directory to HOME, an environment variable holding the user's home directory, here
! echo $HOME
/home/mirok
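The PATH mechanism can be reproduced in a self-contained way. The following is a minimal sketch that uses a throwaway directory from mktemp and a hypothetical script greet_me.sh instead of the lecture's scripts/ folder. Once the directory is prepended to PATH, the script can be called by name from anywhere.

```bash
#!/usr/bin/env bash
bindir=$(mktemp -d)                 # throwaway directory for our "programs"
# Write a tiny executable script; the name greet_me.sh is made up.
cat > "$bindir/greet_me.sh" <<'EOF'
#!/bin/sh
echo "hello from PATH"
EOF
chmod u+x "$bindir/greet_me.sh"

export PATH="$bindir:$PATH"         # prepend: our directory is searched first

cd / || exit 1                      # move away so the current directory does not help
found=$(command -v greet_me.sh)     # full path of what bash would run
output=$(greet_me.sh)               # called by name only
echo "$found -> $output"
```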
NOTE: There is a pitfall here, since each notebook cell is executed in its own process. Exported variables are only passed on to child processes, so they are not visible in the next cell, which is not a child process:
! echo $PATH
/home/mirok/Documents/Software/AnaMorph/bin:/home/mirok/Documents/Software/visit3_3_3.linux-x86_64/bin:/home/mirok/Documents/Software/Fiji.app:/home/mirok/Documents/Software/julia:/home/mirok/Documents/Software/ParaView-5.11.0-MPI-Linux-Python3.9-x86_64/bin:/home/mirok/miniconda3/envs/in3110/bin:/home/mirok/miniconda3/condabin:/home/mirok/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/snap/bin
So we will do this outside the notebook, in a terminal, i.e. within one running shell session. There we can suspend the running foreground process with Ctrl+Z, set the variable, and then bring the process back to the (f)ore(g)round with fg. Alternatively, bg resumes the suspended process in the (b)ack(g)round.
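The parent/child rule behind this pitfall can be checked directly. In the sketch below (DEMO_VAR and plain are made-up names), a child bash process sees only the exported variable, and a change made inside a subshell does not flow back to the parent.

```bash
#!/usr/bin/env bash
unset DEMO_VAR                       # start from a clean state
plain="not exported"
export DEMO_VAR="exported"

# A child process inherits only exported variables.
child_view=$(bash -c 'echo "${plain:-<unset>} / ${DEMO_VAR:-<unset>}"')

# Assignments made in a child (here a subshell) never reach the parent.
( DEMO_VAR="changed in child" )
echo "child saw: $child_view; parent still has: $DEMO_VAR"
```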
Some other examples of setting variables from the output of commands (command substitution)
%%bash
weekday=$(date +"%A %Y-%m-%d %H:%M:%S") # date is an external command; %A formats the day of the week
echo "Today is $weekday."
Today is onsdag 2023-11-01 12:43:00.
%%bash
# Backticks are an older syntax for the same command substitution as $(...)
files=`ls ..`
echo $files
13_scikit_learn 14-julia-ml about best_practices command-line mixed-programming numerical-python pandas Peer-review information.ipynb production pull-request python regular-expressions tips_and_tricks visualisation web web-servers
As said before, the ls command lists the contents of a directory.
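One practical reason to prefer $(...) over backticks is that it nests without any escaping. A short sketch on a throwaway directory with made-up file names:

```bash
#!/usr/bin/env bash
dir=$(mktemp -d)
touch "$dir/a.txt" "$dir/b.txt" "$dir/c.csv"

n_txt=$(ls "$dir" | grep -c '\.txt$')             # count the .txt files
parent=$(basename "$(dirname "$dir/a.txt")")      # nested substitution
old_style=`ls "$dir" | wc -l`                     # backticks: same idea, no nesting
echo "$n_txt .txt files, $old_style files in total, folder $parent"
```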
Typed variables#
By default variables are untyped and treated as strings (character arrays)
%%bash
x=5
x=$x++5
echo $x
5++5
We can be explicit about the type of variable
%%bash
declare -i b # define an integer variable b
a=5
b=$a+5
echo $b
10
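Integer arithmetic does not require declare -i. The arithmetic expansion $(( )) substitutes a result, and the arithmetic command (( )) updates a variable in place. Bash arithmetic is integer-only, as this sketch shows:

```bash
#!/usr/bin/env bash
a=5
b=$((a + 5))      # arithmetic expansion: the result is substituted
((a += 2))        # arithmetic command: modifies a in place
q=$((7 / 2))      # integer division truncates
r=$((7 % 2))      # remainder
echo "$a $b $q $r"
```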
Or express that the variable is constant/read-only
%%bash
declare -r r=10
echo $r
r=5
10
bash: line 3: r: readonly variable
Bash also supports the array type
%%bash
declare -a array=("foo" "bar") # array
echo ${array[0]} # First array value
echo ${array[@]} # All array values
echo ${#array} # !!! NOT the array size: the length of the first element, "foo"
echo ${#array[@]} # The number of elements
foo
foo bar
3
2
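A few more array operations that come up in practice. This sketch uses made-up file names; quoting "${files[@]}" keeps an element that contains a space as one item.

```bash
#!/usr/bin/env bash
declare -a files=("data one.txt" "b.csv")
files+=("c.txt")             # append an element
count=${#files[@]}           # number of elements
first_len=${#files}          # length of files[0], "data one.txt"
joined=""
for f in "${files[@]}"; do   # quoted: elements with spaces stay whole
    joined+="[$f]"
done
echo "$count $first_len $joined"
```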
Flow control and functions#
For flow control we shall discuss the if and case statements and the for and while loops.
if statement
%%bash
declare name="Joe2"
# Here we are comparing 2 strings
if [ "$name" == "Joe" ] # quoting "$name" keeps the test valid even if name is empty
then
echo "Joseph"
else
echo "Don't know"
fi
Don't know
Note that [ is not a bracket for grouping but a command (equivalent to test), so the spaces after [ and before ] are required.
%%bash
declare -i -r number=10
# Here we are comparing numbers
if [ $number -gt 10 ] # -eq -le
then
echo "The variable is greater than 10."
else
echo "The variable is at most 10"
fi
The variable is at most 10
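Since [ is an ordinary command, it has an exit status like any other. This sketch checks that bash reports it as a builtin and that test gives the same results:

```bash
#!/usr/bin/env bash
kind_of_bracket=$(type -t '[')    # "builtin"
name="Joe"
s1=0; [ "$name" = "Joe" ] || s1=$?    # 0: the comparison holds
s2=0; test "$name" = "Ann" || s2=$?   # 1: same command, other spelling
# Without the spaces, [$name = "Joe"] would try to run a command named "[Joe".
echo "$kind_of_bracket $s1 $s2"
```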
We can do if-elif branching, and the tests can be combined with && (AND) or || (OR). Below we also introduce parameter expansion ${ } to grab substrings or get the length of strings, and $(( )) to perform some simple arithmetic
%%bash
declare name="Blph"
# Joey
if [ $name == "Joe" ]
then
echo Name is Joe
fi
# AND
if [ ${name: 0:1} == "J" ] && [ ${name: -1:1} == "y" ]
then
echo The first letter is J and last is y
# OR
elif [ ${name: 0:1} == "A" ] || [ ${#name} -eq $((2+2)) ]
then
echo The first letter is A or name length is $((1+3))
else
[ ${#name} -eq 5 ] && echo "Don't know for 5 char long name"
fi
# NOTE: when the test in the else branch fails, the last command has a non-zero
# exit status, which makes IPython complain; we therefore end with an explicit
# success status
exit 0
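The first letter is A or name length is 4
exit with a status number is used to indicate successful or failed execution; 0 means success. There is a special variable, $?, which captures the exit status of the preceding command. Before illustrating it, a short aside on the substring expansion ${name: offset:length} used above: a negative offset counts from the end of the string. The space before the minus sign is needed, since ${name:-1} would instead mean "use 1 if name is unset or empty".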
%%bash
name="Joey"
echo 1 ${name: -1:0}
echo 2 ${name: -1:1}
echo 3 ${name: -1:2}
1
2 y
3 y
Let’s illustrate the exit status
%%bash
name="alex" # alexa
[ ${#name} -eq 5 ] && echo "Exec only when name 5"
if [ "$?" == "0" ]
then
echo There was no problem
else
echo There was a problem
fi
There was a problem
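Rather than inspecting $? by hand, if can test the exit status of any command directly. In this sketch, has_word is a hypothetical helper built on grep -q, which prints nothing and only reports success (0) or no match (1).

```bash
#!/usr/bin/env bash
has_word() {                     # hypothetical: does file $1 contain word $2?
    grep -qw -- "$2" "$1"
}
tmp=$(mktemp)
printf 'alpha\nbeta\n' > "$tmp"

if has_word "$tmp" beta; then found_beta=yes; else found_beta=no; fi
status=0
has_word "$tmp" gamma || status=$?   # grep exits with 1 when nothing matches
echo "$found_beta $status"
```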
There are handy tests for the existence of files and directories. For example, we can check
%%bash
dir='scripts'
if [ -d $dir ]
then
echo There is $dir directory
cp -r $dir $dir.bk
ls . # . is the current directory, .. is the parent directory
echo
if [ -x "$dir/hello_world.sh" ]
then
echo $dir contains executable
fi
fi
There is scripts directory
allPDFs.tar
allPDFs.tar.gz
Bash - interactive lecture.ipynb
Bash - interactive lecture.slides.html
cmdline_bash.ipynb
data
figs
hello-world
hw.sh
Makefile
ottar_scicomm.pdf
results
run_and_test.sh
scripts
scripts.bk
Valgkort_2023.pdf
scripts contains executable
Here we have used the copy command cp with the -r (recursive) switch.
Other test switches:
-h FILE - True if FILE exists and is a symbolic link.
-r FILE - True if FILE exists and is readable.
-w FILE - True if FILE exists and is writable.
-x FILE - True if FILE exists and is executable.
-d FILE - True if FILE exists and is a directory.
-e FILE - True if FILE exists, regardless of type.
-f FILE - True if FILE exists and is a regular file (not a directory or device).
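The switches above can be combined into a small classifier. kind is a hypothetical helper. The symlink test comes first because -d and -f follow links.

```bash
#!/usr/bin/env bash
kind() {
    if   [ -h "$1" ]; then echo "symlink"
    elif [ -d "$1" ]; then echo "directory"
    elif [ -f "$1" ] && [ -x "$1" ]; then echo "executable file"
    elif [ -f "$1" ]; then echo "regular file"
    elif [ -e "$1" ]; then echo "other"        # e.g. a device or a pipe
    else echo "missing"
    fi
}
dir=$(mktemp -d)
touch "$dir/plain.txt"
printf '#!/bin/sh\n' > "$dir/run.sh"; chmod u+x "$dir/run.sh"
ln -s "$dir/plain.txt" "$dir/link"
for p in "$dir" "$dir/plain.txt" "$dir/run.sh" "$dir/link" "$dir/nope"; do
    echo "$(basename "$p"): $(kind "$p")"
done
```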
case statement
To simplify writing nested if statements, especially when the branching is a case analysis/pattern matching, we use the case construct. This will be useful e.g. for parsing command-line arguments (see later)
%%bash
place="Oslo"
case $place in
Oslo)
m=4;; # ;; indicates end of case
Bergen)
m=5;;
*)
m=-1
esac
echo $m
4
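The branches of case are glob patterns, not just fixed strings, and | separates alternatives in one branch. A sketch with a hypothetical helper file_type:

```bash
#!/usr/bin/env bash
file_type() {
    case "$1" in
        *.txt|*.md) echo "text" ;;
        *.csv)      echo "table" ;;
        [0-9]*)     echo "starts with a digit" ;;
        *)          echo "unknown" ;;        # default branch
    esac
}
for f in notes.md data.csv 2023.log image.png; do
    echo "$f -> $(file_type "$f")"
done
```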
for loop
Consider this setup where we loop over a bunch of parameters to perform a “simulation” whose result we want to store
%%bash
experiments="first second third"
dir=results
if [ -d $dir ]
then
echo $dir exists
else
mkdir $dir
fi
declare -i counter
counter=0
for e in $experiments
do
echo running $e
sleep 0.2
touch $e.txt # Touch/create empty file with that name
cp $e.txt $dir # Back it up
rm -vf $e.txt # Remove the original
((counter=counter+1)) # Increase the counter
done
echo Performed $counter experiments
results exists
running first
removed 'first.txt'
running second
removed 'second.txt'
running third
removed 'third.txt'
Performed 3 experiments
Here we have used the make-directory command mkdir. The simulation was mocked up by the sleep command, which delays execution by the given number of seconds, and the results were created by touch. Finally, we removed the original results with rm.
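Besides looping over a list of words, a for loop can also run over a brace-expanded range or use a C-style arithmetic header. A short sketch:

```bash
#!/usr/bin/env bash
names=""
for i in {1..3}; do             # brace expansion produces 1 2 3
    names+="run_$i "
done
sum=0
for ((i = 0; i < 5; i++)); do   # C-style loop over 0..4
    sum=$((sum + i))
done
echo "$names| $sum"
```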
The previous example illustrates a common situation where the tasks in the loop could execute in parallel rather than serially as done above. Launching the tasks in parallel can be done with &, which runs a command in the background. For reference, here is first the serial version
%%bash
experiments="first second third"
for e in $experiments
do
sleep 1 && echo Launched $e
done
Launched first
Launched second
Launched third
In contrast, the parallel version runs quicker, as expected. Note that the order of the output is no longer guaranteed
%%bash
experiments="first second third"
for e in $experiments
do
sleep 1 && echo Launched $e &
done
Launched first
Launched third
Launched second
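In a script, we usually need all background jobs to finish before using their results. The wait builtin (described in the man page above) blocks until every job is done. In this sketch each job writes to its own file in a throwaway directory. SECONDS is the bash variable counting seconds since the shell started.

```bash
#!/usr/bin/env bash
outdir=$(mktemp -d)
start=$SECONDS
for e in first second third; do
    (sleep 1 && echo "done $e" > "$outdir/$e.txt") &   # run in the background
done
wait                           # without this we could read outdir too early
elapsed=$((SECONDS - start))   # about 1 s instead of 3 s
n_done=$(ls "$outdir" | wc -l)
echo "$n_done jobs finished in about ${elapsed}s"
```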
while loop
Consider the task of counting lines in a file
%%bash
filename="./data/text.txt"
declare -i count; count=0
echo "Start counting..."
# loop over all lines of file
while IFS= read -r p # -r keeps backslashes, IFS= keeps leading whitespace
do
# echo $p
# increase line counter
((count++))
done < "$filename"
echo "done"
echo "Number of lines in $filename: $count"
wc -l $filename # We compare with the wc utility
Start counting...
done
Number of lines in ./data/text.txt: 13
13 ./data/text.txt
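while read can also split each line into fields. Setting IFS only for the read command tells it which separator to use. A sketch on a small made-up CSV file, where the header is consumed first:

```bash
#!/usr/bin/env bash
csv=$(mktemp)
printf 'name,age\nJoe,34\nAna,30\n' > "$csv"   # made-up data
total=0
{
    read -r header                    # skip the header line
    while IFS=, read -r name age; do  # split each line on commas
        total=$((total + age))
        echo "$name is $age"
    done
} < "$csv"
echo "total age: $total"
```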
Color printing by setting terminal properties
%%bash
declare -i index; index=1
while [ $index -le 4 ]
do
tput setaf $index # Foreground
tput setab $((index+1)) # Background
echo Index is $index
tput sgr0 # Restore the default attributes
((index++))
done
Index is 1
Index is 2
Index is 3
Index is 4
Functions#
Functions are declared with the function keyword (the POSIX form name() { ... } works too) and called by their name followed by arguments. Note that by default, variables assigned inside the function body are global
%%bash
myresult="Nothing"
function greet
{
echo "greet was called"
myresult='some value' # Global
insideresult="What" # Global
}
echo $myresult
greet # Call
echo $myresult $insideresult
Nothing
greet was called
some value What
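To keep a variable private to a function, declare it with local. Functions "return" values by printing them, and the caller captures the output with $( ). A sketch with hypothetical functions bump and square:

```bash
#!/usr/bin/env bash
counter=0
bump() {
    local step=${1:-1}            # local: invisible outside; defaults to 1
    counter=$((counter + step))   # no local: modifies the global counter
}
square() { echo $(( $1 * $1 )); } # "return" a value by printing it

bump
bump 5
sq=$(square 4)
echo "counter=$counter step=${step:-<unset>} sq=$sq"
```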
The arguments of a function are accessed through special variables: $# holds the number of arguments, $1, $2, ... the individual arguments and $@ all of them
%%bash
function foo
{
echo "foo called with $# arguments" # $# is the arg count
echo "The first one is $0" # NOTE the zero argument is not the first one from the user!
# $1 $2 etc
# Show all of them
declare -i n; n=1
for arg in "$@"; do # quoted, so arguments containing spaces stay whole
echo "command-line argument no. $n is <$arg>"
((n++))
done
}
foo This
echo
foo This That
foo called with 1 arguments
The first one is bash
command-line argument no. 1 is <This>
foo called with 2 arguments
The first one is bash
command-line argument no. 1 is <This>
command-line argument no. 2 is <That>
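Quoting matters when arguments are passed on. Here is a sketch that counts what a hypothetical count_args receives when one argument contains a space: "$@" keeps the arguments as they were, unquoted $@ re-splits them, and "$*" glues them into one word.

```bash
#!/usr/bin/env bash
count_args()       { echo $#; }
forward_quoted()   { count_args "$@"; }
forward_unquoted() { count_args $@; }
forward_star()     { count_args "$*"; }

a=$(forward_quoted "New York" Oslo)    # 2
b=$(forward_unquoted "New York" Oslo)  # 3: "New" and "York" are split
c=$(forward_star "New York" Oslo)      # 1
echo "$a $b $c"
```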
Or we can consume the arguments one at a time, using shift to drop $1 and move the remaining arguments down
%%bash
function bar
{
while [ $# -gt 0 ]
do
option=$1; # load arg into option
shift; # move $1 pointer
case "$option" in
-n)
name=$1
shift
;;
-a)
age=$1; shift; ;;
*)
echo "$0: invalid option \"$option\""; exit 1;;
esac
done
echo $name is $age years old
}
bar -n "Jim"
#echo
bar -a 30 -n Ana
echo "Exit status "$?
echo
# bar -a 30 -b Ana
Jim is years old
Ana is 30 years old
Exit status 0
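For short single-letter options, the POSIX getopts builtin can do the shift bookkeeping for us. A sketch of the same -n/-a parsing, with a hypothetical function bar2. Note that it returns 1 on an unknown option instead of exiting the whole shell.

```bash
#!/usr/bin/env bash
bar2() {
    local opt name="" age="" OPTIND=1   # reset OPTIND so repeated calls work
    while getopts "n:a:" opt; do        # "n:" means -n takes a value
        case "$opt" in
            n) name=$OPTARG ;;
            a) age=$OPTARG ;;
            *) echo "usage: bar2 [-n name] [-a age]" >&2; return 1 ;;
        esac
    done
    echo "$name is $age years old"
}
r1=$(bar2 -a 30 -n Ana)
status=0
bar2 -b Ana 2>/dev/null || status=$?    # unknown option -b
echo "$r1 (invalid call returned $status)"
```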
Combining bash commands#
Unix processes use the following three standard streams as preconnected input and output communication channels:
user input is passed to the standard input stream, STDIN
normal information is passed to the standard output stream, STDOUT
error information is passed to the standard error stream, STDERR.
The streams can be redirected.
STDOUT to file
The bash redirect > passes STDOUT to a file:
./myscript.sh > myfile.txt
The redirect >> does the same, but appends the output to an existing file:
./myscript.sh >> myfile.txt
%%bash
chmod u+x ./scripts/hello_world_bang.sh
./scripts/hello_world_bang.sh > ./data/foo.txt
cat ./data/foo.txt
echo
for i in {1..5}
do
./scripts/hello_world_bang.sh >> ./data/foo.txt
done
cat ./data/foo.txt
hello world!
hello world!
hello world!
hello world!
hello world!
hello world!
hello world!
File to STDIN
Use the < redirect to send a file to STDIN:
%%bash
wc -w < ./data/text.txt # Count the number of words and print to STDOUT
echo
wc -w < ./data/text.txt > ./data/word_stat.txt # Same as above, but save STDOUT output to file
wc -l < ./data/text.txt > ./data/line_stat.txt
wc -m < ./data/text.txt > ./data/char_stat.txt # Characters
echo
cat ./data/word_stat.txt ./data/line_stat.txt ./data/char_stat.txt
35
35
13
239
wc prints the word (-w), line (-l) or character (-m) counts for a file
You can specify which stream to redirect with [STREAM]>. Valid values for STREAM are 1 for stdout, 2 for stderr and & for both (&> is a bash extension).
./compile_model.sh # stdout and stderr are displayed on the terminal
./compile_model.sh 1> out.txt # Redirect stdout to file, same as >
./compile_model.sh 2> err.txt # Redirect stderr to file
./compile_model.sh &> outerr.txt # Redirect stdout and stderr to file
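Two more forms are used all the time: 2>/dev/null discards error messages, and 2>&1 sends STDERR wherever STDOUT currently goes. The order matters, so > file 2>&1 puts both streams in the file. A sketch with a hypothetical noisy command:

```bash
#!/usr/bin/env bash
noisy() {                        # writes one line to each stream
    echo "result"
    echo "warning" >&2           # >&2 sends this line to STDERR
}
only_out=$(noisy 2>/dev/null)    # STDERR discarded
both=$(noisy 2>&1)               # STDERR merged into the captured STDOUT
tmp=$(mktemp)
noisy > "$tmp" 2>&1              # first point STDOUT at the file, then merge
lines_in_file=$(wc -l < "$tmp")
echo "$only_out | $(echo $both) | $lines_in_file"
```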
Combining bash commands: Pipes
The bash pipe | connects the STDOUT of one command to the STDIN of another. Let’s look at some pipeline examples
Print the file content (here single column data) in a sorted way
! head -5 ./data/names.txt
journal
lineage
excavate
charismatic
rank
%%bash
# First check how many lines there are
wc -l < ./data/names.txt
cat ./data/names.txt | sort
40
aaapath
autonomy
autonomy
biscuit
charismatic
cruel
daughter
decrease
decrease
demonstrator
demonstrator
drawer
excavate
facade
joke
joke
journal
laaaandscape
letter
liability
liability
lineage
lung
magnitude
mall
man
maniac
manipulation
maximum
maximum
missile
noble
paaalace
paaaot
rank
reign
relieve
straaaeam
suggest
suggest
Note that we possibly get a very long list. To look at only a selection, we can extend the pipeline with calls to head, tail and more, which show the beginning or the end of the text, or page through it in chunks.
!cat ./data/names.txt | sort | head -3
aaapath
autonomy
autonomy
!cat ./data/names.txt | sort | tail -3
straaaeam
suggest
suggest
# NOTE: not notebook friendly as it expects some user interaction - run in terminal
# cat ./data/names.txt | sort | more -2
Introduce T junction
Building on the previous example, we might want only the count of unique words. This can be accomplished by adding uniq to the pipeline (uniq only removes adjacent duplicates, which is why we sort first)
! cat ./data/names.txt | sort | uniq | wc -l
33
However, wouldn’t it be useful to have the list of unique words too? This is where tee comes in: it introduces a T junction in the pipeline, writing the intermediate output to a file while also passing it on
%%bash
cat ./data/names.txt | sort | uniq | tee ./data/unique_name.txt | wc -l
echo
head -6 ./data/unique_name.txt
33
aaapath
autonomy
biscuit
charismatic
cruel
daughter
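A common extension of this pipeline counts how often each word occurs. uniq -c prefixes every line with its count, and sort -rn orders the result by that count. A sketch on a small made-up sample:

```bash
#!/usr/bin/env bash
words=$(mktemp)
printf 'joke\nrank\njoke\nlung\njoke\nrank\n' > "$words"   # made-up sample

sort "$words" | uniq -c | sort -rn | head -2      # most frequent first
top=$(sort "$words" | uniq -c | sort -rn | head -1 | awk '{print $2}')
dups=$(sort "$words" | uniq -d | wc -l)           # -d: only repeated words
echo "most frequent: $top, repeated words: $dups"
```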
Combine with variables
As an example, suppose we wish to build a news app. We begin by retrieving the data using curl running in -s (silent) mode. Let’s see what we work with
%%bash
#
# Fall back if net is down
cat ./data/nrk_data.txt | grep newsfeed__message-title
<h3 class="kur-newsfeed__message-title">Samlet rester fra Meierigården – søker etter levninger</h3>
<h3 class="kur-newsfeed__message-title">Medier: Tre svensker pågrepet for drap i Bosnia</h3>
<h3 class="kur-newsfeed__message-title">Veskeforbud på større svenske arrangementer</h3>
<h3 class="kur-newsfeed__message-title">Nye angrep mot opprørere nord i Myanmar</h3>
<h3 class="kur-newsfeed__message-title">EUs utenrikssjef uttrykker sjokk etter israelsk angrep mot flyktningleir</h3>
<h3 class="kur-newsfeed__message-title">Mann i 30-årene bedro flere titalls personer med løfter om fotballbilletter</h3>
<h3 class="kur-newsfeed__message-title">Nytt angrep mot flyktningleir i Gaza</h3>
<h3 class="kur-newsfeed__message-title">TV 2 har anmeldt demonstranter som stormet «Skal vi danse»-scenen</h3>
<h3 class="kur-newsfeed__message-title">Irans utenriksminister: – Konsekvensene blir alvorlige</h3>
<h3 class="kur-newsfeed__message-title">Dømt til over fem års fengsel for grov vold i Vika</h3>
Our next step is to extract information from this text. Specifically, we are after the first headline. One possibility is to split (as with a Python string) on some delimiter and work with the “fields”/elements of the resulting array. This is the functionality of cut -d DELIMITER -fINDEX
! cat ./data/nrk_data.txt | grep newsfeed__message-title | head -1 | cut -d '>' -f2
Samlet rester fra Meierigården – søker etter levninger</h3
Following the same logic, a second cut on < removes the closing tag, and we can store the title in a variable
%%bash
title=`cat ./data/nrk_data.txt | grep newsfeed__message-title | head -1 | cut -d '>' -f2 | cut -d '<' -f1`
echo $title
Samlet rester fra Meierigården – søker etter levninger
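The same extraction can also be done with bash parameter expansion, which runs without starting extra processes: ${var#*>} drops the shortest prefix ending in >, and ${var%%<*} drops the longest suffix starting with <. This sketch works on a hard-coded line with a made-up headline instead of the downloaded data.

```bash
#!/usr/bin/env bash
line='<h3 class="kur-newsfeed__message-title">Some headline</h3>'
via_cut=$(echo "$line" | cut -d '>' -f2 | cut -d '<' -f1)
tmp=${line#*>}              # 'Some headline</h3>'
via_expansion=${tmp%%<*}    # 'Some headline'
echo "$via_cut"
echo "$via_expansion"
```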
At this point we know the basics and are in a position to “glue” different programs together. We have seen a few already, e.g. cut and sort. In the following we cover a few more which are useful in the scientific workflow.
Text manipulation utilities - grep, awk and sed#
grep - global regular expression print#
grep searches the input files line by line and prints every line that contains a match. Recall our list of words
! head -10 ./data/names.txt
journal
lineage
excavate
charismatic
rank
missile
biscuit
reign
letter
paaalace
By using grep we can answer questions like:
Are there lines containing “ma”?
! grep "ma" ./data/names.txt
charismatic
magnitude
man
mall
maniac
maximum
manipulation
maximum
What are the lines and line numbers containing “ma”? (-n)
! grep -n "ma" ./data/names.txt
4:charismatic
16:magnitude
21:man
23:mall
25:maniac
26:maximum
27:manipulation
34:maximum
Or which do not? (-v flag for lines that do not match)
! grep -v "ma" ./data/names.txt
journal
lineage
excavate
rank
missile
biscuit
reign
letter
paaalace
paaaot
straaaeam
aaapath
laaaandscape
drawer
lung
noble
relieve
facade
daughter
cruel
suggest
decrease
demonstrator
joke
autonomy
liability
suggest
decrease
demonstrator
joke
autonomy
liability
How many lines match? (-c)
! grep -c "ma" ./data/names.txt
8
Of course, we now know that the same could have been accomplished e.g. with pipes
! grep "ma" ./data/names.txt | wc -l
8
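There is support for regular expressions in the search pattern. By default grep uses basic regular expressions, which are limited (e.g. | is not an alternation operator).
# Use -l to print the files containing lines that match the regexp nu*
# (n followed by zero or more u, so any line containing an n matches)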
! grep -l "nu*" ../*/*.ipynb
../13_scikit_learn/scikit-learn-1.ipynb
../13_scikit_learn/scikit-learn-1-presentation.ipynb
../13_scikit_learn/scikit-learn-2.ipynb
../14-julia-ml/julia_examples.ipynb
../14-julia-ml/python_examples.ipynb
../14-julia-ml/stokes_pinns.ipynb
../about/About the course.ipynb
../about/Introduction to git.ipynb
../about/Scripting vs regular programming.ipynb
../best_practices/Best practices.ipynb
../command-line/Bash - interactive lecture.ipynb
../command-line/cmdline_bash.ipynb
../mixed-programming/mixed_programming_cython.ipynb
../mixed-programming/mixed_programming_introduction.ipynb
../mixed-programming/Numba.ipynb
../mixed-programming/Profiling and Optimizing with IPython.ipynb
../numerical-python/exercises.ipynb
../numerical-python/numerical_python.ipynb
../numerical-python/python_profiling.ipynb
../pandas/API-exercises.ipynb
../pandas/Pandas_exercises.ipynb
../pandas/Pandas.ipynb
../pandas/PublicAPIs.ipynb
../production/environments.ipynb
../production/sphinx-docs.ipynb
../pull-request/Peer review assignment 5.ipynb
../python/exercises.ipynb
../python/ipython.ipynb
../python/more_python.ipynb
../python/packages_and_testing.ipynb
../python/python_summary-classes.ipynb
../python/python_summary.ipynb
../python/python_summary-typing.ipynb
../regular-expressions/regular-expressions.ipynb
../tips_and_tricks/bash_rc_alias.ipynb
../tips_and_tricks/Builtin Superheroes.ipynb
../tips_and_tricks/git_branches.ipynb
../tips_and_tricks/git_gui.ipynb
../tips_and_tricks/gitignore.ipynb
../tips_and_tricks/git_ssh_keys.ipynb
../tips_and_tricks/ipython_embed.ipynb
../tips_and_tricks/prettier_git.ipynb
../tips_and_tricks/ssh_keys.ipynb
../visualisation/altair.ipynb
../visualisation/corona-data.ipynb
../visualisation/gendata.ipynb
../visualisation/maps.ipynb
../visualisation/matplotlib.ipynb
../visualisation/visualisation.ipynb
../web/Introduction to HTML.ipynb
../web-servers/Introduction to HTML - Forms.ipynb
../web-servers/Introduction to webservers.ipynb
../web-servers/monty-hall-game.ipynb
../web-servers/monty-hall-rest.ipynb
../web/web.ipynb
../web/Web scraping.ipynb
With egrep (equivalent to grep -E) we get extended regular expressions, where e.g. | separates alternatives
! egrep -l "np|numpy|python|import" ../*/*.ipynb
../13_scikit_learn/scikit-learn-1.ipynb
../13_scikit_learn/scikit-learn-1-presentation.ipynb
../13_scikit_learn/scikit-learn-2.ipynb
../14-julia-ml/julia_examples.ipynb
../14-julia-ml/python_examples.ipynb
../14-julia-ml/stokes_pinns.ipynb
../about/About the course.ipynb
../about/Introduction to git.ipynb
../about/Scripting vs regular programming.ipynb
../best_practices/Best practices.ipynb
../command-line/Bash - interactive lecture.ipynb
../command-line/cmdline_bash.ipynb
../mixed-programming/mixed_programming_cython.ipynb
../mixed-programming/mixed_programming_introduction.ipynb
../mixed-programming/Numba.ipynb
../mixed-programming/Profiling and Optimizing with IPython.ipynb
../numerical-python/exercises.ipynb
../numerical-python/numerical_python.ipynb
../numerical-python/python_profiling.ipynb
../pandas/API-exercises.ipynb
../pandas/Pandas_exercises.ipynb
../pandas/Pandas.ipynb
../pandas/PublicAPIs.ipynb
../production/environments.ipynb
../production/sphinx-docs.ipynb
../pull-request/Peer review assignment 5.ipynb
../python/exercises.ipynb
../python/ipython.ipynb
../python/more_python.ipynb
../python/packages_and_testing.ipynb
../python/python_summary-classes.ipynb
../python/python_summary.ipynb
../python/python_summary-typing.ipynb
../regular-expressions/regular-expressions.ipynb
../tips_and_tricks/bash_rc_alias.ipynb
../tips_and_tricks/Builtin Superheroes.ipynb
../tips_and_tricks/git_gui.ipynb
../tips_and_tricks/ipython_embed.ipynb
../tips_and_tricks/prettier_git.ipynb
../visualisation/altair.ipynb
../visualisation/corona-data.ipynb
../visualisation/gendata.ipynb
../visualisation/maps.ipynb
../visualisation/matplotlib.ipynb
../visualisation/visualisation.ipynb
../web/Introduction to HTML.ipynb
../web-servers/Introduction to HTML - Forms.ipynb
../web-servers/Introduction to webservers.ipynb
../web-servers/monty-hall-game.ipynb
../web-servers/monty-hall-rest.ipynb
../web/web.ipynb
../web/Web scraping.ipynb
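A side-by-side comparison on a small made-up file makes the difference concrete. In a basic regular expression the | is a literal character, so nothing matches. Two other frequently used flags are -w (whole words only) and -o (print only the matching part).

```bash
#!/usr/bin/env bash
f=$(mktemp)
printf 'numpy\nnp.array\nmatplotlib\nman\n' > "$f"

basic=$(grep -c 'np|numpy' "$f" || true)   # literal "np|numpy": 0 lines
extended=$(grep -cE 'np|numpy' "$f")       # alternation: 2 lines
whole=$(grep -cw 'man' "$f")               # only the word "man": 1 line
only=$(grep -oE 'n[a-z]+' "$f" | head -1)  # first matching part: numpy
echo "$basic $extended $whole $only"
```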
awk is a text pattern scanning and processing language. It operates on the lines of the input, which it sees as being made of fields marked by a separator (whitespace by default). This allows us to extract information and do further processing.
Let’s use awk to extract the file permission column
%%bash
# Unpack this
ls -lrvrh
echo
ls -lrvrh | awk '{print $1}' | head -5
total 912K
drwxrwxr-x 4 mirok mirok 4,0K nov. 1 13:16 scripts.bk
drwxrwxr-x 3 mirok mirok 4,0K nov. 1 12:28 scripts
-rwxrwxr-x 1 mirok mirok 127 sep. 5 16:42 run_and_test.sh
drwxrwxr-x 2 mirok mirok 4,0K nov. 1 10:12 results
-rw-rw-r-- 1 mirok mirok 33K nov. 1 11:02 ottar_scicomm.pdf
-rw-rw-r-- 1 mirok mirok 30 sep. 5 16:42 hw.sh
drwxrwxr-x 2 mirok mirok 4,0K sep. 5 16:42 hello-world
drwxrwxr-x 2 mirok mirok 4,0K sep. 5 16:42 figs
drwxrwxr-x 2 mirok mirok 4,0K nov. 1 10:49 data
-rw-rw-r-- 1 mirok mirok 189K nov. 1 13:48 cmdline_bash.ipynb
-rw-rw-r-- 1 mirok mirok 80K nov. 1 11:02 allPDFs.tar.gz
-rw-rw-r-- 1 mirok mirok 90K nov. 1 11:02 allPDFs.tar
-rw-rw-r-- 1 mirok mirok 51K nov. 1 11:02 Valgkort_2023.pdf
-rw-rw-r-- 1 mirok mirok 180 sep. 5 16:42 Makefile
-rw-rw-r-- 1 mirok mirok 401K sep. 5 16:42 Bash - interactive lecture.slides.html
-rw-rw-r-- 1 mirok mirok 18K sep. 5 16:42 Bash - interactive lecture.ipynb
total
drwxrwxr-x
drwxrwxr-x
-rwxrwxr-x
drwxrwxr-x
Combined with grep, we can count the entries that have an execute permission bit (note that this includes directories)
%%bash
ls -l | awk '{print $1}' | egrep "x."
echo
ls -l | awk '{print $1}' | egrep -c "x."
drwxrwxr-x
drwxrwxr-x
drwxrwxr-x
drwxrwxr-x
-rwxrwxr-x
drwxrwxr-x
drwxrwxr-x
7
and count their total size in bytes
# Unpack
!ls -l | awk '{print $1, $5}' | egrep "x." | awk 'BEGIN {sum=0} {sum=sum+$2} END {print sum}'
24703
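To unpack the last pipeline: an awk program is a list of pattern { action } pairs. The action runs only on lines matching the pattern, BEGIN runs before the first line, and END runs after the last one. In this sketch, the regex test $1 ~ /x/ takes over the job of the egrep step. The three-column listing is made up to resemble ls -l output.

```bash
#!/usr/bin/env bash
listing=$(mktemp)
cat > "$listing" <<'EOF'
drwxrwxr-x 4096 scripts
-rw-rw-r-- 180 Makefile
-rwxrwxr-x 127 run_and_test.sh
EOF
# Count and sum the sizes of the entries whose permission field contains an x.
awk 'BEGIN { sum = 0 } $1 ~ /x/ { sum += $2; n++ } END { print n, sum }' "$listing"
total=$(awk '$1 ~ /x/ { sum += $2 } END { print sum }' "$listing")
```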
Of course, the delimiter can be specified with -F. For example, with a CSV file from the Pandas lecture we would work with a comma separator
%%bash
head -5 ./data/used_car_sales.csv
echo
awk -F "," '{print $1}' ./data/used_car_sales.csv | head -10
"ID","pricesold","yearsold","zipcode","Mileage","Make","Model","Year","Trim","Engine","BodyType","NumCylinders","DriveType"
"121144","3500","2020","430**","101249","Chrysler","300 Series","2006","TOURING","3.5L MPI 24-VALVE HO V6","Sedan","6","RWD"
"155642","29000","2020","386**","25165","Chevrolet","Corvette","2007","","","Coupe","0",""
"59517","4000","2019","33707","210500","Chevrolet","Silverado 2500","2002","LT","6.6L Turbo Diesel Duramax","Crew Cab Pickup","8","4WD"
"56873","10010","2019","01501","21632","Chevrolet","Camaro","1987","","350","Coupe","8","RWD"
"ID"
"121144"
"155642"
"59517"
"56873"
"5550"
"46260"
"73673"
"84557"
"15603"
sed (stream editor) allows us to transform the text of the input stream, e.g. filter lines or perform substitutions. Here we will pass the sed commands on the command line with -e.
The first use case we consider is sed -e 's/pattern/substitute/' file, where we run the s (substitution) command: sed consumes the stream and, on each line, replaces the first match of pattern with substitute.
!head -8 ./data/names_columns.txt
journal maa
lineage lineage
excavate excavate
charismatic charismatic
rank maa
missile missile
biscuit biscuit
reign reign
! sed -e 's/ma*/XXX/g' ./data/names_columns.txt | head -8
journal XXX
lineage lineage
excavate excavate
charisXXXtic charisXXXtic
rank XXX
XXXissile XXXissile
biscuit biscuit
reign reign
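Note that the g in /g above stands for global: all matches on a line are replaced, not only the first one. (Also note that ma* means m followed by zero or more a, which is why missile became XXXissile.) Can you spot the difference without /g?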
! sed -e 's/ma*/XXX/' ./data/names_columns.txt | head -8
journal XXX
lineage lineage
excavate excavate
charisXXXtic charismatic
rank XXX
XXXissile missile
biscuit biscuit
reign reign
We can redirect the output to a new file with
sed -e 's/ma*/XXX/g' ./data/names_columns.txt > ./data/names_modif.txt
or perform the substitution in place (-i must come before -e, because -e takes the next argument as the sed script)
sed -i -e 's/ma*/XXX/g' ./data/names_columns.txt
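In-place editing overwrites the data, so it is worth keeping a backup. Below is a minimal sketch, assuming GNU sed. There, -i.bak edits the file in place and saves the original with a .bak suffix. The function name sub_in_place and the scratch file contents are hypothetical.

```shell
#!/usr/bin/env bash
# sub_in_place FILE: replace "m" followed by any number of "a"s with XXX,
# editing FILE in place (GNU sed) and keeping the original as FILE.bak
sub_in_place() {
    # -i must come before -e: in "sed -e -i ..." the "-i" would be taken as the script
    sed -i.bak -e 's/ma*/XXX/g' "$1"
}

# Work on a scratch file (hypothetical content) so the real data stays untouched
tmp=$(mktemp)
trap 'rm -f "$tmp" "$tmp.bak"' EXIT
printf '%s\n' 'rank maa' 'missile missile' > "$tmp"

sub_in_place "$tmp"
cat "$tmp"        # modified: rank XXX / XXXissile XXXissile
cat "$tmp.bak"    # original: rank maa / missile missile
```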
The patterns can be full regular expressions. Let’s use sed to hide the numbers in the phone book (where we pretend that all numbers have exactly 3 digits):
! head -5 ./data/contacts.txt
# This
# is a
# comment
123 joe
333 miro
! sed -e 's/[0-9][0-9][0-9]/xxxy/g' ./data/contacts.txt
# This
# is a
# comment
xxxy joe
xxxy miro
ana
xxxy peter
lucy xxxy
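Writing [0-9] three times works, but a repetition count is shorter. Below is a minimal sketch, assuming a sed that supports -E for extended regular expressions (GNU sed does). With -E, {3} means "exactly three times". In sed's default basic syntax the same pattern is written [0-9]\{3\}. The function name mask_numbers is hypothetical.

```shell
#!/usr/bin/env bash
# mask_numbers: replace every group of three consecutive digits with "xxx".
# -E switches sed to extended regular expressions, where {3} is a repetition count.
mask_numbers() {
    sed -E -e 's/[0-9]{3}/xxx/g'
}

printf '%s\n' '123 joe' 'ana' 'lucy 222' | mask_numbers   # xxx joe / ana / lucy xxx
```

The pattern counts digits, not whole numbers, so 1234 becomes xxx4.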
Another use case is to perform an action on the lines that match a pattern. The first action we will use is p for print. Let’s print all the directories using sed (in the output of ls -l their lines start with d):
!ls -l | sed -n -e '/^d/ p' # vs no -n
drwxrwxr-x 2 mirok mirok 4096 nov. 1 10:49 data
drwxrwxr-x 2 mirok mirok 4096 sep. 5 16:42 figs
drwxrwxr-x 2 mirok mirok 4096 sep. 5 16:42 hello-world
drwxrwxr-x 2 mirok mirok 4096 nov. 1 10:12 results
drwxrwxr-x 3 mirok mirok 4096 nov. 1 12:28 scripts
drwxrwxr-x 4 mirok mirok 4096 nov. 1 13:16 scripts.bk
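The comment "vs no -n" hints at an important detail. By default sed prints every line it reads. The -n flag turns that off, so only the lines selected by p appear. The sketch below shows both variants on a few hypothetical ls -l-like lines. The function name only_dirs is hypothetical.

```shell
#!/usr/bin/env bash
# only_dirs: keep the lines of "ls -l"-style input that start with d (directories).
# -n turns off sed's automatic printing, so only the lines hit by "p" are output.
only_dirs() {
    sed -n -e '/^d/ p'
}

# Hypothetical ls -l-like lines: two directories and one regular file
listing=$(printf '%s\n' 'drwxrwxr-x data' '-rw-rw-r-- notes.txt' 'drwxrwxr-x figs')

printf '%s\n' "$listing" | only_dirs                 # drwxrwxr-x data / drwxrwxr-x figs
# Without -n every line is printed once automatically, and matches once more by p
printf '%s\n' "$listing" | sed -e '/^d/ p' | wc -l   # 5 lines: 3 automatic + 2 from p
```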
Another action is d for delete. Suppose you would like to remove all the comment lines (say, from your Python code):
! sed -e '/^#/ d' ./data/contacts.txt
# We could redirect with > or use -i to edit in place
123 joe
333 miro
ana
233 peter
lucy 222
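The pattern /^#/ only catches comments that start in the first column. The minimal sketch below also catches indented comment lines by allowing leading whitespace. Inline comments after code are kept. Note that in real Python code this would also delete a shebang line and any line inside a multi-line string that happens to start with #. The function name strip_comment_lines is hypothetical.

```shell
#!/usr/bin/env bash
# strip_comment_lines: delete lines whose first non-blank character is #.
# [[:space:]]* allows leading indentation; "x = 1  # note" is left untouched.
strip_comment_lines() {
    sed -e '/^[[:space:]]*#/ d'
}

printf '%s\n' '# header' 'x = 1' '    # indented comment' 'y = 2  # inline' | strip_comment_lines
# x = 1
# y = 2  # inline
```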
sed also understands line numbers, so we could for example delete lines 2 through 20 of the long CSV file:
%%bash
echo size before `ls -lrvt ./data/used_car_sales.csv | awk '{print $5}'`
sed -i -e '2,20 d' ./data/used_car_sales.csv
echo size after `ls -lrvt ./data/used_car_sales.csv | awk '{print $5}'`
size before 13057047
size after 13055021
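Line addresses work with any action, not just d. The sketch below uses seq output instead of the real CSV file, so nothing is modified on disk. It shows that M,N selects lines M through N inclusive, and that $ addresses the last line. The function names drop_lines and keep_lines are hypothetical.

```shell
#!/usr/bin/env bash
# drop_lines FIRST LAST: delete lines FIRST..LAST (inclusive) from stdin.
drop_lines() {
    sed -e "$1,$2 d"
}
# keep_lines FIRST LAST: print only lines FIRST..LAST; -n suppresses all other lines.
keep_lines() {
    sed -n -e "$1,$2 p"
}

seq 1 30 | drop_lines 2 20 | wc -l   # 11: 30 lines minus the 19 deleted ones
seq 1 30 | keep_lines 5 7            # 5, 6, 7
seq 1 30 | sed -n -e '$ p'           # "$" addresses the last line: 30
```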
For more information see the nice summary by Matt Probert. We forgot to emphasize that ⚠️ grep, awk and sed are further additions to our family of programs and commands.
File manipulation utilities - find, tar and gzip#
Assume that we have run some analysis on a remote machine. When the computations are done, we would like to gather the data and compress them for easier transfer.
⚠️ find visits all files in a directory tree and can execute one or more commands for every file:
find source [specifiers]
We can specify the name (-name) and the type (-type: f for a regular file, d for a directory):
! find ./scripts/ -name "hello*" -type f
./scripts/hello_world_bang.sh
./scripts/hello_world.sh
If the source tree is very deep, it is a good idea to limit the depth of the traversal with -maxdepth:
! find $HOME -maxdepth 2 -name "*.py" -type f
/home/mirok/Downloads/ip_hdg_poisson.py
/home/mirok/Downloads/emi_test_single.py
/home/mirok/Downloads/rami_mesh_refine.py
/home/mirok/Downloads/train_x2.py
/home/mirok/Downloads/hdg_primal_poisson.py
/home/mirok/Downloads/stokes-3d.py
/home/mirok/Downloads/clement.py
/home/mirok/Downloads/stokes-3d(1).py
/home/mirok/Downloads/darcy_robin_dirichlet.py
/home/mirok/Downloads/single_test.py
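Several tests can be combined: -o means OR, the escaped parentheses \( \) group the alternatives, and tests written next to each other (here the group and -type f) must all hold: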
# Or find all log and PDF files
! find $HOME -maxdepth 1 \( -name '*.log' -o -name '*.pdf' \) -type f
/home/mirok/ottar_scicomm.pdf
/home/mirok/Valgkort_2023.pdf
/home/mirok/burgas_vienna.pdf
/home/mirok/ottar_scicomm.log
We can also run a command for each file:
find rootdir -name filenamespec -exec command {} \; -print
# {} is the current filename
Let’s use this to print more detailed information about each file:
!find $HOME -maxdepth 1 \( -name '*.log' -o -name '*.pdf' \) -type f -exec ls -lrvt {} \;
-rw-rw-r-- 1 mirok mirok 33646 aug. 23 12:44 /home/mirok/ottar_scicomm.pdf
-rw-rw-r-- 1 mirok mirok 52093 aug. 2 10:31 /home/mirok/Valgkort_2023.pdf
-rw-rw-r-- 1 mirok mirok 28948 juni 23 09:27 /home/mirok/burgas_vienna.pdf
-rw-rw-r-- 1 mirok mirok 11409 aug. 23 12:44 /home/mirok/ottar_scicomm.log
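Because -exec reports the exit status of its command, it can also act as a filter, and an action placed after it (such as -print) only runs for files where the command succeeded. The sketch below builds a hypothetical scratch tree so nothing real is touched. It prints the .log files containing a given word and also shows the {} + form, which passes all matching files to a single command instead of running it once per file as {} \; does. The function name files_containing is hypothetical.

```shell
#!/usr/bin/env bash
# files_containing DIR WORD: print the regular .log files under DIR that contain WORD.
# grep -q exits 0 on a match, so -print only runs for files where WORD was found.
files_containing() {
    find "$1" -name '*.log' -type f -exec grep -q "$2" {} \; -print
}

# Scratch tree with hypothetical files
root=$(mktemp -d)
trap 'rm -rf "$root"' EXIT
mkdir -p "$root/sub"
echo "ERROR: disk full" > "$root/a.log"
echo "all good" > "$root/sub/b.log"
touch "$root/notes.txt"

files_containing "$root" ERROR                   # only .../a.log
# "{} +" hands all matching files to one ls call instead of one call per file
find "$root" -name '*.log' -type f -exec ls -l {} +
```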
We can perform several actions. Below we copy (cp) each file to the current directory in addition to printing some more info:
%%bash
find $HOME -maxdepth 1 \( -name '*.log' -o -name '*.pdf' \) -type f -size +30k -exec ls -lrvt {} \; -exec cp "{}" . \;
ls *.pdf
-rw-rw-r-- 1 mirok mirok 33646 aug. 23 12:44 /home/mirok/ottar_scicomm.pdf
-rw-rw-r-- 1 mirok mirok 52093 aug. 2 10:31 /home/mirok/Valgkort_2023.pdf
ottar_scicomm.pdf
Valgkort_2023.pdf
Note that we have narrowed the search with the -size specifier: +30k matches files larger than 30 kilobytes (k counts units of 1024 bytes).
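The sign in front of the size selects the comparison: + means "more than" and - means "less than". The minimal sketch below assumes GNU find, which rounds a file's size up to whole units before comparing. It creates two scratch files of known sizes (hypothetical names) and checks which ones each test selects.

```shell
#!/usr/bin/env bash
# Scratch files of known sizes (hypothetical), filled with zero bytes from /dev/zero
root=$(mktemp -d)
trap 'rm -rf "$root"' EXIT
head -c 40000 /dev/zero > "$root/big.bin"     # 40000 bytes, rounded up to 40k
head -c 1000  /dev/zero > "$root/small.bin"   # 1000 bytes, rounded up to 1k

find "$root" -type f -size +30k   # more than 30k: only big.bin
find "$root" -type f -size -30k   # less than 30k: only small.bin
```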
Now that we can find things, let’s compress them.
The ⚠️ tar command can pack single files or all files in a directory tree into one file, which can be unpacked later.
tar -cvf myfiles.tar mytree file1 file2
# dest sources
# options:
# c: pack, v: list name of files, f: pack into file
# unpack the mytree tree and the files file1 and file2:
tar -xvf myfiles.tar
# options:
# x: extract (unpack)
The tar file can be compressed with ⚠️ gzip
gzip myfiles.tar
# result: myfiles.tar.gz
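Packing and compressing are often done in one go. Below is a minimal sketch of a round trip with the z option, which filters the archive through gzip. It also uses t to list an archive without unpacking it, and -C to extract into a chosen directory. The directory and file names mirror the tar example above but live in a scratch directory with hypothetical content.

```shell
#!/usr/bin/env bash
# Scratch directory with hypothetical content
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work" || exit 1
mkdir mytree
echo "hello" > mytree/file1
echo "world" > file2

# c + z: pack and gzip-compress in one step
tar -czf myfiles.tar.gz mytree file2
# t: list the contents of the archive without extracting it
tar -tzf myfiles.tar.gz
# x: extract; -C restored unpacks into that directory instead of the current one
mkdir restored
tar -xzf myfiles.tar.gz -C restored
cat restored/mytree/file1   # hello
```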
Let’s deal with the PDFs that are lying around by packing and compressing them:
%%bash
[ -e allPDFs.tar ] && rm allPDFs.tar
[ -e allPDFs.tar.gz ] && rm allPDFs.tar.gz
tar -cvf allPDFs.tar `find . -name '*.pdf' -print`
gzip -k allPDFs.tar
echo
ls -lrvth allPDFs.*
./ottar_scicomm.pdf
./Valgkort_2023.pdf
-rw-rw-r-- 1 mirok mirok 80K nov. 1 11:02 allPDFs.tar.gz
-rw-rw-r-- 1 mirok mirok 90K nov. 1 11:02 allPDFs.tar
Here we have run gzip with the -k (keep) flag; otherwise the original tar file would be removed and only allPDFs.tar.gz would remain.
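Passing the find results through backticks splits file names at whitespace, so a file called "my report.pdf" would reach tar as two nonexistent files. Below is a minimal sketch of a more robust variant, assuming GNU tar. find emits NUL-terminated names with -print0. tar reads the list from standard input with -T -, and --null tells it the names are NUL-terminated. The file names are hypothetical.

```shell
#!/usr/bin/env bash
# Scratch directory with a PDF whose name contains a space (hypothetical files)
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work" || exit 1
touch 'my report.pdf' other.pdf

# --null must come before -T so that tar reads the list as NUL-terminated names
find . -name '*.pdf' -print0 | tar -cf allPDFs.tar --null -T -
gzip -k allPDFs.tar          # -k keeps allPDFs.tar next to allPDFs.tar.gz
tar -tf allPDFs.tar          # ./my report.pdf and ./other.pdf (order may vary)
```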
We started this section assuming the scenario that we find ourselves on some remote machine. How do we get there?
Remote connection utilities#
Here are some commands that come in handy when working with remote machines. They are all ⚠️
ping: is the machine reachable?
ssh: connect over SSH; the -X or -Y switch enables window forwarding, i.e. graphics
ssh username@hostname
scp: secure copy; use -r for directories
scp username@hostname:/path/to/source destination
hostname: what is the machine called?
whoami: what is my user name?
who: who else is logged in?
ps: what are the running processes?
top: how much resources are being used?
We demo most of the above commands in a terminal outside the notebook. We make one exception below, to see some of the concepts discussed today (variables, the exit status $? and if tests) in action:
%%bash
# evalApply is the name of my machine and I have an SSH server running on it
machine=evalApply
ping $machine -c 1 &> /dev/null
if [ $? -gt 0 ]; then
echo Connection to $machine cannot be established
else
echo Connection to $machine can be established
fi
Connection to evalApply can be established
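Since if runs a command and branches on its exit status, the explicit $? check can be folded into the test itself. Below is a minimal sketch of that more compact form. It assumes the iputils ping found on most Linux systems, where -W sets the reply timeout in seconds. The function name is_reachable and the default host localhost are hypothetical.

```shell
#!/usr/bin/env bash
# is_reachable HOST: succeed if HOST answers a single ping.
# -c 1 sends one packet; -W 2 waits at most 2 seconds for the reply (iputils ping).
is_reachable() {
    ping -c 1 -W 2 "$1" &> /dev/null
}

# if tests the exit status of is_reachable directly, so no explicit $? is needed
machine=${1:-localhost}   # hypothetical default; pass another host name as $1
if is_reachable "$machine"; then
    echo "Connection to $machine can be established"
else
    echo "Connection to $machine cannot be established"
fi
```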
Plotting utilities#
Now that we have data, we may want to do some visual exploration. One option is gnuplot. Note that the program does not ship with Ubuntu by default and needs to be installed. Gnuplot offers interactive plotting (somewhat like building up a plot step by step in IPython). It can also execute scripts. For example, below is a rather intuitive way of producing a plot from data:
plot "data1_leg.txt" using 1:2 title 'L0' with linespoints lt 3 lc rgb 'red', \
"data2_leg.txt" using 1:3 title 'L1' with linespoints
This can be entered at the prompt when gnuplot is running
gnuplot
gnuplot> COMMANDS HERE
or, if we have stored the commands in a file, say foo.txt, we can get the plot with gnuplot -p foo.txt (-p, short for --persist, keeps the plot window open after gnuplot exits). A nice feature of gnuplot is the ability to generate plots for LaTeX.
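Bash and gnuplot combine naturally: the shell produces the data and the command file, and gnuplot only draws. Below is a minimal sketch of that split. It generates a two-column file with seq and awk, then writes a gnuplot script that uses the dumb terminal, which draws the plot as ASCII art and is handy on a remote machine without graphics. The script runs gnuplot only if it is installed. The file names data1_leg.txt and plot.gp are hypothetical stand-ins.

```shell
#!/usr/bin/env bash
# Scratch directory for the hypothetical data and command files
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# Two columns: x and x^2 for x = 0..10
seq 0 10 | awk '{ print $1, $1 * $1 }' > "$work/data1_leg.txt"

# Write the gnuplot commands to a file instead of typing them at the prompt
cat > "$work/plot.gp" <<EOF
set terminal dumb
plot "$work/data1_leg.txt" using 1:2 title 'x^2' with linespoints
EOF

# Only call gnuplot when it is available
if command -v gnuplot > /dev/null; then
    gnuplot "$work/plot.gp"
else
    echo "gnuplot is not installed" >&2
fi
```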
Note that gnuplot is not limited to line plots, cf. the gallery of examples.
!./scripts/tori.gplot
/bin/bash: ./scripts/tori.gplot: /usr/bin/gnuplot: bad interpreter: No such file or directory