Bash programming and Linux command-line tools#

In our course IN3110 – Problemløsning med høynivå-språk we have so far used mostly Python (and some Cython). In this lecture we add to the family of languages by discussing shell (Bash) scripting. The take-home message is that (simple) scripts combined with other command-line utilities can provide elegant solutions and powerful pre- and postprocessing pipelines for data.

A bit of history - there were/are many shells#

  • 1979: Bourne shell (sh)

  • 1978: C and TC shell (csh and tcsh)

  • 1989: Bourne Again shell (bash)

  • Other Bourne-style shells:

    • 1983: Korn shell (ksh)

    • 1990: Z shell (zsh)

    • 2002: Dash (dash)

Why learn Bash?#

  • Learning Bash means learning the roots of scripting

  • Bash is frequently encountered on Unix systems

  • Bash is the dominant command interpreter and scripting language

Shell scripts evolve naturally from a workflow:

  1. A sequence of commands you use often is placed in a file

  2. Command-line options are introduced so that different settings can be passed to the commands

  3. Variables, if tests and loops are introduced to enable more complex program flow

  4. At some point pre- and postprocessing becomes too advanced for Bash, at which point (parts of) the script should be ported to Python or other tools
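The first three stages above can be sketched in one small script. This is only an illustration: the function name count_lines and the -n option are hypothetical and not part of the course material.

```bash
#!/bin/bash
# Hypothetical example: count the lines in each file given as an argument.
# Stage 1 (a command you run often): wc -l < FILE
# Stage 2 (command-line options):     -n suppresses the file name in the output
# Stage 3 (variables, tests, loops):  loop over all arguments, skip non-files

count_lines() {
    local show_name=1          # variable controlling the output format
    if [ "$1" = "-n" ]; then   # if test: was the option given?
        show_name=0
        shift                  # drop the option from the argument list
    fi

    local file n
    for file in "$@"; do       # loop over the remaining arguments
        if [ ! -f "$file" ]; then
            echo "skipping $file (not a regular file)" >&2
            continue
        fi
        n=$(wc -l < "$file")
        n=$((n))               # strip padding that some wc implementations add
        if [ "$show_name" -eq 1 ]; then
            echo "$file: $n"
        else
            echo "$n"
        fi
    done
}

# Usage example on a temporary file with three lines
tmpfile=$(mktemp)
printf 'a\nb\nc\n' > "$tmpfile"
count_lines -n "$tmpfile"
rm -f "$tmpfile"
```

Once a script like this needs real data structures or numerical work, stage 4 applies and Python is likely the better tool.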

In this lecture we imagine that we are working on, for example, a Linux cluster where we do not have the admin permissions needed to install the Python modules or text editors that we have on our own machines. We will try to get things done with utilities that are commonly installed by default.

What Bash is good for#

  • File and directory management

  • Systems management (build scripts)

  • Combining other scripts and commands

  • Rapid prototyping of more advanced scripts

  • Very simple output processing, plotting etc.

Some common tasks in Bash#

  • writing files and managing files and directories (creation, deletion, renaming)

  • for-loops

  • running an application

  • combining applications (pipes)

  • file globbing, testing file types
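As a rough sketch, the tasks listed above could look as follows in a single script. All names (the demo directory, the data_*.txt files) are made up for illustration, and everything happens in a scratch directory that is removed at the end.

```bash
#!/bin/bash
workdir=$(mktemp -d)            # create a scratch directory
mkdir -p "$workdir/demo"        # directory creation

# for-loop + file writing: create three small files
for i in 1 2 3; do
    echo "sample $i" > "$workdir/demo/data_$i.txt"
done

# renaming and deletion
mv "$workdir/demo/data_3.txt" "$workdir/demo/last.txt"
rm "$workdir/demo/data_2.txt"

# file globbing + testing file types
for path in "$workdir"/demo/*.txt; do
    if [ -f "$path" ]; then
        echo "regular file: $(basename "$path")"
    fi
done

# combining applications with a pipe: count the remaining .txt files
n_files=$(ls "$workdir/demo" | grep -c '\.txt$')
echo "number of .txt files: $n_files"

rm -r "$workdir"                # clean up
```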

What Bash is not good for#

  • Cross-platform portability

  • Graphics, GUIs

  • Interface with libraries or legacy code

  • More advanced post processing and plotting

  • Calculations, math etc.
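The last point can be illustrated with a small sketch: the shell's built-in arithmetic handles integers only, so anything involving fractions has to be delegated to another program. Using awk for this is an assumption made here for illustration; other tools would work as well.

```bash
#!/bin/bash
# Arithmetic expansion in the shell works on integers only
int_result=$((7 / 2))
echo "7 / 2 in the shell: $int_result"    # the fractional part is discarded

# For floating point we call an external tool (awk is assumed to be installed).
# LC_ALL=C forces a dot as the decimal separator.
float_result=$(LC_ALL=C awk 'BEGIN { printf "%.2f", 7 / 2 }')
echo "7 / 2 with awk: $float_result"
```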

Installation#

  • All our examples can be run under Bash, and many in the Bourne shell

  • Differences in operating systems:

    • macOS: /bin/sh behaves as a POSIX shell and has traditionally been provided by Bash; Bash itself is /bin/bash.

    • Ubuntu: /bin/sh is a link to Dash, a minimal shell that is much faster than Bash. Bash is available as /bin/bash.

    • Windows: Bash is available through Cygwin or the Windows Subsystem for Linux (WSL) in Windows 10.
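To find out what /bin/sh is on the machine at hand, a quick check along the following lines may help. This is a sketch; the variable name sh_info is arbitrary and the output depends entirely on the system.

```bash
#!/bin/bash
# Show what /bin/sh is on the current system
if [ -L /bin/sh ]; then
    # symbolic link: print the target as stored in the link
    sh_info="/bin/sh is a symbolic link to $(readlink /bin/sh)"
else
    sh_info="/bin/sh is not a symbolic link"
fi
echo "$sh_info"

# Bash reports its own version in BASH_VERSION; other shells leave it unset
echo "Running Bash version: ${BASH_VERSION:-unknown (not running under Bash)}"
```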

Use within Jupyter notebooks: we will use the line magic ! or the cell magic %%bash to run shell commands in the notebook.

Alternatively, we can install a Bash kernel and use it within the notebook (Kernel>Set kernel).

Bash tutorial#

You will see a number of Bash/Unix commands in this lecture. The new commands will be highlighted with a ⚠️ .

!echo "Hello from bash"
Hello from bash

A command is called by giving its name followed by its arguments. ⚠️ echo prints text to the screen.

We could write the above source code into a file, here ./scripts/hello_world.sh
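One way to create such a file using only echo is to redirect its output with > (overwrite) and >> (append). The sketch below is an assumption about how this could be done, not necessarily how the course file was created; a scratch directory from mktemp stands in for ./scripts.

```bash
#!/bin/bash
scripts_dir=$(mktemp -d)                 # stand-in for ./scripts
script="$scripts_dir/hello_world.sh"

# Single quotes keep the text literal; > creates the file, >> appends to it
echo '#!/bin/bash' > "$script"
echo '# This is a regular comment line' >> "$script"
echo 'echo "hello world!"' >> "$script"

# Run the file by handing it to the interpreter explicitly
hello_output=$(bash "$script")
echo "$hello_output"

rm -r "$scripts_dir"                     # clean up
```

For anything longer than a few lines, an editor is more convenient.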

VIM intermezzo#

To stick to our scenario of being stuck on a cluster where there is no VS Code, Sublime Text or the like, let us use VIM for editing. VIM (Vi IMproved) is a powerful text editor that improves on its predecessor, the vi editor; here we will only scratch its surface (no macros or advanced search). In some sense the philosophy behind VIM is that of a painter who first picks an instrument (mode selection) and places it on the canvas (navigation) before starting to draw (editing).

Navigation: press ESC to leave the current mode. Then press

  • 0 jump to line beginning

  • $ jump to line end

  • h, l, j, k to move left, right, down or up

  • gg to jump to the start of the file, or G to jump to the end

  • w to jump forward a word or b to move back a word

Manipulation/Editing

  • Pressing i enters edit mode (called insert mode in VIM), where you can type freely

  • Pressing x, dw or dd deletes, respectively, a character, a word or an entire line

  • Pressing NUMBER before the command in general repeats it NUMBER times

  • Pressing . repeats the previous action

  • A (Shift+a) jumps to the end of the line and enters edit mode

  • s (substitute) deletes the character under cursor and enters edit mode

  • u undoes

  • v enters visual mode

  • :w saves the buffer to file

Search

  • / enters search mode. After specifying the pattern pressing n will move forward to the next match, while N searches backward

Exiting

  • :q quits; :q! quits and discards unsaved changes

A great reference to learn more about VIM is the book Practical VIM: Edit Text at the Speed of Thought.

Back to Bash#

! cat ./scripts/hello_world.sh
#!/bin/bash
# This is a regular comment line
echo "hello world!"

Here the lines starting with the hash # are interpreted as comments.

Above we have used the ⚠️ cat command to view the file content. Later we will see that it can be used for reading and writing too.

Now we could try to run the script, only to find that we get an error

! ./scripts/hello_world.sh
hello world!

The issue is that the file is not executable, so the shell refuses to run it. We can inspect the permissions with the ⚠️ ls command (where we specify the “-l” flag to get a long listing)

! ls -l ./scripts/hello_world.sh
-rwxrw-r-- 1 mirok mirok 65 sep.   5 16:42 ./scripts/hello_world.sh

The permissions are r(ead), w(rite) and (e)x(ecute), and they are specified separately for three classes of users: owner (u), group (g) and other (o).

To fix this we use the ⚠️ chmod command. In particular, below we add execute permission for the owner (u); g and o would do the same for the group and for others.

%%bash
chmod u+x ./scripts/hello_world.sh
ls -l ./scripts/hello_world.sh
-rwxrwxr-x 1 mirok mirok 65 sep.   5 16:42 ./scripts/hello_world.sh

Now we can finally execute

! ./scripts/hello_world.sh
hello world!
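The effect of chmod can also be checked from within a script using the -x file test. The following is a minimal sketch on a temporary file (created by mktemp, so no course files are touched); the variable names before and after are arbitrary.

```bash
#!/bin/bash
demo=$(mktemp)                     # temporary file, created without execute permission
echo 'echo "hello world!"' > "$demo"

chmod u-x "$demo"                  # make sure the owner cannot execute it
if [ -x "$demo" ]; then before="executable"; else before="not executable"; fi
echo "before chmod u+x: $before"

chmod u+x "$demo"                  # add execute permission for the owner only
if [ -x "$demo" ]; then after="executable"; else after="not executable"; fi
echo "after chmod u+x:  $after"

ls -l "$demo"                      # the permission column now starts with -rwx
rm -f "$demo"
```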

Now that the code has run, we could ask who actually ran (interpreted) it. Bash uses itself as the default interpreter if nothing else is specified. We can be explicit about the interpreter:

! cat scripts/hello_world_bang.sh
#!/bin/bash
# This is a regular comment line
echo "hello world!"

Observe that the first line, starting with the shebang #!, specifies the interpreter to use for the script. The second line, starting with the hash #, is a comment.

Note: we could have specified a different interpreter/shell by using #!/usr/bin/sh as the first line instead. Let’s see what sort of shell that is
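A script can report which interpreter is running it by inspecting the BASH_VERSION variable, which Bash sets and most other shells leave unset. The sketch below uses a hypothetical file name, which_shell.sh. It also shows that naming the interpreter on the command line takes precedence over the shebang line.

```bash
#!/bin/bash
tmpdir=$(mktemp -d)
script="$tmpdir/which_shell.sh"    # hypothetical file name

# The shebang asks for sh; the body reports who is really interpreting it
echo '#!/bin/sh' > "$script"
echo 'if [ -n "$BASH_VERSION" ]; then echo "interpreted by bash"; else echo "interpreted by another shell"; fi' >> "$script"

# Naming the interpreter explicitly means the shebang line is not consulted
via_bash=$(bash "$script")
echo "run with bash: $via_bash"

# Here sh is used; the answer depends on which shell provides sh on the system
via_sh=$(sh "$script")
echo "run with sh:   $via_sh"

rm -r "$tmpdir"                    # clean up
```

On a system where sh is provided by Dash the second line reports another shell, whereas on systems where sh is provided by Bash both lines report bash.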

! man sh
DASH(1)                   BSD General Commands Manual                  DASH(1)

NAME
     dash — command interpreter (shell)

SYNOPSIS
     dash [-aCefnuvxIimqVEbp] [+aCefnuvxIimqVEbp] [-o option_name]
          [+o option_name] [command_file [argument ...]]
     dash -c [-aCefnuvxIimqVEbp] [+aCefnuvxIimqVEbp] [-o option_name]
          [+o option_name] command_string [command_name [argument ...]]
     dash -s [-aCefnuvxIimqVEbp] [+aCefnuvxIimqVEbp] [-o option_name]
          [+o option_name] [argument ...]

DESCRIPTION
     dash is the standard command interpreter for the system.  The current
     version of dash is in the process of being changed to conform with the
     POSIX 1003.2 and 1003.2a specifications for the shell.  This version has
     many features which make it appear similar in some respects to the Korn
     shell, but it is not a Korn shell clone (see ksh(1)).  Only features des‐
     ignated by POSIX, plus a few Berkeley extensions, are being incorporated
     into this shell.  This man page is not intended to be a tutorial or a
     complete specification of the shell.

   Overview
     The shell is a command that reads lines from either a file or the termi‐
     nal, interprets them, and generally executes other commands.  It is the
     program that is running when a user logs into the system (although a user
     can select a different shell with the chsh(1) command).  The shell imple‐
     ments a language that has flow control constructs, a macro facility that
     provides a variety of features in addition to data storage, along with
     built in history and line editing capabilities.  It incorporates many
     features to aid interactive use and has the advantage that the interpre‐
     tative language is common to both interactive and non-interactive use
     (shell scripts).  That is, commands can be typed directly to the running
     shell or can be put into a file and the file can be executed directly by
     the shell.

   Invocation
     If no args are present and if the standard input of the shell is con‐
     nected to a terminal (or if the -i flag is set), and the -c option is not
     present, the shell is considered an interactive shell.  An interactive
     shell generally prompts before each command and handles programming and
     command errors differently (as described below).  When first starting,
     the shell inspects argument 0, and if it begins with a dash ‘-’, the
     shell is also considered a login shell.  This is normally done automati‐
     cally by the system when the user first logs in.  A login shell first
     reads commands from the files /etc/profile and .profile if they exist.
     If the environment variable ENV is set on entry to an interactive shell,
     or is set in the .profile of a login shell, the shell next reads commands
     from the file named in ENV.  Therefore, a user should place commands that
     are to be executed only at login time in the .profile file, and commands
     that are executed for every interactive shell inside the ENV file.  To
     set the ENV variable to some file, place the following line in your
     .profile of your home directory

           ENV=$HOME/.shinit; export ENV

     substituting for “.shinit” any filename you wish.

     If command line arguments besides the options have been specified, then
     the shell treats the first argument as the name of a file from which to
     read commands (a shell script), and the remaining arguments are set as
     the positional parameters of the shell ($1, $2, etc).  Otherwise, the
     shell reads commands from its standard input.

   Argument List Processing
     All of the single letter options that have a corresponding name can be
     used as an argument to the -o option.  The set -o name is provided next
     to the single letter option in the description below.  Specifying a dash
     “-” turns the option on, while using a plus “+” disables the option.  The
     following options can be set from the command line or with the set
     builtin (described later).

           -a allexport     Export all variables assigned to.

           -c               Read commands from the command_string operand in‐
                            stead of from the standard input.  Special parame‐
                            ter 0 will be set from the command_name operand
                            and the positional parameters ($1, $2, etc.)  set
                            from the remaining argument operands.

           -C noclobber     Don't overwrite existing files with “>”.

           -e errexit       If not interactive, exit immediately if any
                            untested command fails.  The exit status of a com‐
                            mand is considered to be explicitly tested if the
                            command is used to control an if, elif, while, or
                            until; or if the command is the left hand operand
                            of an “&&” or “||” operator.

           -f noglob        Disable pathname expansion.

           -n noexec        If not interactive, read commands but do not exe‐
                            cute them.  This is useful for checking the syntax
                            of shell scripts.

           -u nounset       Write a message to standard error when attempting
                            to expand a variable that is not set, and if the
                            shell is not interactive, exit immediately.

           -v verbose       The shell writes its input to standard error as it
                            is read.  Useful for debugging.

           -x xtrace        Write each command to standard error (preceded by
                            a ‘+ ’) before it is executed.  Useful for debug‐
                            ging.

           -I ignoreeof     Ignore EOF's from input when interactive.

           -i interactive   Force the shell to behave interactively.

           -l               Make dash act as if it had been invoked as a login
                            shell.

           -m monitor       Turn on job control (set automatically when inter‐
                            active).

           -s stdin         Read commands from standard input (set automati‐
                            cally if no file arguments are present).  This op‐
                            tion has no effect when set after the shell has
                            already started running (i.e. with set).

           -V vi            Enable the built-in vi(1) command line editor
                            (disables -E if it has been set).

           -E emacs         Enable the built-in emacs(1) command line editor
                            (disables -V if it has been set).

           -b notify        Enable asynchronous notification of background job
                            completion.  (UNIMPLEMENTED for 4.4alpha)

           -p priviliged    Do not attempt to reset effective uid if it does
                            not match uid. This is not set by default to help
                            avoid incorrect usage by setuid root programs via
                            system(3) or popen(3).

   Lexical Structure
     The shell reads input in terms of lines from a file and breaks it up into
     words at whitespace (blanks and tabs), and at certain sequences of char‐
     acters that are special to the shell called “operators”.  There are two
     types of operators: control operators and redirection operators (their
     meaning is discussed later).  Following is a list of operators:

           Control operators:
                 & && ( ) ; ;; | || <newline>

           Redirection operators:
                 < > >| << >> <& >& <<- <>

   Quoting
     Quoting is used to remove the special meaning of certain characters or
     words to the shell, such as operators, whitespace, or keywords.  There
     are three types of quoting: matched single quotes, matched double quotes,
     and backslash.

   Backslash
     A backslash preserves the literal meaning of the following character,
     with the exception of ⟨newline⟩.  A backslash preceding a ⟨newline⟩ is
     treated as a line continuation.

   Single Quotes
     Enclosing characters in single quotes preserves the literal meaning of
     all the characters (except single quotes, making it impossible to put
     single-quotes in a single-quoted string).

   Double Quotes
     Enclosing characters within double quotes preserves the literal meaning
     of all characters except dollarsign ($), backquote (`), and backslash
     (\).  The backslash inside double quotes is historically weird, and
     serves to quote only the following characters:
           $ ` " \ <newline>.
     Otherwise it remains literal.

   Reserved Words
     Reserved words are words that have special meaning to the shell and are
     recognized at the beginning of a line and after a control operator.  The
     following are reserved words:

           !       elif    fi      while   case
           else    for     then    {       }
           do      done    until   if      esac

     Their meaning is discussed later.

   Aliases
     An alias is a name and corresponding value set using the alias(1) builtin
     command.  Whenever a reserved word may occur (see above), and after
     checking for reserved words, the shell checks the word to see if it
     matches an alias.  If it does, it replaces it in the input stream with
     its value.  For example, if there is an alias called “lf” with the value
     “ls -F”, then the input:

           lf foobar ⟨return⟩

     would become

           ls -F foobar ⟨return⟩

     Aliases provide a convenient way for naive users to create shorthands for
     commands without having to learn how to create functions with arguments.
     They can also be used to create lexically obscure code.  This use is dis‐
     couraged.

   Commands
     The shell interprets the words it reads according to a language, the
     specification of which is outside the scope of this man page (refer to
     the BNF in the POSIX 1003.2 document).  Essentially though, a line is
     read and if the first word of the line (or after a control operator) is
     not a reserved word, then the shell has recognized a simple command.
     Otherwise, a complex command or some other special construct may have
     been recognized.

   Simple Commands
     If a simple command has been recognized, the shell performs the following
     actions:

           1.   Leading words of the form “name=value” are stripped off and
                assigned to the environment of the simple command.  Redirect‐
                ion operators and their arguments (as described below) are
                stripped off and saved for processing.

           2.   The remaining words are expanded as described in the section
                called “Expansions”, and the first remaining word is consid‐
                ered the command name and the command is located.  The remain‐
                ing words are considered the arguments of the command.  If no
                command name resulted, then the “name=value” variable assign‐
                ments recognized in item 1 affect the current shell.

           3.   Redirections are performed as described in the next section.

   Redirections
     Redirections are used to change where a command reads its input or sends
     its output.  In general, redirections open, close, or duplicate an exist‐
     ing reference to a file.  The overall format used for redirection is:

           [n] redir-op file

     where redir-op is one of the redirection operators mentioned previously.
     Following is a list of the possible redirections.  The [n] is an optional
     number between 0 and 9, as in ‘3’ (not ‘[3]’), that refers to a file de‐
     scriptor.

           [n]> file   Redirect standard output (or n) to file.

           [n]>| file  Same, but override the -C option.

           [n]>> file  Append standard output (or n) to file.

           [n]< file   Redirect standard input (or n) from file.

           [n1]<&n2    Copy file descriptor n2 as stdout (or fd n1).  fd n2.

           [n]<&-      Close standard input (or n).

           [n1]>&n2    Copy file descriptor n2 as stdin (or fd n1).  fd n2.

           [n]>&-      Close standard output (or n).

           [n]<> file  Open file for reading and writing on standard input (or
                       n).

     The following redirection is often called a “here-document”.

           [n]<< delimiter
                 here-doc-text ...
           delimiter

     All the text on successive lines up to the delimiter is saved away and
     made available to the command on standard input, or file descriptor n if
     it is specified.  If the delimiter as specified on the initial line is
     quoted, then the here-doc-text is treated literally, otherwise the text
     is subjected to parameter expansion, command substitution, and arithmetic
     expansion (as described in the section on “Expansions”).  If the operator
     is “<<-” instead of “<<”, then leading tabs in the here-doc-text are
     stripped.

   Search and Execution
     There are three types of commands: shell functions, builtin commands, and
     normal programs – and the command is searched for (by name) in that or‐
     der.  They each are executed in a different way.

     When a shell function is executed, all of the shell positional parameters
     (except $0, which remains unchanged) are set to the arguments of the
     shell function.  The variables which are explicitly placed in the envi‐
     ronment of the command (by placing assignments to them before the func‐
     tion name) are made local to the function and are set to the values
     given.  Then the command given in the function definition is executed.
     The positional parameters are restored to their original values when the
     command completes.  This all occurs within the current shell.

     Shell builtins are executed internally to the shell, without spawning a
     new process.

     Otherwise, if the command name doesn't match a function or builtin, the
     command is searched for as a normal program in the file system (as de‐
     scribed in the next section).  When a normal program is executed, the
     shell runs the program, passing the arguments and the environment to the
     program.  If the program is not a normal executable file (i.e., if it
     does not begin with the "magic number" whose ASCII representation is
     "#!", so execve(2) returns ENOEXEC then) the shell will interpret the
     program in a subshell.  The child shell will reinitialize itself in this
     case, so that the effect will be as if a new shell had been invoked to
     handle the ad-hoc shell script, except that the location of hashed com‐
     mands located in the parent shell will be remembered by the child.

     Note that previous versions of this document and the source code itself
     misleadingly and sporadically refer to a shell script without a magic
     number as a "shell procedure".

   Path Search
     When locating a command, the shell first looks to see if it has a shell
     function by that name.  Then it looks for a builtin command by that name.
     If a builtin command is not found, one of two things happen:

     1.   Command names containing a slash are simply executed without per‐
          forming any searches.

     2.   The shell searches each entry in PATH in turn for the command.  The
          value of the PATH variable should be a series of entries separated
          by colons.  Each entry consists of a directory name.  The current
          directory may be indicated implicitly by an empty directory name, or
          explicitly by a single period.

   Command Exit Status
     Each command has an exit status that can influence the behaviour of other
     shell commands.  The paradigm is that a command exits with zero for nor‐
     mal or success, and non-zero for failure, error, or a false indication.
     The man page for each command should indicate the various exit codes and
     what they mean.  Additionally, the builtin commands return exit codes, as
     does an executed shell function.

     If a command consists entirely of variable assignments then the exit sta‐
     tus of the command is that of the last command substitution if any, oth‐
     erwise 0.

   Complex Commands
     Complex commands are combinations of simple commands with control opera‐
     tors or reserved words, together creating a larger complex command.  More
     generally, a command is one of the following:

     •   simple command

     •   pipeline

     •   list or compound-list

     •   compound command

     •   function definition

     Unless otherwise stated, the exit status of a command is that of the last
     simple command executed by the command.

   Pipelines
     A pipeline is a sequence of one or more commands separated by the control
     operator |.  The standard output of all but the last command is connected
     to the standard input of the next command.  The standard output of the
     last command is inherited from the shell, as usual.

     The format for a pipeline is:

           [!] command1 [| command2 ...]

     The standard output of command1 is connected to the standard input of
     command2.  The standard input, standard output, or both of a command is
     considered to be assigned by the pipeline before any redirection speci‐
     fied by redirection operators that are part of the command.

     If the pipeline is not in the background (discussed later), the shell
     waits for all commands to complete.

     If the reserved word ! does not precede the pipeline, the exit status is
     the exit status of the last command specified in the pipeline.  Other‐
     wise, the exit status is the logical NOT of the exit status of the last
     command.  That is, if the last command returns zero, the exit status is
     1; if the last command returns greater than zero, the exit status is
     zero.

     Because pipeline assignment of standard input or standard output or both
     takes place before redirection, it can be modified by redirection.  For
     example:

           $ command1 2>&1 | command2

     sends both the standard output and standard error of command1 to the
     standard input of command2.

     A ; or ⟨newline⟩ terminator causes the preceding AND-OR-list (described
     next) to be executed sequentially; a & causes asynchronous execution of
     the preceding AND-OR-list.

     Note that unlike some other shells, each process in the pipeline is a
     child of the invoking shell (unless it is a shell builtin, in which case
     it executes in the current shell – but any effect it has on the environ‐
     ment is wiped).

   Background Commands – &
     If a command is terminated by the control operator ampersand (&), the
     shell executes the command asynchronously – that is, the shell does not
     wait for the command to finish before executing the next command.

     The format for running a command in background is:

           command1 & [command2 & ...]

     If the shell is not interactive, the standard input of an asynchronous
     command is set to /dev/null.

   Lists – Generally Speaking
     A list is a sequence of zero or more commands separated by newlines,
     semicolons, or ampersands, and optionally terminated by one of these
     three characters.  The commands in a list are executed in the order they
     are written.  If command is followed by an ampersand, the shell starts
     the command and immediately proceeds onto the next command; otherwise it
     waits for the command to terminate before proceeding to the next one.

   Short-Circuit List Operators
     “&&” and “||” are AND-OR list operators.  “&&” executes the first com‐
     mand, and then executes the second command if and only if the exit status
     of the first command is zero.  “||” is similar, but executes the second
     command if and only if the exit status of the first command is nonzero.
     “&&” and “||” both have the same priority.

   Flow-Control Constructs – if, while, for, case
     The syntax of the if command is

           if list
           then list
           [ elif list
           then    list ] ...
           [ else list ]
           fi

     The syntax of the while command is

           while list
           do   list
           done

     The two lists are executed repeatedly while the exit status of the first
     list is zero.  The until command is similar, but has the word until in
     place of while, which causes it to repeat until the exit status of the
     first list is zero.

     The syntax of the for command is

           for variable [ in [ word ... ] ]
           do   list
           done

     The words following in are expanded, and then the list is executed re‐
     peatedly with the variable set to each word in turn.  Omitting in word
     ... is equivalent to in "$@".

     The syntax of the break and continue command is

           break [ num ]
           continue [ num ]

     Break terminates the num innermost for or while loops.  Continue contin‐
     ues with the next iteration of the innermost loop.  These are implemented
     as builtin commands.

     The syntax of the case command is

           case word in
           [(]pattern) list ;;
           ...
           esac

     The pattern can actually be one or more patterns (see Shell Patterns de‐
     scribed later), separated by “|” characters.  The “(” character before
     the pattern is optional.

   Grouping Commands Together
     Commands may be grouped by writing either

           (list)

     or

           { list; }

     The first of these executes the commands in a subshell.  Builtin commands
     grouped into a (list) will not affect the current shell.  The second form
     does not fork another shell so is slightly more efficient.  Grouping com‐
     mands together this way allows you to redirect their output as though
     they were one program:

           { printf " hello " ; printf " world\n" ; } > greeting

     Note that “}” must follow a control operator (here, “;”) so that it is
     recognized as a reserved word and not as another command argument.

   Functions
     The syntax of a function definition is

           name () command

     A function definition is an executable statement; when executed it in‐
     stalls a function named name and returns an exit status of zero.  The
     command is normally a list enclosed between “{” and “}”.

     Variables may be declared to be local to a function by using a local com‐
     mand.  This should appear as the first statement of a function, and the
     syntax is

           local [variable | -] ...

     Local is implemented as a builtin command.

     When a variable is made local, it inherits the initial value and exported
     and readonly flags from the variable with the same name in the surround‐
     ing scope, if there is one.  Otherwise, the variable is initially unset.
     The shell uses dynamic scoping, so that if you make the variable x local
     to function f, which then calls function g, references to the variable x
     made inside g will refer to the variable x declared inside f, not to the
     global variable named x.

     The only special parameter that can be made local is “-”.  Making “-” lo‐
     cal any shell options that are changed via the set command inside the
     function to be restored to their original values when the function re‐
     turns.

     The syntax of the return command is

           return [exitstatus]

     It terminates the currently executing function.  Return is implemented as
     a builtin command.

   Variables and Parameters
     The shell maintains a set of parameters.  A parameter denoted by a name
     is called a variable.  When starting up, the shell turns all the environ‐
     ment variables into shell variables.  New variables can be set using the
     form

           name=value

     Variables set by the user must have a name consisting solely of alphabet‐
     ics, numerics, and underscores - the first of which must not be numeric.
     A parameter can also be denoted by a number or a special character as ex‐
     plained below.

   Positional Parameters
     A positional parameter is a parameter denoted by a number (n > 0).  The
     shell sets these initially to the values of its command line arguments
     that follow the name of the shell script.  The set builtin can also be
     used to set or reset them.

   Special Parameters
     A special parameter is a parameter denoted by one of the following spe‐
     cial characters.  The value of the parameter is listed next to its char‐
     acter.

     *            Expands to the positional parameters, starting from one.
                  When the expansion occurs within a double-quoted string it
                  expands to a single field with the value of each parameter
                  separated by the first character of the IFS variable, or by
                  a ⟨space⟩ if IFS is unset.

     @            Expands to the positional parameters, starting from one.
                  When the expansion occurs within double-quotes, each posi‐
                  tional parameter expands as a separate argument.  If there
                  are no positional parameters, the expansion of @ generates
                  zero arguments, even when @ is double-quoted.  What this ba‐
                  sically means, for example, is if $1 is “abc” and $2 is “def
                  ghi”, then "$@" expands to the two arguments:

                        "abc" "def ghi"

     #            Expands to the number of positional parameters.

     ?            Expands to the exit status of the most recent pipeline.

     - (Hyphen.)  Expands to the current option flags (the single-letter op‐
                  tion names concatenated into a string) as specified on invo‐
                  cation, by the set builtin command, or implicitly by the
                  shell.

     $            Expands to the process ID of the invoked shell.  A subshell
                  retains the same value of $ as its parent.

     !            Expands to the process ID of the most recent background com‐
                  mand executed from the current shell.  For a pipeline, the
                  process ID is that of the last command in the pipeline.

     0 (Zero.)    Expands to the name of the shell or shell script.
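
     The difference between "$@", "$*" and an unquoted $* is easiest to see
     by counting the resulting arguments.  The following is a minimal
     sketch, assuming the default IFS; count_args is a hypothetical helper
     and not part of the shell.

```shell
# Hypothetical helper: print the number of arguments it received.
count_args() {
    echo "$#"
}

# Set the positional parameters by hand with the set builtin.
set -- "abc" "def ghi"

n_at=$(count_args "$@")     # each positional parameter stays one argument
n_star=$(count_args "$*")   # all parameters joined into a single field
n_bare=$(count_args $*)     # unquoted: the result undergoes field splitting

echo "quoted @: $n_at, quoted *: $n_star, unquoted *: $n_bare"

# $? holds the exit status of the most recent pipeline.
status=0
false || status=$?
echo "exit status of false: $status"
```

     Quoting "$@" is usually the right choice when passing arguments on to
     another command, because it preserves embedded whitespace.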

   Word Expansions
     This clause describes the various expansions that are performed on words.
     Not all expansions are performed on every word, as explained later.

     Tilde expansions, parameter expansions, command substitutions, arithmetic
     expansions, and quote removals that occur within a single word expand to
     a single field.  It is only field splitting or pathname expansion that
     can create multiple fields from a single word.  The single exception to
     this rule is the expansion of the special parameter @ within double-
     quotes, as was described above.

     The order of word expansion is:

     1.   Tilde Expansion, Parameter Expansion, Command Substitution, Arith‐
          metic Expansion (these all occur at the same time).

     2.   Field Splitting is performed on fields generated by step (1) unless
          the IFS variable is null.

     3.   Pathname Expansion (unless set -f is in effect).

     4.   Quote Removal.

     The $ character is used to introduce parameter expansion, command substi‐
     tution, or arithmetic evaluation.

   Tilde Expansion (substituting a user's home directory)
     A word beginning with an unquoted tilde character (~) is subjected to
     tilde expansion.  All the characters up to a slash (/) or the end of the
     word are treated as a username and are replaced with the user's home
     directory.  If the username is missing (as in ~/foobar), the tilde is
     replaced with the value of the HOME variable (the current user's home
     directory).

   Parameter Expansion
     The format for parameter expansion is as follows:

           ${expression}

     where expression consists of all characters until the matching “}”.  Any
     “}” escaped by a backslash or within a quoted string, and characters in
     embedded arithmetic expansions, command substitutions, and variable ex‐
     pansions, are not examined in determining the matching “}”.

     The simplest form for parameter expansion is:

           ${parameter}

     The value, if any, of parameter is substituted.

     The parameter name or symbol can be enclosed in braces, which are op‐
     tional except for positional parameters with more than one digit or when
     parameter is followed by a character that could be interpreted as part of
     the name.  If a parameter expansion occurs inside double-quotes:

     1.   Pathname expansion is not performed on the results of the expansion.

     2.   Field splitting is not performed on the results of the expansion,
          with the exception of @.

     In addition, a parameter expansion can be modified by using one of the
     following formats.

     ${parameter:-word}    Use Default Values.  If parameter is unset or null,
                           the expansion of word is substituted; otherwise,
                           the value of parameter is substituted.

     ${parameter:=word}    Assign Default Values.  If parameter is unset or
                           null, the expansion of word is assigned to parame‐
                           ter.  In all cases, the final value of parameter is
                           substituted.  Only variables, not positional param‐
                           eters or special parameters, can be assigned in
                           this way.

     ${parameter:?[word]}  Indicate Error if Null or Unset.  If parameter is
                           unset or null, the expansion of word (or a message
                           indicating it is unset if word is omitted) is writ‐
                           ten to standard error and the shell exits with a
                           nonzero exit status.  Otherwise, the value of pa‐
                           rameter is substituted.  An interactive shell need
                           not exit.

     ${parameter:+word}    Use Alternative Value.  If parameter is unset or
                           null, null is substituted; otherwise, the expansion
                           of word is substituted.

     In the parameter expansions shown previously, use of the colon in the
     format results in a test for a parameter that is unset or null; omission
     of the colon results in a test for a parameter that is only unset.
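
     The modifiers above can be tried out directly.  This is a small
     sketch; the variable names (name, empty, a, b, c, d) are chosen for
     illustration only.

```shell
unset name                 # name is unset
empty=""                   # empty is set, but null

a=${name:-fallback}        # unset: the default is substituted, name stays unset
b=${empty:-fallback}       # null: the colon form also tests for null
c=${empty-fallback}        # no colon: empty is set, so its null value is used
: "${name:=assigned}"      # assigns the default to name
d=${name:+alternative}     # name is now set and non-null

echo "a=$a b=$b c=[$c] name=$name d=$d"
```

     The ":" builtin is used here only to give the ${name:=assigned}
     expansion somewhere to happen without running a command.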

     ${#parameter}         String Length.  The length in characters of the
                           value of parameter.

     The following four varieties of parameter expansion provide for substring
     processing.  In each case, pattern matching notation (see Shell
     Patterns), rather than regular expression notation, is used to evaluate
     the patterns.  If parameter is * or @, the result of the expansion is un‐
     specified.  Enclosing the full parameter expansion string in double-
     quotes does not cause the following four varieties of pattern characters
     to be quoted, whereas quoting characters within the braces has this ef‐
     fect.

     ${parameter%word}     Remove Smallest Suffix Pattern.  The word is ex‐
                           panded to produce a pattern.  The parameter expan‐
                           sion then results in parameter, with the smallest
                           portion of the suffix matched by the pattern
                           deleted.

     ${parameter%%word}    Remove Largest Suffix Pattern.  The word is ex‐
                           panded to produce a pattern.  The parameter expan‐
                           sion then results in parameter, with the largest
                           portion of the suffix matched by the pattern
                           deleted.

     ${parameter#word}     Remove Smallest Prefix Pattern.  The word is ex‐
                           panded to produce a pattern.  The parameter expan‐
                           sion then results in parameter, with the smallest
                           portion of the prefix matched by the pattern
                           deleted.

     ${parameter##word}    Remove Largest Prefix Pattern.  The word is ex‐
                           panded to produce a pattern.  The parameter expan‐
                           sion then results in parameter, with the largest
                           portion of the prefix matched by the pattern
                           deleted.
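
     A common use of the four substring forms is taking a path apart.  The
     sketch below assumes a made-up path; only the expansion syntax comes
     from the text above.

```shell
path=/home/user/archive.tar.gz   # hypothetical file name

base=${path##*/}    # largest prefix matching "*/" removed
dir=${path%/*}      # smallest suffix matching "/*" removed
stem=${base%%.*}    # largest suffix matching ".*" removed
ext=${base#*.}      # smallest prefix matching "*." removed
len=${#base}        # length of the value in characters

echo "dir=$dir base=$base stem=$stem ext=$ext len=$len"
```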

   Command Substitution
     Command substitution allows the output of a command to be substituted in
     place of the command name itself.  Command substitution occurs when the
     command is enclosed as follows:

           $(command)

     or (“backquoted” version):

           `command`

     The shell expands the command substitution by executing command in a sub‐
     shell environment and replacing the command substitution with the stan‐
     dard output of the command, removing sequences of one or more ⟨newline⟩s
     at the end of the substitution.  (Embedded ⟨newline⟩s before the end of
     the output are not removed; however, during field splitting, they may be
     translated into ⟨space⟩s, depending on the value of IFS and quoting that
     is in effect.)
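
     The handling of newlines can be observed with a short experiment.
     This is an illustrative sketch, assuming the default IFS; the
     variable names are arbitrary.

```shell
# printf writes two trailing newlines; command substitution strips them.
out=$(printf 'one\ntwo\n\n')
old=`printf 'one\ntwo\n\n'`     # backquoted form, same result

# The embedded newline is kept in the value itself ...
printf '%s\n' "$out"

# ... but an unquoted expansion is field-split on it into two words.
set -- $out
nwords=$#
echo "words after splitting: $nwords"
```

     The $(command) form nests without extra escaping, which is the usual
     reason to prefer it over backquotes.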

   Arithmetic Expansion
     Arithmetic expansion provides a mechanism for evaluating an arithmetic
     expression and substituting its value.  The format for arithmetic expan‐
     sion is as follows:

           $((expression))

     The expression is treated as if it were in double-quotes, except that a
     double-quote inside the expression is not treated specially.  The shell
     expands all tokens in the expression for parameter expansion, command
     substitution, and quote removal.

     Next, the shell treats this as an arithmetic expression and substitutes
     the value of the expression.
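
     As a brief illustration (the variable names are arbitrary), parameter
     expansion and command substitution are carried out first, and the
     resulting text is then evaluated as integer arithmetic:

```shell
width=7
height=6

area=$(($width * $height))          # parameters expanded, then evaluated
rem=$(( ($area + 5) % 10 ))         # parentheses group as in C
count=$(( $(printf '3') + 1 ))      # command substitution inside

echo "area=$area rem=$rem count=$count"
```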

   White Space Splitting (Field Splitting)
     After parameter expansion, command substitution, and arithmetic expansion
     the shell scans the results of expansions and substitutions that did not
     occur in double-quotes for field splitting and multiple fields can re‐
     sult.

     The shell treats each character of the IFS as a delimiter and uses the
     delimiters to split the results of parameter expansion and command sub‐
     stitution into fields.
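
     Changing IFS changes where an unquoted expansion is split.  The
     sketch below uses a made-up colon-separated record and restores IFS
     afterwards; it assumes IFS was set to begin with.

```shell
record="alice:x:1000:/home/alice"   # hypothetical record

old_ifs=$IFS
IFS=:
set -- $record        # unquoted, so the value is split on ":"
IFS=$old_ifs          # restore the previous delimiters

nfields=$#
user=$1
home=$4
echo "fields=$nfields user=$user home=$home"
```

     Had $record been written inside double-quotes, no splitting would
     have taken place and set would have received a single argument.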

   Pathname Expansion (File Name Generation)
     Unless the -f flag is set, file name generation is performed after word
     splitting is complete.  Each word is viewed as a series of patterns, sep‐
     arated by slashes.  The process of expansion replaces the word with the
     names of all existing files whose names can be formed by replacing each
     pattern with a string that matches the specified pattern.  There are two
     restrictions on this: first, a pattern cannot match a string containing a
     slash, and second, a pattern cannot match a string starting with a period
     unless the first character of the pattern is a period.  The next section
     describes the patterns used for both Pathname Expansion and the case
     command.

   Shell Patterns
     A pattern consists of normal characters, which match themselves, and
     meta-characters.  The meta-characters are “!”, “*”, “?”, and “[”.  These
     characters lose their special meanings if they are quoted.  When command
     or variable substitution is performed and the dollar sign or back quotes
     are not double quoted, the value of the variable or the output of the
     command is scanned for these characters and they are turned into meta-
     characters.

     An asterisk (“*”) matches any string of characters.  A question mark
     matches any single character.  A left bracket (“[”) introduces a charac‐
     ter class.  The end of the character class is indicated by a (“]”); if
     the “]” is missing then the “[” matches a “[” rather than introducing a
     character class.  A character class matches any of the characters between
     the square brackets.  A range of characters may be specified using a mi‐
     nus sign.  The character class may be complemented by making an exclama‐
     tion point the first character of the character class.

     To include a “]” in a character class, make it the first character listed
     (after the “!”, if any).  To include a minus sign, make it the first or
     last character listed.
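
     Because the case command uses the same patterns, they can be tried
     without touching the file system.  The two functions below, classify
     and is_uint, are hypothetical helpers written for this illustration.

```shell
# Report which pattern a word matches first.
classify() {
    case $1 in
        [0-9]*)  echo "digit-first" ;;   # character class with a range
        ?)       echo "single" ;;        # exactly one character
        *.txt)   echo "text" ;;          # any string ending in .txt
        *)       echo "other" ;;
    esac
}

# Succeed only for a non-empty string of digits.
is_uint() {
    case $1 in
        '' | *[!0-9]*)  return 1 ;;      # empty, or contains a non-digit
        *)              return 0 ;;
    esac
}

classify 7up
classify x
classify notes.txt
is_uint 123 && echo "123 is an unsigned integer"
```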

   Builtins
     This section lists the builtin commands which are builtin because they
     need to perform some operation that can't be performed by a separate
     process.  In addition to these, there are several other commands that may
     be builtin for efficiency (e.g.  printf(1), echo(1), test(1), etc).

     :

     true   A null command that returns a 0 (true) exit value.

     . file
            The commands in the specified file are read and executed by the
            shell.

     alias [name[=string ...]]
            If name=string is specified, the shell defines the alias name with
            value string.  If just name is specified, the value of the alias
            name is printed.  With no arguments, the alias builtin prints the
            names and values of all defined aliases (see unalias).

     bg [job] ...
            Continue the specified jobs (or the current job if no jobs are
            given) in the background.

     command [-p] [-v] [-V] command [arg ...]
            Execute the specified command but ignore shell functions when
            searching for it.  (This is useful when you have a shell function
            with the same name as a builtin command.)

            -p     Search for command using a PATH that guarantees to find all
                   the standard utilities.

            -V     Do not execute the command but search for the command and
                   print the resolution of the command search.  This is the
                   same as the type builtin.

            -v     Do not execute the command but search for the command and
                   print the absolute pathname of utilities, the name for
                   builtins or the expansion of aliases.

     cd -

     cd [-LP] [directory]
            Switch to the specified directory (default HOME).  If an entry for
            CDPATH appears in the environment of the cd command or the shell
            variable CDPATH is set and the directory name does not begin with
            a slash, then the directories listed in CDPATH will be searched
            for the specified directory.  The format of CDPATH is the same as
            that of PATH.  If a single dash is specified as the argument, it
            will be replaced by the value of OLDPWD.  The cd command will
            print out the name of the directory that it actually switched to
            if this is different from the name that the user gave.  These may
            be different either because the CDPATH mechanism was used or
            because the argument is a single dash.  The -P option causes the
            physical directory structure to be used, that is, all symbolic
            links are resolved to their respective values.  The -L option
            turns off the effect of any preceding -P options.

     echo [-n] args...
            Print the arguments on the standard output, separated by spaces.
            Unless the -n option is present, a newline is output following the
            arguments.

            If any of the following sequences of characters is encountered
            during output, the sequence is not output.  Instead, the specified
            action is performed:

            \b      A backspace character is output.

            \c      Subsequent output is suppressed.  This is normally used at
                    the end of the last argument to suppress the trailing
                    newline that echo would otherwise output.

            \e      Outputs an escape character (ESC).

            \f      Output a form feed.

            \n      Output a newline character.

            \r      Output a carriage return.

            \t      Output a (horizontal) tab character.

            \v      Output a vertical tab.

            \0digits
                    Output the character whose value is given by zero to three
                    octal digits.  If there are zero digits, a nul character
                    is output.

            \\      Output a backslash.

            All other backslash sequences elicit undefined behaviour.

     eval string ...
            Concatenate all the arguments with spaces.  Then re-parse and exe‐
            cute the command.

     exec [command arg ...]
            Unless command is omitted, the shell process is replaced with the
            specified program (which must be a real program, not a shell
            builtin or function).  Any redirections on the exec command are
            marked as permanent, so that they are not undone when the exec
            command finishes.

     exit [exitstatus]
            Terminate the shell process.  If exitstatus is given it is used as
            the exit status of the shell; otherwise the exit status of the
            preceding command is used.

     export name ...

     export -p
            The specified names are exported so that they will appear in the
            environment of subsequent commands.  The only way to un-export a
            variable is to unset it.  The shell allows the value of a variable
            to be set at the same time it is exported by writing

                  export name=value

            With no arguments the export command lists the names of all
            exported variables.  With the -p option specified the output will
            be formatted suitably for non-interactive use.
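
            The effect of export can be checked by asking a child process
            what it sees.  This is a sketch only; DEMO_GREETING is a made-up
            variable name, and sh is assumed to be available on the PATH.

```shell
unset DEMO_GREETING
DEMO_GREETING=hello      # plain shell variable, not yet exported

# The child shell does not see the variable before it is exported.
before=$(sh -c 'echo "${DEMO_GREETING:-unset}"')

export DEMO_GREETING

# After export, the variable appears in the child's environment.
after=$(sh -c 'echo "${DEMO_GREETING:-unset}"')

echo "before export: $before, after export: $after"
```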

     fc [-e editor] [first [last]]

     fc -l [-nr] [first [last]]

     fc -s [old=new] [first]
            The fc builtin lists, or edits and re-executes, commands
            previously entered to an interactive shell.

            -e editor
                   Use the editor named by editor to edit the commands.  The
                   editor string is a command name, subject to search via the
                   PATH variable.  The value in the FCEDIT variable is used as
                   a default when -e is not specified.  If FCEDIT is null or
                   unset, the value of the EDITOR variable is used.  If EDITOR
                   is null or unset, ed(1) is used as the editor.

            -l (ell)
                   List the commands rather than invoking an editor on them.
                   The commands are written in the sequence indicated by the
                   first and last operands, as affected by -r, with each
                   command preceded by the command number.

            -n     Suppress command numbers when listing with -l.

            -r     Reverse the order of the commands listed (with -l) or
                   edited (with neither -l nor -s).

            -s     Re-execute the command without invoking an editor.

            first

            last   Select the commands to list or edit.  The number of
                   previous commands that can be accessed is determined by the
                   value of the HISTSIZE variable.  The value of first or last
                   or both is one of the following:

                   [+]number
                          A positive number representing a command number;
                          command numbers can be displayed with the -l option.

                   -number
                          A negative decimal number representing the command
                          that was executed number of commands previously.
                          For example, -1 is the immediately previous command.

            string
                   A string indicating the most recently entered command that
                   begins with that string.  If the old=new operand is not
                   also specified with -s, the string form of the first
                   operand cannot contain an embedded equal sign.

            The following environment variables affect the execution of fc:

            FCEDIT    Name of the editor to use.

            HISTSIZE  The number of previous commands that are accessible.

     fg [job]
            Move the specified job or the current job to the foreground.

     getopts optstring var
            The POSIX getopts command, not to be confused with the Bell
            Labs-derived getopt(1).

            The first argument should be a series of letters, each of which
            may be optionally followed by a colon to indicate that the option
            requires an argument.  The variable specified is set to the parsed
            option.

            The getopts command deprecates the older getopt(1) utility due to
            its handling of arguments containing whitespace.

            The getopts builtin may be used to obtain options and their
            arguments from a list of parameters.  When invoked, getopts places
            the value of the next option from the option string in the list in
            the shell variable specified by var and its index in the shell
            variable OPTIND.  When the shell is invoked, OPTIND is initialized
            to 1.  For each option that requires an argument, the getopts
            builtin will place it in the shell variable OPTARG.  If an option
            is not allowed for in the optstring, then OPTARG will be unset.

            optstring is a string of recognized option letters (see
            getopt(3)).  If a letter is followed by a colon, the option is
            expected to have an argument which may or may not be separated
            from it by white space.  If an option character is not found where
            expected, getopts will set the variable var to a “?”; getopts will
            then unset OPTARG and write output to standard error.  By
            specifying a colon as the first character of optstring all errors
            will be ignored.

            After the last option getopts will return a non-zero value and set
            var to “?”.

            The following code fragment shows how one might process the argu‐
            ments for a command that can take the options [a] and [b], and the
            option [c], which requires an argument.

                  while getopts abc: f
                  do
                          case $f in
                          a | b)  flag=$f;;                # simple flags
                          c)      carg=$OPTARG;;           # -c takes an argument
                          \?)     echo "$USAGE"; exit 1;;  # unknown option
                          esac
                  done
                  shift $((OPTIND - 1))                    # drop parsed options

            This code will accept any of the following as equivalent:

                  cmd -acarg file file
                  cmd -a -c arg file file
                  cmd -carg -a file file
                  cmd -a -carg -- file file

     hash -rv command ...
            The shell maintains a hash table which remembers the locations of
            commands.  With no arguments whatsoever, the hash command prints
            out the contents of this table.  Entries which have not been
            looked at since the last cd command are marked with an asterisk;
            it is possible for these entries to be invalid.

            With arguments, the hash command removes the specified commands
            from the hash table (unless they are functions) and then locates
            them.  With the -v option, hash prints the locations of the
            commands as it finds them.  The -r option causes the hash command
            to delete all the entries in the hash table except for functions.

     pwd [-LP]
            The builtin command remembers what the current directory is rather
            than recomputing it each time.  This makes it faster.  However, if
            the current directory is renamed, the builtin version of pwd will
            continue to print the old name for the directory.  The -P option
            causes the physical value of the current working directory to be
            shown, that is, all symbolic links are resolved to their
            respective values.  The -L option turns off the effect of any
            preceding -P options.

     read [-p prompt] [-r] variable [...]
            The prompt is printed if the -p option is specified and the
            standard input is a terminal.  Then a line is read from the standard
            input.  The trailing newline is deleted from the line and the line
            is split as described in the section on word splitting above, and
            the pieces are assigned to the variables in order.  At least one
            variable must be specified.  If there are more pieces than vari‐
            ables, the remaining pieces (along with the characters in IFS that
            separated them) are assigned to the last variable.  If there are
            more variables than pieces, the remaining variables are assigned
            the null string.  The read builtin will indicate success unless
            EOF is encountered on input, in which case failure is returned.

            By default, unless the -r option is specified, the backslash “\”
            acts as an escape character, causing the following character to be
            treated literally.  If a backslash is followed by a newline, the
            backslash and the newline will be deleted.
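
            The splitting and backslash rules can be seen with input taken
            from here-documents.  This is a sketch with made-up data; the
            variable names are for illustration only.

```shell
# Split one record on ":"; IFS is changed only for this read command.
IFS=: read -r user shell rest <<EOF
alice:/bin/sh:extra:fields
EOF
echo "user=$user shell=$shell rest=$rest"

# With -r the backslash is kept as an ordinary character ...
read -r raw <<'EOF'
a\tb
EOF

# ... without -r it escapes the following character and is removed.
read cooked <<'EOF'
a\tb
EOF
printf '%s %s\n' "$raw" "$cooked"
```

            Using -r is the usual choice when reading data, since it leaves
            backslashes in the input untouched.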

     readonly name ...

     readonly -p
            The specified names are marked as read only, so that they cannot
            be subsequently modified or unset.  The shell allows the value of
            a variable to be set at the same time it is marked read only by
            writing

                  readonly name=value

            With no arguments the readonly command lists the names of all read
            only variables.  With the -p option specified the output will be
            formatted suitably for non-interactive use.

     printf format [arguments ...]
            printf formats and prints its arguments, after the first, under
            control of the format.  The format is a character string which
            contains three types of objects: plain characters, which are
            simply copied to standard output, character escape sequences which
            are converted and copied to the standard output, and format
            specifications, each of which causes printing of the next
            successive argument.

            The arguments after the first are treated as strings if the
            corresponding format is either b, c or s; otherwise it is
            evaluated as a C constant, with the following extensions:

                  •   A leading plus or minus sign is allowed.
                  •   If the leading character is a single or double quote,
                      the value is the ASCII code of the next character.

            The format string is reused as often as necessary to satisfy the
            arguments.  Any extra format specifications are evaluated with
            zero or the null string.

            Character escape sequences are in backslash notation as defined in
            ANSI X3.159-1989 (“ANSI C89”).  The characters and their meanings
            are as follows:

                  \a      Write a <bell> character.

                  \b      Write a <backspace> character.

                  \e      Write an <escape> (ESC) character.

                  \f      Write a <form-feed> character.

                  \n      Write a <new-line> character.

                  \r      Write a <carriage return> character.

                  \t      Write a <tab> character.

                  \v      Write a <vertical tab> character.

                  \\      Write a backslash character.

                  \num    Write an 8-bit character whose ASCII value is the
                          1-, 2-, or 3-digit octal number num.

            Each format specification is introduced by the percent character
            (“%”).  The remainder of the format specification includes, in
            the following order:

            Zero or more of the following flags:

                    #       A “#” character specifying that the value should
                            be printed in an “alternative form”.  For b, c,
                            d, and s formats, this option has no effect.  For
                            the o format the precision of the number is
                            increased to force the first character of the
                            output string to a zero.  For the x (X) format, a
                            non-zero result has the string 0x (0X) prepended
                            to it.  For e, E, f, g, and G formats, the result
                            will always contain a decimal point, even if no
                            digits follow the point (normally, a decimal point
                            only appears in the results of those formats if a
                            digit follows the decimal point).  For g and G
                            formats, trailing zeros are not removed from the
                            result as they would otherwise be.

                    -       A minus sign “-” which specifies left adjustment
                            of the output in the indicated field;

                    +       A “+” character specifying that there should
                            always be a sign placed before the number when
                            using signed formats.

                    ‘ ’     A space specifying that a blank should be left
                            before a positive number for a signed format.  A
                            “+” overrides a space if both are used;

                    0       A zero “0” character indicating that zero-padding
                            should be used rather than blank-padding.  A “-”
                            overrides a “0” if both are used;

            Field Width:
                    An optional digit string specifying a field width; if the
                    output string has fewer characters than the field width it
                    will be blank-padded on the left (or right, if the
                    left-adjustment indicator has been given) to make up the
                    field width (note that a leading zero is a flag, but an
                    embedded zero is part of a field width);

            Precision:
                    An optional period, ‘.’, followed by an optional digit
                    string giving a precision which specifies the number of
                    digits to appear after the decimal point, for e and f
                    formats, or the maximum number of bytes to be printed from
                    a string (b and s formats); if the digit string is
                    missing, the precision is treated as zero;

            Format:
                    A character which indicates the type of format to use (one
                    of diouxXfeEgGbcs).

            A field width or precision may be ‘*’ instead of a digit string.
            In this case an argument supplies the field width or precision.

            The format characters and their meanings are:

            diouXx      The argument is printed as a signed decimal (d or i),
                        unsigned octal, unsigned decimal, or unsigned
                        hexadecimal (X or x), respectively.

            f           The argument is printed in the style [-]ddd.ddd where
                        the number of d's after the decimal point is equal to
                        the precision specification for the argument.  If the
                        precision is missing, 6 digits are given; if the
                        precision is explicitly 0, no digits and no decimal
                        point are printed.

            eE          The argument is printed in the style [-]d.ddde±dd
                        where there is one digit before the decimal point and
                        the number after is equal to the precision
                        specification for the argument; when the precision is
                        missing, 6 digits are produced.  An upper-case E is
                        used for an “E” format.

            gG          The argument is printed in style f or in style e (E)
                        whichever gives full precision in minimum space.

            b           Characters from the string argument are printed with
                        backslash-escape sequences expanded.
                        The following additional backslash-escape sequences
                        are supported:

                        \c      Causes dash to ignore any remaining characters
                                in the string operand containing it, any
                                remaining string operands, and any additional
                                characters in the format operand.

                        \0num   Write an 8-bit character whose ASCII value is
                                the 1-, 2-, or 3-digit octal number num.

            c           The first character of argument is printed.

            s           Characters from the string argument are printed until
                        the end is reached or until the number of bytes
                        indicated by the precision specification is reached;
                        if the precision is omitted, all characters in the
                        string are printed.

            %           Print a “%”; no argument is used.

            In no case does a non-existent or small field width cause trunca‐
            tion of a field; padding takes place only if the specified field
            width exceeds the actual width.

     set [{ -options | +options | -- }] arg ...
            The set command performs three different functions.

            With no arguments, it lists the values of all shell variables.

            If options are given, it sets the specified option flags, or
            clears them as described in the section called Argument List
            Processing.  As a special case, if the option is -o or +o and no
            argument is supplied, the shell prints the settings of all its op‐
            tions.  If the option is -o, the settings are printed in a human-
            readable format; if the option is +o, the settings are printed in
            a format suitable for reinput to the shell to affect the same op‐
            tion settings.

            The third use of the set command is to set the values of the
            shell's positional parameters to the specified args.  To change
            the positional parameters without changing any options, use “--”
            as the first argument to set.  If no args are present, the set
            command will clear all the positional parameters (equivalent to
            executing “shift $#”.)

     shift [n]
            Shift the positional parameters n times.  A shift sets the value
            of $1 to the value of $2, the value of $2 to the value of $3, and
            so on, decreasing the value of $# by one.  If n is greater than
            the number of positional parameters, shift will issue an error
            message, and exit with return status 2.

     test expression

     [ expression ]
            The test utility evaluates the expression and, if it evaluates to
            true, returns a zero (true) exit status; otherwise it returns 1
            (false).  If there is no expression, test also returns 1 (false).

            All operators and flags are separate arguments to the test util‐
            ity.

            The following primaries are used to construct expression:

            -b file       True if file exists and is a block special file.

            -c file       True if file exists and is a character special file.

            -d file       True if file exists and is a directory.

            -e file       True if file exists (regardless of type).

            -f file       True if file exists and is a regular file.

            -g file       True if file exists and its set group ID flag is
                          set.

            -h file       True if file exists and is a symbolic link.

            -k file       True if file exists and its sticky bit is set.

            -n string     True if the length of string is nonzero.

            -p file       True if file is a named pipe (FIFO).

            -r file       True if file exists and is readable.

            -s file       True if file exists and has a size greater than
                          zero.

            -t file_descriptor
                          True if the file whose file descriptor number is
                          file_descriptor is open and is associated with a
                          terminal.

            -u file       True if file exists and its set user ID flag is set.

            -w file       True if file exists and is writable.  True indicates
                          only that the write flag is on.  The file is not
                          writable on a read-only file system even if this
                          test indicates true.

            -x file       True if file exists and is executable.  True indi‐
                          cates only that the execute flag is on.  If file is
                          a directory, true indicates that file can be
                          searched.

            -z string     True if the length of string is zero.

            -L file       True if file exists and is a symbolic link.  This
                          operator is retained for compatibility with previous
                          versions of this program.  Do not rely on its exis‐
                          tence; use -h instead.

            -O file       True if file exists and its owner matches the effec‐
                          tive user id of this process.

            -G file       True if file exists and its group matches the effec‐
                          tive group id of this process.

            -S file       True if file exists and is a socket.

            file1 -nt file2
                          True if file1 and file2 exist and file1 is newer
                          than file2.

            file1 -ot file2
                          True if file1 and file2 exist and file1 is older
                          than file2.

            file1 -ef file2
                          True if file1 and file2 exist and refer to the same
                          file.

            string        True if string is not the null string.

            s1 = s2       True if the strings s1 and s2 are identical.

            s1 != s2      True if the strings s1 and s2 are not identical.

            s1 < s2       True if string s1 comes before s2 based on the ASCII
                          value of their characters.

            s1 > s2       True if string s1 comes after s2 based on the ASCII
                          value of their characters.

            n1 -eq n2     True if the integers n1 and n2 are algebraically
                          equal.

            n1 -ne n2     True if the integers n1 and n2 are not algebraically
                          equal.

            n1 -gt n2     True if the integer n1 is algebraically greater than
                          the integer n2.

            n1 -ge n2     True if the integer n1 is algebraically greater than
                          or equal to the integer n2.

            n1 -lt n2     True if the integer n1 is algebraically less than
                          the integer n2.

            n1 -le n2     True if the integer n1 is algebraically less than or
                          equal to the integer n2.

            These primaries can be combined with the following operators:

            ! expression  True if expression is false.

            expression1 -a expression2
                          True if both expression1 and expression2 are true.

            expression1 -o expression2
                          True if either expression1 or expression2 are true.

            (expression)  True if expression is true.

            The -a operator has higher precedence than the -o operator.

     times  Print the accumulated user and system times for the shell and for
            processes run from the shell.  The return status is 0.

     trap [action signal ...]
            Cause the shell to parse and execute action when any of the speci‐
            fied signals are received.  The signals are specified by signal
            number or as the name of the signal.  If signal is 0 or EXIT, the
            action is executed when the shell exits.  action may be empty
            (''), which causes the specified signals to be ignored.  With
            action omitted or set to `-' the specified signals are set to
            their default action.  When the shell forks off a subshell, it re‐
            sets trapped (but not ignored) signals to the default action.  The
            trap command has no effect on signals that were ignored on entry
            to the shell.  trap without any arguments cause it to write a list
            of signals and their associated action to the standard output in a
            format that is suitable as an input to the shell that achieves the
            same trapping results.

            Examples:

                  trap

            List trapped signals and their corresponding action

                  trap '' INT QUIT tstp 30

            Ignore signals INT QUIT TSTP USR1

                  trap date INT

            Print date upon receiving signal INT

     type [name ...]
            Interpret each name as a command and print the resolution of the
            command search.  Possible resolutions are: shell keyword, alias,
            shell builtin, command, tracked alias and not found.  For aliases
            the alias expansion is printed; for commands and tracked aliases
            the complete pathname of the command is printed.

     ulimit [-H | -S] [-a | -tfdscmlpnv [value]]
            Inquire about or set the hard or soft limits on processes or set
            new limits.  The choice between hard limit (which no process is
            allowed to violate, and which may not be raised once it has been
            lowered) and soft limit (which causes processes to be signaled but
            not necessarily killed, and which may be raised) is made with
            these flags:

            -H          set or inquire about hard limits

            -S          set or inquire about soft limits.  If neither -H nor
                        -S is specified, the soft limit is displayed or both
                        limits are set.  If both are specified, the last one
                        wins.

            The limit to be interrogated or set, then, is chosen by specifying
            any one of these flags:

            -a          show all the current limits

            -t          show or set the limit on CPU time (in seconds)

            -f          show or set the limit on the largest file that can be
                        created (in 512-byte blocks)

            -d          show or set the limit on the data segment size of a
                        process (in kilobytes)

            -s          show or set the limit on the stack size of a process
                        (in kilobytes)

            -c          show or set the limit on the largest core dump size
                        that can be produced (in 512-byte blocks)

            -m          show or set the limit on the total physical memory
                        that can be in use by a process (in kilobytes)

            -l          show or set the limit on how much memory a process can
                        lock with mlock(2) (in kilobytes)

            -p          show or set the limit on the number of processes this
                        user can have at one time

            -n          show or set the limit on the number files a process
                        can have open at once

            -v          show or set the limit on the total virtual memory that
                        can be in use by a process (in kilobytes)

            -r          show or set the limit on the real-time scheduling pri‐
                        ority of a process

            If none of these is specified, it is the limit on file size that
            is shown or set.  If value is specified, the limit is set to that
            number; otherwise the current limit is displayed.

            Limits of an arbitrary process can be displayed or set using the
            sysctl(8) utility.

     umask [mask]
            Set the value of umask (see umask(2)) to the specified octal
            value.  If the argument is omitted, the umask value is printed.

     unalias [-a] [name]
            If name is specified, the shell removes that alias.  If -a is
            specified, all aliases are removed.

     unset [-fv] name ...
            The specified variables and functions are unset and unexported.
            If -f or -v is specified, the corresponding function or variable
            is unset, respectively.  If a given name corresponds to both a
            variable and a function, and no options are given, only the vari‐
            able is unset.

     wait [job]
            Wait for the specified job to complete and return the exit status
            of the last process in the job.  If the argument is omitted, wait
            for all jobs to complete and return an exit status of zero.

   Command Line Editing
     When dash is being used interactively from a terminal, the current com‐
     mand and the command history (see fc in Builtins) can be edited using vi-
     mode command-line editing.  This mode uses commands, described below,
     similar to a subset of those described in the vi man page.  The command
     ‘set -o vi’ enables vi-mode editing and places sh into vi insert mode.
     With vi-mode enabled, sh can be switched between insert mode and command
     mode.  It is similar to vi: typing ⟨ESC⟩ enters vi command mode.  Hitting
     ⟨return⟩ while in command mode will pass the line to the shell.

EXIT STATUS
     Errors that are detected by the shell, such as a syntax error, will cause
     the shell to exit with a non-zero exit status.  If the shell is not an
     interactive shell, the execution of the shell file will be aborted.  Oth‐
     erwise the shell will return the exit status of the last command exe‐
     cuted, or if the exit builtin is used with a numeric argument, it will
     return the argument.

ENVIRONMENT
     HOME       Set automatically by login(1) from the user's login directory
                in the password file (passwd(4)).  This environment variable
                also functions as the default argument for the cd builtin.

     PATH       The default search path for executables.  See the above sec‐
                tion Path Search.

     CDPATH     The search path used with the cd builtin.

     MAIL       The name of a mail file, that will be checked for the arrival
                of new mail.  Overridden by MAILPATH.

     MAILCHECK  The frequency in seconds that the shell checks for the arrival
                of mail in the files specified by the MAILPATH or the MAIL
                file.  If set to 0, the check will occur at each prompt.

     MAILPATH   A colon “:” separated list of file names, for the shell to
                check for incoming mail.  This environment setting overrides
                the MAIL setting.  There is a maximum of 10 mailboxes that can
                be monitored at once.

     PS1        The primary prompt string, which defaults to “$ ”, unless you
                are the superuser, in which case it defaults to “# ”.

     PS2        The secondary prompt string, which defaults to “> ”.

     PS4        Output before each line when execution trace (set -x) is en‐
                abled, defaults to “+ ”.

     IFS        Input Field Separators.  This is normally set to ⟨space⟩,
                ⟨tab⟩, and ⟨newline⟩.  See the White Space Splitting section
                for more details.

     TERM       The default terminal setting for the shell.  This is inherited
                by children of the shell, and is used in the history editing
                modes.

     HISTSIZE   The number of lines in the history buffer for the shell.

     PWD        The logical value of the current working directory.  This is
                set by the cd command.

     OLDPWD     The previous logical value of the current working directory.
                This is set by the cd command.

     PPID       The process ID of the parent process of the shell.

FILES
     $HOME/.profile

     /etc/profile

SEE ALSO
     csh(1), echo(1), getopt(1), ksh(1), login(1), printf(1), test(1),
     getopt(3), passwd(5), environ(7), sysctl(8)

HISTORY
     dash is a POSIX-compliant implementation of /bin/sh that aims to be as
     small as possible.  dash is a direct descendant of the NetBSD version of
     ash (the Almquist SHell), ported to Linux in early 1997.  It was renamed
     to dash in 2002.

BUGS
     Setuid shell scripts should be avoided at all costs, as they are a sig‐
     nificant security risk.

     PS1, PS2, and PS4 should be subject to parameter expansion before being
     displayed.

BSD                            January 19, 2003                            BSD

Using the ⚠️ man (manual) command we see that on this machine sh points to the dash shell.

For convenience we will use the %%bash cell magic in the rest of the lecture to write our scripts

To find out which executable a command name is bound to (you can have several Python interpreters alongside each other on your system), use ⚠️ which

!which python
/home/mirok/miniconda3/envs/in3110/bin/python

Variables#

  • Assign a variable by var=value (NOTE no spaces around =!)

  • Retrieve the value of the variable by ${var} or $var

%%bash
#!/usr/bin/bash

cmd=echo    # A command name can be stored in a variable too
greet="Hi"

${cmd} ${greet} world $greet!

# Undefined variables result in empty string

${cmd} ${greet} ${world}!
Hi world Hi!
Hi !
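
The two retrieval forms differ when text follows the variable name directly. A minimal sketch, with variable names (greet, plain, braced) made up for illustration: the braces tell the shell where the name ends.

```bash
greet="Hello"

# Without braces the shell looks for a variable named greet_world,
# which is undefined, so the result is an empty string
plain="$greet_world"

# With braces the variable name ends at the closing brace
braced="${greet}_world"

echo "plain:  <$plain>"
echo "braced: <$braced>"
```

Only the braced form prints Hello_world; the plain form silently yields nothing, which is a common source of bugs.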

There are also special variables defined in the environment. By convention their names are all uppercase. As an example, recall that when running hello_world.sh above we specified a path to the script, as in the call below. Typing just hello_world.sh, without the path, would give an error because the shell would not know where to look for the script

! ./scripts/hello_world.sh
hello world!

To fix this problem, recall the role of PYTHONPATH, which the Python interpreter uses to look up modules. PYTHONPATH is in fact an environment variable

! echo $PYTHONPATH   # NOTE that this could be empty

A similar role is played by the environment variable PATH, which lists the directories that are searched for program executables.

! echo $PATH
/home/mirok/Documents/Teaching/UiO-IN3110.github.io/lectures/command-line/scripts/scripts:/home/mirok/Documents/Software/AnaMorph/bin:/home/mirok/Documents/Software/visit3_3_3.linux-x86_64/bin:/home/mirok/Documents/Software/Fiji.app:/home/mirok/Documents/Software/julia:/home/mirok/Documents/Software/ParaView-5.11.0-MPI-Linux-Python3.9-x86_64/bin:/home/mirok/miniconda3/envs/in3110/bin:/home/mirok/miniconda3/condabin:/home/mirok/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/snap/bin

To run our script simply as hello_world.sh we need to modify the environment. Consider the following

%%bash
new_PATH="$PWD/scripts:$PATH"
echo $new_PATH
/home/mirok/Documents/Teaching/UiO-IN3110.github.io/lectures/command-line/scripts:/home/mirok/Documents/Teaching/UiO-IN3110.github.io/lectures/command-line/scripts/scripts:/home/mirok/Documents/Software/AnaMorph/bin:/home/mirok/Documents/Software/visit3_3_3.linux-x86_64/bin:/home/mirok/Documents/Software/Fiji.app:/home/mirok/Documents/Software/julia:/home/mirok/Documents/Software/ParaView-5.11.0-MPI-Linux-Python3.9-x86_64/bin:/home/mirok/miniconda3/envs/in3110/bin:/home/mirok/miniconda3/condabin:/home/mirok/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/snap/bin

Here we have computed the value assigned to new_PATH from the PWD variable, which holds the current working directory (the same path that the ⚠️ pwd command prints), and built up the string. Note that we prepend to the list to give our directory higher precedence. To update the PATH we could continue as follows

%%bash
new_PATH="$PWD/scripts:$PATH"
export PATH=$new_PATH  # PATH is set

echo $PATH

# Navigate somewhere else so that we don't get lucky
cd $HOME
echo "Now at" $PWD
# Call
echo
hello_world.sh
/home/mirok/Documents/Teaching/UiO-IN3110.github.io/lectures/command-line/scripts:/home/mirok/Documents/Teaching/UiO-IN3110.github.io/lectures/command-line/scripts/scripts:/home/mirok/Documents/Software/AnaMorph/bin:/home/mirok/Documents/Software/visit3_3_3.linux-x86_64/bin:/home/mirok/Documents/Software/Fiji.app:/home/mirok/Documents/Software/julia:/home/mirok/Documents/Software/ParaView-5.11.0-MPI-Linux-Python3.9-x86_64/bin:/home/mirok/miniconda3/envs/in3110/bin:/home/mirok/miniconda3/condabin:/home/mirok/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/snap/bin
Now at /home/mirok

hello world!

Here we have used the ⚠️ cd command to change the directory to HOME, an environment variable holding the user's home directory, here

! echo $HOME
/home/mirok
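
The PATH mechanism can be tried out without touching any real directories. The following is a sketch under stated assumptions: the script name demo_hello.sh and the temporary directory are hypothetical, and command -v (a standard shell builtin) is used here in place of which to show what a name resolves to.

```bash
# Create a throw-away directory with a tiny executable script in it
tmpdir=$(mktemp -d)
printf '#!/bin/sh\necho "hello from demo"\n' > "$tmpdir/demo_hello.sh"
chmod +x "$tmpdir/demo_hello.sh"

# Prepend the directory to PATH; the change lasts only for this shell session
PATH="$tmpdir:$PATH"

# command -v prints the path that the name resolves to
resolved=$(command -v demo_hello.sh)
echo "demo_hello.sh resolves to $resolved"

# Now the script can be called by name alone
output=$(demo_hello.sh)
echo "$output"

rm -r "$tmpdir"    # clean up
```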

NOTE There is a pitfall: each notebook cell executes in its own process. Exported variables are inherited only by child processes, so they will not be visible in the next cell, which is a separate process and not a child.

! echo $PATH   
/home/mirok/Documents/Software/AnaMorph/bin:/home/mirok/Documents/Software/visit3_3_3.linux-x86_64/bin:/home/mirok/Documents/Software/Fiji.app:/home/mirok/Documents/Software/julia:/home/mirok/Documents/Software/ParaView-5.11.0-MPI-Linux-Python3.9-x86_64/bin:/home/mirok/miniconda3/envs/in3110/bin:/home/mirok/miniconda3/condabin:/home/mirok/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/snap/bin
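
The same rule can be seen within a single script. A small sketch, assuming the variable names local_only and SHARED are not already set in your environment: a child shell started with sh -c sees only what was exported, and nothing it changes travels back to the parent.

```bash
local_only="not exported"
export SHARED="exported"

# A child process sees only the exported variable
child_view=$(sh -c 'echo "local_only=<$local_only> SHARED=<$SHARED>"')
echo "$child_view"

# Changes made in the child do not travel back to the parent
sh -c 'SHARED="changed in child"'
echo "parent still has SHARED=$SHARED"
```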

So we will do this outside the notebook, in a terminal, i.e. in one running shell session. We can suspend the current foreground process with ctrl+z. After changing the setting we can bring the process back to the (f)ore(g)round with ⚠️ fg. Alternatively, we can resume the suspended process in the (b)ack(g)round with ⚠️ bg.

Some other examples of setting variables from the output of commands (command substitution)

%%bash
weekday=$(date +"%A %Y-%m-%d %H:%M:%S")    # date is an external utility; %A is the name of the weekday
echo "Today is $weekday."
Today is onsdag 2023-11-01 12:43:00.

%%bash
# Backticks are an older syntax for the same command substitution
files=`ls ..`
echo $files
13_scikit_learn 14-julia-ml about best_practices command-line mixed-programming numerical-python pandas Peer-review information.ipynb production pull-request python regular-expressions tips_and_tricks visualisation web web-servers

As mentioned before, the ⚠️ ls command lists the contents of a directory.
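
One practical difference between the two syntaxes is nesting. A minimal sketch (the variable names are illustrative, and /usr/local/bin is only used as a string, it does not need to exist): $( ) can contain another $( ) without any escaping, whereas nested backticks would need backslashes.

```bash
# $(...) and backticks both capture the standard output of a command
new_style=$(echo "captured")
old_style=`echo "captured"`

# $(...) nests without escaping: take the name of the parent of /usr/local/bin
parent_name=$(basename "$(dirname /usr/local/bin)")

echo "$new_style $old_style $parent_name"
```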

Typed variables#

By default variables are untyped and treated as strings

%%bash
x=5
x=$x++5
echo $x
5++5

We can be explicit about the type of variable

%%bash
declare -i b     # define an integer variable b
a=5
b=$a+5
echo $b
10

Or express that the variable is constant/read-only

%%bash
declare -r r=10            
echo $r
r=5
10
bash: line 3: r: readonly variable
---------------------------------------------------------------------------
CalledProcessError                        Traceback (most recent call last)
Cell In [29], line 1
----> 1 get_ipython().run_cell_magic('bash', '', 'declare -r r=10            \necho $r\nr=5\n')

File ~/miniconda3/envs/in3110/lib/python3.9/site-packages/IPython/core/interactiveshell.py:2417, in InteractiveShell.run_cell_magic(self, magic_name, line, cell)
   2415 with self.builtin_trap:
   2416     args = (magic_arg_s, cell)
-> 2417     result = fn(*args, **kwargs)
   2418 return result

File ~/miniconda3/envs/in3110/lib/python3.9/site-packages/IPython/core/magics/script.py:153, in ScriptMagics._make_script_magic.<locals>.named_script_magic(line, cell)
    151 else:
    152     line = script
--> 153 return self.shebang(line, cell)

File ~/miniconda3/envs/in3110/lib/python3.9/site-packages/IPython/core/magics/script.py:305, in ScriptMagics.shebang(self, line, cell)
    300 if args.raise_error and p.returncode != 0:
    301     # If we get here and p.returncode is still None, we must have
    302     # killed it but not yet seen its return code. We don't wait for it,
    303     # in case it's stuck in uninterruptible sleep. -9 = SIGKILL
    304     rc = p.returncode or -9
--> 305     raise CalledProcessError(rc, cell)

CalledProcessError: Command 'b'declare -r r=10            \necho $r\nr=5\n'' returned non-zero exit status 1.
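
Integer arithmetic does not strictly require declare -i. As a sketch of an alternative, arithmetic expansion $(( )) evaluates an expression on the spot, even when the variables involved are plain strings. The variable names below are illustrative.

```bash
x=5

# Plain assignment only concatenates text
as_text="$x+5"

# Arithmetic expansion evaluates the expression as integers
as_number=$((x + 5))

# Integer division truncates; % gives the remainder
quotient=$((17 / 5))
remainder=$((17 % 5))

echo "$as_text $as_number $quotient $remainder"
```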

Bash also supports arrays

%%bash
declare -a array=("foo" "bar") # array
echo ${array[0]}  # First array value
echo ${array[@]}  # All array values
echo ${#array}    # NOTE: length of the first element ("foo"), not the array size
echo ${#array[@]} # Number of elements in the array
foo
foo bar
3
2
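
A short sketch of two further array operations that come up often, appending and looping. The array name fruits and its contents are made up for illustration; this is Bash-specific syntax and will not work in dash.

```bash
declare -a fruits=("apple" "blood orange")

fruits+=("kiwi")               # append an element

echo "number of elements: ${#fruits[@]}"
echo "length of first element: ${#fruits[0]}"

# Quoting "${fruits[@]}" keeps elements that contain spaces intact
count=0
for f in "${fruits[@]}"
do
  echo "<$f>"
  ((count+=1))
done
```

Without the double quotes around ${fruits[@]} the loop would see "blood" and "orange" as two separate items.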

Flow control and functions#

For flow control we shall discuss the if and case statements and the for and while loops

if statement

%%bash
declare name="Joe2"
# Here we are comparing 2 strings
if [ $name == "Joe" ]
then
  echo "Joseph"
else
  echo "Don't know"
fi
Don't know

Note that [ is not a bracket (for grouping) but a command, a synonym for test, which is why the spaces around it are required

%%bash
declare -i -r number=10
# Here we are comparing numbers
if [ $number -gt 10 ]    # other integer comparisons: -eq, -le, ...
then
  echo "The variable is greater than 10."
else
  echo "The variable is at most 10"
fi
The variable is at most 10
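
Because [ is an ordinary command, its arguments go through the usual shell expansion before the test runs. The sketch below (variable names are illustrative) shows that test and [ are interchangeable and why quoting the variable is the safer habit.

```bash
name=""     # deliberately empty

# [ is a command and ] is its last argument; test is the same command
# without the closing bracket
if test -z "$name"; then first="empty"; else first="non-empty"; fi
if [ -z "$name" ]; then second="empty"; else second="non-empty"; fi

# Quoting matters: unquoted, an empty $name would vanish from the argument
# list and the comparison would be malformed
if [ "$name" = "Joe" ]
then
  third="Joe"
else
  third="not Joe"
fi

echo "$first $second $third"
```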

We can do if-elif branching, and the tests can be combined with && (AND) or || (OR). Below we also introduce parameter expansion ${ } to grab substrings or get the length of strings, and arithmetic expansion $(( )) to perform some simple arithmetic

%%bash
declare name="Blph"  
# Joey

if [ $name == "Joe" ]
then
  echo Name is Joe
fi

# AND
if [ ${name: 0:1} == "J" ] && [ ${name: -1:1} == "y" ]
then
  echo The first letter is J and last is y
# OR
elif [ ${name: 0:1} == "A" ] || [ ${#name} -eq $((2+2)) ]
then
  echo The first letter is A or name length is $((1+3))
else
  [ ${#name} -eq 5 ] && echo "Don't know for 5 char long name"
fi
# NOTE: if the last test in a script fails, the script ends with a nonzero
# exit status and IPython complains. We therefore finish with an explicit
# "success" exit status
exit 0
The first letter is A or name length is 4

⚠️ exit with a status number is used to indicate successful or failed execution; 0 means success. There is a special variable, $?, which captures the exit code of the preceding command

%%bash
name="Joey"
echo 1 ${name: -1:0}
echo 2 ${name: -1:1}
echo 3 ${name: -1:2}
1
2 y
3 y

Let’s illustrate the exit status

%%bash
name="alex" # alexa
[ ${#name} -eq 5 ] && echo "Exec only when name 5"

if [ "$?" == "0" ]
then
  echo There was no problem
else
  echo There was a problem
fi
There was a problem
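
A few more observations about exit statuses, as a sketch with illustrative variable names: true and false are commands that do nothing but succeed or fail, a subshell can return an arbitrary status, and if can test a command directly so that $? is often not needed.

```bash
true
status_true=$?       # 0: success

false
status_false=$?      # 1: failure

# A subshell can return any status between 0 and 255
(exit 3)
status_custom=$?

# if tests the exit status of a command directly
if grep -q "needle" /dev/null
then
  found="yes"
else
  found="no"
fi

echo "$status_true $status_false $status_custom $found"
```

Note that $? must be read immediately: every command, including echo, overwrites it.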

There are handy tests for existence of files/directories. For example we can check

%%bash
dir='scripts'

if [ -d $dir ]
then
  echo There is $dir directory
  cp -r $dir $dir.bk
  ls .             # . is the current directory, .. is the one above
  echo
  if [ -x "$dir/hello_world.sh" ]
  then
    echo $dir contains executable
  fi
fi
There is scripts directory
allPDFs.tar
allPDFs.tar.gz
Bash - interactive lecture.ipynb
Bash - interactive lecture.slides.html
cmdline_bash.ipynb
data
figs
hello-world
hw.sh
Makefile
ottar_scicomm.pdf
results
run_and_test.sh
scripts
scripts.bk
Valgkort_2023.pdf

scripts contains executable

Here we have used the copy command ⚠️ cp with a -r recursive switch.

Other test switches

  • -h FILE - True if the FILE exists and is a symbolic link.

  • -r FILE - True if the FILE exists and is readable.

  • -w FILE - True if the FILE exists and is writable.

  • -x FILE - True if the FILE exists and is executable.

  • -d FILE - True if the FILE exists and is a directory.

  • -e FILE - True if the FILE exists, regardless of type

  • -f FILE - True if the FILE exists and is a regular file (not a directory or device)
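
The switches above can be tried safely in a scratch directory. This is a minimal sketch: the directory comes from mktemp, and the file names data.txt, link and missing are made up. The -s switch (file exists and is not empty) is taken from the test section of the man page shown earlier.

```bash
workdir=$(mktemp -d)          # scratch directory
touch "$workdir/data.txt"     # empty regular file
ln -s "$workdir/data.txt" "$workdir/link"

report=""
[ -d "$workdir" ]          && report="$report dir"
[ -f "$workdir/data.txt" ] && report="$report file"
[ -h "$workdir/link" ]     && report="$report symlink"
[ -s "$workdir/data.txt" ] || report="$report empty"
[ -x "$workdir/data.txt" ] || report="$report not-executable"
[ -e "$workdir/missing" ]  || report="$report missing"

echo "checks passed:$report"
rm -r "$workdir"
```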

case statement

To avoid writing nested if statements, especially when the branching amounts to a case analysis or pattern matching, we use the case construct. This will be useful e.g. for parsing command-line arguments (see later)

%%bash
place="Oslo"
case $place in
        Oslo)
            m=4;;  # ;; indicates end of case
        Bergen)
            m=5;;
        *)
            m=-1
esac
echo $m
4
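
The labels in a case statement are glob patterns, not just literal strings, and several patterns can share one branch with |. A sketch with made-up file names, using a for loop only to feed several values through the same case:

```bash
summary=""
for f in notes.md run.py 2023_results Makefile
do
  case "$f" in
    *.txt|*.md)           # either extension
        kind="text";;
    *.py)
        kind="python";;
    [0-9]*)               # anything starting with a digit
        kind="starts with a digit";;
    *)                    # default branch
        kind="unknown";;
  esac
  echo "$f: $kind"
  summary="$summary$kind;"
done
```

The first matching pattern wins, so the catch-all * must come last.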

for loop

Consider this setup where we run over a bunch of parameters to perform a “simulation” whose result we want to store

%%bash
experiments="first second third"

dir=results

if [ -d $dir ]
then
  echo $dir exists
else
  mkdir $dir
fi

declare -i counter
counter=0

for e in $experiments
do
  echo running $e
  sleep 0.2
  touch $e.txt        # Touch/create empty file with that name
  cp $e.txt $dir      # Back it up
  rm -vf $e.txt           # Remove the original
  ((counter=counter+1))       # Increase the counter
done
echo Performed $counter experiments
results exists
running first
removed 'first.txt'
running second
removed 'second.txt'
running third
removed 'third.txt'
Performed 3 experiments

Here we have used the make-directory command ⚠️ mkdir. The simulation was mocked up by the ⚠️ sleep command, which delays execution by the given number of seconds, and the results were created by ⚠️ touch. Finally we removed the original results with ⚠️ rm.

The previous example illustrates a common situation where the tasks in the loop could execute in parallel rather than serially, as was done above. Launching a task in the background, and thus in parallel with the others, can be done with &. For comparison, here is first the serial version

%%bash
experiments="first second third"

for e in $experiments
do
  sleep 1 && echo Launched $e
done
Launched first
Launched second
Launched third

In contrast, the parallel version runs faster, as expected. Note that the tasks may finish in any order

%%bash
experiments="first second third"

for e in $experiments
do
  sleep 1 && echo Launched $e &
done
Launched first
Launched third
Launched second
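
A script that needs the results of its background jobs has to block until they are done, which is what the wait builtin is for (see the man page excerpt earlier). A minimal sketch, assuming a scratch directory from mktemp and hypothetical result files:

```bash
outdir=$(mktemp -d)

for e in first second third
do
  # Each task runs in the background and writes its own result file
  ( sleep 0.2; echo "done $e" > "$outdir/$e.txt" ) &
done

wait    # block until all background jobs have finished

finished=$(ls "$outdir" | wc -l)
echo "finished tasks: $finished"
rm -r "$outdir"
```

Without the wait line the count could be taken before any of the tasks has written its file.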

while loop

Consider the task of counting lines in a file

%%bash
filename="./data/text.txt"
declare -i count; count=0

echo "Start counting..."
# loop over all lines of  file
while read p
do
    # echo $p
    # increase line counter
    ((count++))
done < $filename
echo "done"

echo "Number of lines in $filename: $count"
wc -l $filename     # We compare with the wc utility
Start counting...
done
Number of lines in ./data/text.txt: 13
13 ./data/text.txt
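
A plain read trims leading whitespace, interprets backslashes and skips a final line that lacks a trailing newline. The sketch below shows a more defensive form of the loop on a small temporary file whose contents are made up for the demonstration.

```bash
tmpfile=$(mktemp)
# Three lines; the last one has no trailing newline
printf '  indented line\nback\\slash\nlast line' > "$tmpfile"

count=0
# IFS= keeps leading/trailing whitespace, -r keeps backslashes literal;
# the || [ -n "$line" ] part also processes a final line without a newline
while IFS= read -r line || [ -n "$line" ]
do
  count=$((count + 1))
  echo "$count: <$line>"
done < "$tmpfile"

wc_count=$(wc -l < "$tmpfile")   # wc -l counts newline characters only
echo "loop counted $count lines, wc -l reports $wc_count"
rm "$tmpfile"
```

Here the loop and wc -l disagree by one, because wc -l counts newline characters rather than lines of text.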

Color printing by setting terminal properties

%%bash
declare -i index; index=1

normal=$(tput setaf 9)

while [ $index -le 4 ]
do
    tput setaf $index          # Foreground
    tput setab $((index+1))          # Background
    echo Index is $index
    tput setaf 9   # Restore
    ((index++))
done
Index is 1
Index is 2
Index is 3
Index is 4

Functions#

Functions are declared with the function keyword and called by their name followed by arguments. Note that by default variables assigned inside the function body are global

%%bash
myresult="Nothing"

function greet
{
    echo "greet was called"
    myresult='some value'  # Global
    insideresult="What"    # Global
}

echo $myresult
greet  # Call
echo $myresult $insideresult
Nothing
greet was called
some value What
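If the global behaviour is not wanted, variables can be confined to the function with the local builtin. A minimal sketch (the function name greet_local is made up for the illustration):

```bash
myresult="Nothing"

function greet_local
{
    local myresult='some value'   # Visible only inside greet_local
    local insideresult="What"     # Also local
    echo "inside: $myresult $insideresult"
}

greet_local
# The global myresult is untouched and insideresult does not exist out here
echo "outside: $myresult ${insideresult:-<unset>}"
```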

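The arguments of a function are accessed through special variables: $# is the number of arguments, $1, $2, etc. are the individual arguments and $@ is all of them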

%%bash
function foo
{
    echo "foo called with $# arguments"  # $# is the arg count
    echo "The first one is $0" # NOTE the zero argument is not the first one from the user!
                               # $1 $2 etc  
    # Show all of them
    declare -i n; n=1
    for arg in "$@"; do
      echo "command-line argument no. $n is <$arg>"
      ((n++))
    done
}

foo This
echo
foo This That
foo called with 1 arguments
The first one is bash
command-line argument no. 1 is <This>

foo called with 2 arguments
The first one is bash
command-line argument no. 1 is <This>
command-line argument no. 2 is <That>
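Quoting matters as soon as an argument contains spaces: "$@" expands to exactly one word per argument, whereas an unquoted $@ is split again on whitespace. The sketch below makes the difference visible; both function names are made up for the illustration.

```bash
function count_unquoted
{
    local n=0
    for arg in $@; do ((n += 1)); done     # word splitting applies
    echo $n
}

function count_quoted
{
    local n=0
    for arg in "$@"; do ((n += 1)); done   # each argument is kept intact
    echo $n
}

unquoted=$(count_unquoted "This one" That)
quoted=$(count_quoted "This one" That)
echo "unquoted: $unquoted, quoted: $quoted"
```

Two arguments are passed in both calls, but only the quoted version counts two.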

Or we can consume them one at a time with shift, which is convenient for parsing options

%%bash
function bar
{
    while [ $# -gt 0 ]
    do
        option=$1; # load arg into option
        shift;     # move $1 pointer
        case "$option" in
            -n)
                name=$1
                shift
                ;;  
            -a)
                age=$1; shift; ;;  
            *)
                echo "$0: invalid option \"$option\""; exit 1;;
        esac
    done
    echo $name is $age years old
}

bar -n "Jim"
#echo
bar -a 30 -n Ana
echo "Exit status "$?
echo
# bar -a 30 -b Ana
Jim is years old
Ana is 30 years old
Exit status 0
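For single-letter options the getopts builtin can take over the bookkeeping done by hand above. This is only a sketch of one alternative, not a replacement for the lecture's function; the name bar_getopts is made up, and returning 1 (rather than exiting) on an invalid option is a design choice of this example.

```bash
function bar_getopts
{
    local name="" age="" opt
    local OPTIND=1                 # Reset so the function can be called repeatedly
    while getopts "n:a:" opt       # n: and a: both expect a value
    do
        case "$opt" in
            n) name=$OPTARG ;;
            a) age=$OPTARG ;;
            *) return 1 ;;         # Unknown option; getopts reports it on stderr
        esac
    done
    echo "$name is ${age:-?} years old"
}

first=$(bar_getopts -a 30 -n Ana)
echo "$first"

status=0
bar_getopts -b Ana 2> /dev/null || status=$?
echo "Exit status $status"
```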

Combining bash commands#

Unix processes use the following three standard streams as preconnected input and output communication channels:

  • user input is passed to the standard input STDIN stream

  • normal information is passed to the standard output STDOUT stream

  • error information is passed to the standard error STDERR stream.

The streams can be redirected.

STDOUT to file

The redirect > sends STDOUT to a file, overwriting the file if it already exists:

./myscript.sh > myfile.txt   

The redirect >> does the same, but appends the output to an existing file:

./myscript.sh >> myfile.txt   
%%bash
chmod u+x ./scripts/hello_world_bang.sh
./scripts/hello_world_bang.sh > ./data/foo.txt
cat ./data/foo.txt

echo

for i in {1..5}
do
    ./scripts/hello_world_bang.sh >> ./data/foo.txt
done
cat ./data/foo.txt
hello world!

hello world!
hello world!
hello world!
hello world!
hello world!
hello world!

File to STDIN

Use the < redirect to send a file to STDIN:

%%bash
wc -w < ./data/text.txt # Count the number of words and print to STDOUT 
echo

wc -w < ./data/text.txt > ./data/word_stat.txt # Same as above, but save STDOUT output to file
wc -l < ./data/text.txt > ./data/line_stat.txt 
wc -m < ./data/text.txt > ./data/char_stat.txt # Characters
echo

cat ./data/word_stat.txt ./data/line_stat.txt ./data/char_stat.txt
35


35
13
239

⚠️ wc prints the word(-w), line(-l) or character(-m) counts for a file

You can specify which stream to redirect with [STREAM]>. Valid values for STREAM are 1 for stdout, 2 for stderr and & for both.

./compile_model.sh                 # stdout and stderr are displayed on the terminal
./compile_model.sh 1> out.txt      # Redirect stdout to file, same as >
./compile_model.sh 2> err.txt      # Redirect stderr to file
./compile_model.sh &> outerr.txt   # Redirect stdout and stderr to file
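The redirections can be tried out without a real build script. In the sketch below the function compile_model is a made-up stand-in for ./compile_model.sh that writes one line to each stream; the file names under the scratch directory are likewise only for illustration.

```bash
# Hypothetical stand-in for ./compile_model.sh: writes to both streams
function compile_model
{
    echo "compiling"           # goes to stdout
    echo "warning: slow" >&2   # goes to stderr
}

tmp=$(mktemp -d)                                     # scratch directory
compile_model > "$tmp/out.txt" 2> "$tmp/err.txt"     # one file per stream
compile_model > "$tmp/both.txt" 2>&1                 # stderr follows stdout into the same file
compile_model 2> /dev/null | wc -l                   # discard stderr, pipe only stdout

out=$(cat "$tmp/out.txt")
err=$(cat "$tmp/err.txt")
both_lines=$(grep -c '' "$tmp/both.txt")             # number of lines in the combined file
echo "out=<$out> err=<$err> lines in both: $both_lines"
rm -r "$tmp"
```

The form 2>&1 sends stream 2 to wherever stream 1 currently points, so it has to come after the redirect of stdout.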

Combining bash commands: Pipes

The bash pipe | connects STDOUT of one command to STDIN of another. Let’s look at some pipeline examples

  1. Print the file content (here single column data) in a sorted way

! head -5 ./data/names.txt
journal
lineage
excavate
charismatic
rank
%%bash
# Look first how many
wc -l < ./data/names.txt
cat ./data/names.txt | sort
40
aaapath
autonomy
autonomy
biscuit
charismatic
cruel
daughter
decrease
decrease
demonstrator
demonstrator
drawer
excavate
facade
joke
joke
journal
laaaandscape
letter
liability
liability
lineage
lung
magnitude
mall
man
maniac
manipulation
maximum
maximum
missile
noble
paaalace
paaaot
rank
reign
relieve
straaaeam
suggest
suggest

Note that we possibly get a very long list. To look only at a selection we could extend the pipeline with calls to ⚠️ head, ⚠️ tail and ⚠️ more, which “zoom” in on the beginning or the end, or page through the text in chunks.

!cat ./data/names.txt | sort | head -3
aaapath
autonomy
autonomy
!cat ./data/names.txt | sort | tail -3
straaaeam
suggest
suggest
# NOTE: not notebook friendly as it expects some user interaction - run in terminal
# cat ./data/names.txt | sort | more -2
  2. Introduce a T junction

Building on the previous example we might want to get only the count of unique words. This can be accomplished by adding ⚠️ uniq to the pipeline

! cat ./data/names.txt | sort | uniq | wc -l
33

However, wouldn’t it be useful to have the list of unique words too? This is where ⚠️ tee comes in, introducing a T junction in the pipeline redirecting the partial output to a file

%%bash
cat ./data/names.txt | sort | uniq | tee ./data/unique_name.txt | wc -l
echo
head -6 ./data/unique_name.txt
33

aaapath
autonomy
biscuit
charismatic
cruel
daughter
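A related pipeline counts how often each word occurs. The sketch below assumes a small made-up word list in a temporary file; uniq -c only merges adjacent duplicates, which is why the input is sorted first.

```bash
words=$(mktemp)                                  # temporary sample data, illustration only
printf '%s\n' joke rank joke lung rank joke > "$words"

# uniq -c prefixes each distinct line with its number of occurrences;
# the second sort orders by that count, largest first;
# tee keeps the table in a file while wc counts the distinct words
distinct=$(sort "$words" | uniq -c | sort -rn | tee "$words.counts" | wc -l | tr -d ' ')

most_common=$(head -1 "$words.counts" | awk '{print $2}')
echo "$distinct distinct words, most common: $most_common"
rm -f "$words" "$words.counts"
```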
  3. Combine with variables

As an example we wish to build a news app. We begin by retrieving the data using ⚠️ curl running in silent mode (-s). Let’s see what we work with

%%bash
#  
# Fall back if net is down
cat ./data/nrk_data.txt | grep newsfeed__message-title
          <h3 class="kur-newsfeed__message-title">Samlet rester fra Meierigården – søker etter levninger</h3>
          <h3 class="kur-newsfeed__message-title">Medier: Tre svensker pågrepet for drap i Bosnia</h3>
          <h3 class="kur-newsfeed__message-title">Veskeforbud på større svenske arrangementer</h3>
          <h3 class="kur-newsfeed__message-title">Nye angrep mot opprørere nord i Myanmar</h3>
          <h3 class="kur-newsfeed__message-title">EUs utenrikssjef uttrykker sjokk etter israelsk angrep mot flyktningleir</h3>
          <h3 class="kur-newsfeed__message-title">Mann i 30-årene bedro flere titalls personer med løfter om fotballbilletter</h3>
          <h3 class="kur-newsfeed__message-title">Nytt angrep mot flyktningleir i Gaza</h3>
          <h3 class="kur-newsfeed__message-title">TV 2 har anmeldt demonstranter som stormet «Skal vi danse»-scenen</h3>
          <h3 class="kur-newsfeed__message-title">Irans utenriksminister: – Konsekvensene blir alvorlige</h3>
          <h3 class="kur-newsfeed__message-title">Dømt til over fem års fengsel for grov vold i Vika</h3>

Our next step is to extract information from this text. Specifically, we are after the first headline. One possibility is to split the line on some delimiter (as with a Python string) and work with the “fields”, i.e. the elements of the resulting array. This is the functionality of ⚠️ cut -d DELIMITER -fINDEX

! cat ./data/nrk_data.txt | grep newsfeed__message-title | head -1 | cut -d '>' -f2 
Samlet rester fra Meierigården – søker etter levninger</h3

Following the same logic we can cut away the trailing tag and store the result in a variable

%%bash
title=`cat ./data/nrk_data.txt | grep newsfeed__message-title | head -1 | cut -d '>' -f2 | cut -d '<' -f1`
echo $title
Samlet rester fra Meierigården – søker etter levninger

At this point we know the basics and are in a position to “glue” different programs together. We have seen a few already, e.g. cut and sort. In the following we cover a few more which are useful in the scientific workflow.

Text manipulation utilities - grep, awk and sed#

grep global regular expression print#

Grep searches the input files line by line and prints every line that matches the pattern. Recall our list of words

! head -10 ./data/names.txt
journal
lineage
excavate
charismatic
rank
missile
biscuit
reign
letter
paaalace

By using grep we can answer questions like:

  1. Are there lines containing “ma”?

! grep "ma" ./data/names.txt
charismatic
magnitude
man
mall
maniac
maximum
manipulation
maximum
  2. What are the lines and line numbers containing “ma”? (-n)

! grep -n "ma" ./data/names.txt
4:charismatic
16:magnitude
21:man
23:mall
25:maniac
26:maximum
27:manipulation
34:maximum

Or which lines do not contain it? (-v selects the lines that do not match)

! grep -v "ma" ./data/names.txt
journal
lineage
excavate
rank
missile
biscuit
reign
letter
paaalace
paaaot
straaaeam
aaapath
laaaandscape
drawer
lung
noble
relieve
facade
daughter
cruel
suggest
decrease
demonstrator
joke
autonomy
liability
suggest
decrease
demonstrator
joke
autonomy
liability
  3. How many lines match? (-c)

! grep -c "ma" ./data/names.txt
8

Of course we now know that the same could have been accomplished with pipes, for example

! grep "ma" ./data/names.txt | wc -l
8

The search pattern can be a regular expression. By default grep uses basic regular expressions, in which characters such as |, + and ? are taken literally.

# Use -l to print files containing lines with regexp nu* in them
! grep -l "nu*" ../*/*.ipynb
../13_scikit_learn/scikit-learn-1.ipynb
../13_scikit_learn/scikit-learn-1-presentation.ipynb
../13_scikit_learn/scikit-learn-2.ipynb
../14-julia-ml/julia_examples.ipynb
../14-julia-ml/python_examples.ipynb
../14-julia-ml/stokes_pinns.ipynb
../about/About the course.ipynb
../about/Introduction to git.ipynb
../about/Scripting vs regular programming.ipynb
../best_practices/Best practices.ipynb
../command-line/Bash - interactive lecture.ipynb
../command-line/cmdline_bash.ipynb
../mixed-programming/mixed_programming_cython.ipynb
../mixed-programming/mixed_programming_introduction.ipynb
../mixed-programming/Numba.ipynb
../mixed-programming/Profiling and Optimizing with IPython.ipynb
../numerical-python/exercises.ipynb
../numerical-python/numerical_python.ipynb
../numerical-python/python_profiling.ipynb
../pandas/API-exercises.ipynb
../pandas/Pandas_exercises.ipynb
../pandas/Pandas.ipynb
../pandas/PublicAPIs.ipynb
../production/environments.ipynb
../production/sphinx-docs.ipynb
../pull-request/Peer review assignment 5.ipynb
../python/exercises.ipynb
../python/ipython.ipynb
../python/more_python.ipynb
../python/packages_and_testing.ipynb
../python/python_summary-classes.ipynb
../python/python_summary.ipynb
../python/python_summary-typing.ipynb
../regular-expressions/regular-expressions.ipynb
../tips_and_tricks/bash_rc_alias.ipynb
../tips_and_tricks/Builtin Superheroes.ipynb
../tips_and_tricks/git_branches.ipynb
../tips_and_tricks/git_gui.ipynb
../tips_and_tricks/gitignore.ipynb
../tips_and_tricks/git_ssh_keys.ipynb
../tips_and_tricks/ipython_embed.ipynb
../tips_and_tricks/prettier_git.ipynb
../tips_and_tricks/ssh_keys.ipynb
../visualisation/altair.ipynb
../visualisation/corona-data.ipynb
../visualisation/gendata.ipynb
../visualisation/maps.ipynb
../visualisation/matplotlib.ipynb
../visualisation/visualisation.ipynb
../web/Introduction to HTML.ipynb
../web-servers/Introduction to HTML - Forms.ipynb
../web-servers/Introduction to webservers.ipynb
../web-servers/monty-hall-game.ipynb
../web-servers/monty-hall-rest.ipynb
../web/web.ipynb
../web/Web scraping.ipynb

With egrep, which is equivalent to grep -E, we get extended regular expressions, where | means alternation

! egrep -l "np|numpy|python|import" ../*/*.ipynb
../13_scikit_learn/scikit-learn-1.ipynb
../13_scikit_learn/scikit-learn-1-presentation.ipynb
../13_scikit_learn/scikit-learn-2.ipynb
../14-julia-ml/julia_examples.ipynb
../14-julia-ml/python_examples.ipynb
../14-julia-ml/stokes_pinns.ipynb
../about/About the course.ipynb
../about/Introduction to git.ipynb
../about/Scripting vs regular programming.ipynb
../best_practices/Best practices.ipynb
../command-line/Bash - interactive lecture.ipynb
../command-line/cmdline_bash.ipynb
../mixed-programming/mixed_programming_cython.ipynb
../mixed-programming/mixed_programming_introduction.ipynb
../mixed-programming/Numba.ipynb
../mixed-programming/Profiling and Optimizing with IPython.ipynb
../numerical-python/exercises.ipynb
../numerical-python/numerical_python.ipynb
../numerical-python/python_profiling.ipynb
../pandas/API-exercises.ipynb
../pandas/Pandas_exercises.ipynb
../pandas/Pandas.ipynb
../pandas/PublicAPIs.ipynb
../production/environments.ipynb
../production/sphinx-docs.ipynb
../pull-request/Peer review assignment 5.ipynb
../python/exercises.ipynb
../python/ipython.ipynb
../python/more_python.ipynb
../python/packages_and_testing.ipynb
../python/python_summary-classes.ipynb
../python/python_summary.ipynb
../python/python_summary-typing.ipynb
../regular-expressions/regular-expressions.ipynb
../tips_and_tricks/bash_rc_alias.ipynb
../tips_and_tricks/Builtin Superheroes.ipynb
../tips_and_tricks/git_gui.ipynb
../tips_and_tricks/ipython_embed.ipynb
../tips_and_tricks/prettier_git.ipynb
../visualisation/altair.ipynb
../visualisation/corona-data.ipynb
../visualisation/gendata.ipynb
../visualisation/maps.ipynb
../visualisation/matplotlib.ipynb
../visualisation/visualisation.ipynb
../web/Introduction to HTML.ipynb
../web-servers/Introduction to HTML - Forms.ipynb
../web-servers/Introduction to webservers.ipynb
../web-servers/monty-hall-game.ipynb
../web-servers/monty-hall-rest.ipynb
../web/web.ipynb
../web/Web scraping.ipynb
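The difference between the two flavours is easiest to see on a tiny input. The sketch below uses a made-up word list in a temporary file and the same kind of pattern with and without -E.

```bash
words=$(mktemp)                                  # temporary sample data, illustration only
printf '%s\n' numpy np python 'np|numpy' > "$words"

bre_count=$(grep -c  "np|numpy" "$words")        # basic: | is a literal character
ere_count=$(grep -cE "np|numpy" "$words")        # extended: | means "np or numpy"

echo "basic: $bre_count matching line(s), extended: $ere_count matching line(s)"
rm -f "$words"
```

In basic mode only the line that literally contains `np|numpy` matches, while in extended mode every line containing either alternative does.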

awk is a text pattern scanning and processing language. It operates on the lines of the input, which it sees as made up of fields delimited by a separator (whitespace by default). This allows us to extract information and do further processing.

Let’s use awk to extract the file permission column

%%bash
# Unpack this
ls -lrvrh
echo
ls -lrvrh | awk '{print $1}' | head -5
total 912K
drwxrwxr-x 4 mirok mirok 4,0K nov.   1 13:16 scripts.bk
drwxrwxr-x 3 mirok mirok 4,0K nov.   1 12:28 scripts
-rwxrwxr-x 1 mirok mirok  127 sep.   5 16:42 run_and_test.sh
drwxrwxr-x 2 mirok mirok 4,0K nov.   1 10:12 results
-rw-rw-r-- 1 mirok mirok  33K nov.   1 11:02 ottar_scicomm.pdf
-rw-rw-r-- 1 mirok mirok   30 sep.   5 16:42 hw.sh
drwxrwxr-x 2 mirok mirok 4,0K sep.   5 16:42 hello-world
drwxrwxr-x 2 mirok mirok 4,0K sep.   5 16:42 figs
drwxrwxr-x 2 mirok mirok 4,0K nov.   1 10:49 data
-rw-rw-r-- 1 mirok mirok 189K nov.   1 13:48 cmdline_bash.ipynb
-rw-rw-r-- 1 mirok mirok  80K nov.   1 11:02 allPDFs.tar.gz
-rw-rw-r-- 1 mirok mirok  90K nov.   1 11:02 allPDFs.tar
-rw-rw-r-- 1 mirok mirok  51K nov.   1 11:02 Valgkort_2023.pdf
-rw-rw-r-- 1 mirok mirok  180 sep.   5 16:42 Makefile
-rw-rw-r-- 1 mirok mirok 401K sep.   5 16:42 Bash - interactive lecture.slides.html
-rw-rw-r-- 1 mirok mirok  18K sep.   5 16:42 Bash - interactive lecture.ipynb

total
drwxrwxr-x
drwxrwxr-x
-rwxrwxr-x
drwxrwxr-x

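Combined with grep we can count the entries that have an execute bit set for the user or the group. Note that this also counts directories, for which x means permission to enter them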

%%bash
ls -l | awk '{print $1}' | egrep "x." 
echo
ls -l | awk '{print $1}' | egrep -c "x." 
drwxrwxr-x
drwxrwxr-x
drwxrwxr-x
drwxrwxr-x
-rwxrwxr-x
drwxrwxr-x
drwxrwxr-x

7

and sum their sizes in bytes

# Unpack
!ls -l | awk '{print $1, $5}' | egrep "x." | awk 'BEGIN {sum=0} {sum=sum+$2} END {print sum}'
24703
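Since an awk program consists of pattern { action } pairs, the filtering done with egrep above can also move into awk itself. The sketch below runs on a made-up two-column listing (permissions and size) and, unlike the pipeline above, restricts the match to regular files that their owner may execute.

```bash
listing=$(mktemp)               # hypothetical two-column listing, illustration only
printf '%s\n' 'drwxr-xr-x 4096' '-rwxr-xr-x 127' '-rw-r--r-- 30' '-rwxr-x--- 73' > "$listing"

# The pattern /^-..x/ selects lines starting with - (regular file) whose
# fourth character is x (owner may execute); the action runs only for those
# lines and the END block runs once after the last line
summary=$(awk '/^-..x/ {sum += $2; n++} END {print n, sum}' "$listing")
echo "executable files and their total size: $summary"
rm -f "$listing"
```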

Of course the delimiter can be specified. For example with a CSV file from the Pandas lecture we would work with a comma separator

%%bash
head -5 ./data/used_car_sales.csv
echo
awk -F "," '{print $1}' ./data/used_car_sales.csv | head -10
"ID","pricesold","yearsold","zipcode","Mileage","Make","Model","Year","Trim","Engine","BodyType","NumCylinders","DriveType"
"121144","3500","2020","430**","101249","Chrysler","300 Series","2006","TOURING","3.5L MPI 24-VALVE HO V6","Sedan","6","RWD"
"155642","29000","2020","386**","25165","Chevrolet","Corvette","2007","","","Coupe","0",""
"59517","4000","2019","33707","210500","Chevrolet","Silverado 2500","2002","LT","6.6L Turbo Diesel Duramax","Crew Cab Pickup","8","4WD"
"56873","10010","2019","01501","21632","Chevrolet","Camaro","1987","","350","Coupe","8","RWD"

"ID"
"121144"
"155642"
"59517"
"56873"
"5550"
"46260"
"73673"
"84557"
"15603"

sed, the stream editor, allows us to transform text on the input stream, e.g. filter lines or perform substitutions. Here we pass the sed script on the command line with -e.

The first use case we consider is sed -e 's/pattern/substitute/' file, where s selects the substitution command. sed consumes the stream line by line and replaces text matching pattern with substitute.

!head -8 ./data/names_columns.txt
journal       maa
lineage	      lineage	      
excavate      excavate      
charismatic   charismatic   
rank	      maa
missile	      missile	      
biscuit	      biscuit	      
reign	      reign	      
! sed -e 's/ma*/XXX/g' ./data/names_columns.txt | head -8
journal       XXX
lineage	      lineage	      
excavate      excavate      
charisXXXtic   charisXXXtic   
rank	      XXX
XXXissile	      XXXissile	      
biscuit	      biscuit	      
reign	      reign	      

Note that the g flag above stands for global: all matches on a line are replaced, not only the first one. Can you spot the difference?

! sed -e 's/ma*/XXX/' ./data/names_columns.txt | head -8
journal       XXX
lineage	      lineage	      
excavate      excavate      
charisXXXtic   charismatic   
rank	      XXX
XXXissile	      missile	      
biscuit	      biscuit	      
reign	      reign	      

We can redirect the output to a new file with

sed -e 's/ma*/XXX/g' ./data/names_columns.txt > ./data/names_modif.txt

or perform the substitution in place with -i

sed -i -e 's/ma*/XXX/g' ./data/names_columns.txt

The patterns can be full regular expressions. Let’s use sed to hide the numbers in a phone book (where we pretend that all numbers have exactly 3 digits)

! head -5 ./data/contacts.txt
# This 
# is a 
# comment
123 joe
333 miro
! sed -e 's/[0-9][0-9][0-9]/xxxy/g' ./data/contacts.txt
# This 
# is a 
# comment
xxxy joe
xxxy miro
ana
xxxy peter
lucy xxxy

Another use case is to perform an action on matching lines. The first action we will use is p for print; with -n, sed prints nothing except what p requests. Let’s print all the directories using sed

!ls -l | sed -n -e '/^d/ p' # vs no -n
drwxrwxr-x 2 mirok mirok   4096 nov.   1 10:49 data
drwxrwxr-x 2 mirok mirok   4096 sep.   5 16:42 figs
drwxrwxr-x 2 mirok mirok   4096 sep.   5 16:42 hello-world
drwxrwxr-x 2 mirok mirok   4096 nov.   1 10:12 results
drwxrwxr-x 3 mirok mirok   4096 nov.   1 12:28 scripts
drwxrwxr-x 4 mirok mirok   4096 nov.   1 13:16 scripts.bk

Another action is d for delete. Suppose you would like to remove all the comments (from say your python code)

! sed -e '/^#/ d' ./data/contacts.txt
# We could redirect with > or -i for inplace
123 joe
333 miro
ana
233 peter
lucy 222

sed also understands line numbers, so we could for example delete lines 2 to 20 of the long CSV file

%%bash 
echo size before `ls -lrvt ./data/used_car_sales.csv | awk '{print $5}'`
sed -i -e '2,20 d' ./data/used_car_sales.csv
echo size after `ls -lrvt ./data/used_car_sales.csv | awk '{print $5}'`
size before 13057047
size after 13055021
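Line addresses combine with the other commands as well. The sketch below limits a substitution to a range of lines; it assumes a small made-up contact list in a temporary file and uses -E so that the repetition {3} can be written without backslashes.

```bash
contacts=$(mktemp)              # temporary sample data, illustration only
printf '%s\n' '# header' '123 joe' '333 miro' 'ana' '233 peter' > "$contacts"

# The address 2,3 restricts the s command to lines 2 and 3;
# [0-9]{3} matches exactly three consecutive digits
masked=$(sed -E -e '2,3 s/[0-9]{3}/xxx/' "$contacts")
echo "$masked"

second=$(echo "$masked" | sed -n -e '2 p')       # inside the range: substituted
fifth=$(echo "$masked" | sed -n -e '5 p')        # outside the range: untouched
rm -f "$contacts"
```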

For more information see the nice summary by Matt Probert. Note also that ⚠️ grep, awk and sed are further additions to our family of programs/commands.

File manipulation utilities - find, tar and gzip#

Assume that we have run some analysis on a remote machine. When the computations are done we would like to gather the data and compress them for easier transfer.

⚠️ find visits all files in a directory tree and can execute one or more commands for every file

find source [specifiers]

We can specify the name and type (regular (f)ile, (d)irectory)

! find ./scripts/ -name "hello*" -type f 
./scripts/hello_world_bang.sh
./scripts/hello_world.sh

In case the source tree is very deep it is a good idea to limit the depth of the tree traversal

! find $HOME -maxdepth 2 -name "*.py" -type f 
/home/mirok/Downloads/ip_hdg_poisson.py
/home/mirok/Downloads/emi_test_single.py
/home/mirok/Downloads/rami_mesh_refine.py
/home/mirok/Downloads/train_x2.py
/home/mirok/Downloads/hdg_primal_poisson.py
/home/mirok/Downloads/stokes-3d.py
/home/mirok/Downloads/clement.py
/home/mirok/Downloads/stokes-3d(1).py
/home/mirok/Downloads/darcy_robin_dirichlet.py
/home/mirok/Downloads/single_test.py

The name specifier can combine several filters

# Or find all log and PDF files
! find $HOME -maxdepth 1 \( -name '*.log' -o -name '*.pdf' \) -type f
/home/mirok/ottar_scicomm.pdf
/home/mirok/Valgkort_2023.pdf
/home/mirok/burgas_vienna.pdf
/home/mirok/ottar_scicomm.log

We can also run a command for each file:

find rootdir -name filenamespec -exec command {} \; -print
# {} is the current filename

Let’s use this to print more detailed information about each file

!find $HOME -maxdepth 1 \( -name '*.log' -o -name '*.pdf' \) -type f  -exec ls -lrvt {} \;
-rw-rw-r-- 1 mirok mirok 33646 aug.  23 12:44 /home/mirok/ottar_scicomm.pdf
-rw-rw-r-- 1 mirok mirok 52093 aug.   2 10:31 /home/mirok/Valgkort_2023.pdf
-rw-rw-r-- 1 mirok mirok 28948 juni  23 09:27 /home/mirok/burgas_vienna.pdf
-rw-rw-r-- 1 mirok mirok 11409 aug.  23 12:44 /home/mirok/ottar_scicomm.log

We can perform several actions. Below we copy each file with cp in addition to printing some more info.

%%bash
find $HOME -maxdepth 1 \( -name '*.log' -o -name '*.pdf' \) -type f -size +30k  -exec ls -lrvt {} \; -exec cp  "{}" . \;
ls *.pdf
-rw-rw-r-- 1 mirok mirok 33646 aug.  23 12:44 /home/mirok/ottar_scicomm.pdf
-rw-rw-r-- 1 mirok mirok 52093 aug.   2 10:31 /home/mirok/Valgkort_2023.pdf
ottar_scicomm.pdf
Valgkort_2023.pdf

Note that we have narrowed the search with a size specifier: -size +30k selects files larger than 30 units of k (kibibytes, i.e. 1024 bytes).
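The find options can be tried safely on a throwaway tree. The sketch below builds a made-up directory structure (with spaces in the names, to show that find copes with them) and contrasts a depth-limited search with a full one; it also uses the `{} +` form of -exec, which hands many file names to a single command invocation instead of running the command once per file.

```bash
tree=$(mktemp -d)               # scratch tree, illustration only
mkdir -p "$tree/run 1" "$tree/run 2/deep"
touch "$tree/run 1/a.log" "$tree/run 2/b.pdf" "$tree/run 2/deep/c.log" "$tree/run 2/notes.txt"

# One ls call receives all matching files at once
find "$tree" \( -name '*.log' -o -name '*.pdf' \) -type f -exec ls -l {} +

# The starting directory is depth 0, so c.log (depth 3) is skipped here
shallow=$(find "$tree" -maxdepth 2 -name '*.log' -type f | wc -l | tr -d ' ')
all=$(find "$tree" -name '*.log' -type f | wc -l | tr -d ' ')
echo "log files: $shallow within depth 2, $all in total"
rm -r "$tree"
```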

Now that we can find things, let’s compress them.

The ⚠️ tar command can pack single files or all files in a directory tree into one file, which can be unpacked later.

tar -cvf myfiles.tar mytree file1 file2
#           dest     sources
# options:
# c: pack, v: list name of files, f: pack into file

# unpack the mytree tree and the files file1 and file2:
tar -xvf myfiles.tar

# options:
# x: extract (unpack)

The tarfile can be compressed with ⚠️ gzip

gzip mytar.tar
# result: mytar.tar.gz

Let’s deal with these PDFs that are lying around

%%bash 
[ -e allPDFs.tar ] && rm allPDFs.tar
[ -e allPDFs.tar.gz ] && rm allPDFs.tar.gz 

tar -cvf allPDFs.tar `find . -name '*.pdf' -print`
gzip -k allPDFs.tar
echo
ls -lrvth allPDFs.*
./ottar_scicomm.pdf
./Valgkort_2023.pdf

-rw-rw-r-- 1 mirok mirok 80K nov.   1 11:02 allPDFs.tar.gz
-rw-rw-r-- 1 mirok mirok 90K nov.   1 11:02 allPDFs.tar

Here we have run gzip with the -k (keep) flag; otherwise the tar file would be replaced by the compressed one.
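Packing and compressing can also be done in one step, since tar can filter the archive through gzip itself. The following sketch works in a scratch directory with made-up file names and shows the round trip: pack, list, unpack.

```bash
work=$(mktemp -d)               # scratch directory, illustration only
mkdir "$work/mytree"
echo "some result" > "$work/mytree/result.txt"

# z filters the archive through gzip, so no separate gzip call is needed;
# -C changes directory first so that the stored paths are relative
tar -czf "$work/mytree.tar.gz" -C "$work" mytree

listing=$(tar -tzf "$work/mytree.tar.gz")        # t lists the contents without extracting
echo "$listing"

mkdir "$work/unpacked"
tar -xzf "$work/mytree.tar.gz" -C "$work/unpacked"
restored=$(cat "$work/unpacked/mytree/result.txt")
echo "restored content: $restored"
rm -r "$work"
```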

We started this section assuming the scenario that we find ourselves on some remote machine. How do we get there?

Remote connection utilities#

Here are some commands that come in handy when working with remote machines. They are all ⚠️

  • ping is the machine connected?

  • ssh to connect over SSH, -X or -Y switch for window forwarding, i.e. graphics

ssh username@hostname
  • scp secured copy, -r for directories

scp username@hostname:/path/to/source destination
  • hostname what is the machine called?

  • whoami what is my user name

  • who who else is logged in

  • ps what are the running processes

  • top see how much resources are used

We demo most of the above commands outside in the terminal. We make one exception below to see some of the concepts discussed today in action

%%bash
# evalApply is the name of my machine and I have an SSH server running on it
machine=evalApply
ping $machine -c 1 &> /dev/null

if [ $? -gt 0 ]; then
    echo Connection to $machine cannot be established
else
    echo Connection to $machine can be established
fi
Connection to evalApply can be established
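Instead of inspecting $? afterwards, the command can be placed directly in the if statement, which branches on its exit status. The sketch below wraps this idiom in a made-up helper called report and demonstrates it with true and false, so that it runs without any network access.

```bash
# Run the given command silently and report whether it succeeded
function report
{
    local label=$1; shift
    if "$@" &> /dev/null; then
        echo "$label: ok"
    else
        echo "$label: failed with status $?"   # $? still holds the command's status here
    fi
}

ok=$(report "true command" true)
bad=$(report "false command" false)
echo "$ok"
echo "$bad"
# With a real host one could call: report "ping evalApply" ping -c 1 evalApply
```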

Plotting utilities#

Now that we have data we may want to do some visual exploration. One option is to use GNUPlot. Note that the program does not ship with Ubuntu by default and needs to be installed. Gnuplot offers interactive plotting (somewhat like building up the plot in ipython). It can also execute scripts. For example, below is a rather intuitive way of producing a plot from data

plot "data1_leg.txt" using 1:2 title 'L0' with linespoints lt 3 lc rgb 'red', \
     "data2_leg.txt" using 1:3 title 'L1' with linespoints

This can be entered on a prompt when gnuplot is running

gnuplot
gnuplot> COMMANDS HERE

or if we have stored the commands in a file, say foo.txt, we can get the plot by gnuplot -p foo.txt. A nice feature of GNUPlot is the ability to generate plots for LaTeX.

Note that GNUPlot is not limited to line plots, cf. the gallery of examples

!./scripts/tori.gplot
/bin/bash: ./scripts/tori.gplot: /usr/bin/gnuplot: bad interpreter: No such file or directory