Scripting vs regular programming#

What is a script?#

  • Very high-level, often short, program written in a high-level scripting language

  • Scripting languages: (figure: scripting_languages.svg)

  • This course: Python + a taste of Bash (Unix shell)

Characteristics of a script#

  • Glue other programs together and automate tasks

    • Often special-purpose code

    • Extensive text processing

    • File and directory manipulation

    • Many small interacting scripts may yield a big system

  • Perhaps a special-purpose GUI on top

  • (Sometimes) portable across Unix, Windows, Mac

  • Interpreted program (no compilation+linking)
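As a rough illustration of the "glue" role listed above, here is a minimal sketch that combines file manipulation, running another program, and a little text processing. The file names and the child command are made up for the example; the "other program" is simply a second Python process so that the sketch runs anywhere.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Work in a throwaway directory so the sketch leaves nothing behind.
with tempfile.TemporaryDirectory() as tmp:
    workdir = Path(tmp)

    # File and directory manipulation: create a few (hypothetical) files.
    for name in ("a.log", "b.log", "notes.txt"):
        (workdir / name).write_text(f"contents of {name}\n")

    # Glue: run another program and capture its output as text.
    # Here the "other program" is just a second Python process.
    result = subprocess.run(
        [sys.executable, "-c", "print('hello from a child process')"],
        capture_output=True,
        text=True,
        check=True,
    )
    child_output = result.stdout.strip()

    # Text processing: pick out the log files by name.
    log_files = sorted(p.name for p in workdir.glob("*.log"))

print(child_output)
print(log_files)
```

No compilation or linking step is involved: the file is handed directly to the interpreter.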

Why not stick to Java or C/C++?#

Features of scripting languages compared to Java, C/C++ and Fortran:

  • shorter, more high-level programs

  • much faster software development

  • more convenient programming

  • you feel more productive:

    • no variable declarations, but lots of consistency checks at run time

    • technical details are hidden: no pointers, automatic garbage collection, …

    • easy to combine software components and interact with the OS

    • lots of standardized libraries and tools

Scripts yield short code#

Consider reading real numbers from a file, where each line can contain an arbitrary number of real numbers:

1.1  9   5.2
1.762543E-02
0 0.01 0.001
        
   9 3 7

Python solution:

with open("myfile.txt") as f:  # the file is closed automatically
    n = f.read().split()  # split on any whitespace, including newlines
print(n)
['1.1', '9', '5.2', '1.762543E-02', '0', '0.01', '0.001', '9', '3', '7']
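Note that split() returns strings, not numbers. A minimal sketch of the remaining step, converting each token with float, could look as follows. To keep it self-contained, the sample data from above is first written to a temporary file; that part is scaffolding, not part of the solution.

```python
import tempfile
from pathlib import Path

# Sample data from the slide, including the blank and indented lines.
sample = """1.1  9   5.2
1.762543E-02
0 0.01 0.001

   9 3 7
"""

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "myfile.txt"
    path.write_text(sample)

    # Read everything, split on whitespace, convert each token to a float.
    with open(path) as f:
        numbers = [float(token) for token in f.read().split()]

print(numbers)
```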

Using regular expressions (1)#

Suppose we want to read complex numbers written as text

(-3, 1.4) or (-1.437625E-9, 7.11) or (  4, 2 )

Python solution:

import re

m = re.search(r"\(\s*([^,]+)\s*,\s*([^,]+)\s*\)", "(  -3,1.4)")
real, imag = (float(x) for x in m.groups())  # do not reuse the name re: it would shadow the module
print("Real", real, " Img: ", imag)
Real -3.0  Img:  1.4

(This finds only the first match of the regular expression; use re.findall to return a list of all matches.)
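A minimal sketch of the re.findall variant, applied to the example text above. The variable names are chosen for this example only.

```python
import re

text = "(-3, 1.4) or (-1.437625E-9, 7.11) or (  4, 2 )"
pattern = r"\(\s*([^,]+)\s*,\s*([^,]+)\s*\)"

# re.findall returns one tuple of groups per match; float() tolerates
# any whitespace left around a captured number.
numbers = [
    complex(float(real_part), float(imag_part))
    for real_part, imag_part in re.findall(pattern, text)
]

for z in numbers:
    print(z.real, z.imag)
```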

Using regular expressions (2)#

Regular expressions like

    \(\s*([^,]+)\s*,\s*([^,]+)\s*\)

constitute a powerful language for specifying text patterns

Doing the same thing, without regular expressions, in Fortran and C requires quite some low-level code at the character array level

Remark: we could read pairs like (-3, 1.4) without using regular expressions:

s = "(-3,  1.4 )"
real, imag = (float(x) for x in s[1:-1].split(","))

Script variables are not declared#

Example of a Python function:

import os


def debug(leading_text, variable):
    if os.environ.get("MYDEBUG", "0") == "1":
        print(leading_text, variable)

Dumps any printable variable (number, list, dictionary, heterogeneous structure)

Printing can be turned on/off by setting the environment variable MYDEBUG
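A short usage sketch: the same function is called with values of different types, and MYDEBUG is toggled from within the script. Setting os.environ inside the program is done here only for demonstration; normally the variable would be set in the shell before the script starts.

```python
import os


def debug(leading_text, variable):
    if os.environ.get("MYDEBUG", "0") == "1":
        print(leading_text, variable)


os.environ["MYDEBUG"] = "1"  # turn printing on
debug("a list:", [1, 2.5, "three"])
debug("a dict:", {"key": (1, 2)})

os.environ["MYDEBUG"] = "0"  # turn printing off
debug("not shown:", 42)
```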

The same function in C++#

Templates can be used to mimic dynamically typed languages

Not as quick and convenient programming:

#include <cstdlib>
#include <iostream>
#include <string>

template <class T>
void debug(std::ostream &o, const std::string &leading_text,
           const T &variable) {
  const char *c = std::getenv("MYDEBUG");
  bool defined = false;
  if (c != nullptr) {            // if MYDEBUG is defined ...
    if (std::string(c) == "1") { // if MYDEBUG is true ...
      defined = true;
    }
  }
  if (defined) {
    o << leading_text << " " << variable << std::endl;
  }
}

The relation to OOP#

Object-oriented programming can also be used to parameterize types.

  • Introduce base class A and a range of subclasses, all with a (virtual) print function;

  • Let debug work with variable as an A reference;

  • Now debug works for all subclasses of A.

Advantage: complete control of the legal variable types that debug is allowed to print (may be important in big systems to ensure that a function can only make transactions with certain objects)

Disadvantage: much more work, much more code, less reuse of debug in new situations

Flexible function interfaces (1)#

User-friendly environments (Python, Matlab, Maple, Mathematica, S-Plus, …) allow flexible function interfaces

First try:

# f is some data
plot(f)

More control of the plot:

plot(f, label='f', xrange=[0,10])

More fine-tuning:

plot(f, label='f', xrange=[0,10], title='f demo',
     linetype='dashed', linecolor='red')

Flexible function interfaces (2)#

In C++, some flexibility is obtained using default argument values, e.g.,

void plot(const std::vector<double> &data, const char *label = "",
          const char *title = "", const char *linecolor = "black");

Flexibility is limited, since the arguments are positional: to set linecolor, the caller must also supply label and title.

Python uses keyword arguments = function arguments with keywords and default values, e.g.,

def plot(data, label='', xrange=None, title='',
         linetype='solid', linecolor='black', ...)

The sequence and number of arguments in the call can be chosen by the user.
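A minimal sketch of how such calls behave. The plot function below is a hypothetical stand-in that only collects its settings in a dictionary; it draws nothing and is not the API of any particular plotting library.

```python
def plot(data, label="", xrange=None, title="",
         linetype="solid", linecolor="black"):
    """Hypothetical stand-in for a plotting routine: returns its settings."""
    return {
        "n_points": len(data),
        "label": label,
        "xrange": xrange,
        "title": title,
        "linetype": linetype,
        "linecolor": linecolor,
    }


f = [0.0, 1.0, 4.0, 9.0]  # some data

# Only the argument of interest is given, and in any order.
settings = plot(f, linecolor="red", title="f demo")
print(settings)
```

Arguments that are not mentioned in the call keep their default values, so linetype is still 'solid' here.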

Classification of languages (1)#

Many criteria can be used to classify computer languages.

Dynamically vs statically typed (or type-safe)#

Python (dynamic):

c = 1            # c is an integer
c = [1,2,3]      # c is a list

C (static):

double c; c = 5.2;  // c can only hold doubles
c = "a string...";  // compiler error

Classification of languages (2)#

Weakly vs strongly typed#

Perl (weak):

$b = '1.2';
$c = 5*$b;   # implicit type conversion: '1.2' -> 1.2

Python (strong):

import math

b = "1.2"
# c = 5*b                 # legal, but repeats the string: '1.21.21.21.21.2'
# c = math.exp(b)         # illegal, no implicit type conversion
c = math.exp(float(b))  # legal
print(c)
3.3201169227365472
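The commented-out lines can be tried safely with a small sketch like the following. The exact wording of the TypeError message differs between Python versions, so it is printed rather than assumed.

```python
import math

b = "1.2"

# Multiplying a string by an integer repeats the string.
repeated = 5 * b
print(repeated)

# No implicit str -> float conversion: math.exp rejects a string.
raised = False
try:
    math.exp(b)
except TypeError as err:
    raised = True
    print("TypeError:", err)

# An explicit conversion gives the arithmetic result.
product = 5 * float(b)
print(product)
```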

Classification of languages (3)#

More classifications:

  • Interpreted vs compiled languages

  • High-level vs low-level languages (Python-C)

  • Scripting vs system languages

Turning files into code (1)#

Code can be constructed and executed at run-time

Consider an input file with the syntax

a = 1.2
no of iterations = 100
solution strategy = 'implicit'
c1 = 0
c2 = 0.1
A = 4

How can we read this file and define variables a, no_of_iterations, solution_strategy, c1, c2, A with the specified values?

Turning files into code (2)#

The answer lies in this short and generic code:

with open("inputfile.dat") as infile:
    for line in infile:
        if "=" not in line:  # skip blank or malformed lines
            continue
        variable, value = line.split("=", 1)  # separate the statement at the first = sign
        variable = variable.strip()  # strip leading and trailing blanks
        variable = variable.replace(" ", "_")  # replace blanks by _
        exec(variable + "=" + value)  # magic... (trusted input files only!)
print(A)  # noqa
4

This cannot be done in Fortran, C or C++! Why?
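Since exec runs whatever the file contains, it is only appropriate for trusted input. As an alternative (not the approach of the slide), here is a minimal sketch that stores the values in a dictionary and parses them with ast.literal_eval, which accepts only Python literals. The lines list stands in for the contents of the input file.

```python
import ast

# Stand-in for the lines of inputfile.dat shown earlier.
lines = [
    "a = 1.2",
    "no of iterations = 100",
    "solution strategy = 'implicit'",
    "c1 = 0",
    "c2 = 0.1",
    "A = 4",
]

params = {}
for line in lines:
    if "=" not in line:  # skip blank or malformed lines
        continue
    name, value = line.split("=", 1)
    name = name.strip().replace(" ", "_")
    # literal_eval parses numbers, strings, lists, ... but executes no code.
    params[name] = ast.literal_eval(value.strip())

print(params["A"])
print(params["solution_strategy"])
```

The price is that the values live in params rather than as ordinary variables.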

Scripts can be slow#

Perl and Python scripts are first compiled to byte-code.

The byte-code is then interpreted.

Text processing is usually as fast as in C.

Loops over large data structures might be very slow:

for i in range(len(A)):
    A[i] = ...

Fortran, C and C++ compilers are good at optimizing such loops at compile time and produce very efficient assembly code (e.g. 100 times faster).

Fortunately, long loops in scripts can easily be migrated to Fortran or C.
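The effect can be observed without leaving Python. The sketch below times an explicit loop against the built-in sum, which in the standard CPython interpreter is implemented in C. It only illustrates the principle of moving a loop into compiled code; the list size and repeat count are arbitrary, and the timings depend on the machine.

```python
import timeit

A = list(range(200_000))


def loop_sum(values):
    """Explicit, interpreted loop over the data structure."""
    total = 0
    for i in range(len(values)):
        total += values[i]
    return total


# Time both variants on the same data.
t_loop = timeit.timeit(lambda: loop_sum(A), number=5)
t_builtin = timeit.timeit(lambda: sum(A), number=5)

print(f"explicit loop: {t_loop:.4f} s")
print(f"built-in sum:  {t_builtin:.4f} s")
```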

Scripts may be fast enough#

Read 100 000 (x, y) data pairs from a file and write (x, f(y)) out again

  • Pure Python: 4s

  • Pure Perl: 3s

  • Pure Tcl: 11s

  • Pure C (fscanf/fprintf): 1s

  • Pure C++ (iostream): 3.6s

  • Pure C++ (buffered streams): 2.5s

  • Numerical Python modules: 2.2s (!)

  • Remark: in practice, 100 000 data points are written and read in binary format, resulting in much smaller differences
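For reference, a pure-Python version of such a test might look like the sketch below. The slides do not show the benchmark code or say which f was used, so the function transform, the file names, and the tiny generated data set are all assumptions made for illustration.

```python
import math
import tempfile
from pathlib import Path


def transform(y):
    """Hypothetical f(y); the original benchmark's function is not given."""
    return math.log(1.0 + y * y)


with tempfile.TemporaryDirectory() as tmp:
    infile = Path(tmp) / "xy.dat"
    outfile = Path(tmp) / "xfy.dat"

    # Generate a small input file with one "x y" pair per line.
    infile.write_text("".join(f"{0.1 * i:g} {i * i:g}\n" for i in range(5)))

    # Read each pair, apply the function to y, and write the new pair.
    with open(infile) as src, open(outfile, "w") as dst:
        for line in src:
            x, y = (float(token) for token in line.split())
            dst.write(f"{x:g} {transform(y):g}\n")

    result = outfile.read_text().splitlines()

print(result)
```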

When scripting is convenient#

  • The application’s main task is to connect together existing components

  • The design of the application code is expected to change significantly

  • The application performs extensive string/text manipulation

  • The application can be made short if it operates heavily on list or hash structures

  • CPU-time intensive parts can be migrated to C/C++ or Fortran

When to use C, C++, Java, Fortran#

  • Does the application implement complicated algorithms and data structures?

  • Does the application manipulate large datasets so that execution speed is critical?

  • Are the application’s functions well-defined and changing slowly?

  • Will type-safe languages be an advantage, e.g., in large development teams?

Some personal applications of scripting#

  • Get the power of Unix also in non-Unix environments

  • Automate manual interaction with the computer

  • Customize your own working environment and become more efficient

  • Increase the reliability of your work (what you did is documented in the script)

  • Have more fun!

Some business applications of scripting#

  • Many business sectors make use of scripting languages internally:

    • Financial sector (Model prototyping, R&D),

    • Mobile App & Web companies (Development language)

    • Engineering (Setup of simulation models, R&D)

Python/Bash knowledge is a welcome skill in many jobs.

What about mission-critical operations?#

  • Scripting languages are free

  • What about companies that do mission-critical operations?

  • Can we use Python when sending people (or robots) to Mars?

  • Who is responsible for the quality of products?

The reliability of scripting tools#

  • Scripting languages are developed as a world-wide collaboration of volunteers (open source model)

  • The open source community as a whole is responsible for the quality

  • There is a single repository for the source code (plus mirror sites)

  • This source is read, tested and controlled by a very large number of people (and experts)

  • The reliability of large open source projects like Linux, Python, and Perl appears to be very good, at least as good as commercial software