Chapter 19. Using A Shell

Table of Contents

Introduction
Chaining Commands
Running Multiple Commands
Grouping Commands
Storing Commands
Advanced Shell Scripting
Want more?

Introduction

As the Linux basics have been unfolded in the previous chapters, I want to close off this book with an introduction to shell scripting. Shell scripting is one of the most powerful features of Linux (and Unix). It allows you to automate various tasks and is often used to help you manage your system. One of Linux's main principles is "keep it simple": by providing simple tools that do simple jobs (but do them well), you can build powerful systems. The downside of simple tools is that you need plenty of them to accomplish certain tasks, and shell scripts help you glue these various tools together.

Before we launch into shell scripting, we first need to cover a few basics of running commands in Linux. Some of these have been described earlier in this book, so they should not seem too unfamiliar to most of you.

Chaining Commands

Running Multiple Commands

When you are inside a shell, you can run one or more commands after another. We have seen such use cases previously, for instance to view the content of a directory and then go inside a subdirectory:

~$ mkdir Documents
~$ cd Documents

In the previous example, multiple commands are run one after another, each entered separately. With a shell, however, you can execute multiple commands on a single line. To do so, you separate the commands with a semicolon:

~$ mkdir Documents; cd Documents

You can add as many commands as you want. The shell will execute each command when the previous one has finished. For instance:

~# emerge --sync; layman -S; emerge -uDN @world; revdep-rebuild -p

is the equivalent of

~# emerge --sync
~# layman -S
~# emerge -uDN @world
~# revdep-rebuild -p

Nothing too powerful about that, but wait, there's more. Commands in a Unix environment always finish with a particular return code. This return code tells whoever called the command whether it exited successfully (return code 0) or not (non-zero return code), and why not. As you might have guessed, the return code is an integer, usually within the range 0 to 255. You can see the return code of the last command by displaying the special shell variable $? (which can be done using echo). Let's see a simple example:

~$ ls
Documents     tasks.txt     bookmarks.html
~$ echo $?
0
~$ ls -z
ls: invalid option -- 'z'
Try 'ls --help' for more information.
~$ echo $?
2
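
Since $? is overwritten by every new command (including the echo used to display it), you can store the return code in a variable first if you need it later on. A minimal sketch, reusing the failing ls -z from above:

~$ ls -z 2> /dev/null; RC=$?
~$ echo "ls finished with return code $RC"
ls finished with return code 2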

Conditional Execution

The use of return codes allows us to investigate conditional execution. For instance, you may want to update Portage and then update your system, but the latter should only start if the Portage update finished successfully. If you run the commands manually, you would wait for the Portage update to finish, look at its output, and then decide whether or not to continue with the system update. By making use of the return code of emerge --sync, the shell can run the next command only if the synchronization finished with return code 0. To accomplish this, you use && as the separator between such commands.

~# emerge --sync && emerge -uDN @world

By using &&, you instruct the shell to execute the following command only if the previous one finished successfully. Now what about when a command finishes with an error?

Well, the shell offers the || separator to chain such commands. Its use is less pertinent in regular shell operations and more so in shell scripts (for instance for error handling). Yet it is still useful on the command line, as the following (perhaps too simple, yet explanatory) example shows:

~$ mount /media/usb || sudo mount /media/usb

The command sequence tries to mount the /media/usb location. If this fails for any reason (for instance, because the user does not have the rights to mount), it retries the mount using sudo.
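
You can also combine both separators on a single line. Be aware that such a chain is evaluated from left to right and is not a strict if-then-else: the || part triggers when anything before it failed, including the command after &&. A simple illustration:

~$ mount /media/usb && echo "Mounted." || echo "Mount failed."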

Input & Output

Now on to another trick that we have seen before: working with output and input. I already explained that we can chain multiple simple tools to create a more useful, complex activity. One of the most frequently used chaining methods is to use the output of one command as the input of another. This is especially useful for text searching and manipulation. Say you want to know the last 5 applications that were installed on your system. There are many possible methods; one of them uses /var/log/emerge.log (which is where all such activity is logged). You could open the log file, scroll to the end and work backwards to find the last 5 installed applications, or you can use a command like so:

~$ grep 'completed emerge' /var/log/emerge.log | tail -5
1283033552:   ::: completed emerge (1 of 1) app-admin/cvechecker-0.5 to /
1283033552:   ::: completed emerge (1 of 3) dev-perl/HTML-Tagset-3.20 to /
1283033552:   ::: completed emerge (2 of 3) dev-perl/MP3-Info-1.23 to /
1283033552:   ::: completed emerge (3 of 3) app-pda/gnupod-0.99.8 to /
1283033552:   ::: completed emerge (1 of 1) app-admin/cvechecker-0.5 to /

What happened is that we first executed grep to filter specific string patterns out of /var/log/emerge.log. In this case, grep shows us all lines that contain "completed emerge". Yet the question was not to show all installations, but only the last five. So we piped the output (that's what this is called) to the tail application, whose sole task is to show the last N lines of its input.

Of course, this can be extended to more and more applications or tools. What about the following example:

~$ tail -f /var/log/emerge.log | grep 'completed emerge' | awk '{print $8}'
app-admin/cvechecker-9999
dev-perl/libwww-perl-5.836
...

In this example, tail will "follow" emerge.log as it grows: every line that is added to emerge.log is shown by tail. This output is piped to grep, which filters out all lines except those containing the string 'completed emerge'. The result of the grep operation is then piped to the awk application, which prints the 8th field (with white space as the field separator): the category/package name. This allows you to follow a lengthy emerge process without having to keep an eye on its entire output.
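
You can of course chain in other tools as well. For instance, to count how many package installations are recorded in the log, pipe the grep result to wc, which with the -l option counts the number of lines it receives (the number shown below is just an example):

~$ grep 'completed emerge' /var/log/emerge.log | wc -l
1748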

The | sign passes through the standard output of a process. If you want to have the standard error messages passed through as well, you need to redirect that output (stderr) to the standard output first. This is accomplished using the "2>&1" suffix:

~# emerge -uDN @world 2>&1 | grep 'error: '

In the above example, all lines (including those on standard error) are filtered, and only those containing the "error: " string are shown. But what is that magical suffix?

Well, standard output, standard error and actually also standard input are seen by Unix as files, but special files: you won't find them on your file system once the application has finished. When an application has a file open, it gets a file descriptor assigned: a number that uniquely identifies a file for that specific application. The file descriptors 0, 1 and 2 are reserved for standard input, standard output and standard error output respectively.

The 2>&1 suffix tells Unix/Linux that the file descriptor 2 (standard error) should be redirected (>) to file descriptor 1 (&1).

This also brings us to the redirection sign: >. If you want the output of a command to be saved in a file, you can use the redirect sign (>) to redirect the output into a file:

~# emerge -uDN @world > /var/tmp/emerge.log

The above command will redirect all standard output to /var/tmp/emerge.log (i.e. save the output into a file). Note that this will not store the error messages on standard error in the file: because we did not redirect the standard error output, it will still be displayed on-screen. If we want to store that output as well, we can either store it in the same file:

~# emerge -uDN @world > /var/tmp/emerge.log 2>&1 

or store it in a different file:

~# emerge -uDN @world > /var/tmp/emerge.log 2> /var/tmp/emerge-errors.log

Now there is one thing you need to be aware of: the redirection sign needs to be attached directly to the output file descriptor, so it is "2>" and not "2 >" (with a space). In the case of standard output, the file descriptor is implied (> is actually the same as 1>).
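
Also be aware that redirections are processed from left to right, so their order matters. Compare the following two commands (reusing the failing ls -z from before and a scratch file):

~$ ls -z > /tmp/out.log 2>&1
~$ ls -z 2>&1 > /tmp/out.log
ls: invalid option -- 'z'
Try 'ls --help' for more information.

The first command sends both output streams to the file. In the second, standard error is duplicated to where standard output points at that moment (the screen), and only afterwards is standard output redirected to the file, so the error message still appears on-screen.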

Grouping Commands

Shells also offer a way to group commands. If you do this, it is said that you create a sub-shell that contains the commands you want to execute. Now why would this be interesting?

Well, suppose that you want to update your system, followed by an update of the file index. You don't want them to run simultaneously, as that would affect performance too much, but you also don't want this to be done in the foreground (you want the shell free to do other stuff). We have seen that you can background processes by appending a " &" at the end:

~# emerge -uDN @world > emerge-output.log 2>&1 &

However, chaining two commands together with the background sign is not possible:

~# emerge -uDN @world > emerge-output.log 2>&1 &; updatedb &
bash: syntax error near unexpected token ';'

If you drop the ";" from the command, both processes would run simultaneously (each in the background), which is not what we want. The grouping syntax comes to the rescue:

~# (emerge -uDN @world > emerge-output.log 2>&1; updatedb) &

This also works for output redirection:

~# (emerge --sync; layman -S) > sync-output.log 2>&1

The above example will update the Portage tree and the overlays, and the output of both commands will be redirected to the sync-output.log file.
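
Note that the parentheses create a new shell process (hence "sub-shell") in which the grouped commands run. If you want to group commands within the current shell instead (for instance because the commands set variables you want to keep), bash also offers curly braces. Mind the syntax: a space after the opening brace and a semicolon before the closing one are mandatory:

~# { emerge --sync; layman -S; } > sync-output.log 2>&1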

Storing Commands

All the above examples are run live from your shell. You can, however, write the commands in a text file and execute that file. The advantage is that the sequence of commands becomes manageable (if you make a mistake, you edit the file), somewhat documented for the future (how did I do that again?) and easier to use (just a single file to execute rather than a sequence of commands).

A text file containing all this information is called a shell script. As an example, the following text file loads the necessary virtualisation modules, configures a special network device that will be used to communicate with the virtual runtimes, enables a virtual switch between the virtual runtimes and the host system (the one where the script is run) and finally enables rerouting of network packets so that the virtual runtimes can access the Internet:

modprobe tun
modprobe kvm-intel
tunctl -b -u swift -t tap0
ifconfig tap0 192.168.100.1 up
vde_switch --numports 4 --hub --mod 770 --group users --tap tap0 -d
echo 1 > /proc/sys/net/ipv4/ip_forward
iptables --flush
iptables -A POSTROUTING -t nat -o eth0 -j MASQUERADE
iptables -A FORWARD -i tap0 -o eth0 -s 192.168.100.1/24 ! -d 192.168.100.1/24 -j ACCEPT
iptables -A FORWARD -o tap0 -i eth0 -d 192.168.100.1/24 ! -s 192.168.100.1/24 -j ACCEPT

Can you imagine having to retype all that (let alone on a single command line, separated with semicolons)?

To execute the script, give it an appropriate name (say "enable_virt_internet"), mark it as executable, and execute it:

~# chmod +x enable_virt_internet
~# ./enable_virt_internet

Note that we start the command using "./", informing Linux that we want to execute the file that is stored in the current working directory.
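
By the way, if the script is not marked as executable (or you don't want to mark it as such), you can also pass it to the interpreter yourself:

~# bash ./enable_virt_internet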

Now on to more advantages of using shell scripts...

Documenting

You can easily document your steps in shell scripts. A comment starts with #, like so:

# Load in virtualisation modules
modprobe tun
modprobe kvm-intel
# Setup special networking device for communicating with the virtual environments
tunctl -b -u swift -t tap0
ifconfig tap0 192.168.100.1 up
vde_switch --numports 4 --hub --mod 770 --group users --tap tap0 -d
# Enable IP forwarding
echo 1 > /proc/sys/net/ipv4/ip_forward
# Set up "internet sharing" by allowing packet forwarding towards the Internet
iptables --flush
iptables -A POSTROUTING -t nat -o eth0 -j MASQUERADE
iptables -A FORWARD -i tap0 -o eth0 -s 192.168.100.1/24 ! -d 192.168.100.1/24 -j ACCEPT
iptables -A FORWARD -o tap0 -i eth0 -d 192.168.100.1/24 ! -s 192.168.100.1/24 -j ACCEPT

By the way, comments can also be added at the end of a line.

Another special comment is one that tells your operating system how the shell script should be executed. There are plenty of shells in Linux (bash, bsh, zsh, csh, ...) and there are even scripting languages that also use text files for their content (perl, python, ruby, ...). Linux does not use file extensions to know how to execute or interpret a file. To help Linux determine the correct execution environment, we add a special comment at the beginning of the file (it must be the very first line of the script), in this case informing Linux that this is a bash shell script:

#!/bin/bash
# Load in virtualisation modules
modprobe tun
...

This line is quite important, and should always start with #! (also called the shebang) followed by the interpreter (/bin/bash in our case).

Introduce Loops, Conditionals, ...

A shell script like the above is still quite simple, but also error-prone. Once you have had your share of issues and incidents with such a script, you will most likely start adding error conditions to it.

First, make sure that we are root. We can verify this by reading the special variable $UID (a read-only variable giving the user id of the user executing the script):

#!/bin/bash

if [ $UID -ne 0 ];
then
  echo "Sorry, this file needs to be executed as root!"
  exit 1;
fi

# Load in virtualisation modules
...

The [ ... ] construct is a test operator. It gives a return code of 0 (statement is true) or non-zero (statement is false). The if ... then ... fi statement is a conditional whose body is executed when the test operator returns true (0).
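
The test operator supports many more expressions than numeric comparisons alone. A few common ones (a small selection, not an exhaustive list):

[ -f /etc/passwd ]      # True if /etc/passwd exists and is a regular file
[ -d /var/tmp ]         # True if /var/tmp exists and is a directory
[ "$USER" = "root" ]    # True if the value of $USER equals "root"
[ $# -lt 2 ]            # True if the script received less than two arguments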

Next, we introduce a loop:

# Load in virtualisation modules
for MODULE in tun kvm-intel;
do
  modprobe $MODULE
done

The loop creates a variable called MODULE and gives it as content first "tun" and then "kvm-intel". The shell expands the loop as follows:

MODULE="tun"
modprobe $MODULE     # So "modprobe tun"
MODULE="kvm-intel"
modprobe $MODULE     # So "modprobe kvm-intel"
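
The list that the loop iterates over doesn't need to be written out in full either. For instance, you can loop over all files matching a certain pattern (a hypothetical example, assuming files exist under /etc/conf.d):

for CONFIG in /etc/conf.d/*;
do
  echo "Found configuration file $CONFIG"
done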

Introduce Functions

You can make the script even more readable by introducing functions.

...

# This is a function definition. It is not executed at this point.
load_modules() {
  for MODULE in tun kvm-intel;
  do
    modprobe $MODULE
  done
}

setup_networking() {
  tunctl -b -u swift -t tap0
  ifconfig tap0 192.168.100.1 up
  vde_switch --numports 4 --hub --mod 770 --group users --tap tap0 -d
  echo 1 > /proc/sys/net/ipv4/ip_forward
}

share_internet() {
  iptables --flush
  iptables -A POSTROUTING -t nat -o eth0 -j MASQUERADE
  iptables -A FORWARD -i tap0 -o eth0 -s 192.168.100.1/24 ! -d 192.168.100.1/24 -j ACCEPT
  iptables -A FORWARD -o tap0 -i eth0 -d 192.168.100.1/24 ! -s 192.168.100.1/24 -j ACCEPT
}

# Only now are the functions called (and executed)
load_modules
setup_networking
share_internet

Functions are often used to help with error handling routines. Say that we want the script to stop the moment an error has occurred, and give a nice error message about it:

...

die() {
  echo "$*" >&2;
  exit 1;
}

load_modules() {
  for MODULE in tun kvm-intel;
  do
    modprobe $MODULE || die "Failed to load in module $MODULE";
  done
}
...

If at any time the modprobe fails, a nice error message is shown, like so:

~# ./enable_virt_internet
Failed to load in module tun
~# 
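
If you want the script to stop on any failing command without writing || die after each one, bash also offers the set -e option (a complementary approach, with its own caveats: it does not trigger for commands that are part of an && or || chain or an if test). A minimal sketch:

#!/bin/bash
set -e
# From this point onward, the script aborts as soon as any
# command exits with a non-zero return code.
modprobe tun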

Advanced Shell Scripting

The above is just for starters. Much more advanced shell scripting is possible; shells nowadays have so many features that some scripts are even distributed as full applications. You will be surprised how many scripts you have on your system. Just for starters, in /usr/bin:

~$ file * | grep -c 'ELF '
1326
~$ file * | grep -c script
469

So in my /usr/bin folder, I have 1326 binary applications, but also 469 scripts. That is more than one third of the number of binary applications!
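
The file application recognizes scripts by their shebang line. You can verify this with the script we wrote earlier (the exact description may vary depending on your version of file):

~$ file enable_virt_internet
enable_virt_internet: Bourne-Again shell script, ASCII text executable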

Want more?

This is as far as I want to go with a book targeting Linux starters. You should now have enough background to make it through various online resources and other books, and still have a reference at your disposal for your day-to-day activities on your (Gentoo) Linux system.

I would like to thank you for reading this document. If you have any particular questions or feedback, please don't hesitate to contact me on the Freenode and OFTC IRC networks (my nickname is SwifT).