-
Introduction to systems programming
- This week we will cover the following topics:
- Introduction to general operating system functionality.
- Introduction to the UNIX and Linux operating systems.
- Overview of the UNIX/Linux architecture.
- Overview of Shells, and specifically the bash shell used on the course.
-
1.1 Overview
1.1.1 Readings
- The Linux Command Line; 2013; W. E. Shotts Jr (http://linuxcommand.org/tlcl.php)
- Shell Scripting: Expert Recipes for Linux, Bash, and More; 2011; Steve Parker; Wrox; ISBN: 978-0-470-02448-5
- C Primer Plus 6th Ed; 2014; Stephen Prata; Addison Wesley; ISBN-13: 978-0-321-92842-9
- Advanced Programming in the UNIX Environment 3rd Ed.; 2013; Stevens & Rago; Addison Wesley; ISBN-13: 978-0-321-63773-4
- The Linux Programming Interface; Michael Kerrisk; 2010; No Starch Press; ISBN-13: 978-1-59327-220-3
1.1.2 Topics covered during this course
- Operating Systems Concepts
- Processes
- File Systems
- Users and Security
- Threads
- TCP/UDP, IP and Sockets
- Programming the bash shell in detail
- C programming on Linux
- Systems Programming on Linux with C
-
1.2 Operating system functionality
- Modern operating systems provide most or all of the following services:
- multitasking
- multiuser
- main memory management
- disk storage management
- inter-process communication mechanisms
- windowing environment - GUI
- I/O device control and various internal/external bus standards (e.g. SCSI, USB)
- a "low-level" system call interface accessible to programs
- API libraries
- file utility programs
- command shells
- miscellaneous systems tools: assemblers, scripting languages, compilers, mail, editors...
- programming tools
- system administration and accounting tools
- communication protocols e.g. TCP/IP
-
1.3 UNIX
1.3.1 Overview
- UNIX has been around since 1969.
- Main use today is enterprise servers hosting web servers, file servers, databases, proprietary mission critical (legacy) software.
- in decline - Gartner: 16% server market share in 2012 declining to 9% by 2017
- appealing for highly resilient and reliable enterprise systems running on proprietary hardware (proprietary CPUs)
- legacy systems
- not going away yet but in general decline...
1.3.2 Vendors
- IBM - AIX on POWER CPUs
- HP - HP-UX on PA-RISC
- Oracle (Sun Microsystems) - Solaris on SPARC
- Apple OS X - Intel x64 (usually on Mac but also on mainstream PC hardware to a limited extent)
1.3.3 The GNU Project
In 1983 Richard Stallman announced the GNU project, intended to provide a free UNIX implementation.
This initially encouraged the cooperative development of software tools for existing proprietary UNIX versions.
Under the terms of the GNU General Public License (GPL) software produced must be made available as source code and freely distributable. GPL licensing applies also to any subsequent modifications thereof.
- Popular GNU Project software includes the GNU compiler collection (including the C compiler), glibc (the GNU C library) and the bash shell.
- … all software used in this course.
-
1.4 Linux
By the early 1990s virtually all UNIX software tools had been (re)implemented as GNU project equivalents but there was no completed kernel.
The kernel part of the project was dropped and Linus Torvalds's Linux kernel was adopted under the GPL instead.
- Strictly speaking, therefore, “Linux” should only mean the kernel rather than the whole OS.
The Linux kernel source was engineered to be independent of a specific CPU but runs mostly on x86-64 PCs and servers.
- it also runs on SPARC, POWER, HP PA-RISC...
Linux largely conforms to the POSIX standards followed by UNIX.
The bottom line is that from both a programmer’s and administrator’s point of view Linux looks and behaves like mainstream UNIX.
The industry has an enormous number of IT support and software development professionals with UNIX experience who can learn and use Linux very quickly.
It is attractive to many (including government) to have a free, community created OS which can run on mainstream, cheap hardware away from the control of a single company.
The main future for UNIX/Linux appears to be Linux VMs for server software dynamically created in the cloud.
- e.g. on Amazon Web Services (AWS) or Microsoft Azure
1.4.1 Linux Distros
- The term “Linux” is commonly used to mean the kernel, plus a wide range of other software (tools and libraries) that together make a complete OS.
- in the very early days of Linux, the user was required to assemble all of this software, create a file system, and correctly place and configure all of the software on that file system.
- this demanded considerable time and expertise.
- as a result, a market opened for Linux distributors, who created packages (distributions) to automate most of the installation process, creating a file system and installing the kernel and other required software for that distribution’s intended purpose.
- As of July 2016 on http://distrowatch.com/ the 3 most popular distros are:-
- Debian
- Ubuntu
- Mint:
- based on Ubuntu but a more complete desktop experience.
- has an option to use the lightweight Xfce desktop (better for use on a VM as less resource hungry)
- the most popular distro currently
- in the labs....
-
1.5 UNIX/Linux Architecture
- The kernel manages the machine's hardware:
- memory resident after booting.
- is secure.
- contains the bodies of system calls which you can call from your program.
- handles I/O through device drivers.
- creates and manages processes.
- written in C and assembler.
- OS applications, including those you write, are written in C and call system calls:
- ls, ps, cat, man etc. etc. are all written in C.
- so is the shell (bash).
- Although it is possible to call system calls directly from your application it is easier to do so through library routines which add some extra functionality:
- again, these libraries are used in a C program.
- Alternatively, new applications can be written by plugging together the existing OS applications using the programming features of a shell:
- e.g. the bash shell
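As a minimal sketch of "plugging together" existing OS applications, the pipeline below chains sort, uniq, head and awk to report the most frequent line of its input. The function name most_common_word and the sample words are invented for illustration; none of them come from the course material.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the most frequent line read from standard input.
most_common_word() {
    sort |              # group identical lines together
    uniq -c |           # prefix each distinct line with its count
    sort -rn |          # highest count first
    head -n 1 |         # keep only the top entry
    awk '{ print $2 }'  # drop the count, keep the word
}

# Sample input: "ls" appears three times, "cat" twice, "ps" once.
printf '%s\n' ls cat ls ps ls cat | most_common_word
```

Each stage is an ordinary program doing one small job; the shell only supplies the plumbing between them.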
-
1.6 Shells generally
- UNIX/Linux will usually support several shell alternatives.
- these support textual interaction with the computer through terminals
- each will have a scripting language allowing system administration scripts to be written.
- the default shell is bound to your account details
- Alternatives around today are:-
- POSIX shell (sh or dash on modern Linux)
- bash
- an advanced modern shell maintaining backwards compatibility with the Korn Shell and earlier Bourne Shell, itself somewhat based on Algol 68 syntax
- csh – the “C-shell”
- the shell scripting parts are more based on a C-like syntax
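One way to see how the default shell is bound to an account is to read it from the account database. The sketch below assumes accounts are stored in the traditional /etc/passwd file, where the login shell is the seventh colon-separated field; on systems that use a directory service this file may not list every user. The helper name login_shell_of is invented for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the login shell recorded for a user.
# Usage: login_shell_of USER [PASSWD_FILE]   (PASSWD_FILE defaults to /etc/passwd)
login_shell_of() {
    local user=$1 passwd_file=${2:-/etc/passwd}
    # Field 1 is the user name, field 7 is the login shell.
    awk -F: -v u="$user" '$1 == u { print $7 }' "$passwd_file"
}

# The shells installed on a machine are usually listed in /etc/shells.
[ -r /etc/shells ] && cat /etc/shells

# Login shell recorded for root, an account present on most systems.
login_shell_of root
```

Inside a running session, echo "$SHELL" normally reports the same value for the current user.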
1.6.1 Lab Material
Inevitably what is presented in a module of this nature is a subset of important points with some detail in the important areas.
In both bash and C programming there are more details, features and ways of doing things than can possibly be presented, or that you will be able to take in, in the time available.
- With bash you will not be given explanations of what a lot of commands do.
- You will get directed to learning resources where you can explore and learn the details.
- in particular but not exclusively the lab material
- it is your responsibility to go and learn
-
1.7 Shells: the bash shell (we will be using this)
5 different “command types”:
- Standard commands are programs e.g. type find
- therefore these will have an entry in the file system
- a typical place would be /bin or /usr/bin but could be anywhere...
- these create a child process when run – i.e. fork a process (see later)
- include shell script programs
- Builtins are part of the shell implementation itself, for efficiency.
- include pwd, printf, echo, cd, getopts...
- see:- http://www.thegeekstuff.com/2010/08/bash-shell-builtin-commands/
- to list: compgen -b
- Functions are a way to build reusable units of shell script code.
- no child process is created when they are run
- usually added by some tools on installation so some already there
- or could be created by you in your scripts
- try: declare -f, declare -F
- Keywords are built-in shell script programming constructs and are reserved words e.g. for, while, do, select and !
- to list: compgen -k
- Aliases will be set up for some common commands.
- can be set for commands and builtins – see alias
- e.g. type ls
- to list: compgen -a
- The sequence of directories to be searched for a program is determined by the PATH variable.
- i.e. echo $PATH
- /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games
- : is the path separator
- it will not include the current working directory (.) by default for security reasons, hence the need for ./myscript.sh etc.
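The command types above can be told apart with type -t, which prints a one-word classification. The sketch below is a script rather than an interactive session, so it has to switch alias handling on explicitly; the alias ll and the function greet are invented for the example, and the result for find assumes it is installed as an ordinary program.

```shell
#!/usr/bin/env bash
# Scripts ignore aliases unless this option is set (interactive shells set it already).
shopt -s expand_aliases

alias ll='ls -l'            # an alias, invented for this example
greet() { echo "hello"; }   # a function, invented for this example

# type -t classifies each name as alias, function, keyword, builtin or file.
for name in ll greet for cd find; do
    printf '%-6s %s\n' "$name" "$(type -t "$name")"
done

# PATH is a colon-separated list; print one directory per line, in search order.
echo "$PATH" | tr ':' '\n'
```

Only a name reported as file triggers a search of the PATH directories; the other four kinds are resolved inside the shell itself.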
1.7.1 Globbing and Regular Expressions
- In shells the * and ? wildcard characters are used for file name expansion, sometimes also called globbing.
- there are a whole lot of other globbing constructs introduced in the lab material
- However regular expressions are used by other tools.
- in a limited sense they may look the same as globbing but they are not
- regular expressions in the shell are often associated with the grep family of commands to find matching text in a file or in output
For more differences and a bit more on regular expressions: http://www.tldp.org/LDP/GNU-Linux-Tools-Summary/html/x11655.htm
- cat myfile | grep '^s.*n$'
- this command searches the file myfile for lines starting with an 's' and ending with an 'n' and prints them to the standard output
- note the regular expression is in quotes to switch off globbing
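To make the difference concrete, the sketch below uses the pattern s* once as a glob and then a quoted regular expression with grep. The file names and sample words are invented, and everything happens in a temporary directory that is removed afterwards.

```shell
#!/usr/bin/env bash
(
    # Work in a throwaway directory inside a subshell so nothing else is touched.
    workdir=$(mktemp -d)
    cd "$workdir" || exit 1
    touch sun.txt moon.txt s.log

    # Globbing: the shell replaces s* with the matching file names before echo runs.
    echo s*

    # Regular expression: the quotes stop the shell expanding the pattern, so grep
    # receives it unchanged. Here .* means "any characters", and ^ and $ anchor
    # the match to the start and end of the line.
    printf '%s\n' sun moon saturn | grep '^s.*n$'

    cd / && rm -rf "$workdir"
)
```

The glob is matched against file names by the shell; the regular expression is matched against lines of text by grep.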
1.7.2 Where things are
- You will need to know where things are on your computer:
- find: extremely useful with many options.
- find / -name <filename> 2>/dev/null
- go work out what this does!
- find will be discussed in Lecture 4
- whereis and which:
- whereis -b <command> - where executable is
- which <command> - searches your path as defined in $PATH and returns where the executable for the command to be executed is located
- locate:
- locate <pattern>
- more general than whereis: it searches a prebuilt database of file names (refreshed by updatedb), so it can find any file, not just commands
- what's installed:
- dpkg -l (distro specific)
- See also:
- http://linux.about.com/od/commands/fl/How-To-Find-Linux-Commands-And-Programs-Using-Whereis.htm
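A small, safe way to experiment with these lookups is to search a directory tree you have just created instead of the whole file system. The sketch below uses invented file names and shows find alongside command -v, the shell's own lookup, which for an external program prints the path found through $PATH much as which does.

```shell
#!/usr/bin/env bash
# Build a small directory tree to search (names invented for the example).
root=$(mktemp -d)
mkdir -p "$root/a/b"
touch "$root/a/b/notes.txt" "$root/a/todo.txt"

# find walks the tree below $root and prints every path whose name matches.
# 2>/dev/null throws away error messages such as "Permission denied".
find "$root" -name 'notes.txt' 2>/dev/null

# command -v reports what the shell would run for a name; for an external
# program that is the full path of the executable found via $PATH.
command -v sh

rm -rf "$root"
```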
1.7.3 Linux Manual - Sections of the Manual Pages
- The manual sections are traditionally defined as follows:-
- 1. User commands (programs): those commands that can be executed by the user from within a shell.
- 2. System calls: those functions which wrap operations performed by the kernel.
- 3. Library calls: all library functions excluding the system call wrappers (most of the libc functions).
- 4. Special files (devices): files found in /dev which allow access to devices through the kernel.
- 5. File formats and configuration files: describes various human-readable file formats and configuration files.
- 6. Games: games and funny little programs available on the system.
- 7. Overview, conventions, and miscellaneous: overviews or descriptions of various topics, conventions and protocols, character set standards, the standard filesystem layout, and miscellaneous other things.
- 8. System management commands: commands like mount(8), many of which only root (the superuser) can execute.
- Note usage:-
- man passwd – the passwd command is passwd(1) – 1 is default section
- man 5 passwd – the password file format i.e. passwd(5)
- man 2 syscalls – list all the system calls
- Other than the man command there are:
- whatis – display one-line manual page descriptions
- whatis find
- apropos – search the manual pages
- apropos find
- info – read the additional info documents
- info find
- --help
- find --help
- note the double dash “--”
- may not be any though - dependent on command
- Use stackoverflow.com
- someone will have asked already what you want to know...
1.7.4 The Concept of a Process
- A process is defined as an executing image of a program including its data and register values and stack.
- more about this in later lectures, but it is also a useful concept in understanding the shell
- A shell like bash is a C program which starts as a process when you create a terminal window.
- to emphasise: it’s just a compiled C program itself, as are all the other programs like find and file etc.
- remember that some commonly used functionality, such as the builtins, is implemented in the shell program itself; these don’t fork a new process (forking is described in the next slides), which speeds up execution
- When a command is executed the shell forks a child process to run the new command.
- the shell parent process is, by default, suspended until the child exits
- this behaviour is therefore synchronous.
- the child process could be a command like find, your own compiled C program or a script which will be executed with an interpreter
- the script could typically be a Python or Perl script or could be a shell script program e.g. a bash script etc.
- values are passed into the child process via command line arguments
- or picked up from environment variables that have been marked with export
- new variables and changed values are lost when the child process exits
- the child process can also access standard files and special files connected to the terminal (standard output etc.)
- the child process will return an exit status value to the parent process on termination which can be set and picked up programmatically
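The points about child processes can be observed from a short script. In the sketch below the parent starts a second bash as its child; the variable names exported_var and local_var and the exit status 3 are invented for illustration.

```shell
#!/usr/bin/env bash
exported_var="visible"; export exported_var   # marked for export, so children inherit it
local_var="hidden"                            # not exported, so children never see it

# The parent waits here until the child exits (synchronous behaviour).
# Single quotes make the child, not the parent, expand the variables.
bash -c 'echo "child pid=$$ parent pid=$PPID"
         echo "exported_var=${exported_var:-unset} local_var=${local_var:-unset}"
         exported_var="changed in child"   # this change is lost when the child exits
         exit 3'
status=$?   # exit status returned by the child, picked up by the parent

echo "parent pid=$$ child exit status=$status"
echo "exported_var in parent is still: $exported_var"
```

The child reports the exported variable but not the unexported one, its change to exported_var never reaches the parent, and the only value handed back is the exit status in $?.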