07
  • 7.1 Prerequisites for this time

    All of lab 5
    Learn C the Hard Way
    Ex15 through Ex17
    Understand pointer arithmetic
    Understand string implementation and utility functions
    Other issues with pointers: NULL, dangling pointers etc., call by value and call by reference, pointers to functions
    Memory allocation and initialization from the stack and heap
    structs: use with pointers and linked lists
    array of pointers
    argv and argc
    simple file I/O - text and binary
    bitwise operators
  • 7.2 Compilers

    General summary

    Figure 1: Compilation sequence

    When a program has been written, it exists in a source file. There are a number of intermediate steps (see Figure 1) in the use of this file to produce an object program:
    Lexical analyzer: encodes the source code as numerical tokens to make the rest of the processing simpler and more efficient.
    Parser: (or syntax analyser) checks the token stream against the syntax rules of the language (which are built into the parser) and creates a tree-like data structure representing the source code (a syntax tree or parse tree). It also creates a symbol table, which holds information about variables, e.g. names and data types.
    Code generator: traverses the tree to generate the object program (or object code). It also optimises the code to make it as efficient as possible.

    Due to the number of steps required, compilation may create many temporary files, and this is certainly more so for C than for other high-level languages such as Java.

  • 7.3 The gcc compiler

    The GNU C compiler is gcc, although cc is commonly used for portability; on Linux cc is a symbolic link to gcc. C is designed so that it can be compiled in a single pass: the source input is read and processed from top to bottom, one function at a time, to build a parse tree, so a function must already have been declared when a call to it is compiled; hence the need for function prototypes...

    ELF (“Executable and Linkable Format”) is the object file format on current Linux and UNIX systems. Try the file command using one of your lab C source files as a parameter, and a second time using the associated executable file as a parameter. You will find that the C file is reported as ASCII text and the executable as an ELF binary file.

    7.3.1 Compilation stages

    compilation stages Diagram

    1. The C preprocessor copies the contents of the included header files into the source code file, expands macro code e.g. replaces symbolic constants defined using #define with their values.

    2. The expanded source code file produced by the C preprocessor is compiled into the assembly language for the target platform.

    3. The assembler code generated by the compiler is assembled into the object code for the target platform.

    4. The object code file generated by the assembler is linked together with any other compiled module code files and any libraries to produce an executable file.

  • 7.4 The C Preprocessor

    The C preprocessor is a macro processor that is invoked automatically by the C compiler to transform your program before actual compilation. A macro processor allows you to define macros, which are brief abbreviations for longer constructs. The pre-processor has a number of features:
    inclusion of header files - these are files of declarations that can be substituted into your program
    macro expansion - you can define macros, which are abbreviations for arbitrary fragments of C code; the C preprocessor then replaces the macros with their definitions throughout the program
    conditional compilation - can include or exclude parts of the program according to various conditions
    line control - informs the compiler which source file and line number each line originally came from

    7.4.1 The #include directive

    Header files most commonly contain predefined constants, function prototypes and other declarations. Example usage:
    #include <stdio.h> – read the contents of the header file stdio.h.
    stdio.h: standard I/O functions for console, files

    Other important headers in the C Standard Library are ctype.h, math.h, stdlib.h, string.h, time.h. All such included files must be on the ‘include path’. The standard include directories are assumed by default, and the cc option -Idirectory can be used to specify additional include directories; i.e. angle brackets (<...>) mean “get header file from include path”. Alternatively, a line such as #include "myheader.h" searches for myheader.h first in the local directory; such files are typically your own headers or headers considered part of the project source.

    7.4.2 Defining expression macros

    The main use of #define is to define constants; a classic example is:

    #define PI 3.14
    

    …but this is inline text replacement, not a C language construct; many system constants are defined this way. There is also a const keyword, which is a language construct and is relevant because it lets the compiler identify cases where a constant is being changed unintentionally (see above). A #define can also take arguments and be used like a function, as in the following example:

    #define add3(x,y,z) ((x)+(y)+(z))
    

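    … where the parentheses around each argument and around the whole expression preserve the intended order of operations after the macro has been expanded.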

    7.4.3 Conditional pre-processor directives

    Conditional directives allow if-else logic to be applied to the source text of a program. #if, #ifdef, #ifndef, #else, #elif and #endif control which lines of the source file are compiled, because they are evaluated before the code itself is compiled. Their conditions must therefore be built from pre-processor defines or literals. Conditional directives are also used in header files to ensure declarations happen only once.

    The cc option -Dname=value sets a preprocessor define that can be used in these conditions.

  • 7.5 Assembly Language

    Assembly language is a human-readable form of machine instructions, which are written as text mnemonics. C compilation passes through this additional stage, and that is an opportunity: it is possible to do things in assembler that are impossible to do even in C. The compiler driver cc (with the appropriate switch) also runs the assembler, which translates the mnemonics to their final form as binary machine instructions in ‘.o’ files.

  • 7.6 Multi-module Programming in C

    In practice, real C programming nearly always uses multiple source code files, placing function code and some globals in separate files. This isn’t an esoteric feature - you’ll see it all the time. Normally what you see is a set of source code files containing functions, with only one of them containing a code entry point, i.e. a main() function. These modules have to be compiled separately and then linked together to form an executable. Common code libraries can be deployed in known locations in the file system.

    In more detail you provide:
    object code with the functions precompiled in .o files
        these have to be linked, again using cc, with a single module containing a code entry point, i.e. a main(), to form an executable
        this is necessary to finally resolve function calls to actual function bodies
    the accompanying source .h files with the corresponding function prototype declarations etc.
        but not for the code module with the main(), the intention being that it only calls the functions provided by the other modules
        the source .h files have to be included in the source code using them, i.e. in the code file with the main()
        this is necessary so the compiler can check that the function call names, parameters and return types are all consistent in the early stages of compilation.
  • 7.7 C header files (hexdump.h)

    Look at the following example code:

    #ifndef __HEXDUMP_H__
    #define __HEXDUMP_H__
    
    #include <stddef.h>  /* provides size_t, so this header compiles on its own */
    
    void hexdump(const char *, const void *, size_t);
    void chardump(const char *, const void *, size_t);
    
    #endif
    

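    The #ifndef (an include guard) ensures that the contents of the .h file are only read once, and not indirectly read multiple times, when using multiple source code modules. This is standard practice, although names beginning with a double underscore are formally reserved for the implementation, so a guard name such as HEXDUMP_H is the safer choice.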

    Now look at the following for actual use of the header file:

    #include "hexdump.h"
    

    It has to be included in both the function definition file (here hexdump.c) and the main program file which uses the functions. You additionally need to ensure the correct compilation order for dependencies between modules. See the further discussion of make in section 7.10.

  • 7.8 Code Libraries

    Code modules are obviously of use with larger projects. However, as they stand they are not suitable for deployment and general use. Code libraries are built from a set of object code files and packaged as a single file with an accompanying set of .h files. These can then be placed in a common area of the file system and picked up automatically at compilation, or they can be located using cc switches and appropriate environment variables. They are then used like normal code modules, with the cc -l flag specifying a library to link (more than one library can be linked).

    The text here will make more sense once you understand the code in the exercises, in Lab 6, to create and use static and shared libraries.

    7.8.1 Static and shared Libraries

    With static libraries, the linker finds the library code it needs and physically copies it into the executable output file that it generates. This therefore happens at build time, during the link stage. By convention static libraries are “.a” files.

    The alternative is shared libraries, sometimes called dynamic libraries. These are not physically copied in at build time, so the executable is smaller. The library still has to be linked at the final stage of compilation, but its code is now loaded at run time. By convention shared libraries are “.so.m.n.o” files, where m, n etc. form a variable-length version number, e.g. libc.so.6

    If multiple processes are using the shared library, only one instance needs to be loaded into memory by the OS and shared. Shared libraries can be updated without recompiling the programs that use them (as long as they remain consistent). However, shared libraries have to be deployed with the program. ldconfig is used to manage shared libraries on a host (complex).

    7.8.2 The C Standard Library

    On Linux the implementation of this is the GNU C Library (often called “glibc”). It is implemented in libc.so.6, usually located somewhere under /lib. There may be multiple versions for different architectures etc. on the same host... Some functions in glibc are wrappers around system calls that also add extra functionality; for example fopen() opens a file by calling the system call open(), and adds buffering to the I/O because open() is unbuffered (see man 3 fopen).

    Using system calls requires no extra libraries to be linked, because the Standard Library is linked automatically. However, as an optimisation, some very commonly used functions may be expanded directly into the executable by cc (as compiler built-ins) instead of being called from the library! Note that Standard Library code executes in user mode.

    Nice summary at: 🔗 http://www.tutorialspoint.com/c_standard_library/index.htm

    This has code and defined constants for usage of:

    <stdio.h>
    File I/O e.g. FILE including random access
    create, move remove files, temporary files, file access errors, system error messages e.g. perror()

    <ctype.h>
    testing characters e.g. isupper() etc.

    <string.h>
    memory functions to reset memory areas, copy area...
    string manipulation e.g. strcpy() etc.

    <stdlib.h>
    string-to-number conversion utilities e.g. atoi()
    random number generation
    exit functions e.g. exit()
    searching functions
    definition of size_t

    <assert.h>
    assert() for debugging diagnostics and testing

    <stdarg.h>
    definitions to support functions with a variable number of arguments

    <time.h>
    manage system and process time
    measure execution times
  • 7.9 Some more keywords

    7.9.1 static

    The static keyword in C has two meanings, depending on where it is used. Outside a function, static variables and functions are only visible within that file, not globally (they cannot be extern’ed). Inside a function, static variables are still local to that function, but they are initialized only once, during program initialization, and do not get reinitialized with each function call, i.e. a static variable retains the value it had when the function last returned.

    7.9.2 extern

    For functions, you can put function prototypes in a header file. For variables, re-declare the global variable in the header file using the extern keyword. The extern keyword informs the compiler that the variable is defined in another code module, and enables the global variable to be accessed and modified from other code modules. This code can be compiled separately and linked with the code file containing main(). Again, see the Lab 6 exercise for use of static and extern.

  • 7.10 More about make

    A more generalised Makefile for multi-module code is available in any issued code using hexdump(). It will build projects containing .h files and multiple module files, and it sets up the necessary dependencies. You only need to edit the first 3 lines!
    it will compile the code to .o files and then link them to create a separate executable
    all .o files are dependent on .c and .h files
    the executable is dependent on the .o files
    notice the .o files remain after building (but you could fix this...)

    For more see: 🔗 http://www.cs.colby.edu/maxwell/courses/tutorials/maketutor/
    I have based the Makefile on “Makefile 4”

    Prerequisites for next time
    All of Lab 6
    use of maths library
    usage of math.h and linking maths library
    how to specify linked libraries in a Makefile
    review of “hexdump” Makefile and how to use/edit it for multi-module programs
    use of extern and static
    creation and use of static and shared libraries