-
C Programming Part 1 and Part 2
- This week, we will cover the following topics:
- Introduction to C (Part 1 and Part 2)
- C Datatypes
- Arrays and Strings
- Casts and Sizeof
- Terminal I/O
- Function Scope
- Intro to Pointers
-
5.1 C programming: introduction
5.1.1 Acknowledgements
These C programming notes are based on two sources:
1. MIT OpenCourseWare 6.087
- I have abridged and added to this material.
2. Zed Shaw’s “Learn C the Hard Way” (called “LCTHW” throughout the slides)
- the original web site content is available from GCULearn (you can cut and paste it into an editor)
- the book is in the library
- you should be working through it as directed in the labs
-
5.1.2 What is C?
C is a widely used programming language that offers efficiency/performance and low-level access. Because it is a lower-level language than, e.g., C# or Java, C usually produces faster code, if used correctly.
- Features of C:
- few keywords
- structures, unions – compound data types
- functions - units of code decomposition
- pointers – memory, arrays
- external standard library – I/O, other facilities
- compiles to native code
- macro pre-processor
- Versions of C (evolution over the years):
- 1972 – C invented (Dennis Ritchie AT&T Bell Labs)
- 1978 – The C Programming Language (Kernighan & Ritchie) published; first specification of the language
- 1989 – C89 standard (known as ANSI C or Standard C)
- 1990 – ANSI C adopted by ISO, known as C90
- 1999 – C99 standard
- mostly backward-compatible
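- 2011 – C11 standard of the C language (later revisions: C17 in 2018, C23 in 2024)
- ISO standard (ISO/IEC 9899:2011)
- the default for cc (and thus gcc) when this course was written, as GNU C11; newer gcc releases default to later revisions
- cc supports other C versions through command-line switches, e.g. -std=c99 or -std=c11
In this course: C11 (if your compiler defaults to a later standard, select C11 with -std=c11).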
- What is C used for?
- To program systems, where ‘systems’ are:
- OSs, like Linux
- microcontrollers: vehicles and airplanes
- embedded processors: phones, portable electronics, etc.
- DSP processors: digital audio and TV systems
...
C and related languages
- C has been extremely influential...
- Derivatives of C: C++, Objective C, Java, C#
- Also influenced the development of Perl, Python
- However, C lacks:-
- exceptions
- range-checking
- garbage collection
- object-oriented programming
- polymorphism
C is not object-oriented. Therefore, unlike Java, for example, it lacks: classes, objects, inheritance, methods (functions are not methods), references (C uses pointers instead), access modifiers (private, protected etc.), collections and iterators, namespaces, and interfaces. C++, which is largely a superset of C, added object-orientation.
- Warning: as a low-level language, C is inherently unsafe: no range checking; limited type safety at compile time; no type checking at runtime. Therefore, when using C, you must take care:
- Always use code analysers like valgrind, cppcheck and splint
- Always run in a debugger like gdb (more later. . . )
- Never run as root.
The flip side of this inherent unsafety is that you can do almost anything you want; in particular, you have access to raw memory contents.
-
5.1.3 Writing C Programs
Of course, you can use an IDE for C: on Linux it is typically Eclipse, and on Windows you could use Visual Studio. If you had 20+ C modules (i.e. separate code files) in your project, with multiple programmers in a professional environment, you would want to use an IDE.
However, this module is about learning C and systems programming. To learn this properly you need to have experienced "bare-metal" C programming at the command line; you can progress to IDEs once you know what you're doing! To get started, LCTHW Ex 0 is a good place to look.
- Some of the tools we need to develop programs will include:
- cc: the compiler, often a symbolic link to gcc. Source code (.c file extension) is compiled to object code (.o file extension), which is then either kept as a compiled code module (with no main() function) or linked into an executable.
- Example of how the command line might look to run cc:
- cc -Wall hello-clean.c -o hello-clean
- -Wall enables compiler warnings; -o names the output file.
- make: automates builds.
- indent: reformats source code.
- gdb: the GNU debugger.
- diff: compares files line by line.
- git (or other version control tools).
- More complicated forms of using cc exist:
- multiple source module files (see later)
- auxiliary directories
- optimisation
- linking of other compiled code modules and libraries (see later)
Embed debugging info and disable optimisation:
cc -g -O0 -Wall hello-clean.c -o hello-clean
…where -O0 is a capital 'O' followed by a zero. You should use this form while developing, but we will be using make, so you won't need to type it in every time (see later).
Note: you are learning C as a second language, so we assume that you know how to program. Again, we will sometimes refer to "Learn C the Hard Way" (LCTHW) exercises; the onus is on you to then go and follow these exercises in your own time.
-
5.1.4 C programs: notes on syntax and structure
Structure of a .c file (examples provided for the labs: <simple-IO.c>, <hello.c>, <pay.c>, <series.c>)
General structure of a C program:
- /* Begin with comments about file contents */ (or // to comment a single line to its end)
- insert #include statements and pre-processor definitions
- function prototypes and global variable declarations
- define the main() function { function body }
- define other functions { function body }
Defining Variables: Variables must be defined before use. For example:
int n; float phi; char name[20];
Functions: C does not have classes and therefore does not have methods; instead, C has functions. Functions are the units of C code decomposition. A function may be defined in the same code module file in which it is used, in another code module, or in a code library. Modules and libraries may be compiled separately and you may not have the source code available at all; what you need is the function prototype, whose general form is:
return_type function_name(arg1_type, arg2_type, ...);
… for example:
int factorial(int n); or equivalently int factorial(int);
… which must appear before the function is first called. The function definition itself has the general form:
return_type function_name(arg1_type arg1, arg2_type arg2, ...) { define local variables; program statements; }
…for example:
int factorial(int n){/*code here*/}
Function prototypes for many common functions, e.g. those of the C Standard Library, are in header files. You can get away without function prototypes if the functions are defined before use in the same module file, e.g. before main(), but this is a trivial special case and function prototypes should normally be used (see series.c, where there are no function prototypes, and hello.c, where there is one).
A function definition must match its prototype (if there is one); however, the parameter names in the prototype don't have to match those in the definition, and may be omitted altogether. Variables defined inside a block exist only in that block.
N.B. The syntax in terms of parameters and local variables is not different from Java and, therefore, should be familiar to you.
Function and memory: on a function call a copy of the actual parameters is made on the stack; like the function's local variables, it exists for the lifetime of the function call. Thus all memory allocated for actual parameters and local variables is destroyed when the function call ends, unless you explicitly allocate memory off the heap (see later). If a parameter is something large (e.g. a large struct), making a copy may be a substantial overhead or infeasible; C handles this by passing in the address of the item (a pointer) instead, and arrays are always passed this way. We will return to this when we discuss pointers.
Commentary on the <hello.c> main() function:
puts(): outputs text to the console window (stdout) and appends a newline. Alternatively, and in most cases better, use:
printf()
String literals are written surrounded by double quotes, e.g. “this is a string literal”.
return 0; exits the function, returning the value 0 to the caller. In conditions, 0 is taken to mean false and any non-zero value to mean true (C89 had no boolean data type; C99 added _Bool and <stdbool.h>, but int is still widely used for truth values). main() has return type <int>: by convention, returning 0 from main() tells the operating system that the program succeeded, and a non-zero value signals failure.
C ‘definitions’ and ‘declarations’: in C, a declaration introduces an identifier and describes its type, be it a variable or a function. A declaration is what the compiler needs to accept references to that identifier. A variable can be declared as often as you want, but a declaration does not allocate storage for the variable. A function prototype is an example of a declaration. A definition instantiates/implements the identifier; it is what the linker needs to link references to those entities. Note that:
- an identifier is defined exactly once
- a definition can be used in place of a declaration
- examples are a function definition (which supplies the function body) and a variable definition
- a variable definition also allocates storage for the variable (but note there is more to this with pointers).
String data types: strings are stored as specially formatted (null-terminated) character arrays. The last character in the character array is '\0' (the null character, NUL), which is not written explicitly in string literals.
A simple string definition: char str1[] = "hello";
To pass a string into a function as a parameter, a function prototype might look like: void myfunc(char *inputString);
The * denotes a pointer or address. The actual function call: myfunc(str1); (more about this later, but what is passed is the address of the first element of the array)
Special characters are specified using \ (the escape character):
- \\ – backslash, \' – apostrophe, \" – quotation mark
- \b, \t, \r, \n – backspace, tab, carriage return, linefeed (\n is very common)
- \xhh – hexadecimal ASCII character codes, e.g. \x41 – 'A'
You can use puts() to print out strings; however, printing anything out is better managed with printf().
Strings are frequently used within programs to receive terminal input and produce terminal/console output. Please view, in your own time, the following Lynda.com videos on strings, which also provide examples of console I/O:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Course: C Essential Training
Module 4: Strings
Titles: Understanding strings
Using screen-based input and output.
Manipulating strings~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Pre-processor macros (‘#’): the C preprocessor is really a separate program, invoked to process the source file before it is handed on to the main compilation process. The preprocessor directives (lines starting with #) and the macros which appear inline with the source code are expanded first. This enables the inclusion of header files, the definition of simple constants and the conditional compilation of selected areas of code according to some criteria.
#include
Header files contain already defined constants, function prototypes and other declarations. #include <stdio.h> is the most common of the preprocessor directives and can be interpreted to mean “read the contents of the header file stdio.h”, which declares the standard I/O functions for console and file I/O. Note the <> brackets: these tell the preprocessor to search the system (C Standard Library) include directories for stdio.h. Frequently used header files are: ctype.h, math.h, stdlib.h, string.h, time.h. For user-defined header files, the "" syntax, e.g. #include "myheader.h", means search the local directory first.
#define
The main use of #define is to define constants, e.g. #define PI 3.14. This is an inline text replacement, not a C language construct; many system constants are defined this way. There is also a const keyword in C, and there are more macros, but we will discuss these later.
C Coding Style: there are several common styles of C code, including the GNU style and the Kernighan & Ritchie (K&R) style used on this course. The K&R style can be applied to a given .c file from the command line as follows:
indent -kr program.c
By convention:
- variable and function names are all lowercase; you may use an underscore to break up names for readability
- constants are all uppercase, again with an underscore as necessary
- global variables should start with “g_” to make them easily identifiable.
*TIP* Some tools of the trade
Compiling and Debugging our Code. C is difficult: even when it compiles, it often has memory leaks, crashes, or doesn't do what you want. Use the tools and reduce the problem space:
- learn the bash history commands to quickly re-execute commands
- implement one small feature at a time; debug it, then move on
- format code with indent so it is easier to read and thus debug
- use make (see below) to simplify the compilation process; make sure -Wall is in the cc flags to get the warnings, and address these before proceeding further
- use valgrind to check memory usage (see LCTHW Ex2)
- use cppcheck and splint: try running them against your source code, although you get a lot of false positives with splint
- use gdb to debug at runtime.
*TIP* Some tools of the trade (make)
The make tool manages real software development builds for multiple source code files and libraries:
- a set of rules makes sure everything is built in the correct order with the right compiler switches, i.e. it handles build dependencies
- some rules are implicit; others you define
- it links all the compiled modules together to form the new executable
- it only compiles what has changed since the last build; it uses file timestamps for this, so touch can be used to force recompilation
- define a Makefile for your project and type make
- a Makefile has its own syntax and uses tabs for whitespace!
See LCTHW Ex2, but use my Makefiles. We will expand on this later, but use make!
-
5.1.5 Over to you
- By next time you should have:
- Built and run all of the code provided for Lab 3.
- Worked through Learn C the Hard Way:
- Ex1 through Ex7
- you can look ahead to Ex10-14 for: for, while, if, switch
- the syntax of these constructs is much the same as in Java and C#, so you should be able to pick these up quickly...
- Written your own programs as instructed in Lab 3, and built and debugged them.
- Used the development tools described above in this lecture and established your own development/build/debug workflow for C on Linux.
- Learned about basic C program structure, printf, variables and the basic types available, basic control structures as above, and basic ideas about functions.
-
5.1.6 Summary
- We covered:
- How to edit, compile, and debug C programs
- including simple usage of cc and make
- C programming fundamentals:-
- comments
- #include and #define
- the main() function
- declaring and initializing variables, scope
- using puts() – calling a function and passing an argument
- returning from a function
-
5.2 C Programming part 2
5.3 Prerequisites
- This must all have been tried as practical work on Linux in the labs.
- All of Lab 3 including the following.
- Learn C the Hard Way
- Ex1 through Ex7
- you can look ahead to Ex10-14 for: for, while, if, switch
- Built, run and understood the issued Lab 3 programs.
- Written your own programs as instructed in Lab 3 - built and debugged.
- Used the development tools described in the previous lecture and established your own development/build/debug workflow for C on Linux.
- Know about basic C program structure, printf, variables and basic types available, basic control structures as above, basic ideas about functions.
-
5.4 C Datatypes
5.4.1 Numeric, non-numeric
- C has a small family of datatypes.
- numeric (int, long, float, double...). Integer types can optionally be unsigned, thereby effectively doubling their positive range.
- characters (char). There is no built-in string type, however; strings are just specially formatted arrays of char.
- User-defined (struct, union, enum).
- Arrays.
Implicit conversions between data types and explicit (forced) casts are both allowed: mechanisms that qualify C as a ‘weakly typed language’.
-
5.4.2 Sizeof
The different numerical types offer the usual trade-offs of memory usage vs. range/precision. The sizes of these types are architecture dependent. The sizeof operator (it looks like a function, but is a built-in operator) returns the size in bytes of a type or variable:
the result has type size_t, defined in <stddef.h> (and also available via <stdlib.h> and <stdio.h>), which represents the memory size of a type or variable in bytes. It is an unsigned integer type, but rather than simply being unsigned int, it has been abstracted into its own type, size_t, for compatibility across architectures with different maximum object sizes.
- Example sizeof usage:
- sizeof(char): returns 1 (always, by definition)
- sizeof(long): returns 4 on 32-bit architectures but 8 on 64-bit Linux (64-bit Windows keeps it at 4)!
In Lab 4 program directory: <sizeof-simple.c>
The first of several programs using hexdump(), implemented in a separate code module, to print out memory contents.
It uses a new Makefile.
sizeof-simple.c prints out the sizes of basic types and their memory contents.
Some of these values are architecture dependent,
e.g. the size of a pointer on a 32-bit architecture is 4 bytes,
but the size of a pointer on a 64-bit architecture is 8 bytes.
-
5.5 Arrays and Strings
The basic syntax and usage of arrays of simple types (char, int, float etc.) are the same as in Java and C#, and the standard terminology of element and index is used; multi-dimensional arrays are also possible, again with familiar syntax and use. However, unlike in Java and C#, it is critically important in C to understand how arrays and strings are implemented. On definition, an array is allocated a contiguous area of memory equal to the size of the data type * the number of elements; e.g. 10 integers at 4 bytes each = 40 bytes. In C there is no bounds checking on arrays: your code can easily wander beyond the end of an array into other memory areas, doing whatever, with no warning!
You can define arrays like this with the allocated memory area size explicitly defined.
int int_array1[5];
Alternatively, you can let the array be sized to fit the given literal data exactly, which also initializes it:
int int_array2[] = {0, 1, 2, 3, 4};
Be careful: as there is no bounds checking, careful coding is necessary to ensure that any processing or changes remain within the allocated memory area. For strings, the area allocated implicitly for a string literal includes an extra character for the terminating null character ('\0'), which is added automatically.
In Lab 4 program directory: sizeof-arrays
Shows allocated memory for various array and string definitions.
And also memory contents using hexdump()...
Beware the use of sizeof() with arrays of pointers!
There is no formal “string” type in C. Strings are simply character arrays (char[]) with a special delimiter character, the null character '\0' (NUL), following the character data, i.e. null-terminated strings. All strings must have this format; if the '\0' is missing, string handling will not work: printf, fgets, strcmp, strcpy... all expect null-terminated strings.
Definition:
char str[] = "I am a string.";
char str[20] = "I am a string.";
Creating strings from literals like this automatically adds the '\0'. In the first example space is allocated for it, i.e. exactly 15 bytes (14 characters + 1); in the second example you must ensure there is sufficient space in the array for it (here some space at the end will not be used).
-
5.6 Typecasting
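Casting (or typecasting) converts a value from one data type to another data type. This can be done explicitly, e.g.:
mean = (double)sum/count;
A cast is a unary operator, so it binds more tightly than binary arithmetic operators such as /: here sum is converted to double before the division, so the division is done in floating point. Conversions also happen implicitly, e.g. between integers and characters (giving the ASCII values of chars). Notice that the cast is used on the right-hand side of an assignment expression, so the original variable's type is unchanged.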
Lab 4 casts program
Go read: http://www.tutorialspoint.com/cprogramming/c_type_casting.htm
Pointers are usually cast when memory is allocated off the heap (strictly, in C the void * returned by malloc() converts implicitly, so this cast is optional).
When values are promoted to a higher-precision type (e.g. int to double), data is preserved. This is done automatically by the compiler for mixed data type expressions:
float f; int i; f = i + 3.14159; /* 3.14159 is a double literal, so i is implicitly promoted to double, i.e. f = (float)((double)i + 3.14159) */
Another conversion done automatically by the compiler is char to int. This allows comparisons as well as arithmetic on character variables:
is_upper = (c >= 'A' && c <= 'Z') ? 1 : 0; // c and the literal constants are converted to their int ASCII values
if (!is_upper) c = c - 'a' + 'A'; // subtraction is possible because of integer conversion (assumes c is a lowercase letter)
As a rule (with exceptions), the compiler converts each operand in a binary expression to the type of the higher-precision operand. If you are in any doubt, use an explicit cast: it will do no harm!
-
5.7 Terminal (Console) I/O
C was not initially intended as an application programming language with interactive program usage. It was intended as a systems programming language with input mainly from command line arguments and files. Consequently, keyboard input handling is low-level and complex for beginners.
5.7.1 Output
For formatted output to stdout use printf(): its format string makes it easy to build complex text from multiple types. The functions fprintf() and sprintf() are variants that send text to a file and to a string respectively, using the same format string syntax (snprintf() is the safer string variant, as it never writes past the given buffer size).
Alternatively, puts() simply writes a string to stdout, up to but not including the null character, and appends a newline character to the output. putchar() is also sometimes useful to print a single character.
Use perror() for error message output to stderr: your custom message is printed, followed by a colon and the system error message for the current errno, on the same line; a newline character is appended to the output.
-
5.7.2 Input
This is not as simple. There are many options with subtly different semantics to read from stdin:
scanf(), getchar(), gets(), fgets() ...
C deals with all the actual characters in the input stream, including white space, and these functions each handle it in different ways; this is what makes input complex.
- Advice: use fgets(), as in the Lab 4 string-io program. It is easy and predictable to deal with, handles spaces, and aids validation: numbers can be read in as text and then, if OK, converted to the appropriate type.
- gets() is no longer used due to security concerns (buffer overrun); it was removed from the C standard in C11.
- fgets() can also easily be used to read from text files formatted with newline breaks
-
5.8 Variable scope
The scoping rules are familiar. Local variables in a function and the function’s parameters are scoped to the lifetime of the function call; they are created on the stack. Note that any code block { }, e.g. in a for loop, also has its own local scope, i.e. local to that specific block of code.
Variables declared outside of any function are globals (avoid them if possible) and persist throughout the life of the program; “global” here means global to this code module file, where any function can access or modify them. There are no special keywords for globals; this is implicit from the context. However, there are also static and extern variables (see later).
-
5.9 Pointers
Normal variables hold a value, whereas pointer variables hold the address of where a value is. This is therefore a form of indirect addressing, and the size of a pointer variable is the size of an address in the CPU architecture: most likely 64 bits, although some legacy 32-bit CPUs and OSs remain (see the sizeof example code). Be aware that C programming is impossible without a sound understanding of pointers.
Pointers can be used for any address in the current process address space. This includes dynamic memory allocated off the heap, which we haven’t used yet, but also for any other memory area with the same syntax.
- Pointers are used extensively for:
- arrays, strings, structs, command line arguments, parameter passing, dynamic data types, a primitive anonymous function mechanism, systems programming APIs...
Pointers are typed: the type is that of the value the pointer points at. This helps the typing integrity of the language (which isn’t great anyway). Pointers can also be cast to other pointer types.
When we obtain the value of what’s pointed at, this is called dereferencing the pointer.
- We usually do not care about the actual address the pointer contains, and (certainly with memory off the heap) it may have a different value with each program run.
- Pointers can be used to construct dynamic data structures, e.g. linked lists and trees.
-
5.9.1 Addressing Variables
Every variable residing in memory has an address. The address of a normal (non-pointer) variable can be obtained using the & operator. Pointers use the * operator for dereferencing, i.e. given a pointer variable, * obtains what it points at.
See Lab 4 pointers program.
How do we define a pointer?
int *pn;
pn holds the address of where an integer is held
How do we use a pointer (or dereference it)?
*pn = 52;
*pn holds the value of that integer
Notice the pointer type and value match up, i.e. dereferenced pointers are like any other variable of that type.
Unfortunately C uses * twice, and the two uses should be read differently:
int *pn; is a pointer definition
*pn = 52; is a dereference
How do we find the address of a normal variable? With &:
int n = 4;
double pi = 3.14159;
int *pn = &n;        // address of integer n
double *ppi = &pi;   // address of double pi
Note also that by convention we write (although these are syntactically the same):
double *ppi = &pi; and not:
double* ppi = &pi;
-
5.9.2 Dereferencing Pointers
I have a pointer – now what? To access or modify the addressed variable, use the dereferencing (indirection) operator *:
// prints "pi = 3.141590" (%f shows 6 decimal places)
printf("pi = %f\n", *ppi);
// pi now equals 7.14159
*ppi = *ppi + *pn;
-
5.9.3 Casting Pointers
C can explicitly cast any pointer type to any other pointer type:
// pn originally of type (int *)
ppi = (double *) pn;
Implicit conversion to/from void * (a generic pointer) is also possible (more later...).
- A dereferenced pointer has the new type, regardless of the real type of the data.
- It is possible to cause segmentation faults and other difficult-to-identify errors.
- What happens if we dereference ppi now?
-
5.10 Prerequisites for next time
- Learn C the Hard Way
- Ex8 through Ex14
- Used basic C datatypes and casts
- Understand 1-dimensional arrays
- Used simple strings and fgets()
- Understand basic memory allocation and layout viewed with hexdump()
- Used cppcheck and valgrind
- Understand basic pointer usage
- Have by this point used in C: if, for, while, switch, 1 and 0 for true and false