-
C Programming Part 3 and Part 4
- In this week, we will cover the following topics:
- C Programming Part 3
- Arrays and pointers
- Strings and string utilities
- Dangling pointers
- Layout of process memory allocation
- Text, Code, BSS
- The Heap
- The Stack
- Using malloc() to allocate dynamic memory off the heap
- C programming Part 4
- struct
- typedef
- Pointers to structs
- Arrays of structs
- Dynamic Memory and Linked Lists
- Miscellaneous Topics:-
- arrays of pointers
- command line input
- text files
- binary files
- enum
- bitwise operators
-
6.1 C programming part 3
6.1.1 Prerequisites
- This must all have been tried as practical work on Linux in the labs. All of Lab 4 including:
- Learn C the Hard Way
- Ex8 through Ex14
- Used basic C datatypes and casts
- Understand 1-dimensional arrays
- Used simple strings and fgets()
- Understand basic memory allocation and layout viewed with hexdump()
- Used cppcheck and valgrind
- Understand basic pointer usage
- Have by this point used in C: if, for, while, switch, 1 and 0 for true and false
-
6.1.2 Arrays and Pointers
Primitive arrays implemented in C use a pointer to a block of contiguous memory. Consider array of 10 ints:
int arr[10];
variable arr can now also be used as a pointer of type int ∗. The name of an array is a constant pointer to the first element of the array. Thus, accessing arr using array entry operator:
int a = arr[0];
…is the same as :
int a = *arr;
The variable arr is really a pointer to element 0 of the array! However, pointers to arrays are constant pointers; you cannot change what they point at, otherwise you would lose the means of access to the array. The solution is to use a temporary pointer for further processing and assign the address to that:
int *pa = arr;
same as:
int *pa = &arr[0];
6.1.3 Pointer Arithmetic
Suppose that:
int ∗pa = arr;
The pointer is not an int, but can add (or subtract) an int to a pointer, such that:
// points to arr[i] pa + i
where the address value increments by i times the size of data type. Be careful - this is at first glance a surprise! The method sizeof()is used internally to determine size of data type, e.g., suppose arr[0] has address 100. Then arr[3] has address 112 because sizeof(int) returns 4. Consider also:
int *pa = arr; pa++; // pa now has the address 104.
-
6.1.4 Strings as Arrays
Strings stored as null-terminated character arrays (last character == '\0'). Suppose:
char str[] = "This is a string.";
and
char *pc = str;
Manipulate string as an array (as that’s what it is):
*(pc+10) = 's';
Notice however: that <sizeof(str) == 18> due to the extra size required for the terminating character '\0', but <strlen(str) == 17> because it does not count the '\0'.
Null Pointers and their use in Strings
A null pointer has a value that does not reference anything. In C NULL is a macro and will be the integer value 0 (zero). Therefore, it can be interpreted as "false" in if and while etc. de-referencing a null pointer will likely crash the process also de-referencing an invalid pointer will crash the process often. A value of NULL, expressed as a character as '\0', is used as the delimiter following the end of the characters making up a string. Therefore, strings are simply character arrays with NULL at the end of the data content hence the phrase “null-terminated string” unlike Java and C# string implementation is visible.
- strcpy(): Used to create a copy of a string
- strcmp(): Used to compare two strings
- strlen(): Used to determine the length of a string
Note, these are a few of the more frequently used functions. Many other string utility functions exist...
see Lab 5 ‘strings’ program for code examples
6.1.5 Impact of Pointers on Function Parameters
- As we know a copy of the actual parameters is created on the stack for the lifetime of the function call, which is called call by value and from a function we also only have a single return value, it is sometimes necessary:
- to return more than one output
- to make changes to variables persisting beyond the lifetime of the function call, but not by using global (or static) variables. For example,
- to pass in large data items where it is inefficient to create a copy on the stack.
The answer is to use a pointer as a formal parameter (or return type) where a copy of the pointer will be created on the stack, but this is just a copy of an address - the address remains the same, pointing at the same value this is called call by reference. For example, if you want to modify variables in a caller to swap them, you can use call by reference with pointers:
// swap function void swap(int *x, int *y) { int temp = *x; *x = *y; *y = temp; } // swap function call example int a = 5, b = 7; swap(&a, &b); // now, a == 7, b == 5
… notice we have also effectively passed back 2 outputs.
To purists everything in C is call by value. An actual parameter is a copy of the variable passed in, which exists for the lifetime of the function call; when we pass in a pointer it is a copy of an address and it is only the effect of this that is call by reference.
Arrays: in C used as parameters effectively exhibit call by reference. No copy of the array is created on the stack you are always passing in a pointer regardless of syntax used changes to the array will persist beyond the lifetime of the function call there is no call by value behaviour for arrays in C.
Structs: (see C Programming Part 4) used call by value, but you can pass in a pointer to a struct to effectively have a call by reference mechanism.
-
6.1.6 Pointers to Constants
There are times when the ‘const’ keyword is used. For example:
void print_str(const char *str1)
Constants cannot be changed, in this example the *str1, and when this is the intention, it is wise to use ‘const’ to add this constraint explicitly, ensuring that str1 cannot be altered by mistake (or maliciously).
6.1.7 Avoid Dangling Pointers
The phrase ‘dangling pointer’ referes to a pointer that points to inaccessible data. Consider the following example:
int f(void) { char *p; p = (char *) malloc(8 * sizeof(char)); … return 0; } int main(){ f(); }
The memory allocation is made, although once the function returns, there’s no way to get at it as p is a local out-of-scope variable when the function f() returns. The compiler will, however, produce warnings to signal this situation. You could use static, global variables, declare pointer elsewhere and pass in as parameter, get back as return value etc. ...
-
6.1.8 Global and Static Variables in C
Global variables have scope throughout the code module and lifetime throughout the process. They are defined outside main() at top of code, often below any header file #include<> lines. Static variables have lifetime throughout the process, and if defined inside a function scope maintain value between function calls. These features should not be used if there are alternatives to achieving the same functionality. The static keyword is used as a prefix in these cases:
static int somePersistentVar = 0;
6.1.9 What to Store
We will now discuss the memory layout of a process, i.e., an instance of a executing program (for relevant code, see Lab5: mem-allocations.c).
Code and Constants: Executable code and constant data include program binary, and any static libraries it loads. The OS knows everything in advance; amount of space needed; contents of the memory etc. This is known as the Text or Code segment.
- Static Data: Variables that exist for the lifetime of a process, i.e., global variables and static local variables, the only difference between them being scope, and the amount of space required is known in advance:
- Data: initialized in the code
- initial value specified by the programmer
- e.g., int x = 97;
- memory is initialized with this value
- BSS: not initialized in the code
- initial value not specified
- e.g., “int x;”
- all memory initialized to 0
- BSS stands for “Block Started by Symbol”
- an old archaic term (not important)
Dynamic Memory: Memory allocated while process is running, e.g., allocated using the malloc() function and deallocated using the free(). The OS knows nothing in advance; amount of space, contents of dynamic data-structures (linked lists, trees etc). Therefore, room to grow is needed in an areas known as the Heap.
- Temporary Variables: created the during lifetime of a function or block, including storage for the function, actual parameters and local variables and current line number of the function. This is need to support nested function calls to store the variables of the calling function, and know where to return when function completes. Therefore, again, room to grow is needed in an area known as the Stack where some often-used terms are:
- ‘push’ on the stack as new function is called
- ‘pop’ off the stack as the function ends
every thread has a separate stack, and we will cover more on threads later in the course.
- Process: instance of an executing program including Text, Data, BSS, Heap and Stack.
- Text: code, constant data
- Data: initialized global & static variables
- BSS: uninitialized global & static variables
- Heap: dynamic memory
- Stack: local variables
The memory layout for a process is shown in the following figure: //Data char *string = "hello"; //BSS int iSize; char *f(void) { //Stack char *p; iSize = 8; //Heap p = (char *) malloc(iSize); return p; }
-
6.1.10 Dynamic memory
Memory allocation and release: How, and when, is memory allocated? Global and static variables are allocated at process startup. Local variables allocated at function call. Dynamic memory is allocated according to use of malloc() family of functions.
How is memory deallocated Global and static variables are deallocated when the process finishes. Local variables are deallocated when the function returns. Dynamic memory is deallocated according to the use of the free() function, and it is good practice to deallocate memory effectively, in order help avoid memory leaks.
For examples, and further discussion, of memory allocation and deallocation, using the malloc() and free() function, Watch the following video on Lynda.com:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Course: C Essential Training
Module: 11 Pointers and Memory Management
Title: Managing memory using allocation and release~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- Memory Initialization:
- Local variables and memory allocated with malloc() have undefined values:
int count; char* p = (char *) malloc(8);
If you need a variable to start with a particular value, use an explicit initialiser:
int count = 0; p[0] = '\0';
Global and static variables are initialised to 0 by default, equivalent statements being:
static int count = 0; static int count;
Why are stack and heap memory allocations undefined? Because it might be memory that’s being reused. The memory on the stack might have been used during other function calls (and most likely it has...). The heap could be allocating previously used (and freed up) dynamic memory, it depends - it might be a fresh memory page and in which case it might be all zeros - or it might not...why bother to go to the overhead of zeroing previously used memory? Basically you don’t know and it is your responsibility to use it properly.
The following diagram demonstrates Heap Dynamics. -
6.2 C programming part 4
6.2.1 Structures
Definition: A structure () is a collection of related variables (of possibly different types) grouped together under a single name. This is a an example of composition – building complex structures out of simpler ones. In a sense, structures are a bit like a primitive Java/C# class with just the instance variables.
The variables declared within a structure are called its members or fields. Variables can be declared like any other built in data-type. Access to members is achieved using the familiar '.' syntax as with objects in Java/C#. Initialization can also be achieved thus:
struct point ptA = {10,20};
The assignment operator copies every member of the structure, and this applies to function parameters; unlike arrays structs are passed call by value, so it may be a substantial overhead to copy structures onto the stack; passing in a pointer to a large structure may be a more appropriate alternative.
Example Structures
struct point { int x; int y; }; struct rectangle { struct point topLeft; struct point bottRight; }; struct chain_element { int data; struct chain_element* next; };
Individual members can be accessed using the '.' operator, and if structure is nested, multiple '.' are required:
struct point pt = {10,20}; int x = pt.x; struct rectangle rect; int tlx = rect.topLeft.x;
-
6.2.2 Type definitions
The typedef keyword is used in C to simplify and cleanup source code that uses struct a lot, although all it adds is a simplified syntax and no extra functionality:
See:
🔗 http://stackoverflow.com/questions/252780/why-should-we-typedef-a-struct-so-often-in-c
and:
Lab 5 structs program
Lab 5 linkedlist program
Extensive examples to illustrate structs and typedef:
6.2.3 Pointers to Structures
Structures are copied element wise. For large structures it is more efficient to pass pointers. For example:
void foo(struct point ∗pp); struct point pt; foo(&pt);
Members should be accessed from structure pointers using '->' operator:
struct point p={10,20}; struct point *pp = &pt; /*Changes p.x */ pp->x = 30; /*copies value of pp.y*/ int y = pp->y;
… and there are some other ways to access structure members:
struct point p={10,20}; struct point *pp = &pt; /*Changes p.x */ (*pp).x = 3; /*copies value of pp.y*/ int y = (*pp).y;
-
6.2.4 Structure arrays
The following code examples should become familiar.
Declaring arrays of int:
int x[10];
Declaring arrays of struct:
struct point p[10];
Initializing arrays of int:
int x[4]={0,20,10,2};
Initializing arrays of structs:
struct point p[3]={0,1,10,20,30,12}; struct point p[3]={{0,1},{10,20},{30,12}};
Initializing arrays of structs alternatively:-
struct point p[3]; p[0].x = 0; p[0].y = 1; // etc.
6.2.5 Dynamic Memory Allocation: further notes
The code <void ∗malloc(size_t n)> allocates blocks of memory from the heap and returns a pointer to an non-initialized block of memory on success returns NULL on failure. the returned value should be cast to the appropriate type be careful: there is no way to get how much memory was allocated back later unless you retain it somewhere! Always use malloc() with sizeof(). For example:
int ∗x = (int ∗) malloc(n ∗ sizeof(int));
where n is number of ints you want. Remember the cast as this is how you use malloc()properly.
The code void ∗calloc(size_t n, size_t size) allocates an array of n elements each of which is "size" bytes initializes memory to 0. The code <void free(void ∗)> frees memory allocated with malloc(). Again, refer back to Lynda.com video refered to above.
-
6.2.6 Linked Lists in C
A linked list is a dynamic data structure that consists of a sequence of records where each element contains a link to the next record in the sequence –
See:
Lab 5 linkedlist. A great example as it ties together pointers, structures and dynamic memory allocation from the heap...
note use of free() to release dynamic memory- Linked lists can be singly linked, doubly linked or circular. We will focus on singly linked list. Every node has a payload and a pointer to the next node in the list. The start (head) of the list is maintained in a separate variable. End of the list is indicated by NULL (sentinel)nas show in the following diagram:
6.2.7 Pointers to Pointers
Pointer variable holds address as before - but the content of that address location is now another address i.e. a pointer rather than an int etc. Can address location of address in memory – pointer to that pointer.
int n = 3; /* pointer to n */ int ∗pn = &n; /* pointer to address of n */ int ∗∗ppn = &pn;
Many uses in C: pointer arrays, string arrays.
What's the difference between these two functions?
void swap (int **a, int **b) { int *temp = *a; *a = *b; *b = temp; } void swap (int *a, int *b) { int temp = *a; *a = *b; *b = temp; }
-
6.2.8 Pointer Arrays
Pointer arrays are arrays of pointers. Beware: the following only allocating memory for an array of pointers, not what each pointer addresses.
//an array of pointers to integers int ∗arr[20]; //an array of pointers to characters char ∗arr[10];
Pointers in an array can point to arrays themselves.
// an array of char arrays/strings char ∗strs[10];
//equivalent array of pointers to strings. char ∗more_strs[]; and char ∗∗more_strs
Arrays of Strings
An array of strings, each stored as a pointer to a null-terminated array of chars. Each string may be of different length:
// length = 6 char str1[] = "hello" ; // length = 5 char str2[] = "ciao" ; char str_array[] = {str1, str2};
Note that str_array contains only pointers, not the characters themselves! Try:
sizeof(str_array)
6.2.9 Command Line Arguments
Use of command line argument values is very common in a Linux systems programming environment. So far, we have used int main(void) to invoke the main function. However, the main function can take arguments that are populated automatically when the program is invoked and these can then be picked up in the program code (See LCTHW Ex10 and Ex11 and Lab 5 argv program.)
The main parameters as interface to command line
int main(int argc, char ∗argv[])
- argc: count of arguments
- argv[]: an array of pointers to strings containing each of the arguments
- all this is populated automatically
- note: the arguments include the name of the program as well
Examples
./cat a.txt b.txt
argc=3, argv[0]="cat" argv[1]="a.txt" argv[2]="b.txt"
./cat
argc=1, argv[0]="cat"
-
6.2.10 Pointers to Functions
Decide ‘at runtime’ to call a specific function. Useful to implement callbacks (event handlers). Pass pointer to callback function to another function - when complete, result passed to callback function for further processing or signal (interrupt) handlers.
- Further reading:
- LCTHW Ex 18 is a good example
- A concise description is at: http://www.newty.de/fpt/index.html
6.2.11 File I/O
Files are sequences of bytes. The programmer is presented with a stream abstraction as we know they are really in blocks but the OS hides this. File systems are usually stateful. That is information is maintained about open files, which you have to open and gracefully close. Reading and writing progress within the file is recorded by a file pointer; when you open a file this will be set to the start of the file; as you read/write bytes this is automatically incremented by how many i.e. it is an offset relative to the start of the file; it can also be explicitly moved to a given offset with a seek.
The FILE structure holds information about streams. The primary parameter to pass to the file library functions is defined in <stdio.h>, and is valued by fopen(), given a file path. The FILE structure contains: file descriptor numbers; the actual file pointer value; various status flags; information about buffering. File i/o is largely used opaquely in code and just passed to other library functions. e.g. fgets().
Opening a file can be achieved by using:
FILE ∗fopen(char name[],char mode[])
Where the mode can be "r" (read only),"w" (write only),"a" (append) among other options. "b" can be appended for binary files. fopen() returns a pointer to the file stream if it exists or NULL otherwise. We don’t need to know the details of the FILE data type. Furthermore, remember that stdout, stderr and stdin are files and they too can therefore be read from/written to with FILE.
For text files which have lines (or records) of text, punctuated by regular newline characters, fgets() can be used largely as we used it with terminal input. The analogous fputs() can be used to write to files. Ensure that files are opened with fopen() and the correct mode, and eventually closed with fclose().Be aware that file I/O is buffered and fclose() will flush buffers, but there are other options to do this. (see Lab 5: files/text-files directory).
Binary File I/O: Use fread() and fwrite(). Details of the parameters are in the manual. Ensure that files are opened with fopen() with the correct mode, and eventually closed with fclose(). The function fseek() allows the file pointer offset to be set. (See example code: Lab 5: files/binary-files directory, and notice unsigned char used for bytes).
For efficiency this is best done blocks at a time if possible, e.g. copying files, but efficiency in this respect depends on the application. You can use use BUFSIZ, which is optimised to the buffering sizes in the kernel (see copyfile example also in Lab 5: files/binary-files directory)
End of File: Some functions use the EOF macro to indicate that end of file has been reached. EOF is usually -1 as this value can’t be mistaken for data content. For example:
while(getchar()!= EOF)
However:
fread() returns size_t, typically 0 at end of file.
fgets() returns NULL at end of file.
-
6.2.12 Enumerated types
- Please see Lab 5 enum program. In terms of enumerator types, defined by the enum keyword:
- Definition similar to struct.
- Implemented as integers so ‘==’ and ‘=’ can simply be used for comparison and assignment.
- See: 🔗 http://www.cs.utah.edu/~germain/PPS/Topics/C_Language/enumerated_types.html
6.2.13 Bitwise Operators
This is a common mechanism to collect 8 different flags as bits together in a single byte, which is a common concept in C systems programs with a lot of symbolic constants already set up for the flags
e.g. for access control permissions
- new operators:
- & and | for logical AND and OR, respectively.
- binary shift operators >> and <<
- use | to set a specific bit by OR-ing 2 bytes
- use & to check if a bit is set by ANDing 2 bytes
- use unsigned char
Please see Lab 5 bitwise program
-
6.2.14 Prerequisites for next time
- All of Lab 5
- Learn C the Hard Way
- Ex15 through Ex17
- Understand pointer arithmetic
- Understand string implementation and utility functions
- Other issues with pointers: NULL, dangling pointers etc., call by value and call by reference, pointers to functions
- Memory allocation and initialization from the stack and heap
- structs: use with pointers and linked lists
- array of pointers
- argv and argc
- simple file I/O - text and binary
- bitwise operators