A tutorial on programming in C

Over the years I have written various things related to programming and hardware. They were published on websites that no longer exist and/or have disappeared into oblivion. I have found a few of them, which I think may still be relevant today, so I will re-publish them here.

I will start with a tutorial on programming in C. I think programming in bare C is still relevant today, even with fancy languages derived from or inspired by C, such as C++, C#, Objective-C and Java. It is useful to have a good grasp of the basics of C, and to be fluent with pointer manipulation and such. It can help you make your code simpler and more efficient in more modern languages as well.

C: The beginning
----------------

Well, C is a relatively simple language, in that it does not have too many
language constructs. That’s because the language is relatively low-level,
it’s quite close to the machine, so to say. And the machine is a thing of
anarchy and chaos. C was developed to bring some structure to this, without
sacrificing too much performance or size.

We have:

- Data declarations
- Functions
- Operators
- Flow control statements
- Type definitions

That is what you have to work with. In short, code consists of functions,
containing (flow control) statements, with operations on data of specific
types.
That may seem a bit too much to grasp at this point, but we’ll take it one
step at a time.

Data
----

Of course we need data… like text or numbers. On data, we can perform
operations, like adding, subtracting, multiplying and such.
So first, let’s see how we can give our programs some data.

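We have 2 kinds of data: initialized data, and uninitialized data. They are
declared in much the same way, except that initialized data gets a value
assigned at declaration, and uninitialized data does not. So the initial
value of an uninitialized variable inside a function is undefined: it contains
whatever happened to be in that memory. (Variables declared outside of any
function, or declared static, are an exception: they start out as 0.)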

First you give the type of your data variable, then you give the name:

char myChar;

This is an uninitialized data variable. Initialized data works by assigning an
initial value to the variable:

char myChar = 'a';

Well, char is just one of the primitive data types of C. Here is the list,
with their sizes in memory:

- char          1 byte integer, also used for characters.
- int           2 or 4 bytes integer, depending on the system architecture.
- float         4 bytes floating point number.
- double        8 bytes floating point number.
- (pointers)    depending on the system architecture.

As you see, some data types are dependent on the system. So to make things
easier, I will choose the popular x86 system in 32 bit mode from now on.
Note that other systems may vary.

There are also 2 ‘size modifiers’:

- short         2 bytes
- long          4 bytes

These modifiers can be prefixed to int, and will determine its size. C allows
you to omit the int, and just use long and short as if they were primitive
types themselves.

So for 2 byte integers, both these declarations are correct:

short int a;
short a;

Similarly for 4 byte integers:

long int a;
long a;

And if there’s no size modifier for an int, the compiler will use
the default size, which is platform-dependent.

On 32 bit x86, ints are 4 bytes, the same size as a long, and so are pointers.
Pointers are a special group of data types, which I will cover later.
(In 16 bit real-mode OSes, ints used to default to short, and pointers could
be either near or far. This had to do with the segmented memory model on old
x86 processors (8086, 8088, 80186 and 80286). This legacy system is beyond
the scope of this text, since modern x86 systems use 32 bit addressing. But
when using a real-mode OS such as DOS, you have to pay attention to this.)

The integer types are seen as signed numbers by default. Signed means that
the number can be both positive and negative. Unsigned variables cannot have
negative values. (For plain char this is actually up to the compiler, but on
x86 compilers it is usually signed.)
This is interesting, because a char is only 1 byte, or 8 bits big. It can take
on 2^8, or 256 values. With signed, this would be -128 up to and including 127.
With unsigned, it would be 0 up to and including 255.
You can control this behavior with the signed and unsigned keywords:

signed int a;
unsigned int b;

It is also legal to define multiple variables of the same type on one line,
even intermixing initialized and uninitialized data.
It works by simply separating all variables by commas, like this:

unsigned int myVar1, myVar2, myVar3 = 50, myVar4;

Functions
---------

Functions are the core of any C program. They contain the actual code, and
therefore provide the functionality of the program. A function can receive
parameters, the data it will process. And a function can return a value of a
primitive data type. You declare a function in the sequence of return type,
function name, and parameter list (in parentheses):

int MyFunction(int param1, unsigned char param2, signed short param3)

Functions also have the possibility to not return anything. In that case we
have the special void data type. We will see this type again later with
pointers. This data type can also indicate that we want no parameters.
So if you don’t need any return value, and no parameters, then you can do:

void MyFunction(void)

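(Note: there’s an old style and a new style for functions with no parameters.
Official ANSI C wants MyFunction(void), but before the ANSI C standard was
introduced, MyFunction() was used. For most compilers, both styles should
work, but some (e.g. Borland) may enforce the ANSI C (void) style. Strictly
speaking, in a prototype MyFunction() tells the compiler nothing about the
parameters, whereas MyFunction(void) says that there are none.)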

Blocks of code are always between curly braces: {}. The code that goes into
our function is no different: the code block immediately follows the first
line, which declares the function.

This might also be a good time to explain how to add comments to your program.
A C comment starts with /* and ends with */. Anything between those symbols
is considered a comment, and will be ignored by the compiler.

A small example:

#include <stdio.h>   /* gives the compiler the prototype of puts() */

int main(void)
{
    /* Print some text to the screen, using a library function */
    puts("Hello world!");

    /* Exit function with return value */
    return 0;
}

Here we have a function calling the puts() function with a text string as
a parameter (puts() will ‘put’ the ‘s’tring on screen. We will look at these
strings later, as well as the puts function), and then returning a signed
integer value of 0 (this is an immediate operand: a constant written directly
in the code).
Note also that each statement in C is terminated by a semicolon (;).

Now, to look at the calling of functions more closely…

You can use functions from your own source, but you can also import functions
from earlier compiled modules of code, or libraries. Libraries are made up of
a number of modules of code. ANSI C comes with quite a few libraries of code,
which you can use in your programs. With these libraries, you also get header
files, which include these function prototypes, among other things. We will
look at these header files more closely later on, when we are actually going
to write a program.

Before you can call a function, the compiler needs to know how many parameters
are to be passed to the function, and what types they are. This is done via a
prototype of the function.

If a function is defined in your own source code, above the line where you
want to call it, then the compiler already knows the prototype, since it has
seen the actual function before, and you won’t have to do anything.
If a function is below your call, or imported from a code library or module,
then the compiler won’t know the function, so we have to provide a prototype
before using the function.

A prototype looks much like the first line of a function, except that the
parameter names are optional, and are usually omitted.
An example of a prototype:

int MyFunction(int x, char y, short z);

Or, omitting the names:

int MyFunction(int, char, short);

Then you’re all set to use the function later on in your source.

Calling a function is basically a matter of filling in the blanks. All you
have to do is fill in the parameters of the prototype with your own values or
variables; the function will be called, and its return value will be yielded.

For example, the standard library function sqrt(), declared in the header
file math.h, has this prototype:

double sqrt(double);

It returns the square root of the double we give it, so we can make a small
piece of code like this:

double x = 25, sqrtOfX;

sqrtOfX = sqrt(x);

As you can see, you can treat a function call like a number. In this example,
the parameter x will be passed to the function, the function will do its work,
and return the square root of x. And this result will be assigned to the
sqrtOfX variable.

Operators
---------

So now that we know how and where to put our code, the next question of course
will be: “How do I write code?”. That is not a trivial question, so we will
break code down to some subsets. Our first subset is “operations on data”.

As with most language constructs, C does not have too many operations on data.
Here’s the list:

Mathematical operators:

- +  : addition.
- -  : subtraction.
- *  : multiplication.
- /  : division (between integers, the fractional part is discarded).
- %  : division remainder/modulus (integers only).

Bitwise operators:

- &  : AND
- |  : OR
- ^  : XOR
- ~  : NOT
- >> : shift right
- << : shift left

They all work the same (except ~, which takes only one operand, as in ~x):
you specify a target variable, then the first operand, the operator, and then
the second operand.

I will give a small example, with some data:

int destination;
int operand1 = 10;
int operand2 = 20;

destination = operand1 * operand2;

This will assign the value of the expression 10 * 20 to the destination
variable.
Well, to be more precise, the right hand side is an expression, which yields
a result. You could just write this in C:

operand1 * operand2;

This would yield the result, but it never does anything with it. In these
first examples, we will assign the result to a variable, but we will see that
there are other things we can do with expressions, such as combining them to
larger expressions. You could say that the above expression is a ‘primitive
expression’.

You can also use immediate operands instead of variables:

int destination;
int operand1 = 15;

destination = operand1 / 3;

This will assign the result of 15 divided by 3 to destination.

There is also shorthand notation for the case where one of the operands is
also the destination variable. The shorthand notation works by putting the
operator directly in front of the equals-sign, and specifying only the other
operand. So:

destination = destination ^ 10;

can be written as:

destination ^= 10;

There’s another shorthand case, namely when you want to increase or decrease
the value of a variable by 1 unit. Why do I call it a unit? We’ll see that
later, when discussing pointers. For numbers, the unit is simply the number
1. These are the operators for it:

- ++ : increase
- -- : decrease

They work slightly differently from the normal operators. You just prefix or
postfix them to a variable, there is no equals-sign involved. When postfixing
the operator, the value is used in an expression, and afterwards its value is
increased. When prefixing it, the value is increased first, then used in the
expression.

Some examples:

int destination;
int operand1 = 30;
int operand2 = 19;

destination = operand1 - ++operand2;

This will result in the following values:

destination = 30 - 20 = 10
operand1    = 30
operand2    = 20

Postfixing the operator (starting again from operand2 = 19):

destination = operand1 - operand2++;

Gives the following results:

destination = 30 - 19 = 11
operand1    = 30
operand2    = 20

You can also make more complex expressions. The compiler should follow standard
operator precedence, and resolve expressions in brackets first. You can
recursively compose expressions, yielding the results of each ‘primitive
expression’ in the sequence the brackets and operator precedence have defined.

For example, this is possible:

int destination;
int var1 = 10;
int var2 = 20;
int var3 = 30;

destination = 27 * ((var2 + --var3)/(var1 * 2));

This will give destination the value of:

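27 * ((20 + 29)/(10 * 2)) =
27 * (49/20) =
27 * 2 = 54

(Note that 49/20 is 2, not 2.45: when both operands are integers, / performs
integer division and discards the fractional part.)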

And finally, you can also use the results of functions in expressions:

destination = 25 + sqrt(var1);

Now, on to another kind of expression…

Flow-control statements
-----------------------

Here comes the interesting part of code, namely controlling the flow of our
program, on specific conditions.

We do this with boolean expressions: a condition is either TRUE or FALSE.
Since a computer can only represent numbers, we express these boolean values
in integers (either char, short, int or long). A 0 stands for false,
everything else stands for true. But as a rule, when assigning the true value
to a variable, 1 is used.

For boolean expressions we have some operators:

- == : equals
- != : not-equals
- && : logical and
- || : logical (inclusive) or
- !  : logical not
- >  : greater than
- <  : less than
- >= : greater than or equal
- <= : less than or equal

Any boolean expression should yield the value 0 for false and 1 for true. They
work much the same as the previous expressions, namely with first operand,
operator, second operand (except !, which takes a single operand, as in !x).
We will usually not store them in new variables however, but use the result
directly in a statement.

A simple example of a boolean expression would be:

short var;

var == 15;

The result of this expression is true, or 1, if var equals 15, and false, or 0
if var is any other value than 15.

Another example:

var1 && var2;

This expression results in true if, and only if both var1 and var2 are true
(non-zero).

var1 || !var2;

This expression is true when var1 is true and/or var2 is false.

Boolean expressions can also be combined:

var1 && !(var2 >= 10 || var3);

We have only a handful of flow-control statements:

- if-else
- while and do-while
- for
- break
- continue
- switch-case
- goto

The if-statement is simple, but effective…
‘If (this expression is true) then run this block of code’. And you can
optionally run an alternative block of code if the expression was not true,
using the else-statement after the first block of code.

Going something like this:

int var1 = 3, var2 = 0;

if (var1 == 3)
{
    var2 += 10;
}
else
{
    var1 /= 5;
}

do and while can be used to run a block of code in a loop, while (‘as long
as’) the boolean expression is true. do-while() is a special form of while(),
where the expression is checked after the code block is executed, instead of
before, with the normal while(). As a result, the code block is always run
at least once.

As an example, I shall show the code for a power function:

int base = 5, exponent = 3, result = 1;

while (exponent--)
    result *= base;

At the end of the loop, result will contain 5^3, which is 125.
(And as you see, when there’s only 1 line of code in the loop, there’s no need
to put it between {} braces.)

Now for an example with do-while:

int result, var1 = 3, var2 = 7;

do
{
    result = var1 * var2;
    var1 += var2;
} while (result < 5000);

Here you can’t check result before entering the loop, because result is not
initialized yet. So basically it contains whatever value happened to be left
in that piece of memory. Checking its value would be meaningless.
We could make it an initialized value, but we know that the result is less
than 5000 the first time anyway, so this way, we save 1 check, and we save
the trouble of initializing the result value.

The for-loop is similar to a while-loop, but it has a special construct, where
you can not only specify a boolean expression which must be true to loop, but
you can also initialize some variables before entering the loop, and you
can specify some expressions which will be carried out after each loop. These
expressions are usually used to update the variables that are used for the
loop. It goes like this:

for (<initialize variables>; <boolean expression>; <update expressions>)

Or less abstract (an example which calculates the factorial of x):

int i, x, y;

for (i = 1, x = 5, y = 1; i <= x; i++)
{
    y *= i;
}

In fact, it can be shortened to this:

int i, x, y;

for (i = 2, x = 5, y = 1; i <= x; y *= i++);

(Here we see that when there’s no code block following a statement, we can
just delimit the line of code, and with that the loop, with a ;)

The expressions in a for-loop are optional. For an endless loop, you can
simply omit all expressions:

for (;;)
{
    /* Put code in endless loop here */
}

Which brings us to the next question… What if we want to exit a loop such as
this endless loop, when a certain condition occurs?
That’s where the break statement comes in.

The break statement will exit the innermost loop (or switch), and the program
continues with the statement after it.
You will usually use this with an if (condition) break; construction.

A small example:

Let’s say you’ve got a program that reacts to the user’s input (a menu or
something).
It should stay in the loop until the user chooses ‘quit’:

for (;;)
{
    /* Put code to display the options here */

    if (getUserInput() == QUIT)
        break;
    else
        doSomething();
}

Question: Do we really need to put doSomething() in an else-statement?

<Author gets some coffee, giving you time to think over the question>

Answer: In this case we don’t. When the if is true, it does the break, which
leaves the loop immediately. So a doSomething() placed directly after the if
can only be reached when the if is false.

The continue statement is similar to the break statement, but it only exits
the current iteration of the loop and continues with the next one.

Going something like this:

int i, n, m;

/* user inputs n */

for (i = 0; i < 100; i++)
{
    m = i % n;

    if (m == 0)
        continue;

    n /= m;
}

We see here how we can skip the rest of the current iteration on a certain
condition. In this case we skip it if m is 0, to avoid a division by zero.

But, now back to our menu-example of earlier…
OK… so now you’re probably wondering why you’d want a menu with only a
quit-option.
Well… You don’t.

Now, there are two ways to add new options to the menu.
You could of course do it by adding ifs:

int userinput;

for (;;)
{
    /* Put code to display the options here */

    userinput = getUserInput();

    if (userinput == FILE)
        doFile();
    else if (userinput == EDIT)
        doEdit();
    else if (userinput == VIEW)
        doView();
    else if (userinput == QUIT)
        break;
    else doBadInput();
}

OK, this’ll work, but it’s not the ideal solution.
The compiler only sees a chain of ifs, and has to figure out by itself whether
they have anything in common (older compilers often couldn’t), so the code it
produces may not be as efficient as it could be.

The best way to do it is to use the switch-case statement. The switch works
by testing one integer variable against several constant values.

The menu example above, using switch ():

int userinput, userquit;

for (userquit = 0; userquit == 0;)
{
    /* Put code to display the options here */

    userinput = getUserInput();

    switch (userinput)
    {
        case FILE:
            doFile();
            break;
        case EDIT:
            doEdit();
            break;
        case VIEW:
            doView();
            break;
        case QUIT:
            userquit = 1;
            break;
        default:
            doBadInput();
    }
}

Some comments on this structure: Firstly, the code looks much better :)

Secondly, for the compiler, it’s very easy to see that the cases are related
to each other, because they all depend on the same variable, by definition.
Hence the compiler can make optimizations that a normal block of ifs won’t
allow.

And thirdly: see the break at the end of each case? That is there to stop the
program entering the other cases. You can use this mechanism if you want to
make the program do the same thing for some of the cases. For example, suppose
you’re on a diet that prescribes that you can only eat meat on Tuesday,
Thursday and Friday, and you want a program to print (when given a day-number
between 1 and 7) whether that day is a meat-day:

/* day 1 is monday */
int day = getUserInt();

switch (day)
{
    case 2:
    case 4:
    case 5:
        puts("Meatday!!");
        break;
    case 1:
    case 3:
    case 6:
    case 7:
        puts("No meat");
        break;
    default:
        puts("This day doesn't exist, no meat for you, pal!");
}

Let’s say we enter day = 5. It’ll start executing from case 5: until it hits
a break.
See how good this looks? You can instantly see which cases do what…
Much better than:

if (day == 1 || day == 3 || day == 6 || day == 7)
    puts("No meat");

else if (day >1 && day <8)
    puts("Meatday!!");

else puts("The day doesn't exist");

In this case, there isn’t much for the compiler to optimize in terms of
speed, but the switch version simply looks better.
So there!

And now onto our last control flow statement: goto.

This statement can be used to make jumps to and fro in the code. It can be
quite useful in some cases.

The destination will be marked by a label. Labels are defined by a name,
followed by a colon:

myLabel:

They are useful for exiting multiple nested loops at once, where break cannot,
among other things.

That would look something like this:

int i, j;

for (i = 25; i >= 0; i--)
    for (j = 0; j <= 25; j++)
        if (i == j * j)
            goto endOfLoop;

endOfLoop:
    ;   /* a label must be followed by a statement; the program continues here */

And that about wraps up our control-flow. Now on to the next challenge…

Arrays and pointers
-------------------

Okay, here comes an interesting part of C. We will get into direct contact
with a part of the machine here, namely the memory.
The variables we used earlier were stored in memory as well, but we didn’t get
to see where and how exactly. The compiler took care of that for us, we could
just use the variables by the name and type we had given them.
Now we are going to use linear sets of data, called arrays. And to understand
how they work exactly, we have to look at how the machine uses its memory.

So how does the machine use its memory?
You could picture it as a giant cupboard of drawers, and each drawer has its
own unique number. In every drawer we can store one byte.
Picture it like this:

0: [  ]
1: [  ]
2: [  ]
3: [  ]
4: [  ]
5: [  ]
...

Or perhaps you prefer a horizontal look at it:

0:  1:  2:  3:  4:  5:  ...
[  ][  ][  ][  ][  ][  ]...

So to get a byte in memory, all we have to know is the number of its
container. We call this number the address.

It’s also interesting to look at how we store larger variables in memory.
This is another platform-dependent matter. An int for example is 4 bytes.
Now, there are 2 ways to store those 4 bytes in memory. The way we store
multibyte numbers in memory is called Endianness. There is Big Endian, and
Little Endian.

Let’s say that the 4 bytes of our int are AA, BB, CC and DD, from most
significant part to least significant part.
So our int looks like AABBCCDD.
In Big Endian, we store it like humans write numbers: from left to right, most
significant to least significant, so it will look like this:

[AA][BB][CC][DD]

Little Endian stores it from least significant to most significant instead.
So we get the reverse:

[DD][CC][BB][AA]

Both byte orders are still in use. Big Endian is the ‘network byte order’,
used by the standard network protocols, including of course those of the
internet. The x86 has used Little Endian ever since the original 8086 from
1978, and since most ARM and RISC-V systems run in Little Endian mode as well,
Little Endian is by far the most common byte order in machines today.

So, in short, the address of a multibyte variable will always point to the
most significant byte in Big Endian, and it will always point to the least
significant byte in Little Endian.

Well, what is a pointer? That is simply the address or ‘reference’ of a
variable in memory. A pointer has a type, and contains an address, which is
a 32 bit number on 32 bit x86. So like a normal variable, it actually contains a number, and
it is stored in memory (storage is also affected by the Endianness of the
architecture, as with the normal multibyte variables).

Declaring a pointer of some type is as simple as giving the type, and then the
name, with an asterisk (*) prefixed to it:

int *myPointer;

You can use pointers to any of the primitive types, and also to user-defined
types, which we will see later.

But now, how do we assign an address value to it?
One option is to use the ‘address-of’ operator, which gets the address of a
variable. It is as simple as prepending ‘&’ to the variable name:

char myChar = 25;

‘&myChar’ will then resolve to the address of the char in memory.

Assigning a value to a pointer works the same as with the other variables:

char *myPointer, myChar = 25;

myPointer = &myChar;

Now myPointer contains a reference to myChar. There is also a dereferencing
operator in C, this will get the value of the variable at the address that is
being pointed to. It works by prepending a ‘*’ to the variable name. You could
say that it is the opposite of the &-operator. The &-operator turns a variable
into a pointer, and the *-operator turns a pointer into a variable.

So we could store the char that myPointer is pointing to, back into a char
like this:

char *myPointer, myChar = 25, newChar;

myPointer = &myChar;
newChar = *myPointer;

Now newChar has a value of 25;

We could also assign a value to an address by dereferencing a pointer. Take a
careful look at this:

char *myPointer, myChar;

myPointer = &myChar;
*myPointer = 25;

We store 25 at the address that myPointer is pointing to. But, pay attention
here! Look what happened… myPointer contained the address of myChar. So by
storing 25 at the address in myPointer, we stored it into myChar, and
therefore myChar now has the value of 25!

Now from pointers on to arrays…

An array in the field of programming is basically what the word means: an
array, a row.
A row of variables of the same type, more specifically, stored in one linear
piece of memory. You can picture it like this:

An array of 6 chars (1 byte), starting at address 521:

Address: 521: 522: 523: 524: 525: 526:
Index:   [ 0] [ 1] [ 2] [ 3] [ 4] [ 5]

Note that we start indexing by 0, not 1. So, the array of 6 chars has indices
0 to 5 for all the elements there. More generally speaking:

N elements of an array are indexed from 0 to (N-1)

Also note that we can get the address of each individual element by:

starting address + index

Let’s look at arrays of bigger types. For example, an array of long ints.

An array of 4 long ints (4 bytes), starting at address 64:

Address: 64:   68:   72:   76:
Index:   [   0][   1][   2][   3]

We see here that the correct addressing formula for all types is:

starting address + (index*sizeof(type))

Incidentally, sizeof is built into C as an operator. If we put:

sizeof(int)

it will evaluate to a value of 4 (on our x86), which is the number of bytes in an int.
More generally speaking, sizeof(x) will evaluate to the total number of bytes
of memory used by x.
This is applicable to all primitive and user-defined types, and as we will
see later, also to arrays and structures.

How do we define arrays and use arrays in C?

There are basically 2 variations… We have statically allocated arrays and
dynamically allocated arrays.

First we will look at the statically allocated ones. They are defined much
like a single variable: type, name. But after the variable name we put the
number of elements we want, in square brackets []. Looking like this:

char myArray[256];

Now, we have set up an array of 256 (uninitialized) elements. myArray is the
pointer to the first element in the array.

To access an element in the array, you can simply use the subscript operator
([]), as it is called: name[index].
For example, we want to test whether element 5 equals 25.
You can manipulate myArray[5] like a normal char, so we can just create a
normal boolean expression with it:

if (myArray[5] == 25)
{
    /* Do something */
}

Assigning values and doing operations on array elements are also done like
we’ve seen before.

Here’s a small example that multiplies every element of an array by 23:

short numbers[36], i;   /* assume numbers has been filled in earlier */

for (i = 0; i < (sizeof(numbers)/sizeof(numbers[0])); i++)
{
    numbers[i] *= 23;
}

Lets also take a closer look at this expression:

(sizeof(numbers)/sizeof(numbers[0]));

This yields the size of the array, so i will go from 0 up to and including 35.
What happens here is this:

sizeof(numbers) will give us the total amount of memory used for the array
numbers. This will be in bytes. But a short int is 2 bytes, so we need to
correct for that.

sizeof(numbers[0]) will give us the size of element 0 of the array. Element 0
is 1 short int, so this will give us 2 bytes (of course all elements in the
array are equal size).

So if we divide the two, we get this:

(sizeof(numbers)/sizeof(numbers[0])) = 72/2 = 36

Which is our array size. This little trick is very useful. If for example we
would change the type of the array to long int later, this line can remain
untouched, as it would still yield the correct array size. And if we decide to
change the size of the array to say 50, then all we have to do is to change
the definition of the array:

short numbers[50];

and the for-loop will still loop through all elements.

It’s also possible to put initialized data into an array. In that case, the
size does not need to be specified, because the compiler can derive that from
the number of elements it needs to add.

We give a list of elements, in {} brackets, and separate each element with a
comma:

int myArray[] = { 10, -642342, 12321, 213122, 1231 };

For text, there is a special array definition. Text is stored as an array of
chars, terminated by a 0. For example:

char myText[] = {'H','e','l','l','o',0};

But, C provides a special construct for such text strings. The following is
equivalent:

char myText[] = "Hello";

The 0-terminator will be appended automatically.
This construct will also allow you to concatenate strings. If you put 2
or more strings after one another, they will be joined to form 1 string.
This makes it possible to split long strings up over multiple lines, or even
add comments in between. A few examples:

char myText[] = "Hi" " there!" " How are you doing?";

char myText[] = "Hello, "
                "how are you?";

char myText[] = "This is version "
/* edit this */ "0.1 beta"
                " of the software";

It’s also possible to initialize only some elements, and specify a size. The
remaining elements are then set to 0 as well. This can be useful when for
example only the first element needs to be 0, as in an empty zero-terminated
list:

short myArray[32] = { 0 };

myArray is a pointer, in that it evaluates to a memory address, but it’s a bit
different from the ones we’ve seen earlier. Namely, this pointer is not stored
in memory, but the address is a constant rather, which you can use, but not
modify. The pointers we’ve seen earlier are stored in memory, and act like
variables. You can also perform operations on them.

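For example, we want to fill an array with the powers of 7 (7^1, 7^2 and so
on). 7^11 is the largest power of 7 that still fits in a 32 bit unsigned int,
so we use 11 elements:

unsigned int myArray[11], *myPointer, i, power = 1;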

myPointer = myArray;

for (i = 0; i < (sizeof(myArray)/sizeof(myArray[0])); i++)
{
    *myPointer++ = (power *= 7);
}

Now, a few notes here… First, this part:

*myPointer++

The ++ postfixed to myPointer basically does what you would expect it to do.
The operation is performed, and after that, the variable is increased.
In this case, the operation is a dereference. So we assign a value to the
address that myPointer points to, and then we increase its value by 1.
“1 what?” you may ask. The answer to that, is “1 element”.
If you want to increase the pointer by say 6 elements, then you can just add
6 to it, like this:

myPointer += 6;

You can add integers to a pointer and subtract integers from it, and you can
subtract two pointers into the same array to get the number of elements
between them (adding two pointers together is not allowed, though).
Note however, that it adds 6 elements. The actual address that myPointer
contains will be increased by 6 * sizeof(type), so in this case, that is
6 * 4 = 24. We will look into that some more, when we get to typecasts.

Now to the second part:

(power *= 7);

This is a nice shorthand notation. An assignment is itself an expression, and
its value is the new value of the variable. So the expression in parentheses
is evaluated, which multiplies power by 7, and the new value of power is then
assigned to *myPointer. (The parentheses are not even required here; they
just make it easier to read.)
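
Here is the same loop wrapped in a function, as a sketch (fillPowersOf7 is a hypothetical name). It shows that an assignment expression like (power *= 7) has a value that can be stored somewhere else:

```c
#include <assert.h>
#include <stddef.h>

/* Fills 'count' elements with 7, 49, 343, ... using the *p++ = (power *= 7)
   idiom. Unsigned arithmetic wraps around instead of overflowing. */
void fillPowersOf7(unsigned int *array, size_t count)
{
    unsigned int *p = array;
    unsigned int power = 1;
    size_t i;

    assert(array != NULL || count == 0);
    for (i = 0; i < count; i++)
        *p++ = (power *= 7);   /* multiply, store the new value, advance p */
}
```

After fillPowersOf7(a, 4), the array a holds 7, 49, 343 and 2401.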

On to dynamically allocated arrays.

Dynamically allocated arrays can be useful when you don’t know the size of
the array beforehand, because it depends on user input, for example.
Or, when you don’t need the array for the duration of the entire program, and
you would like your memory deallocated after you’re done with it.

For the allocation of memory, we have the following function, declared in the
standard header stdlib.h:

void *malloc( size_t size );

Well, we see here that it returns a void *, a type we haven’t seen before.
Basically, it is a typeless pointer, and it cannot be dereferenced, since we
don’t know the type, and therefore we don’t know how large an element would
be. But we assign its value to a pointer of the type of the array that we
need, so there is no trouble. Converting a value of one type to another type
is called a type conversion, and when you write the conversion out
explicitly, it is called a typecast. Let’s look deeper into that before we
continue.

Typecasting
-----------

This is a very simple and short subject… Typecasting an expression means
explicitly converting its value to another type. For numbers, the value is
converted (for example, (int)3.7 gives 3); for pointers, the compiler treats
the address as pointing to data of the new type. This is done by putting the
desired type before the variable (or expression), in parentheses, like this:

(int)myVariable;

Here’s an example on how a cast can affect variables:

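short i;
signed char j = -1;

i = (unsigned char)j;

Now, instead of what you might expect, i is not -1, but 255. What happened is
this: j is converted to an unsigned char. Converting a negative value to an
unsigned type wraps it around, modulo 256 for an 8-bit char, so -1 becomes
255. On a 2s complement machine, this is the same as taking the bit pattern of
-1, which is 11111111, and interpreting it as if it were not signed: 11111111
is equal to 255. And that is the value assigned to i.
(We use signed char here, because whether a plain char is signed or unsigned
depends on the compiler and platform.)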
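
The example above as a small checkable sketch (castDemo and asUnsignedChar are hypothetical names). UCHAR_MAX comes from limits.h, and is 255 when a char is 8 bits:

```c
#include <assert.h>
#include <limits.h>

/* Converts a signed char to unsigned char and widens it to short.
   Negative values wrap around, so -1 becomes UCHAR_MAX. */
short asUnsignedChar(signed char c)
{
    return (unsigned char)c;
}

/* The example from the text, with the result checked */
void castDemo(void)
{
    short i;
    signed char j = -1;

    i = (unsigned char)j;
    assert(i == UCHAR_MAX);   /* 255 with 8-bit chars */
    assert(i != -1);
}
```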

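You could also use a cast to add a certain number of bytes to a pointer,
instead of a certain number of elements. A char is 1 byte, so we could
convert our pointer to a char pointer temporarily, and add the number of
bytes we want.

For example:

int *myPointer, myBytes;

myPointer = (int *)((char *)myPointer + myBytes);

We temporarily convert myPointer to a char *, then we add the number of bytes
we want, which is stored in myBytes in this case, and then we convert the
result back to an int * and store it. myPointer itself remains an int *, since
a cast only affects the value in that one expression. You cannot write
(char *)myPointer += myBytes; directly, because the result of a cast is a
value, not a variable, so you cannot assign to it (some old compilers accepted
this as an extension). Also make sure that the resulting address is properly
aligned for an int before you dereference it.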
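
A portable way to advance a pointer by a number of bytes is to convert it to char *, add, and convert the result back, because the result of a cast cannot be assigned to. A minimal sketch, with advanceBytes as a hypothetical helper name:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: advances an int pointer by 'bytes' bytes. The caller
   must make sure the result is aligned for an int before dereferencing it. */
int *advanceBytes(int *p, size_t bytes)
{
    assert(p != NULL);
    return (int *)((char *)p + bytes);   /* char * arithmetic counts bytes */
}
```

For example, advanceBytes(arr, 2 * sizeof(int)) gives the same address as &arr[2].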

In many cases, a conversion is implicit. For example, in C a void * can be
converted to any other object pointer type without a cast. In some cases
though, the compiler might give a warning, or you want to change the behavior
to that of another type temporarily. That’s when you use a cast.

Now, back to the malloc() function…

There was this other new thing… the size parameter has type size_t… That’s
odd… we haven’t seen size_t in the variable types.
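Well, this is because size_t is not a primitive type, but a defined one, which
is provided by the standard library. Basically, it’s just a primitive type,
but given another name, so that it makes more sense when reading the source.

When we search through the header files (we will look at header files more
closely later on), we find that size_t is defined in a file called stddef.h.
With an older 32-bit compiler, for example, we might find the following line:

typedef unsigned int size_t;

So, on such a compiler, size_t is nothing but an unsigned int. The exact type
differs per compiler and platform: on 64-bit systems it is usually a 64-bit
unsigned type, such as unsigned long or unsigned long long.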
The typedef keyword is followed by an existing type, and then a list of new
names that can be used to declare variables of that type, separated by commas.
For example:

typedef unsigned int colour, serial, uint;

Now you can declare variables like this:

colour red, green, blue;
serial nr1, nr2, nr3;
uint a, b, c;

An interesting application for these typedefs is portability. You could write
a program that uses only user-defined types, and to port it from one system
to another, all you would have to do is adapt the type definitions to the
new architecture.

For example, you need a 32-bit signed integer type, and on architecture X, you
would need a long int, and on architecture Y, you would need an __int32.
You would choose a user-defined type name to represent the 32-bit signed
integer; let’s take sint32 in this example.

Then all you have to do to make it work correctly on architecture X, is this:

typedef long int sint32;

And to make it work on architecture Y, you would use:

typedef __int32 sint32;
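
Since the C99 standard, there is also a ready-made answer to this problem: the header stdint.h defines exact-width types such as int32_t on every platform that has such a type. Here is a sketch of how the sint32 typedef could pick between that and an older definition. The long int fallback is only an assumption for a pre-C99 compiler where long is 32 bits, and checkTypes is a hypothetical name:

```c
#include <assert.h>
#include <limits.h>

#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
#include <stdint.h>
typedef int32_t sint32;       /* C99 and later: exact-width type */
#else
typedef long int sint32;      /* assumption: older compiler with 32-bit long */
#endif

/* Checks at runtime that sint32 really has 32 bits */
void checkTypes(void)
{
    assert(sizeof(sint32) * CHAR_BIT == 32);
}
```

The rest of the program only ever mentions sint32, so only these few lines need attention when porting.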

Then there’s also the compound datatype in C: the data structure. Structures
contain a set of variables, and can be very useful for adding logic and
structure to your program. For example, when you deal with real-world
entities that have certain attributes, you can group those attributes
together into 1 compound datatype.

Let’s say you want to store data about cars. For example, you want to store
model, year and colour of a car.
First you use the ‘struct’ keyword. Then you give your type a name. And then
you group some types together, and give them names, just like you would do
with separate variables.
Put them in between {} braces.

Looking like this:

struct Car { unsigned int model; unsigned short year; unsigned int colour; };

Then when you want to define a variable of this type, you have to mention that
it is a structure, by using the struct keyword.
Looking like this:

struct Car myCar;

You can also initialize the struct, which would look like this:

struct Car myCar = { 911, 1989, 500 };

You could make a type of ‘struct Car’, by using a typedef. This would save you
from typing ‘struct’ before each new variable definition.
So you would define a type like this:

typedef struct Car sCar;

And then define your variable like:

sCar myCar = { 911, 1989, 500 };

The most convenient way is to combine the struct definition with the typedef.
You don’t have to give the struct a name then, since you won’t be needing the
name of the actual struct, but only the name of the newly defined type.
You only need the struct name if the struct has to refer to its own type (for
example, a pointer to another struct of the same type), because the new type
name is not known until the typedef is complete, which is after the struct
definition.

The entire line would look like:

typedef struct { unsigned int model; unsigned short year; unsigned int colour; } Car;

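If a struct needs to refer to its own type, you would do this:

typedef struct tagSelf { struct tagSelf *next; } Self;

You use a temporary name, or ‘tag’, to be able to have the struct refer to
itself. Note that next is a pointer: a struct cannot contain a complete copy
of itself (it would have to be infinitely large), but it can contain a
pointer to another struct of the same type. This is how linked lists are
built.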
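
To show what such a self-referential struct is good for, here is a minimal linked-list sketch. The Node type, its value member and the functions sumList and listDemo are made up for this example. It uses the -> operator to reach members through a pointer, which is explained a bit further on:

```c
#include <assert.h>
#include <stddef.h>

typedef struct tagNode {
    int value;                /* illustrative payload */
    struct tagNode *next;     /* next node, or NULL at the end of the list */
} Node;

/* Adds up the values of all nodes, following next until NULL */
int sumList(const Node *node)
{
    int sum = 0;

    while (node != NULL)
    {
        sum += node->value;
        node = node->next;
    }
    return sum;
}

/* Builds a three-node list on the stack: a -> b -> c -> NULL */
void listDemo(void)
{
    Node c = { 3, NULL };
    Node b = { 2, &c };
    Node a = { 1, &b };

    assert(sumList(&a) == 6);
}
```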

It is also common to write each member of the struct on a new line, to
increase readability:

typedef struct {
    char *processor;
    unsigned int memory;
    unsigned int diskspace;
} Computer, *pComputer;

Note also that we defined 2 names here, Computer and *pComputer.
Just like in a variable declaration, the * means that pComputer is declared as
a pointer: pComputer becomes the type ‘pointer to a Computer struct’.
(The ‘p’ stands for pointer. For improved readability, sometimes variable
names are prefixed with abbreviations of their type, like this ‘p’ for
pointer. This is called Hungarian notation. The inventor was a Microsoft
programmer from Hungary, by the name of Charles Simonyi. The code looked like
some weird foreign language at first sight, and since Simonyi was Hungarian
by birth, they decided to call this convention Hungarian. It is used
throughout the Windows API, for example. It can be very convenient.)

So basically we combined this line:

typedef Computer *pComputer;

together with the definition of the Computer structure itself.

To access members in a struct, we use the dot (.) operator. The syntax is:

struct.member

This will resolve to a ‘normal’ variable, which can be used in expressions
just as usual.

An example:

Computer myComputer;

myComputer.processor = "MC68000";
myComputer.memory = 1048576;
myComputer.diskspace = 30234234;

When you have a pointer to a struct, you could do this:

Computer someComputer;
pComputer myComputer = &someComputer;

(*myComputer).processor = "MC7400";

The pointer has to point to an actual Computer, of course; here it points to
someComputer. But there is a special arrow (->) operator for pointers to
structs, which is the preferred method:

myComputer->memory = 655360;
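
A small sketch that fills in a Computer through a pointer, using both notations side by side. setupComputer is a hypothetical helper; the struct is the Computer struct from above:

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    char *processor;
    unsigned int memory;
    unsigned int diskspace;
} Computer, *pComputer;

/* (*pc).member and pc->member mean exactly the same thing */
void setupComputer(pComputer pc, char *processor, unsigned int memory)
{
    assert(pc != NULL);            /* must point to a real Computer */
    (*pc).processor = processor;
    pc->memory = memory;
    pc->diskspace = 0;
}
```

After Computer box; and setupComputer(&box, "MC7400", 655360u); you can read the members directly, for example box.memory.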

It is also possible to create arrays of structs, and even initialize them.
For example:

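Computer myComputers[2] = { { "MC68000", 1048576, 30234234 },
                            { "Pentium", (48*1048576), (3072u*1048576u) } };

(The u suffixes make the last multiplication unsigned: 3072 * 1048576 is
3221225472, which is too large for a 32-bit signed int, but fits in an
unsigned int.)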

The sizeof operator also works on user-defined types and structures, as I said
earlier. sizeof(Computer) returns at least the combined size of all the
members of the Computer struct. With 4-byte pointers and 4-byte ints, as on a
typical 32-bit system:

sizeof(char *)
sizeof(unsigned int)
sizeof(unsigned int) +
----------------------
sizeof(Computer)

In other words:

 4
 4
 4 +
----
12

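The compiler is allowed to insert padding bytes between members (and after
the last one), so that each member is properly aligned. That is why
sizeof(Computer) can be larger than the sum of its members, and on a 64-bit
system, the pointer alone is usually 8 bytes. So always use sizeof, rather
than adding up the sizes yourself.

sizeof(myComputers) would return the total size of the array, which is
2 * sizeof(Computer), or 2 * 12 = 24 bytes in our 32-bit example.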
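
The following sketch checks these rules with assert(), rather than relying on specific numbers, since the sizes differ per platform. The Padded struct and sizeofDemo are made up for this example; offsetof() from stddef.h gives the byte position of a member within its struct:

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    char *processor;
    unsigned int memory;
    unsigned int diskspace;
} Computer;

/* A small member before one that usually needs stricter alignment */
typedef struct {
    char flag;
    double value;
} Padded;

void sizeofDemo(void)
{
    Computer myComputers[2];

    /* A struct is at least as large as the sum of its members... */
    assert(sizeof(Computer) >= sizeof(char *) + 2 * sizeof(unsigned int));
    assert(sizeof(Padded) >= sizeof(char) + sizeof(double));

    /* ...and offsetof() shows where a member really starts */
    assert(offsetof(Padded, value) >= sizeof(char));

    /* An array of structs is exactly count * sizeof(element) */
    assert(sizeof(myComputers) == 2 * sizeof(Computer));
}
```

On a typical 64-bit system, sizeof(Padded) comes out as 16 rather than 9, because padding is inserted after flag so that value is 8-byte aligned, but the standard does not guarantee that exact number.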

There is another type, very similar to the struct. That’s the union. A union
is used exactly like a struct, but with one difference: all members share the
same memory, so you pick one member, and use that. Storing a value in one
member overwrites the others.
An example should clarify it:

union Number { int i; double d; };

union Number myNumber;

Now if we want to use this variable, we can use either the int:

myNumber.i = 245;

or the double:

myNumber.d = 26.32345;

You can also use the typedef-combinations like we saw earlier with the struct:

typedef union { int i; double d; } Number;

Number myNumber;

The union takes up as much memory as its largest member (plus any padding
needed for alignment). sizeof(<union>) will also return that size.
sizeof(<union.member>) will return the size of the type of that member.
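
A sketch of these union rules (unionDemo is a hypothetical name). It also checks that all members start at the same address, which is what makes them share storage:

```c
#include <assert.h>

typedef union { int i; double d; } Number;

void unionDemo(void)
{
    Number myNumber;

    /* Large enough for the largest member */
    assert(sizeof(Number) >= sizeof(double));
    assert(sizeof(Number) >= sizeof(int));

    /* Every member starts at the same address */
    assert((void *)&myNumber.i == (void *)&myNumber.d);

    myNumber.i = 245;
    assert(myNumber.i == 245);

    /* Writing d overwrites the bytes of i; from now on, only d holds a
       meaningful value */
    myNumber.d = 26.32345;
    assert(myNumber.d == 26.32345);
}
```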

That about covers the user-defined types, now back to our malloc() call.
We now know that we can simply specify the number of bytes as an argument of
the malloc() function, and we get a void * back.

So, we can have it converted implicitly to a pointer of the type we want, and
we have a pointer to a piece of memory of the size we want. If there is not
enough memory available, malloc() returns NULL instead, so check the result
before you use it.

This can be useful to dynamically allocate variables, and keep the memory
usage under control. We allocate an array when we need it, and deallocate it
again, when we’re done with it, and save the memory for other uses and
applications.

For example, to allocate 1 car structure dynamically, we can do this:

Car *myCar;

myCar = malloc(sizeof(Car));

We can also allocate an array dynamically… simply by multiplying the size of
1 element by the number of elements we want.
For example, an array of 25 ints:

int *myArray;

myArray = malloc(25*sizeof(int));

Now, to use this array, we can simply apply the subscript operator to the
pointer, just like with the statically allocated arrays:

myArray[20] = -15;

To deallocate the memory again, we can use the free(void *) function. As you
can see, it takes a pointer as its argument. Since it’s a void *, any object
pointer will be converted implicitly, so we can just pass any type of pointer
directly.

To deallocate our int-array, we simply type:

free(myArray);

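and the memory is released again, so it can be reused for later allocations.
After this, myArray must not be used anymore, until you assign it a new,
valid address.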
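
Putting malloc() and free() together, here is a sketch of the whole life cycle of a dynamically allocated array, including the check for NULL. sumOfAllocated is a hypothetical name, and filling the array with 0, 1, 2, ... is just for illustration:

```c
#include <assert.h>
#include <stdlib.h>

/* Allocates 'count' ints, fills them with 0, 1, 2, ..., returns their sum,
   and frees the memory again. Returns -1 if the allocation fails. */
long sumOfAllocated(size_t count)
{
    int *myArray;
    long sum = 0;
    size_t i;

    assert(count > 0);   /* malloc(0) is implementation-defined */

    myArray = malloc(count * sizeof(int));
    if (myArray == NULL)
        return -1;                  /* not enough memory */

    for (i = 0; i < count; i++)
        myArray[i] = (int)i;        /* subscript works like on a static array */
    for (i = 0; i < count; i++)
        sum += myArray[i];

    free(myArray);                  /* give the memory back */
    myArray = NULL;                 /* guard against accidental reuse */
    return sum;
}
```

For example, sumOfAllocated(25) allocates 25 ints, and returns 0 + 1 + ... + 24 = 300.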

Our first program
-----------------

Well, we have now covered most of what the C language itself can do. Now,
let’s look at how we create programs with the language constructs we’ve seen:
how to group them together into a source file, and how to create a binary
executable from it.

A source file is a simple ASCII file, and can be written with any text editor
you like.
There are 2 types of source files in C:

- normal source code, text files usually with the .c extension.
- headers, text files usually with the .h extension.

There is no physical difference between .c and .h files. Both can contain any
form of C statements, directives and code. It’s more of a habit of the C
programmers to make the distinction, etiquette rather than syntax.

Headers are a special kind of sourcefiles, which usually come with code
libraries. They contain the necessary function prototypes, type definitions,
constants and macros for use with the library. They don’t normally contain
code.

When you use code from a library, you include its header file in the source:

#include "header.h"

The exact search rules differ per compiler, but typically the compiler will
first search the directory of the current source file for header.h, and if
it’s not found there, it will continue with the include paths (with
Microsoft’s compiler, for example, these are taken from the INCLUDE
environment variable).

There is also this form:

#include <header.h>

This will not search the current directory, but will start with the include
paths immediately.
By convention, use the <> form for system and library headers, and the ""
form for headers that belong to your own project.

Right, now for our first real program, the infamous “Hello world” example…

We use the puts() function to print a null-terminated string to the console
output stream.
Your API reference will tell you which header and which library to use.
On *nix systems, you can type “man puts”, and on most other systems, there
will be some help function implemented into the IDE, where you can search for
“puts”.
You will find that we need the stdio.h (‘standard I/O’) header file, so we write:

#include <stdio.h>

This will basically paste the code from stdio.h into your current source file
at the place of the #include statement.

A C program starts execution from the main() function, which is defined to
return an int, which will be the exit-code to the OS.
There are 2 versions:

- no arguments: int main(void)
- Commandline passed from the OS: int main(int argc, char *argv[])

argc is the argument-count, the number of commandline parameters passed to the
program.
argv is an array of argc null-terminated strings, followed by a null pointer
(so argv[argc] is NULL).
Note that argv[0] is normally the program name itself, so if for example you
had a commandline like this:

myprogram one two three

Then you would get argc = 4, and:

argv[0] = "myprogram"
argv[1] = "one"
argv[2] = "two"
argv[3] = "three"
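
As a sketch, here is a function that lists the arguments the way main() receives them. printArgs is a hypothetical name; in a real program you would call it from int main(int argc, char *argv[]) as printArgs(argc, argv):

```c
#include <assert.h>
#include <stdio.h>

/* Prints each commandline argument on its own line, and returns how many
   were printed. The standard guarantees that argv[argc] is NULL. */
int printArgs(int argc, char *argv[])
{
    int i;

    assert(argv[argc] == NULL);
    for (i = 0; i < argc; i++)
        printf("argv[%d] = \"%s\"\n", i, argv[i]);
    return i;
}
```

For the commandline above, this prints the same four lines as the listing.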

For our “Hello world” example, we could use the main(void) version, since we
do not require any commandline parameters to be passed to this program.

(Note: The main() function is called by a piece of code known as the ‘stub’.
The stub contains the raw entrypoint of the executable, and sets up the
environment for running a C program (such as parsing the commandline,
setting up standard input and output streams, and the OS environment
variables). It then calls your main() function. When main() returns, the
stub will clean up the environment again, and pass the return value of main()
back to the OS and exit.)

Actually, I already gave the code for the program as an example for the
functions.

The entire program would look like this:

#include <stdio.h>

int main(void)
{
    /* Print some text to the screen, using a library function */
    puts("Hello world!");

    /* Exit function with return value */
    return 0;
}

Save it to a text file called “hello.c”, and after discussing the compiling
process, we will make an executable out of it.

Now that we have our first source, it is time to compile it to a running
program. This is a bit problematic to explain, since each compiler has its own
commandline options and special behavior. I will just tell a bit about the
process of compiling and linking, so you will know what to look for in the
documentation of your C compiler.

There are several levels in the transition from C source code to a binary
executable.
Technically, we have these levels:

4. C source code
3. Instruction listing (assembly source)
2. Linkable code object
1. Executable image

A compiler will take us from level 4 to 3. Then an assembler will take us from
3 to 2, and finally, a linker will take us from 2 to the final level of 1.

But in practice, we find that many compilers go from 4 to 2 in one step these
days (or at least hide the assembly step from you), using their own internal
representation rather than a true assembly instruction listing. They often
still have the option to output an assembly listing though, should the
programmer so require.
And compilers automatically invoke the linker these days, so that we can go
from C source code to an executable in one go.
Compiling and linking remain separate steps though, since you may have
some precompiled code which you want to link to the newly compiled code, such
as imported library functions, or you might want to create some precompiled
libraries, which you will not link into a binary executable at all.

So, I can’t really help you with the commandline options for your compiler.
But now that you know what the compiling process is about, you should be able
to figure it out yourself. I can give you 2 examples though, for both
Microsoft’s compiler and the GNU C Compiler (gcc).

Microsoft Compiler:

CL hello.c

This will compile hello.c into hello.obj, and then link in the necessary
standard C libraries. CL.EXE invokes LINK.EXE itself. It links the standard C
runtime library by default (libc.lib in older versions), so you do not need to
specify any libraries for standard C functions. You only need to specify
libraries for non-C functions, like Windows API or third party libraries.
By default, CL names the executable after the first source or object file on
the commandline, so in our case, we will get hello.exe as a filename.
In this case, this line is all you need to have hello.exe generated.
If you need other libraries or objects, you can simply add them to the
commandline: CL hello.c blah.obj foo.lib
So, try to execute your first program now!

GNU C compiler:

gcc -o hello hello.c

This compiler also links the standard C library by default, so there is no
need to specify it. The -o switch specifies the name of the executable. If no
name is given, then it defaults to a.out instead.

Some example programs
---------------------

Here is a small program that I think will be useful. It will show you how to
do some simple interaction with the user. It will also allow you to play with
some bitwise operations, and make you get used to playing with hexadecimal
values. This might come in handy later on, as hexadecimal numbers and bitwise
operations can be quite useful in speeding your code up.

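We will see puts() here again. The puts() function writes a string to the
standard output stream (stdout), which is normally the screen. And I say
‘normally’ here, because streams may be redirected to other devices, such as
printers or files. We will also read lines from the standard input stream
(stdin), which is normally the keyboard. Older code often uses gets() for
this, but gets() has no way of knowing how large your char array is, so a long
line simply overflows it. It was removed from the language in C11. The fgets()
function does the same job safely: you pass it a pointer to the char array,
the size of that array, and the stream to read from (stdin), and it stores at
most one line, including the newline character if it fits.
The getchar() function is simpler: it returns a single character from the
stdin stream, and does not require a buffer to be passed.

We also get to meet fflush(), which ‘flushes’ a file stream. The stdin and
stdout streams are considered as files as well in C, so fflush() can be used
on stdout, to make sure a prompt is actually shown before we wait for input.
You will often see fflush(stdin) in older code as well, to throw away
superfluous input. Be careful with that: the C standard only defines fflush()
for output streams, so fflush(stdin) works with some compilers, but not with
others. A portable way to discard the rest of an input line is to keep calling
getchar() until it returns '\n' or EOF.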

The last function we get to meet is printf(). This is a function to print
formatted text (hence the ‘f’) onto stdout. It is quite an interesting
function, because you can pass as many arguments as you wish. It has an
ellipsis (...) in its prototype, to make this possible. I will give an example
of how to use them later.
printf() takes a format string as its first argument, which defines what to
do with the next arguments, or more specifically, how to format them.
It uses a %, followed by some formatting rules, at the place where you want
the respective argument to be printed.
In our case we will see %s, which will print the argument as a string.
So for example, you could do this:

char name[] = "Jopie";

printf("Hello %s!", name);

This will result in:

Hello Jopie!

We will also see %d, which will print a signed decimal number. For unsigned
numbers, there is also %u.

And lastly we will see %08X… This formatting is a bit more advanced.
The 0 indicates that we want the number padded with leading 0s. The 8 is the
minimum width, so we get at least 8 digits in total. And the X tells printf()
to print the argument, which should be an unsigned int, as a hexadecimal
number, with uppercase letters. There is also %x for lowercase letters, but I
prefer uppercase.
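
If you want to check what a format string produces without looking at the screen, the C99 function snprintf() accepts the same format strings as printf(), but writes the result into a char array. It is not used elsewhere in this tutorial, and formatDemo is a hypothetical name. A small sketch:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Formats a few values with snprintf() and checks the resulting text */
void formatDemo(void)
{
    char buf[32];

    snprintf(buf, sizeof(buf), "%08X", 255u);          /* zero-padded, uppercase */
    assert(strcmp(buf, "000000FF") == 0);

    snprintf(buf, sizeof(buf), "%08x", 48879u);        /* same, lowercase */
    assert(strcmp(buf, "0000beef") == 0);

    snprintf(buf, sizeof(buf), "%d %u", -5, 5u);       /* signed and unsigned */
    assert(strcmp(buf, "-5 5") == 0);

    snprintf(buf, sizeof(buf), "Hello %s!", "Jopie");  /* string */
    assert(strcmp(buf, "Hello Jopie!") == 0);
}
```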

There are a lot more options with printf(), but it would be a bit much to
cover them all here. I suggest you use your reference for that instead.
I will leave you with one last example, then we go to the actual program:

unsigned int age = 16;
char name[] = "Jopie";

printf("Congratulations %s, today is your %uth birthday!", name, age);

Result of this would be:

Congratulations Jopie, today is your 16th birthday!

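Note also that puts() always prints a newline after the string, while
printf() does not. So in some cases, you might want to use printf() instead of
puts() to avoid that newline. Be careful though: if your string contains %,
printf() will see this as formatting. To print a literal %, write %% in the
format string. And never pass text that you did not write yourself (such as
user input) as the format string; use printf("%s", text) instead. Another
workaround for a single % is using %c, which prints 1 character: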

printf("I can use %c in my strings with printf()", '%');

Anyway, here is the program:

#include <stdio.h>
#include <stdlib.h>   /* atoi() */
#include <string.h>   /* strchr() */

char name[32];
int x = 1, y = 1;

/* Read one line from stdin into buf, and remove the newline. If the line is
   too long for buf, the rest of it is discarded. This replaces gets(), which
   cannot check the buffer size, and fflush(stdin), which is not portable.
   Returns buf, or NULL if there is no more input. */
char *readLine(char *buf, int size)
{
    char *newline;
    int c;

    if (fgets(buf, size, stdin) == NULL)
        return NULL;

    newline = strchr(buf, '\n');
    if (newline != NULL)
    {
        *newline = '\0';
    }
    else
    {
        /* The line did not fit: throw away the rest of it */
        while ((c = getchar()) != '\n' && c != EOF)
            ;
    }

    return buf;
}

/* Print menu, allow the user to choose an option, and process that option.
   Returns 0 if user chose to quit, else 1 */
int doMenu(void)
{
    char buffer[32];

    printf( "\nHello %s, here is the menu for today:\n\n"
            "1) Enter X\n"
            "2) Enter Y\n"
            "3) X + Y\n"
            "4) X - Y\n"
            "5) X * Y\n"
            "6) X / Y\n"
            "7) X & Y\n"
            "8) X | Y\n"
            "9) X ^ Y\n"
            "A) ~X\n"
            "B) ~Y\n"
            "C) Quit\n\n"
            "Current X = %d\n"
            "Current Y = %d\n\n"
            "What is your choice? ",
            name, x, y);

    /* Make sure the prompt is shown before we wait for input */
    fflush(stdout);

    /* Read the whole line the user typed; its first character is the
       choice. If there is no more input, quit. */
    if (readLine(buffer, sizeof(buffer)) == NULL)
        return 0;

    /* Process the command */
    switch(buffer[0])
    {
        case '1':
            puts("Please enter new value for X.");
            if (readLine(buffer, sizeof(buffer)) != NULL)
                x = atoi(buffer);
            break;
        case '2':
            puts("Please enter new value for Y.");
            if (readLine(buffer, sizeof(buffer)) != NULL)
                y = atoi(buffer);
            break;
        case '3':
            printf("%d + %d = %d\n", x, y, x + y);
            break;
        case '4':
            printf("%d - %d = %d\n", x, y, x - y);
            break;
        case '5':
            printf("%d * %d = %d\n", x, y, x * y);
            break;
        case '6':
            if(y == 0)
                puts("Y = 0. Division is not defined.");
            else
                printf("%d / %d = %d\n", x, y, x / y);
            break;

        /* The following are bitwise operations, let's also print out
           the hexadecimal representations for clarity. %X expects an
           unsigned int, hence the casts. */
        case '7':
            printf("%d & %d = %d || "
                   "%08X & %08X = %08X\n",
                   x, y, x & y,
                   (unsigned)x, (unsigned)y, (unsigned)(x & y));
            break;
        case '8':
            printf("%d | %d = %d || "
                   "%08X | %08X = %08X\n",
                   x, y, x | y,
                   (unsigned)x, (unsigned)y, (unsigned)(x | y));
            break;
        case '9':
            printf("%d ^ %d = %d || "
                   "%08X ^ %08X = %08X\n",
                   x, y, x ^ y,
                   (unsigned)x, (unsigned)y, (unsigned)(x ^ y));
            break;
        case 'a':
        case 'A':
            printf("~%d = %d || "
                   "~%08X = %08X\n",
                   x, ~x,
                   (unsigned)x, (unsigned)~x);
            break;
        case 'b':
        case 'B':
            printf("~%d = %d || "
                   "~%08X = %08X\n",
                   y, ~y,
                   (unsigned)y, (unsigned)~y);
            break;
        case 'c':
        case 'C':
            /* The user wants to leave, we return 0, to break out of the
               while() loop in main() */
            puts("Thanks, and have a nice day.");
            return 0;
        default:
            /* Since we have arrived here, apparently none of the valid
               choices were reached */
            puts("Invalid choice.");
            break;
    }

    printf("Press enter to continue.");
    fflush(stdout);
    readLine(buffer, sizeof(buffer));

    return 1;
}

int main(void)
{
    /* Input the username */
    puts("What is your name?");
    if (readLine(name, sizeof(name)) == NULL)
        return 0;

    /* Keep returning to menu until user chooses to quit */
    while(doMenu());

    return 0;
}

This might be a good time to greet some people.
First of all, of course my Diamond Crew people, ewald and Maybird.
And of course also all my Phrozen Crew mates, you know who you are 🙂
And the #Win32ASM guys, hutch, llama, Iczelion, nuu, dowap (or whatever :),
and the rest.
Of course Kalms, and last but not least, the ladies (in no particular order,
you know how women are :).
Sara`, Tracy, blorght, MoonDawn, Nitallica, CandyII, jessca, embla, Baudie,
flipgrrrl, and yes, even taylor^ 🙂

X-Calibre

This entry was posted in Software development.
