[Masters of the Void]
8. Lists of Stuff

Confused about pointers?

Pointers are a topic many people have problems with, so don't feel bad if you didn't understand them right away. If you still don't quite understand how pointers work, you can try to follow along anyway. This chapter may be incomprehensible to you, but this is as deep as we'll go into pointers, so from then on you should be able to wiggle along.

However, pointers are a very important and handy tool for your programs, so you may want to re-read chapter six in particular, and any other tutorials you can find, and maybe it'll all come to you eventually. If it doesn't, try just using them. You'll fall on your nose a lot of times, but practice makes perfect, and very likely it will just 'click' one day.

Our friend the array, array, array...

Most of your programs will eventually need to keep a list of one thing or other. For this purpose, C provides the array. To create an array, you simply write:

int    myIntArray[20];
This will give you an array that's good to hold 20 ints. To get at one of these ints, you simply write the name of the variable holding the array, with the number of the element you want in square brackets behind it, e.g.:
int  elementSix;
elementSix = myIntArray[5];
You can probably guess how you assign a value to any of these entries:
myIntArray[5] = 42;
This will assign the number 42 to the sixth element in our list. Sixth? Yes. Because C counts its arrays starting with zero. So, the first element is myIntArray[0], the second is myIntArray[1], the third myIntArray[2], the 20th is myIntArray[19]. The element myIntArray[20] does not exist. But note that C does not check whether the number you give it for an array is in the proper range. So, if you write
myIntArray[20] = 700;
then in our example you will simply write that 700 somewhere into RAM. Why that? Well, a C array is kept in RAM simply as a bunch of ints, one after the other. So, if you write int myIntArray[20]; to create an array, the memory the compiler creates looks like this:

20 boxes on checkered paper, labeled 'item 0' through 'item 19'.

If this is news to you, you may want to re-read Book 5. You'll also know that the memory address of item 0 in our list is 4, that of item 1 8, that of item 2 12 ... and C only needs to remember the address of item 0 and can calculate the memory address of each array element with the simple formula:

itemAddress = addressOfItemZero + (4 * itemNumber); // 4 is the size of a single int.
Notice something? With addressOfItemZero being 4 (like in our example), the calculation for item 0 is:
itemAddress = addressOfItemZero + (4 * 0);
and since 4 * 0 is 0, this is the same as addressOfItemZero. So that is why C arrays always count from 0! Now, you probably noticed that I marked up a couple of squares right after the end of our array as "item 20". This is the spot where C will put your number when you refer to item 20, because its address would be 4 + 4 * 20 == 84.

In this example, if you assigned 700 to item 20, this would simply mean that you're running off the end of your array. Your program would probably continue working just fine. But the trouble is that, since you only requested memory (the blue boxes) for 20 items, C will think that the spot right after item 19 was free. If you declare two arrays one after the other, e.g.:

int    myIntArray[20];
int    otherArray[2];
they would probably look like the following:

two arrays stored sequentially. The second array's first item is marked red.

Notice how C decided to put the first item of otherArray (marked red) immediately after item 19 of myIntArray? Exactly at address 84, where item 20 would be. So, if you now assign 700 to item 20 of myIntArray, you'll overwrite item 0 of otherArray. Ouch.

"Okay, so it'll display the wrong number to the user, what's the problem?" You may think. But what if the variable immediately following your array wasn't an array of ints, but rather one of characters? You would overwrite the first four characters of the user's novel with garbage bytes.

Or worse, the data at address 84 might not even be data. It's perfectly possible that the machine code that implements one of your functions is occupying that memory. And with just the right amount of bad luck, you could overwrite the code to save your document with bytes that are equivalent to the code that formats your hard disk. And worse: You wouldn't even notice it when you do this. The user may not notice until four hours later when he saves his finished novel.

Overwriting memory is an error that can be very hard to track down, and can come in many disguises. So, when you get an odd error message or your data somehow becomes corrupted, or a built-in system command like printf() suddenly crashes when you feed it perfectly valid data, it's usually an indicator that you screwed up some memory "earlier during the day".

Changing an array's size

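Now, having 20 items is all nice and fine, but you rarely have a fixed number of items in a list. Trouble is, standard C doesn't let you resize an array. In classic C (C89), you can't even pick the size while the program is running, so you can't write stuff like:

int    numItems;
printf( "Please tell me how many items you will need at most:" );
scanf( "%d", &numItems );
fpurge( stdin );    // fpurge() is a BSD/Mac OS X extension, not standard C.
int    myIntArray[numItems]; // *** This line won't work in C89. C99 compilers may accept it as a "variable-length array", but even that can't be resized later. ***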

But don't despair: since an array is simply a chunk of memory large enough to hold the number of things you want, there's an alternative solution: You can ask C for a chunk of memory and just pretend it was an array. To do that, you use malloc(). malloc() allocates memory (hence the name) of a certain size and then hands a pointer to it back to you:

#include <stdlib.h>

int main()
{
    int*    myArray = malloc( sizeof(int) * 20 );    // could write int myArray[20]; instead.

    myArray[5] = 42;

    return 0;
}
There are a few things I'll need to explain in the code above: The first thing is probably the malloc part. That's a function declared in the file stdlib.h as something like:
void*    malloc( size_t numberOfBytes );    // size_t is the unsigned type C uses for byte counts.
Note the asterisk after void. I told you void meant that the function returns nothing. But how can this function return a pointer to nothing? Well, actually void doesn't mean nothing, it means ignore me or unknown. So, while a function returning void returns nothing, a function returning a void* (void-pointer) returns a pointer to an unknown chunk of data.

As you know, C is very picky about having its data types match. If malloc was defined as:

char* malloc( int numberOfBytes );
and we wrote:
char* myCharArray = malloc( 100 );
int*  myIntArray = malloc( 50 );
C would complain that you're trying to assign a char* to a variable with the differing type int* on line 2. If we changed malloc to return an int*, it would complain about the char* on line 1. The only solution we know so far is that we could have several versions of malloc(), one for each data type, i.e. malloc_int(), malloc_char() etc... That would be a lot of work though. That's why C has a void* (read: void-pointer). When you use void*, C knows that this is simply a chunk of data and it will let you turn it into any other kind of pointer as you please.

Essentially, C treats void* as a special case: it just trusts you and pretends the void* was, say, an int pointer. But note that C doesn't do any conversion. It is your job to make sure that e.g. the memory an int* points to is large enough to hold an int. C will happily compile nonsense like:

int* myArray = malloc(1);
myArray[0] = 700;
which, assuming an int is 4 bytes, will overwrite the three bytes following the single byte you requested and may well cause a nice crash.
Caution! Note that C does not necessarily clear the memory that you get from malloc(). There may be arbitrary garbage numbers in your array if some app used this memory range before you did, and you are responsible for giving your array elements initial values that make sense. That said, some operating systems do clear any memory they give you for security reasons, so don't be surprised if one computer gives you nice and tidy empty arrays and the other doesn't. Heck, for all you know, the "garbage" data you're getting may just be a bunch of zeroes by coincidence.

But now let's get back to our sample code that is supposed to do array resizing:

int*    myArray = malloc( sizeof(int) * 20 );    // could write int myArray[20]; instead.

The sizeof operator in the sample code (it looks like a function call, but it is built into the language and worked out by the compiler) simply gives you the number of bytes you need to store a particular type. Because, while in our examples I usually pretended an int was 4 bytes in size (or 32 bits), on older computers like the original Macintosh with its 68000 CPU, they were actually only 2 bytes long (or 16 bits), and on some 64-bit systems, they can even be 8 bytes long (or 64 bits). So, we use sizeof here to let the compiler tell us how large its ints are, instead of risking a wrong guess and a crash.

Now, the sample code above only creates a 20-element array. However, since you're calling a function to create it and calculating the size for the memory, instead of 20 you could just have a variable in the calculation:

int* myArray = malloc( sizeof(int) * numItems );
The astute observer will notice that I'm not resizing anything here. But we have a dynamic array. If you need to resize an array, create a new one with the new size, copy all your array elements over and then get rid of the old array. How do you get rid of the old one?
free( myArray );   // free is defined as "void free( void* pointerToBeFreed );"
It's as easy as that. free() simply tells the operating system that you don't need the memory anymore and that it can reuse it when another part of your program asks for some memory. Once you have called free(), you mustn't use the memory the pointer points to anymore:
char*    myText = malloc( 100 * sizeof(char) );
myText[0] = 'A';    // Fine.
myText[1] = 'B';    // Perfectly valid.
free( myText );
myText[2] = 'C';    // No! This will go BANG!
Just like in our earlier example where we ran off the end of an array, we're stomping on memory that doesn't belong to us anymore. Especially on today's computers, where the operating system may be running several programs at once, you don't know whether your memory hasn't been reused already by the time you assign 'C'. And anyway, why tell the operating system you're done using the memory pointed to by myText and then use it anyway? Compulsive liars aren't popular, particularly in operating system circles.

So, is resizing an array really that hard? Do I really have to move everything over? Well, the computer has to do it anyway, but there's a shorthand for this. It's a function called realloc(). realloc() is defined as:

void* realloc( void* originalPointer, size_t newSize );    // size_t is the unsigned type C uses for byte counts.
and you use it like:
char*    myText = malloc( 2 * sizeof(char) );
char*    myLargerText = 0;

myText[0] = 'A';    // Fine.
myText[1] = 'B';    // Perfectly valid.

// . . . some more stuff happens here . . .

// We need to store a third char? Make it larger!
myLargerText = realloc( myText, 3 * sizeof(char) );
if( myLargerText != 0 )    // Success! myText has been freed, myLargerText contains our new pointer.
{
    myLargerText[2] = 'C';

    // . . . do something useful with our text here . . .

    free( myLargerText );    // OK, we're done.
    // realloc already freed myText, no need to dispose of that.
}
else
    free( myText );    // realloc failed, so myText is still valid and still ours to free.
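realloc() simply takes a pointer to some memory and, conceptually, creates a new chunk of memory of the requested size, copies over all the data, free()s the old memory, and then returns a pointer to the new chunk. (If there happens to be room, it may instead just grow the existing chunk in place and hand you back the same pointer, so always use the pointer it returns.) If there is not enough memory for that, it returns 0 (zero) and doesn't dispose of the old memory.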

Of course, you can also use realloc() to make your array smaller. In that case, items at the end of your array will simply be cut off. So, you'll still have to write your own code to move around data if you want to insert or delete array items in the middle.

Further Information: While the operating system will make your memory block smaller, it will not clear the memory after it. So, if you make an array smaller and then access an item beyond the end of it, you may very well still get the data you last wrote there. This is one of the things that makes memory bugs so hard to track down.
