Uninitialized pointers and variables: Memory corruption and crashes.

The other day I was bored and browsing a programming site, and someone asked why C does not auto-initialize non-static pointer variables. While that might be nice for the novice, the way it is has both pros and cons. The fact is, C is closer to being a low-level language than a high-level language. It gives the programmer power, and with power comes responsibility. When you are irresponsible with programming, it is inevitable that you will cause issues. And no one who has programmed for any length of time is completely innocent. In fact, they say it takes 10 or more years to fully master C. Programming involves risks. Period.

Not only that, what if you want an array of strings? In C that would be char* stringArr[] or, put another way, char** stringArr. The compiler cannot come up with some random number and then initialize that many elements. Sure, it could initialize stringArr to NULL, but if it were to try to allocate more it’d be in trouble (how many elements do you need?). That’s one reason C does not initialize pointers for you.

Back to the discussion on the site, though. What shocked me was what one person said. Well, two things he said: first a remark, then a reply to someone who suggested he was wrong. More bizarre was that no one corrected him again, and his suggestion was VERY incorrect. I will not name the person or the user name; my goal is not to ridicule anyone but to educate others about what memory corruption can cause, and why it is very bad to access unknown memory, invalid memory, memory past an array’s bounds, and uninitialized variables.

“imho any attempt to access the value of an uninitialized variable is a bug. it doesn’t matter whether it’s zero or random, you shouldn’t be trying to read it if you didn’t explicitly set it to something”

First, there is something wrong here already. Yes, he’s right that you should not access a variable that is uninitialized. However, how do you know whether it’s uninitialized or not? With regular variables, you’ll likely get unreliable results, and unless you’re expecting a specific set of answers you cannot tell (and if it’s user input, for example, then you have even more things to track down). But if we’re talking about pointers, and thus a location in memory (and we are), then it’s very simple: you check whether it’s valid. Now, how do you check whether it’s valid? You surely cannot guess what the random value is, so you have one choice: check for NULL. And checking for NULL is not a bug but a sanity check, and a very important one. (It only helps, of course, if the pointer was initialized in the first place: if it is a non-static variable that you never set, and you foolishly ignore the warnings your compiler gives you when it can, the check tells you nothing.) We’ll get to what happens when a pointer points to a random location, which has serious consequences whether that location is part of your program or not. In fact, you would be lucky if it is NOT part of your program’s memory space right then and there.

Someone responded to him saying it is much easier and better if it’s NULL.

He’s right. In fact, if it IS NULL, that typically means, I might add, that it has been initialized: either by being static or because you initialized it yourself (or by luck)!

But the first person responds with the following:

“hmm it depends on your perspective. Personally I would think that garbage was a more clear indicator that I forgot to set the variable, rather than NULL which I might have done on purpose…”

Ouch! First, how do you know whether it’s garbage or not? It could be any number of things, and if your so-called check is incorrect and you then dereference the pointer, well, you’re doing something wrong. Even with primitive types, you’ll get unreliable results at best. With pointers, it’s disastrous!

  1. Primitive variables: If you read the variable, or use it in some way, the results will be unreliable at best. IF you get the expected result and the variable was uninitialized, then it was nothing but LUCK. It is still broken, and that luck is better described as bad luck, because you won’t know you have a problem and so won’t go looking for its source.
  2. Pointers: The point of a pointer is to point to a memory location where a variable (of the type the pointer points to) lives.

Below is an example of what a pointer does. Pointers are very useful for avoiding copies of large data structures when passing them to functions, for passing functions to functions, and as a way to let a function modify data: it dereferences the pointer and sets whatever it points to to the value you specify.

 

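#include <stdio.h>

int main(void) {
    int number;              /* deliberately left uninitialized */
    int* numPtr = &number;   /* numPtr holds the address of number */

    /* Both arguments read the same indeterminate value. */
    printf("%d %d\n", number, *numPtr);
    return 0;
}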

Here is a rather simple example. It’s also rather broken, in the sense that the variable number is not initialized. So here we print an unpredictable value twice: once via number itself and once by dereferencing numPtr (which points to number). Because both reads go to the same memory, you will normally see the same garbage value both times; what that value is, is anyone’s guess. But the idea of the example is to show how basic pointers work (it can get far more complicated).

Now, to show how a bad address will affect the program, here is an example that will almost assuredly crash: we will cast an address and assign it to a pointer of type char*, then try to print what it points to.

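#include <stdio.h>

int main(void) {
    char* string = (char *)0xffff;   /* an arbitrary address, almost certainly not ours */

    printf("%s", string);            /* reading through it is expected to segfault */
    return 0;
}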

First, an explanation of the first line inside main:

We start by declaring string as a pointer to a char. The other part points it at 0xFFFF. This may seem odd to anyone who has not seen it before, and it may make some wonder why I’m doing such an odd thing. 0xFFFF is an address. The prefix 0x means the number is in hexadecimal; we could have written it in octal or even decimal as well, but addresses are generally written in hexadecimal. Lastly, the cast you see, (char *), converts the number to the type we are assigning it to: a pointer to char (hence char *). This way, we can point a char* at an address. Casting of this form is often used for direct I/O with a fixed address (say, in a device driver). But in this case, we’re doing it merely to make the pointer point to a location that is outside our own address space!

When we try to read what is there (or indeed modify it), we will cause a segmentation fault. As a general rule, when you access memory outside of the program’s address space, you will cause this, and if your limits are set up to allow it, a file called a core will be dropped in the program’s current working directory. This core (often called a core dump too) is a snapshot of the program’s memory, including its stack, as it was when the program dumped core. That means you can, for example, get the values of variables from it. This is used to help debug crashes. But if you actually overwrite data inside your own address space, the crash may come long after the damage was done, and the stack trace may be garbage. This is why it’s better to have a memory access error of this sort cause a crash immediately!

About the last point: let’s assume your address space is fairly large. What will happen if you write past an array’s bounds? You will likely overwrite a neighbouring block of memory! This is the ugly situation I described above. Essentially, if you corrupt your memory, you will get invalid results at best, and eventually a crash. And when the crash does happen, you will have only a dim idea of where or what caused it because, generally, the program is well past that point in the code. On top of that, the stack may be completely trashed, and you will see nothing except ?’s and other garbage in the backtrace. There’s an article out there (or used to be) about possibly recovering from such a stack, but rest assured it’s not nearly as easy as when the stack is intact!

The poster’s idea that garbage is a better indication of an uninitialized variable is, then, a terrible mistake. In summary:

  • Accessing or assigning something to an address outside of your memory will typically cause an immediate crash.
  • Overwriting an array’s boundaries has such unpredictable results that trying to demonstrate it with various programs and get the same results each time is ridiculous: the results are random by nature!

With array index errors, you will usually modify your own memory (unless you go WAY out of bounds). This can cause strange, hard-to-track-down issues, including:

  • Invalid pointers (freeing the memory allocated for such a pointer may then act on invalid locations, thereby corrupting more memory and, if nothing else, giving incorrect results).
  • Variables may have unexpected content.
  • Strings may be missing the terminating character, which allows operations to run off the end of the string, possibly trashing more memory.
  • You may find terminating characters in a neighbouring memory block.
  • You will eventually crash and discover the program’s stack is trashed.

Similar results can be found with bad pointers (never allocated, or freed and then accessed) and, in general, with anything that is not predictable. In other words, you should always initialize your variables before you use them, and you should not rely on some random value to determine whether a variable is valid.