2015/03/07: Reword a few things, add a footnote (about directly accessing a hardware location without a variable) and change formatting of source code a bit:
No background/foreground colouring but instead simply text style; change case of some variables (because I hate camel case and in general variables are lower case, as far as I am concerned); make pointer declarations more correct, i.e. the C way instead of the C++ way (which I was trying to adopt because I was trying to be okay with C++ too, something I mostly failed at): the variable is the pointer to the type, which means the type itself isn’t a pointer, which means the * should be at the variable; fixed a description that was incorrect (a blunder at the time of writing) at worst and worded poorly at best.
The other day I was bored and just looking at a programming site, and someone questioned why C does not auto-initialize non-static pointer variables. While auto-initialization might be nice for the novice, the way it is has both pros and cons. But the fact is, C is closer to being a low-level language than a high-level language. It gives the programmer power, and with power comes responsibility. When you are irresponsible with programming it is inevitable that you will cause issues. And no one who has programmed for any length of time is completely innocent. In fact, they say it takes 10 or more years to fully master C. Programming involves risks. Period. Not only that, what if you want an array of strings? In C that would be char *stringArr[] or, put another way, char **stringArr. The compiler cannot come up with some random number and then initialize that many elements. Sure, it could initialize stringArr to NULL, but if it were to try to allocate more it’d be in trouble (how many elements do you need?). That’s one reason it does not initialize pointers for you.
Back to the discussion on the site though. What shocked me is what someone said. Well, two things they said: initially a remark, then a reply to someone who suggested they were wrong. More bizarre was that no one corrected him again, and it was a VERY incorrect suggestion on this person’s part. I will not name the person or user name; my goal is not to ridicule anyone but instead to educate others about what memory corruption can cause and why it is very bad to access unknown memory, invalid memory, memory past an array’s bounds, and uninitialized variables.
“imho any attempt to access the value of an uninitialized variable is a bug. it doesn’t matter whether it’s zero or random, you shouldn’t be trying to read it if you didn’t explicitly set it to something”
First, there is something wrong here already. Yes, he’s right that you should not access a variable that is uninitialized. However, how do you know if it’s uninitialized or not? With regular variables, you’ll likely get unreliable results, and unless you’re expecting a particular set of answers you cannot tell (and if it’s user input, for example, then you have even more things to track down). But if we’re talking about pointers and thus a location in memory – and we are – then it’s very simple: you check if it’s valid. Now, how do you check if it’s valid? You surely cannot guess what the random value is, so you have one choice: check for NULL. And checking for NULL is not a bug but a sanity check, and a very important one (unless you actually did not initialize it, it’s not a static variable, and you foolishly ignore any warnings your compiler is able to give you). We’ll get to what happens when a pointer is pointing to a random location – which has serious consequences whether that location is part of your program or not. In fact, you would be lucky if it is NOT part of your program’s memory space, because then you find out right then and there.
Someone responded to him saying it is much easier and better if it’s NULL.
He’s right. In fact, if it IS NULL, then it typically means, I might add, that it has been initialized: either by being static or, yes, because you initialized it yourself (or it is NULL by sheer bad luck, which masks the error)!
But the first person responds with the following:
“hmm it depends on your perspective. Personally I would think that garbage was a more clear indicator that I forgot to set the variable, rather than NULL which I might have done on purpose…”
Ouch! First, how do you know if it’s garbage or not? It could be any number of things, and if your so-called check is incorrect and you then dereference it (keep in mind that testing a pointer for validity only evaluates to 0 if it is pointing nowhere, i.e. is NULL, in the first place!), well, you’re doing something wrong. Even with primitive types, you’ll get unreliable results at best. With pointers, it’s disastrous!
- Primitive variables: If you try to read the variable, or use it in some way, the results will be unreliable at best. IF you get the expected result while the variable was uninitialized, then it was nothing but LUCK. It is still broken, and that luck would be better described as bad luck, because you won’t know you have a problem, let alone find its source.
- Pointers: The point of a pointer is to point to a memory location where a variable (having the same type the pointer has) is.
Below is an example of what a pointer does. Pointers are very useful to avoid copying large data structures when passing them to functions, to pass functions to functions, and as a way to have a function modify data by dereferencing the pointer and setting what it points to, to the value you specify.
int number; /* deliberately left uninitialized */
int *nptr = &number;
printf("%d %d\n", number, *nptr);
Here is a rather simple example. It’s also rather broken, in the sense that the variable number is not initialized. So here we will get an unpredictable number twice: once by reading number directly and once by dereferencing nptr (which points to number). An integer is a normal variable, so when uninitialized it typically just holds whatever value happened to be in its memory. But the idea of the example is to show how basic pointers work (it can get far more complicated).
Now, to show how a bad address will affect the program, an example that will almost assuredly crash: we will cast and assign an address to a pointer to char and then try to print it.
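char *string = (char *)0xffff; /* almost certainly not an address we own */
printf("%s\n", string); /* reading through it will most likely crash */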
First, an explanation of the first line inside main:
We start by declaring string as a pointer to char. The other part points it at 0xFFFF. This may seem odd to anyone who has not seen it before, and it may make some wonder why I’m doing such an odd thing. 0xFFFF is an address. The prefix 0x means the number is in hex; we could have written it in octal or even decimal as well, but addresses are generally referred to in hexadecimal. Lastly, the cast you see, (char *), means cast the value to the type we are assigning it to – a pointer to char (hence char *). This way, we can point a char * at an address. Casting of this form is often used for direct I/O with an address (say, in a device driver). But in this case, we’re doing it merely to make the pointer point to a location that is (almost certainly) outside our own address space! When we try to read what is there (or indeed modify it) we will cause a segmentation fault.
As a general rule, when you access memory outside of the program’s address space, you will cause this, and if your limits are set up to allow it, a file called a core will be dropped, typically in the program’s current working directory. This core (often called a core dump too) is a dump of the program’s memory, including its stack, as it was when the program dumped core, so you can, for example, get the values of variables from it. This is used to help debug crashes – but if you actually overwrite data inside your address space, the crash may come long after the damage was done and the stack trace may actually be garbage. This is why it’s better to have a memory access error of this sort cause a crash immediately!
About the last point: let’s assume your address space is fairly large. What will happen if you write past an array’s bounds? You will likely overwrite a neighbouring block of memory! This is the ugly situation I described above. Essentially, if you corrupt your memory, you will get invalid results at best and, eventually, a crash. And when the crash does happen, you will have only a dim idea of where or what caused it, as generally the program is well past that point of the code. On top of that, the stack may be completely trashed and you will see nothing except ?’s and other garbage. There’s an article out there (or used to be) about possibly recovering from such a stack, but rest assured it’s not nearly as easy as if the stack were correct!
The poster’s idea that garbage is a better indication of an uninitialized variable is, then, a terrible mistake. In summary:
- Accessing or assigning something to an address outside of your memory will typically cause an immediate crash.
- Overwriting an array’s boundaries has such unpredictable results that trying to demonstrate it with various programs and get the same results each time is ridiculous – the results are random by nature!
With array index errors, you usually will modify your own memory (unless you go WAY out of bounds). This can cause strange, hard-to-track-down issues, including:
- Invalid pointers (freeing the memory allocated for such a pointer may make the allocator act on invalid locations, thereby corrupting more memory and, if nothing else, giving incorrect results)
- Variables may have unexpected content.
- Strings may be missing the terminating character, which allows operations to run off the end of the string, possibly trashing more memory.
- You may find terminating characters in a neighbouring memory block.
- You will eventually crash and discover the program’s stack is trashed.
Similar results can be found with bad pointers (not allocated, or you free the allocation and then try to access it) and in general anything that is not predictable. In other words, you should always initialize your variables before you use them, and you should not rely on some random value to determine if a variable is valid.
A better example of indirection on a location in memory without going through a pointer variable, and by better I mean more correct, is this: how do you store a value (say an int) at a certain hardware address? Can you? Should you? You can, but you have to use a cast. You would only do it in a device driver or otherwise outside of userspace (it might be that you can’t always do it in kernel space either, but I’m not certain on that); in normal conditions you shouldn’t, and if you do you will most likely crash the program. Put another way, if you have an int x and it is at location 1024, then can’t you store 255 at 1024 directly? (How do you know it is at 1024? That is the question, and that is why you can’t do it without a cast; indeed, indirection is allowed only on expressions that evaluate to a pointer of some type.) Not without a cast you can’t. So this is how you would do it:
*(int *)1024 = 255;
.. which casts 1024 to a pointer to an int (int *) and then assigns the value 255 through indirection (on the pointer).