Programming

Computer programming is both an art and a science. On one hand, you have design decisions to make. On the other, it comes down to pure logic and decision making based on many different outcomes. It is not something everyone would want to do or find exciting or fun. However, it’s a part of computing. Indeed, if you did not have a computer program, or some interpreter (which was itself programmed), you would have to write one to make a computer do anything. This is what programming is: telling something (in this case a machine) to do X, Y or Z, and when.

Finding C Style typecasts During C++ Compiling

(As of 2013/05/11 there is a quick addendum at the end of the post)

This is a very quick post because it’s a very simple update to two posts from quite some time ago. I had discussed how to find C style casts in C++ source code, and at the time of the first post I was thinking only of casts to pointers. Then I made another post to note that fact and to show a possible way to find all of them.

Well, there is a much better way: with the g++ compiler, you let it show you. How do you do it? Pass this option to g++:

-Wold-style-cast

and recompile all object files (if you use make you may very well have a make target called ‘clean’ then you could just run ‘make clean’ and then ‘make’).

g++ will then show you where every old style cast is used in the project’s source tree (that it encounters). Now it’s simply a matter of determining which new cast type you should be using for the specific cast, fix and recompile again. It’s that simple.

Addendum: Okay, to be truthful it is not entirely ‘that simple’. Yes, this is how to find the old style casts in YOUR source code, but it should be noted that some macros that go along with system calls are implemented as #defines that themselves use old style casts. There’s nothing you can (or should) do about those. Examples that come to mind are:

FD_* macros used for the select(2) system call.

WEXITSTATUS (and most likely the other related) macros for the wait(2) system call.

That all said the option is useful to use at times if you want to be sure your own source code does not use old style type casts.
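As a small sketch of what the warning catches and what a fix might look like (the function names here are made up for illustration), the first function below uses an old style cast and the second uses the C++ replacement. Compiling the file with g++ -Wold-style-cast should flag only the first one:

```cpp
#include <cassert>

// Hypothetical helper: the old style cast below is what -Wold-style-cast reports.
int truncate_old_style(double d)
{
    return (int)d;
}

// The same conversion written as a C++ cast; -Wold-style-cast stays quiet here.
int truncate_new_style(double d)
{
    return static_cast<int>(d);
}

// Both versions truncate toward zero, so the fix does not change behaviour.
void check_truncate()
{
    assert(truncate_old_style(3.7) == 3);
    assert(truncate_new_style(3.7) == 3);
    assert(truncate_new_style(-3.7) == -3);
}
```

Which new cast to pick depends on the conversion; for a plain numeric conversion like this one, static_cast is the usual choice.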

 

The Enumeration Casting Problem

The C and C++ keyword enum is a useful feature of the language(s). What it gives you is, to be blunt, an enumeration: a group of related, named constants. For example, you might have an enum of colors, something like:

enum Color {
RED,
GREEN,
BLUE
};

By default, the first value is 0 and each following value is the previous value + 1. However, there are some variations. First, you can tell the compiler the value of specific constants. For instance, if you want RED to start at 1, then you would do:

enum Color {
RED = 1,
GREEN,
BLUE
};

You can also declare something to be of type Color (after the enum Color has been defined; though see later about enum classes [feature of C++11]). There’s a common issue though, which I’ll get to in a bit. Firstly, know that there is also the anonymous enum. That generally means something without a name (clever name, I’m sure). What does it look like?

enum {
RED,
GREEN,
BLUE
};

And that concept is useful (though not required) in solving the issue I referred to. More on that later.

Now, in all the above cases, you can refer to a constant simply by its name, e.g., GREEN. The problem, however, comes when you have another enum that could also use a name like GREEN. That may not appear to be an issue, but it is. The reason is simple: a variable may be of type Color, but the enumerators are not in a scope called Color; they are placed in the scope that surrounds the enum. That is to say, it is very much like having two variables with the same name, for instance an unsigned int called ‘i’ and a long int called ‘i’ in the same scope. A name cannot be declared more than once in the same scope (you will get a redeclaration error).

So, C++11 added enum classes (and, equivalently, enum structs). Recall that for ordinary types the members of a struct are public by default and those of a class are private; otherwise they are the same. For enums there is no such difference: the enumerators of an enum class and of an enum struct are equally accessible. So, how can you fix the name clashing?

You use enum classes! And with syntax that looks like inheritance for a class/struct, you can also name a type after a colon. For enums this is not really inheritance: it sets the underlying type, which has to be an integral type. That means that you can make an enum that is stored as a long long (as opposed to the default int).

Here’s how :

enum class Color : long long {
RED = 1,
GREEN,
BLUE
};

However, how do you access it? It’s somewhat like a ‘static’ member of a class. What that means is that you specify the scope, in this case the scope ‘Color’. So, for example, to make a Color variable with the value BLUE from the above enum, you would do this:

Color c = Color::BLUE;
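To make the scoping concrete, here is a minimal sketch with two enum classes that share names. Signal is a made-up second enum, added only to show that RED and GREEN no longer clash; the static_asserts check the underlying types discussed above.

```cpp
#include <cassert>
#include <type_traits>

enum class Color : long long {
    RED = 1,
    GREEN,
    BLUE
};

// Hypothetical second enum reusing the names RED and GREEN: no redeclaration error.
enum class Signal {
    RED,
    AMBER,
    GREEN
};

// The type after the colon is the underlying type; without one it is int.
static_assert(std::is_same<std::underlying_type<Color>::type, long long>::value,
              "Color is stored as long long");
static_assert(std::is_same<std::underlying_type<Signal>::type, int>::value,
              "Signal uses the default underlying type");

void check_scoped_enums()
{
    Color c = Color::BLUE;
    Signal s = Signal::GREEN;
    // Getting the number back out needs an explicit cast.
    assert(static_cast<long long>(c) == 3);
    assert(static_cast<int>(s) == 2);
}
```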

Now, back to the problem I referred to that is commonly encountered. One example is the switch statement, and in particular case labels. I won’t discuss that one in detail, but when you do encounter it, you can get around it the same way as I will explain. It will basically be a type conversion error. Sure, you can add a cast, but that seems so wrong in a more type strict language. After all, this is C++, and not C: you don’t cast nearly as much, and ideally you don’t cast at all (obviously, that isn’t always possible, say with socket handling via the BSD socket API, but it’s best to try to avoid it). Is there really any harm in it, for an enum? Not really for a basic enum, but it also isn’t necessary (in that case or with enum classes).

The specific problem I had, is for example, an enum sort of like this :

enum class Flags {
DEBUG,
LOG
};

Now, although I could prefix DEBUG and LOG with something, I think it makes more sense to have the intention fairly clear (and the names [not in this example] were relevant to multiple groups of flag types). So, if I had another enum, or something else with the name DEBUG or LOG, then I would have name clashes. So, my attempt to fix it (above) would allow me to specify Flags::DEBUG. However, when you make it a class, it is, well, a class. So, we’ll assume that we have a function called set_bit that takes one argument: an unsigned short. What happens if we try to pass Flags::LOG to it? It won’t work, because it’s not actually an unsigned short, and an enum class does not convert to an integer implicitly. For the same reason, Flags::LOG won’t work as a ‘case’ label when the switch is on an integer (a switch on a value of type Flags itself is fine). So, what can be done to work around this?
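Here is a rough sketch of that situation. The body of set_bit is my own stand-in (it just returns a mask with that bit set), since only its signature matters here. The commented-out call is the one that fails to compile, and the cast is the workaround I would rather avoid.

```cpp
#include <cassert>

enum class Flags {
    DEBUG,
    LOG
};

// Hypothetical stand-in for set_bit: returns a mask with the requested bit set.
unsigned long set_bit(unsigned short bit)
{
    return 1UL << bit;
}

// A switch whose expression is itself of type Flags compiles without casts.
const char* flag_name(Flags f)
{
    switch (f) {
    case Flags::DEBUG: return "DEBUG";
    case Flags::LOG:   return "LOG";
    }
    return "unknown";
}

void check_scoped_flags()
{
    // set_bit(Flags::LOG);   // error: no implicit conversion to unsigned short
    assert(set_bit(static_cast<unsigned short>(Flags::LOG)) == 2UL);
    assert(flag_name(Flags::DEBUG)[0] == 'D');
}
```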

Maybe there is a better way, and sure you could just have them be individual variables, but what I found as a nice way, is the following set up:

Use a namespace by the name Flags. Inside that namespace, you can put an anonymous enum (or a named one if you want) holding the constants. This will isolate the constants in namespace Flags.
Thus, you can now do :

namespace Flags {
enum {
DEBUG,
LOG
};
}

and indeed pass Flags::DEBUG or Flags::LOG to the set_bit function.
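A fuller sketch of the set up, under a few assumptions: set_bit again has a made-up body, and Levels is a hypothetical second group that reuses the name DEBUG to show the clash is gone. Because these are plain (unscoped) enums, they convert to integers without a cast and work as case labels in a switch on an integer.

```cpp
#include <cassert>

namespace Flags {
    enum {
        DEBUG,
        LOG
    };
}

// Hypothetical second group reusing the name DEBUG; the namespaces keep them apart.
namespace Levels {
    enum {
        DEBUG,
        INFO
    };
}

// Hypothetical stand-in for set_bit: returns a mask with the requested bit set.
unsigned long set_bit(unsigned short bit)
{
    return 1UL << bit;
}

// The enumerators are usable directly as case labels for an integer switch.
const char* flag_name(unsigned short bit)
{
    switch (bit) {
    case Flags::DEBUG: return "DEBUG";
    case Flags::LOG:   return "LOG";
    default:           return "unknown";
    }
}

void check_namespaced_flags()
{
    assert(set_bit(Flags::DEBUG) == 1UL);   // no cast needed
    assert(set_bit(Flags::LOG) == 2UL);
    assert(Levels::INFO == 1);
    assert(flag_name(Flags::LOG)[0] == 'L');
}
```

The trade-off is that you give up the extra type checking of an enum class in exchange for not needing casts.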

The only thing to be aware of is that namespaces have one feature that could be an issue, although ‘issue’ is meant loosely and it’s not hard to deal with. A namespace can be reopened in multiple files; that is, you can define an enum in the namespace in one file, and in another file do the same thing, so the names could clash again. If you have a risk of that, you can use the same idea once more: nest namespaces (e.g., put namespace somenamespace inside another namespace, and have the enum inside somenamespace).

The real benefit of this idea, though, is that it allows you to make more use out of enums (when what you’re doing would make use of enums), without having to use casts or worry about name clashes. And of course, you can also just use separate variables inside a namespace. It really is up to style and the actual reasons for needing the variables. In my case, I created a dynamically sized set of bits (read: like std::bitset, but without having to know the size at compile time; and no, it has nothing to do with a vector of bools) and I was tired of having code that looked like a file of #defines (even though I didn’t use #define and instead used consts, the fact is it was individual variables in a long list, each with a prefix). I didn’t like that. This is not C, and anyway C has const too nowadays. So, I wanted to clean things up, and in my case, this was quite useful.

C++11: std::unique_ptr, raw pointers, and containers

Last Updated on 2012/07/13.

Since I now have my own RPM repository on two servers each of 100Mbps, I’ve removed the /rpmbuild directory here on my server. That means you no longer need to build the backports of GCC 4.7.0 yourself but can instead just use the Xexyl RPM repository. I documented it here. Now, that that’s cleared up, it’s been a while that GCC 4.7.0 is out. Here is the updated article, along with a mistake I made fixed:

I’ll say that GCC 4.7.0 is more restrictive and THAT is NOT BAD but actually GOOD. It not only will tell you about more possible issues in your code, it can reveal potential bugs in your program (that end up being compile time errors or warnings in other cases). That is good. Never ignore warnings (obviously errors you can’t). Okay, you can ignore them when say compiling a huge program (like GCC), but those are tested thoroughly and that even runs a test suite after compiling. The reason you don’t want to ignore them at compile time is because if they crop up later in the runtime, then you may be hard pressed to know where the real problem is (I’ve discussed this before: see my article about memory corruption).

One of the new features of C++11 is the major improvement to smart pointers. The unique_ptr recognizes and accepts move semantics, but it does not accept copy semantics: its copy constructor and copy assignment operator are deleted, and so are the implicitly generated ones of any class that holds a unique_ptr member. But that’s not bad, if you think about it: if you have unique ownership, then you aren’t very likely to have one ‘owner’ delete the object (delete, as opposed to C’s malloc and free functions), only for it to be used later and dump core (and that would be the best outcome, as I discussed before in the other article).
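A minimal sketch of what ‘moves but does not copy’ means in practice. The variable names are only for illustration; the static_asserts state the property at compile time.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>
#include <utility>

static_assert(!std::is_copy_constructible<std::unique_ptr<int>>::value,
              "a unique_ptr cannot be copied");
static_assert(std::is_move_constructible<std::unique_ptr<int>>::value,
              "a unique_ptr can be moved");

void check_move_only()
{
    std::unique_ptr<int> first(new int(42));

    // std::unique_ptr<int> copy = first;            // would not compile: deleted copy
    std::unique_ptr<int> second = std::move(first);  // ownership is transferred

    assert(first == nullptr);   // a moved-from unique_ptr is guaranteed to be empty
    assert(*second == 42);      // the new owner sees the same object
}   // second goes out of scope here and deletes the int, exactly once
```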

I’m not about to begin explaining move semantics or perfect forwarding: there’s enough documents out there about it, including the original proposal/draft. Instead, I’ll discuss the smart pointer known as unique_ptr (in namespace std), and unique_ptr’s in standard containers (maps, multimaps, their unordered counterparts [also new feature of C++11], and vectors).

A word about raw pointers. Since unique_ptr’s only allow one owner, you do have to take care of whether you move one to a new owner or not. But what if you want it to keep its owner (say, it lives in a map or some other container, and you want the container to be the owner), and yet you want (or need) to modify the object in a function or member function? Or what if you need the pointer in more than one location? For example, the project I referred to in my latest post (before this one) has a map of ‘characters’ (it’s a rewrite of a type of game that is the predecessor to today’s MMORPGs). This list/map/whatever owns the pointers, and when a character is removed from the map, its destructor is called. However, it would be less efficient (certainly more time consuming for the engine) to go through (potentially) the entire list (say, over 2000 entries) just to find a character or item in a location that is already known (we’ll call it a ‘room’, e.g., to look at a character there). In short, it is sometimes convenient to have a list (e.g., a vector) of raw pointers. The class that has this list does not necessarily own the memory, nor does it have to allocate or do anything except add the pointer to and remove it from its list. When the pointer is removed from that list, the real piece of memory still exists; the class simply no longer has direct access to it.

So, anyone who has worked with smart pointers would know there is a way to get the raw pointer. A smart pointer has the get() member function. Now, note that the variable in question is not (for example) an Object*. Instead, it is a std::unique_ptr<Object>. What does this have to do with anything? Normally you would reach a member of the class or structure behind a pointer with the -> operator, e.g., character->get(). A unique_ptr overloads operator-> so that this still works and reaches the Object. However, the get() we want here is a member of the unique_ptr itself, and remember that the variable is not a pointer. Rather, it’s a std::unique_ptr holding a pointer to an instance of an object (say of type Object). In other words, it holds an Object*. That means that to reach the smart pointer’s own members we don’t use operator-> but instead the dot operator. What this basically means is that we would do something like this (if we had a unique_ptr and we wanted a raw pointer to pass to some function):

Object* o = uptr.get();

After that, uptr is the unique_ptr (and still has ownership), o is the raw pointer, and *o is the actual object (recall that * is the dereference operator in this context and to dereference a pointer means to access the actual place in memory that the pointer points to).
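A short sketch of the dot versus arrow distinction. Object and bump are made up for the example; I deliberately gave Object its own get() so that both spellings appear side by side.

```cpp
#include <cassert>
#include <memory>

// Hypothetical class; its get() is unrelated to std::unique_ptr::get().
struct Object {
    int value;
    explicit Object(int v) : value(v) {}
    int get() const { return value; }
};

// Hypothetical function that uses an Object without owning it.
void bump(Object* o)
{
    ++o->value;
}

void check_raw_pointer_access()
{
    std::unique_ptr<Object> uptr(new Object(1));

    Object* o = uptr.get();     // dot: a member of the unique_ptr, returns the raw pointer
    bump(o);

    assert(uptr->get() == 2);   // arrow: forwarded to the Object, calls Object::get()
    assert((*o).value == 2);    // *o is the actual object
    assert(o == uptr.get());    // uptr still owns it; o is only an observer
}
```

The raw pointer must not be deleted and must not outlive the unique_ptr that owns the object.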

Does this change when it’s a container of std::unique_ptr’s? If you have an iterator, for example, then absolutely yes. Why? Well, before, you had the actual std::unique_ptr, right? It’s not actually a pointer itself, so you use the dot operator to call get(). But if you’re iterating through a vector (for example) then you would use operator-> on the iterator (or the equivalent way: (*iter).member). This is because, again, the iterator is not the actual object but (essentially, certainly if you think in terms of functionality) a pointer to it. And what happens when you want to access a member through a pointer to some structure or class? You dereference it first. To add to that, and possibly confuse matters, if the element is a std::pair then you have the first and second members. So if you have a std::pair<int, std::unique_ptr<Object> > called mypair, then mypair.first is an int, and mypair.second is a unique_ptr. So if you’re iterating through a container that holds such pairs (say, a map) then you access it more like this:

iter->second.get()

or

(*iter).second.get()

Basically, iter is the iterator: you dereference iter and access the member second (which is in this case a unique_ptr) and then you call the get function on that object. That function will return the raw pointer.
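Putting the pieces together, here is a sketch loosely modelled on the ‘room’ idea from earlier. The names Object, ObjectMap and in_room are my own, not from the real project. The map owns the objects, and the function hands back a vector of non-owning raw pointers.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <vector>

// Hypothetical game object that only knows which room it is in.
struct Object {
    int room;
    explicit Object(int r) : room(r) {}
};

typedef std::map<int, std::unique_ptr<Object>> ObjectMap;

// Collect non-owning pointers to every object in the given room.
std::vector<Object*> in_room(const ObjectMap& objects, int room)
{
    std::vector<Object*> found;
    for (ObjectMap::const_iterator iter = objects.begin(); iter != objects.end(); ++iter) {
        // iter->second is the unique_ptr; ->room goes through it to the Object.
        if (iter->second->room == room)
            found.push_back(iter->second.get());
    }
    return found;
}

void check_in_room()
{
    ObjectMap objects;   // the map is the owner
    objects[1] = std::unique_ptr<Object>(new Object(10));
    objects[2] = std::unique_ptr<Object>(new Object(20));
    objects[3] = std::unique_ptr<Object>(new Object(10));

    std::vector<Object*> found = in_room(objects, 10);
    assert(found.size() == 2);
    assert(found[0] == objects[1].get());
    assert(found[1] == (*objects.find(3)).second.get());
}
```

Erasing an entry from the map destroys the object, so any raw pointer to it that is still in a vector somewhere has to be removed first.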

There’s two other new smart pointer types: std::shared_ptr (think of reference counting) and std::weak_ptr (which is used along with shared_ptr’s). There’s many other nice additions. That includes threading support, new usage of auto keyword (yes, folks, it existed for decades despite what some incorrectly think), a new way to work with time (system time, durations, etc.) and much much more. I would suggest you get used to the new standard if you use C++, because it has a lot of nice additions.

To close, I’ll go back to something I said I wouldn’t go into detail on: move semantics. One thing I will show is an example used in the proposal. Rvalue references and std::forward give us perfect forwarding, and variadic templates let it work for any number of arguments. This allows something really cool. Let’s say you don’t know how many arguments will be passed to a constructor. Then, say, you want to create a lot of unique_ptr’s. Well, you could write out the more explicit stuff, or you can do what I do to make code clean. That is to say, have a header file that includes the libstdc++ header file that makes std::unique_ptr (and other things) available and add a function that does perfect forwarding to the constructor. The file is simply this:

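#ifndef MAKE_UNIQUE_H
#define MAKE_UNIQUE_H
#include <memory>
#include <utility>
/* Forward any number of constructor arguments to T and wrap the result. */
/* Note: C++14 added std::make_unique to <memory>; this helper is for C++11. */
template<typename T, typename... Args>
std::unique_ptr<T>
make_unique(Args&&... args)
{
    return std::unique_ptr<T>(new T(std::forward<Args>(args)...));
}
#endif /* MAKE_UNIQUE_H */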

To use, simply do something like :

auto obj = make_unique<Type>(arg1, argN, ...);

Yes, that’s what auto now does: it automatically deduces the type as best it can. If it can’t, the compiler will tell you so and you’ll have to fix it.
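For a concrete (if contrived) usage sketch: Character is a made-up type, and I have put the helper in a namespace called sketch, which is my own addition, so that the example also compiles under C++14 and later, where std::make_unique exists and an unqualified call could become ambiguous.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

namespace sketch {
// Same perfect forwarding helper as in the header, inside a hypothetical namespace.
template<typename T, typename... Args>
std::unique_ptr<T> make_unique(Args&&... args)
{
    return std::unique_ptr<T>(new T(std::forward<Args>(args)...));
}
}

// Hypothetical type with a two-argument constructor.
struct Character {
    std::string name;
    int level;
    Character(const std::string& n, int l) : name(n), level(l) {}
};

void check_make_unique()
{
    // auto is deduced as std::unique_ptr<Character>
    auto hero = sketch::make_unique<Character>("Ayla", 3);
    // zero arguments forward as well: this is new int(), which is zero-initialised
    auto count = sketch::make_unique<int>();

    assert(hero->name == "Ayla");
    assert(hero->level == 3);
    assert(*count == 0);
}
```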

C / C++ Type Casts Continued

In a fairly recent post, I discussed how to find C style casts (as opposed to the C++ style which is intentionally more ugly than the already ugly C style) in a source file. But as I noted some days after writing it, it had some limitations. In my head at the time it made perfect sense because I was actually looking for Pointer to type casts. But that’s fairly limited, as there’s other types of casts that exist and are perfectly valid.

So, I’ll now discuss a few things related to the previous post.

  • Type Casting
  • C/C++ type casting
  • Workaround to find all C type casts

(The irony is: C++ style casts, which are uglier, are probably far easier to find than their C counterparts. The reason is that their syntax is more restrictive; see the section on C/C++ type casting.)

Type Casts

So, firstly, what IS a type cast? It’s very simply a way to tell the compiler to treat a value of one data type as another. In C it looks and acts very much like a function. In C++ it looks more like a template function. In any case, they should usually be avoided when possible. But the key words are ‘when possible’: it’s not always possible. One such time comes to mind: socket handling uses some casting (side note: I noticed a couple of mistakes in the binding IPv4/IPv6 socket code I showed a while back and I just fixed those [includes a memory leak being plugged up]).

C/C++ Type Casting

A type cast in C is in the form of:

(type)expression

type can be a pointer to a type, in which case it would be (type*) or (type *). Note also that if expression (which could be a variable, or not) is NOT a pointer but you’re casting TO a pointer, then you should give the cast an address instead, that is to say, &expression.
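As a small illustration of the (type*)&expression form, the sketch below looks at the first byte of an int, once with the C style cast and once with the C++ spelling of the same thing. The function names are made up, and the expected value assumes two’s complement integers (so that -1 has every byte set to 0xFF), which is what you will find in practice.

```cpp
#include <cassert>

// C style: take the address of a non-pointer, then cast that address.
unsigned char first_byte_c_style(const int& value)
{
    return *(const unsigned char*)&value;
}

// The same operation written with a C++ cast.
unsigned char first_byte_cpp_style(const int& value)
{
    return *reinterpret_cast<const unsigned char*>(&value);
}

void check_first_byte()
{
    int all_bits_set = -1;   // assumption: two's complement, so every byte is 0xFF
    assert(first_byte_c_style(all_bits_set) == 0xFF);
    assert(first_byte_cpp_style(all_bits_set) == 0xFF);
    assert(first_byte_cpp_style(0) == 0);
}
```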

Now, in C++ they look different. Yes, you can use the C style, but it’s not recommended as it’s less type safe, and that is a bad thing! Imagine casting a pointer to type A to a pointer to type B when they are 100% unrelated and have different types of data, different members and so on. It’s just not as safe. Sure, it might ‘work’ but that does not mean it really is working properly or as well as it can. Also keep in mind undefined behaviour and implementation defined behaviour. Those two terms can bite you hard when you are vulnerable! In short, don’t rely on C style casts in C++ programs because there’s less protection: a C style cast will apply whatever conversion it can find, up to and including reinterpreting the bits, usually without a warning. The fact that more recent versions of gcc and g++ warn you more than they used to is good: it’s better to fix a problem at compile time than at runtime (you don’t want a corrupted stack, for example, as that gives you barely anything to work with).

Now then, C++ type casts all share one form, only there’s more than one kind of cast.

reinterpret_cast<Type>(expression)

dynamic_cast<Type>(expression)

static_cast<Type>(expression)

const_cast<Type>(expression)

Now, I could explain these. But because it’s been done elsewhere far better than I could do, in what is an absolutely excellent resource for C programmers who want to shift to C++, I’ll refer you to The C++ Annotations. HIGHLY recommended reading! In particular, for the C++ casting types, see The C++ Annotations’ Chapter 3.5: A new syntax for casts.
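For a quick taste in the meantime, here is one tiny use of each cast. This is only a sketch and not a substitute for the Annotations; Base, Derived and Other are made-up types, and std::uintptr_t is assumed to be available (it is on every mainstream platform).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical class hierarchy; the virtual destructor makes dynamic_cast usable.
struct Base { virtual ~Base() {} };
struct Derived : Base { int extra; Derived() : extra(7) {} };
struct Other : Base {};

void check_cast_kinds()
{
    // static_cast: an ordinary, compile-time checked conversion.
    assert(static_cast<int>(3.9) == 3);

    // dynamic_cast: checked at runtime; yields a null pointer when the type is wrong.
    Derived d;
    Other o;
    Base* points_to_derived = &d;
    Base* points_to_other = &o;
    assert(dynamic_cast<Derived*>(points_to_derived) == &d);
    assert(dynamic_cast<Derived*>(points_to_other) == nullptr);

    // const_cast: only adds or removes const/volatile.
    // Writing through it is fine here because n itself is not const.
    int n = 1;
    const int* read_only_view = &n;
    *const_cast<int*>(read_only_view) = 2;
    assert(n == 2);

    // reinterpret_cast: reinterprets the bits, e.g. a pointer to integer round trip.
    std::uintptr_t bits = reinterpret_cast<std::uintptr_t>(&n);
    assert(reinterpret_cast<int*>(bits) == &n);
}
```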

Workaround to find all C type casts

Okay, so now that I maybe have explained things better, let me return to the original point. Because a C type cast is very much like a function – both in functionality and look of the call, it is actually difficult to search for them and only them IF there isn’t a restricted search. By restricted, I mean for example the previous command I showed: type to pointer casts are simple to find. If you know you only used type to pointer casts, then you’re fine with the post I referred to earlier in this post.

If you know it’s only types like int, char, or double (or float) then it’s also easy. You can just specifically do something like:

grep "(int)" file.cpp

However, what if you don’t know all the casts you might have? You could have some pointer casts, and the original command would find those. But what if others exist and you don’t even know the basic type? For instance, you might have some (char) and (char*) as well as (MyType) and (MyType*), yet not realize those are there.

You basically have to either:

  • Know the types (be very familiar with the code); or
  • Realize that you may have to look for anything with the syntax of a function call. This would indeed find casts. It’ll also find other things such as if (…) and while (…). You could choose not to show those, but then what if you have a cast inside one of them? A common thing is something like switch((int)*input), which in essence means that (if you want to switch to C++ style casts) you need to look at ALL code that looks like function calls.

You can however make it a bit nicer. Firstly, you shouldn’t have a type whose name is only one character. You also can’t have types that begin with a number, and you shouldn’t have ones that begin with an underscore (names can start with _, but you generally shouldn’t do that as many such names are reserved). A number cannot begin the name of anything in C or C++ (and for good reasons).

So, with those facts being said, we can do a few things :

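  • Make sure it starts with a lower case or upper case letter. After that an underscore or any letter, as well as certain characters (say * and &), can follow. It must be at least two characters also. (Digits and spaces are not covered by the pattern below, so a cast such as (uint8_t) or (char *) will be missed; add 0-9 and a space to the bracket expression if you need them.)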
  • And the above must be surrounded by parentheses.

What we come up with is the following :

 

grep -En "(\([a-zA-Z][a-zA-Z_*&]+\))" *.c

The above says search for the pattern (between the double quotes) in any file that ends with .c in the current working directory.

Basically, the outer ( and ) form a group. Because of the -E option (extended regular expressions) an unescaped parenthesis is a grouping character, and the group here is not strictly needed. The \( and the \) inside it are the important part: they match the literal open and close parentheses that must surround the type (try without the backslashes if you’re curious what you’ll match instead. In fact, I suggest it if you’re unfamiliar: the results are quite different because you’re matching more). Then, we say find (after the open parenthesis) a letter a-z or A-Z. The [ and the ] are also important. They allow ranges, i.e., [a-zA-Z] means match ONE character that is in the range a-z OR A-Z.

Next, recall that it shouldn’t be one letter for the type. So, we search for:

  • any letter of any case – that’s the a-zA-Z part
  • an underscore – the _ part
  • a * – for pointer casts
  • a & – this is for the C++ concept of references. In C & has two meanings (unless I’m forgetting some): address of (e.g., &var) and bitwise AND (e.g., 1 & 10).

As for the + after bracket ( the ] ), that means match at LEAST ONE of the characters we searched for. With the two expressions, it means display any line that has two or more characters between ( and ). Yes, this does mean you could potentially see function calls such as :

 

fclose(fp);

How do you prevent printing that? Well perhaps the easiest way (yet less efficient way [especially if you have many you want to mask]) :
Pipe the command to another grep with the option -v, e.g. :

grep -En "(\([a-zA-Z][a-zA-Z_*&]+\))" *.c | grep -v "fclose"
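If you would like to experiment with what the pattern does and does not match without creating test files, the same expression can be tried from C++ with std::regex. This is only a sketch for playing with the pattern (grep is still the right tool for the actual search), and has_paren_word is a made-up name. The default std::regex grammar treats \( and the bracket expression the same way grep -E does here.

```cpp
#include <cassert>
#include <regex>
#include <string>

// True if the line contains "(" + a letter + one or more of [a-zA-Z_*&] + ")".
bool has_paren_word(const std::string& line)
{
    static const std::regex pattern(R"re(\([a-zA-Z][a-zA-Z_*&]+\))re");
    return std::regex_search(line, pattern);
}

void check_paren_word()
{
    assert(has_paren_word("x = (int)y;"));
    assert(has_paren_word("p = (char*)q;"));
    assert(has_paren_word("fclose(fp);"));        // false positive: (fp) fits the pattern

    assert(!has_paren_word("p = (char *)q;"));    // missed: a space is not in the brackets
    assert(!has_paren_word("v = (uint8_t)w;"));   // missed: a digit is not in the brackets
    assert(!has_paren_word("y = f(x);"));         // one character only, so no match
}
```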

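Two other commands are worth mentioning while I’m discussing regular expressions. Look up ‘awk’ and also ‘sed’. They are incredibly powerful. In fact, you could call them grep with additional features. For example, the following three print the same matching lines (the only difference being that grep prefixes each line with the file name when *cpp expands to more than one file):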

sed -n '/test/p' *cpp
grep test *cpp
awk '/test/' *cpp

Why would you use sed or awk over grep? You might want to actually edit the output in place or the file in place. Or you might want to only change a word/pattern if another word/pattern is found on the same line. Again, sed is one such solution. Or you might want to only print out certain fields (eg with awk) as opposed to the whole line.

Note that there’s entire books on awk and sed, and some would say they’re a complete language even, so I’m not even going to attempt to explain them, but I do use them quite a lot. Very useful utilities.

Search for C Style Casts in One Command

(Update 2011/01/08: I just realized this. When I was doing this, it was related to pointer casts. Therefore, I think I should point out this is more for casting something to a pointer of some type. The other kind, a cast to a basic or primitive type, is a bit harder because the pattern is less restrictive and other things can match it. Still, it is possible to work around that in various ways.)

During development, if you ever have or had to work with C code, you’re likely to find the old style C type casts in a program’s source code. This is code that looks like :

(int*) 0

What does that mean? It means interpret 0, disregarding what it really is, as a pointer to an int. What is so wrong with this? Well, it’s not type safe. There are many reasons this is bad, and while it may be ‘OK’ in a C program, it still has risks involved. This is why C++ has a new style for casts. I believe it was even made ugly on purpose, because casting is an ugly operation, so that one (hopefully) thinks twice before using casts. In most cases you should not need casts in C++ because of the way it’s designed.

However, assuming you need to cast, you should use the C++ style casts. They are safer, some can check types at runtime (say, dynamic_cast), and in general they are more sane. That’s not to say you _should_ use them, but if you have to, they are a better option than the C style.

But, that being said, a lot of code out there is full of the old style. If you’re working with C++ you should use the new style. Most would say the old casts are hard to spot. However, a basic regexp (regular expressions are very powerful indeed) can find the C style pointer casts in all the files you target.

The command below will show you every line containing a C style pointer cast, regardless of spaces between the type and the *, and regardless of spaces after the close parenthesis. If it’s any surprise, yes, this is for Unix and Linux users. The command grep comes to your rescue. Assume that you have a bunch of files ending with .cpp in the current directory that you would like to check for C style casts. Then all you have to do is:

grep "(.*\*)" *.cpp

Do I have to explain it ? Okay, here we go :

grep is the command. When you quote something in the shell it can have different meanings. There are also different types of quotes (single quotes, back quotes [the grave accent - `], etc.). They all have different meanings and it’s best to look at the particular shell’s man page (manual). So, in this case what we’re actually saying is:

  • The first ( is the opening parenthesis of the cast itself.
  • The .* means we want any character, with that match repeated zero or more times (I’ve simplified this explanation; there are other articles and sites entirely dedicated to regexps, as they can become very complicated!).
  • The \* is saying we literally want to match a * (instead of using it as a pattern matching character). That would be the pointer notation in the cast.
  • The ) closes the cast.
  • The *.cpp means grep should look in every file with a name that ends with .cpp (in the current directory).

An example file that looks like :

(char *) 0
(char *)0
(char*)0
(char*) 0
(int **) 0

And a run of the above command on that file :

grep "(.*\*)" test.txt
(char *) 0
(char *)0
(char*)0
(char*) 0
(int **) 0

One more quick note about the double quotes, even though I already said to check the manual. If we did not have them, you would get an error that looks like: bash: syntax error near unexpected token `('

In other words, they are necessary.
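As with any pattern, it helps to try it against a few lines you already know the answer for. The sketch below does that from C++ with std::regex. The function name is made up, and the pattern is the grep one rewritten for the default std::regex grammar, where a literal parenthesis needs a backslash.

```cpp
#include <cassert>
#include <regex>
#include <string>

// True if the line contains "(" ... "*)" somewhere, like grep "(.*\*)".
bool has_pointer_cast(const std::string& line)
{
    static const std::regex pattern(R"re(\(.*\*\))re");
    return std::regex_search(line, pattern);
}

void check_pointer_cast()
{
    assert(has_pointer_cast("(char *) 0"));
    assert(has_pointer_cast("(char*)0"));
    assert(has_pointer_cast("(int **) 0"));

    assert(!has_pointer_cast("(int) 0"));         // not a pointer cast, so not found
    assert(!has_pointer_cast("x = (a * b);"));    // a multiplication is left alone

    assert(has_pointer_cast("void f(char*);"));   // false positive: a declaration
}
```

So the pattern finds the pointer casts, but expect a few declarations in the output as well.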

In any case, it really is that simple to find C style pointer casts. I imagine there could be more complicated casts, but if you understand regular expressions, then you can find them fairly easily (or at least it’s not as hard as it may seem).