2010-08-27

unsigned and size_t are hard.

This is a story about programming languages. If you are not interested in that, I recommend skipping to the next post.

C++ has unsigned integer types, such as 'unsigned int.' Since a negative array index usually makes no sense, unsigned types are often used for indices. They are good for storing bit arrays, but using an unsigned int instead of an int just to gain one more bit is almost never a good idea. (*) The implicit conversion rules make unsigned types especially hard to use correctly. I also find unsigned numbers unintuitive: when the result of a computation is negative, it still stays a (large) positive number.

For example, converting -1 to unsigned int usually gives 4294967295, because unsigned int is 32 bits wide on most machines. In general, -1 converted to an N-bit unsigned type becomes 2^N - 1, the largest value of that type, so the result depends on the width of the type. Even a programmer who knows this can write code in a 32-bit environment that sometimes fails on a 64-bit machine, for example by assuming that size_t and unsigned int are the same type and using -1 as an illegal-value marker.

The following code works on a typical 32-bit machine but usually fails on a 64-bit machine, where size_t is 64 bits wide while unsigned int is still 32 bits.
---
#include <cstddef>
#include <iostream>
#include <vector>

// Treats size_t(-1) as the "illegal index" marker.
void foo(std::vector< int > & vec, std::size_t idx){
    if(idx == std::size_t(-1)){
        std::cout << "Illegal index" << std::endl;
        return;
    }
    std::cout << "OK! accessing a vector with idx = " << idx << std::endl;
    // vec[idx] = ...
}

int main()
{
    std::vector< int > vec;
    unsigned int idx = -1;      // illegal index (4294967295 for a 32-bit unsigned int)
    foo(vec, idx);              // implicitly widened to size_t: the value stays 4294967295

    unsigned int uint_minus_1(-1);
    std::size_t  size_t_minus_1(-1);
    std::size_t  size_t_casted = static_cast< std::size_t >(uint_minus_1);

    std::cout << "(unsigned int)(-1)  = " << uint_minus_1   << std::endl;
    std::cout << "size_t(-1)          = " << size_t_minus_1 << std::endl;
    std::cout << "size_t(-1) (casted) = " << size_t_casted  << std::endl;
}
---
The result on a 64-bit machine is as follows.
---
nvlp[16]bash % ./unsigned_fail
OK! accessing a vector with idx = 4294967295
(unsigned int)(-1)  = 4294967295
size_t(-1)          = 18446744073709551615
size_t(-1) (casted) = 4294967295
---

First of all, assigning -1 to an unsigned type is a problem in itself. Moreover, implicit conversion hides the problem. In C++, the value that -1 becomes depends on the unsigned type it is converted to. I think this is very tricky. Recent compilers can report this as a warning (for example, GCC's -Wsign-conversion), and I pay attention to such warnings since they point to potential errors. One of my friends told me, "unsigned is evil." I also try to avoid unsigned types.

If you have studied computer architecture, it sounds natural that -1 equals 4294967295. Nowadays, however, I try to think of it as strange. Thinking that way, I can avoid bugs like the one in this example, and I believe I can write more portable and robust code.

(*) Bjarne Stroustrup, The C++ Programming Language, 3rd ed., Section 4.4, paragraph 2, p. 73.

1 comment:

Erik said...

This post (from 2010 it looks like) is the first clear explanation I've been able to find on using size_t.

I can understand why, for example, conversion of a double to a size_t may not work even if the double is positive (an error that I've come across).

There is a bit of code, Matrix.cpp, written by Stroustrup where he mentions in his notes that he dislikes unsigned.

At the same time, I've come across a number of comments on other blogs suggesting that using size_t in loops that index arrays may lead to a speedup of about 10 percent.

In my experience, this seems generally not worth the problems it causes.