
unsigned and size_t are hard.

This is a computer language story, so if you are not interested in that, I recommend skipping to the next post.

C++ has unsigned types such as 'unsigned int.' Because a negative array index usually makes no sense, this type is often used for indices. Unsigned types are good for storing bit arrays, but using an unsigned int instead of an int just to gain one more bit is almost never a good idea (*). The combination with implicit conversion makes this especially hard. I find unsigned numbers unintuitive: when the result of a computation is negative, the value still stays positive.

For example, unsigned int(-1) is usually 4294967295 on a 32-bit machine. Converting a negative value to an unsigned type wraps it modulo 2^N, where N is the bit width of that type, so -1 becomes 2^32 - 1 = 4294967295 for a 32-bit type but 2^64 - 1 = 18446744073709551615 for a 64-bit type. Even someone who knows this and writes code in a 32-bit environment can end up with code that does not work on a 64-bit machine, for example by assuming that size_t and unsigned int are the same type and using -1 as an illegal value.

On a typical 32-bit machine the following code detects the illegal index, but on a typical 64-bit machine the check fails and the illegal index is accepted.
---
#include <cstddef>   // size_t
#include <iostream>
#include <vector>

void foo(std::vector< int > & vec, size_t idx){
    // size_t(-1) is the largest size_t value, so it depends on the width of size_t.
    if(idx == size_t(-1)){
        std::cout << "Illegal index" << std::endl;
        return;
    }
    std::cout << "OK! accessing a vector with idx = " << idx << std::endl;
    // vec[idx] = ...
}

int main()
{
    std::vector< int > vec;
    unsigned int idx = -1;      // illegal index (wraps to the largest unsigned int)
    foo(vec, idx);              // implicitly converted to size_t; the value is kept

    unsigned int uint_minus_1(-1);
    size_t   size_t_minus_1(-1);
    size_t   size_t_casted = static_cast< size_t >(uint_minus_1);

    std::cout << "(unsigned int)(-1)  = " << uint_minus_1   << std::endl;
    std::cout << "size_t(-1)          = " << size_t_minus_1 << std::endl;
    std::cout << "size_t(-1) (casted) = " << size_t_casted  << std::endl;
}
---
The output on a 64-bit machine is as follows.
---
nvlp[16]bash % ./unsigned_fail
OK! accessing a vector with idx = 4294967295
(unsigned int)(-1)  = 4294967295
size_t(-1)          = 18446744073709551615
size_t(-1) (casted) = 4294967295
---

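First of all, assigning -1 to an unsigned type is a problem, and implicit conversion hides it. Converting the unsigned int 4294967295 to a 64-bit size_t keeps its value, so it no longer equals size_t(-1) = 18446744073709551615, and the check in foo fails. In C++, -1 turns into a different value depending on the type it is converted to, which I find very tricky. Recent compilers can report this as a warning (for example, GCC and Clang with -Wsign-conversion), and I pay attention to it because it points to a potential bug. One of my friends told me, "unsigned is evil." I also try to avoid unsigned types.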

Once you learn computer architecture, it seems natural that -1 equals 4294967295. Nowadays, however, I try to regard this as strange. Thinking that way would have let me avoid the bug in this example, and I believe it helps me write more portable and robust code.

(*) Bjarne Stroustrup, The C++ Programming Language, 3rd ed., Section 4.4, paragraph 2, p. 73.

Comments

Erik said…
This post (from 2010 it looks like) is the first clear explanation I've been able to find on using size_t.

I can understand why, for example, conversion of a double to a size_t may not work even if the double is positive (an error that I've come across).

There is a bit of code, Matrix.cpp, written by Stroustrup where he mentions in his notes that he dislikes unsigned.

At the same time, I've come across a number of comments on other blogs that suggest using size_t in loops that index arrays may lead to a ~10 percent speed up.

From my experience, this seems generally not worth the problems that it causes.
