
Standard-safe arithmetic shift right in C

Last month I learned that the behavior of right-shifting a negative signed integer in C is implementation-defined. That is, say you have the following C code:

int foo(void) {
    int x = -128;
    unsigned int y = 3;
    return x >> y;
}

(This should have highlighting but I don't feel like setting that up in Ghost right now. 🙂)

Assuming a 32-bit int with two's complement representation, x has the bit pattern 0xFFFFFF80. foo() returns the right shift x >> y. But there are two possible ways to interpret x >> y:

  1. As a logical right shift. (LSR)
  2. As an arithmetic right shift. (ASR)

If we interpret this as a logical right shift (LSR), we shift the underlying representation right by 3 bits, filling the vacated high bits with zeros, and end up with a large positive number: 0x1FFFFFF0, which in decimal is 536870896.

On the other hand, if we interpret this as an arithmetic right shift (ASR), we shift right by 3 bits, filling the vacated high bits with copies of the sign bit (1, for negative), and end up with 0xFFFFFFF0, which represents the signed integer value -16.
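You can see both outcomes by computing the shifts on unsigned values, where the behavior is fully defined. This little snippet is mine, not from the book, and assumes a 32-bit int:

#include <stdio.h>

int main(void) {
    unsigned int bits = 0xFFFFFF80u;              /* the bit pattern of -128 */
    unsigned int lsr = bits >> 3;                 /* zero-fill: 0x1FFFFFF0 */
    unsigned int asr = ~(0xFFFFFFFFu >> 3) | lsr; /* sign-fill: 0xFFFFFFF0 */
    printf("LSR: 0x%08X\n", lsr);  /* large positive number, 536870896 */
    printf("ASR: 0x%08X\n", asr);  /* encodes -16 in two's complement */
    return 0;
}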

What value does foo() return? According to Effective C by Robert C. Seacord (PDF page 71), this is implementation-defined! That is, your compiler decides whether this is a logical or an arithmetic right shift. Different compilers may choose differently; the standard does not mandate a specific interpretation. This means that if you want a specific right-shift behavior portably, you can't rely on the compiler to do the right thing for you.

How can we do this?

If you want a portable logical shift, it's relatively straightforward: reinterpret the int x as an unsigned int, do the right shift, then reinterpret the result as a signed int again.
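Here's a minimal sketch of that approach (the name lsr and the memcpy-based reinterpretation are my choices):

#include <string.h>

int lsr(int x, int y) {
    /* As with any shift, assumes 0 <= y < the bit width of int. */
    /* Conversion to unsigned is well-defined (it reduces modulo 2^N),
       and on two's complement machines it preserves the bit pattern. */
    unsigned int ux = (unsigned int)x;
    unsigned int shifted = ux >> y;
    /* memcpy reinterprets the resulting bits as a signed int again,
       sidestepping the implementation-defined unsigned-to-int conversion. */
    int result;
    memcpy(&result, &shifted, sizeof result);
    return result;
}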

For an arithmetic right shift, it's a bit more complicated. Here's one implementation:

#include <assert.h>
#include <limits.h>

int asr(int x, int y) {
    assert(y >= 0 && "y < 0 triggers undefined behavior.");
    assert(
        y < CHAR_BIT * sizeof(x) &&
        "y greater than or equal to the (bit) width of x also triggers undefined behavior."
    );
    /* The top y bits of the result: all ones if x is negative, all zeros otherwise. */
    unsigned int extended_sign = (x >= 0) ? 0 : ~(UINT_MAX >> y);
    /* Reinterpret the bits of x as unsigned. (Accessing an int through an
       unsigned int pointer is permitted by the aliasing rules.) */
    unsigned int reinterpreted_x = *(unsigned int*)(&x);
    /* Logical shift of the bits, then OR the extended sign back in. */
    return extended_sign | (reinterpreted_x >> y);
}

Here's how it works. First we calculate the extended sign part of x >> y. Next, we reinterpret x as unsigned. Finally, we perform a (logical) right shift on the unsigned x and bitwise-OR the sign extension back in.
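As a quick sanity check, here's a hypothetical test harness (assuming the asr above is in scope; on a typical two's complement machine this prints -16, -1, and 16):

#include <stdio.h>

int main(void) {
    printf("%d\n", asr(-128, 3)); /* -16, the ASR result from earlier */
    printf("%d\n", asr(-1, 1));   /* -1: shifting in sign bits keeps -1 at -1 */
    printf("%d\n", asr(128, 3));  /* 16: non-negative values shift as usual */
    return 0;
}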

This implementation should be portable. It should be correct no matter the width or the underlying representation of signed integers (sign and magnitude vs. one's complement vs. two's complement). If it's not, I'd be interested to know how it fails to be portable.

Is this useful for anything in particular? Who knows. It was interesting to implement.

I plan to upload this to GitHub here.