Modern C++ is undeniably powerful. But, as with any programming language, it is also hated by many, and there are certainly reasons for that.
The Legacy
I have been using C++ for around a year for some projects, but without formal training or specific tutorials.
I really started learning it by reading Nvidia’s PhysX codebase, which I found very well-organized and performance-driven.
PhysX has been used by many games, as well as game engines like Unity.
On the other hand, the codebase is far from "modern" in many C++ users' eyes.
The STL is not even used: no smart pointers, no std::vector; everything is implemented within PhysX in an old-fashioned way.
It is not difficult to make some guesses as to the reasons:
- PhysX was initially developed around 20 years ago, when the standard hadn't yet introduced most of its modern features;
- PhysX really wants to squeeze out the last bit of performance, especially on low-end devices;
- PhysX has to support devices that a regular program won't run on, e.g. the Nintendo Switch (NS). I don't even know how much of the standard library Nintendo has implemented for the NS, as its toolchain is completely proprietary.
I am not sure how each of these points has contributed to this design, and there are likely other factors influencing PhysX’s architecture. But here’s the practical question: should we always use modern C++ features when starting a new project now? I do not have the answer myself, but since I learned C++ by reading PhysX’s codebase, I have stayed away from many modern features.
The Confusing Standard
I do like many of the modern additions and changes to the C++ standard, which simplify and unify many aspects. Still, the standard’s wording sometimes leaves even experienced users confused. I once blamed my own skill level until I read P1839R7 and realized ambiguities (and possibly defects) in the standard are a common headache. Below is quoted from the paper:
void print_hex(int n) {
    unsigned char* a = (unsigned char*)(&n);
    for (int i = 0; i < sizeof(int); ++i)
        printf("%02x ", a[i]);
}

int main() {
    print_hex(123456);
}

In C, this is a valid program. On a little-endian machine where sizeof(int) == 4, this will print 40 e2 01 00. In C++, this is widely assumed to be valid as well, and this functionality is widely used in existing code bases (think of binary file formats, hex viewers, and many other low-level use cases).

However, surprisingly, in C++ this code has undefined behaviour under the current specification. In fact, it is impossible in C++ to directly access the object representation of an object (i.e. to read its underlying bytes), even for built-in types such as int. Instead, we would have to use memcpy to copy the bytes into a separate array of unsigned char, and access them from there. However, this workaround only works for trivially copyable types. It also directly violates one of the fundamental principles of C++: to leave no room for a lower-level language.

The goal of this paper is to provide the necessary wording fixes to make accessing object representations such as in the code above defined behaviour. Existing compilers already assume that this should be valid. The goal of the paper is therefore to not require any changes to existing compilers or existing code, but to legalise existing code that already works in practice and was always intended to be valid.
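For reference, the memcpy workaround mentioned in the quote looks roughly like this (a minimal sketch; print_hex_memcpy is my own name for the rewritten function):

#include <cstdio>
#include <cstring>

// The workaround the paper describes: copy the bytes into a separate
// unsigned char array, then read them from there. This is well-defined,
// but only for trivially copyable types.
void print_hex_memcpy(int n) {
    unsigned char a[sizeof(int)];
    std::memcpy(a, &n, sizeof(int));
    for (std::size_t i = 0; i < sizeof(int); ++i)
        std::printf("%02x ", a[i]);
}

int main() {
    print_hex_memcpy(123456); // prints 40 e2 01 00 on a little-endian machine
}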
In practice, compiler implementations and their documentation serve as the de facto standard, rather than the formal C++ standard.
One example is union type punning.
#include <iostream>
#include <cstdint>

union Pun {
    std::uint32_t i;
    float f;
};

int main() {
    Pun u;
    u.i = 0x3F800000;    // Bit pattern for 1.0f in IEEE-754
    std::cout << u.f     // Reads the float interpretation
              << std::endl;
    return 0;
}
This use case is well-defined by the C standard, but is UB in C++. The C++ standard leaves the behavior undefined (likely deliberately), giving compilers more implementation freedom. In practice, I have not seen a compiler that does not support it, but it is very confusing to look up and be certain about. GCC states its support explicitly:
The practice of reading from a different union member than the one most recently written to (called “type-punning”) is common. Even with
-fstrict-aliasing, type-punning is allowed, provided the memory is accessed through the union type. So, the code above works as expected. See Structures, Unions, Enumerations, and Bit-Fields.
However, I can't find any information about MSVC's treatment of this case, although it is evidently supported in practice.
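For completeness, C++20 did eventually add a standard-blessed way to express this exact bit reinterpretation: std::bit_cast. A minimal sketch, assuming float is a 32-bit IEEE-754 type on the target platform:

#include <bit>
#include <cstdint>
#include <iostream>

int main() {
    // std::bit_cast (C++20) reinterprets the object representation with
    // well-defined behavior for trivially copyable types of equal size,
    // making the union trick above unnecessary on newer standards.
    float f = std::bit_cast<float>(std::uint32_t{0x3F800000});
    std::cout << f << std::endl; // prints 1
    return 0;
}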
The ambiguity in the wording of the standard, whether by design or by oversight, is one reason it has become almost common practice to wait at least five years after a standard is published before adopting it, until things stabilize. There seems to be little hope of this situation improving.
How About CUDA?
The rise of CUDA for GPGPU, amplified by the recent hype in machine learning, has attracted another distinct group of C++ users. NVCC's implementation of C++ is unusual in that parts of the code are compiled for the GPU device instead of the CPU, which adds yet another layer of confusion.
One example is pointer casting.
Using pointer casts for type punning is UB, as it violates the strict aliasing rule (although at least MSVC shouldn't have trouble with this).
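To be concrete, this is the kind of cast I mean (a minimal sketch; float_bits is a hypothetical helper):

#include <cstdint>

std::uint32_t float_bits(float f) {
    // UB in standard C++: reads a float object through a pointer of an
    // incompatible type, violating the strict aliasing rule.
    return *reinterpret_cast<std::uint32_t*>(&f);
}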
It is usually more appropriate to use a union or memcpy to achieve this (although the union approach is not portable per the standard), but Nvidia's official documentation implies that pointer casting will definitely work, at least for their GPUs.
Below is taken from the documentation:
#if __CUDA_ARCH__ < 600
__device__ double atomicAdd(double* address, double val)
{
    unsigned long long int* address_as_ull =
        (unsigned long long int*)address;
    unsigned long long int old = *address_as_ull, assumed;

    do {
        assumed = old;
        old = atomicCAS(address_as_ull, assumed,
                        __double_as_longlong(val +
                            __longlong_as_double(assumed)));

        // Note: uses integer comparison to avoid hang in case of NaN (since NaN != NaN)
    } while (assumed != old);

    return __longlong_as_double(old);
}
#endif

Casting from a double* to an unsigned long long int* and dereferencing it is obviously UB.
But this code snippet suggests that it should be well-defined, at least for device code.
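If one wants to stay closer to defined behavior, memcpy is, as far as I know, also supported in device code, so the bit reinterpretation itself can be written without the aliasing read (a minimal sketch; bits_of is a hypothetical helper, not part of any Nvidia API):

#include <cstring>

__device__ unsigned long long bits_of(double d) {
    // Copy the object representation instead of dereferencing a casted
    // pointer; well-defined for trivially copyable types.
    unsigned long long u;
    memcpy(&u, &d, sizeof(u));
    return u;
}

With this, the initial read old = *address_as_ull could instead be written as old = bits_of(*address), although the atomicCAS call itself still requires the casted pointer, since its signature takes an unsigned long long int*.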
Conclusion
My experience with developing a cross-platform C++/CUDA project has been painful, and the aspects mentioned above have been lingering in my mind since the beginning.
There is no practical alternative to C++, at least for CUDA programming (or, more broadly, most GPGPU programming). I guess the only way forward is to live with it: live with a confusing standard, and live with the UB in my code.
Written by Yafei Ou and reviewed for grammar and conceptual accuracy by ChatGPT (o4-mini-high).
© 2025 Yafei Ou. All rights reserved.