Strict aliasing

preamble
Strict aliasing was introduced in C++98, but can be a gotcha. Strict pointer aliasing is allowed by the C++ standard, and is where the compiler is allowed to decide two pointers don’t point at the same object if they’re of unrelated types. e.g.

    uint32_t data32;
    uint16_t *pData16 = reinterpret_cast<uint16_t*>(&data32);
    uint16_t first = pData16[0];
    uint16_t second = pData16[1];

Here ‘first’ and ‘second’ may not point to what you expect.

Discussion
The compiler is at liberty to assume pointers to different types will not affect one another, in order to allow for more efficient code generation.
The solution to the above code is to to either use

  • A union
    union {
        uint32_t msg;
        unsigned uint16_t asBuffer[2];
    };
    
  • Disable strict aliasing of the compiler -f[no-]strict-aliasing in gcc
  • Use char*, as char* is an exception to the rule – the specification assumed that char* always aliases other types so you will not fall into the strict aliasing optimisation.
    uint16_t *pData16 = reinterpret_cast<uint16_t*>(reinterpret_cast<char*>(&data32));
    

    However adding a comment to the code for this will definitely be necessary as someone may remove you char* cast if they don’t know the rules.
  • Epilog
    After stating the above it must be noted that most likely the example given in the ‘preamble’ will work! It is however in undefined behaviour territory. GCC around 3.x seems to follow strict aliasing rules and you will probably not get what you expect, but later versions seem to follow the coders intent. I however have been testing on ‘clang’ and it appears not to generate any aliasing warning outputs.
    My feeling are now in 2018, you probably will get away with this bad code, or pick it up in debugging, but undefined behaviour is still undefined behaviour, and best to be avoided. I think this issue came more to the front due to gcc 3.x, following the letter of the law (c++ standard) rather than the programmer’s intent. I am not condoning writing non-compliant code!

    Reference
    https://cellperformance.beyond3d.com/articles/2006/06/understanding-strict-aliasing.html
    https://stackoverflow.com/questions/98650/what-is-the-strict-aliasing-rule
    http://en.cppreference.com/w/cpp/language/reinterpret_cast
    https://stackoverflow.com/questions/34527729/why-compilers-no-longer-optimize-this-ub-with-strict-aliasing

    In-place reverse a char* string, in-place reverse word order of char* string

    Premable
    As a pure coding exercise here are examples of code to reverse a string and reverse the word order.

    In-place reverse a string
    The reversal of the string using the xor trick. https://en.wikipedia.org/wiki/XOR_swap_algorithm

    //In-place reverse
    void reverse(char* s)
    {
    	char* forwards = s;
    	char* backwards = s;
    
    	for(; *backwards != '\0'; ++backwards);
    	--backwards;
        while(forwards < backwards)
        {
        	*forwards = *forwards ^ *backwards;
        	*backwards = *forwards ^ *backwards;
        	*forwards = *forwards ^ *backwards;
        	++forwards;
        	--backwards;
        }
    }
    

    This code does not use additional memory, but is horrendously slow, as not only accessing individual bytes but uses bitwise operations as well. This solution is not recommended for any large processor, possibly only useful in embedded systems using 8-bit micro-controller. CPU much prefer working in bigger byte blocks. Better to use SSE registers, see example: https://stackoverflow.com/questions/19627181/reverse-a-string-using-sse

    Generally would rather use stl as this reduces the level of complexity, and gives any performance benefits stl can give.

    void reverse2(char* s)
    {
    	char* sEnd = s + strlen(s);
    	std::reverse(s, sEnd);
    }
    

    In-place reverse word order
    This consists of reversing the entire string, then iterating the string again reversing the individual words which are separated by spaces, hence making the words forward facing and readable again.

    //In-place reverse words
    void reverseWords(char* s)
    {
    	reverse(s);
    
    	char* b = s;
    	char* e = s;
    
    	for(;*e != '\0'; ++e)
    	{
    		if(*e == ' ')
    		{
    			*e = '\0';
    			reverse(b);
    			*e = ' ';
    			b = e;
    			++b;
    		}
    	}
    
    	reverse(b);
    }
    

    Code in full

    #include <iostream>
    #include <iterator>
    #include <algorithm>
    
    
    using namespace std;
    
    //In-place reverse
    void reverse(char* s)
    {
    	char* forwards = s;
    	char* backwards = s;
    
    	for(; *backwards != '\0'; ++backwards);
    	--backwards;
        while(forwards < backwards)
        {
        	*forwards = *forwards ^ *backwards;
        	*backwards = *forwards ^ *backwards;
        	*forwards = *forwards ^ *backwards;
        	++forwards;
        	--backwards;
        }
    }
    
    //In-place reverse words
    void reverseWords(char* s)
    {
    	reverse(s);
    
    	char* b = s;
    	char* e = s;
    
    	for(;*e != '\0'; ++e)
    	{
    		if(*e == ' ')
    		{
    			*e = '\0';
    			reverse(b);
    			*e = ' ';
    			b = e;
    			++b;
    		}
    	}
    
    	reverse(b);
    }
    
    void reverse2(char* s)
    {
    	char* sEnd = s + strlen(s);
    	std::reverse(s, sEnd);
    }
    
    int main()
    {
    	{
    	char s[100];
    	strcpy(s, "abcdefghi");
    
    	cout << s << endl;
    	reverse(s);
    	cout << s << endl;
    	reverse2(s);
    	cout << s << endl;
    	}
    
    	{
    	char s[100];
    	strcpy(s, "The quick brown fox");
    
    	cout << s << endl;
    	reverseWords(s);
    	cout << s << endl;
    	}
    
    	return 0;
    }
    

    Reverse a queue

    Preamble
    Reverse a singularly linked list using recursion.

    Solution
    Its trivial code, but nice now by passing the pointer to head_->_next to ReverseInternal() stores this pointer in the parameter of the call.

    Code:

    Node* ReverseInternal(Node *head_)
    {
        if (head_->_next == nullptr)
        {
            return head_;
        }
    
        Node* newHead = ReverseInternal(head_->_next);
        head_->_next->_next = head_;
    
        return newHead;
    }
    
    Node* Reverse(Node *head_)
    {
        if (head_ != nullptr)
        {
            Node* newHead = ReverseInternal(head_);
            head_->_next = nullptr;
            return newHead;
        }
        else
        {
            return head_;
        }
    }
    

    Full solution

    Code:

    #include <iostream>
    using namespace std;
    
    
    class Node
    {
    public:
        Node(string data_, Node* next_) : _data(data_), _next(next_) {}
        string _data;
        Node* _next;
    };
    
    [js]
    Node* ReverseInternal(Node *head_)
    {
        if (head_->_next == nullptr)
        {
            return head_;
        }
    
        Node* newHead = ReverseInternal(head_->_next);
        head_->_next->_next = head_;
    
        return newHead;
    }
    
    Node* Reverse(Node *head_)
    {
        if (head_ != nullptr)
        {
            Node* newHead = ReverseInternal(head_);
            head_->_next = nullptr;
            return newHead;
        }
        else
        {
            return head_;
        }
    }
    
    void printList(Node* list_)
    {
        Node* node = list_;
        while(node != nullptr)
        {
            cout << node->_data << " ";
            node = node->_next;
        }
    
        cout << endl;
    }
    
    int main()
    {
        unique_ptr<Node> D = make_unique<Node>("D", nullptr);
        unique_ptr<Node> C = make_unique<Node>("C", D.get());
        unique_ptr<Node> B = make_unique<Node>("B", C.get());
        unique_ptr<Node> A = make_unique<Node>("A", B.get());
    
        printList(A.get());
    
        Node* list = Reverse(A.get());
    
        printList(list);
    
        return 0;
    }
    

    Tree traversals

    Preamble
    None standard tree traversals. Determine the height of a tree, and height order tree traversal. Often used as interview questions.

    Determine height of a tree
    Iterate to the bottom of the tree and on returning return the max height of each branch, add one for this current level.

    Code:

    int height(Node* node_)
    {
        if (node_ == nullptr)
        {
            return 0;
        }
    
        int left  = height(node_->_left);
        int right = height(node_->_right);
    
        return max(left, right) + 1;
    }
    

    Height order traversal
    Iterate over all nodes by using a queue to push each level into the queue as it occurs.

    Code:

    queue<Node*> d;
    
    void heightWiseTransversal(Node* node_)
    {
    	if (node_ == nullptr)
    	{
    		return;
    	}
    
    	cout << node_->_data << endl;
    
    	d.push(node_->_left);
    	d.push(node_->_right);
    
    	Node* frontNode1 = d.front();
    	d.pop();
    	heightWiseTransversal(frontNode1);
    
    	Node* frontNode2 = d.front();
    	d.pop();
    	heightWiseTransversal(frontNode2);
    }
    

    Full code
    Just in case you want to run it.

    #include <iostream>
    #include <memory>
    #include <queue>
    using namespace std;
    
    class Node
    {
    public:
    	Node(string data_, Node* left_, Node* right_) : _data(data_), _left(left_), _right(right_) {}
    	string _data;
    	Node* _left;
    	Node* _right;
    };
    
    queue<Node*> d;
    
    void heightWiseTransversal(Node* node_)
    {
    	if (node_ == nullptr)
    	{
    		return;
    	}
    
    	cout << node_->_data << endl;
    
    	d.push(node_->_left);
    	d.push(node_->_right);
    
    	Node* frontNode1 = d.front();
    	d.pop();
    	heightWiseTransversal(frontNode1);
    
    	Node* frontNode2 = d.front();
    	d.pop();
    	heightWiseTransversal(frontNode2);
    }
    
    int height(Node* node_)
    {
        if (node_ == nullptr)
        {
            return 0;
        }
    
        int left  = height(node_->_left);
        int right = height(node_->_right);
    
        return max(left, right) + 1;
    }
    
    
    int main()
    {
    	//Note I keep lifetime management separate from functionality!
    
    	unique_ptr<Node> G = make_unique<Node>("G4", nullptr, nullptr);
    	unique_ptr<Node> F = make_unique<Node>("F3", nullptr, nullptr);
    	unique_ptr<Node> E = make_unique<Node>("E3", nullptr, nullptr);
    	unique_ptr<Node> D = make_unique<Node>("D3", G.get(), nullptr);
    	unique_ptr<Node> C = make_unique<Node>("C2", E.get(), F.get());
    	unique_ptr<Node> B = make_unique<Node>("B2", D.get(), nullptr);
        unique_ptr<Node> A = make_unique<Node>("A1", B.get(), C.get());
    
        cout << "Height wise traversal" << endl;
        heightWiseTransversal(A.get());
    
        cout << endl;
    
        cout << "height = " << height(A.get()) << endl;
    
    	return 0;
    }
    

    Output
    Height wise traversal
    A1
    B2
    C2
    D3
    E3
    F3
    G4

    Height = 4

    Reasons for std::move() and std::forward()

    when to use what?

    rvalue references – use move()

    universal references – use forward()

    universal references

    In order to understand both std::move() and std::forward() it is best to understand the idea of a universal reference. An example of a universal reference is a templated rvalue. e.g.

    template<typename T>
    void wrapper(T&& arg)

    It will bind to any type.

    Scott Meyers give a great overview of this.

    https://isocpp.org/blog/2012/11/universal-references-in-c11-scott-meyers

    https://channel9.msdn.com/Shows/Going+Deep/Cpp-and-Beyond-2012-Scott-Meyers-Universal-References-in-Cpp11

    lvalue and rvalue-ness is separate from type!

    If you consider the function

    void fn(Widget&& w)

    The type is Widget&&, but w (because it has a name) is an lvalue. NB type and lvalue/rvalue -ness are independent.  So if you use w in this function you probably need to use std::move(), or the copy operations will be called.

    std::move() – tell the complier to invoke a move operation

    std::move() returns a rvalue reference and this binds to the move operation.

    template<typename T>
    remove_references(T)&& move(T&& t)

    move() uses a universal reference to bind to anything and returns an r-value, with any extra &&’s removed.

    It is equivalent to doing a static_cast to the r-value.

    In full the prototype is:

    template< class T >
    constexpr typename std::remove_reference::type&& move(T&& t) noexcept;

    And an example of the code is:

    template inline
    typename std::remove_reference::type&&
    move(_Ty&& _Arg)
    {
    return ((typename std::remove_reference::type&&)_Arg);
    }

    std::forward() – tell the complier to do perfect forwarding

    std::forward() is defined as

    template<typename T> T&& forward(remove_references(T)&& t) – (1)
    template<typename T> 
    T&& forward(remove_references(T)& t)  - (2)

    1 – This overload makes it possible to forward a result of an expression (such as function call), which may be rvalue or lvalue, as the original value category of a forwarding reference argument.

    However it does not do the correct thing for lvalue references hence we need the overload 2.

    2 – Forwards lvalues as lvalues references.

    The lvalue binds as a lvalue reference.. The template parameter is thus something like Widget&, and the return value Widget& && => Widget& (via reference collapsing rules) which is what is returned. [Editors note! This is what I understand is happening, but maybe I need to revisit this later!]

    how forward is used

    The forward() method is used in code like this.

    template<class T>
    void wrapper(T&& arg)
    {
    foo(arg);
    }

    Here we want arg forwarded as the same type it was bound. Here arg is an lvalue reference (as it has a name). This is where forward() comes to the rescue. forward() converts arg back to the original type.

    template<class T>
    void wrapper(T&& arg)
    {
    foo(forward(arg));
    }

    In full the prototype of forward is:

    template< class T >
    constexpr T&& forward(typename std::remove_reference::type& t) noexcept;

    template< class T >
    constexpr T&& forward(typename std::remove_reference::type&& t) noexcept;

    And an example of the code is:

    template <class T>
    inline T&& forward(typename std::remove_reference<T>::type& t) noexcept
    {
    return static_cast<T&&>(t);
    }

    template <class T>
    inline T&& forward(typename std::remove_reference<T>::type&& t) noexcept
    {
    static_assert(!std::is_lvalue_reference<T>::value,
    “Can not forward an rvalue as an lvalue.”);
    return static_cast<T&&>(t);
    }

    Speed difference of iterating forwards and backwards on a vector

    Preamble

    An interesting interview question came up. Is it quicker to iterate a vector forwards or backwards, or is it the same both way?

    Spoiler

    ——————————————————-
    Benchmark Time CPU Iterations
    ——————————————————-
    vectorbackwards 237 ns 237 ns 2715315
    vectorforwards  96 ns 96 ns 7125770

    Investigation

    Firstly wrote a simple program with;


    for(auto it = a.begin(); it != a.end(); ++it)
    {
    sum += (*it);
    }

    for(auto it = a.rbegin(); it != a.rend(); ++it)
    {
    sum += (*it);
    }

    And picking out the interesting code from the godbolt compiler code with -O2


    .L2:
    add esi, DWORD PTR [rax]
    add rax, 4
    cmp rax, rdx
    jne .L2
    ...
    .L3:
    add esi, DWORD PTR [rax]
    sub rax, 4
    cmp rdx, rax
    jne .L3

    It is obvious from this code the compile only does a ‘sub’ rather than an ‘add’ when going over the loop.
    So job done? Iterating both forward and backward is almost identical code? So same speed? Well the compiler thinks so. But wait on! What about cache!

    I decided to run the test using google benchmark.

    2018-05-08 11:59:48
    Running /Users/brombo/git/interviews/benchtest/Debug/benchtest
    Run on (8 X 2500 MHz CPU s)
    CPU Caches:
    L1 Data 32K (x4)
    L1 Instruction 32K (x4)
    L2 Unified 262K (x4)
    L3 Unified 6291K (x1)
    ——————————————————-
    Benchmark Time CPU Iterations
    ——————————————————-
    vectorbackwards 237 ns 237 ns 2715315
    vectorforwards  96 ns 96 ns 7125770

    Benchmark code:

    #include <benchmark/benchmark.h>
    #include
    #include

    using namespace std;

    static int sum = 0;

    static void vectorbackwards(benchmark::State& state)
    {
    vector a{1,2,3,4,5,6,7,8,9,10};

    for (auto _ : state)
    {
    for(auto it = a.rbegin(); it != a.rend(); ++it)
    {
    sum += (*it);
    }
    }
    }

    BENCHMARK(vectorbackwards);

    static void vectorforwards(benchmark::State& state)
    {
    vector a{1,2,3,4,5,6,7,8,9,10};

    for (auto _ : state)
    {
    for(auto it = a.begin(); it != a.end(); ++it)
    {
    sum += (*it);
    }
    }
    }

    BENCHMARK(vectorforwards);

    BENCHMARK_MAIN();

    Conclusion

    Reverse iterating a vector is 2.5x slower than iterating forward, albeit the compiler generated code is almost identical. The effect has to be cache/cpu related.

    Caveat 

    I ran my test on a 2014 MacBook Pro, and it would be good to run the test on a later architecture. It does strike me that this area of CPU/Cache may have improved, but needs testing.

    Install google benchmark on a Mid 2014 MacBook Pro

    preamble

    How do I work out how fast a piece of code is? You can make reasonable guesses, but the only way you ever know is to actually measure it. So decided to install google benchmark.

    installation

    Installing google benchmark is fairly straight forward.  Google benchmark can be obtained from github. https://github.com/google/benchmark

    I am building on a Mid 2014, MacBook Pro and it didn’t build straight off.  In CMakeLists.txt the entry


    # ICC17u2: overloaded virtual function "benchmark::Fixture::SetUp" is only partially overridden
    # (because of deprecated overload)
    add_cxx_compiler_flag(-wd654)

    seemed to cause a problem.
    My compile doesn’t seem to know about -wd654, so commenting out this line allowed the installation to continue. I am not too worried about it as I believe the comment ICC is referring to the intel compiler and I’m using clang.

    There appears to be a thread on this at:

    https://github.com/google/benchmark/issues/354

    My compilation did stop again, but this was due to gtest not being installed. This was straight forward to fix.  Build and install a version of gtest.

    Conclusion

    Google benchmark seems to build on a 2014 MacBook Pro, with minor issues.

    Cost of virtual function – 2

    Preamble

    Previously I stated that the cost of using a virtual function has gone down, but I have been doing a bit more thinking on this and feel I should add a further comment.

    Virtual functions are still not great

    Yes the individual cost of calling a virtual function has gone down dramatically with faster ram, caches and cpu branch prediction and architecture (its about 2x the speed of a standard function), but the real cost of a virtual function appears to lie in what it actually does to the compilation of the code. Placing a virtual function call in the code stops the compiler from inlining/optimizing the code at that point.

    It is probably simpler to consider, would you consider adding an ‘if’ to you code if it isn’t really needed. It 1) adds complexity, 2) increases latency. A virtual function does exactly the same.

    Functionality provided by language features

    Virtual functions provide a lot of functionality when writing the code but this can often be achieved using other C++ language features.

    virtual functions – decoupling, delegation, code reuse, common interface variable.

    functions – delegation.

    classes – decoupling.

    templates – code reuse.

    type erasure – common interface variable.

    So it might be worth considering using only the required language feature at the required time, in order to write cleaner code, and avoid the proliferation of class hierarchies in code.

    A nice example of writing better code without virtual functions is shown here, I stole some of his ideas in this blog entry! :

    Conclusion

    Virtual functions are useful for runtime polymorphism, but often this is not required and really what is required is code reuse (templates). Virtual functions provide a lot of functionality, often more than needed.

    Having said that, if you need a virtual function, don’t be afraid to use one! Its a wonderful feature of the language.

    We are all STILL learning to write code!!!

    Cost of virtual function

    Preamble

    Speed currently is about 20ns but was an order of magnitude different a few years ago. Things change!

    Information

    Doing some googling on the subject you can find older pieces that specify that virtual functions as very in-efficent (circa 1995) mainly due to the compiler having to throw out the instruction cache and restart each time. This however becomes less evident using a super scalar architecture and really we are then left with whether the instructions are in the code cache or not, which is probably at worst in L3 cache unless your code is large.

    from

    http://ithare.com/infographics-operation-costs-in-cpu-clock-cycles/

    http://ithare.com/wp-content/uploads/part101_infographics_v06.png

    And if you want to read a LOT about optimisations.

    http://www.agner.org/optimize/optimizing_cpp.pdf

    Conclusion

    Virtual functions aren’t the bottleneck they once were, but as still not as fast as direct function calls. Really depends in what problem domain you are working. <100us, use a virtual function.  <10us you have to be a bit more careful. But a better design will always trump everything.