Simple producer/consumer queue

Preamble
A very simple example of a single producer/consumer using locks and move operations.

Code
[js]
template
class ProdCon
{
public:
void push(T&& w_)
{
scoped_lock lock{_mtx};
if (_q.empty())
{
_cv.notify_one();
}
_q.emplace(move(w_));
}

T pop()
{
unique_lock lock{_mtx};
_cv.wait(lock, [this](){ return !_q.empty(); } );
auto& w = _q.front();
_q.pop();
return move(w);
}
private:
queue _q;
mutex _mtx;
condition_variable _cv;
};
[/js]

Points to note are:

  • Objects of type T are only ever moved.
  • The queue is only signalled when it is empty for efficiency reasons.
  • Full code
    [js]
    #include
    #include
    #include
    #include #include
    using namespace std;

    template
    class ProdCon
    {
    public:
    void push(T&& w_)
    {
    scoped_lock lock{_mtx};
    if (_q.empty())
    {
    _cv.notify_one();
    }
    _q.emplace(move(w_));
    }

    T pop()
    {
    unique_lock lock{_mtx};
    _cv.wait(lock, [this](){ return !_q.empty(); } );
    auto& w = _q.front();
    _q.pop();
    return move(w);
    }
    private:
    queue _q;
    mutex _mtx;
    condition_variable _cv;
    };

    //——————-TestCode———————–

    static int widgetNum = 1;

    class Widget
    {
    public:
    Widget() : _widgetNum{widgetNum++} {}
    Widget(const Widget&) = delete;
    Widget& operator=(const Widget&) = delete;
    Widget(Widget&&) = default;
    Widget& operator=(Widget&&) = default;

    int num() { return _widgetNum; }

    private:
    int _widgetNum;
    };

    void staticTest()
    {
    cout << "staticTest" << endl; widgetNum = 1; Widget w1, w2, w3, w4; ProdCon pc;

    pc.push(move(w1));
    pc.push(move(w2));

    cout << pc.pop().num() << endl; cout << pc.pop().num() << endl; pc.push(move(w3)); cout << pc.pop().num() << endl; pc.push(move(w4)); cout << pc.pop().num() << endl; } void dynamicTest() { cout << "dynamicTest" << endl; widgetNum = 1; Widget w1, w2, w3, w4; ProdCon pc;

    auto prod = [&](){ pc.push(move(w1)); pc.push(move(w2)); pc.push(move(w3)); pc.push(move(w4)); };
    auto con = [&](){ for(int i = 1; i<=4; ++i) {cout << pc.pop().num() << endl;}; }; auto fut = async(launch::async, con); async(launch::async, prod); //wait for consumer fut.get(); } int main() { staticTest(); dynamicTest(); return 0; } [/js]

    Synology 918+ Memory J3455

    Preamble
    Something doesn’t make sense with the J3455 and the memory configurations. What does it support?
    Discussion
    The synology NAS 918+ is reported to support only 8GB of memory, which ties in nicely with intel’s specification.
    https://ark.intel.com/products/95594/Intel-Celeron-Processor-J3455-2M-Cache-up-to-2_3-GHz
    There is talk that this is 8×2(channels) but I don’t really believe that at this moment as it has been reported some people have run 32GB on this NAS.
    Investigating the hardware by examining /proc/cpuinfo
    reveals:
    address sizes : 39 bits physical, 48 bits virtual
    Which seems to suggest that the cpu can address up to 512GB (39 bits) of memory.
    As you can buy big DDR3 modules, the interface specification isn’t a restriction, and no-one would design a board to restrict the memory addressing without good reason.
    So my feeling are that possibly the J3455 can address up to 512GB, has only been tested up to 8GB, in hardware QA test before shipping each chip. (Or even possibly CPU internally is only optimised for 8GB, but that I suspect is a longer shot.)
    So dependent on how luck you are, your CPU may/may not work for a larger memory configuration. As they say you mileage may vary…

    Strict aliasing

    preamble
    Strict aliasing was introduced in C++98, but can be a gotcha. Strict pointer aliasing is allowed by the C++ standard, and is where the compiler is allowed to decide two pointers don’t point at the same object if they’re of unrelated types. e.g.
    [js]
    uint32_t data32;
    uint16_t *pData16 = reinterpret_cast(&data32);
    uint16_t first = pData16[0];
    uint16_t second = pData16[1];
    [/js]
    Here ‘first’ and ‘second’ may not point to what you expect.

    Discussion
    The compiler is at liberty to assume pointers to different types will not affect one another, in order to allow for more efficient code generation.
    The solution to the above code is to to either use

  • A union
    [js]
    union {
    uint32_t msg;
    unsigned uint16_t asBuffer[2];
    };
    [/js]
  • Disable strict aliasing of the compiler -f[no-]strict-aliasing in gcc
  • Use char*, as char* is an exception to the rule – the specification assumed that char* always aliases other types so you will not fall into the strict aliasing optimisation.
    [js]
    uint16_t *pData16 = reinterpret_cast(reinterpret_cast(&data32));
    [/js]
    However adding a comment to the code for this will definitely be necessary as someone may remove you char* cast if they don’t know the rules.
  • Epilog
    After stating the above it must be noted that most likely the example given in the ‘preamble’ will work! It is however in undefined behaviour territory. GCC around 3.x seems to follow strict aliasing rules and you will probably not get what you expect, but later versions seem to follow the coders intent. I however have been testing on ‘clang’ and it appears not to generate any aliasing warning outputs.
    My feeling are now in 2018, you probably will get away with this bad code, or pick it up in debugging, but undefined behaviour is still undefined behaviour, and best to be avoided. I think this issue came more to the front due to gcc 3.x, following the letter of the law (c++ standard) rather than the programmer’s intent. I am not condoning writing non-compliant code!

    Reference
    https://cellperformance.beyond3d.com/articles/2006/06/understanding-strict-aliasing.html
    https://stackoverflow.com/questions/98650/what-is-the-strict-aliasing-rule
    http://en.cppreference.com/w/cpp/language/reinterpret_cast
    https://stackoverflow.com/questions/34527729/why-compilers-no-longer-optimize-this-ub-with-strict-aliasing

    In-place reverse a char* string, in-place reverse word order of char* string

    Premable
    As a pure coding exercise here are examples of code to reverse a string and reverse the word order.

    In-place reverse a string
    The reversal of the string using the xor trick. https://en.wikipedia.org/wiki/XOR_swap_algorithm
    [js]
    //In-place reverse
    void reverse(char* s)
    {
    char* forwards = s;
    char* backwards = s;

    for(; *backwards != ‘\0’; ++backwards);
    –backwards;
    while(forwards < backwards) { *forwards = *forwards ^ *backwards; *backwards = *forwards ^ *backwards; *forwards = *forwards ^ *backwards; ++forwards; --backwards; } } [/js] This code does not use additional memory, but is horrendously slow, as not only accessing individual bytes but uses bitwise operations as well. This solution is not recommended for any large processor, possibly only useful in embedded systems using 8-bit micro-controller. CPU much prefer working in bigger byte blocks. Better to use SSE registers, see example: https://stackoverflow.com/questions/19627181/reverse-a-string-using-sse

    Generally would rather use stl as this reduces the level of complexity, and gives any performance benefits stl can give.
    [js]
    void reverse2(char* s)
    {
    char* sEnd = s + strlen(s);
    std::reverse(s, sEnd);
    }
    [/js]

    In-place reverse word order
    This consists of reversing the entire string, then iterating the string again reversing the individual words which are separated by spaces, hence making the words forward facing and readable again.

    [js]
    //In-place reverse words
    void reverseWords(char* s)
    {
    reverse(s);

    char* b = s;
    char* e = s;

    for(;*e != ‘\0’; ++e)
    {
    if(*e == ‘ ‘)
    {
    *e = ‘\0’;
    reverse(b);
    *e = ‘ ‘;
    b = e;
    ++b;
    }
    }

    reverse(b);
    }
    [/js]

    Code in full
    [js]
    #include
    #include
    #include

    using namespace std;

    //In-place reverse
    void reverse(char* s)
    {
    char* forwards = s;
    char* backwards = s;

    for(; *backwards != ‘\0’; ++backwards);
    –backwards;
    while(forwards < backwards) { *forwards = *forwards ^ *backwards; *backwards = *forwards ^ *backwards; *forwards = *forwards ^ *backwards; ++forwards; --backwards; } } //In-place reverse words void reverseWords(char* s) { reverse(s); char* b = s; char* e = s; for(;*e != '\0'; ++e) { if(*e == ' ') { *e = '\0'; reverse(b); *e = ' '; b = e; ++b; } } reverse(b); } void reverse2(char* s) { char* sEnd = s + strlen(s); std::reverse(s, sEnd); } int main() { { char s[100]; strcpy(s, "abcdefghi"); cout << s << endl; reverse(s); cout << s << endl; reverse2(s); cout << s << endl; } { char s[100]; strcpy(s, "The quick brown fox"); cout << s << endl; reverseWords(s); cout << s << endl; } return 0; } [/js]

    Reverse a queue

    Preamble
    Reverse a singularly linked list using recursion.

    Solution
    Its trivial code, but nice now by passing the pointer to head_->_next to ReverseInternal() stores this pointer in the parameter of the call.

    Code:
    [js]
    Node* ReverseInternal(Node *head_)
    {
    if (head_->_next == nullptr)
    {
    return head_;
    }

    Node* newHead = ReverseInternal(head_->_next);
    head_->_next->_next = head_;

    return newHead;
    }

    Node* Reverse(Node *head_)
    {
    if (head_ != nullptr)
    {
    Node* newHead = ReverseInternal(head_);
    head_->_next = nullptr;
    return newHead;
    }
    else
    {
    return head_;
    }
    }
    [/js]

    Full solution

    Code:
    [js]
    #include
    using namespace std;

    class Node
    {
    public:
    Node(string data_, Node* next_) : _data(data_), _next(next_) {}
    string _data;
    Node* _next;
    };

    [js]
    Node* ReverseInternal(Node *head_)
    {
    if (head_->_next == nullptr)
    {
    return head_;
    }

    Node* newHead = ReverseInternal(head_->_next);
    head_->_next->_next = head_;

    return newHead;
    }

    Node* Reverse(Node *head_)
    {
    if (head_ != nullptr)
    {
    Node* newHead = ReverseInternal(head_);
    head_->_next = nullptr;
    return newHead;
    }
    else
    {
    return head_;
    }
    }

    void printList(Node* list_)
    {
    Node* node = list_;
    while(node != nullptr)
    {
    cout << node->_data << " "; node = node->_next;
    }

    cout << endl; } int main() { unique_ptr D = make_unique(“D”, nullptr);
    unique_ptr C = make_unique(“C”, D.get());
    unique_ptr B = make_unique(“B”, C.get());
    unique_ptr A = make_unique(“A”, B.get());

    printList(A.get());

    Node* list = Reverse(A.get());

    printList(list);

    return 0;
    }
    [/js]

    Tree traversals

    Preamble
    None standard tree traversals. Determine the height of a tree, and height order tree traversal. Often used as interview questions.

    Determine height of a tree
    Iterate to the bottom of the tree and on returning return the max height of each branch, add one for this current level.

    Code:
    [js]
    int height(Node* node_)
    {
    if (node_ == nullptr)
    {
    return 0;
    }

    int left = height(node_->_left);
    int right = height(node_->_right);

    return max(left, right) + 1;
    }
    [/js]

    Height order traversal
    Iterate over all nodes by using a queue to push each level into the queue as it occurs.

    Code:
    [js]
    queue d;

    void heightWiseTransversal(Node* node_)
    {
    if (node_ == nullptr)
    {
    return;
    }

    cout << node_->_data << endl; d.push(node_->_left);
    d.push(node_->_right);

    Node* frontNode1 = d.front();
    d.pop();
    heightWiseTransversal(frontNode1);

    Node* frontNode2 = d.front();
    d.pop();
    heightWiseTransversal(frontNode2);
    }
    [/js]

    Full code
    Just in case you want to run it.

    [js]
    #include
    #include
    #include
    using namespace std;

    class Node
    {
    public:
    Node(string data_, Node* left_, Node* right_) : _data(data_), _left(left_), _right(right_) {}
    string _data;
    Node* _left;
    Node* _right;
    };

    queue d;

    void heightWiseTransversal(Node* node_)
    {
    if (node_ == nullptr)
    {
    return;
    }

    cout << node_->_data << endl; d.push(node_->_left);
    d.push(node_->_right);

    Node* frontNode1 = d.front();
    d.pop();
    heightWiseTransversal(frontNode1);

    Node* frontNode2 = d.front();
    d.pop();
    heightWiseTransversal(frontNode2);
    }

    int height(Node* node_)
    {
    if (node_ == nullptr)
    {
    return 0;
    }

    int left = height(node_->_left);
    int right = height(node_->_right);

    return max(left, right) + 1;
    }

    int main()
    {
    //Note I keep lifetime management separate from functionality!

    unique_ptr G = make_unique(“G4”, nullptr, nullptr);
    unique_ptr F = make_unique(“F3”, nullptr, nullptr);
    unique_ptr E = make_unique(“E3”, nullptr, nullptr);
    unique_ptr D = make_unique(“D3”, G.get(), nullptr);
    unique_ptr C = make_unique(“C2”, E.get(), F.get());
    unique_ptr B = make_unique(“B2”, D.get(), nullptr);
    unique_ptr A = make_unique(“A1”, B.get(), C.get());

    cout << "Height wise traversal" << endl; heightWiseTransversal(A.get()); cout << endl; cout << "height = " << height(A.get()) << endl; return 0; } [/js] Output
    Height wise traversal
    A1
    B2
    C2
    D3
    E3
    F3
    G4

    Height = 4

    Reasons for std::move() and std::forward()

    when to use what?

    rvalue references – use move()

    universal references – use forward()

    universal references

    In order to understand both std::move() and std::forward() it is best to understand the idea of a universal reference. An example of a universal reference is a templated rvalue. e.g.

    template<typename T>
    void wrapper(T&& arg)

    It will bind to any type.

    Scott Meyers give a great overview of this.

    https://isocpp.org/blog/2012/11/universal-references-in-c11-scott-meyers

    https://channel9.msdn.com/Shows/Going+Deep/Cpp-and-Beyond-2012-Scott-Meyers-Universal-References-in-Cpp11

    lvalue and rvalue-ness is separate from type!

    If you consider the function

    void fn(Widget&& w)

    The type is Widget&&, but w (because it has a name) is an lvalue. NB type and lvalue/rvalue -ness are independent.  So if you use w in this function you probably need to use std::move(), or the copy operations will be called.

    std::move() – tell the complier to invoke a move operation

    std::move() returns a rvalue reference and this binds to the move operation.

    template<typename T>
    remove_references(T)&& move(T&& t)

    move() uses a universal reference to bind to anything and returns an r-value, with any extra &&’s removed.

    It is equivalent to doing a static_cast to the r-value.

    In full the prototype is:

    template< class T >
    constexpr typename std::remove_reference::type&& move(T&& t) noexcept;

    And an example of the code is:

    template inline
    typename std::remove_reference::type&&
    move(_Ty&& _Arg)
    {
    return ((typename std::remove_reference::type&&)_Arg);
    }

    std::forward() – tell the complier to do perfect forwarding

    std::forward() is defined as

    template<typename T> T&& forward(remove_references(T)&& t) – (1)
    template<typename T> 
    T&& forward(remove_references(T)& t)  - (2)

    1 – This overload makes it possible to forward a result of an expression (such as function call), which may be rvalue or lvalue, as the original value category of a forwarding reference argument.

    However it does not do the correct thing for lvalue references hence we need the overload 2.

    2 – Forwards lvalues as lvalues references.

    The lvalue binds as a lvalue reference.. The template parameter is thus something like Widget&, and the return value Widget& && => Widget& (via reference collapsing rules) which is what is returned. [Editors note! This is what I understand is happening, but maybe I need to revisit this later!]

    how forward is used

    The forward() method is used in code like this.

    template<class T>
    void wrapper(T&& arg)
    {
    foo(arg);
    }

    Here we want arg forwarded as the same type it was bound. Here arg is an lvalue reference (as it has a name). This is where forward() comes to the rescue. forward() converts arg back to the original type.

    template<class T>
    void wrapper(T&& arg)
    {
    foo(forward(arg));
    }

    In full the prototype of forward is:

    template< class T >
    constexpr T&& forward(typename std::remove_reference::type& t) noexcept;

    template< class T >
    constexpr T&& forward(typename std::remove_reference::type&& t) noexcept;

    And an example of the code is:

    template <class T>
    inline T&& forward(typename std::remove_reference<T>::type& t) noexcept
    {
    return static_cast<T&&>(t);
    }

    template <class T>
    inline T&& forward(typename std::remove_reference<T>::type&& t) noexcept
    {
    static_assert(!std::is_lvalue_reference<T>::value,
    “Can not forward an rvalue as an lvalue.”);
    return static_cast<T&&>(t);
    }

    Speed difference of iterating forwards and backwards on a vector

    Preamble

    An interesting interview question came up. Is it quicker to iterate a vector forwards or backwards, or is it the same both way?

    Spoiler

    ——————————————————-
    Benchmark Time CPU Iterations
    ——————————————————-
    vectorbackwards 237 ns 237 ns 2715315
    vectorforwards  96 ns 96 ns 7125770

    Investigation

    Firstly wrote a simple program with;


    for(auto it = a.begin(); it != a.end(); ++it)
    {
    sum += (*it);
    }

    for(auto it = a.rbegin(); it != a.rend(); ++it)
    {
    sum += (*it);
    }

    And picking out the interesting code from the godbolt compiler code with -O2


    .L2:
    add esi, DWORD PTR [rax]
    add rax, 4
    cmp rax, rdx
    jne .L2
    ...
    .L3:
    add esi, DWORD PTR [rax]
    sub rax, 4
    cmp rdx, rax
    jne .L3

    It is obvious from this code the compile only does a ‘sub’ rather than an ‘add’ when going over the loop.
    So job done? Iterating both forward and backward is almost identical code? So same speed? Well the compiler thinks so. But wait on! What about cache!

    I decided to run the test using google benchmark.

    2018-05-08 11:59:48
    Running /Users/brombo/git/interviews/benchtest/Debug/benchtest
    Run on (8 X 2500 MHz CPU s)
    CPU Caches:
    L1 Data 32K (x4)
    L1 Instruction 32K (x4)
    L2 Unified 262K (x4)
    L3 Unified 6291K (x1)
    ——————————————————-
    Benchmark Time CPU Iterations
    ——————————————————-
    vectorbackwards 237 ns 237 ns 2715315
    vectorforwards  96 ns 96 ns 7125770

    Benchmark code:

    #include <benchmark/benchmark.h>
    #include
    #include

    using namespace std;

    static int sum = 0;

    static void vectorbackwards(benchmark::State& state)
    {
    vector a{1,2,3,4,5,6,7,8,9,10};

    for (auto _ : state)
    {
    for(auto it = a.rbegin(); it != a.rend(); ++it)
    {
    sum += (*it);
    }
    }
    }

    BENCHMARK(vectorbackwards);

    static void vectorforwards(benchmark::State& state)
    {
    vector a{1,2,3,4,5,6,7,8,9,10};

    for (auto _ : state)
    {
    for(auto it = a.begin(); it != a.end(); ++it)
    {
    sum += (*it);
    }
    }
    }

    BENCHMARK(vectorforwards);

    BENCHMARK_MAIN();

    Conclusion

    Reverse iterating a vector is 2.5x slower than iterating forward, albeit the compiler generated code is almost identical. The effect has to be cache/cpu related.

    Caveat 

    I ran my test on a 2014 MacBook Pro, and it would be good to run the test on a later architecture. It does strike me that this area of CPU/Cache may have improved, but needs testing.

    Install google benchmark on a Mid 2014 MacBook Pro

    preamble

    How do I work out how fast a piece of code is? You can make reasonable guesses, but the only way you ever know is to actually measure it. So decided to install google benchmark.

    installation

    Installing google benchmark is fairly straight forward.  Google benchmark can be obtained from github. https://github.com/google/benchmark

    I am building on a Mid 2014, MacBook Pro and it didn’t build straight off.  In CMakeLists.txt the entry


    # ICC17u2: overloaded virtual function "benchmark::Fixture::SetUp" is only partially overridden
    # (because of deprecated overload)
    add_cxx_compiler_flag(-wd654)

    seemed to cause a problem.
    My compile doesn’t seem to know about -wd654, so commenting out this line allowed the installation to continue. I am not too worried about it as I believe the comment ICC is referring to the intel compiler and I’m using clang.

    There appears to be a thread on this at:

    https://github.com/google/benchmark/issues/354

    My compilation did stop again, but this was due to gtest not being installed. This was straight forward to fix.  Build and install a version of gtest.

    Conclusion

    Google benchmark seems to build on a 2014 MacBook Pro, with minor issues.