Blog – Page 2 – Brombo Software development

30th May 2018

Simple producer/consumer queue

Preamble
A very simple example of a single producer/consumer using locks and move operations.

Code
[js]
template
class ProdCon
{
public:
void push(T&& w_)
{
scoped_lock lock{_mtx};
if (_q.empty())
{
_cv.notify_one();
}
_q.emplace(move(w_));
}

T pop()
{
unique_lock lock{_mtx};
_cv.wait(lock, [this](){ return !_q.empty(); } );
auto& w = _q.front();
_q.pop();
return move(w);
}
private:
queue _q;
mutex _mtx;
condition_variable _cv;
};
[/js]

Points to note are:

Objects of type T are only ever moved.

The queue is only signalled when it is empty for efficiency reasons.

Full code
[js]
#include
#include
#include
#include #include
using namespace std;

template
class ProdCon
{
public:
void push(T&& w_)
{
scoped_lock lock{_mtx};
if (_q.empty())
{
_cv.notify_one();
}
_q.emplace(move(w_));
}

T pop()
{
unique_lock lock{_mtx};
_cv.wait(lock, [this](){ return !_q.empty(); } );
auto& w = _q.front();
_q.pop();
return move(w);
}
private:
queue _q;
mutex _mtx;
condition_variable _cv;
};

//——————-TestCode———————–

static int widgetNum = 1;

class Widget
{
public:
Widget() : _widgetNum{widgetNum++} {}
Widget(const Widget&) = delete;
Widget& operator=(const Widget&) = delete;
Widget(Widget&&) = default;
Widget& operator=(Widget&&) = default;

int num() { return _widgetNum; }

private:
int _widgetNum;
};

void staticTest()
{
cout << "staticTest" << endl; widgetNum = 1; Widget w1, w2, w3, w4; ProdCon pc;

pc.push(move(w1));
pc.push(move(w2));

cout << pc.pop().num() << endl; cout << pc.pop().num() << endl; pc.push(move(w3)); cout << pc.pop().num() << endl; pc.push(move(w4)); cout << pc.pop().num() << endl; } void dynamicTest() { cout << "dynamicTest" << endl; widgetNum = 1; Widget w1, w2, w3, w4; ProdCon pc;

auto prod = [&](){ pc.push(move(w1)); pc.push(move(w2)); pc.push(move(w3)); pc.push(move(w4)); };
auto con = [&](){ for(int i = 1; i<=4; ++i) {cout << pc.pop().num() << endl;}; }; auto fut = async(launch::async, con); async(launch::async, prod); //wait for consumer fut.get(); } int main() { staticTest(); dynamicTest(); return 0; } [/js]

29th May 201829th May 2018

Synology 918+ Memory J3455

Preamble
Something doesn’t make sense with the J3455 and the memory configurations. What does it support?
Discussion
The synology NAS 918+ is reported to support only 8GB of memory, which ties in nicely with intel’s specification.
https://ark.intel.com/products/95594/Intel-Celeron-Processor-J3455-2M-Cache-up-to-2_3-GHz
There is talk that this is 8×2(channels) but I don’t really believe that at this moment as it has been reported some people have run 32GB on this NAS.
Investigating the hardware by examining /proc/cpuinfo
reveals:
address sizes : 39 bits physical, 48 bits virtual
Which seems to suggest that the cpu can address up to 512GB (39 bits) of memory.
As you can buy big DDR3 modules, the interface specification isn’t a restriction, and no-one would design a board to restrict the memory addressing without good reason.
So my feeling are that possibly the J3455 can address up to 512GB, has only been tested up to 8GB, in hardware QA test before shipping each chip. (Or even possibly CPU internally is only optimised for 8GB, but that I suspect is a longer shot.)
So dependent on how luck you are, your CPU may/may not work for a larger memory configuration. As they say you mileage may vary…

29th May 201829th May 2018

Strict aliasing

preamble
Strict aliasing was introduced in C++98, but can be a gotcha. Strict pointer aliasing is allowed by the C++ standard, and is where the compiler is allowed to decide two pointers don’t point at the same object if they’re of unrelated types. e.g.
[js]
uint32_t data32;
uint16_t *pData16 = reinterpret_cast(&data32);
uint16_t first = pData16[0];
uint16_t second = pData16[1];
[/js]
Here ‘first’ and ‘second’ may not point to what you expect.

Discussion
The compiler is at liberty to assume pointers to different types will not affect one another, in order to allow for more efficient code generation.
The solution to the above code is to to either use

A union
[js]
union {
uint32_t msg;
unsigned uint16_t asBuffer[2];
};
[/js]

Disable strict aliasing of the compiler -f[no-]strict-aliasing in gcc

Use char*, as char* is an exception to the rule – the specification assumed that char* always aliases other types so you will not fall into the strict aliasing optimisation.
[js]
uint16_t *pData16 = reinterpret_cast(reinterpret_cast(&data32));
[/js]
However adding a comment to the code for this will definitely be necessary as someone may remove you char* cast if they don’t know the rules.

Epilog
After stating the above it must be noted that most likely the example given in the ‘preamble’ will work! It is however in undefined behaviour territory. GCC around 3.x seems to follow strict aliasing rules and you will probably not get what you expect, but later versions seem to follow the coders intent. I however have been testing on ‘clang’ and it appears not to generate any aliasing warning outputs.
My feeling are now in 2018, you probably will get away with this bad code, or pick it up in debugging, but undefined behaviour is still undefined behaviour, and best to be avoided. I think this issue came more to the front due to gcc 3.x, following the letter of the law (c++ standard) rather than the programmer’s intent. I am not condoning writing non-compliant code!

Reference
https://cellperformance.beyond3d.com/articles/2006/06/understanding-strict-aliasing.html
https://stackoverflow.com/questions/98650/what-is-the-strict-aliasing-rule
http://en.cppreference.com/w/cpp/language/reinterpret_cast
https://stackoverflow.com/questions/34527729/why-compilers-no-longer-optimize-this-ub-with-strict-aliasing

24th May 201824th May 2018

In-place reverse a char* string, in-place reverse word order of char* string

Premable
As a pure coding exercise here are examples of code to reverse a string and reverse the word order.

In-place reverse a string
The reversal of the string using the xor trick. https://en.wikipedia.org/wiki/XOR_swap_algorithm
[js]
//In-place reverse
void reverse(char* s)
{
char* forwards = s;
char* backwards = s;

for(; *backwards != ‘\0’; ++backwards);
–backwards;
while(forwards < backwards) { *forwards = *forwards ^ *backwards; *backwards = *forwards ^ *backwards; *forwards = *forwards ^ *backwards; ++forwards; --backwards; } } [/js] This code does not use additional memory, but is horrendously slow, as not only accessing individual bytes but uses bitwise operations as well. This solution is not recommended for any large processor, possibly only useful in embedded systems using 8-bit micro-controller. CPU much prefer working in bigger byte blocks. Better to use SSE registers, see example: https://stackoverflow.com/questions/19627181/reverse-a-string-using-sse

Generally would rather use stl as this reduces the level of complexity, and gives any performance benefits stl can give.
[js]
void reverse2(char* s)
{
char* sEnd = s + strlen(s);
std::reverse(s, sEnd);
}
[/js]

In-place reverse word order
This consists of reversing the entire string, then iterating the string again reversing the individual words which are separated by spaces, hence making the words forward facing and readable again.

[js]
//In-place reverse words
void reverseWords(char* s)
{
reverse(s);

char* b = s;
char* e = s;

for(;*e != ‘\0’; ++e)
{
if(*e == ‘ ‘)
{
*e = ‘\0’;
reverse(b);
*e = ‘ ‘;
b = e;
++b;
}
}

reverse(b);
}
[/js]

Code in full
[js]
#include
#include
#include

using namespace std;

//In-place reverse
void reverse(char* s)
{
char* forwards = s;
char* backwards = s;

for(; *backwards != ‘\0’; ++backwards);
–backwards;
while(forwards < backwards) { *forwards = *forwards ^ *backwards; *backwards = *forwards ^ *backwards; *forwards = *forwards ^ *backwards; ++forwards; --backwards; } } //In-place reverse words void reverseWords(char* s) { reverse(s); char* b = s; char* e = s; for(;*e != '\0'; ++e) { if(*e == ' ') { *e = '\0'; reverse(b); *e = ' '; b = e; ++b; } } reverse(b); } void reverse2(char* s) { char* sEnd = s + strlen(s); std::reverse(s, sEnd); } int main() { { char s[100]; strcpy(s, "abcdefghi"); cout << s << endl; reverse(s); cout << s << endl; reverse2(s); cout << s << endl; } { char s[100]; strcpy(s, "The quick brown fox"); cout << s << endl; reverseWords(s); cout << s << endl; } return 0; } [/js]

23rd May 20187th June 2018

Reverse a queue

Preamble
Reverse a singularly linked list using recursion.

Solution
Its trivial code, but nice now by passing the pointer to head_->_next to ReverseInternal() stores this pointer in the parameter of the call.

Code:
[js]
Node* ReverseInternal(Node *head_)
{
if (head_->_next == nullptr)
{
return head_;
}

Node* newHead = ReverseInternal(head_->_next);
head_->_next->_next = head_;

return newHead;
}

Node* Reverse(Node *head_)
{
if (head_ != nullptr)
{
Node* newHead = ReverseInternal(head_);
head_->_next = nullptr;
return newHead;
}
else
{
return head_;
}
}
[/js]

Full solution

Code:
[js]
#include
using namespace std;

class Node
{
public:
Node(string data_, Node* next_) : _data(data_), _next(next_) {}
string _data;
Node* _next;
};

[js]
Node* ReverseInternal(Node *head_)
{
if (head_->_next == nullptr)
{
return head_;
}

Node* newHead = ReverseInternal(head_->_next);
head_->_next->_next = head_;

return newHead;
}

Node* Reverse(Node *head_)
{
if (head_ != nullptr)
{
Node* newHead = ReverseInternal(head_);
head_->_next = nullptr;
return newHead;
}
else
{
return head_;
}
}

void printList(Node* list_)
{
Node* node = list_;
while(node != nullptr)
{
cout << node->_data << " "; node = node->_next;
}

cout << endl; } int main() { unique_ptr D = make_unique(“D”, nullptr);
unique_ptr C = make_unique(“C”, D.get());
unique_ptr B = make_unique(“B”, C.get());
unique_ptr A = make_unique(“A”, B.get());

printList(A.get());

Node* list = Reverse(A.get());

printList(list);

return 0;
}
[/js]

23rd May 2018

Posting C++ code to a WordPress blog

Preamble
Raw C++ doesn’t post well to WordPress. Needed to find a new way to show code.

Solution
After a little research decided to try Enlighter, which seems to have 9000+ active users.

https://en-gb.wordpress.org/plugins/enlighter/

So far quite impressed with the output.

23rd May 2018

Tree traversals

Preamble
None standard tree traversals. Determine the height of a tree, and height order tree traversal. Often used as interview questions.

Determine height of a tree
Iterate to the bottom of the tree and on returning return the max height of each branch, add one for this current level.

Code:
[js]
int height(Node* node_)
{
if (node_ == nullptr)
{
return 0;
}

int left = height(node_->_left);
int right = height(node_->_right);

return max(left, right) + 1;
}
[/js]

Height order traversal
Iterate over all nodes by using a queue to push each level into the queue as it occurs.

Code:
[js]
queue d;

void heightWiseTransversal(Node* node_)
{
if (node_ == nullptr)
{
return;
}

cout << node_->_data << endl; d.push(node_->_left);
d.push(node_->_right);

Node* frontNode1 = d.front();
d.pop();
heightWiseTransversal(frontNode1);

Node* frontNode2 = d.front();
d.pop();
heightWiseTransversal(frontNode2);
}
[/js]

Full code
Just in case you want to run it.

[js]
#include
#include
#include
using namespace std;

class Node
{
public:
Node(string data_, Node* left_, Node* right_) : _data(data_), _left(left_), _right(right_) {}
string _data;
Node* _left;
Node* _right;
};

queue d;

void heightWiseTransversal(Node* node_)
{
if (node_ == nullptr)
{
return;
}

cout << node_->_data << endl; d.push(node_->_left);
d.push(node_->_right);

Node* frontNode1 = d.front();
d.pop();
heightWiseTransversal(frontNode1);

Node* frontNode2 = d.front();
d.pop();
heightWiseTransversal(frontNode2);
}

int height(Node* node_)
{
if (node_ == nullptr)
{
return 0;
}

int left = height(node_->_left);
int right = height(node_->_right);

return max(left, right) + 1;
}

int main()
{
//Note I keep lifetime management separate from functionality!

unique_ptr G = make_unique(“G4”, nullptr, nullptr);
unique_ptr F = make_unique(“F3”, nullptr, nullptr);
unique_ptr E = make_unique(“E3”, nullptr, nullptr);
unique_ptr D = make_unique(“D3”, G.get(), nullptr);
unique_ptr C = make_unique(“C2”, E.get(), F.get());
unique_ptr B = make_unique(“B2”, D.get(), nullptr);
unique_ptr A = make_unique(“A1”, B.get(), C.get());

cout << "Height wise traversal" << endl; heightWiseTransversal(A.get()); cout << endl; cout << "height = " << height(A.get()) << endl; return 0; } [/js] Output
Height wise traversal
A1
B2
C2
D3
E3
F3
G4

Height = 4

18th May 201818th May 2018

Reasons for std::move() and std::forward()

when to use what?

rvalue references – use move()

universal references – use forward()

universal references

In order to understand both std::move() and std::forward() it is best to understand the idea of a universal reference. An example of a universal reference is a templated rvalue. e.g.

template<typename T>
void wrapper(T&& arg)

It will bind to any type.

Scott Meyers give a great overview of this.

https://isocpp.org/blog/2012/11/universal-references-in-c11-scott-meyers

https://channel9.msdn.com/Shows/Going+Deep/Cpp-and-Beyond-2012-Scott-Meyers-Universal-References-in-Cpp11

lvalue and rvalue-ness is separate from type!

If you consider the function

void fn(Widget&& w)

The type is Widget&&, but w (because it has a name) is an lvalue. NB type and lvalue/rvalue -ness are independent. So if you use w in this function you probably need to use std::move(), or the copy operations will be called.

std::move() – tell the complier to invoke a move operation

std::move() returns a rvalue reference and this binds to the move operation.
template<typename T> remove_references(T)&& move(T&& t)

move() uses a universal reference to bind to anything and returns an r-value, with any extra &&’s removed.

It is equivalent to doing a static_cast to the r-value.

In full the prototype is:
template< class T > constexpr typename std::remove_reference::type&& move(T&& t) noexcept;
And an example of the code is:
template inline typename std::remove_reference::type&& move(_Ty&& _Arg) { return ((typename std::remove_reference::type&&)_Arg); }

std::forward() – tell the complier to do perfect forwarding

std::forward() is defined as

template<typename T> T&& forward(remove_references(T)&& t) – (1)template<typename T> T&& forward(remove_references(T)& t) - (2)

1 – This overload makes it possible to forward a result of an expression (such as function call), which may be rvalue or lvalue, as the original value category of a forwarding reference argument.

However it does not do the correct thing for lvalue references hence we need the overload 2.

2 – Forwards lvalues as lvalues references.

The lvalue binds as a lvalue reference.. The template parameter is thus something like Widget&, and the return value Widget& && => Widget& (via reference collapsing rules) which is what is returned. [Editors note! This is what I understand is happening, but maybe I need to revisit this later!]

how forward is used

The forward() method is used in code like this.
template<class T> void wrapper(T&& arg) { foo(arg); }

Here we want arg forwarded as the same type it was bound. Here arg is an lvalue reference (as it has a name). This is where forward() comes to the rescue. forward() converts arg back to the original type.

template<class T> void wrapper(T&& arg) { foo(forward(arg)); }

In full the prototype of forward is:
template< class T > constexpr T&& forward(typename std::remove_reference::type& t) noexcept;

template< class T > constexpr T&& forward(typename std::remove_reference::type&& t) noexcept;
And an example of the code is:
template <class T> inline T&& forward(typename std::remove_reference<T>::type& t) noexcept { return static_cast<T&&>(t); }

template <class T>
inline T&& forward(typename std::remove_reference<T>::type&& t) noexcept
{
static_assert(!std::is_lvalue_reference<T>::value,
“Can not forward an rvalue as an lvalue.”);
return static_cast<T&&>(t);
}

8th May 20188th May 2018

Speed difference of iterating forwards and backwards on a vector

Preamble

An interesting interview question came up. Is it quicker to iterate a vector forwards or backwards, or is it the same both way?

Spoiler

——————————————————-
Benchmark Time CPU Iterations
——————————————————-
vectorbackwards 237 ns 237 ns 2715315
vectorforwards 96 ns 96 ns 7125770

Investigation

Firstly wrote a simple program with;

for(auto it = a.begin(); it != a.end(); ++it)
{
sum += (*it);
}

for(auto it = a.rbegin(); it != a.rend(); ++it)
{
sum += (*it);
}

And picking out the interesting code from the godbolt compiler code with -O2

.L2: add esi, DWORD PTR [rax] add rax, 4 cmp rax, rdx jne .L2 ... .L3: add esi, DWORD PTR [rax] sub rax, 4 cmp rdx, rax jne .L3

It is obvious from this code the compile only does a ‘sub’ rather than an ‘add’ when going over the loop.
So job done? Iterating both forward and backward is almost identical code? So same speed? Well the compiler thinks so. But wait on! What about cache!

I decided to run the test using google benchmark.

2018-05-08 11:59:48
Running /Users/brombo/git/interviews/benchtest/Debug/benchtest
Run on (8 X 2500 MHz CPU s)
CPU Caches:
L1 Data 32K (x4)
L1 Instruction 32K (x4)
L2 Unified 262K (x4)
L3 Unified 6291K (x1)
——————————————————-
Benchmark Time CPU Iterations
——————————————————-
vectorbackwards 237 ns 237 ns 2715315
vectorforwards 96 ns 96 ns 7125770

Benchmark code:

#include <benchmark/benchmark.h>
#include
#include

using namespace std;

static int sum = 0;

static void vectorbackwards(benchmark::State& state)
{
vector a{1,2,3,4,5,6,7,8,9,10};

for (auto _ : state)
{
for(auto it = a.rbegin(); it != a.rend(); ++it)
{
sum += (*it);
}
}
}

BENCHMARK(vectorbackwards);

static void vectorforwards(benchmark::State& state)
{
vector a{1,2,3,4,5,6,7,8,9,10};

for (auto _ : state)
{
for(auto it = a.begin(); it != a.end(); ++it)
{
sum += (*it);
}
}
}

BENCHMARK(vectorforwards);

BENCHMARK_MAIN();

Conclusion

Reverse iterating a vector is 2.5x slower than iterating forward, albeit the compiler generated code is almost identical. The effect has to be cache/cpu related.

Caveat

I ran my test on a 2014 MacBook Pro, and it would be good to run the test on a later architecture. It does strike me that this area of CPU/Cache may have improved, but needs testing.

8th May 20188th May 2018

Install google benchmark on a Mid 2014 MacBook Pro

preamble

How do I work out how fast a piece of code is? You can make reasonable guesses, but the only way you ever know is to actually measure it. So decided to install google benchmark.

installation

Installing google benchmark is fairly straight forward. Google benchmark can be obtained from github. https://github.com/google/benchmark

I am building on a Mid 2014, MacBook Pro and it didn’t build straight off. In CMakeLists.txt the entry

# ICC17u2: overloaded virtual function "benchmark::Fixture::SetUp" is only partially overridden # (because of deprecated overload) add_cxx_compiler_flag(-wd654)

seemed to cause a problem.
My compile doesn’t seem to know about -wd654, so commenting out this line allowed the installation to continue. I am not too worried about it as I believe the comment ICC is referring to the intel compiler and I’m using clang.

There appears to be a thread on this at:

https://github.com/google/benchmark/issues/354

My compilation did stop again, but this was due to gtest not being installed. This was straight forward to fix. Build and install a version of gtest.

Conclusion

Google benchmark seems to build on a 2014 MacBook Pro, with minor issues.