Cost of virtual function – 2

Preamble

Previously I stated that the cost of using a virtual function has gone down, but I have been doing a bit more thinking on this and feel I should add a further comment.

Virtual functions are still not great

Yes, the individual cost of calling a virtual function has gone down dramatically thanks to faster RAM, caches, branch prediction and CPU architecture (it's now roughly twice the cost of a standard function call), but the real cost of a virtual function appears to lie in what it does to the compilation of the code. Placing a virtual function call in the code stops the compiler from inlining and optimising at that point.

It is probably simpler to think of it this way: would you add an 'if' to your code if it wasn't really needed? It 1) adds complexity and 2) increases latency. A virtual function does exactly the same.
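
To make the inlining point concrete, here is a minimal sketch of my own (not from any particular codebase): the compiler can trivially inline the non-virtual call, but the call through the base pointer generally has to remain an indirect call because the target is only known at run time. (In a toy example like this the optimiser may still devirtualise; it cannot once the pointer genuinely comes from elsewhere.)

#include <cstdio>

struct Shape {
    virtual ~Shape() = default;
    virtual int area() const = 0;               // virtual: resolved through the vtable
};

struct Square : Shape {
    int side = 4;
    int area() const override { return side * side; }
};

struct FastSquare {
    int side = 4;
    int area() const { return side * side; }    // non-virtual: trivially inlined
};

int main() {
    Square s;
    Shape* p = &s;          // the call site only sees Shape*, so in general no inlining here
    FastSquare fs;
    std::printf("%d %d\n", p->area(), fs.area());
}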

Functionality provided by language features

Virtual functions provide a lot of functionality when writing code, but this can often be achieved using other C++ language features.

virtual functions – decoupling, delegation, code reuse, common interface variable.

functions – delegation.

classes – decoupling.

templates – code reuse.

type erasure – common interface variable.

So it might be worth considering using only the required language feature at the required time, in order to write cleaner code, and avoid the proliferation of class hierarchies in code.
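
As a rough sketch of what I mean (my own example, assuming a simple draw()-style interface), templates cover the code-reuse case and type erasure via std::function covers the "common interface variable" case, with no class hierarchy in sight:

#include <functional>
#include <iostream>
#include <vector>

// Code reuse via a template: works for any type with a draw() member,
// resolved at compile time, so the call can be inlined.
template <typename T>
void render(const T& item) { item.draw(); }

struct Circle { void draw() const { std::cout << "circle\n"; } };
struct Box    { void draw() const { std::cout << "box\n"; } };

int main() {
    render(Circle{});
    render(Box{});

    // A "common interface variable" via type erasure (std::function)
    // instead of a base-class pointer.
    std::vector<std::function<void()>> scene;
    scene.push_back([c = Circle{}] { c.draw(); });
    scene.push_back([b = Box{}]    { b.draw(); });
    for (const auto& draw : scene) draw();
}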

A nice example of writing better code without virtual functions is shown here; I stole some of his ideas for this blog entry!

Conclusion

Virtual functions are useful for runtime polymorphism, but often that is not what is actually required; really what is needed is code reuse (templates). Virtual functions provide a lot of functionality, often more than is needed.

Having said that, if you need a virtual function, don't be afraid to use one! It's a wonderful feature of the language.

We are all STILL learning to write code!!!

Cost of virtual function

Preamble

The cost of a virtual function call is currently around 20 ns, but it was an order of magnitude different a few years ago. Things change!

Information

Doing some googling on the subject, you can find older pieces (circa 1995) describing virtual functions as very inefficient, mainly due to the processor having to throw away its prefetched instructions and restart each time. This is much less evident on a superscalar architecture, and really we are then left with whether the instructions are in the code cache or not, which is probably at worst an L3 cache hit unless your code is large.

from

http://ithare.com/infographics-operation-costs-in-cpu-clock-cycles/

http://ithare.com/wp-content/uploads/part101_infographics_v06.png

And if you want to read a LOT about optimisations.

http://www.agner.org/optimize/optimizing_cpp.pdf

Conclusion

Virtual functions aren't the bottleneck they once were, but they are still not as fast as direct function calls. It really depends on the problem domain you are working in: with a latency budget under 100 us, just use a virtual function; under 10 us you have to be a bit more careful. But a better design will always trump everything.
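
If you want to put a number on it for your own hardware and compiler, something along these lines gives a rough idea (a sketch of my own, not a rigorous benchmark; the results depend heavily on optimisation flags, cache state and branch prediction):

#include <chrono>
#include <cstddef>
#include <cstdio>
#include <memory>
#include <vector>

struct Base { virtual ~Base() = default; virtual long f(long x) const = 0; };
struct A : Base { long f(long x) const override { return x + 1; } };
struct B : Base { long f(long x) const override { return x + 2; } };

static long direct(long x) { return x + 1; }

int main() {
    // A mix of two derived types so the compiler cannot simply devirtualise.
    std::vector<std::unique_ptr<Base>> objs;
    for (int i = 0; i < 1000; ++i)
        objs.push_back(i % 2 ? std::unique_ptr<Base>(new A) : std::unique_ptr<Base>(new B));

    const int iters = 10000;
    long long sum = 0;

    auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < iters; ++i)
        for (const auto& o : objs) sum += o->f(i);        // indirect (virtual) call
    auto t1 = std::chrono::steady_clock::now();
    for (int i = 0; i < iters; ++i)
        for (std::size_t j = 0; j < objs.size(); ++j)
            sum += direct(i);                             // inlinable call; the compiler may
                                                          // optimise this loop very heavily
    auto t2 = std::chrono::steady_clock::now();

    const double calls = double(iters) * objs.size();
    auto ns = [](auto a, auto b) {
        return std::chrono::duration_cast<std::chrono::nanoseconds>(b - a).count();
    };
    std::printf("virtual: %.2f ns/call, direct: %.2f ns/call (checksum %lld)\n",
                ns(t0, t1) / calls, ns(t1, t2) / calls, sum);
}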

Memory access times

Preamble

Looks like access times to main memory are getting faster…

The main article on StackOverflow has a lot more detail.

Core i7 Xeon 5500 Series

Core i7 Xeon 5500 Series Data Source Latency (approximate) [Pg. 22]

local L1 CACHE hit, ~4 cycles ( 2.1 – 1.2 ns )
local L2 CACHE hit, ~10 cycles ( 5.3 – 3.0 ns )
local L3 CACHE hit, line unshared ~40 cycles ( 21.4 – 12.0 ns )
local L3 CACHE hit, shared line in another core ~65 cycles ( 34.8 – 19.5 ns )
local L3 CACHE hit, modified in another core ~75 cycles ( 40.2 – 22.5 ns )

remote L3 CACHE (Ref: Fig.1 [Pg. 5]) ~100-300 cycles ( 160.7 – 30.0 ns )

local DRAM ~60 ns
remote DRAM ~100 ns

General

0.5 ns - CPU L1 dCACHE reference
1 ns - speed-of-light (a photon) travel a 1 ft (30.5cm) distance
5 ns - CPU L1 iCACHE Branch mispredict
7 ns - CPU L2 CACHE reference
71 ns - CPU cross-QPI/NUMA best case on XEON E5-46*
100 ns - MUTEX lock/unlock
100 ns - own DDR MEMORY reference
135 ns - CPU cross-QPI/NUMA best case on XEON E7-*
202 ns - CPU cross-QPI/NUMA worst case on XEON E7-*
325 ns - CPU cross-QPI/NUMA worst case on XEON E5-46*
10,000 ns - Compress 1K bytes with Zippy PROCESS
20,000 ns - Send 2K bytes over 1 Gbps NETWORK
250,000 ns - Read 1 MB sequentially from MEMORY
500,000 ns - Round trip within a same DataCenter
10,000,000 ns - DISK seek
10,000,000 ns - Read 1 MB sequentially from NETWORK
30,000,000 ns - Read 1 MB sequentially from DISK
150,000,000 ns - Send a NETWORK packet CA -> Netherlands

Source

https://stackoverflow.com/questions/4087280/approximate-cost-to-access-various-caches-and-main-memory

https://gist.github.com/jboner/2841832

 

Basic Performance counting on Intel Architecture

Preamble

Performance monitoring is becoming an increasingly important part of low-latency C++. As the speed and flexibility of CPUs have increased, the ability of existing technologies to monitor code performance has been found wanting.

Timing code using gmtime() causes a kernel call and delivers insufficiently accurate times for fast executing code.

Valgrind simulates the processor, but not down to the actual CPU, which may, say, reduce its clock speed due to local heating or some other weird hardware event. Also, running instrumented code can be extremely slow.

Intel has gone down the VTune route, which under the hood uses MSR functions. These read the actual hardware counters maintained by the CPU. As Intel are doing this themselves, we can be sure that this functionality will be around for the foreseeable future and can look to leverage it.

asm: rdtsc – ReaD Time Stamp Counter

This is the simplest instruction; it allows us to accurately measure the time between two points in the code.

https://en.wikipedia.org/wiki/Time_Stamp_Counter

There is a link to some code (cycle.h) at the bottom of that page which shows how to execute the instruction.
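
For reference, here is a minimal sketch of using the instruction via the compiler intrinsics (GCC/Clang on x86-64; my own example rather than the cycle.h code):

#include <cstdint>
#include <cstdio>
#include <x86intrin.h>   // __rdtsc / __rdtscp

int main() {
    unsigned aux;
    std::uint64_t start = __rdtsc();

    volatile std::uint64_t sum = 0;              // the work being timed
    for (std::uint64_t i = 0; i < 1000000; ++i)
        sum = sum + i;

    // rdtscp waits for earlier instructions to finish before reading the counter.
    std::uint64_t end = __rdtscp(&aux);

    std::printf("elapsed: %llu cycles\n", (unsigned long long)(end - start));
}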

asm: rdmsr/wrmsr – ReaD/WRite Model Specific Register

It appears that rdmsr/wrmsr were initially aimed at memory bank switching and the like, but as time has gone on these instructions have come to provide access to specific hardware statistics.

https://en.wikipedia.org/wiki/Model-specific_register
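
The rdmsr/wrmsr instructions themselves are privileged, so from user space on Linux the usual route is the msr kernel module and the /dev/cpu/<n>/msr device: read 8 bytes at an offset equal to the register number, as root. A rough sketch of my own, assuming that module is loaded; IA32_TSC (0x10) is used purely as an example register:

#include <cstdint>
#include <cstdio>
#include <fcntl.h>
#include <unistd.h>

int main() {
    const std::uint32_t reg = 0x10;                 // IA32_TSC, example register only
    int fd = open("/dev/cpu/0/msr", O_RDONLY);      // requires root and the msr module
    if (fd < 0) { std::perror("open /dev/cpu/0/msr"); return 1; }

    std::uint64_t value = 0;
    if (pread(fd, &value, sizeof(value), reg) != (ssize_t)sizeof(value)) {
        std::perror("pread");
        close(fd);
        return 1;
    }
    std::printf("MSR 0x%x = 0x%llx\n", reg, (unsigned long long)value);
    close(fd);
}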

asm: rdpmc – ReaD Performance Monitoring Counter

Using rdtsc will give an indication of how the code performs, but may not give specific details of any bottlenecks caused by instruction or cache misses. rdpmc allows the processor to be examined at a finer grain. Its execution time appears to be 24-40 cycles.

An example of how to read the performance counters:

https://software.intel.com/en-us/forums/software-tuning-performance-optimization-platform-monitoring/topic/595214

from the above page:

// rdpmc_instructions uses a "fixed-function" performance counter to return the count of retired instructions on
//       the current core in the low-order 48 bits of an unsigned 64-bit integer.
unsigned long rdpmc_instructions()
{
   unsigned a, d, c;

   c = (1<<30);     // bit 30 of ECX selects the fixed-function counters; index 0 = instructions retired
   __asm__ volatile("rdpmc" : "=a" (a), "=d" (d) : "c" (c));

   return ((unsigned long)a) | (((unsigned long)d) << 32);
}

// rdpmc_actual_cycles uses a "fixed-function" performance counter to return the count of actual CPU core cycles
//       executed by the current core.  Core cycles are not accumulated while the processor is in the "HALT" state,
//       which is used when the operating system has no task(s) to run on a processor core.
unsigned long rdpmc_actual_cycles()
{
   unsigned a, d, c;

   c = (1<<30)+1;   // fixed-function counter index 1 = unhalted core cycles
   __asm__ volatile("rdpmc" : "=a" (a), "=d" (d) : "c" (c));

   return ((unsigned long)a) | (((unsigned long)d) << 32);
}

// rdpmc_reference_cycles uses a "fixed-function" performance counter to return the count of "reference" (or "nominal")
//       CPU core cycles executed by the current core.  This counts at the same rate as the TSC, but does not count
//       when the core is in the "HALT" state.  If a timed section of code shows a larger change in TSC than in
//       rdpmc_reference_cycles, the processor probably spent some time in a HALT state.
unsigned long rdpmc_reference_cycles()
{
   unsigned a, d, c;

   c = (1<<30)+2;   // fixed-function counter index 2 = reference cycles
   __asm__ volatile("rdpmc" : "=a" (a), "=d" (d) : "c" (c));

   return ((unsigned long)a) | (((unsigned long)d) << 32);
}
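
A small usage sketch of my own (not from the linked post): read the counters either side of the region of interest and take the difference. Note that on Linux rdpmc only works from user space if the kernel allows it (e.g. via /sys/devices/cpu/rdpmc or an active perf_event mapping); otherwise the instruction faults.

#include <cstdio>

unsigned long rdpmc_instructions();      // defined above
unsigned long rdpmc_actual_cycles();     // defined above

int main() {
    unsigned long i0 = rdpmc_instructions();
    unsigned long c0 = rdpmc_actual_cycles();

    volatile unsigned long sum = 0;      // the region being measured
    for (unsigned long i = 0; i < 1000000; ++i)
        sum = sum + i;

    unsigned long insts  = rdpmc_instructions()  - i0;
    unsigned long cycles = rdpmc_actual_cycles() - c0;
    std::printf("instructions=%lu cycles=%lu IPC=%.2f\n",
                insts, cycles, cycles ? double(insts) / double(cycles) : 0.0);
}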

Inevitable full documentation

Section 18.2 of Volume 3

https://www.intel.co.uk/content/www/uk/en/architecture-and-technology/64-ia-32-architectures-software-developer-system-programming-manual-325384.html

 

Synology GitLab – Fix – GitLab 502 error – fails to come up

Preamble

Installed GitLab. Mostly straightforward, but if it crashes or the box is powered off, then GitLab will not come up again.

Solution

GitLab runs in Docker, so you need to get a shell inside the container and delete the unicorn process-id file.

sudo docker exec -it synology_gitlab bash

rm /home/git/gitlab/tmp/pids/unicorn.pid

I found this on the blog below.

https://blog.stead.id.au/2017/03/synology-gitlab-error-502.html

Comment

GitLab appears to use a LOT of memory, and on a small box such as a Synology this may not be great. I will have to see how it goes, as I only have 4 GB at present.

GitLab uses the unicorn.pid file to track whether the process is running. This is a fairly standard way of doing things, but if something bad happens the unicorn.pid isn't cleaned up. This isn't a great deployment solution. I might look at writing a startup script to always remove it. That however would just be a patch, and Synology really should look at making this package more robust.

Install GDB 8.1/8.0.1 in Eclipse Oxygen 2 on High Sierra 10.13.3

There appears to be a lot on the web about installing gdb into Eclipse. However, actually getting it to work is a little tricky. Basically it comes down to this: gdb 8.1 doesn't work and you need to go back to the earlier version, 8.0.1.

The basic instructions are:
1) Install Xcode.
2) Install Eclipse.
3) Install gdb using Homebrew. (You need to install 8.0.1; see the link below. The latest doesn't work.)
4) Create a certificate to sign gdb. This again is tricky, as creating the certificate under System doesn't directly work, so you need to install it under login and then drag and drop it to System.
5) You don't need to mess around with "csrutil enable --without debug".

Caveats I learnt: if you remove the signing from the gdb executable, you need to reinstall it, because you can't re-sign it. I believe you can, however, re-sign it if it is still signed.

https://gist.github.com/gravitylow/fb595186ce6068537a6e9da6d8b5b96d

Cut and paste from the page:
Thanks to lokoum

OK I got it. Thank you @marcoparente !!
$brew uninstall --force gdb
$brew install https://raw.githubusercontent.com/Homebrew/homebrew-core/c3128a5c335bd2fa75ffba9d721e9910134e4644/Formula/gdb.rb
$gdb --version
This should show 8.0.1
$codesign -fs [cert-name] /usr/local/bin/gdb
and add
'set startup-with-shell off' in ~/.gdbinit

That’s it !!!

SMTP

Preamble

Been busy, just not posting. Installed an SMTP server.

Solution

I have several domains but need to be able to send and receive from them all.
The solution appears to be to run Mail Server Plus. Note: don't use the standard package, as that is old. It is fairly simple to install, but you need to poke a few holes through the firewall so that mail can be sent to the server. It is also useful to set up the Postfix configuration, which allows mail to be received and forwarded. This is documented in

https://forum.synology.com/enu/viewtopic.php?t=128179

It must be noted you can assign names such as

[email protected] andrew [email protected]

so it will send mail for [email protected] to two recipients, one even being forwarded out via the outbound SMTP.

I default my second mail server, at a lower priority, to the supplier's domain email routing server. So if my server goes down, the domain server will still forward my mail to, say, a Gmail account.

Conclusion

SMTP seems to be working OK, but I am a bit worried about being hacked.

Domains and email forwarding

Preamble
Looking for a domain, and needing to forward e-mail.

Actions
I have been looking for a personal domain to start another blog and I found this site:
https://www.domcomp.com, which does a price comparison of domain prices.
Needless to say this led me to buy another domain, bromhead.me, from GoDaddy for £3 + tax. I was shocked at the tax, as I wasn't expecting to pay another 60p! 🙂
GoDaddy is a reasonable supplier, but at this budget price they don't seem to provide basic e-mail forwarding. They do provide DNS utilities, so it was simple to add an MX record, forward my mail to http://improvmx.com, and then from there bounce it to my normal account.