Cost of virtual function – 2


Previously I stated that the cost of using a virtual function has gone down, but I have been doing a bit more thinking on this and feel I should add a further comment.

Virtual functions are still not great

Yes, the individual cost of calling a virtual function has gone down dramatically with faster RAM, caches, branch prediction and CPU architecture (it's about 2x the cost of a standard function call), but the real cost of a virtual function appears to lie in what it does to the compilation of the code. Placing a virtual function call in the code stops the compiler from inlining/optimising the code at that point.

It is probably simpler to think of it this way: would you add an 'if' to your code if it wasn't really needed? It 1) adds complexity, 2) increases latency. A virtual function does exactly the same.

Functionality provided by language features

Virtual functions provide a lot of functionality when writing the code but this can often be achieved using other C++ language features.

virtual functions – decoupling, delegation, code reuse, common interface variable.

functions – delegation.

classes – decoupling.

templates – code reuse.

type erasure – common interface variable.

So it might be worth considering using only the required language feature at the required time, in order to write cleaner code, and avoid the proliferation of class hierarchies in code.

A nice example of writing better code without virtual functions is shown here (I stole some of his ideas for this blog entry!):


Virtual functions are useful for runtime polymorphism, but often this is not required; what is really required is code reuse (templates). Virtual functions provide a lot of functionality, often more than is needed.

Having said that, if you need a virtual function, don't be afraid to use one! It's a wonderful feature of the language.

We are all STILL learning to write code!!!

Cost of virtual function


The speed is currently about 20ns, but was an order of magnitude different a few years ago. Things change!


Doing some googling on the subject you can find older pieces (circa 1995) that describe virtual functions as very inefficient, mainly due to the CPU having to throw out the pipeline and restart each time. This however becomes less evident on a superscalar architecture, and really we are then left with whether the instructions are in the code cache or not, which at worst is probably an L3 hit unless your code is large.


And if you want to read a LOT about optimisations.


Virtual functions aren't the bottleneck they once were, but are still not as fast as direct function calls. It really depends on what problem domain you are working in. <100us, use a virtual function. <10us, you have to be a bit more careful. But a better design will always trump everything.

Memory access times


Looks like speeds to main memory are getting faster….

Main article from StackOverflow has a lot more detail.

Core i7 Xeon 5500 Series Data Source Latency (approximate) [Pg. 22]

local L1 CACHE hit, ~4 cycles ( 2.1 – 1.2 ns )
local L2 CACHE hit, ~10 cycles ( 5.3 – 3.0 ns )
local L3 CACHE hit, line unshared ~40 cycles ( 21.4 – 12.0 ns )
local L3 CACHE hit, shared line in another core ~65 cycles ( 34.8 – 19.5 ns )
local L3 CACHE hit, modified in another core ~75 cycles ( 40.2 – 22.5 ns )

remote L3 CACHE (Ref: Fig.1 [Pg. 5]) ~100-300 cycles ( 160.7 – 30.0 ns )

local DRAM ~60 ns
remote DRAM ~100 ns


0.5 ns - CPU L1 dCACHE reference
1 ns - speed-of-light (a photon) travels a 1 ft (30.5 cm) distance
5 ns - CPU L1 iCACHE Branch mispredict
7 ns - CPU L2 CACHE reference
71 ns - CPU cross-QPI/NUMA best case on XEON E5-46*
100 ns - MUTEX lock/unlock
100 ns - own DDR MEMORY reference
135 ns - CPU cross-QPI/NUMA best case on XEON E7-*
202 ns - CPU cross-QPI/NUMA worst case on XEON E7-*
325 ns - CPU cross-QPI/NUMA worst case on XEON E5-46*
10,000 ns - Compress 1K bytes with Zippy PROCESS
20,000 ns - Send 2K bytes over 1 Gbps NETWORK
250,000 ns - Read 1 MB sequentially from MEMORY
500,000 ns - Round trip within a same DataCenter
10,000,000 ns - DISK seek
10,000,000 ns - Read 1 MB sequentially from NETWORK
30,000,000 ns - Read 1 MB sequentially from DISK
150,000,000 ns - Send a NETWORK packet CA -> Netherlands



Basic Performance counting on Intel Architecture


Performance monitoring is becoming an increasing part of low latency C++. As the speed and flexibility of CPUs have increased, the ability of existing technologies to monitor code performance has been found wanting.

Timing code using gmtime() causes a kernel call and delivers times that are insufficiently accurate for fast-executing code.

Valgrind simulates the processor, but not down to the actual CPU, which may, say, reduce its clock speed due to local heating or some other weird hardware event. Also, running instrumented code can be extremely slow.

Intel has gone down the VTune route, which under the hood uses MSR functions. These read the actual hardware counters stored by the CPU. As Intel are doing this themselves, we can be sure that this functionality will be around for the foreseeable future and can look to leverage it.

asm: rdtsc – ReaD Time Stamp Counter

This is the simplest instruction, and allows us to accurately measure the elapsed time between two points in the code.

There is a link to some code cycle.h at the bottom of the text, which shows how to execute the instruction.

asm: rdmsr/wrmsr – ReaD/WRite Model Specific Register

It appears that rdmsr/wrmsr were initially aimed at memory bank switching and the like, but as time has gone on these instructions have come to allow access to specific hardware statistics.

asm: rdpmc – ReaD Performance-Monitoring Counter

Using rdtsc will give an indication of how the code performs, but may not give specific details of any bottlenecks caused by instruction/cache misses. rdpmc allows for a finer-grained examination of the processor. Execution time appears to be 24-40 cycles.

An example of how to read the performance counters.

from the above page:

// rdpmc_instructions uses a "fixed-function" performance counter to return the count of retired instructions on
//       the current core in the low-order 48 bits of an unsigned 64-bit integer.
unsigned long rdpmc_instructions()
{
   unsigned a, d, c;

   c = (1<<30);
   __asm__ volatile("rdpmc" : "=a" (a), "=d" (d) : "c" (c));

   return ((unsigned long)a) | (((unsigned long)d) << 32);
}

// rdpmc_actual_cycles uses a "fixed-function" performance counter to return the count of actual CPU core cycles
//       executed by the current core.  Core cycles are not accumulated while the processor is in the "HALT" state,
//       which is used when the operating system has no task(s) to run on a processor core.
unsigned long rdpmc_actual_cycles()
{
   unsigned a, d, c;

   c = (1<<30)+1;
   __asm__ volatile("rdpmc" : "=a" (a), "=d" (d) : "c" (c));

   return ((unsigned long)a) | (((unsigned long)d) << 32);
}

// rdpmc_reference_cycles uses a "fixed-function" performance counter to return the count of "reference" (or "nominal")
//       CPU core cycles executed by the current core.  This counts at the same rate as the TSC, but does not count
//       when the core is in the "HALT" state.  If a timed section of code shows a larger change in TSC than in
//       rdpmc_reference_cycles, the processor probably spent some time in a HALT state.
unsigned long rdpmc_reference_cycles()
{
   unsigned a, d, c;

   c = (1<<30)+2;
   __asm__ volatile("rdpmc" : "=a" (a), "=d" (d) : "c" (c));

   return ((unsigned long)a) | (((unsigned long)d) << 32);
}

Inevitable full documentation

Section 18.2 of Volume 3 of the Intel Software Developer's Manual


Synology gitlab – Fix – GitLab 502 error – fails to come up.


Installed GitLab. Mostly straightforward, but if it crashes or the box is powered off, then GitLab will not come up again.


GitLab runs in Docker, so you need to log into the container and delete the Unicorn pid file.

sudo docker exec -it synology_gitlab bash

rm /home/git/gitlab/tmp/pids/*.pid

I found this on the blog below.


GitLab appears to use a LOT of memory, and on a small box such as a Synology this may not be great. I will have to see how it goes, as I only have 4GB at present.

GitLab uses a pid file to track whether the process is running. This is a fairly standard way of doing things, but if something bad happens the file isn't cleaned up. This isn't a great deployment solution. I might look at writing a startup script to always remove it. That however is just a patch, and Synology really should look at making this package more robust.
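A hypothetical startup-script sketch along those lines (the pid directory path is the one from my fix above; the script itself is mine, not from Synology or GitLab): remove any pid file whose process is no longer alive, so a crash or power-off doesn't stop GitLab coming back up.

```shell
#!/bin/sh
# Remove stale pid files before the service starts. A pid file is stale if
# it is empty or if no process with that pid is still running.
cleanup_stale_pids() {
    pid_dir="$1"
    for f in "$pid_dir"/*.pid; do
        [ -e "$f" ] || continue              # no pid files at all
        pid=$(cat "$f" 2>/dev/null)
        # delete the file unless a process with that pid is still alive
        if [ -z "$pid" ] || ! kill -0 "$pid" 2>/dev/null; then
            rm -f "$f"
        fi
    done
}

# Example: cleanup_stale_pids /home/git/gitlab/tmp/pids
```

The kill -0 check means a pid file belonging to a genuinely running process is left alone, which is slightly safer than deleting everything unconditionally.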

Install GDB 8.1/8.0.1 in Eclipse Oxygen 2 on High Sierra 10.13.3

There appears to be a lot on the web about installing gdb into Eclipse. However, actually getting it to work is a little tricky. Basically it comes down to this: gdb 8.1 doesn't work, and you need to go to an earlier version, 8.0.1.

The basic instructions are:
1) Install Xcode.
2) Install Eclipse.
3) Install gdb using Homebrew. (You need to install 8.0.1, see link below. The latest doesn't work.)
4) Create a certificate to sign gdb. This again is tricky, as creating the certificate under System doesn't directly work, so you need to install it under login and then drag and drop it to System.
5) You don't need to mess around with "csrutil enable --without debug".

Caveats I learnt: if you remove the signature from the gdb executable, you need to reinstall gdb, or you can't re-sign it. I believe you can however re-sign it if it is still signed.

Cut and paste from the page:
Thanks to lokoum

OK I got it. Thank you @marcoparente !!
$brew uninstall --force gdb
$brew install
$gdb --version
This should show 8.0.1
$codesign -fs [cert-name] /usr/local/bin/gdb
and add
‘set startup-with-shell off’ in ~/.gdbinit

That’s it !!!



Been busy just not posting. Installed SMTP server.


I have several domains but need to be able to send and receive from them all.
The solution appears to be to run Mail Server Plus. Note: don't use the standard one, as that is old. It is fairly simple to install, but you need to poke a few holes through the firewall so that mail can be sent to my server. It is also useful to set up the postfix stuff which allows mail to be received and forwarded. This is documented in

It must be noted you can assign names such as

[email protected] andrew [email protected]

so it will send [email protected] to 2 recipients, one even being forwarded out via outbound SMTP.

I default my second mail server, at a lower priority, to the domain supplier's email routing server. So if my server goes down, the domain server will still forward my mail to, say, a Gmail account.


SMTP seems to be working OK, but I am a bit worried about being hacked.

Domains and email forwarding

Looking for a domain, and needing to forward e-mail.

I have been looking for a personal domain to start another blog and I found this site, which does a price comparison on website prices.
Needless to say this led me to buying another domain from GoDaddy for £3 + tax. I was shocked at the tax, as I wasn't expecting to pay another 60p! 🙂
GoDaddy is a reasonable supplier, but at this budget price they don't seem to provide basic e-mail forwarding. They do provide DNS utils, so it was simple to add an MX record and forward my mail to and then from there bounce it to my normal account.