Converting an unsigned int to float appears to be difficult for x86-64 processors. Signed int to float, however, is directly supported in hardware.
The GCC compiler is clever enough to use the hardware instruction CVTSI2SS. In the case of unsigned int to float, however, it needs to treat numbers which would be negative in two’s complement notation differently, so the generated code contains a branch!
Doing this with random numbers means the branch can go either way, with 50% probability. No amount of hardware branch prediction can help with this!
Rewriting the code to use signed integers instead of unsigned ones (basically, dropping the most significant bit) sped up a critical piece of code by 10%.
A very noticeable improvement!
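A minimal sketch of the idea, not the actual code from the project: the function names are made up for illustration, and it assumes 64-bit unsigned inputs, which is where a set top bit cannot be handled by a plain signed conversion on x86-64. Masking off the most significant bit leaves a value that always fits in a signed integer, so the compiler is free to use the signed conversion without a special case.

```cpp
#include <cstdint>

// Straightforward conversion. Values with the top bit set do not fit in a
// signed 64-bit integer, so the compiler may have to treat them specially.
inline float toFloatUnsigned(std::uint64_t x) {
    return static_cast<float>(x);
}

// Hypothetical alternative: drop the most significant bit first. The result
// is always non-negative as a signed 64-bit value, so a plain signed
// int-to-float conversion is enough.
inline float toFloatSigned(std::uint64_t x) {
    const std::uint64_t mask = 0x7FFFFFFFFFFFFFFFull;
    return static_cast<float>(static_cast<std::int64_t>(x & mask));
}
```

For random numbers this costs one bit of randomness, which a float cannot represent anyway.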
After scouring the net to find some decent-looking PHP-based Forum software, I eventually came across Simple Machines Forum (SMF).
So I messed with it all last night, and this night, and got it more or less functional. The first few users have registered as of late this evening!
There are still unresolved issues: how to push articles from the forum into the mailing list, and perhaps some way to pull the activity from the mailing list into the forum [less the spam, of course!].
But there are still a few other pressing problems to address before we get around to looking at this, such as getting email notifications to work [would be nice to get the article-notifications straight to the smart phone, for instance!].
Plus, we’re still scouting for some nicer themes.
Some time ago, I wrote an implementation of the “half float” class. This implements 16-bit floating point numbers, particularly conversion of these to and from single precision 32-bit floats, which are supported by most hardware.
The crucial idea was to shrink the necessary tables enough to have a reasonable chance of fitting them into processor caches. Previous solutions I’ve seen used huge tables. Spilling out of cache incurs a big speed penalty, and thus a simple algorithm using a very large table is not necessarily faster than a more complex algorithm with a much smaller table.
The current state of affairs is that the tables take 1536 bytes (1024 bytes for the base table, and 512 bytes for the shift table) for the float to half conversion, and 8576 bytes (three tables of 8192 bytes, 256 bytes, and 128 bytes, respectively) for the half to float conversion.
I have the feeling it should be possible to do this with smaller tables, however!
The original reference is in the paper: fasthalffloatconversion.pdf.
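To make the table sizes concrete, here is a rough sketch of the float to half direction only. It is an illustration and not the actual class: the names HalfTables and floatToHalf are invented here, the table contents follow my reading of the scheme in the paper, and the conversion truncates rather than rounds. Both tables are indexed by the sign bit plus the 8-bit exponent of the float, which is why each has 512 entries.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical table holder: 512 x 2 bytes + 512 x 1 byte = 1536 bytes.
struct HalfTables {
    std::uint16_t base[512];   // sign and exponent bits of the half
    std::uint8_t  shift[512];  // right shift applied to the float mantissa

    HalfTables() {
        for (int i = 0; i < 256; ++i) {
            const int e = i - 127;  // unbiased float exponent
            unsigned b, s;
            if (e < -24)      { b = 0x0000; s = 24; }                  // too small: zero
            else if (e < -14) { b = 0x0400u >> (-e - 14); s = -e - 1; } // half denormal
            else if (e <= 15) { b = unsigned(e + 15) << 10; s = 13; }  // normal number
            else if (e < 128) { b = 0x7C00; s = 24; }                  // too large: infinity
            else              { b = 0x7C00; s = 13; }                  // infinity or NaN
            base[i]          = static_cast<std::uint16_t>(b);
            base[i | 0x100]  = static_cast<std::uint16_t>(b | 0x8000); // negative values
            shift[i]         = static_cast<std::uint8_t>(s);
            shift[i | 0x100] = static_cast<std::uint8_t>(s);
        }
    }
};

// Convert a 32-bit float to the bit pattern of a 16-bit half float.
inline std::uint16_t floatToHalf(float value) {
    static const HalfTables tables;             // built once on first use
    std::uint32_t f;
    std::memcpy(&f, &value, sizeof f);          // reinterpret the float's bits
    const std::uint32_t idx = (f >> 23) & 0x1FF; // sign bit + exponent
    return static_cast<std::uint16_t>(
        tables.base[idx] + ((f & 0x007FFFFF) >> tables.shift[idx]));
}
```

The conversion itself is two table lookups, a shift and an add, with no branches. With this sketch, floatToHalf(1.0f) gives 0x3C00 and floatToHalf(-2.0f) gives 0xC000.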
While GCC still generates faster code (for me, at least!), LLVM does generate nice warnings.
In fact, LLVM caught two problems for me today that GCC overlooked. Self-comparisons, like:
Technically, this is of course perfectly legal C++. But it’s highly suspect! LLVM flags this as a “tautological” comparison. Very good warning.
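As a made-up illustration of the kind of slip such a warning catches (this is not the code from the project, and the function names are hypothetical), consider a typo where a variable ends up compared against itself:

```cpp
// Buggy version: the second comparison was meant to use otherHeight, but
// compares height with itself, so that half of the test is always true.
inline bool sameSizeBuggy(int width, int height, int otherWidth, int otherHeight) {
    return width == otherWidth && height == height;
}

// Intended version: compares each dimension against its counterpart.
inline bool sameSize(int width, int height, int otherWidth, int otherHeight) {
    return width == otherWidth && height == otherHeight;
}
```

Both versions compile, but the buggy one silently reports two sizes as equal whenever the widths match.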
On the other hand, LLVM seems to be a bit confused with alignment declarations. Basically, alignments should apply to storage declarations (even if specified for the type; obviously alignment can’t apply to a type, only a storage location containing a type).
This causes lots of alignment warnings when using SSE code, even when using unaligned loads and stores.
For the past few years, the FOX website has been served up by an (underpowered) 200MHz PowerPC (LinkStation aka KuroBox, actually).
It’s been very reliable, but the software on it had been frozen in time, and not easily upgradable (especially the operating system itself). So when a few months ago the machine finally gave up the ghost, it wasn’t a big decision to upgrade to something a bit newer, and more easily updated.
The new server is a very low power dual core Atom 525. It’s been running for a while already, hosting FOX’s git repository. Now the entire FOX web site has been moved over and is fully functional.
The next goal is to set up some content management, to which some of the static web pages can slowly be migrated. The first thing being added is this blog.