How to divide fast

The computation of a / b or a %b is one of the less common operations in many programs. Nevertheless it is part of many time critical operations. Many hash tables for example have a prime number as size and require the computation of the remainder. The division instructions are generally also the slowest integer operations CPUs offer and by a far margin.

The price of the division makes it attractive to check for cheaper alternatives. For constants, GCC and other compilers will replace unsigned divisions with simpler code based on an old paper. This doesn't help with numbers that are variable, but invariant at runtime. For resizable hash tables the resizing is a rare operation and the hot path (hashing) could benefit greatly from using this optimisations.

This leads to the new functions in sys/bitops.h: fast_divide32, fast_remainder32 and fast_divide32_prepare. The third function computes the constants needed for the other two. Benchmarking with the services(5) lookup code using CDB shows a 10% improvement for getservbyport. This is partially a result of the better scheduling as multiplications can run in parallel with other instructions and utilize the pipelines better.