Dynamic mbuf cluster limit

The NetBSD kernel uses special memory buffers (mbufs) for network operations. Storage for large packets is allocated as clusters, typically 2KB in size.
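
As a rough illustration, here is a minimal kernel-side sketch of how a subsystem attaches a cluster to an mbuf when a packet does not fit into the mbuf's own storage. The helper name alloc_cluster_mbuf is made up for this example; m_gethdr(), MCLGET() and MCLBYTES are the standard mbuf interfaces.

    #include <sys/param.h>
    #include <sys/mbuf.h>

    struct mbuf *
    alloc_cluster_mbuf(void)                 /* hypothetical helper name */
    {
        struct mbuf *m;

        m = m_gethdr(M_DONTWAIT, MT_DATA);   /* small mbuf for the header */
        if (m == NULL)
            return NULL;

        MCLGET(m, M_DONTWAIT);               /* attach a 2KB cluster */
        if ((m->m_flags & M_EXT) == 0) {     /* cluster pool exhausted */
            m_freem(m);
            return NULL;
        }

        m->m_len = m->m_pkthdr.len = MCLBYTES;
        return m;
    }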

Since forever, the limit on the number of clusters was static. Depending on the architecture and the presence of the GATEWAY option, the kernel used at most 4MB for the various forms of network I/O. This limit was easy to exhaust, e.g. by starting a BitTorrent client with a file descriptor limit of 1024.

This limit exists to prevent a remote attack from exhausting system memory. It couldn't be raised on most architectures, because the kernel reserved a fixed amount of virtual address space for this purpose at boot time.

On some architectures, this reservation is completely unnecessary, because the memory pools used for mbuf clusters, as well as for other common data structures, rely on a special direct mapping. This means that any given physical memory address can be easily converted into a virtual address -- without having to modify the page tables.
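
In essence, such a direct mapping is just a constant offset between physical and kernel virtual addresses. The following sketch uses made-up names and an illustrative base address, not NetBSD's actual macros:

    /* Illustrative direct-map base; real ports use their own constant. */
    #define DIRECT_MAP_BASE  0xffff800000000000UL

    static inline void *
    phys_to_direct(unsigned long pa)       /* physical -> kernel virtual */
    {
        return (void *)(DIRECT_MAP_BASE + pa);
    }

    static inline unsigned long
    direct_to_phys(const void *va)         /* kernel virtual -> physical */
    {
        return (unsigned long)va - DIRECT_MAP_BASE;
    }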

Other architectures can just grow the kernel memory map or have a huge reservation for the normal kernel memory allocator. AMD64 is such an architecture: by default, the kernel reserves up to 1GB for internal allocations, so it can simply allocate the address space for mbuf clusters from the same range.
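
Putting the two cases together, a pool page allocator can prefer the direct map and otherwise take its virtual address space from the general kernel map instead of a dedicated submap. This is only a sketch: allocate_physical_page() and phys_to_direct() are the hypothetical helpers from above, and the __HAVE_DIRECT_MAP guard and exact UVM_KMF_* flags passed to uvm_km_alloc() are assumptions.

    static void *
    cluster_page_alloc(void)
    {
    #ifdef __HAVE_DIRECT_MAP
        paddr_t pa;

        /* Grab a physical page and reuse the direct mapping -- no
         * page-table update and no dedicated submap needed. */
        if (!allocate_physical_page(&pa))    /* hypothetical helper */
            return NULL;
        return phys_to_direct(pa);
    #else
        /* Take wired pages from the general kernel map, the same
         * range the normal kernel memory allocator draws from. */
        return (void *)uvm_km_alloc(kernel_map, PAGE_SIZE, 0,
            UVM_KMF_WIRED | UVM_KMF_NOWAIT);
    #endif
    }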

Overall, removing the special kernel submap was an easy exercise; only two architectures needed special care. Both i386 and the ARM family lack direct mappings and share address space between kernel and userland. On i386, the kernel is allowed to use only 512MB, including mappings for device memory and the like. An additional limit was therefore needed on these architectures.
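
One plausible way to express such a limit is to cap the cluster count by both the 1/4-of-memory rule described below and a fixed virtual address budget. The constant and names here are illustrative, not the values NetBSD actually uses:

    #include <sys/param.h>        /* MCLBYTES */

    /* Illustrative KVA budget for clusters on a VA-constrained port. */
    #define CLUSTER_KVA_BUDGET  (64UL * 1024 * 1024)

    static unsigned long
    nmbclusters_cap(unsigned long physmem_bytes)
    {
        unsigned long by_mem = (physmem_bytes / 4) / MCLBYTES; /* 1/4 of RAM */
        unsigned long by_kva = CLUSTER_KVA_BUDGET / MCLBYTES;  /* KVA budget */

        return by_mem < by_kva ? by_mem : by_kva;
    }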

As a result, kern.mbuf.nmbclusters can now be increased at run time with sysctl(8). Some limits are enforced to prevent resource starvation: basically, at most 1/4 of all memory can be used for mbuf clusters. Performance concerns were raised for architectures without direct mapping, due to higher lock contention on the kernel memory map, but the normal pool cache makes taking that lock a rare event.
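
From the shell, that is simply "sysctl -w kern.mbuf.nmbclusters=<value>"; the same can be done programmatically via sysctlbyname(3). A minimal userland sketch follows -- the new value is arbitrary, the call needs root, and it may be rejected if the request exceeds the enforced cap:

    #include <sys/param.h>
    #include <sys/sysctl.h>

    #include <err.h>
    #include <stdio.h>

    int
    main(void)
    {
        int newval = 65536;            /* arbitrary example value */
        int oldval;
        size_t oldlen = sizeof(oldval);

        /* Read the old limit and install the new one in a single call. */
        if (sysctlbyname("kern.mbuf.nmbclusters", &oldval, &oldlen,
            &newval, sizeof(newval)) == -1)
            err(1, "sysctlbyname");

        printf("nmbclusters: %d -> %d\n", oldval, newval);
        return 0;
    }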