SMP patch for NETISR, FreeBSD 7.x
by Alter (alterX@alter.org.ua (remove X))
The netisr network dispatch can now be parallelized on SMP machines.
Since netisr had not been changed for a long time, this patch may also be suitable for
earlier kernel versions.
Netisr behavior is now managed via the sysctl variables described below.
One specific new option (net.isr.direct_arp) has also been added.
Fixed compilation bugs for uniprocessor systems (there were several references to SMP-specific variables)
I've prepared a unified diff to conform to FreeBSD development standards.
FreeBSD bugtrack ID: 156769
The first version of the patch
How to cook this (a few words about optimization)
Some more details:
- net.isr.direct -
When this variable is set to 1, the system performs all outgoing packet processing immediately
on the send attempt, in the same execution context.
Setting it to 0 makes outgoing packets be placed in a queue, which is processed by separate
parallel thread(s). It is worth enabling this option if the number of CPU cores is less than or
equal to the number of physical network cards. This option improves TRANSMIT performance. It is good for servers, but
not very good for routers. If you have unbalanced NIC load or a free CPU core, you should disable this option
to spread the load among CPUs.
If you enable net.inet.ip.fastforwarding together with net.isr.direct, you will run into the side effect described under net.inet.ip.fastforwarding below.
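As an illustration (the values below are examples based on the advice above, not settings shipped with the patch), the choice could look like this in /etc/sysctl.conf:

```
# Router with a spare CPU core or unbalanced NIC load:
# queue outgoing packets so netisr threads spread the work across CPUs.
net.isr.direct=0

# Server (transmit-heavy, cores <= NICs): process outgoing packets
# immediately in the sending context instead.
#net.isr.direct=1
```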
- net.isr.direct_arp - (new)
Process outgoing ARP packets immediately.
Should always be enabled; I can't imagine why ARP should be queued.
This option was added so that ARP requests are still processed with high priority when
general direct dispatch (net.isr.direct) is disabled.
- net.isr.maxthreads - (new)
Number of parallel threads processing the packet queue.
Should be less than the number of CPU cores. You should do some system monitoring to determine
how many resources are consumed by NIC and other hardware interrupts and by software (DB, httpd and other daemons),
and use the rest for routing.
For example, if we have 4 cores, the NICs generate 80% interrupt load on each of 2 cores, and software occupies one full core,
then we definitely have 140% or more free CPU time. In that case we can distribute packet processing between 2 cores.
If you have heavy routing load but lower (or less important) software load, you may use even 3 threads.
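The CPU budget above can be sketched numerically. The percentages are the example figures from the text; rounding the free budget up to whole cores is a rough rule of thumb, not a formula from the patch:

```shell
# Rough CPU budget for choosing net.isr.maxthreads (example figures).
cores=4        # total cores -> 400% of CPU time
nic_pct=160    # NICs: ~80% interrupt load on each of 2 cores
soft_pct=100   # daemons (DB, httpd, ...): one full core
free_pct=$(( cores * 100 - nic_pct - soft_pct ))
echo "free CPU: ${free_pct}%"                    # 140% left for packet processing
echo "netisr threads: $(( (free_pct + 99) / 100 ))"   # round up -> ~2 threads
```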
- net.inet.ip.fastforwarding -
Process incoming packets immediately (including ipfw) in the context of the interrupt service routine, before
they are passed to the netisr queue. Like net.isr.direct, this option should be used when the number of
CPU cores is less than or equal to the number of NICs. It impacts INCOMING traffic performance. Note that
not all NICs can correctly queue incoming packets while the current packet is being processed.
If you have net.inet.ip.fastforwarding enabled, you will see the following side effect:
routing and passing the packet through ipfw happen in the same execution context inside the
interrupt service routine. This means that processing of new incoming packets is blocked during this time.
Such behavior is effective when you have mainly incoming traffic or only one CPU core in the system.
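A sketch for the incoming-heavy case described above (illustrative /etc/sysctl.conf fragment, not part of the patch):

```
# Incoming-heavy router, cores <= NICs: route and filter (ipfw)
# directly in the interrupt context, bypassing the netisr queue.
net.inet.ip.fastforwarding=1
```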
- net.inet.ip.dummynet.io_fast -
Process packets in the original execution context if the average traffic does not exceed the configured limit.
If you do not emulate delays and packet loss, this option saves a lot of resources.
Using this option practically eliminates the need for a parallel dummynet.
If you have the ipfw patch, an additional option is available:
setting this value to 2 prevents placing packets in the dummynet queue at all.
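For example (illustrative /etc/sysctl.conf fragment; the value 2 requires the ipfw patch mentioned above):

```
# Skip the dummynet queue while traffic stays under the pipe limit.
net.inet.ip.dummynet.io_fast=1

# With the ipfw patch: never place packets in the dummynet queue.
#net.inet.ip.dummynet.io_fast=2
```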
- net.inet.ip.intr_queue_maxlen, net.route.netisr_maxqlen -
These values set the maximum incoming and outgoing queue sizes. You should increase them under high routing load.
The system drops new packets when a queue is full, so larger values reduce packet loss at peak load.
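A sketch for a loaded router (the sizes are illustrative assumptions, not values recommended by the patch):

```
# Enlarge the incoming and outgoing netisr queues to reduce
# drops at peak routing load (example sizes, tune to your load).
net.inet.ip.intr_queue_maxlen=4096
net.route.netisr_maxqlen=4096
```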
- kern.polling.enable -
Enable interrupt polling. When enabled, interrupt service routines check whether a new interrupt has arrived
while the previous one was being processed. If so, the new interrupt is serviced in the same execution context.
This technique reduces thread/context-switching overhead. On the other hand, we lose system response
time. It can happen that the CPU spends a significant amount of time servicing interrupts and does not have enough time
left for user-land applications. Such a situation is observed as a very slow system, while traffic passes
through the server without any problem. In my experience, this option may be useful
only on uniprocessor routers without running applications.
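For the narrow case just described, an illustrative setting (only sensible on a dedicated uniprocessor router):

```
# Uniprocessor router with no local applications:
# trade responsiveness for lower context-switch overhead.
kern.polling.enable=1
```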
- 2 cores + 2 Intel NICs (em)
- 8 cores + 2 Broadcom NICs (bge)
All of the above are just general recommendations; there is no single true way.
You should tune the parameters for each specific task and hardware. The most resource-consuming
tasks should be moved from interrupt service to separate threads. Another approach is specialized
hardware designed for multi-threaded and SMP processing.
Also, don't forget about ipfw rules and other optimizations.
Mail to alterX@alter.org.ua (remove X)