
This post is a repost of a technical blog post originally published by Denys Haryachyy, shared here with permission as part of ongoing research and engineering work around FastNetMon’s inline traffic processing capabilities.
The article examines the underlying performance mechanics needed to run DPDK-based packet processing at 100GbE without packet loss, and how subtle operating system and interrupt behaviour can directly impact stability under extreme load. It is closely related to our R&D project, code-named FastACL, which builds on these same principles at the system level to deliver a VPP-based inline DDoS filtering engine.
Learning DPDK: Eliminating NIC Receive Drops at 100GbE
TL;DR — everything that matters, in one paragraph. A DPDK poll-mode application can lose packets at 100GbE even when the CPU has spare cycles — because a hardware IRQ landing on a busy-poll worker core stalls it for a few microseconds, and at 100+ Mpps that’s enough to overflow the NIC’s RX descriptor ring before the worker can drain it. The fix is isolation, not more CPU. Pinning the NIC’s completion IRQs off the worker cores (onto a housekeeping core) cut the receive-miss rate from 1 in 2,275 packets to 1 in 16,000,000 — about a 7,000× reduction. To take the residual to essentially zero, isolate the worker cores from the kernel entirely with boot parameters:
isolcpus, nohz_full, rcu_nocbs, irqaffinity=0, and processor.max_cstate=1. None of this costs throughput; it just stops anything from interrupting the cores that poll the NIC.
A DPDK worker is a tight loop that does nothing but poll the NIC and process packets. It has no slack: if the OS steals it for even a few microseconds, the NIC keeps filling the RX ring with no one draining it. At low rates that’s invisible; at 100GbE the ring overflows and you get silent drops.

[Figure 1: "rx_missed rises." / "rx_missed drops to ≈ 0."]
The Symptom: rx_missed Under Load
The tell is the NIC’s rx_missed (a.k.a. rx_missed_errors / PHY discards) counter rising under load while CPU utilization shows headroom. The packets never reach the application — the NIC dropped them because the RX descriptor ring was full when they arrived. More workers won’t help; the cores aren’t saturated. Something is interrupting them.
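One way to confirm the symptom is to sample the counter twice and look at the delta while watching CPU load separately. The following is a minimal sketch, assuming the port still has a kernel netdev that exposes `rx_missed_errors` under sysfs. The interface name `eth0` and the helper name `counter_delta` are placeholders, and a DPDK application may instead report the same figure through its own statistics.

```shell
#!/bin/sh
# Print how much a counter file grew over an interval.
# $1 = path to a file holding one integer, $2 = seconds to wait.
counter_delta() {
    before=$(cat "$1")
    sleep "$2"
    after=$(cat "$1")
    echo $(( after - before ))
}

# Assumed interface name; override with IFACE=<name>.
IFACE=${IFACE:-eth0}
f="/sys/class/net/$IFACE/statistics/rx_missed_errors"
if [ -r "$f" ]; then
    echo "rx_missed_errors grew by $(counter_delta "$f" 1) in 1 s"
fi
```

A delta that keeps growing under load while the worker cores show headroom matches the pattern described above.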
The Cause: Hardware IRQs on Poll Cores
Even with a bifurcated or poll-mode driver, the NIC still raises completion/async interrupts, and by default the kernel is free to deliver them to any core — including the ones running your DPDK workers. On the box behind these numbers, the mlx5 completion IRQs defaulted onto cores 11–13, right on top of busy-poll workers (Figure 1).
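To see where the NIC's IRQs are currently allowed to land, the IRQ numbers can be read from `/proc/interrupts` and matched against each IRQ's `smp_affinity_list`. This is a hedged sketch: the `mlx5` pattern follows the driver named in this post, and the helper name `irqs_for` is illustrative.

```shell
#!/bin/sh
# List IRQ numbers whose /proc/interrupts line matches a pattern.
# Reads /proc/interrupts-formatted text on stdin; $1 = pattern.
irqs_for() {
    awk -F: -v pat="$1" '$0 ~ pat { gsub(/[ \t]/, "", $1); print $1 }'
}

# Show the cores each matching IRQ is currently allowed to run on.
if [ -r /proc/interrupts ]; then
    irqs_for mlx5 < /proc/interrupts | while read -r irq; do
        printf 'IRQ %s -> CPUs %s\n' "$irq" \
            "$(cat "/proc/irq/$irq/smp_affinity_list" 2>/dev/null)"
    done
fi
```

Any IRQ whose CPU list overlaps the DPDK worker cores is a candidate for the stalls described here.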
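Each IRQ preempts the poll loop for only a few microseconds, but stalls are expensive at this rate. Do the math: at ~30 Mpps per queue, every microsecond of stall leaves ~30 packets undrained, and a single 100 µs stall is **~3,000 packets**. An 8,192-entry RX ring holds only about 270 µs of traffic at that rate, so a ring that is already partly full during a burst has little margin left. One stray interrupt during a burst can be a ring overflow.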
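The arithmetic can be checked with a couple of shell helpers. This is a rough sketch using the figures from this section (30 Mpps per queue, a 100 µs stall, 8,192 descriptors); the function names are illustrative.

```shell
#!/bin/sh
# Packets that arrive during a stall: rate (Mpps) x stall (µs).
# The units cancel to plain packets, so integer math is enough.
pkts_during_stall() {   # $1 = Mpps per queue, $2 = stall in µs
    echo $(( $1 * $2 ))
}

# Whole microseconds for an empty ring to fill at a given rate.
ring_fill_us() {        # $1 = ring entries, $2 = Mpps per queue
    echo $(( $1 / $2 ))
}

pkts_during_stall 30 100   # prints 3000
ring_fill_us 8192 30       # prints 273
```

The second number is the total budget for an empty ring. A ring that is already partly occupied has proportionally less.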
Pinning IRQs Off the Workers
The first and biggest win is to keep NIC IRQs away from worker cores. Steer every device IRQ to a housekeeping core (core 0) by writing its smp_affinity:
    # Send every mlx5 IRQ to core 0 (mask 0x1).
    # IRQ numbers are the first column of /proc/interrupts.
    for irq in $(awk -F: '/mlx5/ { gsub(/ /, "", $1); print $1 }' /proc/interrupts); do
        echo 1 > "/proc/irq/$irq/smp_affinity" 2>/dev/null
    done
    # and stop irqbalance from moving them back
    systemctl stop irqbalance
The effect on this hardware:
| State | rx-miss rate | Drop rate |
|---|---|---|
| Before IRQ pin | 1 / 2,275 pkts | 0.044 % |
| After IRQ pin | 1 / 16,000,000 pkts | 0.0000062 % |
That’s a ~7,000× reduction from one change — and irqbalance must be stopped, or it will quietly reassign the IRQs back onto the workers a minute later.
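The value written to `smp_affinity` is a hexadecimal CPU bitmask, which is why core 0 is written as 1. Here is a small sketch of the conversion for other housekeeping cores. The helper name `cpu_mask` is hypothetical, and it only covers cores 0 to 31, since larger masks are written as comma-separated 32-bit groups.

```shell
#!/bin/sh
# Hex smp_affinity mask for a single core (cores 0-31 only).
cpu_mask() {
    printf '%x\n' $(( 1 << $1 ))
}

cpu_mask 0   # prints 1  (the 0x1 used for the housekeeping core)
cpu_mask 4   # prints 10
```

Note that `systemctl stop` only lasts until the next boot. On a systemd-based distribution, `systemctl disable --now irqbalance` is one way to keep the service from coming back.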
Full Isolation: isolcpus, nohz_full, rcu_nocbs
To take the residual misses to essentially zero, remove the worker cores from the kernel’s reach at boot. Append to the kernel command line (/etc/default/grub, then update-grub and reboot):
    isolcpus=1-32 nohz_full=1-32 rcu_nocbs=1-32 irqaffinity=0 processor.max_cstate=1
Each one closes a different interruption source:
- isolcpus=1-32: keep the scheduler from placing any other task on the worker cores.
- nohz_full=1-32: stop the periodic 1 kHz timer tick on those cores (no per-millisecond interrupt).
- rcu_nocbs=1-32: move RCU callback processing off the worker cores onto housekeeping cores.
- irqaffinity=0: default all IRQs to core 0 from boot, before userspace even starts.
- processor.max_cstate=1: forbid deep C-states, so a core never takes 50–200 µs to wake from idle.
Use the same core range as your DPDK workers. Together these guarantee that the only thing ever running on a worker core is the poll loop — which is the entire point.
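After the reboot it is worth confirming that the parameters actually reached the kernel. The following is a minimal sketch that checks `/proc/cmdline` and prints the kernel's own view of the isolated and tickless cores where those sysfs files are exposed. The helper name `has_param` is illustrative.

```shell
#!/bin/sh
# Succeed if a kernel command line string contains the given parameter.
# $1 = command line text, $2 = parameter name, or name=value.
has_param() {
    case " $1 " in
        *" $2 "*|*" $2="*) return 0 ;;
        *) return 1 ;;
    esac
}

if [ -r /proc/cmdline ]; then
    cmdline=$(cat /proc/cmdline)
    for p in isolcpus nohz_full rcu_nocbs irqaffinity processor.max_cstate; do
        if has_param "$cmdline" "$p"; then
            echo "$p: set"
        else
            echo "$p: MISSING"
        fi
    done
    # Kernel's view of isolated and tickless cores, if available.
    cat /sys/devices/system/cpu/isolated \
        /sys/devices/system/cpu/nohz_full 2>/dev/null || true
fi
```

If a parameter shows as missing, the usual suspects are a GRUB configuration that was edited but not regenerated, or a different boot entry being selected.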
Summary
- NIC drops at 100GbE are usually interruption, not CPU saturation: rx_missed rises with cores to spare.
- A hardware IRQ stalls a busy-poll worker for µs; at 30 Mpps/queue that overflows the RX ring.
- Pin NIC IRQs to a housekeeping core (smp_affinity) and stop irqbalance; here a ~7,000× drop reduction.
- Isolate the worker cores at boot: isolcpus, nohz_full, rcu_nocbs, irqaffinity=0, processor.max_cstate=1.
- None of it costs throughput; it removes everything that competes with the poll loop.






