Performance degradation when buffering netBufs

Hi,

I'm having a performance issue when capturing large numbers of small packets. The card I'm using is a Napatech A/S NT20E3-2-PTP. I'm testing with a single Rx stream, and about two million 64-byte packets are received each second. I am using the NT_NetRxGet function to retrieve packets.

When the netBuf is released (NT_NetRxRelease) immediately after each packet is handled, I am able to process the traffic at the same speed the NIC receives it. With this setup I can handle about 4.5 million packets per second (at 100% CPU utilization).
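
For clarity, the fast path looks roughly like this (a simplified sketch, not my actual code; the stream name, stream id, timeout and the handle_packet stub are placeholders):

#include <stdint.h>
#include <nt.h>

/* Placeholder for the real packet handling. */
static void handle_packet(const void *l2, uint16_t wire_len)
{
    (void)l2;
    (void)wire_len;
}

int main(void)
{
    NtNetStreamRx_t hRx;  /* Rx stream handle */
    NtNetBuf_t hNetBuf;   /* packet container (the netBuf) */
    int status;

    if (NT_Init(NTAPI_VERSION) != NT_SUCCESS)
        return 1;

    /* Attach to Rx stream 0; -1 = default host buffer allowance. */
    if (NT_NetRxOpen(&hRx, "capture", NT_NET_INTERFACE_PACKET, 0, -1) != NT_SUCCESS)
        return 1;

    for (;;) {
        status = NT_NetRxGet(hRx, &hNetBuf, 1000 /* ms */);
        if (status == NT_STATUS_TIMEOUT)
            continue;  /* no packet within the timeout */
        if (status != NT_SUCCESS)
            break;

        handle_packet(NT_NET_GET_PKT_L2_PTR(hNetBuf),
                      NT_NET_GET_PKT_WIRE_LENGTH(hNetBuf));

        /* Release immediately: with this pattern the stream keeps up
         * at ~4.5 Mpps. */
        NT_NetRxRelease(hRx, hNetBuf);
    }

    NT_NetRxClose(hRx);
    NT_Done();
    return 0;
}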

However, if the netBufs are buffered and released later, the NT_NetRxGet calls start burning lots of CPU cycles. With this setup I am able to handle only about 1 million packets per second.
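
The buffered variant differs only in when NT_NetRxRelease is called; roughly (again a simplified sketch, reusing the handle_packet stub from above; the batch size is an illustrative value):

#include <stddef.h>
#include <nt.h>

#define BATCH 1024  /* illustrative value, not my actual batch size */

/* Hold on to BATCH netBufs before releasing them. While the releases
 * are outstanding, every NT_NetRxGet apparently has to allocate a
 * fresh descriptor, which is the slow path in the perf report below. */
static void buffered_loop(NtNetStreamRx_t hRx)
{
    NtNetBuf_t queue[BATCH];
    size_t queued = 0;
    size_t i;

    for (;;) {
        if (NT_NetRxGet(hRx, &queue[queued], 1000 /* ms */) != NT_SUCCESS)
            continue;  /* timeout etc.; error handling simplified */

        handle_packet(NT_NET_GET_PKT_L2_PTR(queue[queued]),
                      NT_NET_GET_PKT_WIRE_LENGTH(queue[queued]));

        if (++queued == BATCH) {
            for (i = 0; i < queued; i++)
                NT_NetRxRelease(hRx, queue[i]);  /* deferred releases */
            queued = 0;
        }
    }
}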

Here is a perf report showing where the CPU cycles go inside the NT_NetRxGet calls:
- 67.21% CaptureWorker::run_loop
  - 32.72% NtPacketStream::receive_packet
    - 32.39% _Get
      - 31.98% _GetPacket
        - 30.62% _GetNewElem
          - 29.87% _GetNetworkDataListElement (inlined)
            - _AllocateElement
              - 28.68% Nt_posix_memalign
                - 28.55% __posix_memalign
                  - _mid_memalign
                    - 24.89% _int_memalign
                      - 20.22% _int_malloc
                        - 15.41% sysmalloc
                          - 7.72% page_fault
                            - 7.68% do_page_fault
                              - __do_page_fault
                                - 6.91% handle_mm_fault
                                  - 6.40% handle_pte_fault
                                    - 2.56% alloc_pages_vma
                                      - 2.36% __alloc_pages_nodemask
                                          1.34% get_page_from_freelist
                                          0.68% clear_page_c_e
                                    - 2.20% mem_cgroup_newpage_charge
                                        mem_cgroup_charge_common
                                    - 1.27% page_add_new_anon_rmap
                                      - 0.87% lru_cache_add
                                        - __lru_cache_add
                                            0.69% pagevec_lru_move_fn
                            3.50% retint_userspace_restore_args
                            1.90% system_call_after_swapgs
                            1.28% irq_return
                          2.52% malloc_consolidate
                        2.62% _int_free
            0.61% __memcpy_sse2_unaligned_erms (inlined)

The Napatech library seems to make a lot of mprotect system calls during these NT_NetRxGet calls. In addition, the data cache gets thrashed and a lot of data cache misses start occurring, which degrades performance further.

However, Napatech seems to cache these descriptors, so I am able to mitigate the issue by stopping the test case, freeing the buffered descriptors, and restarting it.

My question is: is it possible to pre-allocate the Napatech packet descriptors in some way, so that performance is good right from the start?
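
What I have in mind is something like the following warm-up at start-up (a hypothetical sketch; I don't know whether hold-then-release is actually enough to make the library keep the descriptors cached, and WARMUP is an illustrative value):

#include <stddef.h>
#include <nt.h>

#define WARMUP 4096  /* illustrative value */

/* Hypothetical warm-up: hold WARMUP netBufs at once, forcing the
 * library to allocate that many descriptors up front, then release
 * them all and rely on the caching behaviour that the stop/restart
 * workaround seems to demonstrate. */
static void warm_up_descriptors(NtNetStreamRx_t hRx)
{
    NtNetBuf_t bufs[WARMUP];
    size_t n = 0;

    while (n < WARMUP) {
        if (NT_NetRxGet(hRx, &bufs[n], 1000 /* ms */) == NT_SUCCESS)
            n++;  /* keep it, so the next get needs a new descriptor */
    }
    while (n > 0)
        NT_NetRxRelease(hRx, bufs[--n]);  /* hand them all back */
}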

BR, Antti

PS: The Napatech version in use is 3.19.0.30-a9847.

Asked on February 14, 2020 in Napatech FPGA SmartNICs.