Profiling Suricata with JEMALLOC
JEMALLOC is a memory allocation library: http://www.canonware.com/jemalloc/
It offers many interesting things for a tool like Suricata. Ken Steele of EZchip (formerly Tilera) made me aware of it. In Ken’s testing it helps performance.
Install
wget http://www.canonware.com/download/jemalloc/jemalloc-3.6.0.tar.bz2
tar xvfj jemalloc-3.6.0.tar.bz2
cd jemalloc-3.6.0
./configure --prefix=/opt/jemalloc/
make
sudo make install
Then use it by preloading it:
LD_PRELOAD=/opt/jemalloc/lib/libjemalloc.so ./src/suricata -c suricata.yaml -l tmp/ -r ~/sync/pcap/sandnet.pcap -S emerging-all.rules -v
I haven’t benchmarked this, but if you’re running a high performance setup it may certainly be worth a shot.
Profiling
The library comes with many interesting profiling and debugging features.
make clean
./configure --prefix=/opt/jemalloc-prof/ --enable-prof
make
sudo make install
Start Suricata like this:
LD_PRELOAD=/opt/jemalloc-prof/lib/libjemalloc.so ./src/suricata -c suricata.yaml -l tmp/ -r ~/sync/pcap/sandnet.pcap -S emerging-all.rules -v
Now we don’t see any change as we need to tell jemalloc what we want.
Exit stats
Dumps a lot of stats to the screen.
MALLOC_CONF=stats_print:true LD_PRELOAD=/opt/jemalloc-prof/lib/libjemalloc.so ./src/suricata -c suricata.yaml -l tmp/ -r ~/sync/pcap/sandnet.pcap -S emerging-all.rules -v
Memory leak checks
MALLOC_CONF=prof_leak:true,lg_prof_sample:0 LD_PRELOAD=/opt/jemalloc-prof/lib/libjemalloc.so ./src/suricata -c suricata.yaml -l tmp/ -r ~/sync/pcap/sandnet.pcap -S emerging-all.rules -v
[... suricata output ...]
<jemalloc>: Leak summary: 2011400 bytes, 4523 objects, 645 contexts
<jemalloc>: Run pprof on "jeprof.22760.0.f.heap" for leak detail
Then use the pprof tool that comes with jemalloc to inspect the dumped stats.
$ /opt/jemalloc-prof/bin/pprof --show_bytes ./src/suricata jeprof.22760.0.f.heap
Using local file ./src/suricata.
Using local file jeprof.22760.0.f.heap.
Welcome to pprof! For help, type 'help'.
(pprof) top
Total: 2011400 B
1050112 52.2% 52.2% 1050112 52.2% PacketGetFromAlloc
600064 29.8% 82.0% 600064 29.8% SCProfilePacketStart
103936 5.2% 87.2% 103936 5.2% SCACCreateDeltaTable
65536 3.3% 90.5% 66192 3.3% pcap_fopen_offline
35520 1.8% 92.2% 35520 1.8% ConfNodeNew
26688 1.3% 93.6% 26688 1.3% __GI___strdup
20480 1.0% 94.6% 20480 1.0% MemBufferCreateNew
20480 1.0% 95.6% 20480 1.0% _TmSlotSetFuncAppend
14304 0.7% 96.3% 14304 0.7% pcre_compile2
14064 0.7% 97.0% 25736 1.3% SCPerfRegisterQualifiedCounter
So it seems we don’t properly clean up our packet pools yet.
Create a PDF of this info:
$ /opt/jemalloc-prof/bin/pprof --show_bytes --pdf ./src/suricata jeprof.22760.0.f.heap > jemalloc.pdf
Dumping stats during runtime
Dump stats after every 16MiB of allocations (lg_prof_interval:24, means every 2^24 bytes, so 16MiB)
I’ve done this in a separate directory since it dumps many files.
$ mkdir jemalloc-profile
$ cd jemalloc-profile/
$ MALLOC_CONF="prof:true,prof_prefix:victor.out,lg_prof_interval:24" LD_PRELOAD=/opt/jemalloc-prof/lib/libjemalloc.so ../src/suricata -c ../suricata.yaml -l ../tmp/ -r ~/sync/pcap/sandnet.pcap -S ../emerging-all.rules -v
Then you should see new *.heap files appear, many during startup. But after some time it should slow down.
You can then visualize the diff between two dumps:
$ /opt/jemalloc-prof/bin/pprof --show_bytes --pdf ../src/suricata --base victor.out.24159.150.i150.heap victor.out.24159.200.i200.heap > jemalloc.pdf
This creates a PDF of the 200th dump taking the 150th dump as a baseline. As we dump every ~16MiB, this covers about 50 * 16 = 800MiB worth of allocations.
Further reading
http://www.canonware.com/jemalloc/ https://github.com/jemalloc/jemalloc/wiki https://github.com/jemalloc/jemalloc/wiki/Use-Case%3A-Heap-Profiling
Many thanks to Ken Steele for pointing me to the lib and providing me with some good examples.