Hello, Network guys!
In this article I’ll try to share my experience with different traffic monitoring solutions. As you know, I’m the author of FastNetMon, and we support a wide range of capture engines from a really long list of vendors.
From its first days FastNetMon was developed with unification in mind. We tried hard to implement every possible option for monitoring your traffic, and I think we achieved this goal.
As a good start, let’s enumerate the most popular traffic monitoring engines:
– sFlow v4, sFlow v5
– Netflow (v5, v9, IPFIX, jFlow, cFlow and others)
I prefer to keep port mirroring and SPAN out of scope because they are pretty well known and implemented in hardware.
Netflow and sFlow come from different worlds. sFlow is a pretty standard thing for switches, while Netflow is more about routers. The first and most important difference between these protocols comes from this fact. As you know, switches deal with packets while routers deal with flows. Why would routers work with flows instead of packets? Most routers have pretty huge routing tables and each lookup is quite expensive for them, so a decision made for the first packet of a flow can be reused for the packets that follow. So we can look at the flow-based approach as a way to optimize performance.
sFlow is a sampled protocol: the switch generates one sFlow sample for every N-th packet that passes through it. Each sample consists of metadata fields (port numbers, sampling rate, packet size, generating device) and the packet header (usually 50-150 bytes). The packet header is stored in raw format with Ethernet headers, IP headers and the beginning of the packet payload. It can be parsed with tcpdump/Wireshark or any other tool that can process wire packets.
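Because the sampled header is just the first bytes of a wire packet, decoding it is ordinary packet parsing. Below is a minimal Python sketch of that idea, not the FastNetMon parser. The function name `parse_sampled_header` and the returned dictionary keys are my own choices for illustration, and it assumes an untagged Ethernet frame carrying IPv4 (no VLAN tags, no IPv6).

```python
import socket
import struct

ETHERTYPE_IPV4 = 0x0800
PROTO_TCP = 6
PROTO_UDP = 17


def parse_sampled_header(raw: bytes) -> dict:
    """Decode Ethernet + IPv4 (+ TCP/UDP ports) from a truncated raw header.

    Hypothetical helper: assumes an untagged Ethernet frame.
    """
    if len(raw) < 14:
        raise ValueError("too short for an Ethernet header")
    (ethertype,) = struct.unpack("!H", raw[12:14])
    if ethertype != ETHERTYPE_IPV4:
        # Anything else (VLAN, IPv6, ARP) is out of scope for this sketch.
        return {"ethertype": ethertype}

    ip = raw[14:]
    if len(ip) < 20:
        raise ValueError("too short for an IPv4 header")
    (version_ihl, _tos, total_length, _ident, flags_frag,
     _ttl, protocol, _checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", ip[:20])
    header_length = (version_ihl & 0x0F) * 4

    result = {
        "ethertype": ethertype,
        "src_ip": socket.inet_ntoa(src),
        "dst_ip": socket.inet_ntoa(dst),
        "protocol": protocol,
        "ip_length": total_length,
        "src_port": None,
        "dst_port": None,
    }
    # Ports are only present in the first fragment (fragment offset 0).
    is_first_fragment = (flags_frag & 0x1FFF) == 0
    has_ports = len(ip) >= header_length + 4
    if protocol in (PROTO_TCP, PROTO_UDP) and is_first_fragment and has_ports:
        src_port, dst_port = struct.unpack("!HH", ip[header_length:header_length + 4])
        result["src_port"] = src_port
        result["dst_port"] = dst_port
    return result
```

Note that `ip_length` comes from the IP header itself, so it reflects the original packet even though the sampled copy is truncated.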
Netflow is a flow-based protocol: the router generates only a single Netflow record for a whole flow, and a flow can have unlimited length and duration. A Netflow record does not carry packet payload but has a bunch of fields describing the flow (protocol, ports, autonomous system number, TCP flags, fragmentation flags). In modern Netflow implementations you can add your own fields to the standard record.
For optimisation reasons Netflow often supports sampling, so the router can produce flow records only for every N-th flow passing through it.
After this brief explanation I would like to talk about implementation details. If you want to write your own sFlow implementation, it feels pretty obvious: keep a packet counter, take it modulo some fixed number N, and select the packet when the remainder becomes zero. Then you take the first 70 bytes of the packet and send them to the collector in an sFlow packet. Really simple, isn’t it?
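The counter-and-modulo idea fits in a few lines of Python. This is only a sketch of the sampling step: the class name `PacketSampler`, the field names in the returned dictionary and the 70 byte default are illustrative assumptions, and encoding the sample into a real sFlow datagram is left out.

```python
from typing import Optional

SNAP_LENGTH = 70  # bytes of each selected packet to keep (assumed default)


class PacketSampler:
    """Select every N-th packet and keep only the start of it."""

    def __init__(self, sampling_rate: int, snap_length: int = SNAP_LENGTH):
        if sampling_rate < 1:
            raise ValueError("sampling_rate must be >= 1")
        self.sampling_rate = sampling_rate
        self.snap_length = snap_length
        self.packets_seen = 0

    def observe(self, packet: bytes) -> Optional[dict]:
        """Return a sample for every N-th packet, otherwise None."""
        self.packets_seen += 1
        if self.packets_seen % self.sampling_rate != 0:
            return None
        return {
            "sampling_rate": self.sampling_rate,
            "packets_seen": self.packets_seen,
            "frame_length": len(packet),            # original packet size
            "header": packet[:self.snap_length],    # truncated raw header
        }
```

With `PacketSampler(1000)` only one call in a thousand returns a sample, and everything else costs a single increment and a modulo. Production agents usually randomise the gap between samples so they do not lock onto periodic traffic; the sketch keeps the fixed counter described above.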
But when the flow-oriented approach arrives, we get into trouble. As we know, there is no notion of a flow in UDP, so you need to implement custom connection tracking if you want to generate flows from UDP traffic. TCP and other connection-aware protocols are also complicated to track properly. You need a pretty big hash table with information about each flow in your network. In some corner cases (100GE networks, a huge number of short sessions) this can be very difficult because of limited hardware resources and memory.
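To show why this costs memory, here is a minimal Python sketch of such a hash table, keyed by the usual 5-tuple. The names `FlowKey`, `FlowRecord` and `FlowTable` are hypothetical and the set of counters is an assumption; real exporters track many more fields and do this in hardware or heavily optimised code.

```python
from dataclasses import dataclass
from typing import Dict, NamedTuple


class FlowKey(NamedTuple):
    """5-tuple identifying a flow, for UDP as well as TCP."""
    src_ip: str
    dst_ip: str
    protocol: int
    src_port: int
    dst_port: int


@dataclass
class FlowRecord:
    first_seen: float
    last_seen: float
    packets: int = 0
    octets: int = 0
    tcp_flags: int = 0  # OR of all TCP flags seen in the flow


class FlowTable:
    """One entry per active flow: memory grows with the number of flows."""

    def __init__(self) -> None:
        self.flows: Dict[FlowKey, FlowRecord] = {}

    def update(self, key: FlowKey, length: int, now: float, tcp_flags: int = 0) -> FlowRecord:
        record = self.flows.get(key)
        if record is None:
            # First packet of a new flow: allocate a table entry.
            record = FlowRecord(first_seen=now, last_seen=now)
            self.flows[key] = record
        record.last_seen = now
        record.packets += 1
        record.octets += length
        record.tcp_flags |= tcp_flags
        return record
```

Every packet needs a lookup and every new flow needs a new entry, so a flood of short sessions grows the table as fast as the traffic arrives.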
At this point we can conclude that an sFlow implementation is definitely cheaper, faster and less error-prone. Nevertheless, there are plenty of switch models with various issues in their sFlow implementation. For historic reasons switches have very slow CPUs, so sFlow generation, which is obviously cheap for your laptop, can be a real challenge for your switch. To fix this issue some vendors introduced a “flexible sample rate” approach, which lets the switch survive traffic bursts without overloading its CPU. But they are doing it wrong, because according to the mathematical theory of sampling you cannot keep making the sampling sparser (increasing N) indefinitely. For example, if you want accurate traffic information for your 10GE link, you need to sample at least 1 packet in 1000.
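To put a number on this, a rule of thumb often quoted for packet sampling says that, at 95% confidence, the relative error of an estimate is at most 196 * sqrt(1/c) percent, where c is the number of samples collected for the traffic class you are measuring. I use it here as an assumption for illustration; it is not a FastNetMon formula, and the function names below are hypothetical.

```python
import math


def estimate_total_packets(samples: int, sampling_rate: int) -> int:
    """Scale the number of samples back up to an estimated packet count."""
    return samples * sampling_rate


def max_relative_error_percent(samples: int) -> float:
    """Upper bound on relative error (95% confidence) for `samples` samples."""
    if samples < 1:
        raise ValueError("need at least one sample")
    return 196.0 * math.sqrt(1.0 / samples)
```

The bound depends only on how many samples you collected, not on the sampling rate itself. So when a switch quietly makes the sampling sparser during a burst, you get fewer samples per interval and a larger error exactly when you need the data most.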
A Netflow implementation is complicated, expensive (a large number of vendors will invoice you for a license or an additional hardware module) and very tricky to get right. Consequently, you should keep an eye on Netflow’s behaviour on your particular device.
There are a few configuration options for Netflow-based protocols:
– Active flow timeout
– Inactive flow timeout
– Cache size
Active flow timeout is the interval after which information about a still-active flow must be sent to the collector. Inactive flow timeout is the same, but for flows that have already finished: it is how long a flow may stay idle before it is exported. Cache size is the total number of records in the flow tracking table used by Netflow.
As you can see, we have a few options to tune, and all of them have a very significant impact on the performance of your device.
I would also like to highlight that many vendors may ignore the values you specify for the active / inactive flow timeouts and use some predefined value internally.
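How the three settings interact is easier to see in code. The Python sketch below is a toy exporter cache and not any vendor's algorithm: the class name `FlowCache`, the default values and the "evict the longest-idle flow when full" policy are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Hashable, List, Tuple


@dataclass
class CachedFlow:
    first_seen: float
    last_seen: float
    packets: int = 0
    octets: int = 0


class FlowCache:
    def __init__(self, active_timeout: float = 60.0,
                 inactive_timeout: float = 15.0, cache_size: int = 65536):
        self.active_timeout = active_timeout      # max age of a live flow
        self.inactive_timeout = inactive_timeout  # max idle time
        self.cache_size = cache_size              # max tracked flows
        self.flows: Dict[Hashable, CachedFlow] = {}
        self.exported: List[Tuple[Hashable, CachedFlow]] = []

    def _export(self, key: Hashable) -> None:
        # A real exporter would send a flow record to the collector here.
        self.exported.append((key, self.flows.pop(key)))

    def update(self, key: Hashable, length: int, now: float) -> None:
        flow = self.flows.get(key)
        if flow is None:
            if len(self.flows) >= self.cache_size:
                # Cache full: export the longest-idle flow early (assumed policy).
                oldest = min(self.flows, key=lambda k: self.flows[k].last_seen)
                self._export(oldest)
            flow = CachedFlow(first_seen=now, last_seen=now)
            self.flows[key] = flow
        flow.last_seen = now
        flow.packets += 1
        flow.octets += length

    def expire(self, now: float) -> None:
        """Export flows that were idle too long or active too long."""
        for key in list(self.flows):
            flow = self.flows[key]
            idle = now - flow.last_seen
            age = now - flow.first_seen
            if idle >= self.inactive_timeout or age >= self.active_timeout:
                self._export(key)
```

A long-lived flow is reported once per active timeout and then starts again as a fresh entry with its next packet, so the collector sees it in slices. A small cache forces early exports under load, which is one reason these values affect both the accuracy and the CPU load of the device.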
A few years ago a completely vendor-neutral version of Netflow was approved as a standard and got the name IPFIX. It introduced a lot of flexibility into the protocol and provides a large number of benefits over Netflow. But because the requirements for an agent’s implementation are pretty relaxed, implementations differ widely between vendors and software versions.
Finally, it’s complicated to find the best protocol for everyone. So you should look at your goals and select the best option for your requirements.
Image by Claus Rebler.