Network traffic flows (flows) are useful for building a coarse-grained understanding of traffic on a computer network, providing a convenient unit for the measurement and/or treatment of traffic.
Flows can be measured to understand which hosts are talking on the network, with details of addresses, volumes and types of traffic. This view of the network can be useful for troubleshooting, detecting security incidents, planning and billing.
But what exactly is a flow, and how is it defined?
This question sounds trivial to answer. However, when we dig deeper we find nuances and corner cases that make flows interesting, and ultimately difficult to define.
Background
To truly understand flows, we need to start with some background.
Networks started out as circuit-switched. When a host wanted to communicate with another host, it asked the network to set up a circuit. After the information flow had finished, the circuit was torn down.
Figure 1 – Example Circuit-Switched Network
Circuit-switched networks have their heritage in phone networks. They have a number of drawbacks, including poor scalability and low capacity utilisation[1].
An alternative was needed to build what ultimately became the Internet – packet-switched networks. Messages are chopped up into variable-sized pieces that are individually addressed and sent as packets across the network.
Figure 2 – Example Packet-Switched Network
The receiving host reassembles the payload from the packets back into the message. Note: packets can also contain control information, such as flow control, and paths do not have to be symmetric.
Defining flows in a circuit-switched network is easy: the circuit is the flow (circuit = flow), and it follows a protocol for establishment and teardown. In a packet-switched network, however, things are less obvious.
Imagine for a second that you are at observation point A in the circuit-switched network of Figure 1. You would see:
Figure 3 – Zoom in on Circuit-Switched Network
Two flows would be observed – circuits between hosts 3 & 4 and hosts 5 & 6. Observing flows in a circuit-switched network is relatively easy because the network is involved in setting up the circuits, so it knows their state and their endpoints.
Imagine now that you are at observation point A in the packet-switched network of Figure 2 instead. You would see:
Figure 4 – Zoom in on Packet-Switched Network
Suddenly things are less clear. There is a packet coming in from Host 5 destined for Host 6. If we keep observing for a period of time, we see more packets arrive and depart. Observing flows on a packet-switched network takes time, and requires recording and analysing packet information.
First (Naive) Attempt to Define a Flow
Using our knowledge of packet-switched networks, a first attempt to answer ‘what is a flow’ might be:
A flow is a sequence of packets carrying information between two hosts
This definition looks like:
Figure 5 – First Attempt to Define a Flow
There is however a problem. What happens if there is more than one type of communication happening between the hosts, for example A has an SSH and HTTPS connection to B? Are the SSH and HTTPS packets part of the same flow? Intuitively we say no, they are different sessions, and even entirely different protocols. We can do better…
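To make the problem concrete, here is a minimal Python sketch of the first definition. The addresses and session labels are made up for illustration, and treating the host pair as unordered is an assumption about what "between two hosts" means.

```python
from collections import defaultdict

# Hypothetical observed packets: (source IP, destination IP, session label).
packets = [
    ("10.0.0.1", "10.0.0.2", "SSH"),
    ("10.0.0.1", "10.0.0.2", "HTTPS"),
    ("10.0.0.2", "10.0.0.1", "SSH"),
    ("10.0.0.2", "10.0.0.1", "HTTPS"),
]


def naive_flow_key(src_ip, dst_ip):
    """Key a packet by the unordered pair of host addresses only."""
    return frozenset((src_ip, dst_ip))


# Group packets into flows using the naive key.
flows = defaultdict(list)
for src_ip, dst_ip, session in packets:
    flows[naive_flow_key(src_ip, dst_ip)].append(session)

for key, sessions in flows.items():
    print(sorted(key), sessions)
```

Every packet lands in a single flow, so the SSH and HTTPS sessions cannot be told apart.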
Second Attempt to Define a Flow
How about we use protocol information from packet headers as common properties to group packets into flows? This will separate different types of connection into different flows.
A large proportion of packets on a network are likely to use IP as the layer-3 protocol, with TCP or UDP as the layer-4 transport protocol. It is therefore reasonable to consider using TCP and UDP parameters from the packet headers as flow keys.
We’ll use five parameters from the packet headers: source IP, destination IP, protocol, TCP or UDP source port, and TCP or UDP destination port. Taken as an ordered list, these are commonly known as the 5-tuple, and they serve as the common properties that map packets to flows.
Figure 6 – Example 5-tuple
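As a rough sketch of how a 5-tuple can act as a flow key, the Python below counts packets per 5-tuple. The `FiveTuple` type and the sample addresses and ports are hypothetical; the field order follows the list above.

```python
from collections import defaultdict
from typing import NamedTuple


class FiveTuple(NamedTuple):
    src_ip: str
    dst_ip: str
    protocol: int  # IP protocol number: 6 = TCP, 17 = UDP
    src_port: int
    dst_port: int


# Hypothetical header fields from three observed packets.
packets = [
    FiveTuple("10.0.0.1", "10.0.0.2", 6, 49152, 22),   # SSH
    FiveTuple("10.0.0.1", "10.0.0.2", 6, 49153, 443),  # HTTPS
    FiveTuple("10.0.0.1", "10.0.0.2", 6, 49152, 22),   # SSH again
]

# Packets with an identical 5-tuple are counted against the same flow.
packet_counts = defaultdict(int)
for pkt in packets:
    packet_counts[pkt] += 1

for key, count in packet_counts.items():
    print(key, count)
```

Because a named tuple is hashable, it can be used directly as a dictionary key, which is one simple way to keep per-flow state.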
Here goes attempt 2:
A flow is a sequence of packets carrying information between two hosts, where packets have common properties:
All packets in the flow share the same 5-tuple
Our scenario now looks like this:
Figure 7 – Second Attempt to Define a Flow
Perfect, you say. Life is good. We have a definition for a flow, nothing more to see here… But hang on, what about the direction of the packets?
Third Attempt – Bidirectional
A 5-tuple is unidirectional (one-way). By default it will only match packets travelling in one direction, since packets in the reverse direction have transposed IP addresses and port numbers, and thus a different 5-tuple hash.
There are good reasons to consider flows as bidirectional as opposed to unidirectional, including the ability to determine client/server behaviour and calculate round trip times, as well as improved detection of security incidents such as scanning[2][3].
Consider a simple TCP connection in Figure 8, where packets bounce back and forth between Hosts A and B:
Figure 8 – Ladder Diagram of a Simple TCP Flow
We see a classic TCP 3-way handshake (SYN, SYN+ACK, ACK) followed by exchange of data. Unidirectional flow analysis at an observation point in the network would see two separate flows, one per direction, as per Figure 9:
Figure 9 – Unidirectional Ladders for Simple TCP Flow
The problem with unidirectional flow measurement is we miss the opportunity to capture some important metadata about the flow. Sure, each unidirectional flow can store directional metadata for bytes and packets in that direction. This can include inter-packet timing (see labelled dots in Figure 9). But we miss the opportunity to gather metadata that requires measuring traffic parameters across both directions.
Consider bidirectional observation in Figure 10:
Figure 10 – Ladder of Bidirectional Observations for Simple TCP Flow
We can now observe and measure the TCP 3-way handshake (points F1, B1, F2), and look at other metrics like response times.
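As an illustration of the kind of metric this enables, the sketch below estimates delays from the three handshake observations. The function name and the timestamps are hypothetical, and splitting the round trip into a server side and a client side of the observation point is a common approach rather than something prescribed here.

```python
def handshake_delays(f1_syn, b1_syn_ack, f2_ack):
    """Estimate delays from handshake timestamps (seconds) at one observation point.

    f1_syn     : time the SYN (point F1) was observed
    b1_syn_ack : time the SYN+ACK (point B1) was observed
    f2_ack     : time the final ACK (point F2) was observed
    """
    server_side = b1_syn_ack - f1_syn   # observer -> server -> observer
    client_side = f2_ack - b1_syn_ack   # observer -> client -> observer
    total = f2_ack - f1_syn             # approximate end-to-end round trip
    return server_side, client_side, total


# Hypothetical timestamps for points F1, B1 and F2.
server_side, client_side, total = handshake_delays(0.000, 0.012, 0.030)
print(f"server side {server_side:.3f}s, client side {client_side:.3f}s, total {total:.3f}s")
```

None of these values can be computed from a single unidirectional flow record, since each needs timestamps from both directions.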
For bidirectional flows, we need two 5-tuples, the second of which swaps the order of both the IP addresses and the port numbers. Below is an example of the forward and reverse 5-tuples for an SSH flow:
Figure 11 – Reversing a 5-tuple
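A minimal Python sketch of the reversal, assuming a hypothetical `FiveTuple` type and made-up SSH addresses and ports:

```python
from typing import NamedTuple


class FiveTuple(NamedTuple):
    src_ip: str
    dst_ip: str
    protocol: int
    src_port: int
    dst_port: int


def reverse(t: FiveTuple) -> FiveTuple:
    """Swap the addresses and the ports; the protocol is unchanged."""
    return FiveTuple(t.dst_ip, t.src_ip, t.protocol, t.dst_port, t.src_port)


def in_bidirectional_flow(flow: FiveTuple, pkt: FiveTuple) -> bool:
    """A packet belongs to the flow if it matches either direction."""
    return pkt == flow or pkt == reverse(flow)


# Hypothetical SSH flow: the client uses an ephemeral port, the server port 22.
forward = FiveTuple("10.0.0.1", "10.0.0.2", 6, 49152, 22)
backward = reverse(forward)
print(forward)
print(backward)
print(in_bidirectional_flow(forward, backward))
```

An alternative design is to store one canonical key per flow (for example, the smaller of the two tuples) so that a single lookup matches both directions.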
Right, that wasn’t too difficult. But wait, a small detail lurks that requires further attention. How do we know which is the forward direction of the flow? We determine the direction based on the first packet observed, which is assumed to be travelling in the client-to-server direction. This isn’t going to be 100% reliable, as packets could be out of order and/or the observation could start part way through a flow. We’ll use this method, but need to remember when we look at results that it isn’t perfect.
An alternative method is inspecting transport protocol fields. In TCP for instance, the presence of just the SYN flag is a reasonable indicator that the packet is the first one in the flow.
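The two approaches can be combined, as in the hedged sketch below: the first observed packet sets the direction, and the TCP flags indicate how much to trust it. The function names are hypothetical; the flag bit values are the standard TCP header bits.

```python
TCP_SYN = 0x02  # SYN bit in the TCP flags field
TCP_ACK = 0x10  # ACK bit in the TCP flags field


def is_initial_syn(tcp_flags: int) -> bool:
    """True when SYN is set and ACK is clear, i.e. a connection request."""
    return bool(tcp_flags & TCP_SYN) and not (tcp_flags & TCP_ACK)


def guess_direction(src_ip, dst_ip, tcp_flags=None):
    """Return (client, server, reliable) from the first observed packet.

    The sender of the first packet is assumed to be the client. The guess is
    only marked reliable when that packet is an initial TCP SYN.
    """
    reliable = tcp_flags is not None and is_initial_syn(tcp_flags)
    return src_ip, dst_ip, reliable


print(guess_direction("10.0.0.1", "10.0.0.2", TCP_SYN))            # initial SYN
print(guess_direction("10.0.0.2", "10.0.0.1", TCP_SYN | TCP_ACK))  # mid-handshake
print(guess_direction("10.0.0.1", "10.0.0.2"))                     # no TCP flags
```

Testing only the SYN and ACK bits, rather than requiring SYN to be the sole flag, is a deliberate choice in this sketch so that a SYN carrying other option bits still counts as a connection request.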
Our definition for a flow is now:
A flow is a sequence of packets carrying information between two hosts where packets have common properties:
All packets in the flow share the same 5-tuple or transposed 5-tuple
Fourth Attempt – Including Non-TCP IP Traffic
Up until this point we’ve assumed that the transport protocol is TCP. What about other IP transport protocols?
UDP is the obvious choice for second-most-common transport protocol, especially with the rise of real-time traffic over UDP as well as new protocols such as QUIC[4]. UDP fits easily into the same model as it has source and destination port numbers, as per the QUIC example below:
Figure 12 – Reversing a UDP 5-tuple (same as TCP)
Stream Control Transmission Protocol (SCTP) is another transport protocol that uses port numbers and thus works with a 5-tuple.
But what about other protocols, IPsec for example?
IPsec ESP (Encapsulating Security Payload) is a protocol that does not include source/destination port numbers, so we need to fall back to a 3-tuple (source IP, destination IP and protocol), as understanding the payload of the protocol is unlikely to be practical.
Figure 13 – Bidirectional 3-Tuple
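One way to express the fallback is sketched below. The function names and sample addresses are hypothetical; the protocol numbers are the IANA-assigned values for TCP (6), UDP (17), SCTP (132) and ESP (50).

```python
# Transport protocols that carry source and destination port numbers.
PROTOCOLS_WITH_PORTS = {6, 17, 132}  # TCP, UDP, SCTP


def flow_key(src_ip, dst_ip, protocol, src_port=None, dst_port=None):
    """Return a 5-tuple when ports are available, otherwise a 3-tuple."""
    if protocol in PROTOCOLS_WITH_PORTS and src_port is not None and dst_port is not None:
        return (src_ip, dst_ip, protocol, src_port, dst_port)
    return (src_ip, dst_ip, protocol)


def reverse_key(key):
    """Transpose a 5-tuple or a 3-tuple to match the reverse direction."""
    if len(key) == 5:
        src_ip, dst_ip, protocol, src_port, dst_port = key
        return (dst_ip, src_ip, protocol, dst_port, src_port)
    src_ip, dst_ip, protocol = key
    return (dst_ip, src_ip, protocol)


quic = flow_key("10.0.0.1", "10.0.0.2", 17, 49152, 443)  # UDP, so a 5-tuple
esp = flow_key("10.0.0.1", "10.0.0.2", 50)               # ESP, so a 3-tuple
print(quic, reverse_key(quic))
print(esp, reverse_key(esp))
```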
Our definition for a flow is now:
A flow is a sequence of packets carrying information between two hosts where packets have common properties:
All packets in the flow share the same 5-tuple or transposed 5-tuple
else: All packets in the flow share the same 3-tuple or transposed 3-tuple
There is however yet another factor to consider – time.
Fifth Attempt – Flow Expiration
A flow only exists for a certain amount of time. It is possible that the same 5-tuple (or 3-tuple) could be reused at a different point in time, for a different flow, between the same hosts.
Consider two hosts where one initiates many new TCP connections to the other. Each new TCP connection gets a new source port, traditionally incremented by 1 from the previous allocation (many modern stacks randomise the choice instead, but ports are still reused eventually). IANA allocates the range 49152 to 65535 for these ephemeral ports, giving 16384 ports. Over time the TCP source port will roll through the range and the original source port will be reused, so the new flow will have the same 5-tuple as the original flow. This presents a problem, as it is not the same flow!
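The arithmetic can be checked with a short sketch. The connection rate is a made-up example, and the calculation assumes strictly sequential port allocation between a single pair of hosts.

```python
EPHEMERAL_FIRST = 49152
EPHEMERAL_LAST = 65535

# Both ends of the range are usable, hence the +1.
port_count = EPHEMERAL_LAST - EPHEMERAL_FIRST + 1


def seconds_until_reuse(connections_per_second: float) -> float:
    """Time for sequential allocation to wrap around to the same source port."""
    return port_count / connections_per_second


print(port_count)
print(seconds_until_reuse(100))  # hypothetical rate of 100 new connections per second
```

At that hypothetical rate the same 5-tuple would come around again in under three minutes, which gives a feel for why expiry matters.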
To solve this we need flows to be expired where no packets are seen for more than a specified amount of time. Here we go again:
A flow is a sequence of packets carrying information between two hosts where packets have common properties:
All packets in the flow share the same 5-tuple or transposed 5-tuple
else: All packets in the flow share the same 3-tuple or transposed 3-tuple
The flow is expired when no packets are seen for more than a specified amount of time
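A minimal sketch of idle-timeout expiry follows. The `FlowTable` class, its field names and the 60-second timeout are all assumptions for illustration, not taken from any particular flow exporter.

```python
class FlowTable:
    """Track flows by key and expire them after an idle period."""

    def __init__(self, idle_timeout: float):
        self.idle_timeout = idle_timeout
        self.active = {}    # flow key -> flow record
        self.expired = []   # records of flows that have timed out

    def observe(self, key, timestamp: float) -> dict:
        """Account for one packet with the given flow key and arrival time."""
        record = self.active.get(key)
        if record is not None and timestamp - record["last_seen"] > self.idle_timeout:
            # Too long since the last packet: close the old flow, start a new one.
            self.expired.append(self.active.pop(key))
            record = None
        if record is None:
            record = {"key": key, "first_seen": timestamp, "last_seen": timestamp, "packets": 0}
            self.active[key] = record
        record["last_seen"] = timestamp
        record["packets"] += 1
        return record


table = FlowTable(idle_timeout=60.0)
key = ("10.0.0.1", "10.0.0.2", 6, 49152, 22)
table.observe(key, 0.0)
table.observe(key, 10.0)
table.observe(key, 100.0)  # 90 seconds idle, so this starts a new flow
print(len(table.expired), table.active[key]["packets"])
```

This version only expires a flow lazily, when another packet with the same key arrives. A real implementation would probably also sweep the table periodically so that idle flows are exported promptly.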
Sixth Attempt – Arbitrary Parameters
Sometimes you may want different parameters to identify flows, or these may be forced upon you by the type of hardware/software in the network.
In a Cisco router for instance, flows are identified by a 7-tuple that adds Type of Service (ToS) and input sub-interface to the standard 5-tuple[5]. There may also be situations where layer-2 fields, such as source or destination MAC address, make sense as flow keys (although note that they are only locally significant).
There are other parameters that could be used as common properties for flow identification; the choice is ultimately up to the operator, within the capabilities of the equipment. Based on this, we further refine the definition:
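A flow is a sequence of packets carrying information between two hosts where packets have common properties:
All packets in the flow share the same 5-tuple or transposed 5-tuple
else: All packets in the flow share the same 3-tuple or transposed 3-tuple
else: All packets in the flow share the same values for another set of flow keys chosen by the operator
The flow is expired when no packets are seen for more than a specified amount of time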
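Operator-chosen flow keys can be sketched as a configurable list of header fields. The field names and packet dictionaries below are hypothetical, and the 7-tuple list is only an approximation of the Cisco example above.

```python
FIVE_TUPLE = ["src_ip", "dst_ip", "protocol", "src_port", "dst_port"]
SEVEN_TUPLE = FIVE_TUPLE + ["tos", "input_interface"]


def make_flow_key(fields):
    """Build a function that extracts the chosen fields from a packet dict."""
    def key(packet: dict) -> tuple:
        # Fields missing from a packet are represented as None.
        return tuple(packet.get(field) for field in fields)
    return key


# Two hypothetical packets that differ only in their ToS value.
base = {"src_ip": "10.0.0.1", "dst_ip": "10.0.0.2", "protocol": 6,
        "src_port": 49152, "dst_port": 22, "input_interface": "Gi0/0.10"}
pkt_a = {**base, "tos": 0}
pkt_b = {**base, "tos": 184}

five_key = make_flow_key(FIVE_TUPLE)
seven_key = make_flow_key(SEVEN_TUPLE)
print(five_key(pkt_a) == five_key(pkt_b))    # same flow under the 5-tuple
print(seven_key(pkt_a) == seven_key(pkt_b))  # different flows under the 7-tuple
```

The same traffic therefore produces a different number of flows depending on which keys are configured, which is worth remembering when comparing flow counts from different devices.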
Wrapping it all up
We’ve shown that producing a single all-encompassing flow definition is a difficult problem. Flow definition is implementation-specific, dependent on user requirements as well as the capabilities of networking equipment that measures the flows.
In part 2 of this blog post we’ll go into further considerations for flows such as IPv6 flow labels, fragmentation (a problem for both IPv4 and IPv6), encryption, non-IP packets and how flows are measured and used in other systems such as SDN.
[1] See: https://www.eecs.yorku.ca/course_archive/2015-16/W/3214/CSE3214_01_PacketCircuitSwitching_2016_posted_part2.pdf
[2] For some related papers, see: https://www.researchgate.net/profile/Brian_Trammell/publication/245587221_Bidirectional_Flow_Measurement_IPFIX_and_Security_Analysis/links/0f3175331dd3b49103000000/Bidirectional-Flow-Measurement-IPFIX-and-Security-Analysis.pdf and https://is.muni.cz/th/hilnn/cse2009.pdf
[3] For relevant RFC, see: https://tools.ietf.org/html/rfc5103
[4] For more on standardisation of QUIC, see: https://datatracker.ietf.org/wg/quic/about/
[5] See: https://www.cisco.com/en/US/tech/tk812/technologies_white_paper09186a008022bde8.shtml
Thanks to Adrian for the feedback, post has been updated.
Excellent post. Thanks for writing this!
Great post ! I’ve learned a lot on network flow