What is a Network Traffic Flow?

Network traffic flows (flows) are useful for building a coarse-grained understanding of traffic on a computer network, providing a convenient unit for the measurement and/or treatment of traffic.

Flows can be measured to understand which hosts are talking on the network, with details of addresses, volumes and types of traffic. This view of the network can be useful for troubleshooting, detecting security incidents, planning and billing.

But what exactly is a flow, and how is it defined?

This question sounds trivial to answer. However, when we dig deeper we find nuances and corner cases that make flows interesting, and ultimately difficult to define.

Background

To truly understand flows, we need to start with some background.

Networks started out as circuit-switched. When a host wanted to communicate with another host, it asked the network to set up a circuit. After the information flow had finished, the circuit was torn down.

Figure 1 – Example Circuit-Switched Network

Circuit-switched networks have their heritage in phone networks. They have a number of drawbacks, including poor scalability and low capacity utilisation[1].

An alternative was needed to build what ultimately became the Internet: packet-switched networks. Messages are chopped up into variable-sized pieces that are individually addressed and sent as packets across the network.

Figure 2 – Example Packet-Switched Network

The receiving host reassembles the payload from the packets back into the message. Note that packets can also carry control information, such as flow control, and that paths do not have to be symmetric.

Defining flows in a circuit-switched network is easy: the circuit is the flow (circuit = flow), and it follows a protocol for establishment and teardown. In a packet-switched network, however, things are less obvious.

Imagine for a second that you are at observation point A in the circuit-switched network of Figure 1. You would see:

Figure 3 – Zoom in on Circuit-Switched Network

Two flows would be observed: circuits between hosts 3 & 4 and hosts 5 & 6. Observing flows in a circuit-switched network is relatively easy because the network is involved in setting up the circuits, so it knows their state and their endpoints.

Imagine now that you are at observation point A in the packet-switched network of Figure 2 instead. You would see:

Figure 4 – Zoom in on Packet-Switched Network

Suddenly things are less clear. There is a packet coming in from Host 5 destined for Host 6. Assuming we observe for a period of time we see more packets arrive and depart. Observing flows on a packet-switched network takes time, and requires recording and analysing packet information.

First (Naive) Attempt to Define a Flow

Using our knowledge of packet-switched networks, a first attempt to answer ‘what is a flow?’ might be:

A flow is a sequence of packets carrying information between two hosts

This definition looks like:

Figure 5 – First Attempt to Define a Flow

There is however a problem. What happens if there is more than one type of communication happening between the hosts, for example A has an SSH and HTTPS connection to B? Are the SSH and HTTPS packets part of the same flow? Intuitively we say no, they are different sessions, and even entirely different protocols. We can do better…

Second Attempt to Define a Flow

How about we use protocol information from packet headers as common properties to group packets into flows? This will separate different types of connection into different flows.

A large proportion of packets on a network are likely to be IP at layer 3, with TCP or UDP as the layer-4 transport protocol. It is therefore reasonable to consider using TCP and UDP header fields as flow keys.

We’ll use five parameters from the packet headers: source IP, destination IP, protocol, TCP or UDP source port, and TCP or UDP destination port. Taken as an ordered list, these are commonly known as the 5-tuple, and they are the common properties we use to map packets to flows.

Figure 6 – Example 5-tuple

Here goes attempt 2:

A flow is a sequence of packets carrying information between two hosts, where packets have common properties:

  • All packets in the flow share the same 5-tuple
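To make this concrete, here is a minimal Python sketch of grouping packets by 5-tuple. The `Packet` type, its field names and the sample addresses are our own illustrative assumptions; a real collector would parse these fields from captured packet headers.

```python
from collections import defaultdict
from typing import NamedTuple


class Packet(NamedTuple):
    """Hypothetical, already-parsed packet header fields."""
    src_ip: str
    dst_ip: str
    protocol: int   # IP protocol number, e.g. 6 = TCP, 17 = UDP
    src_port: int
    dst_port: int
    length: int     # packet size in bytes


def five_tuple(pkt: Packet) -> tuple:
    """Return the ordered 5-tuple used as the flow key."""
    return (pkt.src_ip, pkt.dst_ip, pkt.protocol, pkt.src_port, pkt.dst_port)


def group_into_flows(packets):
    """Map each packet to the flow that shares its 5-tuple."""
    flows = defaultdict(list)
    for pkt in packets:
        flows[five_tuple(pkt)].append(pkt)
    return dict(flows)


packets = [
    Packet("10.0.0.1", "10.0.0.2", 6, 50000, 22, 100),   # SSH, A -> B
    Packet("10.0.0.1", "10.0.0.2", 6, 50001, 443, 200),  # HTTPS, A -> B
    Packet("10.0.0.1", "10.0.0.2", 6, 50000, 22, 60),    # SSH, A -> B
]
flows = group_into_flows(packets)
```

With these sample packets, the two SSH packets land in one flow and the HTTPS packet in another, so `flows` holds two entries.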

Our scenario now looks like this:

Figure 7 – Second Attempt to Define a Flow

Perfect, you say. Life is good. We have a definition for a flow, nothing more to see here… But hang on, what about the direction of the packets?

Third Attempt – Bidirectional

A 5-tuple is unidirectional (one-way). By default it will only match packets travelling in one direction, since packets in the reverse direction have transposed IP addresses and port numbers, and thus a different 5-tuple.

There are good reasons to consider flows as bidirectional rather than unidirectional, including the ability to determine client/server behaviour and calculate round trip times, as well as improved detection of security incidents such as scanning[2][3].

Consider a simple TCP connection in Figure 8 where packets bounce back and forth between Host A & B:

Figure 8 – Ladder Diagram of a Simple TCP Flow

We see a classic TCP 3-way handshake (SYN, SYN+ACK, ACK) followed by exchange of data. Unidirectional flow analysis at an observation point in the network would see two separate flows, one per direction, as per Figure 9:

Figure 9 – Unidirectional Ladders for Simple TCP Flow

The problem with unidirectional flow measurement is we miss the opportunity to capture some important metadata about the flow. Sure, each unidirectional flow can store directional metadata for bytes and packets in that direction. This can include inter-packet timing (see labelled dots in Figure 9). But we miss the opportunity to gather metadata that requires measuring traffic parameters across both directions.

Consider bidirectional observation in Figure 10:

Figure 10 – Ladder of Bidirectional Observations for Simple TCP Flow

We can now observe and measure the TCP 3-way handshake (points F1, B1, F2), and look at other metrics like response times.
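As a rough sketch of what bidirectional observation makes possible, the snippet below estimates round-trip times from the three handshake packets. It assumes that F1, B1 and F2 in Figure 10 are the timestamps of the SYN, SYN+ACK and ACK as seen at the observation point; the function name and the microsecond values are made up for illustration.

```python
def handshake_times(t_syn: int, t_synack: int, t_ack: int) -> dict:
    """Estimate round-trip times from TCP handshake timestamps.

    All timestamps are taken at the same observation point, in the
    same unit (microseconds in this example).
    """
    # SYN -> SYN+ACK: observation point to server and back.
    server_side = t_synack - t_syn
    # SYN+ACK -> ACK: observation point to client and back.
    client_side = t_ack - t_synack
    return {
        "server_side": server_side,
        "client_side": client_side,
        "rtt": server_side + client_side,  # end-to-end estimate
    }


# Hypothetical timestamps for F1 (SYN), B1 (SYN+ACK) and F2 (ACK).
times = handshake_times(t_syn=0, t_synack=30_000, t_ack=40_000)
```

Neither half of this calculation is available to a unidirectional flow record, because each half needs one packet from each direction.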

For bidirectional flows, we need two 5-tuples, the second of which reverses the tuple order of both the IP addresses and the port numbers. Below is an example of SSH flow forward and reverse 5-tuples:

Figure 11 – Reversing a 5-tuple

Right, that wasn’t too difficult. But wait, a small detail lurks that requires further attention. How do we know which is the forward direction of the flow? We determine the direction based on the first packet observed, which is assumed to be travelling in the client-to-server direction, but this isn’t going to be 100% reliable, as packets could be out of order and/or the observation could start part way through a flow. We’ll use this method, but need to remember when we look at results that it isn’t perfect.

An alternative method is inspecting transport protocol fields. In TCP for instance, the presence of just the SYN flag is a reasonable indicator that the packet is the first one in the flow.
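A minimal sketch of that SYN check follows. The flag bit values are the standard TCP header bits; the function name is our own. We test for SYN set and ACK clear rather than requiring SYN to be the only flag, which is our assumption so that a SYN carrying ECN bits still counts.

```python
# Standard TCP header flag bits.
TCP_SYN = 0x02
TCP_ACK = 0x10


def looks_like_first_packet(tcp_flags: int) -> bool:
    """True if SYN is set and ACK is clear, i.e. a connection request."""
    return (tcp_flags & (TCP_SYN | TCP_ACK)) == TCP_SYN


is_syn = looks_like_first_packet(0x02)      # SYN only
is_synack = looks_like_first_packet(0x12)   # SYN+ACK (server's reply)
is_ack = looks_like_first_packet(0x10)      # plain ACK
```

Here only the first call returns `True`; the SYN+ACK travels in the reverse direction and so is not treated as the start of the flow.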

Our definition for a flow is now:

A flow is a sequence of packets carrying information between two hosts where packets have common properties:

  • All packets in the flow share the same 5-tuple or transposed 5-tuple
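One way to sketch this in Python is a flow table that matches each packet against both the 5-tuple and its transposition, treating the first packet seen as the forward direction. The class and counter names here are illustrative assumptions rather than any particular tool's implementation.

```python
def transpose(key):
    """Swap source and destination addresses and ports of a 5-tuple."""
    src_ip, dst_ip, proto, src_port, dst_port = key
    return (dst_ip, src_ip, proto, dst_port, src_port)


class BidirectionalFlowTable:
    """Counts packets per flow, keyed on the first-seen (forward) 5-tuple."""

    def __init__(self):
        self.flows = {}  # forward 5-tuple -> per-direction packet counts

    def add(self, key):
        if key in self.flows:
            self.flows[key]["fwd_pkts"] += 1
        elif transpose(key) in self.flows:
            self.flows[transpose(key)]["rev_pkts"] += 1
        else:
            # First packet of a new flow defines the forward direction.
            self.flows[key] = {"fwd_pkts": 1, "rev_pkts": 0}


table = BidirectionalFlowTable()
a_to_b = ("10.0.0.1", "10.0.0.2", 6, 50000, 22)
table.add(a_to_b)             # A -> B
table.add(transpose(a_to_b))  # B -> A
table.add(a_to_b)             # A -> B
```

All three packets end up in a single flow record, with two packets counted in the forward direction and one in the reverse.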

Fourth Attempt – Including Non-TCP IP Traffic

Up until this point we’ve assumed that the transport protocol is TCP. What about other IP transport protocols?

UDP is the obvious candidate as the second most common transport protocol, especially with the rise of real-time traffic over UDP as well as new protocols such as QUIC[4]. UDP fits easily into the same model as it has source and destination port numbers, as per the QUIC example below:

Figure 12 – Reversing a UDP 5-tuple (same as TCP)

Stream Control Transmission Protocol (SCTP) is another transport protocol that uses port numbers and thus works with a 5-tuple.

But what about other protocols, IPsec for example?

IPsec ESP (Encapsulating Security Payload) is a protocol that does not include source/destination port numbers, so we need to fall back to a 3-tuple (source IP, destination IP and protocol), as understanding the payload of the protocol is unlikely to be practical.

Figure 13 – Bidirectional 3-Tuple

Our definition for a flow is now:

A flow is a sequence of packets carrying information between two hosts where packets have common properties:

  • For transport protocols with port numbers (i.e. TCP/UDP/SCTP):
      All packets in the flow share the same 5-tuple or transposed 5-tuple
    else:
      All packets in the flow share the same 3-tuple or transposed 3-tuple
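The if/else in this definition might be sketched as below. The protocol numbers are the IANA-assigned values (6 = TCP, 17 = UDP, 132 = SCTP, 50 = ESP); the function names and sample addresses are our own.

```python
PROTOCOLS_WITH_PORTS = {6, 17, 132}  # TCP, UDP, SCTP


def flow_key(src_ip, dst_ip, protocol, src_port=None, dst_port=None):
    """Return a 5-tuple for port-based protocols, else a 3-tuple."""
    if protocol in PROTOCOLS_WITH_PORTS:
        return (src_ip, dst_ip, protocol, src_port, dst_port)
    return (src_ip, dst_ip, protocol)


def transposed(key):
    """Reverse the direction of either a 5-tuple or a 3-tuple."""
    if len(key) == 5:
        src_ip, dst_ip, proto, src_port, dst_port = key
        return (dst_ip, src_ip, proto, dst_port, src_port)
    src_ip, dst_ip, proto = key
    return (dst_ip, src_ip, proto)


udp_key = flow_key("10.0.0.1", "10.0.0.2", 17, 51000, 443)  # e.g. QUIC
esp_key = flow_key("10.0.0.1", "10.0.0.2", 50)              # IPsec ESP
```

Note that with a 3-tuple, every ESP packet between the same pair of hosts falls into the same flow, which is coarser than the per-connection view we get for TCP and UDP.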

There is however yet another factor to consider – time.

Fifth Attempt – Flow Expiration

A flow only exists for a certain amount of time. It is possible that the same 5-tuple (or 3-tuple) could be reused at a different point in time, for a different flow, between the same hosts.

Consider two hosts where one initiates many new TCP connections to the other. Each new TCP connection gets a new source port, traditionally incremented by 1 from the previous allocation (many modern stacks randomise the choice instead). IANA allocate the range 49152 to 65535 for these ephemeral ports, giving 16384 ports. Over time the TCP source port will roll through the range and the original source port will be reused, so the new flow will have the same 5-tuple as the original flow. This presents a problem, as it is not the same flow!
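To get a feel for how quickly this can happen, here is a back-of-the-envelope calculation. It assumes sequential port allocation across the full IANA range and connections that all go to the same destination address and port; the connection rate is a made-up example.

```python
EPHEMERAL_FIRST = 49152
EPHEMERAL_LAST = 65535

# Both ends of the range are usable, hence the +1.
PORT_COUNT = EPHEMERAL_LAST - EPHEMERAL_FIRST + 1


def seconds_until_port_reuse(new_connections_per_second: float) -> float:
    """Time for sequential allocation to wrap around the whole range."""
    return PORT_COUNT / new_connections_per_second


wrap_seconds = seconds_until_port_reuse(10)  # hypothetical 10 connections/s
```

At 10 new connections per second the range wraps in roughly 27 minutes, so 5-tuple reuse is a practical concern rather than a theoretical one.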

To solve this, we need flows to be expired when no packets are seen for more than a specified amount of time. Here we go again:

A flow is a sequence of packets carrying information between two hosts where packets have common properties:

  • For transport protocols with port numbers (i.e. TCP/UDP/SCTP):
      All packets in the flow share the same 5-tuple or transposed 5-tuple
    else:
      All packets in the flow share the same 3-tuple or transposed 3-tuple

  • All inter-packet times are less than an arbitrary flow expiry timeout value
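A simple way to sketch expiry is to check the idle time whenever a packet arrives for a known key. The class below is an illustrative assumption, not a description of any specific exporter; in particular, it only expires a flow lazily when the key is seen again, whereas a real implementation would typically also sweep the table periodically.

```python
class ExpiringFlowTable:
    """Flow table that starts a new flow after an idle timeout."""

    def __init__(self, idle_timeout: float):
        self.idle_timeout = idle_timeout
        self.active = {}   # flow key -> {"first", "last", "packets"}
        self.expired = []  # list of (flow key, final record)

    def add(self, key, timestamp: float):
        record = self.active.get(key)
        # An inter-packet gap of at least the timeout ends the old flow.
        if record is not None and timestamp - record["last"] >= self.idle_timeout:
            self.expired.append((key, record))
            record = None
        if record is None:
            record = {"first": timestamp, "last": timestamp, "packets": 0}
            self.active[key] = record
        record["last"] = timestamp
        record["packets"] += 1


table = ExpiringFlowTable(idle_timeout=60)
key = ("10.0.0.1", "10.0.0.2", 6, 50000, 22)
table.add(key, 0)
table.add(key, 10)
table.add(key, 100)  # 90 s gap: same 5-tuple, but a new flow
```

After these three packets, the first two belong to an expired flow and the third has started a fresh flow under the same key.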

Sixth Attempt – Arbitrary Parameters

Sometimes you may want different parameters to identify flows, or these may be forced upon you by the type of hardware/software in the network.

In a Cisco router for instance, flows are identified by a 7-tuple that adds Type of Service (ToS) and input sub-interface to the standard 5-tuple[5]. There may also be situations where layer-2 fields, such as source or destination MAC address, may make sense as flow keys (although note that they are only locally significant).

Other parameters could also be used as common properties for flow identification; the choice is ultimately up to the operator, within the capabilities of the equipment. Based on this, we further refine the definition:

A flow is a sequence of packets carrying information between two hosts where packets have common properties:

  • For transport protocols with port numbers (i.e. TCP/UDP/SCTP):
      All packets in the flow share the same 5-tuple or transposed 5-tuple
    else:
      All packets in the flow share the same 3-tuple or transposed 3-tuple

  • All inter-packet times are less than an arbitrary flow expiry timeout value
  • Any arbitrary parameters can be used as flow keys, including ToS, interface etc.
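One way to sketch operator-chosen flow keys is to build the key function from a list of field names. The field names and the dictionary-based packet representation below are hypothetical; the 7-tuple simply appends ToS and input interface to the 5-tuple as described above.

```python
def make_key_function(field_names):
    """Build a flow-key function from an ordered list of header fields."""
    def key(packet: dict) -> tuple:
        return tuple(packet.get(name) for name in field_names)
    return key


FIVE_TUPLE = ("src_ip", "dst_ip", "protocol", "src_port", "dst_port")
SEVEN_TUPLE = FIVE_TUPLE + ("tos", "input_interface")

five_key = make_key_function(FIVE_TUPLE)
seven_key = make_key_function(SEVEN_TUPLE)

base = {"src_ip": "10.0.0.1", "dst_ip": "10.0.0.2", "protocol": 6,
        "src_port": 50000, "dst_port": 22, "input_interface": "ge-0/0/1"}
pkt_a = {**base, "tos": 0}
pkt_b = {**base, "tos": 184}  # same 5-tuple, different ToS marking

same_flow_by_5 = five_key(pkt_a) == five_key(pkt_b)
same_flow_by_7 = seven_key(pkt_a) == seven_key(pkt_b)
```

The two packets belong to the same flow under the 5-tuple but to different flows under the 7-tuple, which shows how the choice of keys changes what counts as a flow.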

Wrapping it all up

We’ve shown that producing a single all-encompassing flow definition is a difficult problem. Flow definition is implementation-specific, dependent on user requirements as well as the capabilities of networking equipment that measures the flows.

In part 2 of this blog post we’ll go into further considerations for flows such as IPv6 flow labels, fragmentation (a problem in both IPv4 and IPv6), encryption, non-IP packets and how flows are measured and used in other systems such as SDN.

[1] See: https://www.eecs.yorku.ca/course_archive/2015-16/W/3214/CSE3214_01_PacketCircuitSwitching_2016_posted_part2.pdf

[2] For some related papers, see: https://www.researchgate.net/profile/Brian_Trammell/publication/245587221_Bidirectional_Flow_Measurement_IPFIX_and_Security_Analysis/links/0f3175331dd3b49103000000/Bidirectional-Flow-Measurement-IPFIX-and-Security-Analysis.pdf and https://is.muni.cz/th/hilnn/cse2009.pdf

[3] For relevant RFC, see: https://tools.ietf.org/html/rfc5103

[4] For more on standardisation of QUIC, see: https://datatracker.ietf.org/wg/quic/about/

[5] See: https://www.cisco.com/en/US/tech/tk812/technologies_white_paper09186a008022bde8.shtml
