TL;DR
This paper introduces MAFIA, a set of reusable primitives for network measurement tasks on programmable switches, enabling flexible, concise, and hardware-compatible measurement code without expert knowledge.
Contribution
The paper presents MAFIA, a novel framework of primitives that simplifies and unifies network measurement task implementation on programmable switches.
Findings
MAFIA primitives enable concise measurement task expression.
Compiled MAFIA code is comparable to manual P4 code in size and resource usage.
MAFIA is applicable on current hardware without requiring low-level expertise.
Figure 1

| | How: Legacy | How: SDN |
|---|---|---|
| Traffic Eng. | SNMP; NetFlow; sFlow | Counters; Samples; Sketches |
| Performance | SNMP; NetFlow; sFlow | Counters; Probes; Samples; Bloom filters; Sketches |
| Verification | Config. Analysis | Tags |
| Troubleshooting | | |
| Security | | |

| Primitive | API | P4 Tables | P4 Actions | P4 LoC |
|---|---|---|---|---|
| Match | match(conditional) | 1 | builtin | 9 |
| Tag | tag(header_field, expr) | 1 | 1 | 9 |
| Sample | duplicate(stream) | 1 | 1 | 22 |
| Sample | collect(endpoint) | 1 | S | S |
| Timestamp | timestamp(t) | 1 | 2 | 10 |
| Counter | set, reset | 1 | 4 | 12 |
| BloomFilter | membership: {insert, test, reset, init} | 1 | | |
| BloomFilter | counting: {set, reset, init, all, any, sum, avg, min, max} | 1 | | |
| Sketch | pcsa/hll: {update, test, reset} | | | |
| Sketch | count-min: {set, reset, sum, avg, min, max} | 1 | | |
| Sketch | store: {set, reset, all, any, sum, avg, min, max} | 1 | | |
| Window | window | (variable) | | |
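
To make the `count-min` Sketch row above more concrete, the following is a minimal Python sketch of a count-min structure with an update and a `min` query. It is an illustration only: the class name `CountMinSketch`, the method name `add`, and the salted BLAKE2 hashing are assumptions chosen for this example, not MAFIA's implementation, which targets P4 tables and actions. The defaults mirror the `nhash=4, size=256, width=32` parameters used in the listings.

```python
import hashlib


class CountMinSketch:
    """Illustrative count-min sketch: nhash rows of `size` counters each."""

    def __init__(self, nhash: int = 4, size: int = 256, width: int = 32):
        self.nhash = nhash
        self.size = size
        self.mask = (1 << width) - 1  # emulate fixed-width counters that wrap
        self.rows = [[0] * size for _ in range(nhash)]

    def _index(self, row: int, key: bytes) -> int:
        # Assumption: one hash per row, derived by salting BLAKE2b with the row id.
        salt = row.to_bytes(2, "big")
        digest = hashlib.blake2b(key, digest_size=8, salt=salt).digest()
        return int.from_bytes(digest, "big") % self.size

    def add(self, key: bytes, amount: int) -> None:
        # Increment the counter selected in every row.
        for r in range(self.nhash):
            i = self._index(r, key)
            self.rows[r][i] = (self.rows[r][i] + amount) & self.mask

    def min(self, key: bytes) -> int:
        # The smallest counter across rows is the least inflated by collisions.
        return min(self.rows[r][self._index(r, key)] for r in range(self.nhash))


cms = CountMinSketch()
cms.add(b"flow-a", 1500)
cms.add(b"flow-a", 500)
cms.add(b"flow-b", 64)
estimate = cms.min(b"flow-a")  # an over-estimate at worst, barring counter wrap
```

Because collisions can only add to a counter, the `min` query never under-counts a key as long as the counters do not wrap, which is why the heavy hitter listing compares `nbytes.min()` against a threshold.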

| Measurement use case | API: Primitives | P4 LoC (manual) | P4 LoC (compiler, raw) | P4 LoC (compiler, opt.) |
|---|---|---|---|---|
| Flow volume and duration | 3 Match; 3 Counter HashMap; 2 Timestamp HashMap | 121 | 185 | 146 |
| Approximate flow volume | 1 Match; 1 Sketch (count-min) | 107 | 120 | 120 |
| Flow cardinality | 1 Match; 1 Sketch (PCSA) | 86 | 92 | 92 |
| Flow cardinality | 1 Match; 1 Sketch (HyperLogLog) | 96 | 102 | 102 |
| Counter thresholds | 5 Match; 2 Counter HashMap; 2 Sample | 139 | 193 | 170 |
| Stochastic sampling | 2 Match; 1 Tag; 1 Sample | 103 | 126 | 118 |
| Deterministic sampling | 5 Match; 3 Counter HashMap; 1 Tag; 1 Sample | 131 | 207 | 167 |
| Postcard generation | 2 Match; 4 Tag; 1 Sample | 94 | 121 | 101 |
| Trajectory encoding | 5 Match; 1 BloomFilter; 1 Timestamp+HashMap; 6 Tag; 1 Sample; 1 Counter | 244 | 299 | 260 |
| Two-phase heavy hitter | 4 Match; 1 Counter; 1 Counters HashMap; 1 Sketch (count-min); 1 BloomFilter | 261 | 345 | 281 |
| Top-k congested flows | 3 Match; 1 Counter; 2 Sketch (count-min); 3 Tag | 198 | 240 | 204 |
| Path changes | 3 Match; 1 Sketch (count-min); 1 Sketch; 1 BloomFilter; 1 Tag | 325 | 389 | 345 |
| Path change latency | 2 Match; 1 Timestamp; 1 Sample; 1 Tag | 38 | 44 | 41 |

| Measurement | Pipeline depth | Pipeline width | Num. Atoms | Banzai Atom Type |
|---|---|---|---|---|
| Flow volume and duration | 4 | 4 | 11 | Sub |
| Approximate flow volume | 4 | 5 | 18 | RAW |
| Flow cardinality | 3 | 3 | 6 | RW |
| Flow cardinality | 3 | 2 | 4 | RW |
| Counter thresholds | 5 | 2 | 9 | If-Else-RAW |
| Stochastic sampling | 3 | 1 | 3 | If-Else-RAW |
| Deterministic sampling | 6 | 2 | 8 | Pairs |
| Postcard generation | 1 | 5 | 5 | RW |
| Trajectory encoding | 6 | 3 | 8 | RW |
| Two-phase heavy hitter | 8 | 12 | 41 | If-Else-RAW |
| Top-k congested flows | 9 | 6 | 38 | If-Else-RAW |
| Path changes | 9 | 13 | 49 | If-Else-RAW |
| Path change latency | 4 | 2 | 5 | RW |

Flow volume and duration [11, 38, 15, 12, 39, 40, 41]

    flowid = Key(ip.src,ip.dest,tcp.src,tcp.dest,ip.proto)
    now_ts = Timestamp();
    byte_counter = HashMap(key=flowid, size=1024, type=Counter(width=32));
    packet_counter = HashMap(key=flowid, size=1024, type=Counter(width=32));
    start_ts = HashMap(key=flowid, size=1024, type=Timestamp());
    flow_duration = HashMap(key=flowid, size=1024, type=Counter(width=32));
    pkts (
        byte_counter.set(byte_counter + pkt.size)
        packet_counter.set(packet_counter + 1)
    ) (
        ( match(start_ts == 0) timestamp(start_ts) )
        ( match(start_ts != 0) timestamp(now_ts) flow_duration.set(now_ts - start_ts))
    ) )

Approximate flow volume (Count-Min Sketch) [41, 13, 36, 24, 42]

    flowid = Key(ip.src,ip.dest,tcp.src,tcp.dest,ip.proto)
    flow_size = Sketch(alg="countmin", nhash=4, key=flowid, size=256, width=32);
    pkts flow_size.set(flow_size + pkt.size)

Flow cardinality (PCSA Sketch) [13, 33]

    flowid = Key(ip.src,ip.dest,tcp.src,tcp.dest,ip.proto)
    num_flows = Sketch(alg="pcsa",key=flowid,nhash=1,size=128);
    pkts num_flows.update()

Flow cardinality (HyperLogLog Sketch) [24, 34]

    flowid = Key(ip.src,ip.dest,tcp.src,tcp.dest,ip.proto)
    num_flows = Sketch(alg="hyperloglog",key=flowid,nhash=1,size=256);
    pkts num_flows.update()

Counter thresholds [41]

    flowid = Key(ip.src,ip.dest,tcp.src,tcp.dest,ip.proto)
    byte_counter = HashMap(key=flowid, size=1024, type=Counter(width=32));
    packet_counter = HashMap(key=flowid, size=1024, type=Counter(width=32));
    pkts packet_counter.set(packet_counter + 1)
         match(packet_counter > PACKET_THRESHOLD)
         duplicate(pkts_exceeded)
    pkts byte_counter.set(byte_counter + pkt.size)
         match(byte_counter > BYTE_THRESHOLD)
         duplicate(bytes_exceeded)
    pkts_exceeded collect(COLLECTOR)
    bytes_exceeded collect(COLLECTOR)

Stochastic sampling [43, 16, 44]

    pkts match(random([0:100]) < SamplingRatio) duplicate(samples) )
    samples collect(COLLECTOR)

Deterministic sampling [43]

    flowid = Key(ip.src,ip.dest,tcp.src,tcp.dest,ip.proto)
    n = HashMap(key=flowid, size=1024, type=Counter(width=32))
    m = HashMap(key=flowid, size=1024, type=Counter(width=32))
    delta = HashMap(key=flowid, size=1024, type=Counter(width=32))
    pkts (
        ( match(delta < SKIP) delta.add(1) )
        ( match(delta >= SKIP && m < NUM_SAMPLES) m.set(m + 1)) duplicate(samples) )
        ( n.set(n + 1)
          match(n >= NUM_PACKETS)
          n.reset() m.reset() delta.reset() )
    )
    samples collect(COLLECTOR)

Postcard generation [22]

    pkts duplicate(postcards)
    postcards tag(ipv4.checksum, pkt.input_port)
              tag(ipv4.identification, pkt.output_port)
              tag(ipv4.tos, switchid)
              collect(COLLECTOR)

Trajectory encoding [26]

    //code executed at ingress switch
    now = Timestamp();
    flowid = key(ip.src,ip.dest,tcp.src,tcp.dest,ip.proto)
    verify_time = HashMap(key=flowid, size=1024, type=Timestamp());
    pkts timestamp(now)
         match( verify_time - now > THRESHOLD)
         timestamp(verify_time)
         tag(ipv4.tos, switch.id)
         tag(ipv4.identification, pkt.input_port)
         tag(pkt.ipv4.tos, pkt.ipv4.tos|0x1)

    //code executed at intermediate switch
    location = key(pkt.input_port, switch.id, pkt.output_port)
    trajectory = BloomFilter(alg="membership", nhash=4, key=location, size=16);
    pkts match(pkt.ipv4.tos & 0x1 == 0x1)
         trajectory.insert()
         tag(ipv4.checksum, ipv4.checksum | trajectory.read()))
         trajectory.reset()

    //code executed at egress switch
    pkts match(pkt.ipv4.tos & 0x1 == 0x1)
         tag(ipv4.identification, pkt.output_port)
         tag(ipv4.tos, switch.id)
         duplicate(reports)
    reports collect(COLLECTOR)

Two-phase heavy hitters

    flowid = Key(ip.src,ip.dest,tcp.src,tcp.dest,ip.proto)
    total = Counter(width=32)
    nbytes = Sketch(alg="count-min",nhash=4,key=flowid,size=256,width=32)
    hh = BloomFilter(alg="membership",key=flowid,nhash=4,size=64)
    hh_bytes = HashMap(key=flowid,size=1024,type=Counter(width=32))
    window(mment_interval)
    pkts match(pkt.input_port == PORT)
         total.set(total + pkt.size)
         (
           ( match(!hh.test())
             nbytes.set(nbytes + pkt.size)
             match(nbytes.min() / total > THRESHOLD)
             hh.insert()
             hh_bytes.set(nbytes.min())
             duplicate(hh_alarms) )
           ( match(hh.test())
             hh_bytes.set(hh_bytes + pkt.size) )
         )
    hh_alarms tag(ipv4.checksum, nbytes.min())
              collect(CONTROLLER)
    // Control traffic to retrieve heavy hitters volume.
    ctrl match(pkt.request == HH_VOLUME)
         duplicate(get_hh_volume)
    get_hh_volume tag(pkt.hh_volume, hh_bytes)
                  collect(CONTROLLER)

Top-k congested flows

    // Code executed at first hop:
    pkts tag(ipv4.tos, ipv4.tos | 0x1)
         tag(ipv4.id, pkt.in_queue_length))

    // Code executed at intermediate hops:
    q_len = Counter(width=32);
    pkts match(ipv4.tos & 0x1 == 0x1)
         q_len.set(ipv4.id + pkt.in_queue_length)
         tag(ipv4.id, q_len)

    // Code executed at last hop:
    flowid = Key(ip.src,ip.dest,tcp.src,tcp.dest,ip.proto)
    total_pkts = Sketch(alg="countmin",key=flowid,nhash=4,size=1024,w=32)
    path_q_len = Sketch(alg="countmin",key=flowid,nhash=4,size=1024,w=32)
    pkts.window(5s)
         match(ipv4.tos & 0x1 == 0x1)
         total_pkts.set(total_pkts + 1)
         path_q_len.set(path_q_len + ipv4.id)

Path changes

    // Code to be executed at intermediate switches
    location = Key(pkt.input_port, switch.id, pkt.output_port)
    location_bf = BloomFilter(alg="membership",key=location,nhash=4,size=32)
    pkts location_bf.init(ipv4.checksum)
         trajectory.set()
         tag(ipv4.checksum, ipv4.checksum | location_bf)
         location_bf.reset()

    // Code to be executed at the packet's last hop
    flowid = Key(ip.src,ip.dest,tcp.src,tcp.dest,ip.proto)
    paths_sketch = Sketch(alg="store",key=flowid,nhash=4,size=256,width=16)
    n_change_sketch = Sketch(alg="countmin",key=flowid,nhash=4,key=flowid,size=256)
    pkts.window(10 RTT)
         match(!paths_sketch.any(ipv4.checksum))
         paths_sketch.set(ipv4.checksum)
         n_change_sketch.set(n_change_sketch + 1)

Path change latency

    // Code to be executed on all switches updating rules
    change_ts = Timestamp();
    l_clock = Counter(width=8);
    pkts match(segway_header.msg == GoodToMove)
         l_clock.set(max(l_clock + 1, segway_header.ts))
         tag(segway_header.ts, l_clock)
         duplicate(end_of_update)
    end_of_update timestamp(change_ts)
                  tag(segway_header.time, change_ts)
                  tag(segway_header.ts, l_clock)
                  collect(SEGWAY_CONTROLLER)

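
As a reading aid for the flow volume and duration listing above, the following Python model mirrors its state and per-packet updates. It is a sketch under stated assumptions: the class name `FlowVolumeDuration`, the CRC32 slot hash, and the interpretation of the two `match` branches as an if/else on `start_ts` are choices made for this example and are not taken from the MAFIA compiler output.

```python
import zlib

SIZE = 1024  # mirrors size=1024 in the listing


class FlowVolumeDuration:
    """Sequential model of the per-flow HashMaps used in the listing."""

    def __init__(self, size: int = SIZE):
        self.size = size
        self.byte_counter = [0] * size
        self.packet_counter = [0] * size
        self.start_ts = [0] * size
        self.flow_duration = [0] * size

    def _slot(self, flowid: tuple) -> int:
        # Assumption: hash the 5-tuple into a fixed-size array; colliding
        # flows share a slot, as they would in a fixed-size register array.
        return zlib.crc32(repr(flowid).encode()) % self.size

    def on_packet(self, flowid: tuple, pkt_size: int, now_ts: int) -> None:
        i = self._slot(flowid)
        self.byte_counter[i] += pkt_size          # byte_counter.set(byte_counter + pkt.size)
        self.packet_counter[i] += 1               # packet_counter.set(packet_counter + 1)
        if self.start_ts[i] == 0:                 # match(start_ts == 0)
            self.start_ts[i] = now_ts             # timestamp(start_ts)
        else:                                     # match(start_ts != 0)
            self.flow_duration[i] = now_ts - self.start_ts[i]

    def read(self, flowid: tuple) -> tuple:
        # Returns (bytes, packets, duration) for the flow's slot.
        i = self._slot(flowid)
        return (self.byte_counter[i], self.packet_counter[i], self.flow_duration[i])


stats = FlowVolumeDuration()
flow = ("10.0.0.1", "10.0.0.2", 1234, 80, 6)
stats.on_packet(flow, 1500, now_ts=100)
stats.on_packet(flow, 40, now_ts=250)
volume, packets, duration = stats.read(flow)
```

As in the listing, a stored timestamp of 0 is treated as "flow not seen yet", so this model assumes packet timestamps are non-zero.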
Measurements As First-class Artifacts
Extended Version
Paolo Laffranchini⋆⋄□, Luis Rodrigues⋆, Marco Canini†, Balachander Krishnamurthy‡
⋆ INESC-ID, IST, U. Lisboa; ⋄ Université catholique de Louvain; † KAUST; ‡ AT&T Labs – Research. □ Work done in part while visiting at KAUST.
Abstract
The emergence of programmable switches has sparked a significant amount of work on new techniques to perform more powerful measurement tasks, for instance, to obtain fine-grained traffic and performance statistics. Previous work has focused on the efficiency of these measurements alone and has neglected flexibility, resulting in solutions that are hard to reuse or repurpose and that often overlap in functionality or goals.
In this paper, we propose the use of a set of reusable primitive building blocks that can be composed to express measurement tasks in a concise and simple way. We describe the rationale for the design of our primitives, which we have named MAFIA (Measurements As FIrst-class Artifacts), and, using several examples, we illustrate how they can be combined to realize a comprehensive range of network measurement tasks. Writing MAFIA code does not require expert knowledge of low-level switch architecture details. Using a prototype implementation of MAFIA, we demonstrate the applicability of our approach and show that the use of our primitives results in compiled code that is comparable in size and resource usage to manually written specialized P4 code, and that it can run on current hardware.
I Introduction
Historically, network measurement’s evolution paralleled the growth of the Internet but at a much slower pace. SNMP, ping, and traceroute constituted the bulk of measurement-related aids for a long time. The introduction of SDN has led to significant work on various aspects of programmable network infrastructures. An SDN controller can dynamically install and modify switch rules, enforce high-level operator policies, and gather statistics. Starting from the original white paper [1], various aspects of SDN (and particularly OpenFlow [2]) have been examined in depth. Unfortunately, measurement, a well-understood requirement for the Internet with a body of work developed over more than two decades, appears to have been an afterthought in SDN’s development. In fact, [1] mentions security a dozen times (rightfully so), but the words measurement or metrics do not appear in it.
Given measurement’s importance in network operation and management, there has been a flurry of work on exploiting SDN features and programmable switches to perform more powerful measurement tasks. Beyond OpenFlow, proposals like OpenState [3] and switch programmability as in P4 [4] have enabled richer, customizable in-network processing that can implement measurements for fine-grained traffic and network performance statistics [5, 6, 7]. Most of the recent work in this area focuses on efficiently mapping measurement tasks on programmable forwarding elements. Efficiency is key as current programmable switch chips have limited computational and memory resources [8, 5, 7, 9].
An important requirement that has not been addressed in prior work is flexibility and extensibility in supporting a variety of measurement tasks; instead we have ad-hoc solutions proposed for specific measurements. In spite of advances in programmable data planes, it is not possible without significant effort to combine, reuse or repurpose existing solutions although they may partly overlap in functionality or goals.
We instead argue for supporting flexible measurement through a set of reusable building blocks (primitives) that take advantage of novel features of programmable forwarding elements and span most of the commonly performed measurement tasks. We identify a set of such primitives that network operators can use to express measurement tasks in a concise and simple way. Further, they are reusable as complex tasks can be expressed by composing a few calls to a subset of our measurement primitives.
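
The idea of composing a few primitive calls into a task can be illustrated with a small Python sketch. Everything here is hypothetical and only meant to convey the composition pattern: the `Stage` type, the `compose` helper, and the dictionary-based packet representation are assumptions of this example, while the names `match`, `tag`, and `collect` are borrowed from the primitives table purely as labels.

```python
from typing import Callable, Dict, List, Optional

Packet = Dict[str, int]
# A stage returns the (possibly modified) packet, or None to drop it from the stream.
Stage = Callable[[Packet], Optional[Packet]]


def match(predicate: Callable[[Packet], bool]) -> Stage:
    """Pass the packet on only if the predicate holds."""
    return lambda pkt: pkt if predicate(pkt) else None


def tag(field: str, expr: Callable[[Packet], int]) -> Stage:
    """Write the value of expr(pkt) into a header field of a copy of the packet."""
    def stage(pkt: Packet) -> Packet:
        out = dict(pkt)
        out[field] = expr(pkt)
        return out
    return stage


def collect(sink: List[Packet]) -> Stage:
    """Deliver the packet to a collector, modeled here as a list."""
    def stage(pkt: Packet) -> Packet:
        sink.append(pkt)
        return pkt
    return stage


def compose(*stages: Stage) -> Stage:
    """Chain stages; stop as soon as one of them drops the packet."""
    def pipeline(pkt: Packet) -> Optional[Packet]:
        current: Optional[Packet] = pkt
        for stage in stages:
            if current is None:
                return None
            current = stage(current)
        return current
    return pipeline


collector: List[Packet] = []
task = compose(
    match(lambda p: p["size"] > 1000),
    tag("tos", lambda p: p["switch_id"]),
    collect(collector),
)
for pkt in [{"size": 1500, "switch_id": 7, "tos": 0},
            {"size": 64, "switch_id": 7, "tos": 0}]:
    task(pkt)
```

The same three stages could be rearranged or reused in another task without rewriting them, which is the kind of reuse the paragraph above argues for.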
We define our approach as Measurements As FIrst-class Artifacts, or MAFIA for short. Concretely, we instantiate our ideas as an API that provides an abstraction over measurement primitives that execute at line rate in the data plane. We remark that our primary target is network operators, who are not proficient data plane programmers, yet they desire to quickly address performance-, security- and troubleshooting-related measurement needs. As such, our goal is not satisfied by and is orthogonal to data plane programming languages like P4. These technologies are an enabler for MAFIA but remain fundamentally lower-level approaches.
Our work is informed by the large number of legacy measurements that have been carried out routinely in large and small networks as well as new ones in the SDN milieu. We identify the primitives for measurement on the basis of their breadth of applicability and the ability for maximal reuse (i.e., a good implementation can yield rich dividends in a broad set of contexts). We are driven by four key considerations inherent in measurement [10]: what, where, when, and how. We validate our idea by showing that several key known SDN measurements and some new ones can be built by composing our abstractions. Our primitives can be used to answer questions ranging from network-wide traffic characteristics (e.g., flow size distributions, identifying heavy hitters) [11, 12, 13, 14], to fine-grained monitoring of properties of flows and switches (throughput, latency, loss, etc.) [15, 16, 17, 18, 19], to verification (traffic behavior matching operator’s intent) [20], to debugging (e.g., troubleshooting root causes of performance problems or switch/controller misbehavior) [21, 22], and various security aspects (e.g., anomalies, DDoS, malicious activity) [13, 14].
References
- [1] Open Networking Foundation, “Software-defined networking: The new norm for networks,” https://www.opennetworking.org/images/stories/downloads/sdn-resources/white-papers/wp-sdn-newnorm.pdf, 2012.
- [2] N. McKeown, T. Anderson, H. Balakrishnan, G. Parulkar, L. Peterson, J. Rexford, S. Shenker, and J. Turner, “OpenFlow: Enabling Innovation in Campus Networks,” SIGCOMM Comput. Commun. Rev., vol. 38, no. 2, 2008.
- [3] G. Bianchi, M. Bonola, A. Capone, and C. Cascone, “OpenState: Programming Platform-independent Stateful OpenFlow Applications Inside the Switch,” SIGCOMM Comput. Commun. Rev., vol. 44, no. 2, 2014.
- [4] P. Bosshart, D. Daly, G. Gibb, M. Izzard, N. McKeown, J. Rexford, C. Schlesinger, D. Talayco, A. Vahdat, G. Varghese, and D. Walker, “P4: Programming Protocol-independent Packet Processors,” SIGCOMM Comput. Commun. Rev., vol. 44, no. 3, 2014.
- [5] S. Narayana, A. Sivaraman, V. Nathan, P. Goyal, V. Arun, M. Alizadeh, V. Jeyakumar, and C. Kim, “Language-Directed Hardware Design for Network Performance Monitoring,” in SIGCOMM, 2017.
- [6] J. Sonchack, A. J. Aviv, E. Keller, and J. M. Smith, “Turboflow: Information Rich Flow Record Generation on Commodity Switches,” in EuroSys ’18, 2018.
- [7] J. Sonchack, O. Michel, A. J. Aviv, E. Keller, and J. M. Smith, “Scaling Hardware Accelerated Network Monitoring to Concurrent and Dynamic Queries With *Flow,” in ATC, 2018.
- [8] P. Bosshart, G. Gibb, H.-S. Kim, G. Varghese, N. McKeown, M. Izzard, F. Mujica, and M. Horowitz, “Forwarding Metamorphosis: Fast Programmable Match-action Processing in Hardware for SDN,” in SIGCOMM, 2013.
