Architecture
How FlowLens integrates into VPP's data plane — graph nodes, feature arcs, and the buffer opaque contract.
System overview
FlowLens is a collection of VPP graph nodes that register on the ip4-unicast and ip4-output feature arcs. Every packet on an enabled interface passes through the active nodes in order.
Packet arrives at NIC
│
▼
┌─────────────────────────────────────────────────────────────┐
│ VPP Process │
│ │
│ [ip4-unicast feature arc] │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ 1. [ndpi-observe] classify + write buffer opaque │ │
│ │ 2. [ndpi-policy] permit / drop / DSCP mark │ │
│ │ 3. [ndpi-policer] token-bucket rate limiting │ │
│ │ 4. [ip4-lookup] FIB routing │ │
│ └──────────────────────────────────────────────────────┘ │
│ │
│ [ip4-output feature arc] │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ 5. [sdwan-steer] override adjacency per-app │ │
│ │ 6. [interface-output] → NIC TX │ │
│ └──────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
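The diagram names token-bucket rate limiting as the job of ndpi-policer. The sketch below illustrates that technique in isolation, not FlowLens's actual policer code: `token_bucket_t`, its field names, and both functions are hypothetical, and the caller is assumed to supply a monotonic nanosecond clock.

```c
#include <assert.h>
#include <stdint.h>

#define NS_PER_SEC 1000000000ULL

/* Hypothetical per-application policer state (not the plugin's real layout). */
typedef struct {
  uint64_t rate_bytes_per_sec; /* sustained rate */
  uint64_t burst_bytes;        /* bucket depth */
  uint64_t tokens;             /* bytes currently available, <= burst_bytes */
  uint64_t last_ns;            /* time of the last refill */
} token_bucket_t;

static void
token_bucket_init (token_bucket_t *tb, uint64_t rate_bytes_per_sec,
                   uint64_t burst_bytes, uint64_t now_ns)
{
  assert (rate_bytes_per_sec > 0 && burst_bytes > 0);
  tb->rate_bytes_per_sec = rate_bytes_per_sec;
  tb->burst_bytes = burst_bytes;
  tb->tokens = burst_bytes; /* start full so an initial burst is allowed */
  tb->last_ns = now_ns;
}

/* Returns 1 if the packet conforms (forward), 0 if it exceeds the rate.
   Assumes now_ns never goes backwards. */
static int
token_bucket_conform (token_bucket_t *tb, uint32_t pkt_bytes, uint64_t now_ns)
{
  uint64_t elapsed_ns = now_ns - tb->last_ns;

  /* Split into whole seconds and a remainder so the multiplication stays
     within 64 bits for realistic rates and idle intervals. */
  uint64_t refill =
    (elapsed_ns / NS_PER_SEC) * tb->rate_bytes_per_sec +
    (elapsed_ns % NS_PER_SEC) * tb->rate_bytes_per_sec / NS_PER_SEC;

  if (refill > 0)
    {
      /* Cap the refill at the bucket depth. Sub-byte remainders are dropped,
         which slightly under-credits the flow. */
      uint64_t room = tb->burst_bytes - tb->tokens;
      tb->tokens += (refill < room) ? refill : room;
      tb->last_ns = now_ns;
    }

  if (tb->tokens < pkt_bytes)
    return 0;
  tb->tokens -= pkt_bytes;
  return 1;
}
```

Keeping one bucket per application per worker would match the per-worker design described later, but that placement is an assumption.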
Buffer opaque contract
When ndpi-observe classifies a flow, it writes a 12-byte struct (8 bytes of fields plus alignment padding) into the VPP buffer opaque area. Every downstream plugin reads it with a single pointer dereference, with no IPC, no syscalls, and no locks:
typedef struct {
u16 app_protocol; /* nDPI NDPI_PROTOCOL_* constant */
u8 category; /* NDPI_PROTOCOL_CATEGORY_* */
u32 risk; /* nDPI risk bitmask */
u8 status; /* CLASSIFIED | CLASSIFYING | GAVE_UP */
} ndpi_flow_tag_t;
/* Access from any downstream node: */
ndpi_flow_tag_t *tag = &vlib_buffer_opaque(b)->ndpi_flow_tag;
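As a standalone illustration of the contract, the sketch below checks the struct layout at compile time and shows how a downstream node such as ndpi-policy might act on the tag. The status values, `policy_action_t`, `policy_decide`, and the fail-open behaviour for unclassified flows are assumptions made for the example. The `u8`/`u16`/`u32` typedefs stand in for VPP's own so the block builds without VPP headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint8_t u8;
typedef uint16_t u16;
typedef uint32_t u32;

typedef struct {
  u16 app_protocol; /* nDPI NDPI_PROTOCOL_* constant */
  u8 category;      /* NDPI_PROTOCOL_CATEGORY_* */
  u32 risk;         /* nDPI risk bitmask */
  u8 status;        /* CLASSIFIED | CLASSIFYING | GAVE_UP */
} ndpi_flow_tag_t;

/* With natural alignment, risk is padded out to offset 4 and the struct is
   padded to a multiple of 4 bytes, giving the 12 bytes stated above. */
_Static_assert (offsetof (ndpi_flow_tag_t, risk) == 4, "risk at offset 4");
_Static_assert (sizeof (ndpi_flow_tag_t) == 12, "tag is 12 bytes");

/* Hypothetical numeric values; the document only names the three states. */
enum {
  FLOW_TAG_CLASSIFYING = 0,
  FLOW_TAG_CLASSIFIED = 1,
  FLOW_TAG_GAVE_UP = 2,
};

typedef enum { POLICY_PERMIT, POLICY_DROP } policy_action_t;

/* Drop only classified flows whose risk bits intersect the blocked mask.
   Flows still being classified are permitted (assumed fail-open). */
static policy_action_t
policy_decide (const ndpi_flow_tag_t *tag, u32 blocked_risk_mask)
{
  if (tag->status != FLOW_TAG_CLASSIFIED)
    return POLICY_PERMIT;
  return (tag->risk & blocked_risk_mask) ? POLICY_DROP : POLICY_PERMIT;
}
```

Because the decision needs only fields already in the tag, the downstream node never touches the flow table or nDPI state.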
Per-worker, lock-free design
Each VPP worker thread has its own:
- Flow table: bihash_16_8_t, keyed on 5-tuple, O(1) lookup
- nDPI detection context: per-worker, no cross-worker sharing
- Per-application counters: aggregated by the stats process node
There are no locks on the hot path. The stats process node runs on the main thread and uses vlib_worker_thread_barrier_sync() to safely aggregate counters once per second.
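The counter pattern can be sketched without VPP as follows. `N_WORKERS`, `N_APPS`, and all type and function names are hypothetical. In the plugin, `counters_aggregate` would correspond to the work done on the main thread between the barrier sync and its release.

```c
#include <assert.h>
#include <stdint.h>

#define N_WORKERS 4 /* hypothetical sizes for the sketch */
#define N_APPS 8

/* One slot per worker, aligned to a 64-byte cache line so that two workers
   never write to the same line (no false sharing). */
typedef struct {
  _Alignas (64) uint64_t packets[N_APPS];
  uint64_t bytes[N_APPS];
} worker_counters_t;

typedef struct {
  uint64_t packets[N_APPS];
  uint64_t bytes[N_APPS];
} app_totals_t;

static worker_counters_t per_worker[N_WORKERS];

/* Hot path: each worker updates only its own slot, so no lock or atomic
   operation is needed. */
static inline void
counters_add (uint32_t worker, uint32_t app, uint32_t pkt_bytes)
{
  assert (worker < N_WORKERS && app < N_APPS);
  per_worker[worker].packets[app] += 1;
  per_worker[worker].bytes[app] += pkt_bytes;
}

/* Slow path: sum every worker's slot into per-app totals. Assumed to run
   while the workers are paused, so the reads are consistent. */
static void
counters_aggregate (app_totals_t *out)
{
  for (uint32_t app = 0; app < N_APPS; app++)
    {
      out->packets[app] = 0;
      out->bytes[app] = 0;
      for (uint32_t w = 0; w < N_WORKERS; w++)
        {
          out->packets[app] += per_worker[w].packets[app];
          out->bytes[app] += per_worker[w].bytes[app];
        }
    }
}
```

The cost of the design moves to the reader: aggregation touches every worker's slot, which is acceptable at one pass per second.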
Flow lifecycle
New 5-tuple seen
→ allocate flow entry (bihash insert)
→ allocate nDPI state (~1.1 KB)
→ classify: first 3–8 packets
→ write verdict to buffer opaque + per-app counters
→ free nDPI state (keep entry with cached verdict)
Subsequent packets (cached):
→ bihash lookup (~8 ns)
→ use cached verdict from flow entry
Flow expiry:
→ aging thread removes stale entries
→ optional: emit IPFIX record
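The lifecycle above can be read as a small state machine per flow entry. The sketch below is an illustration under stated assumptions: `flow_entry_t`, `detect_fn_t`, the 8-packet cutoff (taken from the upper end of the 3–8 packet range), and the toy `demo_detect` callback are all hypothetical, and the real nDPI detection call is replaced by a function pointer so the block compiles on its own.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_CLASSIFY_PACKETS 8  /* assumed cutoff before giving up */
#define NDPI_STATE_BYTES 1100   /* stand-in for the ~1.1 KB nDPI state */
#define DEMO_PROTO_UNKNOWN 0    /* made-up protocol ids for the demo */
#define DEMO_PROTO_TLS 1

typedef enum { FLOW_CLASSIFYING, FLOW_CLASSIFIED, FLOW_GAVE_UP } flow_status_t;

typedef struct {
  flow_status_t status;
  uint16_t app_protocol; /* cached verdict */
  uint8_t packets_seen;
  void *ndpi_state;      /* non-NULL only while classifying */
} flow_entry_t;

/* Placeholder for the detector: returns a protocol id, or
   DEMO_PROTO_UNKNOWN if it needs more packets. */
typedef uint16_t (*detect_fn_t) (void *state, const uint8_t *payload,
                                 size_t len);

/* New 5-tuple: allocate the detection state and start classifying. */
static void
flow_init (flow_entry_t *f)
{
  f->app_protocol = DEMO_PROTO_UNKNOWN;
  f->packets_seen = 0;
  f->ndpi_state = calloc (1, NDPI_STATE_BYTES);
  f->status = f->ndpi_state ? FLOW_CLASSIFYING : FLOW_GAVE_UP;
}

/* Record the verdict and release the detection state; the entry stays. */
static void
flow_finish (flow_entry_t *f, flow_status_t verdict, uint16_t proto)
{
  f->status = verdict;
  f->app_protocol = proto;
  free (f->ndpi_state);
  f->ndpi_state = NULL;
}

static flow_status_t
flow_on_packet (flow_entry_t *f, detect_fn_t detect, const uint8_t *payload,
                size_t len)
{
  if (f->status != FLOW_CLASSIFYING)
    return f->status; /* cached path: no detection work at all */

  f->packets_seen++;
  uint16_t proto = detect (f->ndpi_state, payload, len);
  if (proto != DEMO_PROTO_UNKNOWN)
    flow_finish (f, FLOW_CLASSIFIED, proto);
  else if (f->packets_seen >= MAX_CLASSIFY_PACKETS)
    flow_finish (f, FLOW_GAVE_UP, DEMO_PROTO_UNKNOWN);
  return f->status;
}

/* Toy detector: treats a payload starting with 0x16 (the TLS handshake
   record type) as TLS. Real detection is far more involved. */
static uint16_t
demo_detect (void *state, const uint8_t *payload, size_t len)
{
  (void) state;
  return (len > 0 && payload[0] == 0x16) ? DEMO_PROTO_TLS : DEMO_PROTO_UNKNOWN;
}
```

Freeing the detection state at the verdict is what lets a cached flow shrink to the size of its table entry.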
Stats segment
ndpi_stats.c registers a VLIB_NODE_TYPE_PROCESS node that:
- Wakes every 1 second
- Calls vlib_worker_thread_barrier_sync() to safely read per-worker counters
- Aggregates into global and per-app totals
- Pushes to VPP’s stats segment via vlib_stats_set_gauge()
The Prometheus exporter reads from the stats segment via memory-mapped shared memory — no IPC required.
Performance characteristics
| Metric | Value | Condition |
|---|---|---|
| Overhead per packet (classifying) | ~150 ns | 10G link, 64B packets |
| Overhead per packet (cached flow) | ~8 ns | bihash lookup only |
| Flow table lookup | O(1) | per-worker, no locks |
| Max flows per worker | 1M (configurable) | 64-byte entries |
| Memory per classifying flow | ~1.1 KB | nDPI state allocated |
| Memory per cached flow | ~64 B | nDPI state freed after verdict |
| Classification convergence | 3–8 packets | 95th pct, TCP/TLS |
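The memory rows combine into a rough per-worker estimate. The helper below is a back-of-the-envelope sketch, not a sizing tool: it assumes "1M" means 1,000,000 entries, rounds ~1.1 KB to 1,100 bytes, treats the nDPI state as additional to the 64-byte entry, and ignores bihash bucket overhead. The function name is hypothetical.

```c
#include <stdint.h>

#define FLOW_ENTRY_BYTES 64ULL   /* "~64 B" per cached flow */
#define NDPI_STATE_BYTES 1100ULL /* "~1.1 KB" per classifying flow, rounded */

/* Estimated bytes for one worker: every flow holds a table entry, and flows
   still being classified also hold nDPI state. */
static uint64_t
flow_memory_bytes (uint64_t total_flows, uint64_t classifying_flows)
{
  return total_flows * FLOW_ENTRY_BYTES +
         classifying_flows * NDPI_STATE_BYTES;
}
```

Under these assumptions a full table of 1,000,000 flows costs 64,000,000 bytes (about 64 MB) per worker, and 10,000 flows in classification at the same moment add 11,000,000 bytes, for 75,000,000 bytes in total.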
Feature arc registration
VNET_FEATURE_INIT (ndpi_observe_ip4, static) = {
.arc_name = "ip4-unicast",
.node_name = "ndpi-observe",
.runs_before = VNET_FEATURES ("ip4-lookup"),
};
Enable per-interface at runtime — no restart required:
vppctl set interface ndpi eth0 enable