Architecture

How FlowLens integrates into VPP's data plane — graph nodes, feature arcs, and the buffer opaque contract.

System overview

FlowLens is a collection of VPP graph nodes that register on the ip4-unicast and ip4-output feature arcs. Every IPv4 unicast packet on an enabled interface passes through the active nodes in order.

Packet arrives at NIC
        │
        ▼
┌─────────────────────────────────────────────────────────────┐
│  VPP Process                                                 │
│                                                              │
│  [ip4-unicast feature arc]                                   │
│  ┌──────────────────────────────────────────────────────┐   │
│  │ 1. [ndpi-observe]    classify + write buffer opaque  │   │
│  │ 2. [ndpi-policy]     permit / drop / DSCP mark       │   │
│  │ 3. [ndpi-policer]    token-bucket rate limiting      │   │
│  │ 4. [ip4-lookup]      FIB routing                     │   │
│  └──────────────────────────────────────────────────────┘   │
│                                                              │
│  [ip4-output feature arc]                                    │
│  ┌──────────────────────────────────────────────────────┐   │
│  │ 5. [sdwan-steer]     override adjacency per-app      │   │
│  │ 6. [interface-output] → NIC TX                       │   │
│  └──────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────┘

Buffer opaque contract

When ndpi-observe classifies a flow, it writes a 12-byte struct (8 bytes of fields plus alignment padding) into the VPP buffer opaque area. Every downstream plugin reads this with a single pointer dereference, with no IPC, syscalls, or locks:

typedef struct {
  u16 app_protocol;   /* nDPI NDPI_PROTOCOL_* constant */
  u8  category;       /* NDPI_PROTOCOL_CATEGORY_* */
  u32 risk;           /* nDPI risk bitmask */
  u8  status;         /* CLASSIFIED | CLASSIFYING | GAVE_UP */
} ndpi_flow_tag_t;

/* Access from any downstream node. The tag is embedded in the opaque
   area, so take its address rather than copying it: */
ndpi_flow_tag_t *tag = &vlib_buffer_opaque(b)->ndpi_flow_tag;
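
As an illustration of how a downstream node might consume the tag, here is a minimal sketch that compiles without VPP. The mock_buffer_t type, the numeric status values, and example_policy() are assumptions made for this example and are not part of FlowLens; a real node would obtain the tag from the VPP buffer as shown above.

```c
#include <assert.h>
#include <stdint.h>

typedef uint8_t u8;
typedef uint16_t u16;
typedef uint32_t u32;

/* Status names come from the struct comment; the numeric values are assumed. */
enum { NDPI_TAG_CLASSIFYING = 0, NDPI_TAG_CLASSIFIED = 1, NDPI_TAG_GAVE_UP = 2 };

typedef struct {
  u16 app_protocol;
  u8  category;
  u32 risk;
  u8  status;
} ndpi_flow_tag_t;

/* 8 bytes of fields plus alignment padding give the documented 12 bytes. */
_Static_assert (sizeof (ndpi_flow_tag_t) == 12, "tag must stay 12 bytes");

/* Stand-in for a VPP buffer: only a 40-byte opaque scratch area is modeled,
   with the tag overlaid on it through a union. */
typedef struct {
  union {
    u32 opaque[10];
    ndpi_flow_tag_t ndpi_flow_tag;
  };
} mock_buffer_t;

typedef enum { ACTION_PERMIT, ACTION_DROP } action_t;

/* Hypothetical policy: drop classified flows that match a blocked protocol
   or carry any blocked risk bit; permit everything else, including flows
   that are still being classified. */
static inline action_t
example_policy (const mock_buffer_t *b, u16 blocked_proto, u32 blocked_risk)
{
  const ndpi_flow_tag_t *tag = &b->ndpi_flow_tag;

  if (tag->status != NDPI_TAG_CLASSIFIED)
    return ACTION_PERMIT;
  if (tag->app_protocol == blocked_proto || (tag->risk & blocked_risk))
    return ACTION_DROP;
  return ACTION_PERMIT;
}
```

The decision reads only fields that already travel with the packet, which is the point of the contract: the policy node never has to look the flow up again.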

Per-worker, lock-free design

Each VPP worker thread has its own:

Flow table — bihash_16_8_t, keyed on 5-tuple, O(1) lookup
nDPI detection context — per-worker, no cross-worker sharing
Per-application counters — aggregated by the stats process node

There are no locks on the hot path. The stats process node runs on the main thread; once per second it calls vlib_worker_thread_barrier_sync() to pause the workers, aggregates their counters, and then releases the barrier with vlib_worker_thread_barrier_release().
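
The per-worker pattern can be sketched in plain C as follows. This is a simplified model, not the FlowLens source: the ex_ names, the array sizes, and the 64-byte cache-line size are assumptions chosen for the example, and the barrier is only mentioned in a comment because the sketch runs outside VPP.

```c
#include <assert.h>
#include <stdint.h>

#define EX_MAX_WORKERS 4   /* illustrative sizes, not FlowLens defaults */
#define EX_MAX_APPS    8
#define EX_CACHE_LINE  64  /* assumed cache-line size */

/* One counter block per worker, aligned so that two workers never write
   to the same cache line. */
typedef struct {
  _Alignas (EX_CACHE_LINE) uint64_t packets[EX_MAX_APPS];
} ex_worker_counters_t;

static ex_worker_counters_t ex_counters[EX_MAX_WORKERS];

/* Hot path: a worker only touches its own slot, so a plain increment
   is sufficient and no atomic or lock is needed. */
static inline void
ex_count_packet (uint32_t worker_index, uint32_t app)
{
  ex_counters[worker_index].packets[app]++;
}

/* Slow path: sum every worker's block into per-app totals and return the
   global total. In VPP this would run while the workers are held at the
   barrier, so the values cannot change mid-read. */
static inline uint64_t
ex_aggregate (uint64_t per_app[EX_MAX_APPS])
{
  uint64_t total = 0;

  for (uint32_t a = 0; a < EX_MAX_APPS; a++)
    {
      per_app[a] = 0;
      for (uint32_t w = 0; w < EX_MAX_WORKERS; w++)
        per_app[a] += ex_counters[w].packets[a];
      total += per_app[a];
    }
  return total;
}
```

The cost of synchronization is paid once per aggregation interval on the main thread rather than once per packet on the workers.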

Flow lifecycle

New 5-tuple seen
  → allocate flow entry (bihash insert)
  → allocate nDPI state (~1.1 KB)
  → classify: first 3–8 packets
  → write verdict to buffer opaque + per-app counters
  → free nDPI state (keep entry with cached verdict)

Subsequent packets (cached):
  → bihash lookup (~8 ns)
  → use cached verdict from flow entry

Flow expiry:
  → aging thread removes stale entries
  → optional: emit IPFIX record
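
The lifecycle above can be read as a small state machine per flow entry. The sketch below models only that state machine; the bihash lookup, aging, and IPFIX export are left out. All ex_ names are hypothetical, the classifier is a stub standing in for nDPI, and giving up after 8 packets is an assumption based on the 3 to 8 packet range quoted above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef enum { FLOW_CLASSIFYING, FLOW_CLASSIFIED, FLOW_GAVE_UP } flow_status_t;

#define EX_MAX_CLASSIFY_PKTS 8   /* assumed give-up threshold */

typedef struct {
  uint16_t app_protocol;   /* cached verdict, meaningful once CLASSIFIED */
  uint8_t  status;         /* flow_status_t */
  uint8_t  pkts_seen;
  void    *ndpi_state;     /* per-flow detection state, NULL once freed */
} ex_flow_t;

/* Stand-in for the nDPI call: returns a protocol id, or 0 for "not yet". */
typedef uint16_t (*ex_classify_fn) (void *state, const uint8_t *pkt,
                                    uint32_t len);

/* New 5-tuple: allocate detection state and start classifying. */
static inline void
ex_flow_init (ex_flow_t *f, size_t state_size)
{
  f->app_protocol = 0;
  f->pkts_seen = 0;
  f->ndpi_state = calloc (1, state_size);
  f->status = f->ndpi_state ? FLOW_CLASSIFYING : FLOW_GAVE_UP;
}

/* Per-packet step: returns the verdict known so far (0 if none). */
static inline uint16_t
ex_flow_packet (ex_flow_t *f, ex_classify_fn classify, const uint8_t *pkt,
                uint32_t len)
{
  if (f->status != FLOW_CLASSIFYING)
    return f->app_protocol;      /* cached path: no classifier call */

  f->pkts_seen++;
  uint16_t proto = classify (f->ndpi_state, pkt, len);
  if (proto != 0)
    {
      f->app_protocol = proto;
      f->status = FLOW_CLASSIFIED;
    }
  else if (f->pkts_seen >= EX_MAX_CLASSIFY_PKTS)
    f->status = FLOW_GAVE_UP;

  if (f->status != FLOW_CLASSIFYING)
    {
      /* Verdict reached: release the large state, keep the small entry. */
      free (f->ndpi_state);
      f->ndpi_state = NULL;
    }
  return f->app_protocol;
}

/* Toy classifier for the example: reports protocol 7 on the third packet. */
static inline uint16_t
ex_stub_classify (void *state, const uint8_t *pkt, uint32_t len)
{
  uint32_t *calls = state;
  (void) pkt;
  (void) len;
  return ++(*calls) >= 3 ? 7 : 0;
}
```

Once the status leaves FLOW_CLASSIFYING, every later packet takes the early return, which corresponds to the cached path in the lifecycle.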

Stats segment

ndpi_stats.c registers a VLIB_NODE_TYPE_PROCESS node that:

Wakes every 1 second
Calls vlib_worker_thread_barrier_sync() to safely read per-worker counters
Aggregates into global and per-app totals
Pushes to VPP’s stats segment via vlib_stats_set_gauge()

The Prometheus exporter reads the stats segment through memory-mapped shared memory, so reading counters requires no message exchange with the VPP process.

Performance characteristics

Metric                             Value              Condition
Overhead per packet (classifying)  ~150 ns            10G link, 64B packets
Overhead per packet (cached flow)  ~8 ns              bihash lookup only
Flow table lookup                  O(1)               per-worker, no locks
Max flows per worker               1M (configurable)  64-byte entries
Memory per classifying flow        ~1.1 KB            nDPI state allocated
Memory per cached flow             ~64 B              nDPI state freed after verdict
Classification convergence         3–8 packets        95th pct, TCP/TLS
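
For capacity planning, the two memory rows can be combined into a rough per-worker estimate. The helper below is a back-of-the-envelope sketch, not a FlowLens API: it assumes the ~1.1 KB of nDPI state is held in addition to the 64-byte entry, rounds it to 1100 bytes, and ignores bihash bucket overhead.

```c
#include <assert.h>
#include <stdint.h>

#define EX_FLOW_ENTRY_BYTES 64u    /* "Memory per cached flow" row */
#define EX_NDPI_STATE_BYTES 1100u  /* "~1.1 KB" row, rounded (assumption) */

/* Approximate flow memory for one worker: every tracked flow holds a
   64-byte entry, and flows still being classified also hold nDPI state. */
static inline uint64_t
ex_flow_memory_bytes (uint64_t total_flows, uint64_t classifying_flows)
{
  return total_flows * EX_FLOW_ENTRY_BYTES
         + classifying_flows * EX_NDPI_STATE_BYTES;
}
```

For example, 1,000,000 tracked flows with 10,000 of them still classifying comes to 64,000,000 + 11,000,000 = 75,000,000 bytes, or roughly 75 MB per worker under these assumptions.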

Feature arc registration

VNET_FEATURE_INIT (ndpi_observe_ip4, static) = {
  .arc_name  = "ip4-unicast",
  .node_name = "ndpi-observe",
  .runs_before = VNET_FEATURES ("ip4-lookup"),
};

Enable per-interface at runtime — no restart required:

vppctl set interface ndpi eth0 enable