Engineering deep dive: How AMS-IX uses FastNetMon for automated DDoS mitigation

Outi Maria Pietilanaho

1 month ago

Introduction

DDoS attacks have long been a thorn in the side of network operators—but AMS-IX faced a particularly unusual challenge. Unlike the massive volumetric attacks that make headlines, the attacks targeting their management network were low-bandwidth but high-flow, exploiting vulnerabilities in session tables, firewall logging, and internal routing. Even small bursts of DDoS traffic could cascade into network-wide disruptions, impacting office connectivity, VPN users, internal services, and DNS resolution.

This deep dive explores the full journey of how AMS-IX, in an implementation led by Stavros Konstantaras, built an automated, resilient DDoS mitigation system from scratch. In this detailed technical analysis, you’ll learn:

How a few megabits per second of traffic brought the network to its knees, and the chain reactions that caused the collapse.
The architecture of AMS-IX’s management network, including routers, firewalls, and the spine-leaf management fabric, and how traffic flows under normal and attack conditions.
The challenges and limitations of manual mitigation, including firewall tuning, session table management, and security team coordination.
How AMS-IX designed a fully automated mitigation pipeline, combining FastNetMon detection, Python orchestration/BIRD2/BGP signaling, and NAWAS scrubbing.
Testing strategies, lessons learned, and results, including real-world attacks that were mitigated automatically without human intervention.
Future improvements, including router upgrades, migration from NetFlow to IPFIX, smarter mitigation algorithms, and IPv6 strategies.

This deep dive is written for network engineers, NOC teams, and technical decision-makers, providing a full overview of the architecture, workflows, and operational considerations that went into protecting a critical Internet Exchange.

Special thanks to Stavros Konstantaras, Senior Network Engineer at AMS-IX, for designing, implementing, and stress-testing this system, and for sharing his experience with us. We also want to thank AMS-IX for enabling this case study and providing a detailed real-world example of complex network defence.

Prefer a video? Watch Stavros walk through the case in his own words.

The Problem: When Small Attacks Cause Big Collapses

The saga began when a few megabits of UDP traffic targeting AMS-IX’s public DNS servers brought the management network to a halt. VPN users were disconnected, internal email and messaging stopped, NAT and DNS transit were disrupted, while multiple internal services became unreachable. Oddly, the production network—the one carrying customer traffic—remained unaffected.

Stavros and the team quickly realised: this wasn’t a typical volumetric DDoS attack. Very modest traffic volumes of a few Mbps caused cascading failures in the admin network that ended up in sudden disruption of office connectivity, leaving the entire team without access to internal and external resources for several minutes at a time.

Step by step, the investigation revealed the chain reactions:

Traffic volumes appeared harmless: Graphs showed only a few megabits per second reaching internal servers:

Firewalls became the bottleneck: CPU and session tables maxed out because each DNS query, even valid ones, consumed state at the firewall’s session table. As you can see in the graph below, the timestamps CPU overload match with attack time stamps.

Cascading failures ensued: Overloaded firewalls triggered LACP BPDU drops, OSPF session expirations, and lost default gateways for internal systems, leading to internal applications spiking syslog traffic. This created a feedback loop, compounding the problem.

NetFlow overhead worsened the situation: Pre-enabled NetFlow on Palo Alto’s added CPU load, accelerating the collapse.

The team tried built-in firewall mitigations, including zone protection and session limits, but nothing prevented a new round of collapse. Manual intervention proved too slow—the network needed a fully automated detection and mitigation system to stop the attacks in real time.

Understanding the Network Architecture

To better illustrate the problem and the solution, it is useful to understand exactly how traffic flows through the AMX-IX management network. The architecture looks rather straightforward:

Border routers: Two Cisco ASR1001 routers handled initial packet inspection and forwarding.

Firewall clusters: Two clusters of Palo Alto 3050 firewalls in active-passive mode managed security inspection and session tracking.

Management layer: Dell switches running Pluribus OS in a spine-leaf fabric connected the network internally.

Internal services: DNS, HTTP, mail, and other internal services resided behind the firewalls in the DMZ.

Taking as an example the DNS traffic, the flow into the network looks according to the following diagram:

Packet arrives from transit providers and hits a border router.
Border router verifies the packet and forwards it to the management fabric.
The firewall establishes a session, inspects the payload and forwards it to the internal DNS server.
DNS response is returned through the same path.

Yet this “simple” flow hid a problem: firewalls logged sessions for every packet, creating a massive state burden. Even small flow-based attack bursts quickly consumed session tables, triggering the chain reaction described above.

Early Mitigation Attempts and Limitations

The first attempts to mitigate the attacks were purely manual:

Firewall tuning (Palo Alto zone protection, session limits)
Manual activation of scrubbing services from NBIP/NAWAS
Splitting public services into the cloud to absorb some traffic

Despite these efforts, attacks continued to cause downtime. The flow rate was too high for the firewall to handle, and manual mitigation took too long—by the time an engineer reacted, firewalls were already maxed out. The need for automation became clear.

Designing the Automated DDoS Mitigation Pipeline

The solution combined three key elements: detection (the brain), mitigation (the shield), and orchestration (the glue).

1. The Brain: FastNetMon Detection

FastNetMon provided:

Reliable, automated detection of flow-based DDoS attacks
Support for multiple sampling methods (NetFlow, sFlow, IPFIX)
Integration with home-made scripts to trigger mitigation

2. The Shield: NAWAS Scrubbing

NAWAS served as a scrubbing centre:

Traffic is redirected during attacks to the NAWAS scrubbing centre for cleaning
Bad traffic is scrubbed, and clean traffic is returned via the Peering Platform
Existing contract and infrastructure allowed AMS-IX to use it without extra cost

3. The Glue: BGP Orchestration with Python and Bird2

Automation relied on a custom pipeline:

Traffic sampling: Border routers send flow data to FastNetMon via the peering fabric.
Attack detection: FastNetMon detects an attack and triggers the script notify_about_attack.sh.
Python orchestration: Custom script determines which prefixes are under attack and configures Bird2.
BGP signalling: Bird2 communicates via BGP with the CISCO routers to signal the prefixes that are under attack and force them to be advertised to NAWAS for scrubbing.

Testing the System

To ensure reliability, Stavros and the team:

Launched external DDoS tests using a virtual machine hosted several hops away to mimic real-world conditions
Generated millions of packets for both IPv4 and IPv6 DNS queries
Measured reaction time from the second of launching the attack to moment of full mitigation, achieving a fully automated reaction in ~45 seconds for IPv4 traffic

Summary and results

The story paints a picture of a painful reality many engineers can relate to: even small amounts of UDP DNS traffic—just a few megabits per second—were enough to bring down the network. Firewalls hit session limits, CPUs spiked, and internal services failed. Engineers scrambled with manual mitigation steps, often arriving too late, leading to repeated periods of downtime and operational disruption.

The solution, meticulously designed and implemented by Stavros and his team, turned this around. FastNetMon became the brain of the operation, automatically detecting suspicious traffic flows and signalling exactly which prefixes were under attack. Python scripts and Bird2 handled the orchestration, sending only the affected subnets to NAWAS for scrubbing, while clean traffic continued uninterrupted.

The solution was clean, simple, and it works. During internal testing, millions of attack packets were sent at once, yet the system responded in just 45 seconds. Later, in live conditions, multiple DDoS attempts were fully mitigated without the NOC even noticing—engineers no longer had to use time and resources to respond manually.

In short, what had once been a painful disruption is now a seamless, automated defence. The management network remains stable under attack, firewalls no longer collapse, and services stay online. The solution not only restored reliability but also gave the engineering team confidence and peace of mind: the network could now reliably withstand the kinds of bot-driven, rampant attacks that had previously been very problematic.

This case illustrates how deep technical understanding, combined with automation and carefully integrated tools, can transform a reactive, fragile network into a resilient, self-defending system. It is a blueprint for engineering teams facing similar DDoS challenges: identify the weak points, automate detection, and integrate mitigation tightly with routing and scrubbing mechanisms.

About FastNetMon

FastNetMon is a leading solution for network security, offering advanced DDoS detection and mitigation. With real-time analytics and rapid response capabilities, FastNetMon helps organisations protect their infrastructure from evolving cyber threats. For more information, visit https://fastnetmon.com