This article is written by Herve Hildenbrand and was originally published on LinkedIn. Reposted with the author’s permission.
A DDoS attack almost ruined my 40th birthday.
Not the party, but the infrastructure I was responsible for. Friends texted “happy birthday.” Colleagues texted… differently. Even though they were kind enough to shield me that day, I couldn’t stay on the sidelines.
I watched our scrubbing center eat legitimate traffic alongside the attack. The protection had become the outage.
Let’s call this “Scrubbing Blindness”: when your security solution knows less about your traffic than the attacker does.
We thought we were prepared. We had the tools. We had the vendor contracts. We’d even tested it. But tests never mimic the real thing: the scale, the chaos, and the pressure of a live attack.
What we didn’t have was architecture that understood what “normal” looked like before everything went sideways.
Your connectivity strategy IS your security strategy. You cannot bolt on DDoS protection to a fragile network.
If you read my previous article on Building Enterprise Internet Connectivity: A Practical Guide from the Trenches, you know I advocate for network sovereignty: own your AS, own your IP space, treat the edge as a strategic asset.
That foundation wasn’t just about resilience. It was the architecture that makes DDoS defense possible.
Here’s the uncomfortable truth we network engineers learn too late: the defense must be designed in, not bolted on after an attacker teaches you what you forgot.
The Diversity Dividend: Why more doors mean more survivors
Most enterprises connect through 1-2 upstream providers and call it “redundant.” During a volumetric attack, that’s like having two doors into a room, both facing the same angry mob.
The math of distributed entry:
Under normal conditions, multiple ingress points let you serve traffic from the closest edge. Latency drops. User experience improves. Nothing controversial here.
Under attack, everything changes.
A 400 Gbps attack against a dual-homed network doesn’t split politely across your links. It follows the attacker’s sources and your carrier capillarity. One upstream saturates while the other barely sweats.
Here’s the cruel irony: your BGP sessions will most likely survive. CoPP protects the control plane. QoS marks protocol traffic as priority. Hold timers give you 90 seconds of grace.
But your customers or the infrastructure behind your edge? They’re drowning. The 100G link is full of garbage. Legitimate traffic drops. And because BGP is fine, your monitoring shows “all green.”
When that saturated link finally gets blackholed or manually shut, the domino failure begins: traffic shifts to your surviving upstream and brings the attack with it.
That same attack against an enterprise with 6 entry points? The math shifts in your favor. Traffic still follows attacker sources and carrier capillarity, but now it fragments across more paths. Instead of two links fighting for survival, you have six absorbing the blow. Some will take more heat than others, but none drowns alone.
You stay online. You have headroom. You can think.
But there’s a third advantage nobody discusses: attackers hate complexity. Saturating one 10G link is trivial. Saturating six geographically distributed 100G links across different providers, different IXPs, different physical paths? That’s an expensive and technically challenging problem. Most attackers move on or change the attack vector.
The Entry Point Multiplier: Every additional ingress point you add doesn’t just improve resilience linearly; it exponentially increases the cost of attacking you.
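A back-of-the-envelope model makes the point concrete. This sketch assumes the attack fragments roughly evenly across ingress links, which real carrier capillarity only approximates; the figures match the 400 Gbps example above:

```python
def per_link_load_gbps(attack_gbps: float, num_links: int) -> float:
    """Naive even split of attack traffic across ingress links."""
    return attack_gbps / num_links

ATTACK = 400.0        # Gbps, the volumetric attack from the example above
LINK_CAPACITY = 100.0 # Gbps per upstream link

for links in (2, 6):
    load = per_link_load_gbps(ATTACK, links)
    if load >= LINK_CAPACITY:
        print(f"{links} links: ~{load:.0f}G per link -> saturated")
    else:
        print(f"{links} links: ~{load:.0f}G per link -> "
              f"{LINK_CAPACITY - load:.0f}G headroom")
```

Two links take ~200G each and drown; six links take ~67G each and keep a third of their capacity free for legitimate traffic.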
Detection: The “Three-Speed” problem
You need three detection speeds. Most enterprises have none or one.
Speed 1: Strategic visibility (minutes to hours)
This is where Kentik shines. Flow telemetry, historical baselines, anomaly detection across your entire topology. You see attack patterns evolve over time. You build intelligence.
Speed 2: Tactical visibility (seconds to minutes)
Akvorado fills this gap beautifully in the open-source world, and Kentik covers it as well. Real-time flow analysis, customizable dashboards, enough granularity to understand what’s happening now.
Speed 3: Reaction visibility (sub-second)
This is where FastNetMon becomes essential. Pavel Odintsov built something remarkable: detection-to-action in seconds, not minutes. When a volumetric attack ramps up, you don’t have time for human analysis. FastNetMon sees the spike and triggers your automated response before your BGP sessions even feel pressure. Pavel is always willing to help, a brilliant mind behind a tool that does exactly what it promises.
I encourage you to run a blend of all those tools. They’re not competitors; they’re complementary layers in the detection stack.
The detection architecture: Kentik / Akvorado for the big picture. FastNetMon for the now picture and instant actions.
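FastNetMon’s internals aren’t reproduced here, but the “now picture” idea can be sketched as a toy baseline-versus-spike detector. The smoothing factor and multiplier below are illustrative, not tuned values:

```python
class SpikeDetector:
    """Toy reaction-speed detector: flags when the current rate exceeds
    a smoothed baseline by a multiplier. Illustrative only, not how
    FastNetMon is actually implemented."""

    def __init__(self, alpha: float = 0.1, multiplier: float = 5.0):
        self.alpha = alpha            # EWMA smoothing factor
        self.multiplier = multiplier  # how far above baseline counts as attack
        self.baseline = None          # learned pps/bps baseline

    def observe(self, rate: float) -> bool:
        if self.baseline is None:
            self.baseline = rate
            return False
        attack = rate > self.baseline * self.multiplier
        if not attack:  # only learn from traffic that looks normal
            self.baseline = (1 - self.alpha) * self.baseline + self.alpha * rate
        return attack

det = SpikeDetector()
for rate in (1000, 1100, 950, 1050):  # steady-state pps samples
    assert not det.observe(rate)
assert det.observe(50_000)            # volumetric ramp-up triggers instantly
```

The real value is in the last line: the decision is made on the very sample that carries the spike, which is what lets automation fire before a human has even seen an alert.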
Flowspec: Your first line of defense (not your only line)
BGP Flowspec is elegant. You push a rule to your border routers: “Drop UDP port 53 traffic from this source to this destination.” Within seconds, your entire edge enforces it. No ACL surgery. No human touch.
But Flowspec has limits…
When Flowspec wins:
- Amplification attacks (NTP, DNS, SSDP) with identifiable signatures
- Single-vector volumetric floods
- Protocol-specific abuse (malformed packets, known attack patterns)
When Flowspec struggles:
- Application-layer attacks that look like legitimate traffic
- Randomized multi-vector attacks that mutate faster than you can write rules
- Attacks from clean residential IPs (botnets behind carrier-grade NAT)
Here’s where the 100G port strategy from my previous article becomes critical. Flowspec stops traffic at your border. The attack doesn’t transit your internal backbone.
The 100G + Flowspec equation: Capacity buys you time. Flowspec buys you precision. You need both.
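As a concrete sketch, an automation hook can push a Flowspec discard rule through ExaBGP’s process API, which reads announce commands from the process pipe. The prefix and ports are placeholders, and the command grammar should be verified against your ExaBGP version:

```python
import sys

def flowspec_discard(dst_prefix: str, protocol: str, src_port: int) -> str:
    """Build an ExaBGP-style 'announce flow' command that drops matching
    traffic at every border router. Syntax is illustrative; check it
    against your ExaBGP version before relying on it."""
    return (
        "announce flow route { "
        f"match {{ destination {dst_prefix}; protocol {protocol}; "
        f"source-port ={src_port}; }} "
        "then { discard; } }"
    )

# Example: drop a DNS amplification flood aimed at one of our IPs
cmd = flowspec_discard("203.0.113.10/32", "udp", 53)
sys.stdout.write(cmd + "\n")  # ExaBGP consumes this on stdin of the API process
```

Within seconds of detection, every edge router that speaks Flowspec is enforcing the same rule, with no ACL surgery and no human in the loop.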
The Scrubbing Center: Your last line (not your first!)
External scrubbing centers (Lumen, Akamai, Cloudflare, your Tier-1’s DDoS service) are the nuclear option. Route your traffic through their infrastructure, let them absorb and filter the attack, receive clean traffic out the other side.
The problem with “Always-On” scrubbing:
You surrender inbound sovereignty. Every packet, every session, every customer interaction now traverses a third-party black box. I’ve debugged MTU issues, TCP window problems, and mysterious latency spikes that traced back to a scrubber “optimizing” my traffic.
Good luck troubleshooting why a customer’s API calls are failing when the answer is buried inside infrastructure you don’t control.
The correct posture: On-demand activation, automatically triggered by BGP.
Your detection platform (FastNetMon, Kentik, Akvorado…) sees an attack. It signals your automation layer. Your edge routers announce the affected prefix with a specific BGP community. Your upstream scrubbing provider sees that community and activates mitigation for just that prefix.
The cascade:
Detection → Automation → Edge Router (BGP community tag) → Scrubber activation
Your edge can also take local action. Maybe you stop announcing the attacked prefix to your IXP peers (reducing attack surface) while keeping it active toward your scrubbing-enabled transit.
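The trigger itself can be as small as a templated announcement. A minimal sketch, assuming an ExaBGP-style command pipe; the community value 64500:911 is a placeholder for whatever your scrubbing provider actually documents:

```python
# Hypothetical 'activate mitigation' community; your provider documents the real one.
SCRUB_COMMUNITY = "64500:911"

def scrub_announce(prefix: str, next_hop: str = "self") -> str:
    """Re-advertise the attacked prefix tagged with the provider's
    mitigation community (ExaBGP-style command, illustrative syntax)."""
    return f"announce route {prefix} next-hop {next_hop} community [{SCRUB_COMMUNITY}]"

def scrub_withdraw(prefix: str) -> str:
    """Tear down the tagged announcement once the attack subsides."""
    return f"withdraw route {prefix}"

print(scrub_announce("203.0.113.0/24"))
print(scrub_withdraw("203.0.113.0/24"))
```

Because activation and deactivation are both single BGP updates, the scrubber is in the path only while the attack lasts, which is the whole point of on-demand posture.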
Pick your scrubber wisely: Global Tier-1 networks make sense here. Lumen, Tata, NTT: they have regional scrubbing centers distributed across their backbone. An attack originating in São Paulo gets scrubbed in São Paulo, not hair-pinned to Ashburn and back. The attack dies where it’s born.
The Surrender Protocol: Why RTBH is a trap
RTBH, aka Remotely Triggered Black Hole routing. Sounds powerful, no?
Here’s what you’re actually doing: announcing to your upstream providers “please drop ALL traffic to this prefix.” Every packet. Attack traffic. Customer traffic. Everything.
Let’s call this “Voluntary Extinction.” You’re not defending your infrastructure; you’re executing it yourself before the attacker has to.
The attacker wins twice:
First, your service goes globally unavailable. The exact outcome they wanted. Congratulations… you saved them the bandwidth.
Second, you broadcast weakness. You’ve just told every threat actor monitoring your network: “Look, this target folds under pressure. Hit them again.”
“But we protected our infrastructure!”
Did you? You just forced the attacker to move laterally, target by target. By the time you’re done reacting, you’ve blackholed the entire network in a scorched-earth defense.
There’s maybe a time for RTBH: when the alternative is catastrophic collateral damage to shared infrastructure you don’t own. When your upstream’s entire PoP is melting because of traffic destined to you.
That’s not defense. That’s triage.
The RTBH Paradox: The moment you use it, you’ve already lost. It should exist in your playbook the way a cyanide pill exists in a spy movie: acknowledged, never celebrated, never the plan.
The /32 Trick: Surgical Scrubbing (and the polarization play)
Here’s a technique I call “Prefix Slicing.” It allows you to scrub a single attacked IP without disrupting the entire network block. This is another cheat code in the spirit of the BGP hacks from my previous article, but it requires precision.
The scenario: You announce your /24 to the Internet. Normal routing. Normal traffic. An attack hits a specific IP within that range.
The traditional response: Route the entire /24 through the scrubber. All 256 IPs get the scrubber treatment. Latency increases for everyone, and filtering artifacts might affect legitimate traffic.
The Constraint: Why You Can’t “Just” Scrub a /32
You might think, “I’ll just announce a /32 for the attacked IP to my scrubber.” This won’t work. The smallest prefix that propagates globally on the Internet is a /24; most networks filter anything more specific.
If you simply announce the /32 to your scrubbing provider while leaving the /24 announced normally to your other transit providers, traffic will continue to follow the path of the /24 via your clean transit links. Most of the attack traffic bypasses the scrubber entirely, and the attack continues unabated.
The Solution: Polarization and Slicing
To make surgical scrubbing work, you must ensure that the scrubbing provider is the globally preferred path to your entire /24. This is called “Polarization.”
Here is the surgical approach, executed via automation:
- Polarize the /24: You must make the scrubbing provider the dominant path for the /24. This is achieved by withdrawing the /24 announcement from your other transit providers. The global internet must see the path to your network for this /24 through the scrubber’s AS (Autonomous System).
- Inject the /32 (The Slice): Simultaneously, announce the /32 (the single attacked IP) to the scrubbing provider, tagged with the specific BGP community that activates mitigation.
How it works:
All traffic for the /24 now flows toward the scrubbing provider’s network. Once inside their AS, their internal routing takes over. BGP longest-match wins.
They see two routes for the attacked IP: the /24 and the /32. The /32 is more specific.
- Traffic for the attacked IP follows the /32 route and is directed to the scrubbing appliances.
- The remaining 255 IPs in your /24 follow the standard /24 route, passing through the provider’s backbone without intensive scrubbing (depending on the provider’s implementation).
This minimizes latency and potential issues for your clean traffic while ensuring the attack is mitigated.
When the attack subsides, you withdraw the /32 and remove the polarization (readvertise the /24 normally to your other providers). Return to normal in seconds.
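The polarize-and-slice sequence above reduces to an ordered list of BGP actions, which is exactly what the automation layer should emit. A minimal orchestration sketch; the session names and the mitigation community are placeholders:

```python
# Hypothetical scrubber 'activate mitigation' community.
MITIGATE = "64500:911"

def prefix_slice_plan(block: str, attacked_ip: str,
                      clean_transits: list[str], scrubber: str) -> list[str]:
    """Ordered BGP actions for surgical /32 scrubbing:
    1. polarize the /24 toward the scrubber,
    2. inject the tagged /32 slice."""
    # Step 1: withdraw the /24 from clean transits so the scrubber's
    # AS becomes the globally preferred path (polarization).
    plan = [f"withdraw {block} from {transit}" for transit in clean_transits]
    plan.append(f"announce {block} to {scrubber}")
    # Step 2: the slice. Inside the scrubber's AS, longest-match wins,
    # so only this /32 hits the scrubbing appliances.
    plan.append(f"announce {attacked_ip}/32 to {scrubber} "
                f"community [{MITIGATE}]")
    return plan

for action in prefix_slice_plan("198.51.100.0/24", "198.51.100.25",
                                ["transit-a", "transit-b"], "scrubber"):
    print(action)
```

Reversing the list (withdraw the /32, re-announce the /24 to the clean transits) is the return-to-normal path, so the same template covers both directions of the maneuver.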
The Asymmetric Reality
DDoS is asymmetric warfare. You don’t know when. You don’t know how. You don’t know the vector, the volume, or the duration. But you know it’s coming.
The US military uses DEFCON levels to describe readiness states. DEFCON 5 is peacetime. DEFCON 1 is nuclear war imminent.
DEFCON 2 is “FAST PACE”: Armed forces ready to deploy and engage in less than six hours.
That’s exactly where your DDoS defense must live.
You cannot be at DEFCON 1 permanently. That’s “always-on scrubbing,” and we’ve discussed why surrendering your inbound sovereignty to a third-party black box is a trap.
You cannot be at DEFCON 5 either. That’s “hoping your ISP notices before your customers do.”
DEFCON 2 means:
- Detection systems watching every ingress in real-time
- Automation scripts pre-tested, staged, ready to execute
- Flowspec rules templated, one API call from deployment
- Scrubbing activation tested monthly, communities documented
- Runbooks that require zero human decisions at 3 AM
The difference between surviving a 400 Gbps attack and going dark for hours? It’s not budget. It’s not fancy hardware. It’s whether your network can reach DEFCON 1 automatically when the attack lands.
BGP makes this possible. It’s not just a routing protocol; it’s an orchestration layer. Detection systems trigger routers. Routers signal upstream providers. Communities cascade actions across your entire edge faster than any human could type a single command.
The DEFCON 2 Principle: You don’t react to DDoS attacks. You pre-configure your reactions and let the network execute them autonomously.
About the Author
Herve Hildenbrand is a senior network and infrastructure architect with 25+ years of experience across European enterprise and service provider networks. He leads network infrastructure for a major payment processor and specialises in VXLAN/EVPN fabrics, MPLS/BGP backbones, routing design, and large-scale migrations. Connect with Herve and read more of his articles on LinkedIn.