I find business problems and ship products to solve them, end to end. API security sensors, fleet intelligence platforms, developer ecosystems, cross-platform endpoint agents. I've built these from scratch at two companies, doing everything from architecture and kernel-level engineering to the dashboards, documentation, and feedback loops that make them stick.
The legacy ThreatX architecture was an 8-hop cloud pipeline: sensor to Kafka to ETL to MongoDB to the HackerMind engine and back. Every detection decision took 1–10 seconds. Post-acquisition, I rebuilt it as a single Rust binary on Cloudflare's Pingora framework that makes all decisions at the edge in 450μs.
Why Rust, not Go or C++? This is a network proxy sitting in the hot path of every HTTP request. Go's garbage collector introduces unpredictable latency spikes, which is unacceptable for a security sensor. C++ would work but doesn't give you memory safety guarantees at the development velocity we needed. Rust gives you both: zero-cost abstractions, no GC pauses, and the compiler catches entire categories of bugs before they reach production.
The detection pipeline runs in ~75μs. Context building (parse method, URI, headers, compute feature flags) → actor store lookup → credential stuffing check → profile anomaly check against endpoint baselines → candidate selection via bitmask index (filters 237 rules down to ~35 candidates) → rule evaluation with short-circuit on first false → entity tracking with time-decay risk scoring → verdict. The bitmask indexing alone skips 90%+ of rules. Candidate caching gives ~1μs hits on the LRU.
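The candidate-selection step can be sketched like this (Python for illustration; the production pipeline is Rust, and the flag and rule names here are hypothetical):

```python
# Illustrative sketch of bitmask candidate selection. Each feature is one bit;
# a request's context mask is computed once during context building.
HAS_BODY, HAS_AUTH, IS_POST, HAS_QUERY = 1 << 0, 1 << 1, 1 << 2, 1 << 3

# Each rule declares the feature flags it requires via a precomputed mask.
rules = [
    {"id": "sqli-query",  "mask": HAS_QUERY},
    {"id": "auth-abuse",  "mask": HAS_AUTH | IS_POST},
    {"id": "body-upload", "mask": HAS_BODY | IS_POST},
]

def candidates(ctx_mask, rules):
    # A rule survives only if every flag it requires is set on the request,
    # so most of the rule set is skipped with one AND per rule.
    return [r for r in rules if r["mask"] & ctx_mask == r["mask"]]

ctx = HAS_QUERY | IS_POST  # POST with a query string, no auth header, no body
print([r["id"] for r in candidates(ctx, rules)])  # ['sqli-query']
```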
Campaign correlation is the interesting part. Most WAFs evaluate requests in isolation. Synapse models attack infrastructure as a connected graph with node types for IPs, JA4 fingerprints, auth tokens, ASNs, user agents. Edge weights quantify relationships: shared token (0.95), shared fingerprint (0.8), same ASN (0.3). Multi-hop traversal discovers indirect relationships: two IPs sharing no direct attribute linked through a third. Campaign fires when 3+ connected nodes with combined weight ≥ 2.0.
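The mechanics reduce to a connected-component check over weighted edges (Python sketch; the node IDs are hypothetical, the weights follow the scheme above):

```python
from collections import defaultdict

# Hypothetical attack infrastructure graph. Edge weights per the scheme:
# shared token 0.95, shared JA4 fingerprint 0.8, same ASN 0.3.
edges = [
    ("ip:198.51.100.1", "ip:203.0.113.5", 0.95),  # shared auth token
    ("ip:203.0.113.5",  "ip:192.0.2.9",   0.80),  # shared JA4 fingerprint
    ("ip:198.51.100.1", "ip:192.0.2.9",   0.30),  # same ASN
]

def campaign_check(start, edges, min_nodes=3, min_weight=2.0):
    # Multi-hop traversal: collect the connected component around `start`,
    # then sum the weights of every edge inside that component.
    adj = defaultdict(list)
    for a, b, _ in edges:
        adj[a].append(b)
        adj[b].append(a)
    seen, stack = {start}, [start]
    while stack:
        for nxt in adj[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    weight = sum(w for a, b, w in edges if a in seen and b in seen)
    return len(seen) >= min_nodes and weight >= min_weight

print(campaign_check("ip:198.51.100.1", edges))  # True: 3 nodes, weight 2.05
```

Note that the first two IPs share no direct attribute with each other beyond the intermediate node; traversal is what links them.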
DLP scanning uses Aho-Corasick multi-pattern matching, 30-50% faster than sequential regex. 22+ patterns covering credit cards (Luhn validation), SSNs, IBAN (mod-97), API keys, JWTs, private keys, crypto addresses. Content-Type short circuit skips binary. 8KB inspection cap. Benchmarks: ~21μs per 4KB, ~42μs per 8KB.
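The credit-card check, for instance, is the standard Luhn checksum applied after pattern matching narrows the candidate spans (Python sketch; the production scanner is Rust):

```python
def luhn_valid(number: str) -> bool:
    # Luhn: double every second digit from the right, subtract 9 from any
    # result over 9, sum everything; valid when the total is divisible by 10.
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

print(luhn_valid("4111111111111111"))  # True: a well-known test PAN
```

The cheap checksum is what keeps false positives down: plenty of 16-digit strings match the regex shape, far fewer pass Luhn.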
All worker threads share a single learning state via Arc<RwLock<Synapse>>. Before my rewrite, thread_local! storage meant each thread learned independently. Now all threads contribute to shared knowledge. Internal stores use parking_lot::RwLock, validated at 200 virtual users with zero lock contention. DashMap for lock-free concurrent HashMap where needed.
Signal Horizon is the command plane for the Synapse sensor fleet. Its job is to make a distributed set of edge sensors behave as a single coordinated defense system while keeping tenant data isolated.
Why ClickHouse? Purpose-built for real-time analytics on columnar data. Sub-second queries across high-volume sensor events at a fraction of the cost of Elasticsearch. Column-oriented storage suits security event data exceptionally well: fields like client IPs, attack types, and status codes repeat in patterns that compress down hard.
Multi-tenant intelligence sharing without exposing raw data. Threat indicators get HMAC-SHA256 hashed with a tenant-specific key before leaving the boundary. The receiving side can correlate patterns (same hash means same indicator) without seeing the underlying data. "This IP is attacking 5 other tenants" without anyone knowing who.
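The hashing step itself is small (Python sketch; key management is deliberately simplified here, and correlation assumes both sides hash under the same key):

```python
import hashlib
import hmac

def anonymize_indicator(indicator: str, key: bytes) -> str:
    # Keyed hash: deterministic under the same key, so equal digests mean
    # equal indicators, but the raw value never crosses the tenant boundary.
    return hmac.new(key, indicator.encode(), hashlib.sha256).hexdigest()

key = b"fleet-intel-key"  # placeholder; real key distribution is out of scope
a = anonymize_indicator("203.0.113.7", key)  # tenant A reports an attacker IP
b = anonymize_indicator("203.0.113.7", key)  # tenant B reports the same IP
print(a == b)  # True: correlatable without revealing the IP
```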
Fleet management breaks into four components. FleetAggregator handles metrics and health scoring (0-100) with regional views. FleetCommander manages direct commands, broadcasts, and execution tracking through a full lifecycle (pending → sent → success/failed). ConfigManager maintains versioned templates across Prod/Staging/Dev with drift detection. RuleDistributor handles rollouts: immediate, canary (10→50→100%), or scheduled.
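The canary path, for example, reduces to a staged loop with a health gate (illustrative Python; the stage percentages come from above, the health check is hypothetical):

```python
def canary_rollout(stages, fleet_healthy):
    # Push the rule to a growing slice of the fleet; stop and signal
    # rollback the moment a stage fails its health check.
    reached = []
    for pct in stages:
        reached.append(pct)
        if not fleet_healthy(pct):
            return reached, "rollback"
    return reached, "complete"

print(canary_rollout([10, 50, 100], lambda pct: True))
# ([10, 50, 100], 'complete')
```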
War Room is the real-time incident response workspace. Collaboration timeline, Quick Block tool with auto-tagging, fleet-wide block push, campaign association. Horizon Bot auto-creates rooms when high-severity campaigns are detected. The workflow (escalate, investigate, contain, resolve with post-mortem) is designed for the 24/7 SOC team that actually lives in this tool.
Sensors communicate over persistent WebSocket. Bidirectional: commands, config, and rules push down; events and signals push up. Batched telemetry with circuit breaker, exponential backoff, and event queuing on failure. If Signal Horizon goes unreachable, every sensor operates fully autonomously.
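The reconnect strategy looks roughly like this (Python sketch; the sensor is Rust, and the base, cap, and attempt constants are illustrative):

```python
def backoff_schedule(base=1.0, cap=60.0, attempts=7):
    # Exponential backoff: the delay doubles per failed attempt and is
    # capped so a long outage doesn't push reconnects out indefinitely.
    # Production implementations typically add jitter so 400 sensors
    # don't reconnect in lockstep when the command plane comes back.
    return [min(cap, base * 2 ** n) for n in range(attempts)]

print(backoff_schedule())  # [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 60.0]
```

While disconnected, events accumulate in the local queue and drain once the socket is re-established.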
A major customer ran 400+ self-hosted sensors, peaking at 150K requests per second on Tuesdays alone. The initial attempt to cut over their production traffic lasted minutes before catastrophic failure. The team's response was to plan two weeks of NGINX optimizations in the sensor.
The problem wasn't NGINX. But nobody knew that yet. As I discovered, we had never actually validated the sensor at that traffic volume before attempting the cutover. So I built a test harness to find out. Stood up a Fargate cluster to generate load and pointed it at a sensor to reproduce the failure under controlled conditions.
Logging wasn't in place for this. So I logged into a sensor and ran dmesg. Wall of text: table full, dropping packet. Got it.
The connection tracking table, where the Linux kernel's stateful firewall (netfilter) records every flow it sees, was being exhausted. This isn't just active connections. Every flow that touches the sensor gets an entry: blocked, dropped, everything. At 150K RPS with the traffic diversity of a major retail site, the default table size fills fast. Disabling conntrack wasn't an option because we were running Docker, which depends on it.
The fix was a systemd service that calculates the conntrack table size dynamically based on available memory and sets it before the network stack comes up. No NGINX tuning required. After the change, sensors handled 150K RPS without breaking a sweat.
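The sizing logic amounts to something like this (Python sketch; the 16 KiB-per-entry budget and the floor are illustrative, not the production constants):

```python
def conntrack_max(mem_bytes: int) -> int:
    # Budget roughly one conntrack entry per 16 KiB of RAM, with a floor
    # so small hosts still get a workable table. The systemd unit writes
    # the result to net.netfilter.nf_conntrack_max before the network
    # stack comes up.
    return max(262_144, mem_bytes // 16_384)

print(conntrack_max(16 * 1024**3))  # 16 GiB host -> 1,048,576 entries
```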
The issue had survived one prior failed onboarding attempt. Nobody found it because the symptoms look like a network problem. Application monitoring shows healthy services, the sensor process is fine, and the drops happen at the kernel level before traffic ever reaches userspace. You only find it if you know to check /proc/sys/net/netfilter/nf_conntrack_count. And you only reproduce it if you actually generate enough traffic to fill the table, which requires a purpose-built load test that nobody had done.
The fix enabled the contract. That diagnostic methodology (build a harness, reproduce at scale, look below the application layer) became how I approach problems by default. It's a straight line from this to how Synapse was designed: every decision at the edge, kernel-level awareness built in.
When I arrived at Vispero, there was no developer platform. Fragile scripts. An end-of-life NAnt build system. If a developer needed a config changed, they filed a ticket and waited. When I left three years later, they had a self-service portal, a CLI, Visual Studio plugins, and 600+ automated pipelines.
The core challenge was six target platforms with one framework. Windows, macOS, iOS, Android, Linux, Embedded Linux, each with completely different toolchains. I built a task-based DSL that abstracted platform-specific compilation. A build file declares what to build. The framework resolves how based on target. Windows = MSBuild + EV code signing. macOS = Xcode + notarization. Embedded Linux = cross-compilation. Self-bootstrapping: you could reproduce any build from a clean machine.
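In spirit (Python sketch; the target names and step lists are hypothetical, not the actual DSL):

```python
# A build file declares *what* to build; the framework resolves *how*
# by mapping each declared target to its platform-specific toolchain.
TOOLCHAINS = {
    "windows":        ["msbuild", "ev_codesign"],
    "macos":          ["xcodebuild", "notarize"],
    "embedded-linux": ["setup_sysroot", "cross_compile"],
}

def resolve(build_file):
    # Expand declared targets into concrete toolchain steps.
    return {target: TOOLCHAINS[target] for target in build_file["targets"]}

plan = resolve({"product": "reader-core", "targets": ["windows", "macos"]})
print(plan["windows"])  # ['msbuild', 'ev_codesign']
```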
The toil elimination was where it clicked for me. Release engineers managed 600+ pipelines configured in XML, constantly copy-pasting configuration and templates to create new pipelines for sandboxes, releases, and new versions, which the devs often needed all at once. I built a tool using the GoCD API that let you pick two pipelines from a pre-populated list, compare them in the diff tool of your choice, and push changes back up. Simple tool, massive time savings.
Another one: the company was stuck on a heavily customized Bugzilla instance that didn't integrate with CI/CD. Managers were spending hours every week manually updating issues and compiling notifications for QA on every promoted build. No API, just CGI. I made the build pipelines detect all issues in each build, assemble the notifications, and send them automatically.
The artifact repository and release management dashboards replaced the old process of promoting builds by moving them to different folders. Release managers could see all releases available in each channel (local and CDN), view value stream and dependency maps, and the same dependency data powered safe automated cleanup on the CDN. By this point the versioning database, build metrics database, and artifact database were all integrated into a single platform. Built in C#, SQL Server, and ASP.NET Core.
I also rebuilt the legacy license management system (an XML-backed MFC desktop app with 3+ day turnaround) into a C#/ASP.NET web platform with SQL Server. Government deployments worth $2M+ needed audit trails, approval workflows, and compliance reporting that didn't exist. The new system gave authorized users immediate turnaround with auto-approval for pre-authorized renewals.
The hybrid build infrastructure was its own project. AWS control plane scheduling jobs, on-prem VMware spinning up ephemeral agents on SSD arrays, Ansible provisioning them via Kerberos for credential-free WinRM auth into the Active Directory environment, build runs, agent destroyed. Every build got a clean environment. No drift, no "works on my machine." I replaced VMware golden images with Ansible playbooks because golden images are snapshots in time that diverge the moment you deploy them. At 600+ pipelines across six platforms, drift creates impossible-to-debug problems.
Observability came next. Bare-metal monitoring, custom build metrics tracking trends in build size, compile times, failure rates. SQL Server for structured data, InfluxDB and Grafana for time-series dashboards, Beats and Telegraf for collection, plus custom collectors in PowerShell for anything the off-the-shelf agents couldn't reach. This was the foundation for identifying bottlenecks and justifying infrastructure investments with data instead of anecdotes.
The developer experience layer tied it all together. I introduced Jira with a knowledge base that resolved half of incoming requests before they were even submitted. I'm not a fan of intake forms with dozens of fields; it feels like asking users to organize my backlog for me. Most of my forms started with three inputs: a title, "what's the problem?", and "is this blocking your work or impacting customers?" If I needed more context, I'd ask in the comments. Backend automation handled the actual categorization, issue type assignment, and trend tracking. Service and support requests dropped by nearly half again when I introduced self-service pipeline configuration. I started a monthly newsletter and regular platform surveys. The surveys received a 75% response rate, and satisfaction with the platform averaged 4.5/5 in my final year.
Then ransomware validated everything. The corporate network got hit. Took down most of the org for a week. Because I'd migrated all dev services to AWS (source control, build infrastructure, artifact storage), engineering was completely unaffected. Kept shipping. It wasn't designed as disaster recovery. It was a side effect of breaking dependency on corporate IT.
A10's firewall team needed an endpoint agent that intercepts and routes traffic bound for AI model provider APIs to an AI firewall gateway for inspection. It needed to run privileged and be tamper-resistant against end users. It was a P1 priority, and there was nobody in the building who had shipped to all five target platforms with kernel-level drivers and could code the agent itself. I had. So I wrote the v1 for all five.
Linux: eBPF for traffic interception, Rust userspace for event processing and control plane communication, systemd integration for deployment. Why eBPF over a kernel module? Safety: runs in a sandboxed VM with a verifier, can't crash the kernel. Portability: CO-RE (Compile Once, Run Everywhere) works across kernel versions without recompilation. Operations: attach and detach without reboots. When you're deploying across diverse customer environments with different kernels, those properties matter more than raw flexibility.
Windows: v1 shipped with WinDivert for traffic interception using a third-party certificate, packaged as MSIs. Hardened v2 with our own EV code signing certificate, which means reputation management with Microsoft SmartScreen, a hardware FIPS device to store the key, and all the compliance that comes with it. Runs as a Windows protected service, deployed through standard MDM solutions.
macOS: Requires an Apple developer account and approved network extension entitlements before you can even begin real development. If you've never navigated Apple's entitlement approval process for privileged network access, just know it's a gating factor that blocks most teams for weeks.
The number of people who have shipped privileged, tamper-resistant agents to all five major platforms and know where the landmines are on each is very small. I'd already released production software to every one of them. That's what made it possible to write MVPs fast enough to start shipping while defining the roadmap for hardened versions.
I proposed the role and became ThreatX's first engineer to deliver formal professional services: solution architecture and post-sales integration support for enterprise customers. It wasn't a title change. It was identifying a gap where deals were stalling because nobody could bridge the distance between what the product did and what enterprise environments actually looked like.
I personally delivered four enterprise engagements. The common thread was Microsoft-ecosystem customers that no other engineer on the team could support. That meant debugging IIS integrations, shipping a VMware appliance for a customer that wouldn't run Linux containers, and translating between enterprise IT terminology and product capabilities during pre-sales calls. These weren't support tickets. They were the difference between a closed deal and a lost one.
The broader impact was opening a market segment. Before this, Microsoft-heavy enterprises were effectively off the table. After, sales had a repeatable path into those accounts with engineering support that could speak their language. It established professional services as a revenue-generating capability the company didn't have before.
For context: before joining ThreatX I'd nearly been hired by AWS Professional Services (hiring freeze) and VMware as a Solution Architect. The consulting muscle was already there. This was just the first time I got to apply it inside a product company.
Apparatus: AI-augmented network simulation supporting 13 protocols with fault injection, honeypot deployment, and traffic analysis. Built for security testing and network resilience validation.
Crucible: Attack simulation framework with 119 pre-built scenarios covering OWASP Top 10, API abuse patterns, and advanced persistent threat chains. Used to validate that Synapse actually catches what it claims to catch.
Chimera: Intentionally vulnerable API server spanning 450+ endpoints across 22 industry verticals. Purpose-built target environment for security tool validation and training. Includes an OWASP LLM teaching environment with guided hints and vulnerability code diffs.
All publicly available on GitHub.
"AI-augmented" gets used loosely. Here's what it actually means in practice: a structured engineering methodology where I lead a team of AI agents the same way a senior engineer leads junior developers. I set the architecture, define patterns, review every output, and make every design decision. The agents implement against my specs, write tests, generate documentation. The quality gates keep them honest.
The outer loop is implementation. I define the architecture, write specs, and set patterns. Agents implement against those specs (code, tests, documentation). Same as assigning work to junior developers: clear requirements, bounded scope, review everything before it merges.
Multi-perspective review gates. Every change gets reviewed from multiple angles: security implications, performance characteristics, code quality and pattern consistency. These are separate review perspectives that catch different categories of issues, the same way a security engineer and a performance engineer would see different things in the same PR.
The test gap analysis loop is the interesting one. This isn't code coverage. It's behavioral coverage. A dedicated review agent examines the test suite looking for gaps in behavior testing: are we testing the edge cases? Are there trivial assertions masking real gaps? Is the test actually validating the contract or just confirming the implementation? When agents write tests, those tests go through this review where another agent either approves or requests changes. The effect is quality control on the tests themselves. No rubber-stamp assertions, no tests that pass by coincidence.
Circuit breakers prevent compounding errors. If a review loop flags an issue, the pipeline stops. It doesn't try to self-heal or auto-fix and retry indefinitely. The failure gets surfaced to me with context: what was attempted, what failed, what the reviewers flagged. I make the architectural decision about how to proceed. The system is designed to fail fast and fail loudly rather than silently accumulate technical debt.
The result is 9,775 test cases across four platforms. Not because I heroically wrote ten thousand tests. Because the methodology made comprehensive testing the path of least resistance. It's harder to skip validation than to do it. That's the point. The agents handle the volume. The gates handle the quality. I handle the architecture and decisions.
I authored all of ThreatX's published documentation, before and after the A10 acquisition. But the more interesting story is how I scaled from doing it all manually to building a platform that made quality documentation a natural byproduct of the development process.
The starting point was unsustainable. I was creating every AsciiDoc file myself and manually editing other people's contributions for quality and consistency. Content review was a bottleneck that scaled linearly with my time. That works when you're small. It doesn't work when every minor product version needs to ship with versioned documentation simultaneously.
So I built the infrastructure to make it scale. Designed the information architecture from scratch: how content is organized, how topics relate to each other, how a user navigates from problem to solution. Developed a custom style guide that codified the voice, terminology, and structural patterns so contributions didn't need me as a quality filter.
Then I automated the enforcement. Built a docs-as-code platform on Antora with automated quality checks using Vale and a custom style guide ruleset. Re-usable AsciiDoc components meant common patterns (admonitions, API references, configuration examples) were consistent by construction, not by review. Contribution templates lowered the barrier for other engineers to write docs that met the standard without my intervention.
The outcome: every minor version of the product shipped with versioned documentation at the same time. Architecture decision records, operational runbooks, API references, onboarding guides, all maintained in the same pipeline as the code. This was critical for the on-prem product launch, which required customer self-service documentation that had to be complete and correct on day one.
I don't just build tools. I build products. I can't help but see the endgame of any solution and the incremental steps to get there. I build robust pipelines early so I can iterate fast, respond to bugs in minutes instead of sprints, and ship with confidence. I spent six years managing hotels in South Florida before moving into engineering, and it made it impossible for me not to think about the people on the other end of everything I build.
I feed off of available problems and I'm usually the first to see many of them. I like the fast feedback loop of developing for internal stakeholders. Some people want to be involved in shaping the solution, so I get them in early and make them feel like co-builders. Others just want me to deliver something that works. I adapt. When a production-impacting bug comes in, I don't just add it to the backlog. I take five minutes to see if I can fix it in twenty, and if I can, I do. Sometimes the reality is your users are your testers, and you have to figure out how to manage that without burning goodwill.
I try to work myself out of every job I do. At ThreatX, I built the first fleet management platform and then identified SOC analysts who showed promise and interest in development. I mentored them from platform users into SRE practitioners, and eventually they took over the platform completely. That created a bridge team between engineering and the SOC that took significant operational work off engineering's plate while giving growth opportunities on the SOC side. It was the right people at the right time, but it only happened because I'd invested in collaborating across the org, learning people's problems, and building relationships outside my immediate team.
Design and user experience matter even for internal tools. I was a professional photographer and printmaker before I was an engineer, and I still shoot and print on the side. That background shows up in how I think about visual hierarchy, information architecture, and the way a user moves through an interface. My dashboards and admin interfaces get adopted because they're designed, not just functional. Documentation gets the same treatment. If it isn't clear and well-structured, it might as well not exist.
Authored all published documentation for ThreatX (continuing after the A10 acquisition). Designed the information architecture, developed a custom style guide, and built the docs-as-code platform underneath it. Every minor product version shipped with versioned docs at the same time.
Created the methodology and framework my team uses to integrate AI agents into our development workflow. Multi-perspective review gates, behavioral test gap analysis, and circuit breakers that fail to a human. The approach 5x'd our engineering output that year.
9,775 test cases across four platforms. Comprehensive coverage enables the velocity that AI-augmented development makes possible. There's no excuse to skip validation when you can move this fast.
I build CI/CD and quality pipelines early so I can iterate fast and respond to production issues in minutes, not sprints. Comfortable owning the full lifecycle from architecture through deployment, documentation, and user support.