Archives — Distributed Readings

Cloudflare ↗

Announcing Claude Compliance API support with Cloudflare CASB

Security teams needed visibility into and compliance monitoring of Claude Enterprise API usage across their organization without leaving their existing security infrastructure.

security api-design

3 min

Cloudflare ↗

Project Glasswing: what Mythos showed us

Determining whether security-focused LLMs can effectively identify vulnerabilities in live production infrastructure code at scale.

security ml-systems

4 min

Cloudflare ↗

Browser Run: now running on Cloudflare Containers, it’s faster and more scalable

Browser Run needed higher usage limits, better performance, and improved reliability, while increasing development velocity for the browser automation service.

distributed-systems load-balancing

3 min

Cloudflare ↗

Our billing pipeline was suddenly slow. The culprit was a hidden bottleneck in ClickHouse

A partitioning change to a petabyte-scale ClickHouse cluster caused billing pipeline jobs to stall without obvious error signals in standard metrics.

databases observability

4 min

Cloudflare ↗

How Cloudflare responded to the “Copy Fail” Linux vulnerability

How to rapidly detect, investigate, and mitigate a critical Linux kernel privilege escalation vulnerability across a global edge computing fleet without impacting customers.

security distributed-systems

4 min

Cloudflare ↗

When DNSSEC goes wrong: how we responded to the .de TLD outage

When DENIC published invalid DNSSEC signatures for the .de TLD, DNS resolvers like 1.1.1.1 faced a critical decision: reject all .de domain queries due to signature validation failures or serve potentially stale cached responses to maintain availability.

caching distributed-systems

4 min

Cloudflare ↗

Code Orange: Fail Small is complete. The result is a stronger Cloudflare network

Cloudflare needed to make its global edge infrastructure more resilient to configuration changes and prevent widespread outages caused by unsafe deployments.

distributed-systems observability

4 min

Cloudflare ↗

Shutdowns, power outages, and conflict: a review of Q1 2026 Internet disruptions

How to measure, analyze, and publicly report on Internet disruptions caused by geopolitical events, infrastructure attacks, and power outages in real time across global networks.

observability distributed-systems

4 min

Cloudflare ↗

Making Rust Workers reliable: panic and abort recovery in wasm‑bindgen

Rust panics in Cloudflare Workers were fatal and poisoned the entire worker instance, making applications unreliable when unhandled errors occurred.

security observability

4 min

Cloudflare ↗

Orchestrating AI Code Review at scale

Cloudflare needed to scale code review processes across its engineering organization while maintaining code quality and security standards without overwhelming human reviewers.

ml-systems api-design

3 min

Cloudflare ↗

The AI engineering stack we built internally — on the platform we ship

Cloudflare needed to build an internal AI engineering stack that could handle massive scale (20 million requests, 241 billion tokens) while dogfooding its own platform products.

api-design ml-systems

4 min

Cloudflare ↗

Agents Week: network performance update

Cloudflare needed to improve request handling performance across its global network to maintain competitive advantage over other CDNs.

distributed-systems load-balancing

4 min

Cloudflare ↗

Browser Run: give your agents a browser

AI agents needed a way to interact with browsers at scale while maintaining visibility and control over automated actions, requiring higher concurrency and real-time debugging capabilities.

real-time-systems ml-systems

3 min

Cloudflare ↗

Building the foundation for running extra-large language models

How to efficiently run inference for extra-large language models on edge infrastructure while maintaining low latency and high throughput across distributed Cloudflare servers.

ml-systems distributed-systems

4 min

Cloudflare ↗

Introducing Agent Lee - a new interface to the Cloudflare stack

Users had to manually navigate multiple tabs and interfaces within the Cloudflare dashboard to troubleshoot issues and manage their infrastructure, creating friction in the workflow.

api-design security

4 min

Cloudflare ↗

Introducing the Agent Readiness score. Is your site agent-ready?

Website owners needed a way to measure and understand how well their sites support AI agents and web crawlers for indexing and integration.

api-design observability

4 min

Cloudflare ↗

A one-line Kubernetes fix that saved 600 hours a year

Cloudflare's Atlantis instance took 30 minutes to restart due to a Kubernetes volume permission bottleneck.

observability storage-systems

4 min

Cloudflare ↗

Cloudflare Client-Side Security: smarter detection, now open to everyone

Detecting sophisticated client-side security threats like zero-day exploits in real time across millions of requests while minimizing false positives.

security ml-systems

4 min

Cloudflare ↗

Our ongoing commitment to privacy for the 1.1.1.1 public DNS resolver

How to design a public DNS resolver that prioritizes user privacy while maintaining performance and trustworthiness at scale.

security distributed-systems

4 min

Cloudflare ↗

Building a security overview dashboard for actionable insights

Security teams were overwhelmed by the volume of raw security data across Cloudflare's platform, making it difficult to prioritize and act on vulnerabilities and threats efficiently.

security observability

3 min

Cloudflare ↗

Investigating multi-vector attacks in Log Explorer

Security teams lacked a unified view across multiple Cloudflare datasets, making it difficult to identify and investigate multi-vector attacks that span different attack surfaces and log sources.

observability security

3 min