Browse past weeks of engineering reads.
Cloudflare's Atlantis instance took 30 minutes to restart due to a Kubernetes volume permission bottleneck.
How to automatically convert TypeScript workflow code into visual step diagrams for users to understand and interact with their workflows in the dashboard.
Cloudflare's existing server fleet could not keep pace with rapidly growing global traffic demands, requiring a new generation of hardware with significantly higher compute and network throughput.
Cloudflare needed to significantly increase edge compute throughput per server but faced a tradeoff where high-core-count CPUs came with smaller per-core L3 cache, risking latency penalties for cache-dependent workloads.
How to safely execute untrusted AI-generated code with minimal latency and resource overhead.
Customers needed precise control over where their data is processed geographically to meet diverse compliance requirements (e.g., GDPR, data sovereignty laws), but existing pre-defined regional options were too coarse-grained to cover all regulatory and performance needs.
Running large AI models for agent workloads on edge infrastructure was cost-prohibitive and required significant inference stack optimization to serve models like Kimi K2.5 efficiently at scale.
Italy's 'Piracy Shield' system forces Internet infrastructure providers like Cloudflare to block content at the network level without proper oversight or due process, leading to disproportionate overblocking of legitimate content.
Organizations struggle to discover and secure AI-powered applications across their infrastructure, especially shadow AI deployments that teams spin up without central oversight, creating security blind spots.
Standard defensive security tools miss logic flaws and vulnerabilities in APIs because they lack understanding of stateful API interactions and business logic flows.
Traditional bot-blocking approaches are insufficient for preventing account abuse (e.g., credential stuffing, fake account creation) because sophisticated attacks increasingly involve human-like behavior or actual humans, bypassing conventional bot detection.
Security teams were overwhelmed by the volume of raw security data across Cloudflare's platform, making it difficult to prioritize and act on vulnerabilities and threats efficiently.
Enterprise SASE (Secure Access Service Edge) migrations traditionally take 18+ months due to architectural complexity, requiring organizations to integrate networking and security across global infrastructure.
Cloudflare's open-source Pingora proxy had request smuggling vulnerabilities when deployed as an ingress proxy, allowing attackers to exploit HTTP parsing discrepancies to bypass security controls and route malicious requests.
Organizations struggle to migrate from legacy network security architectures to modern SASE (Secure Access Service Edge) solutions, facing risks from accumulated technical debt and complex dependencies in their existing infrastructure.
Security teams lacked a unified view across multiple Cloudflare datasets, making it difficult to identify and investigate multi-vector attacks that span different attack surfaces and log sources.
AI agents hitting Cloudflare error pages received heavyweight HTML responses that consumed excessive tokens and required brittle parsing, making automated error handling inefficient and costly.
Organizations struggle with Internet-facing blind spots in their attack surface, lacking continuous visibility into security gaps and risk exposures across their external-facing assets.
The Cloudflare One SASE client's Proxy Mode relied on user-space TCP stacks for tunneling traffic, introducing significant overhead that limited throughput and increased latency for end users.
Traditional WAFs force a trade-off between logging (risking missed attacks) and blocking (risking false positives), requiring extensive manual tuning to balance security coverage with availability.
Tunnel layering in Cloudflare's WARP/One client caused MTU mismatches, leading to silently dropped oversized packets that degraded connectivity and resilience.
Organizations face fragmented data security across endpoints, network traffic, cloud applications, and AI prompts, making it difficult to enforce consistent data loss prevention (DLP) policies as data flows through diverse channels including RDP sessions and AI copilots.
Enterprises connecting multiple private networks via tunnels frequently encounter overlapping IP address ranges (e.g., multiple sites using 10.0.0.0/8), making traditional routing tables unable to determine which tunnel should receive return traffic.