Browse past weeks of engineering reads.
A partitioning change to a petabyte-scale ClickHouse cluster caused billing pipeline jobs to stall without obvious error signals in standard metrics.
AI agents lack persistent memory mechanisms to retain context, learn from interactions, and improve decision-making over time.
Providing agents, developers, and automations with scalable, Git-compatible versioned storage that can handle tens of millions of repositories without forcing them to manage infrastructure.
GPU memory bandwidth constraints were limiting LLM inference efficiency across Cloudflare's distributed edge network, requiring optimization to deliver faster and cheaper inference.
Cloudflare's Atlantis instance took 30 minutes to restart due to a Kubernetes volume permission bottleneck.