Archives — Distributed Readings

Google Cloud ↗

Migrating to Google Cloud’s Application Load Balancer: A practical guide

Migrating business-critical load balancer configurations from on-premises hardware solutions to Google Cloud while preserving existing traffic manipulation logic.

load-balancing distributed-systems

5 min

AWS ↗

Building hybrid multi-tenant architecture for stateful services on AWS

Building a multi-tenant architecture that isolates tenants without requiring separate AWS accounts while maintaining stateful service deployments.

load-balancing distributed-systems

5 min

Cloudflare ↗

Browser Run: now running on Cloudflare Containers, it’s faster and more scalable

Browser Run needed higher usage limits, better performance, and improved reliability while increasing development velocity for its browser automation service.

distributed-systems load-balancing

3 min

Netflix ↗

State of Routing in Model Serving

Netflix needed to design a domain-independent traffic routing system for its ML model serving infrastructure that could handle personalized experiences at scale across multiple domains while maintaining high availability.

microservices load-balancing

5 min

Cloudflare ↗

The AI engineering stack we built internally — on the platform we ship

Cloudflare needed to build an internal AI engineering stack that could handle massive scale (20 million requests, 241 billion tokens) while dogfooding its own platform products.

api-design ml-systems

4 min

Cloudflare ↗

Agents Week: network performance update

Cloudflare needed to improve request handling performance across its global network to maintain competitive advantage over other CDNs.

distributed-systems load-balancing

4 min

Cloudflare ↗

Building the foundation for running extra-large language models

How to efficiently run inference for extra-large language models on edge infrastructure while maintaining low latency and high throughput across distributed Cloudflare servers.

ml-systems distributed-systems

4 min

Cloudflare ↗

Cloudflare’s AI Platform: an inference layer designed for agents

Developers needed a unified way to access multiple AI model providers without managing separate integrations and API contracts for each one.

api-design microservices

4 min

Cloudflare ↗

Rearchitecting the Workflows control plane for the agentic era

Cloudflare Workflows needed to support higher concurrency and creation rate limits to enable durable background agents at scale.

distributed-systems rate-limiting

4 min

AWS ↗

Unlock efficient model deployment: Simplified Inference Operator setup on Amazon SageMaker HyperPod

Simplifying the deployment and scheduling of machine learning inference workloads across multiple instances and instance types on Amazon SageMaker HyperPod.

ml-systems distributed-systems

4 min

Cloudflare ↗

500 Tbps of capacity: 16 years of scaling our global network

How to scale a global content delivery and DDoS mitigation network to handle massive throughput (500 Tbps) while maintaining capacity to protect against record-breaking attacks.

load-balancing distributed-systems

3 min

Cloudflare ↗

Introducing Programmable Flow Protection: custom DDoS mitigation logic for Magic Transit customers

Magic Transit customers needed the ability to define and enforce custom DDoS mitigation logic for proprietary and non-standard UDP protocols without being limited to Cloudflare's pre-built detection rules.

security distributed-systems

4 min

Cloudflare ↗

Why we're rethinking cache for the AI era

CDN cache systems were designed for human traffic patterns but struggle with the distinct access patterns of AI bot traffic, which now represents over 10 billion requests per week and threatens cache efficiency.

caching distributed-systems

4 min

Meta ↗

Meta Adaptive Ranking Model: Bending the Inference Scaling Curve to Serve LLM-Scale Models for Ads

Meta needed to scale its ads ranking models to LLM-scale complexity and size while still meeting the inference latency requirements of real-time ad serving.

ml-systems real-time-systems

5 min

AWS ↗

How Salesforce migrated from Cluster Autoscaler to Karpenter across their fleet of 1,000 EKS clusters

Cluster Autoscaler could not efficiently scale and manage node provisioning across Salesforce's fleet of 1,000+ EKS clusters, likely suffering from slow scaling decisions, suboptimal bin-packing, and operational complexity at massive scale.

distributed-systems load-balancing

4 min

Cloudflare ↗

Inside Gen 13: how we built our most powerful server yet

Cloudflare's existing server fleet could not keep pace with rapidly growing global traffic demands, requiring a new generation of hardware with significantly higher compute and network throughput.

distributed-systems load-balancing

4 min