Google Cloud

Migrating to Google Cloud’s Application Load Balancer: A practical guide

Migrating business-critical load balancer configurations from on-premises hardware solutions to Google Cloud while preserving existing traffic manipulation logic.

load-balancing distributed-systems
5 min
AWS

Building hybrid multi-tenant architecture for stateful services on AWS

Building a multi-tenant architecture that isolates tenants without requiring separate AWS accounts while maintaining stateful service deployments.

load-balancing distributed-systems
5 min
Cloudflare

Browser Run: now running on Cloudflare Containers, it’s faster and more scalable

Browser Run needed higher usage limits, better performance, and improved reliability while increasing development velocity for their browser automation service.

distributed-systems load-balancing
3 min
Netflix

State of Routing in Model Serving

Netflix needed to design a domain-independent traffic routing system for their ML model serving infrastructure that could handle personalized experiences at scale across multiple domains while maintaining high availability.

microservices load-balancing
5 min
Cloudflare

The AI engineering stack we built internally — on the platform we ship

Cloudflare needed to build an internal AI engineering stack that could handle massive scale (20 million requests, 241 billion tokens) while dogfooding their own platform products.

api-design ml-systems
4 min
Cloudflare

Agents Week: network performance update

Cloudflare needed to improve request handling performance across its global network to maintain competitive advantage over other CDNs.

distributed-systems load-balancing
4 min
Cloudflare

Building the foundation for running extra-large language models

How to efficiently run inference for extra-large language models on edge infrastructure while maintaining low latency and high throughput across distributed Cloudflare servers.

ml-systems distributed-systems
4 min
Cloudflare

Cloudflare’s AI Platform: an inference layer designed for agents

Developers needed a unified way to access multiple AI model providers without managing separate integrations and API contracts for each one.

api-design microservices
4 min
Cloudflare

Rearchitecting the Workflows control plane for the agentic era

Cloudflare Workflows needed to support higher concurrency and creation rate limits to enable durable background agents at scale.

distributed-systems rate-limiting
4 min
AWS

Unlock efficient model deployment: Simplified Inference Operator setup on Amazon SageMaker HyperPod

Simplifying the deployment and scheduling of machine learning inference workloads across multiple instances and instance types on Amazon SageMaker HyperPod.

ml-systems distributed-systems
4 min
Cloudflare

500 Tbps of capacity: 16 years of scaling our global network

How to scale a global content delivery and DDoS mitigation network to handle massive throughput (500 Tbps) while maintaining capacity to protect against record-breaking attacks.

load-balancing distributed-systems
3 min
Cloudflare

Introducing Programmable Flow Protection: custom DDoS mitigation logic for Magic Transit customers

Magic Transit customers needed the ability to define and enforce custom DDoS mitigation logic for proprietary and non-standard UDP protocols without being limited to Cloudflare's pre-built detection rules.

security distributed-systems
4 min
Cloudflare

Why we're rethinking cache for the AI era

CDN cache systems were designed for human traffic patterns but struggle with the distinct access patterns of AI bot traffic, which now represents over 10 billion requests per week and threatens cache efficiency.

caching distributed-systems
4 min
Meta

Meta Adaptive Ranking Model: Bending the Inference Scaling Curve to Serve LLM-Scale Models for Ads

Meta needed to scale their ads ranking models to LLM-scale complexity and size while maintaining inference latency requirements for real-time ad serving.

ml-systems real-time-systems
5 min
AWS

How Salesforce migrated from Cluster Autoscaler to Karpenter across their fleet of 1,000 EKS clusters

Salesforce's Cluster Autoscaler could not efficiently scale and manage node provisioning across their fleet of 1,000+ EKS clusters, likely suffering from slow scaling decisions, suboptimal bin-packing, and operational complexity at massive scale.

distributed-systems load-balancing
4 min
Cloudflare

Inside Gen 13: how we built our most powerful server yet

Cloudflare's existing server fleet could not keep pace with rapidly growing global traffic demands, requiring a new generation of hardware with significantly higher compute and network throughput.

distributed-systems load-balancing
4 min