Archives — Distributed Readings

AWS ↗

How Synthesia optimizes generative AI video inference on Amazon EC2 G7e instances

Synthesia needed to maximize GPU utilization during video inference on EC2 G7e instances by reducing idle time caused by sequential GPU compute, data transfer, and post-processing operations.

ml-systems real-time-systems

5 min

Google ↗

A Smarter Google AI Edge Gallery: MCP integration, notifications, and session continuity

Enable on-device AI models to coordinate complex tasks across external data sources while maintaining persistent user context and proactive engagement without relying solely on cloud connectivity.

api-design ml-systems

5 min

Google ↗

A2UI v0.9: The New Standard for Portable, Framework-Agnostic Generative UI

AI agents needed a standardized way to generate UI components that work across different platforms and frameworks without being tightly coupled to any specific technology stack.

api-design real-time-systems

5 min

Google ↗

Build Long-running AI agents that pause, resume, and never lose context with ADK

Building production-grade AI agents that can maintain context and state across long-running enterprise workflows spanning days or weeks without losing information during idle periods or server restarts.

api-design distributed-systems

5 min

Google ↗

Building real-world on-device AI with LiteRT and NPU

Mobile developers faced performance and battery inefficiency when running AI models on CPU/GPU, limiting real-time AI applications on edge devices.

api-design ml-systems

5 min

Google ↗

Google Tensor SDK Beta with LiteRT

Developers needed a unified way to build, deploy, and run high-performance machine learning models directly on edge devices (Google Pixel TPU) with reliable fallback mechanisms.

ml-systems api-design

5 min

Google ↗

Supercharging LLM inference on Google TPUs: Achieving 3X speedups with diffusion-style speculative decoding

Autoregressive LLM decoding suffers from sequential bottlenecks where tokens must be generated one-at-a-time, limiting throughput and inference speed on hardware accelerators like TPUs.

ml-systems real-time-systems

5 min

Google Cloud ↗

Building Event-Driven Data Agents with BigQuery, Pub/Sub, and ADK

Enterprise systems need to react to events in real-time rather than relying on slow batch jobs or inefficient polling microservices that create dangerous delays in detecting critical issues like fraud or supply chain disruptions.

real-time-systems messaging-queues

5 min

Google Cloud ↗

Gemini Live Agent Challenge: Announcing the winners and highlights

How to enable developers to build multimodal AI agents that can process and respond to real-time audio, video, text, and generation capabilities beyond traditional text-based interfaces.

real-time-systems api-design

5 min

Google Cloud ↗

How BASF manages thousands of supply chain decisions with AlphaEvolve’s agentic algorithms

BASF needed to manage and optimize thousands of interdependent supply chain decisions across 180 global production sites where weather and regulatory changes can cause cascading disruptions in a two-year production pipeline.

distributed-systems ml-systems

5 min

Google Cloud ↗

Level Up Your Agents: Announcing Google's Official Skills Repository

AI agents built on Google Cloud need access to accurate, current, and grounded information about Google's products and APIs to function effectively.

api-design ml-systems

5 min

Spotify ↗

Better Experiments with LLM Evals — A funnel, not a fork

Efficiently evaluating and validating LLM-generated outputs at scale during experimentation without manual review bottlenecks.

ml-systems observability

4 min

Meta ↗

Migrating Data Ingestion Systems at Meta Scale

Meta needed to migrate their legacy data ingestion system to a new architecture while maintaining reliability and consistency for real-time social graph snapshots at massive scale.

distributed-systems storage-systems

5 min

Meta ↗

Reel Friends: Building Social Discovery that Scales to Billions

Building a social discovery system that efficiently surfaces Reels watched and reacted to by friends while scaling to billions of users.

caching distributed-systems

5 min

Stripe ↗

Five vertical SaaS insights from Sessions 2026

Vertical SaaS platforms needed to expand their service offerings beyond pure software to include integrated payments, financial services, and agentic commerce capabilities to build more defensible and durable businesses.

api-design distributed-systems

3 min

Netflix ↗

Smarter Live Streaming at Scale: Rolling Out VBR for All Netflix Live Events

Netflix needed to optimize bandwidth utilization and video quality for live streaming events at global scale by moving from constant bitrate to variable bitrate encoding.

real-time-systems distributed-systems

5 min

Netflix ↗

Stop Answering the Same Question Twice: Interval-Aware Caching for Druid at Netflix Scale

Query performance degradation at massive scale (10+ trillion rows, 15M events/second) where repeated identical queries were consuming excessive resources and impacting latency.

caching databases

5 min

Netflix ↗

The Human Infrastructure: How Netflix Built the Operations Layer Behind Live at Scale

Netflix needed to build reliable operations infrastructure to support live streaming at massive scale, going from one show per month to nine shows per day with tens of millions of concurrent viewers.

microservices observability

5 min

Spotify ↗

Inside the Archive: The Tech Behind Your 2025 Wrapped Highlights

How to identify and surface the most interesting and meaningful listening moments from a year's worth of user streaming data to create personalized narrative highlights for Wrapped.

data-pipelines ml-systems

4 min

Spotify ↗

Our Multi-Agent Architecture for Smarter Advertising

Spotify needed to optimize ad targeting and delivery at scale by coordinating multiple specialized systems to make smarter advertising decisions rather than relying on monolithic ad selection logic.

microservices ml-systems

4 min

Stripe ↗

How Stripe Radar helps prevent free trial abuse

Detecting and preventing fraudulent behavior in free trial signups, such as repeated trial abuse and missed cancellations, at scale with high accuracy.

ml-systems api-design

4 min

Stripe ↗

How agents, digital wallets, and trust are rewriting checkout

Understanding and optimizing the checkout conversion funnel across diverse ecommerce businesses to identify what drives successful transactions in modern online payment flows.

api-design real-time-systems

4 min

Stripe ↗

Insights from Shoptalk 2026: How agents are changing retail

How to integrate AI agents into ecommerce platforms to enable seamless product discovery and checkout across embedded and third-party surfaces.

api-design real-time-systems

4 min

Stripe ↗

Three of the biggest fraud trends from MRC Vegas 2026

Detecting and preventing sophisticated fraud attacks while minimizing friction for legitimate users in payment systems.

api-design security

4 min

AWS ↗

Modernizing KYC with AWS serverless solutions and agentic AI for financial services

Traditional rule-based KYC (Know Your Customer) systems lack the autonomous decision-making capability and real-time validation speed needed for modern financial services compliance operations.

serverless real-time-systems

5 min

AWS ↗

Real-time analytics: Oldcastle integrates Infor with Amazon Aurora and Amazon Quick Sight

Oldcastle needed to overcome the limitations of traditional ERP reporting to enable real-time analytics and dashboards for their Infor ERP system.

databases real-time-systems

5 min

Airbnb ↗

Building a fault-tolerant metrics storage system at Airbnb

Building a metrics storage system capable of ingesting 50 million samples per second while reliably storing 2.5 petabytes of time series data at scale.

observability storage-systems

5 min

Cloudflare ↗

Add voice to your agent

Enabling developers to build conversational agents with real-time voice capabilities without requiring complex infrastructure setup.

real-time-systems api-design

4 min

Cloudflare ↗

Browser Run: give your agents a browser

AI agents needed a way to interact with browsers at scale while maintaining visibility and control over automated actions, requiring higher concurrency and real-time debugging capabilities.

real-time-systems ml-systems

3 min

Cloudflare ↗

Introducing Flagship: feature flags built for the age of AI

Third-party feature flag services introduce unacceptable latency for applications requiring sub-millisecond flag evaluation at global scale.

caching distributed-systems

4 min

AWS ↗

Build a multi-tenant configuration system with tagged storage patterns

Building a scalable multi-tenant configuration service that maintains strict tenant isolation while supporting real-time updates without cache staleness or downtime.

caching storage-systems

5 min

Meta ↗

Escaping the Fork: How Meta Modernized WebRTC Across 50+ Use Cases

Meta needed to modernize WebRTC across 50+ use cases while maintaining synchronization with upstream open-source development, avoiding the drift that typically occurs when large projects fork internally.

distributed-systems real-time-systems

5 min

AWS ↗

Automate safety monitoring with computer vision and generative AI

Detecting safety hazards in real-time across hundreds of distributed operational sites using video feeds while maintaining low latency and managing the computational complexity of processing multiple camera streams.

real-time-systems distributed-systems

5 min

AWS ↗

How Aigen transformed agricultural robotics for sustainable farming with Amazon SageMaker AI

Aigen needed to scale machine learning pipelines across hundreds of distributed edge solar robots while managing data labeling and model training challenges in agricultural robotics.

ml-systems distributed-systems

5 min

Airbnb ↗

What COVID did to our forecasting models (and what we built to handle the next shock)

Building forecasting models that remain accurate during sudden market shocks like a global pandemic, where historical data no longer predicts future outcomes.

ml-systems observability

5 min

Cloudflare ↗

Cloudflare Client-Side Security: smarter detection, now open to everyone

Detecting sophisticated client-side security threats like zero-day exploits while minimizing false positives in real-time across millions of requests.

security ml-systems

4 min

Meta ↗

Meta Adaptive Ranking Model: Bending the Inference Scaling Curve to Serve LLM-Scale Models for Ads

Meta needed to scale their ads ranking models to LLM-scale complexity and size while maintaining inference latency requirements for real-time ad serving.

ml-systems real-time-systems

5 min

LinkedIn ↗

Engineering the next generation of LinkedIn’s Feed

LinkedIn's Feed needed to evolve to handle increasing content diversity, real-time ranking signals, and personalization at massive scale.

real-time-systems ml-systems

3 min

AWS ↗

Build priority-based message processing with Amazon MQ and AWS App Runner

Standard message queues process messages in FIFO order, lacking the ability to prioritize urgent messages over lower-priority ones, which can cause critical tasks to wait behind less important work during high load.

messaging-queues real-time-systems

5 min

AWS ↗

Mastering millisecond latency and millions of events: The event-driven architecture behind the Amazon Key Suite

The Amazon Key Suite had a tightly coupled monolithic architecture that struggled with reliability and scalability when processing millions of events at millisecond latency requirements across multiple service integrations.

microservices messaging-queues

5 min

Cloudflare ↗

Always-on detections: eliminating the WAF “log versus block” trade-off

Traditional WAFs force a trade-off between logging (risking missed attacks) and blocking (risking false positives), requiring extensive manual tuning to balance security coverage with availability.

security real-time-systems

4 min

Cloudflare ↗

Ending the "silent drop": how Dynamic Path MTU Discovery makes the Cloudflare One Client more resilient

Tunnel layering in Cloudflare's WARP/One client caused MTU mismatches, leading to silently dropped oversized packets that degraded connectivity and resilience.

distributed-systems real-time-systems

4 min

Cloudflare ↗

Inside Gen 13: how we built our most powerful server yet

Cloudflare's existing server fleet could not keep pace with rapidly growing global traffic demands, requiring a new generation of hardware with significantly higher compute and network throughput.

distributed-systems load-balancing

4 min

Cloudflare ↗

Launching Cloudflare’s Gen 13 servers: trading cache for cores for 2x edge compute performance

Cloudflare needed to significantly increase edge compute throughput per server but faced a tradeoff where high-core-count CPUs came with smaller per-core L3 cache, risking latency penalties for cache-dependent workloads.

distributed-systems caching

4 min

Dropbox ↗

Inside the feature store powering real-time AI in Dropbox Dash

Dropbox Dash needs to rank and retrieve relevant context across a user's work in real time, requiring low-latency access to precomputed and real-time features for AI-driven search and recommendation models.

ml-systems real-time-systems

3 min

Meta ↗

FFmpeg at Meta: Media Processing at Scale

Meta needed to handle massive-scale media processing (encoding, transcoding, filtering) across its family of apps, requiring efficient orchestration of complex audio/video pipelines using FFmpeg at an unprecedented scale.

storage-systems distributed-systems

5 min

Meta ↗

Friend Bubbles: Enhancing Social Discovery on Facebook Reels

Facebook Reels needed a way to enhance social discovery by surfacing content that friends have interacted with, requiring real-time computation of relationship strength and ranking of friend-engaged content at massive scale.

ml-systems real-time-systems

5 min

Meta ↗

How Advanced Browsing Protection Works in Messenger

Messenger needed to protect user privacy when clicking links in chats while still detecting and warning users about malicious URLs, creating a tension between link safety scanning and end-to-end privacy.

security messaging-queues

5 min

Netflix ↗

AV1 — Now Powering 30% of Netflix Streaming

Delivering high-quality streaming video across diverse devices and varying network conditions requires efficient video encoding; legacy codecs like H.264 and VP9 were limiting compression efficiency, consuming more bandwidth for equivalent visual quality.

real-time-systems storage-systems

5 min

Netflix ↗

How Temporal Powers Reliable Cloud Operations at Netflix

Netflix needed reliable orchestration for business-critical cloud operations across teams like Open Connect CDN and Live reliability, but faced operational challenges as Temporal adoption grew since 2021.

distributed-systems microservices

5 min

Netflix ↗

Mount Mayhem at Netflix: Scaling Containers on Modern CPUs

Netflix needed to spin up hundreds of containers in seconds to serve streaming traffic, but after modernizing their container runtime, they hit an unexpected performance bottleneck rooted in CPU architecture that impaired container scaling efficiency.

distributed-systems real-time-systems

5 min

Netflix ↗

Netflix Live Origin

Netflix needed a custom origin server to bridge its cloud-based live streaming pipelines with its CDN (Open Connect), handling the unique challenges of live content delivery such as low-latency requirements, reliability, and the real-time nature of live streams compared to on-demand content.

real-time-systems distributed-systems

5 min

Netflix ↗

Optimizing Recommendation Systems with JDK’s Vector API

Netflix's Ranker service had a video serendipity scoring feature (computing how different a title is from a user's watch history) consuming ~7.5% of total CPU per node, creating a significant performance bottleneck at their enormous scale.

ml-systems real-time-systems

5 min