AWS

How Synthesia optimizes generative AI video inference on Amazon EC2 G7e instances

Synthesia needed to maximize GPU utilization during video inference on EC2 G7e instances by reducing idle time caused by sequential GPU compute, data transfer, and post-processing operations.

ml-systems real-time-systems
5 min
Google

A Smarter Google AI Edge Gallery: MCP integration, notifications, and session continuity

Enable on-device AI models to coordinate complex tasks across external data sources while maintaining persistent user context and proactive engagement without relying solely on cloud connectivity.

api-design ml-systems
5 min
Google

A2UI v0.9: The New Standard for Portable, Framework-Agnostic Generative UI

AI agents needed a standardized way to generate UI components that work across different platforms and frameworks without being tightly coupled to any specific technology stack.

api-design real-time-systems
5 min
Google

Build Long-running AI agents that pause, resume, and never lose context with ADK

Building production-grade AI agents that can maintain context and state across long-running enterprise workflows spanning days or weeks without losing information during idle periods or server restarts.

api-design distributed-systems
5 min
Google

Building real-world on-device AI with LiteRT and NPU

Mobile developers faced performance and battery inefficiency when running AI models on CPU/GPU, limiting real-time AI applications on edge devices.

api-design ml-systems
5 min
Google

Google Tensor SDK Beta with LiteRT

Developers needed a unified way to build, deploy, and run high-performance machine learning models directly on edge devices (Google Pixel TPU) with reliable fallback mechanisms.

ml-systems api-design
5 min
Google

Supercharging LLM inference on Google TPUs: Achieving 3X speedups with diffusion-style speculative decoding

Autoregressive LLM decoding suffers from a sequential bottleneck: tokens must be generated one at a time, which limits throughput and inference speed on hardware accelerators like TPUs.

ml-systems real-time-systems
5 min
Google Cloud

Building Event-Driven Data Agents with BigQuery, Pub/Sub, and ADK

Enterprise systems need to react to events in real time rather than relying on slow batch jobs or inefficient polling microservices that create dangerous delays in detecting critical issues like fraud or supply chain disruptions.

real-time-systems messaging-queues
5 min
Google Cloud

Gemini Live Agent Challenge: Announcing the winners and highlights

How to enable developers to build multimodal AI agents that process and respond to real-time audio, video, and text, with generation capabilities that go beyond traditional text-based interfaces.

real-time-systems api-design
5 min
Google Cloud

How BASF manages thousands of supply chain decisions with AlphaEvolve’s agentic algorithms

BASF needed to manage and optimize thousands of interdependent supply chain decisions across 180 global production sites where weather and regulatory changes can cause cascading disruptions in a two-year production pipeline.

distributed-systems ml-systems
5 min
Google Cloud

Level Up Your Agents: Announcing Google's Official Skills Repository

AI agents built on Google Cloud need access to accurate, current, and grounded information about Google's products and APIs to function effectively.

api-design ml-systems
5 min
Spotify

Better Experiments with LLM Evals — A funnel, not a fork

Efficiently evaluating and validating LLM-generated outputs at scale during experimentation without manual review bottlenecks.

ml-systems observability
4 min
Meta

Migrating Data Ingestion Systems at Meta Scale

Meta needed to migrate their legacy data ingestion system to a new architecture while maintaining reliability and consistency for real-time social graph snapshots at massive scale.

distributed-systems storage-systems
5 min
Meta

Reel Friends: Building Social Discovery that Scales to Billions

Building a social discovery system that efficiently surfaces Reels watched and reacted to by friends while scaling to billions of users.

caching distributed-systems
5 min
Stripe

Five vertical SaaS insights from Sessions 2026

Vertical SaaS platforms needed to expand their service offerings beyond pure software to include integrated payments, financial services, and agentic commerce capabilities to build more defensible and durable businesses.

api-design distributed-systems
3 min
Netflix

Smarter Live Streaming at Scale: Rolling Out VBR for All Netflix Live Events

Netflix needed to optimize bandwidth utilization and video quality for live streaming events at global scale by moving from constant bitrate to variable bitrate encoding.

real-time-systems distributed-systems
5 min
Netflix

Stop Answering the Same Question Twice: Interval-Aware Caching for Druid at Netflix Scale

Query performance degradation at massive scale (10+ trillion rows, 15M events/second) where repeated identical queries were consuming excessive resources and impacting latency.

caching databases
5 min
Netflix

The Human Infrastructure: How Netflix Built the Operations Layer Behind Live at Scale

Netflix needed to build reliable operations infrastructure to support live streaming at massive scale, going from one show per month to nine shows per day with tens of millions of concurrent viewers.

microservices observability
5 min
Spotify

Inside the Archive: The Tech Behind Your 2025 Wrapped Highlights

How to identify and surface the most interesting and meaningful listening moments from a year's worth of user streaming data to create personalized narrative highlights for Wrapped.

data-pipelines ml-systems
4 min
Spotify

Our Multi-Agent Architecture for Smarter Advertising

Spotify needed to optimize ad targeting and delivery at scale by coordinating multiple specialized systems to make smarter advertising decisions rather than relying on monolithic ad selection logic.

microservices ml-systems
4 min
Stripe

How Stripe Radar helps prevent free trial abuse

Detecting and preventing fraudulent behavior in free trial signups, such as repeated trial abuse and missed cancellations, at scale with high accuracy.

ml-systems api-design
4 min
Stripe

How agents, digital wallets, and trust are rewriting checkout

Understanding and optimizing the checkout conversion funnel across diverse ecommerce businesses to identify what drives successful transactions in modern online payment flows.

api-design real-time-systems
4 min
Stripe

Insights from Shoptalk 2026: How agents are changing retail

How to integrate AI agents into ecommerce platforms to enable seamless product discovery and checkout across embedded and third-party surfaces.

api-design real-time-systems
4 min
Stripe

Three of the biggest fraud trends from MRC Vegas 2026

Detecting and preventing sophisticated fraud attacks while minimizing friction for legitimate users in payment systems.

api-design security
4 min
AWS

Modernizing KYC with AWS serverless solutions and agentic AI for financial services

Traditional rule-based KYC (Know Your Customer) systems lack the autonomous decision-making capability and real-time validation speed needed for modern financial services compliance operations.

serverless real-time-systems
5 min
AWS

Real-time analytics: Oldcastle integrates Infor with Amazon Aurora and Amazon Quick Sight

Oldcastle needed to overcome the limitations of traditional ERP reporting to enable real-time analytics and dashboards for their Infor ERP system.

databases real-time-systems
5 min
Airbnb

Building a fault-tolerant metrics storage system at Airbnb

Building a metrics storage system that ingests 50 million samples per second and reliably stores 2.5 petabytes of time series data.

observability storage-systems
5 min
Cloudflare

Add voice to your agent

Enabling developers to build conversational agents with real-time voice capabilities without requiring complex infrastructure setup.

real-time-systems api-design
4 min
Cloudflare

Browser Run: give your agents a browser

AI agents needed a way to interact with browsers at scale while maintaining visibility and control over automated actions, requiring higher concurrency and real-time debugging capabilities.

real-time-systems ml-systems
3 min
Cloudflare

Introducing Flagship: feature flags built for the age of AI

Third-party feature flag services introduce unacceptable latency for applications requiring sub-millisecond flag evaluation at global scale.

caching distributed-systems
4 min
AWS

Build a multi-tenant configuration system with tagged storage patterns

Building a scalable multi-tenant configuration service that maintains strict tenant isolation while supporting real-time updates without cache staleness or downtime.

caching storage-systems
5 min
Meta

Escaping the Fork: How Meta Modernized WebRTC Across 50+ Use Cases

Meta needed to modernize WebRTC across 50+ use cases while maintaining synchronization with upstream open-source development, avoiding the drift that typically occurs when large projects fork internally.

distributed-systems real-time-systems
5 min
AWS

Automate safety monitoring with computer vision and generative AI

Detecting safety hazards in real time across hundreds of distributed operational sites using video feeds, while maintaining low latency and managing the computational complexity of processing multiple camera streams.

real-time-systems distributed-systems
5 min
AWS

How Aigen transformed agricultural robotics for sustainable farming with Amazon SageMaker AI

Aigen needed to scale machine learning pipelines across hundreds of distributed, solar-powered edge robots while managing data labeling and model training challenges in agricultural robotics.

ml-systems distributed-systems
5 min
Airbnb

What COVID did to our forecasting models (and what we built to handle the next shock)

Building forecasting models that remain accurate during sudden market shocks like a global pandemic, where historical data no longer predicts future outcomes.

ml-systems observability
5 min
Cloudflare

Cloudflare Client-Side Security: smarter detection, now open to everyone

Detecting sophisticated client-side security threats like zero-day exploits in real time across millions of requests while minimizing false positives.

security ml-systems
4 min
Meta

Meta Adaptive Ranking Model: Bending the Inference Scaling Curve to Serve LLM-Scale Models for Ads

Meta needed to scale their ads ranking models to LLM-scale complexity and size while maintaining inference latency requirements for real-time ad serving.

ml-systems real-time-systems
5 min
LinkedIn

Engineering the next generation of LinkedIn’s Feed

LinkedIn's Feed needed to evolve to handle increasing content diversity, real-time ranking signals, and personalization at massive scale.

real-time-systems ml-systems
3 min
AWS

Build priority-based message processing with Amazon MQ and AWS App Runner

Standard message queues process messages in FIFO order, lacking the ability to prioritize urgent messages over lower-priority ones, which can cause critical tasks to wait behind less important work during high load.

messaging-queues real-time-systems
5 min
AWS

Mastering millisecond latency and millions of events: The event-driven architecture behind the Amazon Key Suite

The Amazon Key Suite had a tightly coupled monolithic architecture that struggled with reliability and scalability when processing millions of events under millisecond latency requirements across multiple service integrations.

microservices messaging-queues
5 min
Cloudflare

Always-on detections: eliminating the WAF “log versus block” trade-off

Traditional WAFs force a trade-off between logging (risking missed attacks) and blocking (risking false positives), requiring extensive manual tuning to balance security coverage with availability.

security real-time-systems
4 min
Cloudflare

Ending the "silent drop": how Dynamic Path MTU Discovery makes the Cloudflare One Client more resilient

Tunnel layering in Cloudflare's WARP/One client caused MTU mismatches, leading to silently dropped oversized packets that degraded connectivity and resilience.

distributed-systems real-time-systems
4 min
Cloudflare

Inside Gen 13: how we built our most powerful server yet

Cloudflare's existing server fleet could not keep pace with rapidly growing global traffic demands, requiring a new generation of hardware with significantly higher compute and network throughput.

distributed-systems load-balancing
4 min
Cloudflare

Launching Cloudflare’s Gen 13 servers: trading cache for cores for 2x edge compute performance

Cloudflare needed to significantly increase edge compute throughput per server but faced a tradeoff where high-core-count CPUs came with smaller per-core L3 cache, risking latency penalties for cache-dependent workloads.

distributed-systems caching
4 min
Dropbox

Inside the feature store powering real-time AI in Dropbox Dash

Dropbox Dash needs to rank and retrieve relevant context across a user's work in real time, requiring low-latency access to precomputed and real-time features for AI-driven search and recommendation models.

ml-systems real-time-systems
3 min
Meta

FFmpeg at Meta: Media Processing at Scale

Meta needed to handle massive-scale media processing (encoding, transcoding, filtering) across its family of apps, requiring efficient orchestration of complex audio/video pipelines using FFmpeg at an unprecedented scale.

storage-systems distributed-systems
5 min
Meta

Friend Bubbles: Enhancing Social Discovery on Facebook Reels

Facebook Reels needed a way to enhance social discovery by surfacing content that friends have interacted with, requiring real-time computation of relationship strength and ranking of friend-engaged content at massive scale.

ml-systems real-time-systems
5 min
Meta

How Advanced Browsing Protection Works in Messenger

Messenger needed to protect user privacy when clicking links in chats while still detecting and warning users about malicious URLs, creating a tension between link safety scanning and end-to-end privacy.

security messaging-queues
5 min
Netflix

AV1 — Now Powering 30% of Netflix Streaming

Delivering high-quality streaming video across diverse devices and varying network conditions requires efficient video encoding; legacy codecs like H.264 and VP9 were limiting compression efficiency, consuming more bandwidth for equivalent visual quality.

real-time-systems storage-systems
5 min
Netflix

How Temporal Powers Reliable Cloud Operations at Netflix

Netflix needed reliable orchestration for business-critical cloud operations across teams like Open Connect CDN and Live reliability, but has faced operational challenges as Temporal adoption has grown since 2021.

distributed-systems microservices
5 min
Netflix

Mount Mayhem at Netflix: Scaling Containers on Modern CPUs

Netflix needed to spin up hundreds of containers in seconds to serve streaming traffic, but after modernizing their container runtime, they hit an unexpected performance bottleneck rooted in CPU architecture that impaired container scaling efficiency.

distributed-systems real-time-systems
5 min
Netflix

Netflix Live Origin

Netflix needed a custom origin server to bridge its cloud-based live streaming pipelines with its CDN (Open Connect), handling the unique challenges of live content delivery such as low-latency requirements, reliability, and the real-time nature of live streams compared to on-demand content.

real-time-systems distributed-systems
5 min
Netflix

Optimizing Recommendation Systems with JDK’s Vector API

Netflix's Ranker service had a video serendipity scoring feature (computing how different a title is from a user's watch history) consuming ~7.5% of total CPU per node, creating a significant performance bottleneck at their enormous scale.

ml-systems real-time-systems
5 min