AWS

Architecting for agentic AI development on AWS

AI agents struggle to iterate rapidly on system designs and codebases when architectural patterns limit their ability to understand, modify, and validate applications.

microservices serverless
5 min
Airbnb

What COVID did to our forecasting models (and what we built to handle the next shock)

Building forecasting models that remain accurate during sudden market shocks like a global pandemic, where historical data no longer predicts future outcomes.

ml-systems observability
5 min
Cloudflare

Sandboxing AI agents, 100x faster

How to safely execute untrusted AI-generated code with minimal latency and resource overhead.

security edge-computing
4 min
AWS

AI-powered event response for Amazon EKS

Responding to operational events in Amazon EKS clusters is often manual, slow, and requires deep expertise, making it difficult to handle incidents at scale across complex Kubernetes environments.

observability ml-systems
3 min
Cloudflare

Powering the agents: Workers AI now runs large models, starting with Kimi K2.5

Running large AI models for agent workloads on edge infrastructure was cost-prohibitive and required significant inference stack optimization to serve models like Kimi K2.5 efficiently at scale.

ml-systems distributed-systems
4 min
Dropbox

How we optimized Dash's relevance judge with DSPy

Manual prompt engineering for Dropbox Dash's relevance judge was unreliable, hard to measure, and costly, making it difficult to systematically improve task performance in production.

ml-systems search
3 min
Meta

Friend Bubbles: Enhancing Social Discovery on Facebook Reels

Facebook Reels needed a way to enhance social discovery by surfacing content that friends have interacted with, requiring real-time computation of relationship strength and ranking of friend-engaged content at massive scale.

ml-systems real-time-systems
5 min
Meta

Ranking Engineer Agent (REA): The Autonomous AI Agent Accelerating Meta’s Ads Ranking Innovation

Meta's ads ranking ML experimentation lifecycle required extensive manual intervention from engineers for hypothesis generation, training job launches, failure debugging, and result iteration, slowing down the pace of ranking model innovation.

ml-systems microservices
5 min
Airbnb

Recommending Travel Destinations to Help Users Explore

Airbnb users in the early trip planning stage often lack a clear travel destination, making it difficult to provide relevant recommendations and convert exploratory browsing into bookings.

ml-systems search
5 min
Cloudflare

AI Security for Apps is now generally available

Organizations struggle to discover and secure AI-powered applications across their infrastructure, especially shadow AI deployments that teams spin up without central oversight, creating security blind spots.

security api-design
4 min
Cloudflare

Slashing agent token costs by 98% with RFC 9457-compliant error responses

AI agents hitting Cloudflare error pages received heavyweight HTML responses that consumed excessive tokens and required brittle parsing, making automated error handling inefficient and costly.

api-design ml-systems
4 min
Meta

Patch Me If You Can: AI Codemods for Secure-by-Default Android Apps

Updating security-related APIs across millions of lines of code written by thousands of engineers is extremely difficult, especially when a single class of mobile vulnerability can be replicated across hundreds of locations in an Android codebase.

security ml-systems
5 min
Netflix

Optimizing Recommendation Systems with JDK’s Vector API

In Netflix's Ranker service, a video serendipity scoring feature (which computes how different a title is from a user's watch history) consumed ~7.5% of total CPU per node, a significant performance bottleneck at Netflix's scale.

ml-systems real-time-systems
5 min
Airbnb

Academic Publications & Airbnb Tech: 2025 Year in Review

Airbnb needed to advance its AI, data science, and machine learning capabilities across multiple domains (NLP, optimization, measurement science) to improve its travel and living platform, which required solving challenges in search ranking, recommendation, experimentation, and large-scale data processing.

ml-systems search
5 min
Dropbox

Using LLMs to amplify human labeling and improve Dash search relevance

Dash's search ranking models required large volumes of high-quality labeled relevance data to train effectively, but human labeling alone was too slow and expensive to scale to the needed coverage.

search ml-systems
3 min
Meta

RCCLX: Innovating GPU Communications on AMD Platforms

GPU-to-GPU communication performance on AMD platforms was insufficient for Meta's evolving AI model training workloads, and the standard RCCL library didn't meet the performance and flexibility requirements of their internal workloads.

distributed-systems ml-systems
5 min
Netflix

MediaFM: The Multimodal AI Foundation for Media Understanding at Netflix

Netflix needed scalable, deep machine-level understanding of every piece of content across an expanding catalog (including live events and podcasts) to power recommendations and discovery, but building separate models per content type and modality doesn't scale.

ml-systems microservices
5 min
Dropbox

How low-bit inference enables efficient AI

Running AI inference for products like Dropbox Dash at scale is expensive and resource-intensive, requiring efficient use of compute and memory to make the product accessible to a broad user base.

ml-systems storage-systems
3 min
Dropbox

Insights from our executive roundtable on AI and engineering productivity

Engineering organizations face open questions about how to effectively integrate AI coding tools (like Claude Code and Cursor) into developer workflows and where these tools can have the most measurable impact on productivity.

ml-systems microservices
4 min
Meta

Building Prometheus: How Backend Aggregation Enables Gigawatt-Scale AI Clusters

Connecting thousands of GPUs across multiple data centers and regions for gigawatt-scale AI training clusters requires seamlessly bridging different network fabrics, which creates massive networking and interconnect challenges.

distributed-systems ml-systems
5 min
Meta

The Death of Traditional Testing: Agentic Development Broke a 50-Year-Old Field, JiTTesting Can Revive It

Agentic (AI-driven) software development produces and ships code so fast that traditional testing frameworks cannot keep pace, leaving bugs uncaught as they land in rapidly evolving codebases.

ml-systems observability
5 min
Netflix

Scaling LLM Post-Training at Netflix

Generic pre-trained LLMs lack the domain-specific alignment needed for Netflix's production use cases in recommendation, personalization, and search, and the post-training pipeline to fine-tune them doesn't scale efficiently across multiple domain constraints and reliability requirements.

ml-systems distributed-systems
5 min
AWS

How Artera enhances prostate cancer diagnostics using AWS

Artera needed to develop and scale an AI-powered prostate cancer diagnostic test, requiring significant compute resources for model training/inference and a reliable pipeline to deliver timely, personalized treatment recommendations.

ml-systems storage-systems
4 min
Airbnb

My Journey to Airbnb: Peter Coles

Airbnb needed to build robust data science and economic modeling capabilities to understand and optimize their two-sided marketplace dynamics for policy and business decisions.

ml-systems
5 min
Dropbox

Engineering VP Josh Clemm on how we use knowledge graphs, MCP, and DSPy in Dash

Enterprise search and AI assistant products like Dropbox Dash need to connect disparate data sources and optimize AI-driven retrieval, but naively querying across siloed data with LLMs leads to poor relevance and brittle prompt engineering.

search ml-systems
3 min
Netflix

The AI Evolution of Graph Search at Netflix

Netflix's Graph Search platform for federated enterprise data required users to write structured queries, limiting accessibility and ease of use despite the system being scalable and configurable.

search ml-systems
5 min
Dropbox

Inside the feature store powering real-time AI in Dropbox Dash

Dropbox Dash needs to rank and retrieve relevant context across a user's work in real time, requiring low-latency access to precomputed and real-time features for AI-driven search and recommendation models.

ml-systems real-time-systems
3 min
AWS

Architecting conversational observability for cloud applications

Diagnosing and resolving issues in complex Kubernetes clusters is slow and requires expert knowledge, leading to high Mean Time to Recovery (MTTR) and heavy reliance on specialized engineers for root cause analysis.

observability ml-systems
4 min
AWS

Announcing the updated AWS Well-Architected Generative AI Lens

Organizations building generative AI workloads on AWS lacked comprehensive architectural guidance covering responsible AI, data architecture, and emerging patterns like agentic workflows, leading to poorly architected AI systems.

ml-systems api-design
4 min
AWS

Announcing the updated AWS Well-Architected Machine Learning Lens

Organizations building ML workloads on AWS lacked up-to-date architectural guidance that incorporates the latest services, capabilities, and best practices, leading to sub-optimal ML system designs across reliability, performance, cost, and operational dimensions.

ml-systems
3 min
AWS

Architecting for AI excellence: AWS launches three Well-Architected Lenses at re:Invent 2025

Organizations deploying AI/ML workloads on AWS lacked comprehensive architectural guidance for building responsible, well-architected machine learning and generative AI systems at scale.

ml-systems
5 min
AWS

Building an AI gateway to Amazon Bedrock with Amazon API Gateway

Enterprises adopting Amazon Bedrock need centralized governance over AI model access, including authorization controls, usage quotas, and auditing, but lack a standardized gateway pattern to enforce these policies at scale.

api-design rate-limiting
4 min
Dropbox

How Dash uses context engineering for smarter AI

Dropbox Dash's AI agent became less effective when it was naively given all available context, because irrelevant information diluted the signal needed for accurate agentic responses.

ml-systems search
3 min
Airbnb

GraphQL Data Mocking at Scale with LLMs and @generateMock

Producing valid and realistic mock data for GraphQL testing and prototyping is tedious to write and maintain; existing approaches like random value generation and field-level stubbing lack domain context, resulting in unconvincing and brittle test data that doesn't scale across a large schema.

api-design ml-systems
5 min
Dropbox

Half-Quadratic Quantization of large machine learning models

Large machine learning models require significant memory and compute resources, making deployment and inference expensive and slow, especially in resource-constrained environments.

ml-systems storage-systems
3 min
Dropbox

With Mobius Labs' Aana models, we're bringing deeper multimodal understanding to Dropbox Dash

Dropbox Dash needed deeper understanding of multimodal content (photos and videos) across user files, but processing diverse media types at Dropbox's scale posed efficiency and architectural challenges.

ml-systems search
3 min