AWS

How ALS GeoAnalytics LITHOLENS™ revolutionizes core logging through machine learning with Amazon EKS

ALS GeoAnalytics needed to scale machine learning model training and inference for core logging analysis while managing computational costs effectively.

distributed-systems ml-systems
3 min
AWS

How Synthesia optimizes generative AI video inference on Amazon EC2 G7e instances

Synthesia needed to maximize GPU utilization during video inference on EC2 G7e instances by reducing idle time caused by sequential GPU compute, data transfer, and post-processing operations.

ml-systems real-time-systems
5 min
Cloudflare

Project Glasswing: what Mythos showed us

Determining whether security-focused LLMs can effectively identify vulnerabilities in live production infrastructure code at scale.

security ml-systems
4 min
Dropbox

Introducing Nova, our internal platform for coding agents

Enabling engineers to run multiple concurrent coding sessions and integrating AI agents into automated internal workflows at scale.

microservices api-design
3 min
Google

A Smarter Google AI Edge Gallery: MCP integration, notifications, and session continuity

Enabling on-device AI models to coordinate complex tasks across external data sources while maintaining persistent user context and proactive engagement without relying solely on cloud connectivity.

api-design ml-systems
5 min
Google

A2UI v0.9: The New Standard for Portable, Framework-Agnostic Generative UI

AI agents needed a standardized way to generate UI components that work across different platforms and frameworks without being tightly coupled to any specific technology stack.

api-design real-time-systems
5 min
Google

Accelerating on-device AI: A look at Arm and Google AI Edge optimization

Enabling efficient execution of generative AI models on edge devices with limited computational resources while maintaining acceptable latency and performance.

ml-systems api-design
5 min
Google

Announcing ADK for Kotlin and ADK for Android 0.1.0: Building AI Agents on Android and Beyond

Developers needed a way to build AI agent workflows that could run on Android devices and backend systems without reinventing the core agentic logic across different platforms.

api-design sdks
3 min
Google

Announcing Genkit Middleware: Intercept, extend, and harden your agentic apps

Developers need a way to reliably control, monitor, and extend AI model generation calls in production agentic applications without modifying core business logic.

api-design ml-systems
5 min
Google

Blazing fast on-device GenAI with LiteRT-LM

Running large language models efficiently on mobile and edge devices while preserving multimodal and agentic capabilities without requiring server-side inference.

ml-systems mobile-platforms
5 min
Google

Build Long-running AI agents that pause, resume, and never lose context with ADK

Building production-grade AI agents that can maintain context and state across long-running enterprise workflows spanning days or weeks without losing information during idle periods or server restarts.

api-design distributed-systems
5 min
Google

Building real-world on-device AI with LiteRT and NPU

Mobile developers faced poor performance and battery inefficiency when running AI models on CPU/GPU, limiting real-time AI applications on edge devices.

api-design ml-systems
5 min
Google

Building with Gemini Embedding 2: Agentic multimodal RAG and beyond

Developers needed a unified embedding model capable of processing interleaved multimodal inputs (text, images, video, audio, documents) in a single semantic space for tasks like retrieval-augmented generation and visual search.

api-design ml-systems
5 min
Google

Empowering Service Providers and Hardware Partners with Gemini for Home

How can Google enable third-party service providers and hardware manufacturers to build intelligent smart home experiences without requiring deep AI/ML expertise or significant R&D investment?

api-design ml-systems
5 min
Google

Google Tensor SDK Beta with LiteRT

Developers needed a unified way to build, deploy, and run high-performance machine learning models directly on edge devices (Google Pixel TPU) with reliable fallback mechanisms.

ml-systems api-design
5 min
Google

MaxText Expands Post-Training Capabilities: Introducing SFT and RL on Single-Host TPUs

Enabling efficient post-training of large language models on single-host TPU configurations without requiring complex multi-host distributed setups.

ml-systems distributed-systems
5 min
Google

One Year of Innovation: Celebrating 100k Members in the Google Cloud x NVIDIA Developer Community

Developers needed accessible infrastructure, resources, and structured learning pathways to effectively build and optimize AI applications using GPUs and large language models at scale.

api-design ml-systems
5 min
Google

Production-Ready AI Agents: 5 Lessons from Refactoring a Monolith

Converting a brittle, monolithic sales research AI prototype, one plagued by silent failures, fragile parsing, and a lack of observability, into a production-ready agent.

microservices observability
5 min
Google

Speeding Up AI: Bringing Google Colossus to PyTorch via GCSFS and Rapid Bucket

AI training pipelines were bottlenecked by slow data I/O when accessing training datasets stored in Google Cloud, limiting throughput and increasing total training time.

storage-systems ml-systems
5 min
Google

Supercharging LLM inference on Google TPUs: Achieving 3X speedups with diffusion-style speculative decoding

Autoregressive LLM decoding suffers from a sequential bottleneck: tokens must be generated one at a time, which limits throughput and inference speed on hardware accelerators like TPUs.

ml-systems real-time-systems
5 min
Google Cloud

Agent Factory Recap: How Gemma 4 Taught Itself Physics

How to deploy high-intelligence AI models with agentic capabilities to consumer hardware and mobile devices without requiring cloud infrastructure.

ml-systems distributed-systems
5 min
Google Cloud

Building Event-Driven Data Agents with BigQuery, Pub/Sub, and ADK

Enterprise systems need to react to events in real time rather than relying on slow batch jobs or inefficient polling microservices that create dangerous delays in detecting critical issues like fraud or supply chain disruptions.

real-time-systems messaging-queues
5 min
Google Cloud

Cloud Engineer’s AI Toolkit: Sign up Now for a Developer Workshop Near You!

Organizations need to securely build, deploy, and govern autonomous AI agents at enterprise scale as the industry transitions from experimental LLMs to production agentic AI systems.

ml-systems security
5 min
Google Cloud

Create Expert Content: Deploying a Multi-Agent System with Terraform and Cloud Run

Automating the transformation of raw community signals into reliable technical guidance at scale using multiple specialized agents.

microservices api-design
5 min
Google Cloud

Five must-have guides to move agents into production with Gemini Enterprise Agent Platform

Deploying and managing AI agents at scale in production requires infrastructure for state management, security governance, and complex workflow orchestration that goes beyond demo implementations.

distributed-systems security
5 min
Google Cloud

Gemini Live Agent Challenge: Announcing the winners and highlights

How to enable developers to build multimodal AI agents that process and respond to real-time audio, video, and text, moving beyond traditional text-based interfaces.

real-time-systems api-design
5 min
Google Cloud

How BASF manages thousands of supply chain decisions with AlphaEvolve’s agentic algorithms

BASF needed to manage and optimize thousands of interdependent supply chain decisions across 180 global production sites where weather and regulatory changes can cause cascading disruptions in a two-year production pipeline.

distributed-systems ml-systems
5 min
Google Cloud

Introducing Gemini Enterprise Agent Platform, powering the next wave of agents

Building safe, reliable, and autonomous agents that can act independently across multiple enterprise systems while maintaining security, governance, and reliability guardrails.

ml-systems security
5 min
Google Cloud

Level Up Your Agents: Announcing Google's Official Skills Repository

AI agents built on Google Cloud need access to accurate, current, and grounded information about Google's products and APIs to function effectively.

api-design ml-systems
5 min
Google Cloud

Pioneering AI-assisted code migration: How Google achieved 6x faster migration from TensorFlow to JAX

Google needed to accelerate large-scale codebase migrations (TensorFlow to JAX) that are too complex and interconnected for manual developer effort or standard AI coding tools to handle efficiently.

ml-systems general
5 min
Google Cloud

What Google I/O '26 means for developing agents on Google Cloud

Developers needed a unified, secure way to build AI agents locally and deploy them to Google Cloud with standardized protocols and tooling.

api-design microservices
5 min
Spotify

Better Experiments with LLM Evals — A funnel, not a fork

Efficiently evaluating and validating LLM-generated outputs at scale during experimentation without manual review bottlenecks.

ml-systems observability
4 min
Netflix

Democratizing Machine Learning at Netflix: Building the Model Lifecycle Graph

Netflix needed to manage the lifecycle of machine learning models across multiple domains and teams at scale, moving beyond their original single-domain personalization focus.

ml-systems microservices
5 min
Netflix

Evaluating Netflix Show Synopses with LLM-as-a-Judge

Netflix needed to automatically evaluate the quality and relevance of show synopses at scale to improve member discovery and engagement.

ml-systems api-design
5 min
Netflix

Powering Multimodal Intelligence for Video Search

Netflix needed to efficiently extract and surface key moments from hundreds or thousands of hours of raw video footage for editorial teams to accelerate the creative content production process.

ml-systems search
5 min
Netflix

State of Routing in Model Serving

Netflix needed to design a domain-independent traffic routing system for their ML model serving infrastructure that could handle personalized experiences at scale across multiple domains while maintaining high availability.

microservices load-balancing
5 min
Spotify

Inside the Archive: The Tech Behind Your 2025 Wrapped Highlights

How to identify and surface the most interesting and meaningful listening moments from a year's worth of user streaming data to create personalized narrative highlights for Wrapped.

data-pipelines ml-systems
4 min
Spotify

Our Multi-Agent Architecture for Smarter Advertising

Spotify needed to optimize ad targeting and delivery at scale by coordinating multiple specialized systems to make smarter advertising decisions rather than relying on monolithic ad selection logic.

microservices ml-systems
4 min
Stripe

Analyzing first-party fraud trends: Account, free trial, and refund abuse

Detecting and preventing first-party fraud at scale across a payment network where legitimate users abuse policies through multiple accounts, free trial cycling, and refund exploitation.

ml-systems security
4 min
Stripe

How Stripe Radar helps prevent free trial abuse

Detecting and preventing fraudulent behavior in free trial signups, such as repeated trial abuse and missed cancellations, at scale with high accuracy.

ml-systems api-design
4 min
Stripe

Three of the biggest fraud trends from MRC Vegas 2026

Detecting and preventing sophisticated fraud attacks while minimizing friction for legitimate users in payment systems.

api-design security
4 min
AWS

Modernizing KYC with AWS serverless solutions and agentic AI for financial services

Traditional rule-based KYC (Know Your Customer) systems lack the autonomous decision-making capability and real-time validation speed needed for modern financial services compliance operations.

serverless real-time-systems
5 min
Cloudflare

Orchestrating AI Code Review at scale

Cloudflare needed to scale code review processes across their engineering organization while maintaining code quality and security standards without overwhelming human reviewers.

ml-systems api-design
3 min
Cloudflare

The AI engineering stack we built internally — on the platform we ship

Cloudflare needed to build an internal AI engineering stack that could handle massive scale (20 million requests, 241 billion tokens) while dogfooding their own platform products.

api-design ml-systems
4 min
Meta

Modernizing the Facebook Groups Search to Unlock the Power of Community Knowledge

Facebook Groups Search was unreliable at helping users discover and validate community content most relevant to their search queries.

search ml-systems
5 min
Cloudflare

AI Search: the search primitive for your agents

Providing a scalable, efficient search infrastructure that allows AI agents to dynamically create search instances and perform semantic queries across uploaded documents without managing underlying indexing complexity.

search ml-systems
4 min
Cloudflare

Agents that remember: introducing Agent Memory

AI agents lack persistent memory mechanisms to retain context, learn from interactions, and improve decision-making over time.

storage-systems ml-systems
3 min
Cloudflare

Browser Run: give your agents a browser

AI agents needed a way to interact with browsers at scale while maintaining visibility and control over automated actions, requiring higher concurrency and real-time debugging capabilities.

real-time-systems ml-systems
3 min
Cloudflare

Building the foundation for running extra-large language models

How to efficiently run inference for extra-large language models on edge infrastructure while maintaining low latency and high throughput across distributed Cloudflare servers.

ml-systems distributed-systems
4 min
Cloudflare

Cloudflare’s AI Platform: an inference layer designed for agents

Developers needed a unified way to access multiple AI model providers without managing separate integrations and API contracts for each one.

api-design microservices
4 min
Cloudflare

Project Think: building the next generation of AI agents on Cloudflare

Building a scalable platform for deploying AI agents at the edge that can think, act, and persist state across distributed Cloudflare infrastructure.

distributed-systems ml-systems
3 min
Cloudflare

Unweight: how we compressed an LLM 22% without sacrificing quality

GPU memory bandwidth constraints were limiting LLM inference efficiency across Cloudflare's distributed edge network, requiring optimization to deliver faster and cheaper inference.

ml-systems distributed-systems
4 min
Meta

Capacity Efficiency at Meta: How Unified AI Agents Optimize Performance at Hyperscale

Meta needed to automatically identify and remediate performance inefficiencies across their massive infrastructure to reduce power consumption and free up engineering capacity.

observability distributed-systems
5 min
AWS

Unlock efficient model deployment: Simplified Inference Operator setup on Amazon SageMaker HyperPod

Simplifying the deployment and scheduling of machine learning inference workloads across multiple instances and instance types on Amazon SageMaker HyperPod.

ml-systems distributed-systems
4 min
Meta

How Meta Used AI to Map Tribal Knowledge in Large-Scale Data Pipelines

AI coding assistants were ineffective at making useful edits in large-scale data pipelines because they lacked sufficient understanding of complex, multi-repository codebases spanning multiple languages and thousands of files.

distributed-systems ml-systems
5 min
AWS

Architecting for agentic AI development on AWS

AI agents struggle to iterate rapidly on system design and codebases due to architectural patterns that limit their ability to understand, modify, and validate applications effectively.

microservices serverless
5 min
AWS

Automate safety monitoring with computer vision and generative AI

Detecting safety hazards in real time across hundreds of distributed operational sites using video feeds while maintaining low latency and managing the computational complexity of processing multiple camera streams.

real-time-systems distributed-systems
5 min
AWS

How Aigen transformed agricultural robotics for sustainable farming with Amazon SageMaker AI

Aigen needed to scale machine learning pipelines across hundreds of distributed edge solar robots while managing data labeling and model training challenges in agricultural robotics.

ml-systems distributed-systems
5 min
Airbnb

What COVID did to our forecasting models (and what we built to handle the next shock)

Building forecasting models that remain accurate during sudden market shocks like a global pandemic, where historical data no longer predicts future outcomes.

ml-systems observability
5 min
Cloudflare

Cloudflare Client-Side Security: smarter detection, now open to everyone

Detecting sophisticated client-side security threats like zero-day exploits in real time across millions of requests while minimizing false positives.

security ml-systems
4 min
Cloudflare

Sandboxing AI agents, 100x faster

How to safely execute untrusted AI-generated code with minimal latency and resource overhead.

security edge-computing
4 min
Meta

AI for American-Produced Cement and Concrete

Designing high-quality, sustainable concrete mixes that are produced in the United States while optimizing for performance characteristics.

ml-systems general
5 min
Meta

KernelEvolve: How Meta’s Ranking Engineer Agent Optimizes AI Infrastructure

Meta needed to automatically optimize low-level infrastructure and kernel-level parameters for AI ranking models to improve performance without manual tuning.

ml-systems distributed-systems
5 min
Meta

Meta Adaptive Ranking Model: Bending the Inference Scaling Curve to Serve LLM-Scale Models for Ads

Meta needed to scale their ads ranking models to LLM-scale complexity and size while maintaining inference latency requirements for real-time ad serving.

ml-systems real-time-systems
5 min
LinkedIn

AI Helping Build Better AI: How Agents Accelerate Model Experi...

Training and evaluating AI models is resource-intensive, requiring significant human effort to generate quality training data and assess model outputs.

ml-systems distributed-systems
3 min
LinkedIn

Announcing Our LinkedIn-Cornell 2024 Grant Recipients

Advancing AI research requires collaboration between industry and academia, but funding and partnership models need structured programs.

ml-systems general
3 min
LinkedIn

Career stories: The math-music connection in data science

Data science teams need diverse skill sets that blend mathematical rigor with creative problem-solving to build effective ML systems.

ml-systems general
3 min
LinkedIn

Engineering the next generation of LinkedIn’s Feed

LinkedIn's Feed needed to evolve to handle increasing content diversity, real-time ranking signals, and personalization at massive scale.

real-time-systems ml-systems
3 min
LinkedIn

Scaling LLM-Based ranking systems with SGLang at LinkedIn

LinkedIn's LLM-based ranking systems faced latency and throughput challenges when serving personalized results at scale.

ml-systems distributed-systems
3 min
LinkedIn

The LinkedIn Generative AI Application Tech Stack: Personaliza...

Building personalized generative AI features at LinkedIn's scale required a robust and reliable application infrastructure that could serve millions of users.

ml-systems microservices
3 min
AWS

AI-powered event response for Amazon EKS

Responding to operational events in Amazon EKS clusters is often manual, slow, and requires deep expertise, making it difficult to handle incidents at scale across complex Kubernetes environments.

observability ml-systems
3 min
AWS

Announcing the updated AWS Well-Architected Generative AI Lens

Organizations building generative AI workloads on AWS lacked comprehensive architectural guidance covering responsible AI, data architecture, and emerging patterns like agentic workflows, leading to poorly architected AI systems.

ml-systems api-design
4 min
AWS

Announcing the updated AWS Well-Architected Machine Learning Lens

Organizations building ML workloads on AWS lacked up-to-date architectural guidance that incorporates the latest services, capabilities, and best practices, leading to sub-optimal ML system designs across reliability, performance, cost, and operational dimensions.

ml-systems
3 min
AWS

Architecting conversational observability for cloud applications

Diagnosing and resolving issues in complex Kubernetes clusters is slow and requires expert knowledge, leading to high Mean Time to Recovery (MTTR) and heavy reliance on specialized engineers for root cause analysis.

observability ml-systems
4 min
AWS

Architecting for AI excellence: AWS launches three Well-Architected Lenses at re:Invent 2025

Organizations deploying AI/ML workloads on AWS lacked comprehensive architectural guidance for building responsible, well-architected machine learning and generative AI systems at scale.

ml-systems
5 min
AWS

Building an AI gateway to Amazon Bedrock with Amazon API Gateway

Enterprises adopting Amazon Bedrock need centralized governance over AI model access, including authorization controls, usage quotas, and auditing, but lack a standardized gateway pattern to enforce these policies at scale.

api-design rate-limiting
4 min
AWS

How Artera enhances prostate cancer diagnostics using AWS

Artera needed to develop and scale an AI-powered prostate cancer diagnostic test, requiring significant compute resources for model training/inference and a reliable pipeline to deliver timely, personalized treatment recommendations.

ml-systems storage-systems
4 min
Airbnb

Academic Publications & Airbnb Tech: 2025 Year in Review

Airbnb needed to advance its AI, data science, and machine learning capabilities across multiple domains (NLP, optimization, measurement science) to improve its travel and living platform, requiring solutions to challenges in search ranking, recommendation, experimentation, and large-scale data processing.

ml-systems search
5 min
Airbnb

GraphQL Data Mocking at Scale with LLMs and @generateMock

Producing valid and realistic mock data for GraphQL testing and prototyping is tedious to write and maintain; existing approaches like random value generation and field-level stubbing lack domain context, resulting in unconvincing and brittle test data that doesn't scale across a large schema.

api-design ml-systems
5 min
Airbnb

My Journey to Airbnb: Peter Coles

Airbnb needed to build robust data science and economic modeling capabilities to understand and optimize their two-sided marketplace dynamics for policy and business decisions.

ml-systems
5 min
Airbnb

Recommending Travel Destinations to Help Users Explore

Airbnb users in the early trip planning stage often lack a clear travel destination, making it difficult to provide relevant recommendations and convert exploratory browsing into bookings.

ml-systems search
5 min
Cloudflare

AI Security for Apps is now generally available

Organizations struggle to discover and secure AI-powered applications across their infrastructure, especially shadow AI deployments that teams spin up without central oversight, creating security blind spots.

security api-design
4 min
Cloudflare

Powering the agents: Workers AI now runs large models, starting with Kimi K2.5

Running large AI models for agent workloads on edge infrastructure was cost-prohibitive and required significant inference stack optimization to serve models like Kimi K2.5 efficiently at scale.

ml-systems distributed-systems
4 min
Cloudflare

Slashing agent token costs by 98% with RFC 9457-compliant error responses

AI agents hitting Cloudflare error pages received heavyweight HTML responses that consumed excessive tokens and required brittle parsing, making automated error handling inefficient and costly.

api-design ml-systems
4 min
Dropbox

Engineering VP Josh Clemm on how we use knowledge graphs, MCP, and DSPy in Dash

Enterprise search and AI assistant products like Dropbox Dash need to connect disparate data sources and optimize AI-driven retrieval, but naively querying across siloed data with LLMs leads to poor relevance and brittle prompt engineering.

search ml-systems
3 min
Dropbox

Half-Quadratic Quantization of large machine learning models

Large machine learning models require significant memory and compute resources, making deployment and inference expensive and slow, especially in resource-constrained environments.

ml-systems storage-systems
3 min
Dropbox

How Dash uses context engineering for smarter AI

Dropbox Dash's AI agent became less effective when it was naively given all available context, because irrelevant information diluted the signal the model needed to respond accurately.

ml-systems search
3 min
Dropbox

How low-bit inference enables efficient AI

Running AI inference for products like Dropbox Dash at scale is expensive and resource-intensive, requiring efficient use of compute and memory to make the product accessible to a broad user base.

ml-systems storage-systems
3 min
Dropbox

How we optimized Dash's relevance judge with DSPy

Manual prompt engineering for Dropbox Dash's relevance judge was unreliable, hard to measure, and costly, making it difficult to systematically improve task performance in production.

ml-systems search
3 min
Dropbox

Inside the feature store powering real-time AI in Dropbox Dash

Dropbox Dash needs to rank and retrieve relevant context across a user's work in real time, requiring low-latency access to precomputed and real-time features for AI-driven search and recommendation models.

ml-systems real-time-systems
3 min
Dropbox

Insights from our executive roundtable on AI and engineering productivity

Engineering organizations face open questions about how to effectively integrate AI coding tools (like Claude Code and Cursor) into developer workflows and where these tools can have the most measurable impact on productivity.

ml-systems microservices
4 min
Dropbox

Using LLMs to amplify human labeling and improve Dash search relevance

Dash's search ranking models required large volumes of high-quality labeled relevance data to train effectively, but human labeling alone was too slow and expensive to scale to the needed coverage.

search ml-systems
3 min
Dropbox

With Mobius Labs' Aana models, we're bringing deeper multimodal understanding to Dropbox Dash

Dropbox Dash needed deeper understanding of multimodal content (photos and videos) across user files, but processing diverse media types at Dropbox's scale posed efficiency and architectural challenges.

ml-systems search
3 min
Meta

Building Prometheus: How Backend Aggregation Enables Gigawatt-Scale AI Clusters

Connecting thousands of GPUs across multiple data centers and regions for gigawatt-scale AI training clusters requires seamlessly bridging different network fabrics, which creates massive networking and interconnect challenges.

distributed-systems ml-systems
5 min
Meta

Friend Bubbles: Enhancing Social Discovery on Facebook Reels

Facebook Reels needed a way to enhance social discovery by surfacing content that friends have interacted with, requiring real-time computation of relationship strength and ranking of friend-engaged content at massive scale.

ml-systems real-time-systems
5 min
Meta

Patch Me If You Can: AI Codemods for Secure-by-Default Android Apps

Updating security-related APIs across millions of lines of code and thousands of engineers is extremely difficult at scale, especially when a single class of mobile vulnerability can be replicated across hundreds of locations in an Android codebase.

security ml-systems
5 min
Meta

RCCLX: Innovating GPU Communications on AMD Platforms

GPU-to-GPU communication performance on AMD platforms was insufficient for Meta's evolving AI model training workloads, and the standard RCCL library didn't meet the performance and flexibility requirements of their internal workloads.

distributed-systems ml-systems
5 min
Meta

Ranking Engineer Agent (REA): The Autonomous AI Agent Accelerating Meta’s Ads Ranking Innovation

Meta's ads ranking ML experimentation lifecycle required extensive manual intervention from engineers for hypothesis generation, training job launches, failure debugging, and result iteration, slowing down the pace of ranking model innovation.

ml-systems microservices
5 min
Meta

The Death of Traditional Testing: Agentic Development Broke a 50-Year-Old Field, JiTTesting Can Revive It

Agentic (AI-driven) software development produces and ships code so fast that traditional testing frameworks cannot keep pace, leaving bugs uncaught as they land in rapidly evolving codebases.

ml-systems observability
5 min
Netflix

MediaFM: The Multimodal AI Foundation for Media Understanding at Netflix

Netflix needed scalable, deep machine-level understanding of every piece of content across an expanding catalog (including live events and podcasts) to power recommendations and discovery, but building separate models per content type and modality doesn't scale.

ml-systems microservices
5 min
Netflix

Optimizing Recommendation Systems with JDK’s Vector API

Netflix's Ranker service had a video serendipity scoring feature (computing how different a title is from a user's watch history) consuming ~7.5% of total CPU per node, creating a significant performance bottleneck at their enormous scale.

ml-systems real-time-systems
5 min
Netflix

Scaling LLM Post-Training at Netflix

Generic pre-trained LLMs lack the domain-specific alignment needed for Netflix's production use cases in recommendation, personalization, and search, and the post-training pipeline to fine-tune them doesn't scale efficiently across multiple domain constraints and reliability requirements.

ml-systems distributed-systems
5 min
Netflix

The AI Evolution of Graph Search at Netflix

Netflix's Graph Search platform for federated enterprise data required users to write structured queries, limiting accessibility and ease of use despite the system being scalable and configurable.

search ml-systems
5 min