Netflix

Optimizing Recommendation Systems with JDK’s Vector API

Netflix's Ranker service had a video serendipity scoring feature (computing how different a title is from a user's watch history) consuming ~7.5% of total CPU per node, creating a significant performance bottleneck at their enormous scale.

ml-systems real-time-systems
5 min
Netflix

Scaling Global Storytelling: Modernizing Localization Analytics at Netflix

Netflix's localization analytics infrastructure (tracking dubbing, subtitling, and translation across hundreds of languages and regions) could not keep pace with the rapidly growing scale of global content, making it difficult to derive timely insights for content localization decisions.

databases distributed-systems
5 min
Netflix

MediaFM: The Multimodal AI Foundation for Media Understanding at Netflix

Netflix needed scalable, deep machine-level understanding of every piece of content across an expanding catalog (including live events and podcasts) to power recommendations and discovery, but building separate models per content type and modality doesn't scale.

ml-systems microservices
5 min
Netflix

Mount Mayhem at Netflix: Scaling Containers on Modern CPUs

Netflix needed to spin up hundreds of containers in seconds to serve streaming traffic, but after modernizing their container runtime, they hit an unexpected performance bottleneck rooted in CPU architecture that impaired container scaling efficiency.

distributed-systems real-time-systems
5 min
Netflix

Automating RDS Postgres to Aurora Postgres Migration

Netflix's relational database ecosystem lacked standardization, with databases spread across RDS Postgres and other technologies, leading to inconsistent functionality, suboptimal performance, and higher total cost of ownership.

databases distributed-systems
5 min
Netflix

Scaling LLM Post-Training at Netflix

Generic pre-trained LLMs lack the domain-specific alignment needed for Netflix's production use cases in recommendation, personalization, and search, and the post-training pipeline to fine-tune them doesn't scale efficiently across multiple domain constraints and reliability requirements.

ml-systems distributed-systems
5 min
Netflix

The AI Evolution of Graph Search at Netflix

Netflix's Graph Search platform for federated enterprise data required users to write structured queries, limiting accessibility and ease of use despite the system being scalable and configurable.

search ml-systems
5 min
Netflix

How Temporal Powers Reliable Cloud Operations at Netflix

Netflix needed reliable orchestration for business-critical cloud operations across teams like Open Connect CDN and Live reliability, but faced operational challenges as Temporal adoption grew since 2021.

distributed-systems microservices
5 min
Netflix

Netflix Live Origin

Netflix needed a custom origin server to bridge its cloud-based live streaming pipelines with its CDN (Open Connect), handling the unique challenges of live content delivery such as low-latency requirements, reliability, and the real-time nature of live streams compared to on-demand content.

real-time-systems distributed-systems
5 min
Netflix

AV1 — Now Powering 30% of Netflix Streaming

Delivering high-quality streaming video across diverse devices and varying network conditions requires efficient video encoding; legacy codecs like H.264 and VP9 were limiting compression efficiency, consuming more bandwidth for equivalent visual quality.

real-time-systems storage-systems
5 min