Building a high-volume metrics pipeline with OpenTelemetry and vmagent
Migrating a large-scale metrics pipeline from StatsD to OpenTelemetry while handling production traffic volumes without losing data or blocking dependent systems.
How Meta Used AI to Map Tribal Knowledge in Large-Scale Data Pipelines
AI coding assistants struggled to make useful edits in large-scale data pipelines because they lacked understanding of complex, multi-repository codebases spanning multiple languages and thousands of files.
Trust But Canary: Configuration Safety at Scale
Safely deploying configuration changes at scale while minimizing the risk of widespread failures caused by faulty configurations.
Automate safety monitoring with computer vision and generative AI
Detecting safety hazards in real time from video feeds across hundreds of distributed operational sites while maintaining low latency and managing the computational cost of processing multiple camera streams.
Cloudflare Client-Side Security: smarter detection, now open to everyone
Detecting sophisticated client-side security threats such as zero-day exploits in real time across millions of requests while minimizing false positives.
Our ongoing commitment to privacy for the 1.1.1.1 public DNS resolver
Designing a public DNS resolver that prioritizes user privacy while maintaining performance and trustworthiness at scale.
Improving storage efficiency in Magic Pocket, our immutable blob store
Dropbox needed to improve storage efficiency and resilience in Magic Pocket, their immutable blob store, under variable, evolving workloads.
Introducing Northguard and Xinfra: scalable log storage at LinkedIn
LinkedIn's logging infrastructure couldn't scale cost-effectively to handle the massive volume of operational logs across thousands of services.
KernelEvolve: How Meta’s Ranking Engineer Agent Optimizes AI Infrastructure
Meta needed to automatically optimize low-level infrastructure and kernel-level parameters for AI ranking models to improve performance without manual tuning.
Meta Adaptive Ranking Model: Bending the Inference Scaling Curve to Serve LLM-Scale Models for Ads
Meta needed to scale their ads ranking models to LLM-scale complexity and size while maintaining inference latency requirements for real-time ad serving.