Airbnb

Viaduct 1.0 and the future of Airbnb’s data mesh

Airbnb needed to transition Viaduct from an internal-only data mesh tool to a production-ready, community-driven platform with a stable public API.

api-design distributed-systems
5 min
Airbnb

Monitoring reliably at scale

Designing monitoring and observability systems that remain functional and reliable even when the core infrastructure they monitor is failing or degraded.

observability distributed-systems
5 min
Airbnb

Skipper: Building Airbnb’s embedded workflow engine

How to build a durable workflow execution engine that can recover from failures mid-process without losing state or duplicating work.

distributed-systems databases
5 min
Airbnb

Building a fault-tolerant metrics storage system at Airbnb

Building a metrics storage system capable of ingesting 50 million samples per second while reliably storing 2.5 petabytes of time series data at scale.

observability storage-systems
5 min
Airbnb

Building a high-volume metrics pipeline with OpenTelemetry and vmagent

Migrating a large-scale metrics pipeline from StatsD to OpenTelemetry while handling production traffic volumes without losing data or blocking dependent systems.

observability distributed-systems
5 min
Airbnb

What COVID did to our forecasting models (and what we built to handle the next shock)

Building forecasting models that remain accurate during sudden market shocks like a global pandemic, where historical data no longer predicts future outcomes.

ml-systems observability
5 min
Airbnb

From vendors to vanguard: Airbnb’s hard-won lessons in observability ownership

Airbnb's reliance on multiple third-party observability vendors resulted in inconsistent data, fragmented developer experiences, and limitations in cost-effectiveness and reliability at their scale.

observability microservices
5 min
Airbnb

It Wasn’t a Culture Problem: Upleveling Alert Development at Airbnb

Airbnb's Observability as Code alert development process had excessively long development cycles (weeks) due to cumbersome code review workflows, slowing down engineers' ability to create and iterate on alerts at scale across thousands of services.

observability microservices
5 min