Browse past weeks of engineering reads.
Airbnb needed to scale their identity graph infrastructure to efficiently resolve user identities and understand relationships between entities across their platform.
Airbnb needed to transition Viaduct from an internal-only data mesh tool to a production-ready, community-driven platform with a stable public API.
Designing monitoring and observability systems that remain functional and reliable even when the core infrastructure they monitor is failing or degraded.
How to build a durable workflow execution engine that can recover from failures mid-process without losing state or duplicating work.
Building a metrics storage system capable of ingesting 50 million samples per second while reliably storing 2.5 petabytes of time series data at scale.
Migrating a large-scale metrics pipeline from StatsD to OpenTelemetry while handling production traffic volumes without losing data or blocking dependent systems.
Airbnb's multi-tenant key-value store (Mussel) used static rate limiting that couldn't adapt to varying traffic patterns and spikes, risking degraded performance and reliability for all tenants during surges.
This article is a personal profile of a Senior Director of Engineering at Airbnb rather than a technical post addressing a specific engineering challenge. It highlights her role overseeing Application & Cloud infrastructure but does not detail a specific system problem.
Airbnb relied primarily on card payments across 220+ global markets, but many users preferred local payment methods, causing checkout friction, reduced accessibility, and lower adoption in key markets.
Dynamic configuration changes at scale can cause widespread outages if rolled out unsafely—a single bad config update can immediately affect all services and requests without the safety net of a gradual deployment process.