Browse past weeks of engineering reads.
Airbnb needed to scale its identity graph infrastructure to efficiently resolve user identities and understand relationships between entities across its platform.
Airbnb needed to transition Viaduct from an internal-only data mesh tool to a production-ready, community-driven platform with a stable public API.
Designing monitoring and observability systems that remain functional and reliable even when the core infrastructure they monitor is failing or degraded.
How to build a durable workflow execution engine that can recover from failures mid-process without losing state or duplicating work.
Building a metrics storage system capable of ingesting 50 million samples per second while reliably storing 2.5 petabytes of time series data.
How can Airbnb enable social features and community connections while maintaining strict user privacy and giving users control over their personal data sharing?
Migrating a large-scale metrics pipeline from StatsD to OpenTelemetry while handling production traffic volumes without losing data or blocking dependent systems.
This article does not describe a specific engineering problem or technical solution.
Building forecasting models that remain accurate during sudden market shocks like a global pandemic, where historical data no longer predicts future outcomes.
Airbnb needed to advance its AI, data science, and machine learning capabilities across multiple domains (NLP, optimization, measurement science) to improve its travel and living platform. Doing so required solving challenges in search ranking, recommendation, experimentation, and large-scale data processing.
Airbnb's multi-tenant key-value store (Mussel) used static rate limiting that couldn't adapt to varying traffic patterns and spikes, risking degraded performance and reliability for all tenants during surges.
Airbnb's reliance on multiple third-party observability vendors resulted in inconsistent data, fragmented developer experiences, and limitations in cost-effectiveness and reliability at their scale.
Valid, realistic mock data for GraphQL testing and prototyping is tedious to write and maintain; existing approaches like random value generation and field-level stubbing lack domain context, resulting in unconvincing, brittle test data that doesn't scale across a large schema.
Airbnb's Observability as Code alert development process had development cycles lasting weeks because of cumbersome code review workflows, which slowed engineers' ability to create and iterate on alerts across thousands of services.
This article is a personal profile of a Senior Director of Engineering at Airbnb rather than a technical post addressing a specific engineering challenge. It highlights her role overseeing Application & Cloud infrastructure but does not detail a specific system problem.
Airbnb needed to build robust data science and economic modeling capabilities to understand and optimize its two-sided marketplace dynamics and to inform policy and business decisions.
Airbnb relied primarily on card payments across 220+ global markets, but many users preferred local payment methods, causing checkout friction, reduced accessibility, and lower adoption in key markets.
Airbnb users in the early trip planning stage often lack a clear travel destination, making it difficult to provide relevant recommendations and convert exploratory browsing into bookings.
Dynamic configuration changes at scale can cause widespread outages if rolled out unsafely: a single bad config update can immediately affect all services and requests, without the safety net of a gradual deployment process.