Browse past weeks of engineering reads.
Manual prompt engineering for Dropbox Dash's relevance judge was unreliable, hard to measure, and costly—making it difficult to systematically improve task performance in production.
Airbnb users in the early trip planning stage often lack a clear travel destination, making it difficult to provide relevant recommendations and convert exploratory browsing into bookings.
Airbnb needed to advance its AI, data science, and machine learning capabilities across multiple domains (NLP, optimization, measurement science) to improve its travel and living platform, requiring solutions to challenges in search ranking, recommendation, experimentation, and large-scale data processing.
Dash's search ranking models required large volumes of high-quality labeled relevance data to train effectively, but human labeling alone was too slow and expensive to scale to the needed coverage.
Enterprise search and AI assistant products like Dropbox Dash need to connect disparate data sources and optimize AI-driven retrieval, but naively querying across siloed data with LLMs leads to poor relevance and brittle prompt engineering.
Netflix's Graph Search platform for federated enterprise data required users to write structured queries, limiting accessibility and ease of use despite the system being scalable and configurable.
Dropbox Dash needs to rank and retrieve relevant context across a user's work in real time, requiring low-latency access to precomputed and real-time features for AI-driven search and recommendation models.
Dropbox Dash's AI agent struggled with effectiveness when naively providing all available context to the model, leading to degraded performance as irrelevant information diluted the signal needed for accurate, agentic AI responses.
Dropbox Dash needed deeper understanding of multimodal content (photos and videos) across user files, but processing diverse media types at Dropbox's scale posed efficiency and architectural challenges.