Archives — Distributed Readings

Google ↗

A Smarter Google AI Edge Gallery: MCP integration, notifications, and session continuity

Enables on-device AI models to coordinate complex tasks across external data sources while maintaining persistent user context and proactive engagement, without relying solely on cloud connectivity.

api-design ml-systems

5 min

Google ↗

A2UI v0.9: The New Standard for Portable, Framework-Agnostic Generative UI

AI agents needed a standardized way to generate UI components that work across different platforms and frameworks without being tightly coupled to any specific technology stack.

api-design real-time-systems

5 min

Google ↗

Build Long-running AI agents that pause, resume, and never lose context with ADK

Production-grade AI agents need to maintain context and state across long-running enterprise workflows that span days or weeks, without losing information during idle periods or server restarts.

api-design distributed-systems

5 min

Google ↗

Building real-world on-device AI with LiteRT and NPU

Mobile developers faced poor performance and high battery drain when running AI models on CPU/GPU, which limited real-time AI applications on edge devices.

api-design ml-systems

5 min

Google ↗

Google Tensor SDK Beta with LiteRT

Developers needed a unified way to build, deploy, and run high-performance machine learning models directly on edge devices (Google Pixel TPU) with reliable fallback mechanisms.

ml-systems api-design

5 min

Google ↗

Supercharging LLM inference on Google TPUs: Achieving 3X speedups with diffusion-style speculative decoding

Autoregressive LLM decoding suffers from a sequential bottleneck: tokens must be generated one at a time, which limits throughput and inference speed on hardware accelerators like TPUs.

ml-systems real-time-systems

5 min