Archives — Distributed Readings

Google ↗

A Smarter Google AI Edge Gallery: MCP integration, notifications, and session continuity

Enable on-device AI models to coordinate complex tasks across external data sources while maintaining persistent user context and proactive engagement without relying solely on cloud connectivity.

api-design ml-systems

5 min

Google ↗

A2UI v0.9: The New Standard for Portable, Framework-Agnostic Generative UI

AI agents needed a standardized way to generate UI components that work across different platforms and frameworks without being tightly coupled to any specific technology stack.

api-design real-time-systems

5 min

Google ↗

Accelerating on-device AI: A look at Arm and Google AI Edge optimization

Enabling efficient execution of generative AI models on edge devices with limited computational resources while maintaining acceptable latency and performance.

ml-systems api-design

5 min

Google ↗

Announcing ADK for Kotlin and ADK for Android 0.1.0: Building AI Agents on Android and Beyond

Developers needed a way to build AI agent workflows that could run on Android devices and backend systems without reinventing the core agentic logic across different platforms.

api-design sdks

3 min

Google ↗

Announcing Genkit Middleware: Intercept, extend, and harden your agentic apps

Developers need a way to reliably control, monitor, and extend AI model generation calls in production agentic applications without modifying core business logic.

api-design ml-systems

5 min

Google ↗

Blazing fast on-device GenAI with LiteRT-LM

Running large language models efficiently on mobile and edge devices while preserving multimodal and agentic capabilities without requiring server-side inference.

ml-systems mobile-platforms

5 min

Google ↗

Build Long-running AI agents that pause, resume, and never lose context with ADK

Building production-grade AI agents that can maintain context and state across long-running enterprise workflows spanning days or weeks without losing information during idle periods or server restarts.

api-design distributed-systems

5 min

Google ↗

Building real-world on-device AI with LiteRT and NPU

Mobile developers faced poor performance and battery inefficiency when running AI models on CPU/GPU, which limited real-time AI applications on edge devices.

api-design ml-systems

5 min

Google ↗

Building with Gemini Embedding 2: Agentic multimodal RAG and beyond

Developers needed a unified embedding model capable of processing interleaved multimodal inputs (text, images, video, audio, documents) in a single semantic space for tasks like retrieval-augmented generation and visual search.

api-design ml-systems

5 min

Google ↗

Empowering Service Providers and Hardware Partners with Gemini for Home

How can Google enable third-party service providers and hardware manufacturers to build intelligent smart home experiences without requiring deep AI/ML expertise or significant R&D investment?

api-design ml-systems

5 min

Google ↗

Google Tensor SDK Beta with LiteRT

Developers needed a unified way to build, deploy, and run high-performance machine learning models directly on edge devices (Google Pixel TPU) with reliable fallback mechanisms.

ml-systems api-design

5 min

Google ↗

MaxText Expands Post-Training Capabilities: Introducing SFT and RL on Single-Host TPUs

Enabling efficient post-training of large language models on single-host TPU configurations without requiring complex multi-host distributed setups.

ml-systems distributed-systems

5 min

Google ↗

One Year of Innovation: Celebrating 100k Members in the Google Cloud x NVIDIA Developer Community

Developers needed accessible infrastructure, resources, and structured learning pathways to effectively build and optimize AI applications using GPUs and large language models at scale.

api-design ml-systems

5 min

Google ↗

Production-Ready AI Agents: 5 Lessons from Refactoring a Monolith

Converting a brittle, monolithic sales research AI prototype, one prone to silent failures, fragile parsing, and poor observability, into a production-ready agent.

microservices observability

5 min

Google ↗

Speeding Up AI: Bringing Google Colossus to PyTorch via GCSFS and Rapid Bucket

AI training pipelines were bottlenecked by slow data I/O when accessing training datasets stored in Google Cloud, limiting throughput and increasing total training time.

storage-systems ml-systems

5 min

Google ↗

Supercharging LLM inference on Google TPUs: Achieving 3X speedups with diffusion-style speculative decoding

Autoregressive LLM decoding suffers from a sequential bottleneck: tokens must be generated one at a time, limiting throughput and inference speed on hardware accelerators like TPUs.

ml-systems real-time-systems

5 min