Google

Building with Gemini Embedding 2: Agentic multimodal RAG and beyond

Developers needed a unified embedding model that maps interleaved multimodal inputs (text, images, video, audio, documents) into a single semantic space for tasks like retrieval-augmented generation and visual search.

api-design ml-systems
5 min