Archives — Distributed Readings

AWS ↗

Cyber resilience on AWS: A reference approach for recovery from ransomware and destructive events

How to design systems that can recover from ransomware and destructive cyberattacks when backups, credentials, and infrastructure components have been compromised.

security storage-systems

4 min

AWS ↗

How ALS GeoAnalytics LITHOLENS™ revolutionizes core logging through machine learning with Amazon EKS

ALS GeoAnalytics needed to scale machine learning model training and inference for core logging analysis while managing computational costs effectively.

distributed-systems ml-systems

3 min

AWS ↗

How Synthesia optimizes generative AI video inference on Amazon EC2 G7e instances

Synthesia needed to maximize GPU utilization during video inference on EC2 G7e instances by reducing idle time caused by sequential GPU compute, data transfer, and post-processing operations.

ml-systems real-time-systems

5 min

AWS ↗

Building hybrid multi-tenant architecture for stateful services on AWS

Building a multi-tenant architecture that isolates tenants without requiring separate AWS accounts while maintaining stateful service deployments.

load-balancing distributed-systems

5 min

AWS ↗

Choosing between single or multiple organizations in AWS Organizations

Organizations must determine whether to operate under a single AWS organization or split into multiple organizations based on their operational, security, and scaling requirements.

security distributed-systems

4 min

AWS ↗

Streaming CloudWatch metrics to VPC-based OpenTelemetry collectors using Lambda

Streaming CloudWatch metrics to internal VPC-based OpenTelemetry collectors without exposing them to the internet.

observability serverless

4 min

AWS ↗

Deloitte optimizes EKS environment provisioning and achieves 89% faster testing environments using Amazon EKS and vCluster

Deloitte needed to significantly reduce the time required to provision testing environments for their Kubernetes workloads.

distributed-systems microservices

3 min

AWS ↗

Modernizing KYC with AWS serverless solutions and agentic AI for financial services

Traditional rule-based KYC (Know Your Customer) systems lack the autonomous decision-making capability and real-time validation speed needed for modern financial services compliance operations.

serverless real-time-systems

5 min

AWS ↗

PACIFIC enables multi-tenant, sovereign product carbon footprint exchange on the Catena-X data space using AWS

Multiple independent organizations need to securely exchange Product Carbon Footprint (PCF) data within a shared data space while maintaining data sovereignty and tenant isolation.

microservices security

4 min

AWS ↗

Real-time analytics: Oldcastle integrates Infor with Amazon Aurora and Amazon Quick Sight

Oldcastle needed to overcome the limitations of traditional ERP reporting to enable real-time analytics and dashboards for their Infor ERP system.

databases real-time-systems

5 min

AWS ↗

Build a multi-tenant configuration system with tagged storage patterns

Building a scalable multi-tenant configuration service that maintains strict tenant isolation while supporting real-time updates without cache staleness or downtime.

caching storage-systems

5 min

AWS ↗

Unlock efficient model deployment: Simplified Inference Operator setup on Amazon SageMaker HyperPod

Simplifying the deployment and scheduling of machine learning inference workloads across multiple instances and instance types on Amazon SageMaker HyperPod.

ml-systems distributed-systems

4 min

AWS ↗

Architecting for agentic AI development on AWS

AI agents struggle to iterate rapidly on system design and codebases due to architectural patterns that limit their ability to understand, modify, and validate applications effectively.

microservices serverless

5 min

AWS ↗

Automate safety monitoring with computer vision and generative AI

Detecting safety hazards in real time across hundreds of distributed operational sites using video feeds while maintaining low latency and managing the computational complexity of processing multiple camera streams.

real-time-systems distributed-systems

5 min

AWS ↗

How Aigen transformed agricultural robotics for sustainable farming with Amazon SageMaker AI

Aigen needed to scale machine learning pipelines across hundreds of distributed edge solar robots while managing data labeling and model training challenges in agricultural robotics.

ml-systems distributed-systems

5 min

AWS ↗

How Generali Malaysia optimizes operations with Amazon EKS

Generali Malaysia needed to optimize Kubernetes operations on AWS while reducing operational overhead, managing costs, and improving security posture.

distributed-systems security

4 min

AWS ↗

Streamlining access to powerful disaster recovery capabilities of AWS

Organizations need a streamlined way to protect and recover entire AWS workloads across multiple layers (data, compute, infrastructure, networking, and configuration) in the event of a disaster.

storage-systems security

5 min

AWS ↗

6,000 AWS accounts, three people, one platform: Lessons learned

Managing 6,000 AWS accounts for a multi-tenant serverless SaaS platform with only three people created massive operational challenges around automation, observability, and cost management at scale.

distributed-systems microservices

4 min

AWS ↗

AI-powered event response for Amazon EKS

Responding to operational events in Amazon EKS clusters is often manual, slow, and requires deep expertise, making it difficult to handle incidents at scale across complex Kubernetes environments.

observability ml-systems

3 min

AWS ↗

Announcing the updated AWS Well-Architected Generative AI Lens

Organizations building generative AI workloads on AWS lacked comprehensive architectural guidance covering responsible AI, data architecture, and emerging patterns like agentic workflows, leading to poorly architected AI systems.

ml-systems api-design

4 min

AWS ↗

Announcing the updated AWS Well-Architected Machine Learning Lens

Organizations building ML workloads on AWS lacked up-to-date architectural guidance that incorporates the latest services, capabilities, and best practices, leading to suboptimal ML system designs across reliability, performance, cost, and operational dimensions.

ml-systems

3 min

AWS ↗

Architecting conversational observability for cloud applications

Diagnosing and resolving issues in complex Kubernetes clusters is slow and requires expert knowledge, leading to high Mean Time to Recovery (MTTR) and heavy reliance on specialized engineers for root cause analysis.

observability ml-systems

4 min

AWS ↗

Architecting for AI excellence: AWS launches three Well-Architected Lenses at re:Invent 2025

Organizations deploying AI/ML workloads on AWS lacked comprehensive architectural guidance for building responsible, well-architected machine learning and generative AI systems at scale.

ml-systems

5 min

AWS ↗

BASF Digital Farming builds a STAC-based solution on Amazon EKS

BASF Digital Farming needed a scalable way to catalog, discover, and serve large volumes of spatiotemporal geospatial data (satellite imagery, crop data) for their xarvio crop optimization platform, and their existing infrastructure struggled with the scale and query patterns of this data.

microservices storage-systems

4 min

AWS ↗

Build priority-based message processing with Amazon MQ and AWS App Runner

Standard message queues process messages in FIFO order, lacking the ability to prioritize urgent messages over lower-priority ones, which can cause critical tasks to wait behind less important work during high load.

messaging-queues real-time-systems

5 min

AWS ↗

Building an AI gateway to Amazon Bedrock with Amazon API Gateway

Enterprises adopting Amazon Bedrock need centralized governance over AI model access, including authorization controls, usage quotas, and auditing, but lack a standardized gateway pattern to enforce these policies at scale.

api-design rate-limiting

4 min

AWS ↗

Digital Transformation at Santander: How Platform Engineering is Revolutionizing Cloud Infrastructure

Santander struggled to manage cloud infrastructure supporting billions of daily transactions across 200+ critical systems, facing complexity and scalability challenges in their banking operations.

distributed-systems microservices

5 min

AWS ↗

How Artera enhances prostate cancer diagnostics using AWS

Artera needed to develop and scale an AI-powered prostate cancer diagnostic test, requiring significant compute resources for model training/inference and a reliable pipeline to deliver timely, personalized treatment recommendations.

ml-systems storage-systems

4 min

AWS ↗

How BASF’s Agriculture Solutions drives traceability and climate action by tokenizing cotton value chains using Amazon Managed Blockchain

Agricultural supply chains (cotton/food) lack end-to-end traceability, making it difficult to verify sustainability claims, track climate impact, and ensure circularity across complex multi-party value chains.

distributed-systems security

4 min

AWS ↗

How Convera built fine-grained API authorization with Amazon Verified Permissions

Convera needed to implement fine-grained authorization for their API platform, where coarse-grained access controls were insufficient to manage complex permission requirements across API resources and actions.

api-design security

3 min

AWS ↗

How Salesforce migrated from Cluster Autoscaler to Karpenter across their fleet of 1,000 EKS clusters

Salesforce's Cluster Autoscaler could not efficiently scale and manage node provisioning across their fleet of 1,000+ EKS clusters, likely suffering from slow scaling decisions, suboptimal bin-packing, and operational complexity at massive scale.

distributed-systems load-balancing

4 min

AWS ↗

Know before you go – AWS re:Invent 2025 guide to Well-Architected and Cloud Optimization sessions

Organizations struggle to design well-architected cloud systems that balance cost optimization, security, reliability, and performance efficiency across increasingly complex AWS environments including AI-powered workloads.

security microservices

5 min

AWS ↗

Mastering millisecond latency and millions of events: The event-driven architecture behind the Amazon Key Suite

The Amazon Key Suite had a tightly coupled monolithic architecture that struggled with reliability and scalability when processing millions of events under millisecond latency requirements across multiple service integrations.

microservices messaging-queues

5 min

AWS ↗

Secure Amazon Elastic VMware Service (Amazon EVS) with AWS Network Firewall

Securing Amazon Elastic VMware Service (EVS) environments requires centralized traffic inspection across multiple VPCs, on-premises data centers, and internet egress points, which is complex to architect and implement.

security distributed-systems

4 min

AWS ↗

She architects: Bringing unique perspectives to innovative solutions at AWS

Cloud architecture roles lack diverse representation, and a narrow range of viewpoints can limit innovation in technical solution design.

api-design

5 min

AWS ↗

Sovereign failover – Design for digital sovereignty using the AWS European Sovereign Cloud

Organizations operating under European digital sovereignty requirements need resilient failover capabilities, but regulatory constraints on data residency and governance make cross-partition (sovereign-to-commercial cloud) failover architecturally complex.

distributed-systems security

4 min

AWS ↗

The Hidden Price Tag: Uncovering Hidden Costs in Cloud Architectures with the AWS Well-Architected Framework

Organizations migrating to or operating in the cloud encounter hidden and unexpected costs due to suboptimal architectural decisions, resource misconfigurations, and lack of adherence to cloud best practices.

distributed-systems storage-systems

5 min