Archives — Distributed Readings

AWS ↗

Cyber resilience on AWS: A reference approach for recovery from ransomware and destructive events

How to design systems that can recover from ransomware and destructive cyberattacks when backups, credentials, and infrastructure components have been compromised.

security storage-systems

4 min

Cloudflare ↗

Announcing Claude Compliance API support with Cloudflare CASB

Security teams needed visibility and compliance monitoring of Claude Enterprise API usage across their organization without leaving their existing security infrastructure.

security api-design

3 min

Cloudflare ↗

Announcing Claude Managed Agents on Cloudflare

Enabling developers to deploy and scale autonomous agent workflows globally while maintaining security isolation and control over access to private backend systems.

distributed-systems security

4 min

Cloudflare ↗

Project Glasswing: what Mythos showed us

Determining whether security-focused LLMs can effectively identify vulnerabilities in live production infrastructure code at scale.

security ml-systems

4 min

Google Cloud ↗

Cloud Engineer’s AI Toolkit: Sign up Now for a Developer Workshop Near You!

Organizations need to securely build, deploy, and govern autonomous AI agents at enterprise scale as the industry transitions from experimental LLMs to production agentic AI systems.

ml-systems security

5 min

Google Cloud ↗

Five must-have guides to move agents into production with Gemini Enterprise Agent Platform

Deploying and managing AI agents at scale in production requires infrastructure for state management, security governance, and complex workflow orchestration that goes beyond demo implementations.

distributed-systems security

5 min

Google Cloud ↗

Introducing Gemini Enterprise Agent Platform, powering the next wave of agents

Building safe, reliable, and autonomous agents that can act independently across multiple enterprise systems while maintaining security, governance, and reliability guardrails.

ml-systems security

5 min

Google Cloud ↗

Next ’26: Redefining security for the AI era with Google Cloud and Wiz

Organizations need to secure their AI systems and infrastructure against emerging AI-era threats while maintaining the ability to leverage AI's potential at scale.

security distributed-systems

5 min

Google Cloud ↗

Securing Your Gemini and Google API Keys

Developers using the Gemini API and other Google APIs are exposing their API keys to unauthorized access, leading to account compromise, token theft, and service abuse.

security api-design

5 min

Google Cloud ↗

What Google I/O '26 means for developing agents on Google Cloud

Developers needed a unified, secure way to build AI agents locally and deploy them to Google Cloud with standardized protocols and tooling.

api-design microservices

5 min

Google Cloud ↗

What’s new with the Cross-Cloud Network at Next ’26

Enabling seamless connectivity, governance, and security across globally distributed multi-agent AI systems and core applications.

distributed-systems microservices

5 min

AWS ↗

Building hybrid multi-tenant architecture for stateful services on AWS

Building a multi-tenant architecture that isolates tenants without requiring separate AWS accounts while maintaining stateful service deployments.

load-balancing distributed-systems

5 min

AWS ↗

Choosing between single or multiple organizations in AWS Organizations

Organizations must determine whether to operate under a single AWS organization or split into multiple organizations based on their operational, security, and scaling requirements.

security distributed-systems

4 min

AWS ↗

Streaming CloudWatch metrics to VPC-based OpenTelemetry collectors using Lambda

Streaming CloudWatch metrics to internal VPC-based OpenTelemetry collectors without exposing them to the internet.

observability serverless

4 min

Cloudflare ↗

When "idle" isn't idle: how a Linux kernel optimization became a QUIC bug

The CUBIC congestion control algorithm's congestion window was becoming pinned at its minimum value in QUIC because of incorrect idle period detection, causing severe performance degradation.

networking security

4 min

Meta ↗

Labyrinth 1.1: Making End-to-End Encrypted Backups Even More Reliable

Ensuring end-to-end encrypted messages and conversation history survive device loss, device switches, and extended offline periods without compromising encryption guarantees.

storage-systems security

5 min

Cloudflare ↗

How Cloudflare responded to the “Copy Fail” Linux vulnerability

Rapidly detect, investigate, and mitigate a critical Linux kernel privilege escalation vulnerability across a global edge computing fleet without impacting customers.

security distributed-systems

4 min

Cloudflare ↗

When DNSSEC goes wrong: how we responded to the .de TLD outage

When DENIC published invalid DNSSEC signatures for the .de TLD, DNS resolvers like 1.1.1.1 faced a critical decision: reject all .de domain queries due to signature validation failures or serve potentially stale cached responses to maintain availability.

caching distributed-systems

4 min

Stripe ↗

Analyzing first-party fraud trends: Account, free trial, and refund abuse

Detecting and preventing first-party fraud at scale across a payment network where legitimate users abuse policies through multiple accounts, free trial cycling, and refund exploitation.

ml-systems security

4 min

Stripe ↗

Giving agents the ability to pay

Enable autonomous agents to programmatically access payment instruments and execute transactions without requiring human intervention or direct card/account access.

api-design security

4 min

Stripe ↗

How Stripe Radar helps prevent free trial abuse

Detecting and preventing fraudulent behavior in free trial signups, such as repeated trial abuse and missed cancellations, at scale with high accuracy.

ml-systems api-design

4 min

Stripe ↗

How agents, digital wallets, and trust are rewriting checkout

Understanding and optimizing the checkout conversion funnel across diverse ecommerce businesses to identify what drives successful transactions in modern online payment flows.

api-design real-time-systems

4 min

Stripe ↗

Three of the biggest fraud trends from MRC Vegas 2026

Detecting and preventing sophisticated fraud attacks while minimizing friction for legitimate users in payment systems.

api-design security

4 min

Cloudflare ↗

Agents can now create Cloudflare accounts, buy domains, and deploy

How to enable autonomous agents to programmatically create Cloudflare accounts, purchase domains, and deploy infrastructure without manual dashboard interaction or credential handling.

api-design security

4 min

Cloudflare ↗

Code Orange: Fail Small is complete. The result is a stronger Cloudflare network

Cloudflare needed to make its global edge infrastructure more resilient to configuration changes and prevent widespread outages caused by unsafe deployments.

distributed-systems observability

4 min

Cloudflare ↗

Post-quantum encryption for Cloudflare IPsec is generally available

Protecting IPsec communications from future quantum computing threats while maintaining current interoperability with existing infrastructure.

security distributed-systems

3 min

Cloudflare ↗

Shutdowns, power outages, and conflict: a review of Q1 2026 Internet disruptions

How to measure, analyze, and publicly report on Internet disruptions caused by geopolitical events, infrastructure attacks, and power outages in real time across global networks.

observability distributed-systems

4 min

Meta ↗

How Meta Is Strengthening End-to-End Encrypted Backups

How to enable end-to-end encrypted backups for messaging applications while ensuring recovery codes remain inaccessible to Meta, cloud providers, and other third parties.

security storage-systems

5 min

AWS ↗

Modernizing KYC with AWS serverless solutions and agentic AI for financial services

Traditional rule-based KYC (Know Your Customer) systems lack the autonomous decision-making capability and real-time validation speed needed for modern financial services compliance operations.

serverless real-time-systems

5 min

AWS ↗

PACIFIC enables multi-tenant, sovereign product carbon footprint exchange on the Catena-X data space using AWS

Enable multiple independent organizations to securely exchange Product Carbon Footprint (PCF) data within a shared data space while maintaining data sovereignty and tenant isolation.

microservices security

4 min

Cloudflare ↗

Building the agentic cloud: everything we launched during Agents Week 2026

How to enable developers to build and deploy AI agents at scale across a distributed edge computing network while maintaining security and providing necessary infrastructure tools.

distributed-systems security

4 min

Cloudflare ↗

Making Rust Workers reliable: panic and abort recovery in wasm-bindgen

Rust panics in Cloudflare Workers were fatal and poisoned the entire worker instance, making applications unreliable when unhandled errors occurred.

security observability

4 min

Cloudflare ↗

Moving past bots vs. humans

Traditional bot detection mechanisms are becoming ineffective as AI assistants and privacy proxies blur the distinction between legitimate users and automated abuse.

security api-design

4 min

Cloudflare ↗

Orchestrating AI Code Review at scale

Cloudflare needed to scale code review processes across its engineering organization while maintaining code quality and security standards without overwhelming human reviewers.

ml-systems api-design

3 min

Cloudflare ↗

The AI engineering stack we built internally — on the platform we ship

Cloudflare needed to build an internal AI engineering stack that could handle massive scale (20 million requests, 241 billion tokens) while dogfooding its own platform products.

api-design ml-systems

4 min

Airbnb ↗

Privacy-first connections: Empowering social experiences at Airbnb

How can Airbnb enable social features and community connections while maintaining strict user privacy and giving users control over their personal data sharing?

security api-design

5 min

Cloudflare ↗

Agents Week: network performance update

Cloudflare needed to improve request handling performance across its global network to maintain competitive advantage over other CDNs.

distributed-systems load-balancing

4 min

Cloudflare ↗

Introducing Agent Lee - a new interface to the Cloudflare stack

Users had to manually navigate multiple tabs and interfaces within the Cloudflare dashboard to troubleshoot issues and manage their infrastructure, creating friction in the workflow.

api-design security

4 min

Cloudflare ↗

Introducing the Agent Readiness score. Is your site agent-ready?

Website owners needed a way to measure and understand how well their sites support AI agents and web crawlers for indexing and integration.

api-design observability

4 min

Cloudflare ↗

Redirects for AI Training enforces canonical content

AI crawlers were ingesting deprecated and non-canonical content despite soft directives like robots.txt, requiring a way to enforce canonical versions without modifying origin infrastructure.

caching security

4 min

Cloudflare ↗

Securing non-human identities: automated revocation, OAuth, and scoped permissions

Developers lack effective mechanisms to prevent unauthorized access when API credentials are accidentally exposed or compromised.

security api-design

4 min

Meta ↗

Post-Quantum Cryptography Migration at Meta: Framework, Lessons, and Takeaways

Meta needed to migrate its infrastructure and systems to post-quantum cryptography standards before quantum computers could break existing encryption schemes.

security distributed-systems

5 min

Cloudflare ↗

500 Tbps of capacity: 16 years of scaling our global network

How to scale a global content delivery and DDoS mitigation network to handle massive throughput (500 Tbps) while maintaining capacity to protect against record-breaking attacks.

load-balancing distributed-systems

3 min

Cloudflare ↗

Cloudflare targets 2029 for full post-quantum security

Cloudflare needed to prepare its global infrastructure and services for the threat of quantum computing attacks on current cryptographic standards before 2029.

security distributed-systems

4 min

Cloudflare ↗

From bytecode to bytes: automated magic packet generation

Cloudflare needed to automatically generate malware trigger packets for BPF bytecode analysis, which previously required hours of manual work.

security

3 min

Cloudflare ↗

How we built Organizations to help enterprises manage Cloudflare at scale

Cloudflare needed to enable enterprise customers to manage multiple accounts and resources under a unified organizational structure with centralized authorization and access control.

api-design security

4 min

Cloudflare ↗

Welcome to Agents Week

How to enable AI agents to operate effectively at the edge of the internet with the security, performance, and reliability characteristics of Cloudflare's existing infrastructure.

distributed-systems security

4 min

AWS ↗

How Generali Malaysia optimizes operations with Amazon EKS

Generali Malaysia needed to optimize Kubernetes operations on AWS while reducing operational overhead, managing costs, and improving security posture.

distributed-systems security

4 min

AWS ↗

Streamlining access to powerful disaster recovery capabilities of AWS

Organizations need a streamlined way to protect and recover entire AWS workloads across multiple layers (data, compute, infrastructure, networking, and configuration) in the event of a disaster.

storage-systems security

5 min

Airbnb ↗

My Journey to Airbnb — Jonathan Woodard

This article does not describe a specific engineering problem or technical solution.

security

5 min

Cloudflare ↗

Cloudflare Client-Side Security: smarter detection, now open to everyone

Detecting sophisticated client-side security threats like zero-day exploits while minimizing false positives in real time across millions of requests.

security ml-systems

4 min

Cloudflare ↗

Introducing EmDash — the spiritual successor to WordPress that solves plugin security

WordPress plugins pose significant security risks because they run with unrestricted access to the entire system; a safer plugin architecture needs to isolate untrusted code.

security microservices

4 min

Cloudflare ↗

Introducing Programmable Flow Protection: custom DDoS mitigation logic for Magic Transit customers

Magic Transit customers needed the ability to define and enforce custom DDoS mitigation logic for proprietary and non-standard UDP protocols without being limited to Cloudflare's pre-built detection rules.

security distributed-systems

4 min

Cloudflare ↗

Our ongoing commitment to privacy for the 1.1.1.1 public DNS resolver

How to design a public DNS resolver that prioritizes user privacy while maintaining performance and trustworthiness at scale.

security distributed-systems

4 min

Cloudflare ↗

Sandboxing AI agents, 100x faster

How to safely execute untrusted AI-generated code with minimal latency and resource overhead.

security edge-computing

4 min

LinkedIn ↗

Securing every Kubernetes workload at scale

Securing thousands of Kubernetes workloads across a large-scale infrastructure requires automated and consistent security policies.

security microservices

3 min

AWS ↗

How BASF’s Agriculture Solutions drives traceability and climate action by tokenizing cotton value chains using Amazon Managed Blockchain

Agricultural supply chains (cotton/food) lack end-to-end traceability, making it difficult to verify sustainability claims, track climate impact, and ensure circularity across complex multi-party value chains.

distributed-systems security

4 min

AWS ↗

How Convera built fine-grained API authorization with Amazon Verified Permissions

Convera needed to implement fine-grained authorization for their API platform, where coarse-grained access controls were insufficient to manage complex permission requirements across API resources and actions.

api-design security

3 min

AWS ↗

Know before you go – AWS re:Invent 2025 guide to Well-Architected and Cloud Optimization sessions

Organizations struggle to design well-architected cloud systems that balance cost optimization, security, reliability, and performance efficiency across increasingly complex AWS environments including AI-powered workloads.

security microservices

5 min

AWS ↗

Secure Amazon Elastic VMware Service (Amazon EVS) with AWS Network Firewall

Securing Amazon Elastic VMware Service (EVS) environments requires centralized traffic inspection across multiple VPCs, on-premises data centers, and internet egress points, which is complex to architect and implement.

security distributed-systems

4 min

AWS ↗

Sovereign failover – Design for digital sovereignty using the AWS European Sovereign Cloud

Organizations operating under European digital sovereignty requirements need resilient failover capabilities, but regulatory constraints on data residency and governance make cross-partition (sovereign-to-commercial cloud) failover architecturally complex.

distributed-systems security

4 min

Cloudflare ↗

AI Security for Apps is now generally available

Organizations struggle to discover and secure AI-powered applications across their infrastructure, especially shadow AI deployments that teams spin up without central oversight, creating security blind spots.

security api-design

4 min

Cloudflare ↗

Active defense: introducing a stateful vulnerability scanner for APIs

Standard defensive security tools miss logic flaws and vulnerabilities in APIs because they lack understanding of stateful API interactions and business logic flows.

security api-design

3 min

Cloudflare ↗

Always-on detections: eliminating the WAF “log versus block” trade-off

Traditional WAFs force a trade-off between logging (risking missed attacks) and blocking (risking false positives), requiring extensive manual tuning to balance security coverage with availability.

security real-time-systems

4 min

Cloudflare ↗

Announcing Cloudflare Account Abuse Protection: prevent fraudulent attacks from bots and humans

Traditional bot-blocking approaches are insufficient for preventing account abuse (e.g., credential stuffing, fake account creation) because sophisticated attacks increasingly involve human-like behavior or actual humans, bypassing conventional bot detection.

security rate-limiting

3 min

Cloudflare ↗

Building a security overview dashboard for actionable insights

Security teams were overwhelmed by the volume of raw security data across Cloudflare's platform, making it difficult to prioritize and act on vulnerabilities and threats efficiently.

security observability

3 min

Cloudflare ↗

Complexity is a choice. SASE migrations shouldn’t take years.

Enterprise SASE (Secure Access Service Edge) migrations traditionally take 18+ months due to architectural complexity, requiring organizations to integrate networking and security across global infrastructure.

security distributed-systems

3 min

Cloudflare ↗

Fixing request smuggling vulnerabilities in Pingora OSS deployments

Cloudflare's open-source Pingora proxy had request smuggling vulnerabilities when deployed as an ingress proxy, allowing attackers to exploit HTTP parsing discrepancies to bypass security controls and route malicious requests.

security api-design

3 min

Cloudflare ↗

From legacy architecture to Cloudflare One

Organizations struggle to migrate from legacy network security architectures to modern SASE (Secure Access Service Edge) solutions, facing risks from accumulated technical debt and complex dependencies in their existing infrastructure.

security microservices

3 min

Cloudflare ↗

From the endpoint to the prompt: a unified data security vision in Cloudflare One

Organizations face fragmented data security across endpoints, network traffic, cloud applications, and AI prompts, making it difficult to enforce consistent data loss prevention (DLP) policies as data flows through diverse channels including RDP sessions and AI copilots.

security api-design

3 min

Cloudflare ↗

How Automatic Return Routing solves IP overlap

Enterprises connecting multiple private networks via tunnels frequently encounter overlapping IP address ranges (e.g., multiple sites using 10.0.0.0/8), making traditional routing tables unable to determine which tunnel should receive return traffic.

distributed-systems security

4 min

Cloudflare ↗

Introducing Custom Regions for precision data control

Customers needed precise control over where their data is processed geographically to meet diverse compliance requirements (e.g., GDPR, data sovereignty laws), but existing pre-defined regional options were too coarse-grained to cover all regulatory and performance needs.

distributed-systems security

4 min

Cloudflare ↗

Investigating multi-vector attacks in Log Explorer

Security teams lacked a unified view across multiple Cloudflare datasets, making it difficult to identify and investigate multi-vector attacks that span different attack surfaces and log sources.

observability security

3 min

Cloudflare ↗

Standing up for the open Internet: why we appealed Italy’s "Piracy Shield" fine

Italy's 'Piracy Shield' system forces Internet infrastructure providers like Cloudflare to block content at the network level without proper oversight or due process, leading to disproportionate overblocking of legitimate content.

security api-design

4 min

Cloudflare ↗

Translating risk insights into actionable protection: leveling up security posture with Cloudflare and Mastercard

Organizations struggle with Internet-facing blind spots in their attack surface, lacking continuous visibility into security gaps and risk exposures across their external-facing assets.

security

4 min

Meta ↗

How Advanced Browsing Protection Works in Messenger

Messenger needed to protect users' privacy when they click links in chats while still detecting malicious URLs and warning users about them, creating a tension between link safety scanning and end-to-end privacy.

security messaging-queues

5 min

Meta ↗

Patch Me If You Can: AI Codemods for Secure-by-Default Android Apps

Updating security-related APIs across millions of lines of code maintained by thousands of engineers is extremely difficult, especially when a single class of mobile vulnerability can be replicated across hundreds of locations in an Android codebase.

security ml-systems

5 min