Cloud Infrastructure for Scalable Gaming Platforms

Last updated: 2026-06-15

Launch day is loud. It is 3 a.m. and your chat lights up. Concurrency jumps from 12,000 to 240,000 in 11 minutes. CPU looks fine, but users cannot join matches. Why? Session state sits on one hot node. Egress to the social feed grows 18x and saturates the NAT gateway. The cache starts to evict. P95 match start time drifts from 4.1s to 12.6s. Players reboot. Support floods. The fix is not “add more nodes.” The fix is to move state, cut hops, and shape traffic. This guide shows the moves that work when the graph turns up and to the right.

The seven diagnostics to run before you “add more nodes”

Do these checks this week. They expose real choke points and stop you from guessing.

Connection churn: joins and leaves per minute by region and by platform.
p95 and p99 for lobby load, auth, and match start.
Egress per CCU and by path (CDN, APIs, data feeds).
Cache hit ratio and eviction rate under load.
Database write amplification (one action → how many writes?).
Queue lag and backpressure on brokers.
Cold start latency for pods, functions, and jobs.

Map each metric to an action. For example: if cache hit ratio is under 80%, cache more keys or add RAM before scaling app nodes. If queue lag grows while CPU is low, add consumers or split topics. For a broader lens on resilience, see well-architected reliability checks.
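The mapping above can be written down as a small rule set so nobody has to remember it at 3 a.m. This is a minimal sketch, not a tuned policy: the `Snapshot` fields and the `low_cpu` cutoff of 0.5 are assumptions for illustration; only the 80% hit-ratio rule and the "lag up, CPU low" rule come from the text.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    """One reading of the diagnostics (field names are illustrative)."""
    cache_hit_ratio: float    # 0.0 to 1.0
    queue_lag_growing: bool   # broker lag trending up over the window
    cpu_utilization: float    # 0.0 to 1.0, averaged across app nodes


def recommend(s: Snapshot, low_cpu: float = 0.5) -> list:
    """Return suggested actions; an empty list means no rule fired."""
    actions = []
    if s.cache_hit_ratio < 0.80:
        actions.append("grow cache before scaling app nodes")
    # Lag with idle CPU points at too few consumers, not too few nodes.
    if s.queue_lag_growing and s.cpu_utilization < low_cpu:
        actions.append("add consumers or split topics")
    return actions


if __name__ == "__main__":
    print(recommend(Snapshot(0.72, True, 0.30)))
```

Keep the rules next to the dashboards they read from, and review the thresholds after each load test.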

Sample findings:

Redis from 32 → 48 GB cut evictions from 4.2% to 0.3% and dropped p95 lobby from 210 ms to 140 ms.
Kafka partitions from 24 → 48 cleared 6-minute lag to under 30 seconds at 1.4M msgs/min.
Egress per CCU = ~120 KB/min; main cost came from cross-region webhooks, not CDN.

The physics of latency you cannot negotiate

Latency is a budget. You split it across client, edge, backbone, region, service, and data store. You cannot beat the speed of light. You can cut hops.

Client: input to send (e.g., 10–20 ms).
Edge: TLS and routing (e.g., 15–30 ms).
Backbone: distance across the globe (varies with route).
Region: LB → service → cache/DB (often 40–120 ms total).

Pick an edge network that shortens the middle mile. Keep chat, presence, and matchmaking hints near players. Put heavy writes in-region with a clear queue. If you host with Google, review how they place game workloads on Google’s backbone to reduce jitter on long routes.

Sample budget (230 ms total):

Client device: 15 ms
Edge terminate + route: 25 ms
Backbone hop to region: 40 ms
Service + cache read: 60 ms
DB write (batched): 30 ms
Return path + render: 60 ms
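A budget is easier to defend when it is summed and sorted. The sketch below uses the six sample figures listed above; the function names are made up for illustration, and any target you compare the total against is your own choice.

```python
# Sample stage latencies in milliseconds, copied from the list above.
BUDGET_MS = {
    "client device": 15,
    "edge terminate + route": 25,
    "backbone hop to region": 40,
    "service + cache read": 60,
    "db write (batched)": 30,
    "return path + render": 60,
}


def total_ms(budget: dict) -> int:
    """Sum every stage to get the end-to-end figure."""
    return sum(budget.values())


def largest_stages(budget: dict, n: int = 2) -> list:
    """Return the n most expensive stages, biggest first (ties keep list order)."""
    return sorted(budget.items(), key=lambda kv: kv[1], reverse=True)[:n]


if __name__ == "__main__":
    print(total_ms(BUDGET_MS))        # 230
    print(largest_stages(BUDGET_MS))  # the two 60 ms stages
```

Cut the largest stages first. Shaving 5 ms off the client does little when service and return path take 120 ms between them.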

Architecture moves that actually scale

These moves have paid off on real launches. They are simple in idea, but they need care in rollout.

Keep game services stateless. Store session and presence outside the process. Use Deployments and HPA for pods. Read up on Kubernetes primitives for stateless services so restarts do not drop users.

Use Redis for hot, low-latency in-memory state and Postgres for durable, authoritative data. This keeps fast reads off the primary store. See authoritative relational persistence and low-latency in-memory state for details and caveats.

Decouple writes from reads with streams. This helps when one part spikes (like rewards) and others (like profile pages) should not slow down. Start with compact events and idempotent consumers. For theory and APIs, check event streaming for decoupling.
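An idempotent consumer applies each event at most once, even when the broker redelivers it. Here is a minimal in-memory sketch. The event fields (`event_id`, `player_id`, `amount`) and the `RewardConsumer` class are hypothetical; in a real system the seen-set and the balance update would sit in a durable store and commit together.

```python
class RewardConsumer:
    """Applies reward events once, keyed by event_id."""

    def __init__(self):
        self.seen = set()     # event ids already applied
        self.balances = {}    # player_id -> total rewards

    def handle(self, event: dict) -> bool:
        """Apply the event; return False if it was a duplicate."""
        event_id = event["event_id"]
        if event_id in self.seen:
            return False  # redelivery: safe to ack and move on
        player = event["player_id"]
        self.balances[player] = self.balances.get(player, 0) + event["amount"]
        self.seen.add(event_id)
        return True


if __name__ == "__main__":
    consumer = RewardConsumer()
    evt = {"event_id": "e-1", "player_id": "p1", "amount": 50}
    print(consumer.handle(evt), consumer.handle(evt), consumer.balances)
```

The key point is that the producer assigns the id, not the consumer. A retry must carry the same id as the first attempt.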

Protect services with backpressure and circuit breakers. Use token bucket rate limits on login and gifting. Make idempotency keys for payments and claims. Shard only when you have a key that spreads load. If not, try vertical partitioning first (hot vs cold tables). CQRS can help when read and write needs are very different.
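A token bucket allows short bursts up to a cap and then holds callers to a steady rate. The sketch below is single-process and not thread-safe; the class name and the optional `now` argument (added so the clock can be injected in tests) are assumptions. A shared limiter for login or gifting would keep the bucket in a store every node can reach.

```python
import time
from typing import Optional


class TokenBucket:
    """Refills at rate_per_sec, holds at most capacity tokens."""

    def __init__(self, rate_per_sec: float, capacity: float,
                 now: Optional[float] = None):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.tokens = capacity  # start full so the first burst is allowed
        self.updated = time.monotonic() if now is None else now

    def allow(self, cost: float = 1.0, now: Optional[float] = None) -> bool:
        """Spend cost tokens if available; otherwise reject the request."""
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


if __name__ == "__main__":
    bucket = TokenBucket(rate_per_sec=5, capacity=10)
    print(sum(bucket.allow() for _ in range(20)))  # roughly the burst size
```

Size the capacity for a normal burst (a squad logging in together) and the rate for what the downstream service can take all day.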

Build vs Buy Matrix for Critical Components

Component	Main risks	Build (gain; cost)	Buy (gain; cost)	Decision signals
Session cache	Thundering herds, eviction storms	Full control; but on-call load, upgrades, failover drills	Multi-AZ, support; but egress fees, hard limits	Hit ratio ≥ 70%, >200k RPM, eviction spikes at peak
Real-time messaging (pub/sub)	Fan-out, ordering, retries	Custom protocol, tight control; but complex ops	Global topics; but quotas, regional ties	>1M msgs/min, multi-region listeners, strict order need
Matchmaking	Fairness vs queue time; hot shards	Tuned logic; but heavy test suite	Faster start; but opaque algorithm	p95 match start > 8s at 100k CCU
Data lake for telemetry	Ingest bursts, schema drift	Flexible stack; but governance toil	Managed ingest; but cost shocks	>2 TB/day events, ≥30-day retention, wide schema
Payments/KYC bridge	Retries, timeouts, audits	Tailored UX; but long audits	Fast rollout; but vendor lock-in	Chargebacks >1%, cashout SLA >24h

LiveOps and content pipelines

Ship content and fixes without fear. Use small changes, short deploys, and fast rollbacks.

CI/CD: build once, promote across stages. Keep prod parity.
DB changes: add columns first, switch reads, then drop old fields.
Feature flags: flip on for 1%, then 10%, then 50%, then all.

To add control to rollouts, look at progressive delivery with feature flags. Tie flags to alerts so you can turn off a feature when p95 jumps.
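A percentage rollout needs stable buckets: a player who got the feature at 1% must still have it at 10%. One common way is to hash the flag name with the player id. This is a sketch under that assumption; `bucket` and `is_enabled` are illustrative names, not the API of any flag product.

```python
import hashlib


def bucket(flag: str, player_id: str) -> int:
    """Map a (flag, player) pair to a stable bucket from 0 to 99."""
    digest = hashlib.sha256(f"{flag}:{player_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def is_enabled(flag: str, player_id: str, percent: int) -> bool:
    """True when the player's bucket falls under the rollout percent."""
    return bucket(flag, player_id) < percent


if __name__ == "__main__":
    for pct in (1, 10, 50, 100):
        on = sum(is_enabled("new-lobby", f"player-{i}", pct) for i in range(10_000))
        print(pct, on)
```

Hashing the flag name in means each flag gets its own 1% of players, so the same unlucky cohort does not test every new feature.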

Regions, regulations, and payments

“Just add regions” can break rules and wallets. Data must stay where laws say. Audit logs must be clean. If you touch real money, review the UK’s Remote Technical Standards for a sense of control needs in regulated ops. Build with these in mind from day one.

Payments add more limits. Cards need PCI scope. Split data, tokenise, and keep only what you must. Read the PCI DSS overview and plan how you will pass audits with the least surface area.

Player trust lives on payout speed and KYC ease. Measure cashout time end to end, not just API time. Keep a clear path for retries and re-checks. Independent operator reviews can help you benchmark the user view on this. For example, Swedish mobile players track payout and ID steps closely; a resource like mobilcasino Sverige surfaces how real users feel about mobile payout speed and KYC flow. Use this as a mirror for your own UX and infra gaps.

Cost model and FinOps reality check

Price the shape of traffic, not just peak CCU. A simple start:

Egress cost = (egress per CCU per min, in GB) × (avg CCU per region) × (mins) × (regions) × (price per GB).
DB IOPS = writes/user action × actions/sec × replication factor.
Cache RAM = hot keys × avg value size × headroom (1.3–1.6x).
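The three formulas are easy to run as code. In this sketch the egress product is treated as a volume, so a price per GB (an input you supply from your provider's rate card) is needed to turn it into money. It also assumes CCU is averaged per region, since the formula multiplies by regions. Function names and the decimal KB-to-GB conversion are illustrative choices.

```python
def egress_gb(kb_per_ccu_min: float, avg_ccu_per_region: float,
              minutes: float, regions: int) -> float:
    """Egress volume in GB (decimal: 1 GB = 1,000,000 KB)."""
    return kb_per_ccu_min * avg_ccu_per_region * minutes * regions / 1_000_000


def egress_cost(volume_gb: float, price_per_gb: float) -> float:
    """Turn volume into cost; price_per_gb comes from your own contract."""
    return volume_gb * price_per_gb


def db_iops(writes_per_action: float, actions_per_sec: float,
            replication_factor: int) -> float:
    """Write IOPS the storage layer must absorb."""
    return writes_per_action * actions_per_sec * replication_factor


def cache_ram_gb(hot_keys: int, avg_value_bytes: int, headroom: float = 1.5) -> float:
    """Cache size in GB; headroom should sit in the 1.3 to 1.6 range."""
    return hot_keys * avg_value_bytes * headroom / 1e9


if __name__ == "__main__":
    # 120 KB/min per CCU, 100k CCU, one hour, one region.
    print(egress_gb(120, 100_000, 60, 1))
```

At 120 KB/min per CCU, one hour at 100k CCU in one region is 720 GB. Run the same numbers per path (CDN, APIs, webhooks) to see which one drives the bill.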

Set budgets and alerts per unit (per 1k CCU). Review unit costs weekly. See FinOps best practices for clean ways to track, tag, and forecast. Rightsize nodes, use spot for workers, and cache at the edge to cut egress.

Observability first, not last

Write SLOs before launch. Choose user-facing goals. For example: “p95 match start under 8s” and “auth error rate under 0.5% over any 10-minute window.” For a guide, use the SRE workbook guidance on SLOs.
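Both example SLOs can be checked from raw samples. The sketch below uses the nearest-rank method for p95, which is one of several percentile definitions; your metrics backend may interpolate and give a slightly different number. Function names are illustrative, and the window of samples is whatever you pass in.

```python
import math


def percentile(samples: list, p: int) -> float:
    """Nearest-rank percentile: the smallest value with at least p% at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def meets_slos(match_start_secs: list, auth_errors: int, auth_requests: int) -> dict:
    """Check the two example SLOs for one window of data."""
    p95 = percentile(match_start_secs, 95)
    error_rate = auth_errors / auth_requests if auth_requests else 0.0
    return {
        "match_start_p95": p95 < 8.0,          # p95 match start under 8s
        "auth_error_rate": error_rate < 0.005,  # auth errors under 0.5%
    }


if __name__ == "__main__":
    window = [4.0] * 90 + [12.0] * 10
    print(meets_slos(window, auth_errors=3, auth_requests=1000))
```

Note how ten slow starts out of a hundred break the p95 goal even though the average still looks healthy.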

Measure RED (rate, errors, duration) for every service. Track USE (utilization, saturation, errors) for infra. Use Prometheus for metrics and alerting, and chart the results in Grafana dashboards. Add synthetic checks from three continents. Tie alerts to runbooks with clear, short steps.

Security baseline and anti-cheat at the edge

Lock down basics: WAF, rate limits, token binding, and strict secrets hygiene. Do not trust the client. Move checks to the server and the edge.

Use a Zero Trust reference to shape auth between services. Review OWASP API threats and fix the top risks in your API gateway. Add device signals with care. Keep anti-cheat checks outside the game loop when you can.

Disaster avoidance: multi-region without split-brain

Be clear: most real-time games do not need active/active on the DB. The blast radius is too big. Use active/passive, or event logs with read-only replicas, and fail over by following a runbook.

Strong global writes are rare and costly. If you must, study multi-region database trade-offs like read latency and commit waits. For many games, an event log with per-region writes and global async fan-out is safer.

Practice failure. Kill pods. Break links. Drop a zone. Measure impact and time to heal. Chaos engineering drills help you learn without real pain.

Vendor selection scorecard

Score vendors on what matters to your game and team. Keep it short and numeric.

Latency footprint (ms to your top 5 cities).
Support SLAs (hours to human, depth of help).
Egress pricing (per GB, cross-region fees).
Lock-in risk (APIs, data export ease).
Limits (max topics, connections, shards).
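One way to keep the scorecard short and numeric is a weighted sum over the five criteria. The weights and the 1-to-5 scale below are placeholders, not recommendations; set them from what hurts your game most.

```python
# Placeholder weights for the five criteria above; they must sum to 1.0.
WEIGHTS = {
    "latency": 0.30,
    "support": 0.15,
    "egress_pricing": 0.25,
    "lock_in": 0.15,
    "limits": 0.15,
}


def score(vendor: dict, weights: dict = WEIGHTS) -> float:
    """Weighted score on the same 1-5 scale as the inputs (5 is best)."""
    missing = set(weights) - set(vendor)
    if missing:
        raise ValueError(f"missing criteria: {sorted(missing)}")
    return sum(weights[name] * vendor[name] for name in weights)


if __name__ == "__main__":
    vendor_a = {"latency": 4, "support": 3, "egress_pricing": 2, "lock_in": 4, "limits": 5}
    vendor_b = {"latency": 3, "support": 5, "egress_pricing": 4, "lock_in": 3, "limits": 3}
    print(round(score(vendor_a), 2), round(score(vendor_b), 2))
```

Agree on the weights before you look at any vendor. Changing them after the scores are in is how favorites win.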

Launch day runbook and 24-hour rollback plan

Write it now. Keep it to one page. Make it easy to follow when stress is high.

Preconditions: load tests pass, flags set, runbooks linked, dashboards ready.
Traffic ramp: 10% → 25% → 50% → 100% with hold time at each step.
Rollback rules: name the exact metrics and thresholds to trigger rollback.
Comms tree: who calls whom, and on what channel.
Watch list: auth errors, match start p95/p99, queue lag, cache evictions, egress.
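Rollback rules work best when they are data, not judgment calls. The sketch below encodes the ramp steps from the runbook and two example thresholds. The thresholds reuse the SLO examples from earlier in this guide as stand-ins; the metric names and the choice to treat a missing metric as a breach are assumptions.

```python
RAMP_STEPS = [10, 25, 50, 100]  # percent of traffic, from the runbook above

# Example limits only; name your own metrics and thresholds.
ROLLBACK_RULES = {
    "auth_error_rate": 0.005,   # fraction of auth requests failing
    "match_start_p95_s": 8.0,   # seconds
}


def breached(metrics: dict, rules: dict = ROLLBACK_RULES) -> list:
    """Names of rules over their limit; a missing metric counts as a breach."""
    return sorted(
        name for name, limit in rules.items()
        if metrics.get(name, float("inf")) > limit
    )


def next_step(current_percent: int, metrics: dict) -> int:
    """Next traffic percent after the hold time, or 0 to signal rollback."""
    if breached(metrics):
        return 0
    later = [step for step in RAMP_STEPS if step > current_percent]
    return later[0] if later else current_percent


if __name__ == "__main__":
    print(next_step(10, {"auth_error_rate": 0.001, "match_start_p95_s": 4.1}))
```

Failing closed on missing data is deliberate: if the dashboard is dark during a ramp, that is a reason to stop, not to continue.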

Migration path for the monolith you already have

Do not rewrite. Strangle. Move load to the edge first: static, images, and chat presence. Put session state in Redis. Add a thin matchmaking service. Keep the rest as is until stable.

Think hard before adding a mesh. It adds power and cost. If you want a survey of options, see service mesh considerations. Often, start with simple mTLS and an API gateway.

90-day plan: weeks 1–3, add metrics and SLOs; weeks 4–6, move session state; weeks 7–9, add queues for writes; weeks 10–12, split one hot endpoint; week 13, test failover, then clean up configs and docs.

Case notes and further reading

Riot has shared deep write-ups on netcode and scale. Start here: Riot Games tech blog.
Unity’s online services doc has notes on scale and patterns: Unity multiplayer scale notes.
If you build with Unreal, read their net guide: Unreal networking guide.

Monday morning checklist

Define p95/p99 SLOs for lobby and match start.
Measure current egress per CCU by region.
Add circuit breakers to the top 2 chattiest services.
Raise Redis memory until eviction rate < 0.5% at peak.
Add synthetic canaries from 3 continents.
Precompute a rollback plan for your last infra change.
Set WAF bot rules and API rate limits on auth endpoints.
Run a 60-minute chaos test on message broker failover.
Price a 20% traffic spike: instances, egress, storage IOPS.
Book a game-day rehearsal before your next content drop.

Short FAQ

Is serverless good for real-time play?
It can work for short, stateless calls like auth, lobby read, or leaderboards. It is hard for long stateful loops. Watch cold starts and concurrency caps. Keep the game loop on long-lived pods or VMs.

UDP or TCP/TLS?
Use UDP for real-time state when you can tolerate some packet loss. Use TCP/TLS for chat, store, and anything that must not lose data. Many games use both. Keep packets small.

One database for all tenants?
Multi-tenant can save money, but a noisy neighbor can hurt you. If you go multi-tenant, add quotas and clear limits. For a clean path, use one DB per big region or big title.

When should we shard?
Shard when you have a clear key that spreads load (user id, match id). If your top 1% of keys take 20%+ of traffic, a shard can help. Test the split first with reads.
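Before you split anything, measure the skew. This sketch takes request counts per key and returns the share of traffic that goes to the top 1% of keys, which you can compare against the 20% rule of thumb above. The function name and the input shape (a dict of key to hit count) are assumptions.

```python
import math
from collections import Counter


def top_share(hits_per_key: dict, top_fraction: float = 0.01) -> float:
    """Fraction of all requests served by the busiest top_fraction of keys."""
    counts = Counter(hits_per_key)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Always look at one key or more, even for tiny key sets.
    n = max(1, math.ceil(len(counts) * top_fraction))
    top_hits = sum(count for _, count in counts.most_common(n))
    return top_hits / total


if __name__ == "__main__":
    hits = {"match-0": 250}
    hits.update({f"match-{i}": 10 for i in range(1, 100)})
    print(round(top_share(hits), 3))
```

Run it on a peak-hour sample, not a daily average. Hot keys tend to hide in the mean.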

Author

Editorial Team — This guide was compiled by engineers who have shipped online titles with six-figure peak CCU. We test numbers before we print them.