Capacity Planning for Microservices

Capacity planning is where observability meets engineering economics. Undersized services fail under stress, while oversized services hide inefficiencies and inflate cost. The goal is not maximum headroom everywhere. The goal is to know how your system behaves under load and size it deliberately.

The values below are starting-point defaults. Always run performance tests to validate them and tune them for your specific workload; every service has different memory, CPU, and connection profiles.

Quick Definitions

  • Request: the minimum CPU or memory Kubernetes reserves for a pod
  • Limit: the maximum CPU or memory a pod is allowed to consume
  • P99 latency: the response time at or below which 99% of requests complete
  • Connection pool: a managed set of reusable database or cache connections shared by application threads
  • Headroom: spare capacity available to absorb spikes without immediate failure
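
To make the P99 definition concrete, here is a minimal sketch of the nearest-rank percentile method over raw latency samples. The function name is hypothetical. Nearest-rank is one of several percentile definitions, and many monitoring tools interpolate or approximate from histograms instead, so their numbers can differ slightly.

```python
import math


def percentile(samples_ms: list[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample such that at least
    p% of all samples are less than or equal to it."""
    if not samples_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(samples_ms)
    # Rank is 1-based; ceil() picks the first sample that covers p% of requests.
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# 100 requests taking 1 ms, 2 ms, ..., 100 ms: 99 of them finish in <= 99 ms.
latencies = [float(ms) for ms in range(1, 101)]
print(percentile(latencies, 99))  # 99.0
```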

Why Get Capacity Right

Under-provisioned services cause cascading failures: one slow pod can bring down the entire request path. Over-provisioned services waste money and mask problems (a service shouldn’t need 8 GB of RAM for CRUD operations).

The goal is right-sizing: enough headroom to handle traffic spikes without burning the cloud budget.

Key takeaway: Capacity planning is a balancing exercise between resilience, performance, and cost.

Capacity Planning Workflow

1. BASELINE → Run load test at expected peak traffic
2. MEASURE → Record CPU%, memory%, connection pool usage, GC pause time
3. TUNE → Adjust limits until p99 latency is stable and CPU < 60%
4. VALIDATE → Re-run load test at 2× peak (test the headroom)
5. DEPLOY → Set requests/limits based on validated numbers
6. MONITOR → Watch for drift over the first two weeks post-deploy
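
The TUNE step’s exit condition can be written down as an explicit check, so that every load-test run is judged the same way. This is a sketch under stated assumptions: the function name, the input shape (one p99 value per measurement window), and the 10% drift tolerance used to define "stable" are all illustrative. Only the 60% CPU ceiling comes from step 3.

```python
def tune_step_done(p99_per_window_ms: list[float], peak_cpu_pct: float,
                   cpu_ceiling_pct: float = 60.0,
                   max_p99_drift: float = 0.10) -> bool:
    """Return True when a load-test run meets the TUNE exit criteria.

    p99_per_window_ms: p99 latency for each successive window of the run
                       (e.g. one value per minute); hypothetical input shape.
    peak_cpu_pct:      highest CPU usage observed during the run.
    """
    if not p99_per_window_ms or min(p99_per_window_ms) <= 0:
        raise ValueError("need positive p99 values for at least one window")
    # Step 3: CPU must stay below the ceiling at expected peak traffic.
    if peak_cpu_pct >= cpu_ceiling_pct:
        return False
    # Assumed definition of "stable": the spread between the best and worst
    # window is within max_p99_drift of the best window.
    best, worst = min(p99_per_window_ms), max(p99_per_window_ms)
    return (worst - best) / best <= max_p99_drift


print(tune_step_done([180, 185, 190], peak_cpu_pct=55))  # True
print(tune_step_done([180, 240, 310], peak_cpu_pct=55))  # False: p99 still climbing
```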

Key takeaway: Sizing should come from measurement and retesting, not from copying values blindly between services.

Kubernetes (Infra)

Kubernetes settings determine whether pods can start reliably, scale safely, and survive real traffic instead of only synthetic happy paths.

Config | Production | Why
Minimum pods | 3 | Handles loss of 1 pod without impact; enables zero-downtime rolling deploys
Maximum pods | 10 (tune per requirements) | Sets upper bound on cost and connection pool usage
resources.requests.memory | 2 GB | Scheduler uses this for pod placement; set to typical working set
resources.requests.cpu | 1 CPU | Scheduler uses this; set to steady-state usage, not peak
resources.limits.memory | 3 GB | OOM kill threshold; 1.5× requests gives headroom for spikes
resources.limits.cpu | 2 CPU | CPU is throttled (not killed) at this limit; 2× requests allows bursts
Autoscale CPU threshold | 70% | Utilization is measured against resources.requests.cpu, not the limit. At 70% CPU we add pods; don’t wait until 80%, because scaling takes 1–2 min
Autoscale memory threshold | 90% | Memory autoscale is a safety net; ideally tune requests so it is rarely hit
JVM -Xms | 1.5 GB | Pre-allocate heap to avoid GC pressure during warm-up
JVM -Xmx | 1.5 GB | Same as Xms to prevent heap resizing pauses in production
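
The autoscale thresholds above act as Horizontal Pod Autoscaler utilization targets. The Kubernetes documentation describes the HPA scaling decision as desiredReplicas = ceil(currentReplicas × currentUtilization / targetUtilization), and it skips scaling when that ratio is within a tolerance of 1.0 (0.1 by default). The sketch below applies the formula and clamps the result to the 3–10 pod range from the table. It is a simplified model that ignores stabilization windows, scaling policies, and pod readiness, and the function name is hypothetical.

```python
import math


def desired_replicas(current_replicas: int, current_util_pct: float,
                     target_util_pct: float = 70.0, min_pods: int = 3,
                     max_pods: int = 10, tolerance: float = 0.1) -> int:
    """Simplified HPA calculation for a single utilization metric."""
    ratio = current_util_pct / target_util_pct
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: leave the replica count alone.
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    # The configured min/max pod bounds always win.
    return max(min_pods, min(max_pods, desired))


print(desired_replicas(4, 90))   # 6  (4 x 90/70 = 5.14, rounded up)
print(desired_replicas(5, 72))   # 5  (within the 10% tolerance)
print(desired_replicas(9, 140))  # 10 (18 wanted, capped at max pods)
```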

Why requests ≠ limits?

requests = what the scheduler reserves. limits = the hard cap.

Setting them equal eliminates bursting: under sudden load, the pod is throttled at its CPU request and OOM-killed if it grows past its memory request. Setting limits at 1.5–2× requests allows burst absorption.
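
One way to keep these ratios consistent across services is to derive limits from measured requests instead of typing them by hand. This is a minimal sketch, assuming requests are expressed in millicores and MiB (the table’s 1 CPU / 2 GB becomes 1000m / 2048Mi). The 2× CPU and 1.5× memory multipliers come from the table; the function name and output layout, which mirrors a container’s resources block, are illustrative.

```python
def container_resources(cpu_request_m: int, mem_request_mi: int,
                        cpu_burst: float = 2.0, mem_burst: float = 1.5) -> dict:
    """Build a Kubernetes-style resources block from measured requests.

    cpu_request_m:  steady-state CPU in millicores (1000m = 1 CPU)
    mem_request_mi: typical working set in MiB
    """
    return {
        "requests": {"cpu": f"{cpu_request_m}m", "memory": f"{mem_request_mi}Mi"},
        "limits": {
            # Usage above the CPU limit is throttled rather than killed.
            "cpu": f"{int(cpu_request_m * cpu_burst)}m",
            # A container exceeding its memory limit is OOM-killed.
            "memory": f"{int(mem_request_mi * mem_burst)}Mi",
        },
    }


print(container_resources(1000, 2048))
# {'requests': {'cpu': '1000m', 'memory': '2048Mi'}, 'limits': {'cpu': '2000m', 'memory': '3072Mi'}}
```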

Tip: JVM Memory Alternative

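If you prefer not to hard-code -Xms/-Xmx, use container-aware flags instead:

-XX:+UseContainerSupport -XX:MaxRAMPercentage=70.0

This caps the heap at 70% of the container memory limit, automatically respecting Kubernetes limits. MaxRAMPercentage sets only the maximum heap; to keep the initial heap equal to it, as the warning below recommends, also set -XX:InitialRAMPercentage=70.0.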
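
The arithmetic behind MaxRAMPercentage is worth checking before deploying. This rough sketch computes the implied maximum heap and what is left for non-heap memory such as metaspace, thread stacks, and direct buffers. The function name is hypothetical, and the JVM rounds and aligns the real heap size, so treat the result as approximate.

```python
def heap_budget_mib(container_limit_mib: int,
                    max_ram_pct: float = 70.0) -> tuple[int, int]:
    """Approximate (max_heap, non_heap_headroom) in MiB for -XX:MaxRAMPercentage."""
    max_heap = int(container_limit_mib * max_ram_pct / 100)
    return max_heap, container_limit_mib - max_heap


# A 3 GB (3072 MiB) memory limit, as in the Kubernetes table.
heap, headroom = heap_budget_mib(3072)
print(heap, headroom)  # 2150 922
```

With the table’s 3 GB limit this gives a heap of roughly 2.1 GB, larger than the explicit 1.5 GB -Xmx. The two approaches are therefore not interchangeable without re-running the load test.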

Warning

Keep Xms = Xmx in production. Letting the JVM start with a small heap and grow to Xmx causes GC pressure during ramp-up (your first traffic spike after a deploy can trigger full GCs).

Common Pitfall

Teams often copy memory and CPU limits from another service because the tech stack is similar. That usually ignores actual differences in payload size, concurrency, caching behaviour, and background processing.

Key takeaway: Kubernetes defaults are starting points only. Validate them under realistic concurrency and traffic mix.

Postgres

Database capacity is often the real ceiling for a service, even when pods still look healthy.

Config | Production | Notes
CPU | 4–8 cores | 4 cores handles most OLTP workloads; 8 cores for high write throughput
Memory | 8–16 GB | More RAM = larger shared_buffers = fewer disk reads
Storage (SSD) | 300 GB | Includes data, WAL, and temporary sort files
max_connections | 300 (based on RAM) | Each connection uses ~5–10 MB RAM. Don’t exceed 500 without pgBouncer

Why Not Set max_connections Higher?

Each Postgres connection holds memory for its query context, sort buffers, and transaction state. Setting max_connections = 1000 without a connection pooler means up to 1000 × 10 MB = 10 GB of RAM just for connections, leaving little for shared_buffers and actual query execution.
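
The same back-of-the-envelope estimate can be scripted to compare connection overhead against the server’s RAM. This sketch assumes the article’s upper-end figure of ~10 MB per connection and decimal units (1 GB = 1000 MB) to match the arithmetic above. The function names are hypothetical, and real per-connection usage depends on work_mem, query shape, and session state.

```python
def connection_memory_mb(max_connections: int,
                         per_connection_mb: float = 10.0) -> float:
    """Worst-case RAM held by connections alone (rough estimate)."""
    return max_connections * per_connection_mb


def ram_left_mb(total_ram_mb: float, max_connections: int,
                per_connection_mb: float = 10.0) -> float:
    """RAM remaining for shared_buffers, OS cache, and queries (never below 0)."""
    used = connection_memory_mb(max_connections, per_connection_mb)
    return max(0.0, total_ram_mb - used)


print(connection_memory_mb(1000))  # 10000.0 (about 10 GB)
print(ram_left_mb(16_000, 1000))   # 6000.0 on a 16 GB server
print(ram_left_mb(16_000, 300))    # 13000.0 with the table's default
```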

Use HikariCP (application-level pooling) and set max_connections conservatively.

Use pgtune to derive Postgres settings such as shared_buffers and effective_cache_size from the server’s actual CPU, RAM, and storage; do not rely on the default configuration.

Real-World Scenario

A team may respond to "too many connections" errors by simply increasing max_connections. That can postpone the symptom briefly while making the database slower overall, because RAM is now tied up by idle and competing sessions.

Key takeaway: More connections are not automatically more capacity. Database throughput often improves when connection counts are controlled.

HikariCP Connection Pool

Application pool sizing must reflect both database limits and the maximum number of pods that can exist at the same time.

Property | Value | Notes
connection-timeout | 20,000 ms | Time a thread waits for a connection from the pool. Requests fail with a ConnectionTimeout error after this
minimum-idle | 5–10 | Connections kept warm. Lower values mean slower first requests after an idle period
maximum-pool-size | 10–20 | See formula below
idle-timeout | 10,000 ms | Connections idle longer than this are retired from the pool (down to minimum-idle)
max-lifetime | 1,800,000 ms (30 min) | Max connection age. Set it a few seconds shorter than any connection time limit imposed by the database or network (firewalls, load balancers) to avoid stale connections
keepaliveTime | 30,000 ms | Periodically pings idle connections to keep them alive through load balancers

Pool Size Formula

maximum-pool-size = floor(postgres_max_connections / max_pods)
Example:
postgres max_connections = 600
max_pods = 40
maximum-pool-size = floor(600 / 40) = 15

Warning

Always leave 10–20 connections reserved for admin operations (pg_dump, direct psql access, monitoring queries). Don’t assign 100% of max_connections to the pool.

Adjusted formula:

maximum-pool-size = floor((postgres_max_connections - 20) / max_pods)
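
The adjusted formula translates directly into code. The sketch below keeps the reserved admin connections as a parameter (20, per the warning above). It also adds an optional count of extra pods that may briefly coexist during a rolling update. That surge parameter is a hypothetical addition for illustration, reflecting that the pool must fit the maximum number of pods that can exist at once.

```python
def max_pool_size(postgres_max_connections: int, max_pods: int,
                  reserved: int = 20, surge_pods: int = 0) -> int:
    """Per-pod maximum-pool-size so that every pod's pool fits in Postgres.

    reserved:   connections kept free for pg_dump, psql, monitoring queries
    surge_pods: hypothetical extra pods alive during a rolling deploy
    """
    usable = postgres_max_connections - reserved
    pods = max_pods + surge_pods
    if usable <= 0 or pods <= 0:
        raise ValueError("no connections left for application pools")
    # Floor division: rounding up would let the pods overshoot max_connections.
    return usable // pods


print(max_pool_size(600, 40, reserved=0))     # 15, the unadjusted example
print(max_pool_size(600, 40))                 # 14 = floor(580 / 40)
print(max_pool_size(600, 40, surge_pods=10))  # 11 = floor(580 / 50)
```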

Common Pitfall

If each pod is given an oversized pool “just in case,” autoscaling can multiply that decision into hundreds of possible connections that the database cannot actually support.

Key takeaway: Pool sizing is a system-wide calculation, not a per-service guess.

Redis Connection Pool

Redis is fast, but poor timeout and pool settings can still turn it into a bottleneck under concurrency.

Property | Value | Notes
connectTimeout | 5,000 ms | Time to establish the TCP connection to Redis
soTimeout | 1,000 ms | Time to wait for a response after sending a command
redis-connection-max-total | 15 | For 40 pods max → 600 total connections
redis-connection-max-idle | 15 | Max idle connections to keep in the pool
redis-connection-min-idle | 10 | Connections kept warm; avoids connection ramp-up under load
maxWaitMillis | 1,000 ms | Time to wait for the pool to return a connection. Should equal soTimeout
min-evictable-idle-time-millis | 60 sec – 30 min | Higher value = less connection churn and ramp-up. Match to your traffic pattern
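
Because every pod gets its own pool, Redis has to accept redis-connection-max-total × max_pods client connections, plus anything else connecting to the same instance. This is a minimal sketch of that check. The function name and the other_clients allowance are hypothetical. 10,000 is the default Redis maxclients, which your configuration or managed service may set differently.

```python
def redis_client_budget(max_total_per_pod: int, max_pods: int,
                        maxclients: int = 10_000,
                        other_clients: int = 0) -> tuple[int, bool]:
    """Return (worst-case client connections, fits within maxclients?)."""
    needed = max_total_per_pod * max_pods + other_clients
    return needed, needed <= maxclients


print(redis_client_budget(15, 40))                  # (600, True)
print(redis_client_budget(15, 40, maxclients=500))  # (600, False)
```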

Key takeaway: Redis pool settings should absorb burst traffic without creating more client pressure than the cache can handle.

Signs Your Capacity Is Wrong

These symptoms are often easier to observe than the original sizing mistake, so they are useful during tuning and early production monitoring.

Symptom | Likely Cause | Fix
High GC pause time (>100 ms) | Heap too small or Xms < Xmx | Increase Xmx or set Xms = Xmx
ConnectionTimeout errors | Pool exhausted | Increase maximum-pool-size (within the pool size formula) or reduce connections held by other services
Pod OOMKilled | Memory limit too low, or a memory leak | Increase resources.limits.memory or fix the memory leak
HPA oscillating (scaling up and down rapidly) | Autoscale threshold too sensitive or cooldown too short | Raise CPU threshold to 70%; set the stabilization window to 5 min
Postgres "too many connections" | Connection pool misconfigured | Verify the pool size formula; add pgBouncer if > 300 connections are needed
Redis "ERR max number of clients reached" | Redis maxclients too low or pool too large | Compare redis-connection-max-total × max_pods with the Redis maxclients setting

Warning

After any capacity change, monitor connection pool metrics (idle, pending, active) for at least one full traffic cycle before considering the sizing stable.
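
When reviewing those pool metrics over a traffic cycle, two numbers usually tell the story: how close active connections came to the pool maximum, and whether any threads had to wait. This sketch assumes you have exported (active, idle, pending) samples from your metrics system. The sample type, the function name, and the 90% warning threshold are illustrative assumptions.

```python
from typing import NamedTuple


class PoolSample(NamedTuple):
    active: int   # connections currently in use
    idle: int     # connections open but unused
    pending: int  # threads waiting for a connection


def summarize_pool(samples: list[PoolSample], max_pool_size: int,
                   warn_ratio: float = 0.9) -> dict:
    """Summarize one traffic cycle of connection pool samples."""
    if not samples:
        raise ValueError("need at least one sample")
    peak_active = max(s.active for s in samples)
    peak_pending = max(s.pending for s in samples)
    return {
        "peak_utilization": peak_active / max_pool_size,
        "threads_waited": peak_pending > 0,
        # Waiting threads, or running close to the cap, suggest the sizing
        # is not yet stable for this traffic pattern.
        "needs_attention": peak_pending > 0
        or peak_active >= warn_ratio * max_pool_size,
    }


cycle = [PoolSample(4, 11, 0), PoolSample(12, 3, 0), PoolSample(15, 0, 2)]
print(summarize_pool(cycle, max_pool_size=15))
# {'peak_utilization': 1.0, 'threads_waited': True, 'needs_attention': True}
```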

Final Takeaways

  • Start with defaults, but treat them as hypotheses to validate with load testing.
  • Size Kubernetes pods, databases, and connection pools together rather than independently.
  • Leave headroom for spikes, admin access, and autoscaling side effects.
  • Watch for second-order symptoms such as GC pauses, queueing, and connection exhaustion.
  • Recheck capacity after major feature, traffic, or workload shape changes.

Written by Balaji G

Discover more from 2G