Capacity planning is where observability meets engineering economics. Undersized services fail under stress, while oversized services hide inefficiencies and inflate cost. The goal is not maximum headroom everywhere. The goal is to know how your system behaves under load and size it deliberately.
The values in this guide are starting-point defaults. Always run performance tests to validate and tune them for your specific workload. Every service has different memory, CPU, and connection profiles.
Quick Definitions
- Request: the minimum CPU or memory Kubernetes reserves for a pod
- Limit: the maximum CPU or memory a pod is allowed to consume
- P99 latency: the response time within which 99% of requests complete (only the slowest 1% take longer)
- Connection pool: a managed set of reusable database or cache connections shared by application threads
- Headroom: spare capacity available to absorb spikes without immediate failure
Why Get Capacity Right
Under-provisioned services cause cascading failures: one slow pod can bring down the entire request path. Over-provisioned services waste money and mask problems (a service shouldn’t need 8 GB of RAM for CRUD operations).
The goal is a right-sized service: enough headroom to handle traffic spikes without burning cloud budget.
Key takeaway: Capacity planning is a balancing exercise between resilience, performance, and cost.
Capacity Planning Workflow
1. BASELINE → Run load test at expected peak traffic
2. MEASURE → Record CPU%, memory%, connection pool usage, GC pause time
3. TUNE → Adjust limits until p99 latency is stable and CPU < 60%
4. VALIDATE → Re-run load test at 2× peak (test the headroom)
5. DEPLOY → Set requests/limits based on validated numbers
6. MONITOR → Watch for drift over the first two weeks post-deploy
Key takeaway: sizing should come from measurement and retesting, not from copying values blindly between services.
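The TUNE step in the workflow above can be turned into a simple pass/fail gate. The sketch below is one possible reading, not a standard: it uses the nearest-rank definition of p99, and it interprets "stable" as "within a latency budget". The names `p99`, `tune_step_passes`, and `p99_budget_ms` are hypothetical; only the 60% CPU ceiling comes from the workflow.

```python
def p99(latencies_ms):
    """Nearest-rank 99th percentile of a list of latency samples."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies_ms)
    # ceil(0.99 * n) in integer arithmetic, as a 1-based rank
    rank = (99 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def tune_step_passes(latencies_ms, cpu_utilisation, p99_budget_ms, cpu_ceiling=0.60):
    """True when p99 is within budget (assumed meaning of "stable") and CPU is below the ceiling."""
    return p99(latencies_ms) <= p99_budget_ms and cpu_utilisation < cpu_ceiling


# Illustrative load-test result: 100 samples with two slow outliers.
samples = [20] * 98 + [150, 900]
print(p99(samples))                          # 150
print(tune_step_passes(samples, 0.55, 200))  # True
print(tune_step_passes(samples, 0.65, 200))  # False: CPU is above 60%
```

Note that the single 900 ms outlier does not move p99 here, which is why p99 is usually read alongside maximum latency and error rate rather than on its own.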
Kubernetes (Infra)
Kubernetes settings determine whether pods can start reliably, scale safely, and survive real traffic instead of only synthetic happy paths.
| Config | Production | Why |
|---|---|---|
| Minimum pods | 3 | Handles loss of 1 pod without impact; enables zero-downtime rolling deploys |
| Maximum pods | 10 (tune per requirements) | Sets upper bound on cost and connection pool usage |
| `resources.requests.memory` | 2 GB | Scheduler uses this for pod placement; set to typical working set |
| `resources.requests.cpu` | 1 CPU | Scheduler uses this; set to steady-state usage, not peak |
| `resources.limits.memory` | 3 GB | OOM kill threshold; 1.5× requests gives headroom for spikes |
| `resources.limits.cpu` | 2 CPU | CPU is throttled (not killed) at this limit; allow 2× for burst |
| Autoscale CPU threshold | 70% | Add pods at 70% CPU. Don’t wait until 80%, because scaling takes 1–2 min |
| Autoscale Memory threshold | 90% | Memory autoscale is a safety net; ideally tune requests to avoid hitting it |
| JVM `-Xms` | 1.5 GB | Pre-allocate heap to avoid GC pressure during warm-up |
| JVM `-Xmx` | 1.5 GB | Same as Xms to prevent heap resizing pauses in production |
Why requests ≠ limits?
requests = what the scheduler reserves. limits = the hard cap.
Setting them equal eliminates bursting: under sudden load, CPU is throttled as soon as usage reaches the request value. Setting limits to 1.5–2× requests lets the pod absorb bursts.
Tip: JVM Memory Alternative
If `-Xms`/`-Xmx` are not set, use container-aware flags instead: `-XX:+UseContainerSupport -XX:MaxRAMPercentage=70.0`. This sets the maximum heap to 70% of the container memory limit, automatically respecting Kubernetes limits.
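To see what the percentage flag means in practice, the arithmetic can be sketched as follows. This is a rough illustration, assuming the 3 GB memory limit from the table above and treating 1 GB as 1024 MB; the function name `max_heap_mb` is hypothetical.

```python
def max_heap_mb(container_limit_mb, max_ram_percentage=70.0):
    """Approximate maximum heap the JVM would pick for a given container limit."""
    return container_limit_mb * max_ram_percentage / 100.0


limit_mb = 3 * 1024                 # resources.limits.memory = 3 GB
heap_mb = max_heap_mb(limit_mb)     # heap ceiling at 70%
non_heap_mb = limit_mb - heap_mb    # left for metaspace, thread stacks, direct buffers

print(round(heap_mb, 1))      # 2150.4
print(round(non_heap_mb, 1))  # 921.6
```

The remainder matters because the container limit covers the whole process, not just the heap. If non-heap usage exceeds what is left over, the pod can be OOMKilled even though the heap itself never fills.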
Warning
Keep `Xms = Xmx` in production. Letting the JVM start with a small heap and grow to Xmx causes GC pressure during ramp-up (your first traffic spike after a deploy will trigger Full GCs).
Common Pitfall
Teams often copy memory and CPU limits from another service because the tech stack is similar. That usually ignores actual differences in payload size, concurrency, caching behaviour, and background processing.
Key takeaway: Kubernetes defaults are starting points only. Validate them under realistic concurrency and traffic mix.
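To reason about how the 70% CPU threshold interacts with the minimum and maximum pod counts, it helps to work through the replica calculation the Horizontal Pod Autoscaler documents: desired replicas = ceil(current replicas × current metric / target metric). The sketch below applies that formula with the table's values as defaults. It deliberately ignores the autoscaler's tolerance band and stabilization window, and the function name `desired_replicas` is hypothetical.

```python
import math


def desired_replicas(current_replicas, current_cpu_pct, target_cpu_pct=70,
                     min_pods=3, max_pods=10):
    """Simplified HPA calculation, clamped to the configured pod range.

    CPU percentages are utilisation relative to resources.requests.cpu.
    """
    raw = math.ceil(current_replicas * current_cpu_pct / target_cpu_pct)
    return max(min_pods, min(max_pods, raw))


print(desired_replicas(3, 90))    # 4  -> one pod added
print(desired_replicas(3, 35))    # 3  -> would be 2, held at the minimum
print(desired_replicas(5, 200))   # 10 -> would be 15, capped at the maximum
```

The last case is the one to watch during load tests: once the cap is reached, extra load is absorbed by the existing pods, so the per-pod limits and connection pools have to cope on their own.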
Postgres
Database capacity is often the real ceiling for a service, even when pods still look healthy.
| Config | Production | Notes |
|---|---|---|
| CPU | 4–8 cores | 4 cores handles most OLTP workloads; 8 cores for high write throughput |
| Memory | 8 GB – 16 GB | More RAM = larger shared_buffers = fewer disk reads |
| SSD | 300 GB | Includes data, WAL, and temporary sort files |
| `max_connections` | 300 (based on RAM) | Each connection uses ~5–10 MB RAM. Don’t exceed 500 without pgBouncer |
Why Not Set max_connections Higher?
Each Postgres connection holds memory for its query context, sort buffers, and transaction state. Setting max_connections = 1000 without a connection pooler means 1000 × 10 MB = 10 GB of RAM just for connections, leaving little or nothing for actual query execution.
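The same arithmetic can be run for any candidate setting. The snippet below is a small sketch using the article's rough 5–10 MB per-connection estimate and decimal units (1 GB = 1000 MB, matching the calculation above); `connection_memory_gb` is a hypothetical helper name, and real per-connection memory depends on the workload.

```python
def connection_memory_gb(max_connections, mb_per_connection):
    """Worst-case RAM consumed if every allowed connection is open."""
    return max_connections * mb_per_connection / 1000


print(connection_memory_gb(1000, 10))  # 10.0 -> exceeds an 8 GB instance
print(connection_memory_gb(300, 10))   # 3.0  -> upper estimate for the table default
print(connection_memory_gb(300, 5))    # 1.5  -> lower estimate for the table default
```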
Use HikariCP (application-level pooling) and set max_connections conservatively.
Use PGTune to derive Postgres settings such as shared_buffers and effective_cache_size from the instance’s CPU and memory. Do not rely on the default configuration.
Real-World Scenario
A team may respond to “too many connections” errors by simply increasing max_connections. That can postpone the symptom briefly while making the database slower overall because RAM is now tied up by idle and competing sessions.
Key takeaway: More connections are not automatically more capacity. Database throughput often improves when connection counts are controlled.
HikariCP Connection Pool
Application pool sizing must reflect both database limits and the maximum number of pods that can exist at the same time.
| Property | Value | Notes |
|---|---|---|
| `connection-timeout` | 20,000 ms | Time a thread waits for a connection. Requests fail after this with ConnectionTimeout |
| `minimum-idle` | 5–10 | Connections kept warm. Low = slower first requests after idle period |
| `maximum-pool-size` | 10–20 | See formula below |
| `idle-timeout` | 10,000 ms | Connections idle longer than this are retired from the pool |
| `max-lifetime` | 1,800,000 ms | Max connection age. Keep it shorter than any connection time limit imposed by the database or network so connections are retired before they go stale |
| `keepaliveTime` | 30,000 ms | Sends lightweight queries to keep connections alive through load balancers |
Pool Size Formula
maximum-pool-size = floor(postgres_max_connections / max_pods)
Example:
postgres max_connections = 600
max_pods = 40
maximum-pool-size = floor(600 / 40) = 15
Warning
Always leave 10–20 connections reserved for admin operations (`pg_dump`, direct psql access, monitoring queries). Don’t assign 100% of `max_connections` to the pool.
Adjusted formula:
maximum-pool-size = floor((postgres_max_connections - 20) / max_pods)
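The adjusted formula is easy to encode so it can be rechecked whenever max_pods or max_connections changes. The following is a minimal sketch of that calculation; the function name `max_pool_size` and its `reserved` parameter are hypothetical, with the default of 20 taken from the warning above.

```python
def max_pool_size(pg_max_connections, max_pods, reserved=20):
    """Per-pod HikariCP maximum-pool-size that keeps the fleet under the DB limit."""
    if max_pods <= 0:
        raise ValueError("max_pods must be positive")
    usable = max(pg_max_connections - reserved, 0)
    return usable // max_pods  # floor division


print(max_pool_size(600, 40))              # 14 -> with 20 connections reserved
print(max_pool_size(600, 40, reserved=0))  # 15 -> the unadjusted formula
print(max_pool_size(300, 10))              # 28 -> table defaults: 300 connections, 10 pods
```

If several services share one database, the same function applies once `pg_max_connections` is replaced by the share of connections budgeted to this service.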
Common Pitfall
If each pod is given an oversized pool “just in case,” autoscaling can multiply that decision into hundreds of possible connections that the database cannot actually support.
Key takeaway: pool sizing is a system-wide calculation, not a per-service guess.
Redis Connection Pool
Redis is fast, but poor timeout and pool settings can still turn it into a bottleneck under concurrency.
| Property | Value | Notes |
|---|---|---|
| `connectTimeout` | 5,000 ms | Time to establish TCP connection to Redis |
| `soTimeout` | 1,000 ms | Time to wait for a response after sending a command |
| `redis-connection-max-total` | 15 | For 40 pods max → 600 total connections |
| `redis-connection-max-idle` | 15 | Max idle connections to keep in pool |
| `redis-connection-min-idle` | 10 | Connections kept warm to avoid connection ramp-up under load |
| `maxWaitMillis` | 1,000 ms | Time to wait for pool to return a connection. Should equal soTimeout |
| `min-evictable-idle-time-millis` | 60 sec – 30 min | Higher value = lower connection ramp-up. Match to your traffic pattern |
Key takeaway: Redis pool settings should absorb burst traffic without creating more client pressure than the cache can handle.
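The "40 pods → 600 total connections" figure in the table is the number to compare against the Redis server's `maxclients` setting (10,000 by default in stock Redis). A rough check, with hypothetical function names and an assumed `reserved` allowance for other clients of the same Redis instance, might look like this:

```python
def redis_client_demand(max_total_per_pod, max_pods):
    """Worst-case number of Redis connections opened by the whole fleet."""
    return max_total_per_pod * max_pods


def fits_maxclients(max_total_per_pod, max_pods, maxclients, reserved=0):
    """True if the fleet's worst-case demand plus reserved clients fits under maxclients."""
    return redis_client_demand(max_total_per_pod, max_pods) + reserved <= maxclients


print(redis_client_demand(15, 40))     # 600
print(fits_maxclients(15, 40, 10000))  # True
print(fits_maxclients(15, 40, 500))    # False: 600 connections against a 500-client cap
```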
Signs Your Capacity Is Wrong
These symptoms are often easier to observe than the original sizing mistake, so they are useful during tuning and early production monitoring.
| Symptom | Likely Cause | Fix |
|---|---|---|
| High GC pause time (>100ms) | Heap too small or Xms < Xmx | Increase Xmx or set Xms = Xmx |
| `ConnectionTimeout` errors | Pool exhausted | Increase maximum-pool-size or reduce the share of max_connections used by other services |
| Pod OOMKilled | Memory limit too low | Increase resources.limits.memory or fix memory leak |
| HPA oscillating (scaling up and down rapidly) | Autoscale threshold too sensitive or cooldown too short | Raise CPU threshold to 70%, set stabilization window to 5 min |
| Postgres `too many connections` | Connection pool misconfigured | Verify formula; add pgBouncer if > 300 connections needed |
| Redis `ERR max number of clients reached` | Redis maxclients too low or pool too large | Check redis-connection-max-total × max_pods vs Redis maxclients config |
Warning
After any capacity change, monitor connection pool metrics (idle, pending, active) for at least one full traffic cycle before considering the sizing stable.
Final Takeaways
- Start with defaults, but treat them as hypotheses to validate with load testing.
- Size Kubernetes pods, databases, and connection pools together rather than independently.
- Leave headroom for spikes, admin access, and autoscaling side effects.
- Watch for second-order symptoms such as GC pauses, queueing, and connection exhaustion.
- Recheck capacity after major feature, traffic, or workload shape changes.