Go Runtime Tuning: GOMAXPROCS, GC, Scheduler
GOMAXPROCS sets how many OS threads execute Go code concurrently (default = CPU count). GOGC controls when the GC runs (default 100 = collect when the heap doubles). GOMEMLIMIT (Go 1.19+) puts a soft cap on the runtime's total memory, useful for staying within container limits. The Go scheduler is M:N; it rarely needs attention, but knowing the levers helps when it does.
What it is
The Go runtime hides almost everything: scheduling, GC, memory layout. For most code, that is the right choice. When tuning is required (containers, latency-sensitive services, big batch jobs), the runtime exposes a few knobs.
GOMAXPROCS
How many OS threads can run Go code concurrently. Default: the number of CPUs Go sees at startup.
The catch: in containers with CFS CPU limits, Go sees the host's CPU count, not the container limit. Result: GOMAXPROCS is set to (say) 32, the runtime starts 32 P's, the kernel throttles to a 2-core quota, and the runtime over-schedules into the throttle. Performance suffers.
The standard fix: import _ "go.uber.org/automaxprocs" in the main package. It reads the CFS quota and sets GOMAXPROCS to match. One line, no configuration. This is now considered a basic requirement for any Go service in a container.
GOGC
Controls when GC runs. The number is "trigger GC when the heap has grown by N percent since the last GC". Default 100 means "collect when the heap doubles".
Higher values: less GC, more memory. Useful for batch jobs and CLI tools where peak memory does not matter and throughput does.
Lower values: more GC, less memory. Useful when memory is tight (small containers).
Setting GOGC=off disables GC entirely; only useful for debugging or for programs that exit quickly.
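A minimal sketch of the arithmetic (the real pacer also accounts for stacks, globals, and GOMEMLIMIT, so treat these as rough targets):

package main

import "fmt"

func main() {
	// The next GC triggers roughly at live + live*GOGC/100, where
	// live is the heap size left after the previous collection.
	live := 100 // MB of live heap after the last GC
	for _, gogc := range []int{50, 100, 400} {
		fmt.Printf("GOGC=%d: next GC near %d MB\n", gogc, live+live*gogc/100)
	}
	// GOGC=50:  next GC near 150 MB  (tight memory)
	// GOGC=100: next GC near 200 MB  (default: heap doubles)
	// GOGC=400: next GC near 500 MB  (batch jobs: fewer collections)
}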
GOMEMLIMIT
Added in Go 1.19. A soft cap on total runtime memory. The GC becomes increasingly aggressive as the heap approaches the limit, trading CPU for memory.
Without GOMEMLIMIT, a Go program in a container can OOM even with GOGC tuned low: a sudden spike pushes the heap past the container limit and the OOM killer takes over with no grace. With GOMEMLIMIT set ~10% below the container limit, the runtime has a chance to run extra GCs and stay under.
For containerised services, set GOMEMLIMIT. It is the single biggest win for OOM stability.
The scheduler
M:N. Many goroutines (G) are multiplexed onto a few logical processors (P), each backed by an OS thread (M). When a goroutine blocks in a syscall, its P detaches from the blocked thread and is handed to another thread, so other goroutines keep running. When a goroutine blocks on a channel, it parks in user space without an OS context switch.
The practical result: goroutines are extremely cheap to create and switch. Programs with 100K+ goroutines are normal. The scheduler does work-stealing across P's, so unbalanced load self-balances.
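A small sketch to make "cheap" concrete: spawn 100K goroutines that park on a channel. Exact timing is machine-dependent, but this typically completes in tens of milliseconds.

package main

import (
	"fmt"
	"sync"
	"time"
)

func main() {
	const n = 100_000
	start := time.Now()

	release := make(chan struct{})
	var wg sync.WaitGroup
	wg.Add(n)
	for i := 0; i < n; i++ {
		go func() {
			defer wg.Done()
			<-release // parks in user space; no OS thread is tied up
		}()
	}

	fmt.Printf("spawned %d goroutines in %v\n", n, time.Since(start))
	close(release) // wakes all of them at once
	wg.Wait()
	fmt.Printf("all done in %v\n", time.Since(start))
}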
The scheduler almost never demands attention. Two cases when it might:
- A long-running goroutine that does not yield (no channel operations, no syscalls, no function calls). Before Go 1.14, this could starve other goroutines on the same P. Modern Go preempts asynchronously, so the problem is rare; tight CPU loops can still call runtime.Gosched() explicitly if they hold a P for long stretches (see the sketch after this list).
- Diagnosing why goroutines are not making progress. GODEBUG=schedtrace=1000 prints scheduler stats every second. Pair with pprof for a fuller picture.
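A sketch of the explicit-yield pattern; crunch is a hypothetical stand-in for any pure-CPU loop:

package main

import "runtime"

// crunch is a pure-CPU loop: no channel ops, no syscalls. Since Go 1.14
// the runtime can preempt it asynchronously, but an explicit yield keeps
// latency predictable when the loop shares a P with latency-sensitive
// goroutines.
func crunch(data []uint64) uint64 {
	var sum uint64
	for i, v := range data {
		sum += v * v
		if i%1_000_000 == 0 {
			runtime.Gosched() // yield the P; other goroutines get a turn
		}
	}
	return sum
}

func main() {
	_ = crunch(make([]uint64, 10_000_000))
}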
The settings that actually matter
For most production services, three settings cover 90% of the tuning that's ever needed. Use GOMAXPROCS via automaxprocs in containers and the default on bare metal. Set GOMEMLIMIT to about 90% of the container memory limit so the runtime gets a chance to GC aggressively before the OOM killer fires. Leave GOGC at its default unless a measurement shows a specific reason to change it.
Beyond that, reach for pprof, runtime/trace, and benchmarks. Only tune what the data says is the bottleneck.
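runtime/trace is named above but not shown anywhere in this section; here is a minimal sketch of wrapping a workload in a trace (trace.out is an arbitrary file name):

package main

import (
	"log"
	"os"
	"runtime/trace"
)

func main() {
	f, err := os.Create("trace.out")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	if err := trace.Start(f); err != nil {
		log.Fatal(err)
	}
	defer trace.Stop()

	// ... run the workload under investigation ...
}

// Inspect afterwards with: go tool trace trace.out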
Primitives by language
- GOMAXPROCS env / runtime.GOMAXPROCS(n)
- GOGC env / debug.SetGCPercent(n)
- GOMEMLIMIT env / debug.SetMemoryLimit(b)
- runtime/trace, runtime/pprof
- GODEBUG=schedtrace=N,scheddetail=1
Implementation
These are typically set via env vars at process start. Setting them programmatically is fine but they take effect immediately and apply globally. Don't tune mid-request based on load.
package main

import (
	"runtime"
	"runtime/debug"
)

func init() {
	// Cap at 4 OS threads for Go execution
	runtime.GOMAXPROCS(4)

	// Trigger GC when heap is 50% larger than after last GC
	// (lower than default 100 = more frequent GC, less memory)
	debug.SetGCPercent(50)

	// Soft cap total memory at 2 GiB
	debug.SetMemoryLimit(2 << 30)
}

In a container with --cpus=2 on a 32-core host, Go sees 32 CPUs by default. The result: GOMAXPROCS=32, the runtime spins up 32 P's, but the kernel throttles execution to a 2-core quota. uber-go/automaxprocs reads the CFS quota and sets GOMAXPROCS correctly. Standard practice in any Go service that runs in containers.
package main

// Add to the imports of your main package; that's the entire setup
import _ "go.uber.org/automaxprocs"

// Now GOMAXPROCS reflects your CFS CPU limit, not the host

// To verify:
// import "runtime"
// log.Printf("GOMAXPROCS=%d", runtime.GOMAXPROCS(0))

In a container with a 1 GB memory limit, the OOM killer takes the process down with no grace. GOMEMLIMIT tells the Go runtime "do whatever it takes to stay under this". The GC becomes more aggressive as the heap approaches the limit. Worst case, GC dominates CPU; preferable to a hard kill.
// In Dockerfile or Kubernetes manifest:
//
// ENV GOMEMLIMIT=900MiB
//
// Set ~10% below the container memory limit so the runtime has headroom.

// Or programmatically:
package main

import "runtime/debug"

func init() {
	debug.SetMemoryLimit(900 * 1024 * 1024) // 900 MiB
}

schedtrace prints a snapshot of the scheduler every N ms. scheddetail=1 adds per-P detail. Useful when goroutines are not making progress and the scheduler is suspect. For most cases, pprof is more useful; reach for this when pprof shows everything blocked.
// Run with: GODEBUG=schedtrace=1000,scheddetail=1 ./your-app
//
// Sample output:
// SCHED 0ms: gomaxprocs=8 idleprocs=4 threads=12 spinningthreads=0 idlethreads=2 runqueue=0
//   P0: status=1 schedtick=0 syscalltick=0 m=4 runqsize=0 gfreecnt=0
//   P1: status=0 schedtick=0 syscalltick=0 m=-1 runqsize=0 gfreecnt=0
//   ...
//
// gomaxprocs:      P count
// idleprocs:       P's not running anything
// threads:         OS threads
// spinningthreads: threads spinning looking for work
// runqueue:        global runnable goroutines

// Pair with pprof for a goroutine profile:
// import _ "net/http/pprof"
// go http.ListenAndServe("localhost:6060", nil)
// go tool pprof http://localhost:6060/debug/pprof/goroutine

Key points
- GOMAXPROCS defaults to the CPU count. In containers without CFS-quota awareness, that may be the host's cores, not the container limit. Use uber-go/automaxprocs.
- GOGC=100 means GC when the heap doubles since the last GC. Higher values trade memory for fewer collections.
- GOMEMLIMIT is a soft cap on total memory. Without it, Go can OOM in containers even with GOGC tuned low.
- The scheduler is M:N: many goroutines on a few OS threads. Goroutines park and resume cheaply (microseconds).
- GODEBUG=schedtrace=1000 prints scheduler stats every 1s. Useful for diagnosing scheduling stalls.
Follow-up questions
- Should GOMAXPROCS be tuned manually?
- What is the right GOGC?
- How does the M:N scheduler differ from OS threads?
- What is GOTRACEBACK and when is it needed?
Gotchas
- GOMAXPROCS default is the host CPU count, NOT the container CPU limit; use automaxprocs
- Without GOMEMLIMIT, Go can OOM in tight containers even with GOGC tuned
- Running pprof with too many concurrent profiles slows the program; profile sparingly in prod
- GODEBUG=schedtrace adds overhead; not for prod, only for diagnostics
- runtime.NumGoroutine() shows the total; it doesn't reveal which goroutines are blocked vs runnable