Go Runtime Tuning: GOMAXPROCS, GC, Scheduler
GOMAXPROCS sets how many OS threads execute Go code concurrently (default = CPU count). GOGC controls when the GC runs (default 100 = collect when the heap doubles). GOMEMLIMIT (Go 1.19+) puts a soft cap on the runtime's total memory, useful for staying within container limits. The Go scheduler is M:N; it rarely needs attention, but knowing the levers helps when it does.
What it is
The Go runtime hides almost everything: scheduling, GC, memory layout. For most code, that is the right choice. When tuning is required (containers, latency-sensitive services, big batch jobs), the runtime exposes a few knobs.
GOMAXPROCS
How many OS threads can run Go code concurrently. Default: the number of CPUs Go sees at startup.
The catch: in containers with CFS CPU limits, Go sees the host's CPU count, not the container limit. Result: GOMAXPROCS is set to (say) 32, the runtime starts 32 P's, the kernel throttles to a 2-core quota, and the runtime over-schedules into the throttle. Performance suffers.
The standard fix: import _ "go.uber.org/automaxprocs" in the main package. It reads the CFS quota and sets GOMAXPROCS to match. One line, no configuration. This is now considered a basic requirement for any Go service in a container.
GOGC
Controls when GC runs. The number is "trigger GC when the heap has grown by N percent since the last GC". Default 100 means "collect when the heap doubles".
Higher values: less GC, more memory. Useful for batch jobs and CLI tools where peak memory does not matter and throughput does.
Lower values: more GC, less memory. Useful when memory is tight (small containers).
Setting GOGC=off disables GC entirely; only useful for debugging or for programs that exit quickly.
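A minimal sketch of the arithmetic (the real pacer also accounts for stacks, globals, and GOMEMLIMIT, so treat these as rough targets):

package main

import "fmt"

func main() {
	// The next GC triggers roughly at live + live*GOGC/100, where
	// live is the heap size left after the previous collection.
	live := 100 // MB of live heap after the last GC
	for _, gogc := range []int{50, 100, 400} {
		fmt.Printf("GOGC=%d: next GC near %d MB\n", gogc, live+live*gogc/100)
	}
	// GOGC=50:  next GC near 150 MB  (tight memory)
	// GOGC=100: next GC near 200 MB  (default: heap doubles)
	// GOGC=400: next GC near 500 MB  (batch jobs: fewer collections)
}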
GOMEMLIMIT
Added in Go 1.19. A soft cap on total runtime memory. The GC becomes increasingly aggressive as the heap approaches the limit, trading CPU for memory.
Without GOMEMLIMIT, a Go program in a container can OOM even with GOGC tuned low: a sudden spike pushes the heap past the container limit and the OOM killer takes over with no grace. With GOMEMLIMIT set ~10% below the container limit, the runtime has a chance to run extra GCs and stay under.
For containerised services, set GOMEMLIMIT. It is the single biggest win for OOM stability.
The scheduler
M:N. Many goroutines (G) are multiplexed onto a few logical processors (P), each backed by an OS thread (M). When a goroutine blocks in a syscall, its P detaches from the blocked thread and is handed to another thread, so other goroutines keep running. When a goroutine blocks on a channel, it parks in user space without an OS context switch.
The practical result: goroutines are extremely cheap to create and switch. Programs with 100K+ goroutines are normal. The scheduler does work-stealing across P's, so unbalanced load self-balances.
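A small sketch to make "cheap" concrete: spawn 100K goroutines that park on a channel. Exact timing is machine-dependent, but this typically completes in tens of milliseconds.

package main

import (
	"fmt"
	"sync"
	"time"
)

func main() {
	const n = 100_000
	start := time.Now()

	release := make(chan struct{})
	var wg sync.WaitGroup
	wg.Add(n)
	for i := 0; i < n; i++ {
		go func() {
			defer wg.Done()
			<-release // parks in user space; no OS thread is tied up
		}()
	}

	fmt.Printf("spawned %d goroutines in %v\n", n, time.Since(start))
	close(release) // wakes all of them at once
	wg.Wait()
	fmt.Printf("all done in %v\n", time.Since(start))
}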
The scheduler almost never demands attention. Two cases when it might:
- A long-running goroutine that does not yield (no channel operations, no syscalls, no function calls). Before Go 1.14, this could starve other goroutines on the same P. Modern Go preempts asynchronously, so the problem is rare; tight CPU loops can still call runtime.Gosched() explicitly if they hold a P for long stretches (see the sketch after this list).
- Diagnosing why goroutines are not making progress. GODEBUG=schedtrace=1000 prints scheduler stats every second. Pair with pprof for a fuller picture.
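A sketch of the explicit-yield pattern; crunch is a hypothetical stand-in for any pure-CPU loop:

package main

import "runtime"

// crunch is a pure-CPU loop: no channel ops, no syscalls. Since Go 1.14
// the runtime can preempt it asynchronously, but an explicit yield keeps
// latency predictable when the loop shares a P with latency-sensitive
// goroutines.
func crunch(data []uint64) uint64 {
	var sum uint64
	for i, v := range data {
		sum += v * v
		if i%1_000_000 == 0 {
			runtime.Gosched() // yield the P; other goroutines get a turn
		}
	}
	return sum
}

func main() {
	_ = crunch(make([]uint64, 10_000_000))
}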
The settings that actually matter
For most production services, three settings cover 90% of the tuning that's ever needed. Use GOMAXPROCS via automaxprocs in containers and the default on bare metal. Set GOMEMLIMIT to about 90% of the container memory limit so the runtime gets a chance to GC aggressively before the OOM killer fires. Leave GOGC at its default unless a measurement shows a specific reason to change it.
Beyond that, reach for pprof, runtime/trace, and benchmarks. Only tune what the data says is the bottleneck.
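runtime/trace is named above but not shown anywhere in this section; here is a minimal sketch of wrapping a workload in a trace (trace.out is an arbitrary file name):

package main

import (
	"log"
	"os"
	"runtime/trace"
)

func main() {
	f, err := os.Create("trace.out")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	if err := trace.Start(f); err != nil {
		log.Fatal(err)
	}
	defer trace.Stop()

	// ... run the workload under investigation ...
}

// Inspect afterwards with: go tool trace trace.out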
Primitives by language
- GOMAXPROCS env / runtime.GOMAXPROCS(n)
- GOGC env / debug.SetGCPercent(n)
- GOMEMLIMIT env / debug.SetMemoryLimit(b)
- runtime/trace, runtime/pprof
- GODEBUG=schedtrace=N,scheddetail=1
Implementation
These are typically set via env vars at process start. Setting them programmatically is fine but they take effect immediately and apply globally. Don't tune mid-request based on load.
package main

import (
	"runtime"
	"runtime/debug"
)

func init() {
	// Cap at 4 OS threads for Go execution
	runtime.GOMAXPROCS(4)

	// Trigger GC when heap is 50% larger than after last GC
	// (lower than default 100 = more frequent GC, less memory)
	debug.SetGCPercent(50)

	// Soft cap total memory at 2 GiB
	debug.SetMemoryLimit(2 << 30)
}

In a container with --cpus=2 on a 32-core host, Go sees 32 CPUs by default. The result: GOMAXPROCS=32, the runtime spins up 32 P's, but the kernel throttles execution to a 2-core quota. uber-go/automaxprocs reads the CFS quota and sets GOMAXPROCS correctly. Standard practice in any Go service that runs in containers.
package main

// Add to the imports of your main package; that's the entire setup
import _ "go.uber.org/automaxprocs"

// Now GOMAXPROCS reflects your CFS CPU limit, not the host

// To verify:
// import "runtime"
// log.Printf("GOMAXPROCS=%d", runtime.GOMAXPROCS(0))

In a container with a 1 GB memory limit, the OOM killer takes the process down with no grace. GOMEMLIMIT tells the Go runtime "do whatever it takes to stay under this". The GC becomes more aggressive as the heap approaches the limit. Worst case, GC dominates CPU; preferable to a hard kill.
// In Dockerfile or Kubernetes manifest:
//
// ENV GOMEMLIMIT=900MiB
//
// Set ~10% below the container memory limit so the runtime has headroom.

// Or programmatically:
package main

import "runtime/debug"

func init() {
	debug.SetMemoryLimit(900 * 1024 * 1024) // 900 MiB
}

schedtrace prints a snapshot of the scheduler every N ms. scheddetail=1 adds per-P detail. Useful when goroutines are not making progress and the scheduler is suspect. For most cases, pprof is more useful; reach for this when pprof shows everything blocked.
// Run with: GODEBUG=schedtrace=1000,scheddetail=1 ./your-app
//
// Sample output:
// SCHED 0ms: gomaxprocs=8 idleprocs=4 threads=12 spinningthreads=0 idlethreads=2 runqueue=0
//   P0: status=1 schedtick=0 syscalltick=0 m=4 runqsize=0 gfreecnt=0
//   P1: status=0 schedtick=0 syscalltick=0 m=-1 runqsize=0 gfreecnt=0
//   ...
//
// gomaxprocs:      P count
// idleprocs:       P's not running anything
// threads:         OS threads
// spinningthreads: threads spinning looking for work
// runqueue:        global runnable goroutines

// Pair with pprof for a goroutine profile:
// import _ "net/http/pprof"
// go http.ListenAndServe("localhost:6060", nil)
// go tool pprof http://localhost:6060/debug/pprof/goroutine

Key points
- GOMAXPROCS defaults to the CPU count. In containers without CFS-quota awareness, that may be the host's cores, not the container limit. Use uber-go/automaxprocs.
- GOGC=100 means GC when the heap doubles since the last GC. Higher values trade memory for fewer collections.
- GOMEMLIMIT is a soft cap on total memory. Without it, Go can OOM in containers even with GOGC tuned low.
- The scheduler is M:N: many goroutines on a few OS threads. Goroutines park and resume cheaply (microseconds).
- GODEBUG=schedtrace=1000 prints scheduler stats every 1s. Useful for diagnosing scheduling stalls.
Follow-up questions
- Should GOMAXPROCS be tuned manually?
- What is the right GOGC?
- How does the M:N scheduler differ from OS threads?
- What is GOTRACEBACK and when is it needed?
Gotchas
- GOMAXPROCS default is the host CPU count, NOT the container CPU limit; use automaxprocs
- Without GOMEMLIMIT, Go can OOM in tight containers even with GOGC tuned
- Running pprof with too many concurrent profiles slows the program; profile sparingly in prod
- GODEBUG=schedtrace adds overhead; not for prod, only for diagnostics
- runtime.NumGoroutine() shows the total; it doesn't reveal which goroutines are blocked vs runnable