Go Internals · December 3, 2024 · 6 min read

Memory Management in High-Throughput Go Services

Debugging garbage collection cycles and optimizing heap allocations in the real world — tools, patterns, and what to instrument first.

Go’s garbage collector has improved dramatically over the years — sub-millisecond pause times are now the norm. But “the GC is fast” is not a license to ignore allocation patterns. In high-throughput services processing millions of events per day, GC pressure compounds: more allocations mean more GC cycles, more cycles mean more CPU stolen from your hot path.

What to Instrument First

Before optimizing anything, measure. The three metrics that matter most:

go_gc_duration_seconds     — how long GC pauses are
go_memstats_alloc_bytes    — live heap size
go_memstats_sys_bytes      — total memory requested from OS

A sys_bytes that keeps rising and stays high after GC cycles usually means you’re holding references to memory you don’t need. A high allocation rate (the growth of the cumulative go_memstats_alloc_bytes_total) alongside a small live heap means your working set is modest but you’re churning through short-lived objects — the GC is working hard just to keep up.

The Escape Analysis Tool

Before profiling, use the compiler’s escape analysis to understand what’s going on:

go build -gcflags="-m=2" ./... 2>&1 | grep "escapes to heap"

Heap escapes are not always bad, but understanding why an allocation escapes tells you where to focus. The most common culprits:

Interface boxing — converting a concrete value to an interface generally allocates; the compiler avoids it only in special cases (values proven not to escape, and a few interned values such as small integers).

Closures capturing variables — the captured variable moves to the heap.

Slices exceeding stack limits — the compiler’s limit for implicit stack allocations (e.g. make with a constant size) is 64KB; anything larger goes straight to the heap.
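Hypothetical minimal reproducers for each of the three culprits above — compile with go build -gcflags="-m" and read the compiler’s notes:

```go
package main

// boxed: converting n to an interface boxes it — escape analysis reports the escape.
func boxed(n int) any {
	return n
}

// counter: i is captured by the returned closure, so it moves to the heap.
func counter() func() int {
	i := 0
	return func() int { i++; return i }
}

// bigLocal: 128KB exceeds the 64KB implicit limit, so the backing array is heap-allocated.
func bigLocal() int {
	b := make([]byte, 128<<10)
	return len(b)
}

func main() {
	_ = boxed(1)
	c := counter()
	_ = c()
	_ = bigLocal()
}
```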

The sync.Pool Pattern

For services with a high rate of short-lived, same-shape allocations, sync.Pool is the right tool:

var bufPool = sync.Pool{
    New: func() any {
        b := make([]byte, 0, 4096)
        return &b // pool a *[]byte: putting a bare slice would allocate on every Put (staticcheck SA6002)
    },
}

func processEvent(data []byte) error {
    bp := bufPool.Get().(*[]byte)
    defer func() {
        *bp = (*bp)[:0] // reset length, keep capacity for the next caller
        bufPool.Put(bp)
    }()

    *bp = append(*bp, data...)
    // ... process
    return nil
}

Objects left idle in a sync.Pool are released by the garbage collector (they survive at most two cycles) — which is the point: the pool reduces allocation pressure without leaking memory. Don’t use a pool for objects with non-trivial cleanup requirements; the GC will drop them at an unpredictable time.

Profiling in Production

net/http/pprof is safe to leave enabled in production as long as the endpoint isn’t publicly reachable — bind it to loopback, or front it with a tight IP allowlist:

import (
    "log"
    "net/http"
    _ "net/http/pprof" // registers the /debug/pprof/* handlers on http.DefaultServeMux
)

go func() {
    log.Println(http.ListenAndServe("127.0.0.1:6060", nil))
}()

Capture a heap profile during a high-traffic window (append ?seconds=30 to the URL for a delta profile showing only what was allocated in that window):

go tool pprof -http=:8080 http://localhost:6060/debug/pprof/heap

The flame graph view makes allocation hot spots immediately visible. Switch the sample type — the SAMPLE menu in the web UI, or -sample_index on the CLI — to inuse_space to see what’s currently live, or to alloc_space to see cumulative allocation pressure.

The GOGC Lever

GOGC controls when the GC triggers, expressed as a percentage of heap growth since the last collection. The default is 100 (trigger when heap doubles).

For latency-sensitive services with a small working set, a lower GOGC (e.g. GOGC=50) spends extra CPU on more frequent — but individually shorter — collections and keeps peak memory down. For throughput-first services with a large working set, raising GOGC reduces GC frequency at the cost of higher peak memory.

There is no universal right value. Measure your specific workload.