Graceful Shutdowns in Go (Part 2): Waiting for Goroutines to Finish

In Part 1, I focused on shutting down an HTTP server cleanly using signals, contexts, and timeouts. That solved the problem of stopping new work. What it didn’t solve was knowing when my own background goroutines had actually finished.

This post is about that missing piece: coordinating goroutines during shutdown so the process doesn’t exit while work is still in flight.

The Problem Context Alone Doesn’t Solve

Context cancellation is a signal, not a guarantee. It tells goroutines when they should stop, but it does not tell the main process when they have stopped.

If your service starts background work, such as metrics flushers, async writers, or outbound calls, you need a way to wait for those goroutines to finish before exiting.

The Naive Approaches (And Why They Fail)

Sleeping for a fixed amount of time and hoping work finishes
Relying on context cancellation alone
Using global flags or shared booleans
Letting the process exit and assuming goroutines will clean up

All of these approaches either race, block forever, or fail under load. They work until they don’t, usually in production.

Enter sync.WaitGroup

A sync.WaitGroup is a simple coordination primitive. It allows one part of your program to wait until a set of goroutines has completed.

Add increases the counter by the number of goroutines you are about to start
Done decrements the counter by one when a goroutine finishes
Wait blocks until the counter reaches zero
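To make the three calls concrete, here is a minimal sketch. The runTasks helper and its counter are illustrative names I made up for this example; only sync.WaitGroup and its Add, Done, and Wait methods come from the standard library.

```go
package main

import (
	"fmt"
	"sync"
)

// runTasks starts n goroutines and blocks until all of them have finished.
// It returns how many tasks completed. (Hypothetical helper for illustration.)
func runTasks(n int) int {
	var (
		wg        sync.WaitGroup
		mu        sync.Mutex
		completed int
	)

	for i := 0; i < n; i++ {
		wg.Add(1) // register the goroutine before starting it
		go func() {
			defer wg.Done() // runs on every exit path of the goroutine
			mu.Lock()
			completed++
			mu.Unlock()
		}()
	}

	wg.Wait() // blocks until the counter is back to zero
	return completed
}

func main() {
	fmt.Println("completed:", runTasks(5))
}
```

Note that Add is called in the loop, before the go statement, so the counter is already correct by the time Wait runs.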

Tracking Background Workers

Any goroutine that must finish before shutdown should be tracked explicitly. This includes workers started at boot time or spawned in response to requests.

In the snippet below, think of doWork() as whatever background task your service runs (polling a queue, flushing metrics, writing to a stream). The wait group is shared at the service level so every long-lived worker can register itself.

var wg sync.WaitGroup

func startWorker(ctx context.Context) {
    // Shared wait group: each worker adds once and marks done on exit.
    wg.Add(1)

    go func() {
        defer wg.Done()

        for {
            select {
            case <-ctx.Done():
                log.Println("Worker received shutdown signal")
                return
            default:
                // One unit of work for this worker (e.g., process a queue item).
                // doWork should block or pace itself; if it returns immediately, this loop spins.
                doWork()
            }
        }
    }()
}

Wiring WaitGroups Into Shutdown

The cancel function comes from a context you control, for example ctx, cancel := context.WithCancel(parent), or the stop function returned by signal.NotifyContext. Pass that ctx into your workers so they can stop when shutdown begins.

During shutdown, you now have two responsibilities: signal cancellation, and wait for goroutines to exit. Both are required.

// Trigger cancellation
cancel()

// Wait for background work to finish
wg.Wait()

log.Println("All background workers stopped")
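Putting the pieces together, a complete program might look roughly like the sketch below. The runService function, the ticker-based worker, and the 200ms demo timeout are assumptions of mine so the example terminates on its own; in a real service the context would be cancelled only by the signal (or by your own cancel call).

```go
package main

import (
	"context"
	"log"
	"os"
	"os/signal"
	"sync"
	"sync/atomic"
	"syscall"
	"time"
)

// runService starts n workers, waits for shutdown to begin, then waits for
// every worker to exit. It returns how many workers stopped cleanly.
// (Hypothetical function for this sketch.)
func runService(n int) int {
	// Cancelled on Ctrl+C or SIGTERM.
	ctx, stop := signal.NotifyContext(context.Background(), os.Interrupt, syscall.SIGTERM)
	defer stop()

	// Demo only: also cancel after 200ms so the example ends without a signal.
	ctx, cancel := context.WithTimeout(ctx, 200*time.Millisecond)
	defer cancel()

	var (
		wg      sync.WaitGroup
		stopped int64
	)

	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			ticker := time.NewTicker(20 * time.Millisecond)
			defer ticker.Stop()
			for {
				select {
				case <-ctx.Done():
					log.Printf("worker %d received shutdown signal", id)
					atomic.AddInt64(&stopped, 1)
					return
				case <-ticker.C:
					// One unit of work would run here.
				}
			}
		}(i)
	}

	<-ctx.Done() // shutdown has begun
	wg.Wait()    // shutdown is complete only once every worker has returned
	return int(atomic.LoadInt64(&stopped))
}

func main() {
	log.Printf("all %d background workers stopped", runService(3))
}
```

The worker here waits on a ticker instead of using a default branch, which keeps the loop from spinning when there is nothing to do.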

Context and WaitGroup Work Together

The key insight is this: context controls when work should stop, and the wait group confirms that it has stopped.

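Using one without the other fails in one of two ways. Context without a wait group lets the process exit while work is still in flight. A wait group without context leaves Wait blocked on goroutines that were never told to stop.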
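The difference between "told to stop" and "has stopped" is easy to observe. In this sketch, a worker needs some time to clean up after cancellation; the observe function, the flushed flag, and the 200ms cleanup delay are all made up for the demonstration.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// observe reports whether the worker had finished its cleanup
// (a) right after cancel() returned and (b) after wg.Wait() returned.
// (Hypothetical helper for illustration.)
func observe() (afterCancel, afterWait bool) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	var (
		wg      sync.WaitGroup
		flushed atomic.Bool
	)

	wg.Add(1)
	go func() {
		defer wg.Done()
		<-ctx.Done()
		time.Sleep(200 * time.Millisecond) // stand-in for flushing buffered work
		flushed.Store(true)
	}()

	cancel()                     // returns immediately; it only signals
	afterCancel = flushed.Load() // cleanup is still running at this point
	wg.Wait()                    // returns only after the worker called Done
	afterWait = flushed.Load()
	return afterCancel, afterWait
}

func main() {
	afterCancel, afterWait := observe()
	fmt.Println("finished right after cancel():", afterCancel)
	fmt.Println("finished after wg.Wait():", afterWait)
}
```

In practice the first value should be false and the second true: cancel() hands out the signal, and only Wait() confirms the cleanup actually ran.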

Common WaitGroup Gotchas

Calling Add concurrently with Wait, for example from inside the goroutine instead of before the go statement
Forgetting to defer Done in a goroutine
Blocking forever because a goroutine ignores context
Reusing a WaitGroup across unrelated lifecycles
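For the "blocking forever" case, one common mitigation is to bound the wait. sync.WaitGroup has no built-in timeout, so the sketch below wraps Wait in a goroutine and races it against a timer. waitWithTimeout and tryShutdown are hypothetical helpers of my own, not standard library functions.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// waitWithTimeout waits for wg for at most d. It reports true if every
// goroutine finished in time. (Hypothetical helper, not part of package sync.)
func waitWithTimeout(wg *sync.WaitGroup, d time.Duration) bool {
	done := make(chan struct{})
	go func() {
		wg.Wait()
		close(done)
	}()

	timer := time.NewTimer(d)
	defer timer.Stop()

	select {
	case <-done:
		return true
	case <-timer.C:
		// The helper goroutine above stays blocked until the workers exit.
		return false
	}
}

// tryShutdown starts one worker that runs for workMillis and waits for it
// for at most budgetMillis. The sleep stands in for a worker that ignores ctx.
func tryShutdown(workMillis, budgetMillis int) bool {
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		time.Sleep(time.Duration(workMillis) * time.Millisecond)
	}()
	return waitWithTimeout(&wg, time.Duration(budgetMillis)*time.Millisecond)
}

func main() {
	fmt.Println("fast worker finished in time:", tryShutdown(10, 500))
	fmt.Println("slow worker finished in time:", tryShutdown(500, 20))
}
```

A timeout does not stop the stuck goroutine; it only lets the process log the problem and exit anyway. The real fix is still to make the worker honor its context.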

When Not to Use WaitGroups

WaitGroups are best for a known set of goroutines with a clear lifecycle. They are not always the right tool.

Unbounded goroutine pools
Highly dynamic worker lifecycles
Cases where structured concurrency is a better fit

Key Takeaways

Context cancellation does not wait for work to finish
WaitGroups provide explicit lifecycle coordination
Every long-lived goroutine should have an owner
Clean shutdown requires both signalling and waiting

What Comes Next

With contexts and WaitGroups in place, the next step is structured concurrency. In the next post, I’ll look at errgroup and how it simplifies error handling and cancellation across goroutines.