Coordination, Waiting & Cancellation

Semaphore Permit Leak

Semaphore Permit Leak: practice a Java concurrency bug whose production symptoms are throughput dropping to zero, work no longer entering the system, and the app appearing stuck under load.

  • Permits and throttling
  • Semaphore
  • Permit Leak
  • Java
  • Intermediate

Production symptoms

  • Throughput drops to zero
  • Work stops entering
  • App appears stuck under load

Failure scenario

Code

Java example
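semaphore.acquire();            // permit taken

if (request.isBad()) {
    // BUG: this throw exits before release(), so the permit leaks
    throw new IllegalArgumentException("bad request");
}

callDownstream(request);        // an exception here also skips release()
semaphore.release();            // reached only on the success path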

Prod Symptoms

A Semaphore limits concurrent calls to a downstream service. Exception paths that exit between acquire() and release() leak permits, so effective capacity drops by one permit after each such failure.

Key signal: Each skipped release reduces this Semaphore's effective capacity until another release occurs or the Semaphore is recreated.

  • Throughput falls gradually as failures accumulate
  • Queueing time grows while fewer calls reach the downstream service
  • Threads pile up in Semaphore.acquire()
  • CPU stays low because callers are parked
  • A restart restores the original capacity, but only until failures leak the permits again
  • Once every permit is leaked, no new call enters the protected section
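The capacity arithmetic behind the key signal can be checked without threads. This is a minimal sketch with a hypothetical class, `PermitLeakArithmetic`, that repeats the buggy snippet's shape against a Semaphore with three permits. It uses tryAcquire() in place of acquire() only so the exhausted state is reported instead of blocking the demo.

```java
import java.util.concurrent.Semaphore;

public class PermitLeakArithmetic {
    // Same shape as the buggy snippet: take a permit, then throw before release().
    // tryAcquire() replaces acquire() only so this demo never blocks.
    static void leakyAttempt(Semaphore gate) {
        if (!gate.tryAcquire()) {
            throw new IllegalStateException("no permit: a real caller would park in acquire()");
        }
        throw new IllegalArgumentException("bad request"); // release() is never reached
    }

    // Runs `failures` leaky attempts and returns the permits that remain.
    static int permitsAfter(int capacity, int failures) {
        Semaphore gate = new Semaphore(capacity);
        for (int i = 0; i < failures; i++) {
            try {
                leakyAttempt(gate);
            } catch (RuntimeException expected) {
                // the failure is swallowed, but the permit stays lost
            }
        }
        return gate.availablePermits();
    }

    public static void main(String[] args) {
        for (int failures = 0; failures <= 4; failures++) {
            System.out.println("failures = " + failures
                    + ", available = " + permitsAfter(3, failures));
        }
    }
}
```

Each failed attempt removes one permit. Once the count reaches zero, a real caller using acquire() would park, which is the stuck state the symptoms describe.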

Run Locally

  • Workers 1 and 2 acquire permits and then fail
  • Neither failure releases its permit
  • availablePermits() reports zero before later workers start
  • Later workers block in acquire()
  • The protected section has capacity zero after both permits leak

What to look for

  • Threads parked in Semaphore.acquire()
  • Exception or return paths between acquire and release
  • Permit count smaller than expected after failures
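Exceptions are not the only way out of the protected section. The sketch below, built around a hypothetical `EarlyReturnLeak` class with an in-memory cache standing in for the downstream call, shows an early return that skips release() on every cache hit.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;

public class EarlyReturnLeak {
    private final Semaphore gate;
    private final Map<String, String> cache = new ConcurrentHashMap<>();

    EarlyReturnLeak(int permits) {
        this.gate = new Semaphore(permits);
    }

    // BUG (intentional): the cache-hit return exits between acquire() and release().
    String lookup(String key) throws InterruptedException {
        gate.acquire();
        String cached = cache.get(key);
        if (cached != null) {
            return cached; // leaks one permit per cache hit
        }
        String value = "value-for-" + key; // hypothetical stand-in for callDownstream(request)
        cache.put(key, value);
        gate.release();
        return value;
    }

    int availablePermits() {
        return gate.availablePermits();
    }

    // Runs the lookups in order on a fresh instance and reports the permits left.
    // If cache hits outnumber the permits, a later acquire() blocks forever.
    static int permitsAfterLookups(int permits, String... keys) {
        EarlyReturnLeak service = new EarlyReturnLeak(permits);
        try {
            for (String key : keys) {
                service.lookup(key);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted", e);
        }
        return service.availablePermits();
    }

    public static void main(String[] args) {
        System.out.println("miss only:      " + permitsAfterLookups(2, "a"));
        System.out.println("miss, hit:      " + permitsAfterLookups(2, "a", "a"));
        System.out.println("miss, hit, hit: " + permitsAfterLookups(2, "a", "a", "a"));
    }
}
```

No exception is thrown anywhere, so searching logs for failures will not find this leak; reading every return between acquire() and release() will.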
Run
javac SemaphorePermitLeakDemo.java
java SemaphorePermitLeakDemo
Inspect while stuck
jps
jstack <pid>
jcmd <pid> Thread.print
SemaphorePermitLeakDemo.java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Semaphore;

public class SemaphorePermitLeakDemo {
    private static final Semaphore gate = new Semaphore(2);

    public static void main(String[] args) throws Exception {
        List<Thread> workers = new ArrayList<>();

        Thread first = failingWorker(1);
        Thread second = failingWorker(2);
        first.start();
        second.start();
        first.join();
        second.join();

        System.out.println("available permits after failures = "
                + gate.availablePermits());

        for (int i = 3; i <= 5; i++) {
            Thread worker = new Thread(() -> doWork(), "worker-" + i);
            workers.add(worker);
            worker.start();
        }

        // Give workers 3 to 5 time to reach acquire(); with no permits left they stay WAITING
        Thread.sleep(500);
        for (Thread worker : workers) {
            System.out.println(worker.getName() + " state = " + worker.getState());
        }
    }

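    // Acquires a permit and then fails. The RuntimeException escapes the
    // lambda, so release() is never called and the permit leaks.
    private static Thread failingWorker(int workerId) {
        Thread worker = new Thread(() -> {
            try {
                gate.acquire();
                System.out.println("worker " + workerId + " acquired permit");
                // BUG (intentional): no finally, so this exception skips release()
                throw new RuntimeException("failed before release");
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "worker-" + workerId);
        // Print the failure in one line instead of the default stack trace
        worker.setUncaughtExceptionHandler((thread, error) ->
                System.out.println(thread.getName() + " failed: " + error));
        return worker;
    }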

    private static void doWork() {
        try {
            gate.acquire();
            System.out.println(Thread.currentThread().getName()
                    + " acquired permit");
            sleepQuietly(300);
            gate.release();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    private static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}

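Note: main joins workers 1 and 2 before starting the rest, so both permits have already leaked when workers 3 to 5 call acquire(). Those workers stay parked, and because they are non-daemon threads the JVM keeps running after main returns, which leaves time to run jstack or jcmd.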

Diagnosis and fix

Explanation

In this code, Semaphore is used as a concurrency counter. Every successful acquire must eventually be balanced by exactly one release.

Key signal: A successful acquire creates an accounting obligation: release exactly one permit when the protected attempt ends.

  • acquire() decrements the available permit count
  • An exception thrown after acquire() skips release() unless the release is in a finally block
  • Each skipped release reduces this Semaphore's effective capacity
  • After all permits are lost, later callers wait indefinitely
  • Semaphore does not track thread ownership or enforce a maximum permit count
  • Releasing without a successful acquire causes the opposite bug: the concurrency limit grows
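The last bullet describes the mirror-image bug. In this sketch (hypothetical class `OverReleaseDemo`), the interrupt flag is set before acquire() is called, so acquire() throws InterruptedException immediately, as its Javadoc specifies. A release() in an outer finally then returns a permit that was never taken, and because Semaphore enforces no maximum, the count rises above the configured two. The second method shows the balanced shape for comparison.

```java
import java.util.concurrent.Semaphore;

public class OverReleaseDemo {
    // WRONG: the finally also runs when acquire() throws, returning a permit
    // that was never taken. The Semaphore accepts it and its limit grows.
    static void wrongShape(Semaphore gate) {
        try {
            gate.acquire();
            // protected work would run here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            gate.release();
        }
    }

    // RIGHT: release() is reachable only after acquire() has succeeded.
    static void rightShape(Semaphore gate) {
        try {
            gate.acquire();
            try {
                // protected work would run here
            } finally {
                gate.release();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Calls one shape with the interrupt flag already set, so acquire() throws
    // InterruptedException at once, then reports the permit count.
    static int permitsAfterInterruptedCall(int capacity, boolean useRightShape) {
        Semaphore gate = new Semaphore(capacity);
        Thread.currentThread().interrupt();
        if (useRightShape) {
            rightShape(gate);
        } else {
            wrongShape(gate);
        }
        Thread.interrupted(); // clear the flag that the catch block restored
        return gate.availablePermits();
    }

    public static void main(String[] args) {
        System.out.println("wrong shape: " + permitsAfterInterruptedCall(2, false));
        System.out.println("right shape: " + permitsAfterInterruptedCall(2, true));
    }
}
```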

How to Diagnose

Use thread dumps to find blocked callers, then use metrics and code review to distinguish a leak from normal saturation.

  • Look for callers parked in Semaphore.acquire()
  • Compare configured capacity, available permits, queued callers, and active protected operations
  • Check whether permits recover after active operations complete
  • Correlate permanent capacity loss with exceptions and early returns after acquire()
  • Inspect interrupted acquire paths and verify that they do not release
  • Inspect release paths for both missing releases and over-release
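To separate a leak from normal saturation, the comparison in the list above needs an in-flight count next to the permit count. This sketch assumes a hypothetical `PermitAccounting` wrapper whose active counter is maintained in its own finally while the release stays buggy; permits that are neither available nor held by active work are the leak signal. availablePermits() and getQueueLength() are real Semaphore methods; everything else is illustrative.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

public class PermitAccounting {
    private final int capacity;
    private final Semaphore gate;
    // In-flight protected operations, tracked in their own finally so the count
    // stays correct even when the (buggy) release below is skipped.
    private final AtomicInteger active = new AtomicInteger();

    PermitAccounting(int capacity) {
        this.capacity = capacity;
        this.gate = new Semaphore(capacity);
    }

    // Same leak as the buggy snippet: release() is skipped when the work throws.
    void call(boolean fail) throws InterruptedException {
        gate.acquire();
        active.incrementAndGet();
        try {
            if (fail) {
                throw new IllegalArgumentException("bad request");
            }
        } finally {
            active.decrementAndGet();
        }
        gate.release();
    }

    // Permits neither available nor held by in-flight work. Saturation keeps this
    // near zero; a leak keeps it above zero after traffic drains. The two reads
    // are not atomic, so under load treat the value as approximate.
    int unaccountedPermits() {
        return capacity - gate.availablePermits() - active.get();
    }

    String snapshot() {
        return "capacity=" + capacity
                + " available=" + gate.availablePermits()
                + " active=" + active.get()
                + " queued=" + gate.getQueueLength() // an estimate, per its Javadoc
                + " unaccounted=" + unaccountedPermits();
    }

    // Runs the outcomes in order (true = failing call) on a fresh instance.
    // Assumes failures do not exceed capacity, so no call blocks.
    static PermitAccounting run(int capacity, boolean... failures) {
        PermitAccounting tracker = new PermitAccounting(capacity);
        for (boolean fail : failures) {
            try {
                tracker.call(fail);
            } catch (IllegalArgumentException expected) {
                // swallowed by the caller; the permit is still gone
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted", e);
            }
        }
        return tracker;
    }

    public static void main(String[] args) {
        System.out.println(run(3, false, false).snapshot());
        System.out.println(run(3, false, true, true).snapshot());
    }
}
```

Under saturation, unaccounted stays near zero while queued grows; with a leak it stays above zero even after traffic drains.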
Commands
jps
jstack <pid>
jcmd <pid> Thread.print
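Expected dump shape
"worker-3" #... waiting on condition [...]
   java.lang.Thread.State: WAITING (parking)
  at jdk.internal.misc.Unsafe.park(Native Method)
  - parking to wait for <0x...> (a java.util.concurrent.Semaphore$NonfairSync)
  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:...)
  ...
  at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:...)
  at java.util.concurrent.Semaphore.acquire(Semaphore.java:...)
  at SemaphorePermitLeakDemo.doWork(SemaphorePermitLeakDemo.java:...)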
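When attaching jstack is inconvenient, the same frame can be found from inside the process with Thread.getAllStackTraces(). This sketch (hypothetical class `FindSemaphoreWaiters`) matches frames by class and method name; it is a diagnostic aid under that assumption, not a replacement for a full thread dump, since each thread's stack is sampled at a slightly different moment.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Semaphore;

public class FindSemaphoreWaiters {
    // Names of live threads whose current stack contains Semaphore.acquire.
    static List<String> threadsInSemaphoreAcquire() {
        List<String> names = new ArrayList<>();
        for (Map.Entry<Thread, StackTraceElement[]> entry
                : Thread.getAllStackTraces().entrySet()) {
            for (StackTraceElement frame : entry.getValue()) {
                if (frame.getClassName().equals("java.util.concurrent.Semaphore")
                        && frame.getMethodName().equals("acquire")) {
                    names.add(entry.getKey().getName());
                    break;
                }
            }
        }
        return names;
    }

    // Parks one thread on a Semaphore with no permits (as if all had leaked),
    // waits until it is WAITING, reports the scan result, then lets it exit.
    static List<String> demo() {
        Semaphore gate = new Semaphore(0);
        Thread waiter = new Thread(() -> {
            try {
                gate.acquire();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "worker-3");
        waiter.setDaemon(true); // never keeps the JVM alive
        waiter.start();
        while (waiter.getState() != Thread.State.WAITING) {
            Thread.yield();
        }
        List<String> found = threadsInSemaphoreAcquire();
        waiter.interrupt(); // acquire() throws InterruptedException and the thread ends
        return found;
    }

    public static void main(String[] args) {
        System.out.println("parked in Semaphore.acquire: " + demo());
    }
}
```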

How to Fix

  • Release exactly once in finally after a successful acquire
  • Do not release when acquire() was interrupted or otherwise failed
  • Keep acquisition and release accounting in one small code scope
  • Use tryAcquire(timeout, unit) for bounded waiting, not as a fix for permit leaks
  • Treat timeout as rejected or deferred work, not successful admission
  • Monitor effective capacity and queued callers
  • Do not increase permit count, enable fairness, or restart as a substitute for fixing the accounting
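Bounded waiting can be sketched as follows, assuming a hypothetical `BoundedAdmission` wrapper. It uses the real Semaphore.tryAcquire(long, TimeUnit) overload, releases only on the admitted path, and returns false for timeouts and interrupts so the caller can reject or defer the work.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

public class BoundedAdmission {
    private final Semaphore gate;

    BoundedAdmission(int permits) {
        this.gate = new Semaphore(permits);
    }

    // Returns true if the work ran, false if it was not admitted in time.
    // A timeout bounds the wait; it does not repair leaked permits.
    boolean tryRun(Runnable work, long timeout, TimeUnit unit) {
        boolean admitted;
        try {
            admitted = gate.tryAcquire(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep the interrupt visible to the caller
            return false;                       // nothing was acquired, so nothing to release
        }
        if (!admitted) {
            return false; // rejected or deferred, never treated as admitted
        }
        try {
            work.run();
            return true;
        } finally {
            gate.release(); // exactly one release for the successful tryAcquire
        }
    }

    int availablePermits() {
        return gate.availablePermits();
    }

    public static void main(String[] args) {
        BoundedAdmission admission = new BoundedAdmission(1);
        boolean outer = admission.tryRun(() -> {
            // The only permit is held here, so this nested attempt waits 100 ms and is rejected.
            boolean inner = admission.tryRun(() -> { }, 100, TimeUnit.MILLISECONDS);
            System.out.println("nested attempt admitted = " + inner);
        }, 100, TimeUnit.MILLISECONDS);
        System.out.println("outer attempt admitted = " + outer);
        System.out.println("available permits = " + admission.availablePermits());
    }
}
```

A timeout only turns an endless wait into a visible rejection; if permits keep leaking, every attempt is eventually rejected.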
SemaphorePermitFinallyFixed.java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Semaphore;

public class SemaphorePermitFinallyFixed {
    private static final Semaphore gate = new Semaphore(2);

    public static void main(String[] args) throws Exception {
        List<Thread> workers = new ArrayList<>();

        for (int i = 1; i <= 5; i++) {
            final int workerId = i;
            Thread worker = new Thread(() -> doWork(workerId), "worker-" + i);
            workers.add(worker);
            worker.start();
        }

        for (Thread worker : workers) {
            worker.join();
        }
        System.out.println("all workers reached a terminal state");
    }

    private static void doWork(int workerId) {
        try {
            gate.acquire();
            try {
                System.out.println("worker " + workerId + " acquired permit");

                if (workerId <= 2) {
                    throw new RuntimeException("failed during work");
                }

                sleepQuietly(300);
                System.out.println("worker " + workerId + " completed work");
            } finally {
                gate.release();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } catch (RuntimeException error) {
            System.out.println("worker " + workerId + " failed: " + error);
        }
    }

    private static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}

Note: The inner finally is entered only after acquire() succeeds, so each admitted attempt releases exactly one permit.
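To keep acquisition and release in one small scope, the try/finally shape above can be wrapped once and reused. This is a sketch of a hypothetical helper, `PermitScope.withPermit`, not a JDK API. The demo runs the same five workers sequentially rather than on threads, so its output order is deterministic.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

public final class PermitScope {
    private PermitScope() {
    }

    // Hypothetical helper: the only place that calls acquire() and release()
    // for this gate, so call sites cannot skip or duplicate the release.
    static <T> T withPermit(Semaphore gate, Supplier<T> action) throws InterruptedException {
        gate.acquire(); // if this throws, no permit was taken and nothing is released
        try {
            return action.get();
        } finally {
            gate.release(); // runs once per successful acquire, on success or failure
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Semaphore gate = new Semaphore(2);
        for (int id = 1; id <= 5; id++) {
            final int workerId = id;
            try {
                String result = withPermit(gate, () -> {
                    if (workerId <= 2) {
                        throw new RuntimeException("failed during work");
                    }
                    return "worker " + workerId + " completed work";
                });
                System.out.println(result);
            } catch (RuntimeException error) {
                System.out.println("worker " + workerId + " failed: " + error.getMessage());
            }
        }
        System.out.println("available permits = " + gate.availablePermits());
    }
}
```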