Coordination, Waiting & Cancellation
Semaphore Permit Leak
Practice a Java concurrency bug whose symptoms include throughput dropping to zero, work no longer entering the protected section, and the app appearing stuck under load.
- Permits and throttling
- Semaphore
- Permit Leak
- Java
- Intermediate
Production symptoms
- Throughput drops to zero
- Work stops entering
- App appears stuck under load
Failure scenario
Code
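semaphore.acquire();
if (request.isBad()) {
    // Bug: this throw exits without releasing the permit acquired above.
    throw new IllegalArgumentException("bad request");
}
callDownstream(request); // Bug: an exception thrown here also skips release().
semaphore.release();     // Reached only on the success path.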
Prod Symptoms
A Semaphore limits concurrent calls to a downstream service. Exceptional paths leak permits, so effective capacity decreases after each failure.
Key signal: Each skipped release permanently reduces this Semaphore's effective capacity; it recovers only if a compensating release occurs or the Semaphore is recreated.
- Throughput falls gradually as failures accumulate
- Queueing time grows while fewer calls reach the downstream service
- Threads pile up in Semaphore.acquire()
- CPU stays low because callers are parked
- Restart restores the original capacity
- Once every permit is leaked, no new call enters the protected section
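The gradual loss described above can be reproduced in a single thread. The following is a minimal sketch, not the lesson's demo: the class and method names are hypothetical, and it uses the non-blocking tryAcquire() instead of acquire() so that the sketch cannot hang once the permits are gone.

```java
import java.util.concurrent.Semaphore;

class PermitLeakCapacityDemo {

    // Buggy on purpose: the failure path exits without calling release().
    // Returns false when no permit was available, i.e. the caller was not admitted.
    static boolean leakyCall(Semaphore gate, boolean fail) {
        if (!gate.tryAcquire()) {
            return false; // in production this caller would park in acquire()
        }
        if (fail) {
            throw new IllegalStateException("simulated downstream failure");
        }
        gate.release();
        return true;
    }

    // Runs the given number of failing calls against a fresh Semaphore
    // and reports how many permits are still available afterwards.
    static int permitsAfterFailures(int capacity, int failures) {
        Semaphore gate = new Semaphore(capacity);
        for (int i = 0; i < failures; i++) {
            try {
                leakyCall(gate, true);
            } catch (IllegalStateException expected) {
                // The caller sees the failure; the permit is already lost.
            }
        }
        return gate.availablePermits();
    }

    public static void main(String[] args) {
        for (int failures = 0; failures <= 3; failures++) {
            System.out.println(failures + " failed calls -> "
                    + permitsAfterFailures(3, failures) + " of 3 permits left");
        }
    }
}
```

With a capacity of 3, the loop reports 3, 2, 1, and 0 permits left. After the third failure no later call is admitted, healthy or not.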
Run Locally
- Workers 1 and 2 each acquire a permit and then fail
- No permit is released for those failures
- availablePermits() reports zero before later workers start
- Later workers wait in acquire
- The protected section has capacity zero after both permits leak
What to look for
- Threads parked in Semaphore.acquire
- Exception or return paths between acquire and release
- Permit count smaller than expected after failures
javac SemaphorePermitLeakDemo.java
java SemaphorePermitLeakDemo
jps
jstack <pid>
jcmd <pid> Thread.print
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Semaphore;

public class SemaphorePermitLeakDemo {
    // Two permits: at most two callers may be inside the protected section.
    private static final Semaphore gate = new Semaphore(2);

    public static void main(String[] args) throws Exception {
        List<Thread> workers = new ArrayList<>();

        // Workers 1 and 2 each take a permit and fail without releasing it.
        Thread first = failingWorker(1);
        Thread second = failingWorker(2);
        first.start();
        second.start();
        first.join();
        second.join();
        System.out.println("available permits after failures = "
                + gate.availablePermits());

        // Workers 3 to 5 start after both permits have leaked.
        for (int i = 3; i <= 5; i++) {
            Thread worker = new Thread(() -> doWork(), "worker-" + i);
            workers.add(worker);
            worker.start();
        }

        // Give the later workers time to park in acquire(), then report their states.
        Thread.sleep(500);
        for (Thread worker : workers) {
            System.out.println(worker.getName() + " state = " + worker.getState());
        }
        // The parked workers are non-daemon threads, so the JVM stays alive
        // and a thread dump can be taken.
    }

    private static Thread failingWorker(int workerId) {
        Thread worker = new Thread(() -> {
            try {
                gate.acquire();
                System.out.println("worker " + workerId + " acquired permit");
                // Bug: the exception leaves the thread without a release().
                throw new RuntimeException("failed before release");
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "worker-" + workerId);
        worker.setUncaughtExceptionHandler((thread, error) ->
                System.out.println(thread.getName() + " failed: " + error));
        return worker;
    }

    private static void doWork() {
        try {
            gate.acquire();
            System.out.println(Thread.currentThread().getName()
                    + " acquired permit");
            sleepQuietly(300);
            gate.release();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    private static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
Note: The first two workers leak both permits before the remaining workers start.
Diagnosis and fix
Explanation
In this code, Semaphore is used as a concurrency counter. Every successful acquire must eventually be balanced by exactly one release.
Key signal: A successful acquire creates an accounting obligation: release exactly one permit when the protected attempt ends.
- acquire() decrements the available permit count
- An exception skips release()
- Each skipped release reduces this Semaphore's effective capacity
- After all permits are lost, later callers wait indefinitely
- Semaphore does not track thread ownership or enforce a maximum permit count
- Releasing without a successful acquire causes the opposite bug: the concurrency limit grows
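The last two points can be checked directly. This is a small sketch with hypothetical class and method names; it relies only on the documented behavior that release() adds a permit regardless of which thread calls it or whether that thread ever acquired one.

```java
import java.util.concurrent.Semaphore;

class OverReleaseDemo {

    // Releasing without a matching acquire raises the permit count
    // above the value passed to the constructor.
    static int permitsAfterExtraRelease(int capacity) {
        Semaphore gate = new Semaphore(capacity);
        gate.release(); // no acquire preceded this call
        return gate.availablePermits();
    }

    // A permit taken by one thread can be returned by another:
    // Semaphore keeps a count, not an owner.
    static int permitsAfterCrossThreadRelease() {
        Semaphore gate = new Semaphore(1);
        gate.acquireUninterruptibly();            // taken by the calling thread
        Thread other = new Thread(gate::release); // returned by a different thread
        other.start();
        try {
            other.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return gate.availablePermits();
    }

    public static void main(String[] args) {
        System.out.println("after extra release: " + permitsAfterExtraRelease(2));
        System.out.println("after cross-thread release: " + permitsAfterCrossThreadRelease());
    }
}
```

A Semaphore created with 2 permits reports 3 after the extra release, which is why release accounting has to be enforced by the calling code rather than by the Semaphore.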
How to Diagnose
Use thread dumps to find callers parked in acquire(), then use metrics and code review to distinguish a leak from normal saturation.
- Look for callers parked in Semaphore.acquire()
- Compare configured capacity, available permits, queued callers, and active protected operations
- Check whether permits recover after active operations complete
- Correlate permanent capacity loss with exceptions and early returns after acquire()
- Inspect interrupted acquire paths and verify that they do not release
- Inspect release paths for both missing releases and over-release
jps
jstack <pid>
jcmd <pid> Thread.print
"worker-3" #... WAITING (parking)
at jdk.internal.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:...)
at java.util.concurrent.Semaphore.acquire(Semaphore.java:...)
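The comparison of configured capacity, available permits, queued callers, and active operations can be sketched as a small wrapper. Everything here is an assumption made for illustration: the GateAccounting class, its method names, and the exitLeaky() method that simulates the bug. The readings are separate, unsynchronized snapshots, so under live traffic the result is only an estimate that should be watched over time.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

class GateAccounting {
    private final int configured;
    private final Semaphore gate;
    // Operations currently inside the protected section, tracked separately
    // from the Semaphore so the two counts can be compared.
    private final AtomicInteger active = new AtomicInteger();

    GateAccounting(int configured) {
        this.configured = configured;
        this.gate = new Semaphore(configured);
    }

    // Non-blocking admission; returns false when no permit is available.
    boolean tryEnter() {
        if (!gate.tryAcquire()) {
            return false;
        }
        active.incrementAndGet();
        return true;
    }

    // Correct exit: the operation ends and its permit is returned.
    void exit() {
        active.decrementAndGet();
        gate.release();
    }

    // Simulates the bug: the operation ends but the permit is never returned.
    void exitLeaky() {
        active.decrementAndGet();
    }

    // Permits that are neither available nor held by an active operation.
    // Persistently positive suggests a leak; negative suggests over-release.
    int missingPermits() {
        return configured - gate.availablePermits() - active.get();
    }

    // Estimate of callers currently waiting to acquire.
    int queuedCallers() {
        return gate.getQueueLength();
    }

    public static void main(String[] args) {
        GateAccounting accounting = new GateAccounting(3);
        accounting.tryEnter();
        accounting.exit();
        System.out.println("missing after clean exit = " + accounting.missingPermits());
        accounting.tryEnter();
        accounting.exitLeaky();
        System.out.println("missing after leaky exit = " + accounting.missingPermits());
    }
}
```

Under normal saturation the missing count stays at zero while active operations equal the configured capacity. Under a leak the missing count stays positive even when nothing is active.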
How to Fix
- Release exactly once in finally after a successful acquire
- Do not release when acquire() was interrupted or otherwise failed
- Keep acquisition and release accounting in one small code scope
- Use tryAcquire(timeout, unit) for bounded waiting, not as a fix for permit leaks
- Treat timeout as rejected or deferred work, not successful admission
- Monitor effective capacity and queued callers
- Do not increase permit count, enable fairness, or restart as a substitute for fixing the accounting
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Semaphore;

public class SemaphorePermitFinallyFixed {
    private static final Semaphore gate = new Semaphore(2);

    public static void main(String[] args) throws Exception {
        List<Thread> workers = new ArrayList<>();
        for (int i = 1; i <= 5; i++) {
            final int workerId = i;
            Thread worker = new Thread(() -> doWork(workerId), "worker-" + i);
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            worker.join();
        }
        System.out.println("all workers reached a terminal state");
    }

    private static void doWork(int workerId) {
        try {
            // If acquire() is interrupted, control skips the inner try,
            // so no release happens for a permit that was never taken.
            gate.acquire();
            try {
                System.out.println("worker " + workerId + " acquired permit");
                if (workerId <= 2) {
                    throw new RuntimeException("failed during work");
                }
                sleepQuietly(300);
                System.out.println("worker " + workerId + " completed work");
            } finally {
                // Runs on success and on failure: exactly one release per acquire.
                gate.release();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } catch (RuntimeException error) {
            System.out.println("worker " + workerId + " failed: " + error);
        }
    }

    private static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
Note: The inner finally is entered only after acquire() succeeds, so each admitted attempt releases exactly one permit.
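The bounded-waiting advice in How to Fix can be combined with the same finally pattern. The following is one possible sketch, assuming a hypothetical helper named callWithPermit that keeps acquisition and release in one scope and reports a timeout or an interrupt as an empty Optional, meaning the work was rejected rather than admitted. The task is assumed to return a non-null value.

```java
import java.util.Optional;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

class BoundedGate {

    // Hypothetical helper: runs the task only if a permit arrives within the timeout.
    // An empty result means "not admitted"; the caller should reject or defer the work.
    static <T> Optional<T> callWithPermit(Semaphore gate, long timeout, TimeUnit unit,
                                          Supplier<T> task) {
        boolean admitted;
        try {
            admitted = gate.tryAcquire(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return Optional.empty(); // never admitted, so there is nothing to release
        }
        if (!admitted) {
            return Optional.empty(); // timed out: rejected, not admitted
        }
        try {
            return Optional.of(task.get());
        } finally {
            gate.release(); // exactly one release per successful acquire
        }
    }

    public static void main(String[] args) {
        Semaphore gate = new Semaphore(1);
        System.out.println(callWithPermit(gate, 50, TimeUnit.MILLISECONDS, () -> "ok"));

        try {
            callWithPermit(gate, 50, TimeUnit.MILLISECONDS, () -> {
                throw new IllegalStateException("failed during work");
            });
        } catch (IllegalStateException error) {
            System.out.println("task failed, permits = " + gate.availablePermits());
        }

        gate.acquireUninterruptibly(); // simulate a saturated gate
        System.out.println(callWithPermit(gate, 50, TimeUnit.MILLISECONDS, () -> "ok"));
    }
}
```

A timeout here only bounds how long a caller waits. If the protected code still leaks permits, every caller eventually times out instead of parking, so the finally block remains the actual fix.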