Deadlock when handling onFailedToConnect to connection pool

classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

Deadlock when handling onFailedToConnect to connection pool

Fabian Gonzalez
Hello,

This is my first mail to the list, so greetings to all.
I'm facing a problem concerning the handling of a connection failure in a connection pool that results in a deadlock.
The problem arises in the following situation:

1) thread1 and thread2 try to retrieve a connection to endpoint1 and endpoint2 respectively through SingleEndpointPool.

2) both threads fail to connect so the method onFailedToConnect of ConnectCompletionHandler is invoked.

3) this generate a synchronized section which is locked based on the endpoint1 and endpoint2 respectively.

4) Within the synchronized piece of code the handlers of failure to connect are recursively invoked. In the handling managed by the app which thread1 tries to reconnect to endpoint2 and thread1 to endpoint2. 

5) When SingleEndpointPool is taken (method take) a new synchronized area is generated. But both threads get hang because thread1 is waiting for the lock of endpoint2 to be freed (taken by thread2 in onFailedToConnect) and thread2 is waiting for endpoint1 to be freed (taken by thread1 in onFailedToConnect).

This situation is generated in the scenario where several addresses are resolved through dns for the domain and they are not returned in the same order so when the connect failure arises a reconnection is attempted using the following ip in the ip list resolved by the dns.  

I copy a part of the thread dump:

this is the stacktrace wich results in the deadlock:

"[na-stg-mcd].http.requester.GatewayRequestor(4) SelectorRunner":
at org.glassfish.grizzly.connectionpool.SingleEndpointPool.take(SingleEndpointPool.java:755)
- waiting to lock <0x000000068d7c7348> (a java.lang.Object)
at org.glassfish.grizzly.connectionpool.MultiEndpointPool.take(MultiEndpointPool.java:592)
at com.ning.http.client.providers.grizzly.ConnectionManager.openAsync(ConnectionManager.java:143)
at com.ning.http.client.providers.grizzly.GrizzlyAsyncHttpProvider.execute(GrizzlyAsyncHttpProvider.java:174)
at com.ning.http.client.AsyncHttpClient.executeRequest(AsyncHttpClient.java:506)
at org.mule.module.http.internal.request.grizzly.GrizzlyHttpClient.send(GrizzlyHttpClient.java:295)
at com.mulesoft.mule.http.request.grizzly.EEGrizzlyHttpClient.access$1(EEGrizzlyHttpClient.java:1)
at com.mulesoft.mule.http.request.grizzly.EEGrizzlyHttpClient$1.onFailure(EEGrizzlyHttpClient.java:150)
at com.mulesoft.mule.http.request.grizzly.EEGrizzlyHttpClient$1.onFailure(EEGrizzlyHttpClient.java:1)
at org.mule.module.http.internal.request.grizzly.GrizzlyHttpClient$WorkManagerSourceAsyncCompletionHandler.onThrowable(GrizzlyHttpClient.java:327)
at com.ning.http.client.providers.grizzly.GrizzlyResponseFuture.failed(GrizzlyResponseFuture.java:159)
at org.glassfish.grizzly.impl.SafeFutureImpl.notifyCompletionHandlers(SafeFutureImpl.java:187)
at org.glassfish.grizzly.impl.SafeFutureImpl.done(SafeFutureImpl.java:277)
at org.glassfish.grizzly.impl.SafeFutureImpl$Sync.innerSetException(SafeFutureImpl.java:382)
at org.glassfish.grizzly.impl.SafeFutureImpl.failure(SafeFutureImpl.java:122)
at com.ning.http.client.providers.grizzly.GrizzlyResponseFuture.abort(GrizzlyResponseFuture.java:72)
at com.ning.http.client.providers.grizzly.GrizzlyAsyncHttpProvider$1.failed(GrizzlyAsyncHttpProvider.java:138)
at org.glassfish.grizzly.utils.Futures.notifyFailure(Futures.java:129)
at org.glassfish.grizzly.connectionpool.SingleEndpointPool.notifyAsyncPollersOfFailure(SingleEndpointPool.java:1181)
at org.glassfish.grizzly.connectionpool.SingleEndpointPool.access$1400(SingleEndpointPool.java:101)
at org.glassfish.grizzly.connectionpool.SingleEndpointPool$ConnectCompletionHandler.onFailedToConnect(SingleEndpointPool.java:1313)
- locked <0x000000068d6d6770> (a java.lang.Object)
at org.glassfish.grizzly.connectionpool.SingleEndpointPool$ConnectCompletionHandler.failed(SingleEndpointPool.java:1289)
at org.glassfish.grizzly.impl.SafeFutureImpl.notifyCompletionHandlers(SafeFutureImpl.java:187)
at org.glassfish.grizzly.impl.SafeFutureImpl.done(SafeFutureImpl.java:277)
at org.glassfish.grizzly.impl.SafeFutureImpl$Sync.innerSetException(SafeFutureImpl.java:382)
at org.glassfish.grizzly.impl.SafeFutureImpl.failure(SafeFutureImpl.java:122)
at org.glassfish.grizzly.utils.CompletionHandlerAdapter.failed(CompletionHandlerAdapter.java:104)
at org.glassfish.grizzly.nio.transport.TCPNIOConnectorHandler.abortConnection(TCPNIOConnectorHandler.java:268)
at org.glassfish.grizzly.nio.transport.TCPNIOConnectorHandler.onConnectedAsync(TCPNIOConnectorHandler.java:221)
at org.glassfish.grizzly.nio.transport.TCPNIOConnectorHandler$1.connected(TCPNIOConnectorHandler.java:158)
at org.glassfish.grizzly.nio.transport.TCPNIOConnection.onConnect(TCPNIOConnection.java:258)
at org.glassfish.grizzly.nio.transport.TCPNIOTransport.fireIOEvent(TCPNIOTransport.java:517)
at org.glassfish.grizzly.strategies.AbstractIOStrategy.fireIOEvent(AbstractIOStrategy.java:112)
at org.mule.module.http.internal.request.grizzly.FlowWorkManagerIOStrategy.run0(FlowWorkManagerIOStrategy.java:134)
at org.mule.module.http.internal.request.grizzly.FlowWorkManagerIOStrategy.executeIoEvent(FlowWorkManagerIOStrategy.java:77)
at org.glassfish.grizzly.strategies.AbstractIOStrategy.executeIoEvent(AbstractIOStrategy.java:89)
at org.glassfish.grizzly.nio.SelectorRunner.iterateKeyEvents(SelectorRunner.java:415)
at org.glassfish.grizzly.nio.SelectorRunner.iterateKeys(SelectorRunner.java:384)
at org.glassfish.grizzly.nio.SelectorRunner.doSelect(SelectorRunner.java:348)
at org.glassfish.grizzly.nio.SelectorRunner.run(SelectorRunner.java:279)
at org.glassfish.grizzly.threadpool.AbstractThreadPool$Worker.doWork(AbstractThreadPool.java:591)
at org.glassfish.grizzly.threadpool.AbstractThreadPool$Worker.run(AbstractThreadPool.java:571)
at java.lang.Thread.run(Thread.java:745)
"[na-stg-mcd].http.requester.GatewayRequestor(1) SelectorRunner":
at org.glassfish.grizzly.connectionpool.SingleEndpointPool.take(SingleEndpointPool.java:755)
- waiting to lock <0x000000068d6d6770> (a java.lang.Object)
at org.glassfish.grizzly.connectionpool.MultiEndpointPool.take(MultiEndpointPool.java:592)
at com.ning.http.client.providers.grizzly.ConnectionManager.openAsync(ConnectionManager.java:143)
at com.ning.http.client.providers.grizzly.GrizzlyAsyncHttpProvider.execute(GrizzlyAsyncHttpProvider.java:174)
at com.ning.http.client.AsyncHttpClient.executeRequest(AsyncHttpClient.java:506)
at org.mule.module.http.internal.request.grizzly.GrizzlyHttpClient.send(GrizzlyHttpClient.java:295)
at com.mulesoft.mule.http.request.grizzly.EEGrizzlyHttpClient.access$1(EEGrizzlyHttpClient.java:1)
at com.mulesoft.mule.http.request.grizzly.EEGrizzlyHttpClient$1.onFailure(EEGrizzlyHttpClient.java:150)
at com.mulesoft.mule.http.request.grizzly.EEGrizzlyHttpClient$1.onFailure(EEGrizzlyHttpClient.java:1)
at org.mule.module.http.internal.request.grizzly.GrizzlyHttpClient$WorkManagerSourceAsyncCompletionHandler.onThrowable(GrizzlyHttpClient.java:327)
at com.ning.http.client.providers.grizzly.GrizzlyResponseFuture.failed(GrizzlyResponseFuture.java:159)
at org.glassfish.grizzly.impl.SafeFutureImpl.notifyCompletionHandlers(SafeFutureImpl.java:187)
at org.glassfish.grizzly.impl.SafeFutureImpl.done(SafeFutureImpl.java:277)
at org.glassfish.grizzly.impl.SafeFutureImpl$Sync.innerSetException(SafeFutureImpl.java:382)
at org.glassfish.grizzly.impl.SafeFutureImpl.failure(SafeFutureImpl.java:122)
at com.ning.http.client.providers.grizzly.GrizzlyResponseFuture.abort(GrizzlyResponseFuture.java:72)
at com.ning.http.client.providers.grizzly.GrizzlyAsyncHttpProvider$1.failed(GrizzlyAsyncHttpProvider.java:138)
at org.glassfish.grizzly.utils.Futures.notifyFailure(Futures.java:129)
at org.glassfish.grizzly.connectionpool.SingleEndpointPool.notifyAsyncPollersOfFailure(SingleEndpointPool.java:1181)
at org.glassfish.grizzly.connectionpool.SingleEndpointPool.access$1400(SingleEndpointPool.java:101)
at org.glassfish.grizzly.connectionpool.SingleEndpointPool$ConnectCompletionHandler.onFailedToConnect(SingleEndpointPool.java:1313)
- locked <0x000000068d7c7348> (a java.lang.Object)
at org.glassfish.grizzly.connectionpool.SingleEndpointPool$ConnectCompletionHandler.failed(SingleEndpointPool.java:1289)
at org.glassfish.grizzly.impl.SafeFutureImpl.notifyCompletionHandlers(SafeFutureImpl.java:187)

Found one Java-level deadlock:
=============================
"[na-stg-mcd].https-lc-0.0.0.0-8082.worker.9480":
  waiting to lock monitor 0x00007fc330286fe8 (object 0x000000068d6d6770, a java.lang.Object),
  which is held by "[na-stg-mcd].http.requester.GatewayRequestor(4) SelectorRunner"
"[na-stg-mcd].http.requester.GatewayRequestor(4) SelectorRunner":
  waiting to lock monitor 0x00007fc3323b0e68 (object 0x000000068d7c7348, a java.lang.Object),
  which is held by "[na-stg-mcd].http.requester.GatewayRequestor(1) SelectorRunner"
"[na-stg-mcd].http.requester.GatewayRequestor(1) SelectorRunner":
  waiting to lock monitor 0x00007fc330286fe8 (object 0x000000068d6d6770, a java.lang.Object),
  which is held by "[na-stg-mcd].http.requester.GatewayRequestor(4) SelectorRunner"




Reply | Threaded
Open this post in threaded view
|

Re: Deadlock when handling onFailedToConnect to connection pool

Ryan Lubke-2
Thanks Fabian,

I see you've logged an issue.  We'll investigate asap.



February 10, 2017 at 06:59
Hello,

This is my first mail to the list, so greetings to all.
I'm facing a problem concerning the handling of a connection failure in a connection pool that results in a deadlock.
The problem arises in the following situation:

1) thread1 and thread2 try to retrieve a connection to endpoint1 and endpoint2 respectively through SingleEndpointPool.

2) both threads fail to connect so the method onFailedToConnect of ConnectCompletionHandler is invoked.

3) this generate a synchronized section which is locked based on the endpoint1 and endpoint2 respectively.

4) Within the synchronized piece of code the handlers of failure to connect are recursively invoked. In the handling managed by the app which thread1 tries to reconnect to endpoint2 and thread1 to endpoint2. 

5) When SingleEndpointPool is taken (method take) a new synchronized area is generated. But both threads get hang because thread1 is waiting for the lock of endpoint2 to be freed (taken by thread2 in onFailedToConnect) and thread2 is waiting for endpoint1 to be freed (taken by thread1 in onFailedToConnect).

This situation is generated in the scenario where several addresses are resolved through dns for the domain and they are not returned in the same order so when the connect failure arises a reconnection is attempted using the following ip in the ip list resolved by the dns.  

I copy a part of the thread dump:

this is the stacktrace wich results in the deadlock:

"[na-stg-mcd].http.requester.GatewayRequestor(4) SelectorRunner":
at org.glassfish.grizzly.connectionpool.SingleEndpointPool.take(SingleEndpointPool.java:755)
- waiting to lock <0x000000068d7c7348> (a java.lang.Object)
at org.glassfish.grizzly.connectionpool.MultiEndpointPool.take(MultiEndpointPool.java:592)
at com.ning.http.client.providers.grizzly.ConnectionManager.openAsync(ConnectionManager.java:143)
at com.ning.http.client.providers.grizzly.GrizzlyAsyncHttpProvider.execute(GrizzlyAsyncHttpProvider.java:174)
at com.ning.http.client.AsyncHttpClient.executeRequest(AsyncHttpClient.java:506)
at org.mule.module.http.internal.request.grizzly.GrizzlyHttpClient.send(GrizzlyHttpClient.java:295)
at com.mulesoft.mule.http.request.grizzly.EEGrizzlyHttpClient.access$1(EEGrizzlyHttpClient.java:1)
at com.mulesoft.mule.http.request.grizzly.EEGrizzlyHttpClient$1.onFailure(EEGrizzlyHttpClient.java:150)
at com.mulesoft.mule.http.request.grizzly.EEGrizzlyHttpClient$1.onFailure(EEGrizzlyHttpClient.java:1)
at org.mule.module.http.internal.request.grizzly.GrizzlyHttpClient$WorkManagerSourceAsyncCompletionHandler.onThrowable(GrizzlyHttpClient.java:327)
at com.ning.http.client.providers.grizzly.GrizzlyResponseFuture.failed(GrizzlyResponseFuture.java:159)
at org.glassfish.grizzly.impl.SafeFutureImpl.notifyCompletionHandlers(SafeFutureImpl.java:187)
at org.glassfish.grizzly.impl.SafeFutureImpl.done(SafeFutureImpl.java:277)
at org.glassfish.grizzly.impl.SafeFutureImpl$Sync.innerSetException(SafeFutureImpl.java:382)
at org.glassfish.grizzly.impl.SafeFutureImpl.failure(SafeFutureImpl.java:122)
at com.ning.http.client.providers.grizzly.GrizzlyResponseFuture.abort(GrizzlyResponseFuture.java:72)
at com.ning.http.client.providers.grizzly.GrizzlyAsyncHttpProvider$1.failed(GrizzlyAsyncHttpProvider.java:138)
at org.glassfish.grizzly.utils.Futures.notifyFailure(Futures.java:129)
at org.glassfish.grizzly.connectionpool.SingleEndpointPool.notifyAsyncPollersOfFailure(SingleEndpointPool.java:1181)
at org.glassfish.grizzly.connectionpool.SingleEndpointPool.access$1400(SingleEndpointPool.java:101)
at org.glassfish.grizzly.connectionpool.SingleEndpointPool$ConnectCompletionHandler.onFailedToConnect(SingleEndpointPool.java:1313)
- locked <0x000000068d6d6770> (a java.lang.Object)
at org.glassfish.grizzly.connectionpool.SingleEndpointPool$ConnectCompletionHandler.failed(SingleEndpointPool.java:1289)
at org.glassfish.grizzly.impl.SafeFutureImpl.notifyCompletionHandlers(SafeFutureImpl.java:187)
at org.glassfish.grizzly.impl.SafeFutureImpl.done(SafeFutureImpl.java:277)
at org.glassfish.grizzly.impl.SafeFutureImpl$Sync.innerSetException(SafeFutureImpl.java:382)
at org.glassfish.grizzly.impl.SafeFutureImpl.failure(SafeFutureImpl.java:122)
at org.glassfish.grizzly.utils.CompletionHandlerAdapter.failed(CompletionHandlerAdapter.java:104)
at org.glassfish.grizzly.nio.transport.TCPNIOConnectorHandler.abortConnection(TCPNIOConnectorHandler.java:268)
at org.glassfish.grizzly.nio.transport.TCPNIOConnectorHandler.onConnectedAsync(TCPNIOConnectorHandler.java:221)
at org.glassfish.grizzly.nio.transport.TCPNIOConnectorHandler$1.connected(TCPNIOConnectorHandler.java:158)
at org.glassfish.grizzly.nio.transport.TCPNIOConnection.onConnect(TCPNIOConnection.java:258)
at org.glassfish.grizzly.nio.transport.TCPNIOTransport.fireIOEvent(TCPNIOTransport.java:517)
at org.glassfish.grizzly.strategies.AbstractIOStrategy.fireIOEvent(AbstractIOStrategy.java:112)
at org.mule.module.http.internal.request.grizzly.FlowWorkManagerIOStrategy.run0(FlowWorkManagerIOStrategy.java:134)
at org.mule.module.http.internal.request.grizzly.FlowWorkManagerIOStrategy.executeIoEvent(FlowWorkManagerIOStrategy.java:77)
at org.glassfish.grizzly.strategies.AbstractIOStrategy.executeIoEvent(AbstractIOStrategy.java:89)
at org.glassfish.grizzly.nio.SelectorRunner.iterateKeyEvents(SelectorRunner.java:415)
at org.glassfish.grizzly.nio.SelectorRunner.iterateKeys(SelectorRunner.java:384)
at org.glassfish.grizzly.nio.SelectorRunner.doSelect(SelectorRunner.java:348)
at org.glassfish.grizzly.nio.SelectorRunner.run(SelectorRunner.java:279)
at org.glassfish.grizzly.threadpool.AbstractThreadPool$Worker.doWork(AbstractThreadPool.java:591)
at org.glassfish.grizzly.threadpool.AbstractThreadPool$Worker.run(AbstractThreadPool.java:571)
at java.lang.Thread.run(Thread.java:745)
"[na-stg-mcd].http.requester.GatewayRequestor(1) SelectorRunner":
at org.glassfish.grizzly.connectionpool.SingleEndpointPool.take(SingleEndpointPool.java:755)
- waiting to lock <0x000000068d6d6770> (a java.lang.Object)
at org.glassfish.grizzly.connectionpool.MultiEndpointPool.take(MultiEndpointPool.java:592)
at com.ning.http.client.providers.grizzly.ConnectionManager.openAsync(ConnectionManager.java:143)
at com.ning.http.client.providers.grizzly.GrizzlyAsyncHttpProvider.execute(GrizzlyAsyncHttpProvider.java:174)
at com.ning.http.client.AsyncHttpClient.executeRequest(AsyncHttpClient.java:506)
at org.mule.module.http.internal.request.grizzly.GrizzlyHttpClient.send(GrizzlyHttpClient.java:295)
at com.mulesoft.mule.http.request.grizzly.EEGrizzlyHttpClient.access$1(EEGrizzlyHttpClient.java:1)
at com.mulesoft.mule.http.request.grizzly.EEGrizzlyHttpClient$1.onFailure(EEGrizzlyHttpClient.java:150)
at com.mulesoft.mule.http.request.grizzly.EEGrizzlyHttpClient$1.onFailure(EEGrizzlyHttpClient.java:1)
at org.mule.module.http.internal.request.grizzly.GrizzlyHttpClient$WorkManagerSourceAsyncCompletionHandler.onThrowable(GrizzlyHttpClient.java:327)
at com.ning.http.client.providers.grizzly.GrizzlyResponseFuture.failed(GrizzlyResponseFuture.java:159)
at org.glassfish.grizzly.impl.SafeFutureImpl.notifyCompletionHandlers(SafeFutureImpl.java:187)
at org.glassfish.grizzly.impl.SafeFutureImpl.done(SafeFutureImpl.java:277)
at org.glassfish.grizzly.impl.SafeFutureImpl$Sync.innerSetException(SafeFutureImpl.java:382)
at org.glassfish.grizzly.impl.SafeFutureImpl.failure(SafeFutureImpl.java:122)
at com.ning.http.client.providers.grizzly.GrizzlyResponseFuture.abort(GrizzlyResponseFuture.java:72)
at com.ning.http.client.providers.grizzly.GrizzlyAsyncHttpProvider$1.failed(GrizzlyAsyncHttpProvider.java:138)
at org.glassfish.grizzly.utils.Futures.notifyFailure(Futures.java:129)
at org.glassfish.grizzly.connectionpool.SingleEndpointPool.notifyAsyncPollersOfFailure(SingleEndpointPool.java:1181)
at org.glassfish.grizzly.connectionpool.SingleEndpointPool.access$1400(SingleEndpointPool.java:101)
at org.glassfish.grizzly.connectionpool.SingleEndpointPool$ConnectCompletionHandler.onFailedToConnect(SingleEndpointPool.java:1313)
- locked <0x000000068d7c7348> (a java.lang.Object)
at org.glassfish.grizzly.connectionpool.SingleEndpointPool$ConnectCompletionHandler.failed(SingleEndpointPool.java:1289)
at org.glassfish.grizzly.impl.SafeFutureImpl.notifyCompletionHandlers(SafeFutureImpl.java:187)

Found one Java-level deadlock:
=============================
"[na-stg-mcd].https-lc-0.0.0.0-8082.worker.9480":
  waiting to lock monitor 0x00007fc330286fe8 (object 0x000000068d6d6770, a java.lang.Object),
  which is held by "[na-stg-mcd].http.requester.GatewayRequestor(4) SelectorRunner"
"[na-stg-mcd].http.requester.GatewayRequestor(4) SelectorRunner":
  waiting to lock monitor 0x00007fc3323b0e68 (object 0x000000068d7c7348, a java.lang.Object),
  which is held by "[na-stg-mcd].http.requester.GatewayRequestor(1) SelectorRunner"
"[na-stg-mcd].http.requester.GatewayRequestor(1) SelectorRunner":
  waiting to lock monitor 0x00007fc330286fe8 (object 0x000000068d6d6770, a java.lang.Object),
  which is held by "[na-stg-mcd].http.requester.GatewayRequestor(4) SelectorRunner"