WinDbg determination of a race condition

A report came back from the field indicating that one of our services was not doing what it was designed to do: it would not accept subscriptions. No errors were reported in our Sumo logs either.

We decided to take a memory dump of the server and found roughly 300 threads waiting on something. A quick investigation with WinDbg revealed what looked like a deadlock or race condition.

First we used !threads to get the following:

0:000> !threads
 ThreadCount: 279
 UnstartedThread: 0
 BackgroundThread: 273
 PendingThread: 0
 DeadThread: 6
 Hosted Runtime: no

Next we used !syncblk to get the following:

0:000> !syncblk
 Index SyncBlock MonitorHeld Recursion Owning Thread Info SyncBlock Owner
 33 000000fbfcbf0be8 497 1 000000fc03846780 548 26 000000fbe2724e78 System.Object
 88 000000fbfcbf1458 5 1 000000fc03846780 548 26 000000fbe27248c0 System.Object
 90 000000fbfcbf1368 3 1 000000fc02eeafb0 ad8 23 000000fbe26f80c8 System.Object
 -----------------------------
 Total 200
 CCW 3
 RCW 5
 ComClassFactory 0
 Free 132

The above table does not readily identify a deadlocked situation. However, it references two threads worth investigating further to see if they are waiting on locks: 0x548 and 0xad8. (The MonitorHeld value of 497 on the first sync block is also telling: the owning thread contributes 1 and each waiting thread adds 2, so roughly 248 threads are queued behind that single lock, which lines up with the ~300 waiting threads we saw.) To probe further into each thread, we clicked the link under the "Owning Thread Info" column in the above table and, once the thread information was retrieved, issued !clrstack as demonstrated below.

Let’s start with thread 0x548.

0:000> ~~[548]s
ntdll!NtWaitForMultipleObjects+0xa:
00007ff9`b0540c6a c3 ret
0:026> !clrstack
OS Thread Id: 0x548 (26)
 Child SP IP Call Site
000000fc0353cca8 00007ff9b0540c6a [GCFrame: 000000fc0353cca8] 
000000fc0353ce18 00007ff9b0540c6a [GCFrame: 000000fc0353ce18] 
000000fc0353ce58 00007ff9b0540c6a [HelperMethodFrame_1OBJ: 000000fc0353ce58] System.Threading.Monitor.Enter(System.Object)
000000fc0353cf50 00007ff948ea067f ACompany.Redis.Client.RedisClient.CheckForConnect()
000000fc0353cfc0 00007ff948ea05f0 ACompany.Redis.Client.RedisClient.get_Subscriber()
000000fc0353cff0 00007ff948ea0540 ACompany.Redis.Client.RedisClient.Subscribe(System.String, System.Action`2)
000000fc0353d060 00007ff948e9fe85 ACompany.Redis.MessageBus.RedisMessageBus.Subscribe(System.String, System.Action`2)
000000fc0353d138 00007ff9a6bcc29c [StubHelperFrame: 000000fc0353d138] 
000000fc0353d190 00007ff948e9f894 ACompany.Redis.WebSocket.Channel.OnDemandMessageListener.Attach(Integration.OnDemand.WebSocket.Channel.IOnDemandMessageHandler)
000000fc0353d230 00007ff948e9e65b ACompany.Redis.WebSocket.Channel.OnDemandWebSocketHandler.OnOpen()
000000fc0353d380 00007ff948e9e0b6 Microsoft.Web.WebSockets.WebSocketHandler+d__9.MoveNext()
000000fc0353d3e0 00007ff948e9df75 System.Runtime.CompilerServices.AsyncTaskMethodBuilder.Start[[Microsoft.Web.WebSockets.WebSocketHandler+d__9, Microsoft.WebSockets]](d__9 ByRef)
000000fc0353d490 00007ff948e9dece Microsoft.Web.WebSockets.WebSocketHandler.ProcessWebSocketRequestAsync(System.Web.WebSockets.AspNetWebSocketContext, System.Func`1>)
000000fc0353d550 00007ff949024c4d System.Web.WebSocketPipeline+c__DisplayClass9_0.b__0(System.Object)
000000fc0353d5b0 00007ff9490243f1 System.Web.Util.SynchronizationHelper.SafeWrapCallback(System.Action)
000000fc0353d600 00007ff9490242b6 System.Web.Util.SynchronizationHelper.QueueSynchronous(System.Action)
000000fc0353d660 00007ff949022da1 System.Web.WebSocketPipeline+d__9.MoveNext()
000000fc0353d6f0 00007ff94902268f System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1[[System.__Canon, mscorlib]].Start[[System.Web.WebSocketPipeline+d__9, System.Web]](d__9 ByRef)
000000fc0353d7a0 00007ff9490225df System.Web.WebSocketPipeline.ProcessRequestImplAsync()
000000fc0353d860 00007ff949022435 System.Web.WebSocketPipeline.ProcessRequest()
000000fc0353d8b0 00007ff947b6ceb0 System.Web.Hosting.PipelineRuntime.ProcessRequestNotificationHelper(IntPtr, IntPtr, IntPtr, Int32)
000000fc0353da60 00007ff947b6c3e4 System.Web.Hosting.PipelineRuntime.ProcessRequestNotification(IntPtr, IntPtr, IntPtr, Int32)
000000fc0353daa0 00007ff947b6b8ab DomainNeutralILStubClass.IL_STUB_ReversePInvoke(Int64, Int64, Int64, Int32)
000000fc0353e2a0 00007ff9a6b07fde [InlinedCallFrame: 000000fc0353e2a0] System.Web.Hosting.UnsafeIISMethods.MgdIndicateCompletion(IntPtr, System.Web.RequestNotificationStatus ByRef)
000000fc0353e2a0 00007ff948157a7e [InlinedCallFrame: 000000fc0353e2a0] System.Web.Hosting.UnsafeIISMethods.MgdIndicateCompletion(IntPtr, System.Web.RequestNotificationStatus ByRef)
000000fc0353e270 00007ff948157a7e DomainNeutralILStubClass.IL_STUB_PInvoke(IntPtr, System.Web.RequestNotificationStatus ByRef)
000000fc0353e330 00007ff947b6cdc1 System.Web.Hosting.PipelineRuntime.ProcessRequestNotificationHelper(IntPtr, IntPtr, IntPtr, Int32)
000000fc0353e4e0 00007ff947b6c3e4 System.Web.Hosting.PipelineRuntime.ProcessRequestNotification(IntPtr, IntPtr, IntPtr, Int32)
000000fc0353e520 00007ff947b6b8ab DomainNeutralILStubClass.IL_STUB_ReversePInvoke(Int64, Int64, Int64, Int32)
000000fc0353e6f8 00007ff9a6b08233 [ContextTransitionFrame: 000000fc0353e6f8]

This thread is waiting on a lock to be released as indicated by this line at the top of its stack trace:

000000fc0353ce58 00007ff9b0540c6a [HelperMethodFrame_1OBJ: 000000fc0353ce58] System.Threading.Monitor.Enter(System.Object)

Let’s perform a similar exercise for thread 0xad8.  Its stack trace is as follows:

0:026> ~~[ad8]s
ntdll!NtWaitForMultipleObjects+0xa:
00007ff9`b0540c6a c3 ret
0:023> !clrstack
OS Thread Id: 0xad8 (23)
 Child SP IP Call Site
000000fc0071ccc8 00007ff9b0540c6a [GCFrame: 000000fc0071ccc8] 
000000fc0071cef0 00007ff9b0540c6a [GCFrame: 000000fc0071cef0] 
000000fc0071cf28 00007ff9b0540c6a [HelperMethodFrame: 000000fc0071cf28] System.Threading.Monitor.Enter(System.Object)
000000fc0071d020 00007ff9491de236 ACompany.Redis.MessageBus.RedisMessageBus.OnRedisConnectionReestablished(System.Object, System.EventArgs)
000000fc0071d110 00007ff947942cd3 [MulticastFrame: 000000fc0071d110] System.EventHandler`1[[System.__Canon, mscorlib]].Invoke(System.Object, System.__Canon)
000000fc0071d170 00007ff9489c6485 ACompany.Redis.Client.RedisClient.InitializeConnection()
000000fc0071d1c0 00007ff948ea06e7 ACompany.Redis.Client.RedisClient.CheckForConnect()
000000fc0071d230 00007ff948ea4da0 ACompany.Redis.Client.RedisClient.get_Database()
000000fc0071d260 00007ff9491b817a ACompany.Redis.Common.RedisOnDemandSubscriptionManager.RemoveSubscription(System.String, System.String, System.String)
000000fc0071d370 00007ff9491b7d73 ACompany.Redis.WebSocket.Channel.OnDemandWebSocketHandler.OnClose()
000000fc0071d4b0 00007ff948e9e50c Microsoft.Web.WebSockets.WebSocketHandler+d__9.MoveNext()
000000fc0071d500 00007ff948e9e3ca Microsoft.Web.WebSockets.WebSocketHandler+d__9.MoveNext()
000000fc0071d560 00007ff948e9df75 System.Runtime.CompilerServices.AsyncTaskMethodBuilder.Start[[Microsoft.Web.WebSockets.WebSocketHandler+d__9, Microsoft.WebSockets]](d__9 ByRef)
000000fc0071d610 00007ff948e9dece Microsoft.Web.WebSockets.WebSocketHandler.ProcessWebSocketRequestAsync(System.Web.WebSockets.AspNetWebSocketContext, System.Func`1>)
000000fc0071d6d0 00007ff949024c4d System.Web.WebSocketPipeline+c__DisplayClass9_0.b__0(System.Object)
000000fc0071d730 00007ff9490243f1 System.Web.Util.SynchronizationHelper.SafeWrapCallback(System.Action)
000000fc0071d780 00007ff9490242b6 System.Web.Util.SynchronizationHelper.QueueSynchronous(System.Action)
000000fc0071d7e0 00007ff949022da1 System.Web.WebSocketPipeline+d__9.MoveNext()
000000fc0071d870 00007ff94902268f System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1[[System.__Canon, mscorlib]].Start[[System.Web.WebSocketPipeline+d__9, System.Web]](d__9 ByRef)
000000fc0071d920 00007ff9490225df System.Web.WebSocketPipeline.ProcessRequestImplAsync()
000000fc0071d9e0 00007ff949022435 System.Web.WebSocketPipeline.ProcessRequest()
000000fc0071da30 00007ff947b6ceb0 System.Web.Hosting.PipelineRuntime.ProcessRequestNotificationHelper(IntPtr, IntPtr, IntPtr, Int32)
000000fc0071dbe0 00007ff947b6c3e4 System.Web.Hosting.PipelineRuntime.ProcessRequestNotification(IntPtr, IntPtr, IntPtr, Int32)
000000fc0071dc20 00007ff947b6b8ab DomainNeutralILStubClass.IL_STUB_ReversePInvoke(Int64, Int64, Int64, Int32)
000000fc0071e420 00007ff9a6b07fde [InlinedCallFrame: 000000fc0071e420] System.Web.Hosting.UnsafeIISMethods.MgdIndicateCompletion(IntPtr, System.Web.RequestNotificationStatus ByRef)
000000fc0071e420 00007ff948157a7e [InlinedCallFrame: 000000fc0071e420] System.Web.Hosting.UnsafeIISMethods.MgdIndicateCompletion(IntPtr, System.Web.RequestNotificationStatus ByRef)
000000fc0071e3f0 00007ff948157a7e DomainNeutralILStubClass.IL_STUB_PInvoke(IntPtr, System.Web.RequestNotificationStatus ByRef)
000000fc0071e4b0 00007ff947b6cdc1 System.Web.Hosting.PipelineRuntime.ProcessRequestNotificationHelper(IntPtr, IntPtr, IntPtr, Int32)
000000fc0071e660 00007ff947b6c3e4 System.Web.Hosting.PipelineRuntime.ProcessRequestNotification(IntPtr, IntPtr, IntPtr, Int32)
000000fc0071e6a0 00007ff947b6b8ab DomainNeutralILStubClass.IL_STUB_ReversePInvoke(Int64, Int64, Int64, Int32)
000000fc0071e878 00007ff9a6b08233 [ContextTransitionFrame: 000000fc0071e878]

Similarly, this thread is waiting for a lock to be released as indicated at the top of its stack trace:

000000fc0071cf28 00007ff9b0540c6a [HelperMethodFrame: 000000fc0071cf28] System.Threading.Monitor.Enter(System.Object)

It gets tricky from this point on, as WinDbg does not readily identify the lock objects each thread is waiting on, which would have helped.  A deadlock means one thread is waiting for a lock to be released while itself holding a lock that another thread is, in turn, waiting on.
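As an aside, the SOSEX debugging extension can automate much of this legwork: its !dlk command walks the sync blocks and reports deadlocked monitor chains directly. We did not have it loaded for this session, but the invocation would look something like:

```
0:000> .load sosex
0:000> !dlk
```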

So what can we make from the above?

It starts with thread 0x548 acquiring a lock on its call to RedisClient.Subscribe; let’s call this lock _redisMessageBusLock, as indicated below:

000000fc0353ce58 00007ff9b0540c6a [HelperMethodFrame_1OBJ: 000000fc0353ce58] System.Threading.Monitor.Enter(System.Object)
000000fc0353cf50 00007ff948ea067f ACompany.Redis.Client.RedisClient.CheckForConnect()
000000fc0353cfc0 00007ff948ea05f0 ACompany.Redis.Client.RedisClient.get_Subscriber()
000000fc0353cff0 00007ff948ea0540 ACompany.Redis.Client.RedisClient.Subscribe(System.String, System.Action`2)   // we acquire _redisMessageBusLock here

Simultaneously, thread 0xad8 had first attempted to check the Redis connection state in its call to CheckForConnect().

Servicing this call requires acquiring a lock, which we will call _redisClientLock. We acquire the lock, check the Redis connection and fix it if there is a problem, raise an event once the connection has been re-established, and then release the lock.

In this case there was an issue with the Redis connection, so we acquired _redisClientLock, fixed the connection, and then raised the OnRedisConnectionReestablished event while still holding _redisClientLock.

000000fc0071cf28 00007ff9b0540c6a [HelperMethodFrame: 000000fc0071cf28] System.Threading.Monitor.Enter(System.Object)    // we are waiting for _redisMessageBusLock to be released
000000fc0071d020 00007ff9491de236 ACompany.Redis.MessageBus.RedisMessageBus.OnRedisConnectionReestablished(System.Object, System.EventArgs)
000000fc0071d110 00007ff947942cd3 [MulticastFrame: 000000fc0071d110] System.EventHandler`1[[System.__Canon, mscorlib]].Invoke(System.Object, System.__Canon)
000000fc0071d170 00007ff9489c6485 ACompany.Redis.Client.RedisClient.InitializeConnection()
000000fc0071d1c0 00007ff948ea06e7 ACompany.Redis.Client.RedisClient.CheckForConnect()     // we acquired _redisClientLock here

Since the event handling is synchronous, the same thread that raises the event also handles it.  However, part of handling the event requires acquiring _redisMessageBusLock, which had already been acquired by thread 0x548.
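Putting the two stacks side by side, the locking pattern can be sketched in C#. Everything below is a hypothetical reconstruction from the stack traces; the class shapes, member names, and exact lock placement are assumptions, not the actual source:

```csharp
using System;

// Hypothetical reconstruction of the two code paths seen in the dumps.
public class RedisClient
{
    internal readonly object _redisMessageBusLock = new object();
    private readonly object _redisClientLock = new object();
    private bool _connected;

    public event EventHandler ConnectionReestablished;

    // Thread 0x548: OnDemandWebSocketHandler.OnOpen -> ... -> Subscribe
    public void Subscribe(string channel, Action<string, string> handler)
    {
        lock (_redisMessageBusLock)   // 0x548 acquires _redisMessageBusLock...
        {
            CheckForConnect();        // ...then blocks here on _redisClientLock
            // register handler on the subscriber
        }
    }

    // Thread 0xad8: OnDemandWebSocketHandler.OnClose -> RemoveSubscription -> here
    public void CheckForConnect()
    {
        lock (_redisClientLock)       // 0xad8 acquires _redisClientLock
        {
            if (!_connected)
                InitializeConnection();
        }
    }

    private void InitializeConnection()
    {
        _connected = true;
        // The event is raised synchronously while _redisClientLock is
        // still held, so handlers run under that lock on this thread.
        ConnectionReestablished?.Invoke(this, EventArgs.Empty);
    }
}

public class RedisMessageBus
{
    private readonly RedisClient _client;

    public RedisMessageBus(RedisClient client)
    {
        _client = client;
        _client.ConnectionReestablished += OnRedisConnectionReestablished;
    }

    // Runs on 0xad8 (the raising thread), which still holds
    // _redisClientLock, and now blocks on _redisMessageBusLock,
    // which is held by 0x548: a classic lock-ordering deadlock.
    private void OnRedisConnectionReestablished(object sender, EventArgs e)
    {
        lock (_client._redisMessageBusLock)
        {
            // re-subscribe existing channels
        }
    }
}
```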

Based on the above, our lock state looks like this:

Thread       LockAcquired           LockWaitingFor          
0xad8        _redisClientLock        _redisMessageBusLock     
0x548        _redisMessageBusLock    _redisClientLock

which clearly demonstrates our deadlock.
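One common remedy for this shape of deadlock (sketched here with the same assumed names, not necessarily the fix we actually shipped) is never to raise an event while holding a lock: snapshot the handler list inside the lock, release the lock, and only then invoke the handlers.

```csharp
using System;

public class RedisClient
{
    private readonly object _redisClientLock = new object();
    private bool _connected;

    public event EventHandler ConnectionReestablished;

    public void CheckForConnect()
    {
        EventHandler handlers = null;
        lock (_redisClientLock)
        {
            if (!_connected)
            {
                // re-establish the connection...
                _connected = true;
                handlers = ConnectionReestablished;  // snapshot under the lock
            }
        }
        // Raise the event only after _redisClientLock is released, so a
        // handler that takes _redisMessageBusLock can no longer deadlock
        // against a Subscribe call taking the locks in the opposite order.
        handlers?.Invoke(this, EventArgs.Empty);
    }
}
```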

Happy debugging.
