Monthly Archives: June 2017

Windbg determination of race condition

A report came back from the field indicating that one of our services was not doing what it was designed to do: it would not accept subscriptions, among other symptoms.  No errors were reported in our Sumo logs either.

We decided to take a memory dump of the server and determined there were about 300 threads waiting for something. A quick investigation using Windbg revealed a deadlock or race condition.

First we used !threads to get the following:

0:000> !threads
 ThreadCount: 279
 UnstartedThread: 0
 BackgroundThread: 273
 PendingThread: 0
 DeadThread: 6
 Hosted Runtime: no

Next we used !syncblk to get the following:

0:000> !syncblk
 Index SyncBlock MonitorHeld Recursion Owning Thread Info SyncBlock Owner
 33 000000fbfcbf0be8 497 1 000000fc03846780 548 26 000000fbe2724e78 System.Object
 88 000000fbfcbf1458 5 1 000000fc03846780 548 26 000000fbe27248c0 System.Object
 90 000000fbfcbf1368 3 1 000000fc02eeafb0 ad8 23 000000fbe26f80c8 System.Object
 -----------------------------
 Total 200
 CCW 3
 RCW 5
 ComClassFactory 0
 Free 132

The above table does not readily identify a deadlocked situation.  However, it references two threads worth investigating further to see if they are waiting on locks: 0x548 and 0xad8.  To probe into each thread, we clicked the link provided under the “Thread” column in the table above and, once the thread context was switched, issued !clrstack as demonstrated below.

Let’s start with thread 0x548.

0:000> ~~[548]s
ntdll!NtWaitForMultipleObjects+0xa:
00007ff9`b0540c6a c3 ret
0:026> !clrstack
OS Thread Id: 0x548 (26)
 Child SP IP Call Site
000000fc0353cca8 00007ff9b0540c6a [GCFrame: 000000fc0353cca8] 
000000fc0353ce18 00007ff9b0540c6a [GCFrame: 000000fc0353ce18] 
000000fc0353ce58 00007ff9b0540c6a [HelperMethodFrame_1OBJ: 000000fc0353ce58] System.Threading.Monitor.Enter(System.Object)
000000fc0353cf50 00007ff948ea067f ACompany.Redis.Client.RedisClient.CheckForConnect()
000000fc0353cfc0 00007ff948ea05f0 ACompany.Redis.Client.RedisClient.get_Subscriber()
000000fc0353cff0 00007ff948ea0540 ACompany.Redis.Client.RedisClient.Subscribe(System.String, System.Action`2)
000000fc0353d060 00007ff948e9fe85 ACompany.Redis.MessageBus.RedisMessageBus.Subscribe(System.String, System.Action`2)
000000fc0353d138 00007ff9a6bcc29c [StubHelperFrame: 000000fc0353d138] 
000000fc0353d190 00007ff948e9f894 ACompany.Redis.WebSocket.Channel.OnDemandMessageListener.Attach(Integration.OnDemand.WebSocket.Channel.IOnDemandMessageHandler)
000000fc0353d230 00007ff948e9e65b ACompany.Redis.WebSocket.Channel.OnDemandWebSocketHandler.OnOpen()
000000fc0353d380 00007ff948e9e0b6 Microsoft.Web.WebSockets.WebSocketHandler+d__9.MoveNext()
000000fc0353d3e0 00007ff948e9df75 System.Runtime.CompilerServices.AsyncTaskMethodBuilder.Start[[Microsoft.Web.WebSockets.WebSocketHandler+d__9, Microsoft.WebSockets]](d__9 ByRef)
000000fc0353d490 00007ff948e9dece Microsoft.Web.WebSockets.WebSocketHandler.ProcessWebSocketRequestAsync(System.Web.WebSockets.AspNetWebSocketContext, System.Func`1>)
000000fc0353d550 00007ff949024c4d System.Web.WebSocketPipeline+c__DisplayClass9_0.b__0(System.Object)
000000fc0353d5b0 00007ff9490243f1 System.Web.Util.SynchronizationHelper.SafeWrapCallback(System.Action)
000000fc0353d600 00007ff9490242b6 System.Web.Util.SynchronizationHelper.QueueSynchronous(System.Action)
000000fc0353d660 00007ff949022da1 System.Web.WebSocketPipeline+d__9.MoveNext()
000000fc0353d6f0 00007ff94902268f System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1[[System.__Canon, mscorlib]].Start[[System.Web.WebSocketPipeline+d__9, System.Web]](d__9 ByRef)
000000fc0353d7a0 00007ff9490225df System.Web.WebSocketPipeline.ProcessRequestImplAsync()
000000fc0353d860 00007ff949022435 System.Web.WebSocketPipeline.ProcessRequest()
000000fc0353d8b0 00007ff947b6ceb0 System.Web.Hosting.PipelineRuntime.ProcessRequestNotificationHelper(IntPtr, IntPtr, IntPtr, Int32)
000000fc0353da60 00007ff947b6c3e4 System.Web.Hosting.PipelineRuntime.ProcessRequestNotification(IntPtr, IntPtr, IntPtr, Int32)
000000fc0353daa0 00007ff947b6b8ab DomainNeutralILStubClass.IL_STUB_ReversePInvoke(Int64, Int64, Int64, Int32)
000000fc0353e2a0 00007ff9a6b07fde [InlinedCallFrame: 000000fc0353e2a0] System.Web.Hosting.UnsafeIISMethods.MgdIndicateCompletion(IntPtr, System.Web.RequestNotificationStatus ByRef)
000000fc0353e2a0 00007ff948157a7e [InlinedCallFrame: 000000fc0353e2a0] System.Web.Hosting.UnsafeIISMethods.MgdIndicateCompletion(IntPtr, System.Web.RequestNotificationStatus ByRef)
000000fc0353e270 00007ff948157a7e DomainNeutralILStubClass.IL_STUB_PInvoke(IntPtr, System.Web.RequestNotificationStatus ByRef)
000000fc0353e330 00007ff947b6cdc1 System.Web.Hosting.PipelineRuntime.ProcessRequestNotificationHelper(IntPtr, IntPtr, IntPtr, Int32)
000000fc0353e4e0 00007ff947b6c3e4 System.Web.Hosting.PipelineRuntime.ProcessRequestNotification(IntPtr, IntPtr, IntPtr, Int32)
000000fc0353e520 00007ff947b6b8ab DomainNeutralILStubClass.IL_STUB_ReversePInvoke(Int64, Int64, Int64, Int32)
000000fc0353e6f8 00007ff9a6b08233 [ContextTransitionFrame: 000000fc0353e6f8]

This thread is waiting on a lock to be released as indicated by this line at the top of its stack trace:

000000fc0353ce58 00007ff9b0540c6a [HelperMethodFrame_1OBJ: 000000fc0353ce58] System.Threading.Monitor.Enter(System.Object)

Let’s perform a similar exercise for thread 0xad8.  Its stack trace is as follows:

0:026> ~~[ad8]s
ntdll!NtWaitForMultipleObjects+0xa:
00007ff9`b0540c6a c3 ret
0:023> !clrstack
OS Thread Id: 0xad8 (23)
 Child SP IP Call Site
000000fc0071ccc8 00007ff9b0540c6a [GCFrame: 000000fc0071ccc8] 
000000fc0071cef0 00007ff9b0540c6a [GCFrame: 000000fc0071cef0] 
000000fc0071cf28 00007ff9b0540c6a [HelperMethodFrame: 000000fc0071cf28] System.Threading.Monitor.Enter(System.Object)
000000fc0071d020 00007ff9491de236 ACompany.Redis.MessageBus.RedisMessageBus.OnRedisConnectionReestablished(System.Object, System.EventArgs)
000000fc0071d110 00007ff947942cd3 [MulticastFrame: 000000fc0071d110] System.EventHandler`1[[System.__Canon, mscorlib]].Invoke(System.Object, System.__Canon)
000000fc0071d170 00007ff9489c6485 ACompany.Redis.Client.RedisClient.InitializeConnection()
000000fc0071d1c0 00007ff948ea06e7 ACompany.Redis.Client.RedisClient.CheckForConnect()
000000fc0071d230 00007ff948ea4da0 ACompany.Redis.Client.RedisClient.get_Database()
000000fc0071d260 00007ff9491b817a ACompany.Redis.Common.RedisOnDemandSubscriptionManager.RemoveSubscription(System.String, System.String, System.String)
000000fc0071d370 00007ff9491b7d73 ACompany.Redis.WebSocket.Channel.OnDemandWebSocketHandler.OnClose()
000000fc0071d4b0 00007ff948e9e50c Microsoft.Web.WebSockets.WebSocketHandler+d__9.MoveNext()
000000fc0071d500 00007ff948e9e3ca Microsoft.Web.WebSockets.WebSocketHandler+d__9.MoveNext()
000000fc0071d560 00007ff948e9df75 System.Runtime.CompilerServices.AsyncTaskMethodBuilder.Start[[Microsoft.Web.WebSockets.WebSocketHandler+d__9, Microsoft.WebSockets]](d__9 ByRef)
000000fc0071d610 00007ff948e9dece Microsoft.Web.WebSockets.WebSocketHandler.ProcessWebSocketRequestAsync(System.Web.WebSockets.AspNetWebSocketContext, System.Func`1>)
000000fc0071d6d0 00007ff949024c4d System.Web.WebSocketPipeline+c__DisplayClass9_0.b__0(System.Object)
000000fc0071d730 00007ff9490243f1 System.Web.Util.SynchronizationHelper.SafeWrapCallback(System.Action)
000000fc0071d780 00007ff9490242b6 System.Web.Util.SynchronizationHelper.QueueSynchronous(System.Action)
000000fc0071d7e0 00007ff949022da1 System.Web.WebSocketPipeline+d__9.MoveNext()
000000fc0071d870 00007ff94902268f System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1[[System.__Canon, mscorlib]].Start[[System.Web.WebSocketPipeline+d__9, System.Web]](d__9 ByRef)
000000fc0071d920 00007ff9490225df System.Web.WebSocketPipeline.ProcessRequestImplAsync()
000000fc0071d9e0 00007ff949022435 System.Web.WebSocketPipeline.ProcessRequest()
000000fc0071da30 00007ff947b6ceb0 System.Web.Hosting.PipelineRuntime.ProcessRequestNotificationHelper(IntPtr, IntPtr, IntPtr, Int32)
000000fc0071dbe0 00007ff947b6c3e4 System.Web.Hosting.PipelineRuntime.ProcessRequestNotification(IntPtr, IntPtr, IntPtr, Int32)
000000fc0071dc20 00007ff947b6b8ab DomainNeutralILStubClass.IL_STUB_ReversePInvoke(Int64, Int64, Int64, Int32)
000000fc0071e420 00007ff9a6b07fde [InlinedCallFrame: 000000fc0071e420] System.Web.Hosting.UnsafeIISMethods.MgdIndicateCompletion(IntPtr, System.Web.RequestNotificationStatus ByRef)
000000fc0071e420 00007ff948157a7e [InlinedCallFrame: 000000fc0071e420] System.Web.Hosting.UnsafeIISMethods.MgdIndicateCompletion(IntPtr, System.Web.RequestNotificationStatus ByRef)
000000fc0071e3f0 00007ff948157a7e DomainNeutralILStubClass.IL_STUB_PInvoke(IntPtr, System.Web.RequestNotificationStatus ByRef)
000000fc0071e4b0 00007ff947b6cdc1 System.Web.Hosting.PipelineRuntime.ProcessRequestNotificationHelper(IntPtr, IntPtr, IntPtr, Int32)
000000fc0071e660 00007ff947b6c3e4 System.Web.Hosting.PipelineRuntime.ProcessRequestNotification(IntPtr, IntPtr, IntPtr, Int32)
000000fc0071e6a0 00007ff947b6b8ab DomainNeutralILStubClass.IL_STUB_ReversePInvoke(Int64, Int64, Int64, Int32)
000000fc0071e878 00007ff9a6b08233 [ContextTransitionFrame: 000000fc0071e878]

Similarly, this thread is waiting for a lock to be released as indicated at the top of its stack trace:

000000fc0071cf28 00007ff9b0540c6a [HelperMethodFrame: 000000fc0071cf28] System.Threading.Monitor.Enter(System.Object)

It gets tricky from this point on, as Windbg does not make it easy to identify the lock objects involved, which would have helped.  A deadlock means one thread is waiting for a lock to be released while it is itself holding a lock that another thread is waiting on.

So what can we make of the above?

It starts with thread 0x548 acquiring a lock on its call to RedisClient.Subscribe; let’s call this lock _redisMessageBusLock, as indicated below:

000000fc0353ce58 00007ff9b0540c6a [HelperMethodFrame_1OBJ: 000000fc0353ce58] System.Threading.Monitor.Enter(System.Object)
000000fc0353cf50 00007ff948ea067f ACompany.Redis.Client.RedisClient.CheckForConnect()
000000fc0353cfc0 00007ff948ea05f0 ACompany.Redis.Client.RedisClient.get_Subscriber()
000000fc0353cff0 00007ff948ea0540 ACompany.Redis.Client.RedisClient.Subscribe(System.String, System.Action`2)   // we acquire _redisMessageBusLock here

Simultaneously, thread 0xad8 had first attempted to check the Redis connection state in its call to CheckForConnect().

Servicing this call requires acquisition of a lock we will call _redisClientLock. We acquire the lock, attempt to fix the Redis connection if there is a problem, raise an event once the connection has been re-established, and then release the lock.

In this case there was an issue with the Redis connection, so we acquired _redisClientLock, fixed the connection, and then raised the OnRedisConnectionReestablished event while still holding _redisClientLock.

000000fc0071cf28 00007ff9b0540c6a [HelperMethodFrame: 000000fc0071cf28] System.Threading.Monitor.Enter(System.Object)    // we are waiting for _redisMessageBusLock to be released
000000fc0071d020 00007ff9491de236 ACompany.Redis.MessageBus.RedisMessageBus.OnRedisConnectionReestablished(System.Object, System.EventArgs)
000000fc0071d110 00007ff947942cd3 [MulticastFrame: 000000fc0071d110] System.EventHandler`1[[System.__Canon, mscorlib]].Invoke(System.Object, System.__Canon)
000000fc0071d170 00007ff9489c6485 ACompany.Redis.Client.RedisClient.InitializeConnection()
000000fc0071d1c0 00007ff948ea06e7 ACompany.Redis.Client.RedisClient.CheckForConnect()     // we acquired _redisClientLock here

Since the event handling is synchronous, the same thread that raises the event handles it.  However, part of handling the event requires acquisition of _redisMessageBusLock, which was already acquired by thread 0x548.

Based on the above, our lock state looks like this:

Thread       LockAcquired           LockWaitingFor          
0xad8        _redisClientLock        _redisMessageBusLock     
0x548        _redisMessageBusLock    _redisClientLock

which clearly demonstrates our deadlock.
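To make the failure mode concrete, here is a minimal C# sketch of the pattern the dumps suggest.  The class, method and lock names are assumptions modeled on the stack traces above, not the actual source code:

using System;
using System.Threading;

// Hypothetical reconstruction of the deadlocking pattern; names are illustrative.
public class RedisClient
{
    private readonly object _redisClientLock = new object();

    public event EventHandler ConnectionReestablished;

    public void CheckForConnect()
    {
        lock (_redisClientLock)             // thread "0xad8" holds this lock...
        {
            // ...repairs the connection, then raises the event while still holding it.
            ConnectionReestablished?.Invoke(this, EventArgs.Empty);
        }
    }

    public void Subscribe(string channel)
    {
        CheckForConnect();                  // thread "0x548" blocks here, wanting _redisClientLock
    }
}

public class RedisMessageBus
{
    private readonly object _redisMessageBusLock = new object();
    private readonly RedisClient _client;

    public RedisMessageBus(RedisClient client)
    {
        _client = client;
        _client.ConnectionReestablished += OnRedisConnectionReestablished;
    }

    public void Subscribe(string channel)
    {
        lock (_redisMessageBusLock)         // thread "0x548" acquires the bus lock...
        {
            _client.Subscribe(channel);     // ...and then calls into the client.
        }
    }

    private void OnRedisConnectionReestablished(object sender, EventArgs e)
    {
        lock (_redisMessageBusLock)         // thread "0xad8" blocks here: deadlock.
        {
            // re-subscribe existing channels
        }
    }
}

public static class DeadlockDemo
{
    public static void Main()
    {
        var client = new RedisClient();
        var bus = new RedisMessageBus(client);

        var subscriber = new Thread(() => bus.Subscribe("channel-1"));   // plays thread 0x548
        var reconnecter = new Thread(() => client.CheckForConnect());    // plays thread 0xad8
        subscriber.Start();
        reconnecter.Start();

        // With the unlucky interleaving (each thread takes its first lock before the
        // other takes its second), neither Join ever returns.
        subscriber.Join();
        reconnecter.Join();
    }
}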

Happy debugging.

c# volatile demystified - my take

According to MSDN (http://msdn.microsoft.com/en-us/library/6kac2kdh.aspx), a thread is the basic operating system entity used to execute a given task defined by a set of instructions.  A thread therefore has its own context, which includes all the information it needs to resume execution, such as a set of CPU registers and a stack of local variables, all within the address space of the thread’s host process.

Consider the following class:


public class EncryptionService
{
    public static readonly EncryptionService Instance = new EncryptionService();
    private EncryptionService() { }

    private int _counter = 0;

    public int Encrypt(string input)
    {
        // run some encryption algorithm
        return ++_counter;
    }
}

This class is a singleton because it has a private constructor and you can only access the single Instance. Since there is only a single instance of EncryptionService, there is a chance that multiple threads could be running Encrypt and updating the instance field _counter at the same time.  In other words, the class is not thread safe.

Now, recall the thread context we talked about.  When a thread is created, the CLR allocates it a 1 MB stack, at least for x86 CPU architectures.

The stack space is used for passing arguments to a method and for holding the local variables defined in the method. Before executing a method, the CLR runs prologue code [REF CLR via C#] that initializes the method, for example establishing its return address and allocating memory for its local variables on the thread’s stack, which is part of the thread’s execution context. As the thread executes more methods, its stack fills up with the local variables of those methods, until each method’s epilogue code runs and the variables that have gone out of scope are cleaned up.

While the unit of execution is a thread, the physical entity that executes the code, once it has been translated into machine instructions, is the CPU.  Naturally, this means that the variables a thread works with end up cached by the CPU: when a thread is executing code on a specific CPU, its variables are stored in that particular CPU’s cache.

So what exactly happens when multiple threads execute a method on a single instance of our EncryptionService?  We first need to understand how the above C# program is converted into the machine instructions that the CPU actually executes.

The Microsoft C# compiler generates Microsoft Intermediate Language (MSIL), which is what is stored in the assembly.  When the CLR loads this assembly and attempts to invoke the Encrypt method, it Just-In-Time (JIT) compiles the code to CPU-specific machine instructions, stores them at a location in memory, and then transfers execution to the memory location where the compiled Encrypt method is saved.

When the executing thread invokes Encrypt, it encounters the field _counter stored at a certain memory location.  This value has to be loaded from its memory location into a CPU register.  If this memory location was not already in any of the CPU’s caches, a cache miss occurs and this data, along with the surrounding bytes as defined by the cache line size, is retrieved from memory and saved in the CPU cache, then loaded into the CPU register.  The CPU then updates the register value as part of the add instruction and stores the changed value back in its cache. For this new value to become visible to other processors, or threads, it needs to be written back to main memory.  When exactly this happens is defined by what is known as the CPU’s write policy.
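In other words, the increment is not a single step.  Here is a conceptual sketch (not actual JIT output) of roughly what ++_counter boils down to:

// Conceptual sketch only -- the three machine-level steps hidden inside ++_counter.
// Another thread can interleave between any two of these steps.
public class CounterSketch
{
    private int _counter;

    public int Increment()
    {
        int temp = _counter;   // 1. load the field from memory (or a cached copy) into a register
        temp = temp + 1;       // 2. add one in the register
        _counter = temp;       // 3. store the register back; the write reaches RAM when the cache is flushed
        return temp;
    }
}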

If you have two threads doing the same thing, it could happen that the value of _counter in each CPU’s cache is not consistent, since each CPU was working on its own local copy of the _counter variable in its registers/cache. Depending on your use case, this may not be a desirable outcome.
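Here is a minimal repro sketch of that lost-update behavior.  The Counter property and the Parallel.For driver are additions for the demo, not part of the original class:

using System;
using System.Threading.Tasks;

public class EncryptionService
{
    public static readonly EncryptionService Instance = new EncryptionService();
    private EncryptionService() { }

    private int _counter = 0;
    public int Counter { get { return _counter; } }   // added only so the demo can read the final value

    public int Encrypt(string input)
    {
        // run some encryption algorithm
        return ++_counter;                            // non-atomic read-modify-write
    }
}

public static class RaceDemo
{
    public static void Main()
    {
        const int calls = 1000000;
        Parallel.For(0, calls, i => EncryptionService.Instance.Encrypt("payload"));

        // On a multi-core machine this frequently prints less than 1000000,
        // because concurrent increments overwrite each other's results.
        Console.WriteLine("Expected {0}, observed {1}", calls, EncryptionService.Instance.Counter);
    }
}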

More on Static Variables

When you decorate a variable as static, interesting things start to happen.

A static field is defined as part of the type definition. What this effectively means is that a static field is stored in the header block of the type definition object.

This means that the bytes backing these static fields are part of the type’s header block. And since there is just one type object per type, there is only one copy of each static field in the entire application domain.

Since a static field is backed by a memory block defined as part of the type’s header, which sits at a fixed location on the managed heap, several threads can potentially overwrite each other’s work.

This is a similar effect to a single instance object being operated on by multiple threads without any synchronization mechanism.
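A tiny sketch of that point; the Config class here is purely illustrative:

using System;
using System.Threading;

public static class Config
{
    // Exactly one storage location for this field per application domain.
    public static string Region = "us-east";
}

public static class StaticSharingDemo
{
    public static void Main()
    {
        var writer = new Thread(() => Config.Region = "eu-west");
        writer.Start();
        writer.Join();

        // Prints "eu-west": the other thread mutated the single shared copy of the field.
        Console.WriteLine(Config.Region);
    }
}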

On Chip Cache Memory
As noted before, a static member is defined at a fixed location in the type’s header block in memory.  This means that instructions executing on the CPU need to fetch this static variable from memory in order to read from or write to it.

To improve performance, today’s CPUs have cache memory: a small amount of very fast memory located on, or very close to, the CPU chip.  Caches are organized into levels based on how close they are to the CPU: there is an L1 cache, an L2 cache and an L3 cache, with the L1 cache resident on the CPU chip itself (“L” standing for level).   This means that instead of accessing RAM via the memory bus for every read, the CPU can often retrieve the value of a variable from its cache.

The first time a thread needs to read a value from memory, such as our static variable, the CPU will read this value along with the surrounding bytes, together known as the cache line, and store this cache line in its on-chip cache.  You can imagine an on-chip cache as a dictionary of key/value pairs of cache lines, keyed on memory address or address range.  These address keys are known as tags, and each value of the key/value pair is a cache entry comprising the actual data plus flags, including one that indicates whether this data has been modified since it was last read from memory.


When the value needs to be updated, the CPU updates its on-chip cache instead of writing directly to the static field in memory. Periodically, the CPU then flushes the modified cache content out to memory (RAM).

Static Variables Concurrency
With all of this in mind, let us examine what happens with our static variable within a multi-threaded, multi-core execution context.


public class EncryptionService
{
    public static readonly EncryptionService Instance = new EncryptionService();
    private EncryptionService() { }

    private int _counter = 0;

    public int Encrypt(string input)
    {
        // run some encryption algorithm
        return ++_counter;
    }

    public int Decrypt(string input)
    {
        // run some decryption algorithm
        return ++_counter;
    }
}

Say thread A, running on CPU A, is about to execute Encrypt, so it reads the bytes in memory backing the _counter variable. This variable is now in CPU A’s cache.

Now a second thread, thread B running on CPU B, attempts to execute Encrypt and copies the value of _counter from memory just before CPU A has had a chance to flush its update out to memory (recall this is a singleton). CPU B’s cache now holds a stale value of _counter that does not reflect CPU A’s update.

This is a classic cache concurrency problem, and it happens because cache flushes occur at an unpredictable time in the future, although some CPUs provide mechanisms to control this, such as the IA64’s volatile loads and stores with acquire and release semantics.

Volatile Read/Write
Recall that a cache concurrency problem happens when one CPU updates a value in its on-chip cache and, before it has flushed this change to memory, another CPU reads the previous value from memory.  The newly flushed value is then not reflected in the second CPU’s cache.

A volatile write means that the CPU writes the value of a register out to its cache and then flushes the content of the CPU cache to memory.   Interesting choice of the word volatile.

A volatile read means that the CPU reads a value from memory and invalidates its cache.  Invalidating the cache means the cached value is no longer considered current, so the next read of this value will come from memory.  Hence the word volatile, which means liable to change rapidly or unpredictably: the fact that these reads and writes cause rapid flushing and invalidation of the CPU cache, which caches are otherwise meant to avoid, is why they are referred to as volatile operations.

There is also the memory fence, which flushes the cache content to memory and then invalidates the cache.

How does this prevent cache concurrency?


public class EncryptionService
{
    public static readonly EncryptionService Instance = new EncryptionService();
    private EncryptionService() { }

    private int _counter = 0;

    public int Encrypt(string input)
    {
        var current = Thread.VolatileRead(ref _counter); //1
        Thread.VolatileWrite(ref _counter, current + 1); //2
        return _counter;
    }

    public int Decrypt(string input)
    {
        // run some decryption algorithm
        var current = _counter;
        _counter = current + 1;
        return _counter;
    }
}

With these changes, this is what happens now.

When thread A on CPU A executes line 1, it performs a volatile read, meaning that it reads the current value of _counter from memory and invalidates its cache, so that subsequent reads of this value also come from memory.  If thread B then modifies that same value in its own CPU’s registers, because the write on line 2 is a volatile write, CPU B will flush the cache line containing this variable out to memory.

In C#, this is what the Thread.VolatileRead and Thread.VolatileWrite methods do.
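As an aside, .NET Framework 4.5 and later also expose these operations through the System.Threading.Volatile class.  Below is a hedged sketch, with illustrative class and field names, of the classic publish/consume pattern where such volatile read/write semantics matter:

using System.Threading;

public class Publisher
{
    private int _value;
    private bool _ready;

    public void Publish(int value)
    {
        _value = value;
        Volatile.Write(ref _ready, true);   // the write to _value becomes visible no later than _ready
    }

    public bool TryConsume(out int value)
    {
        if (Volatile.Read(ref _ready))      // a fresh read of _ready; if true, _value is also up to date
        {
            value = _value;
            return true;
        }

        value = 0;
        return false;
    }
}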

c# volatile To The Rescue

All of the above is nice but cumbersome to use, so the C# team provided the volatile keyword, which wraps every read of and write to a variable in a volatile read/write.


public class EncryptionService
{
    public static readonly EncryptionService Instance = new EncryptionService();
    private EncryptionService() { }

    private volatile int _counter = 0;

    public int Encrypt(string input)
    {
        var current = _counter;     //1
        _counter = current + 1;     //2
        return _counter;
    }

    public int Decrypt(string input)
    {
        // run some decryption algorithm
        var current = _counter;
        _counter = current + 1;
        return _counter;
    }
}

This means that a CPU does not keep working from a stale cached copy of this variable; reads and writes go through to memory. So while this addresses the cache concurrency issue, there is still a problem: the increment is a separate read and write, and multiple threads can still interleave between them while writing to this memory location at the same time.

To prevent this you have to use C#’s thread synchronization constructs, such as the lock statement or the Interlocked class, as sketched below.
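For example, here is a minimal sketch of fixing the counter with Interlocked.Increment, which performs the whole read-modify-write as one atomic operation (an assumed fix, not code from the original post):

using System.Threading;

public class EncryptionService
{
    public static readonly EncryptionService Instance = new EncryptionService();
    private EncryptionService() { }

    private int _counter = 0;

    public int Encrypt(string input)
    {
        // run some encryption algorithm
        return Interlocked.Increment(ref _counter);   // atomic increment; returns the new value
    }
}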

Note that this cache concurrency problem, while not the same as two threads writing to or reading from the same memory location without synchronization, manifests in the same way as an application problem. It is a problem that results from running on a multi-core machine.


References:

http://computer.howstuffworks.com/computer-memory3.htm

http://arstechnica.com/gadgets/2002/07/caching/2/

http://en.wikipedia.org/wiki/CPU_cache

https://www.amazon.com/CLR-via-Second-Pro-Developer/dp/B002V1H004