Thread Abort and Critical Regions (in .Net)

Mar 18, 2013

I don’t need to see the source code of an API to code against it. In fact, I actively discourage depending (even psychologically) on the inner details of an implementation. The contract should be sufficient. Of course, I’m assuming a well-designed API with good (or at least decent) documentation. But ~~sometimes~~ often reality is more complicated than an ideal world.

In this writing:

Empty try{}
Thread.Abort and finally
ThreadAbortException weirdness
AppDomain hosting
Rude Abort/Unload/Exit
Critical Finalization and CER
Conclusion

Empty try{}

While working with System.Diagnostics.Process in the context of Parallel.ForEach things became a bit too complicated. (I’ll leave the gory details to another post.) What prompted this post was a weird pattern that I noticed while browsing Process.cs, the source code for the Process class (to untangle said complicated scenario).

RuntimeHelpers.PrepareConstrainedRegions();
try {} finally {
    retVal = NativeMethods.CreateProcessWithLogonW(
            startInfo.UserName,
            startInfo.Domain,
            password,
            logonFlags,
            null,               // we don't need this since all the info is in commandLine
            commandLine,
            creationFlags,
            environmentPtr,
            workingDirectory,
            startupInfo,        // pointer to STARTUPINFO
            processInfo         // pointer to PROCESS_INFORMATION
        );
    if (!retVal)
        errorCode = Marshal.GetLastWin32Error();
    if (processInfo.hProcess != (IntPtr)0 && processInfo.hProcess != (IntPtr)NativeMethods.INVALID_HANDLE_VALUE)
        procSH.InitialSetHandle(processInfo.hProcess);
    if (processInfo.hThread != (IntPtr)0 && processInfo.hThread != (IntPtr)NativeMethods.INVALID_HANDLE_VALUE)
        threadSH.InitialSetHandle(processInfo.hThread);
}

This is from StartWithCreateProcess(), a private method of Process. So no surprises that it’s doing a bunch of Win32 native API calls. What stands out is the try{} construct. But also notice the RuntimeHelpers.PrepareConstrainedRegions() call.

Thinking of possible reasons for this, I suspected it had to do with run-time guarantees. RuntimeHelpers.PrepareConstrainedRegions() belongs to the Constrained Execution Regions (CER) machinery. So why the need for an empty try if we already have the PrepareConstrainedRegions() call? Regrettably, I had conflated the two. In reality, the empty try construct has nothing to do with CER and everything to do with execution interruption by means of ThreadAbortException.

A quick search turned up Siddharth Uppal’s “The empty try block mystery,” where he explains that Thread.Abort() never interrupts code in the finally clause. Well, not quite.

Thread.Abort and finally

A common interview question, after going through the semantics of try/catch/finally, is to ask the candidate whether finally is always executed (since “always” is usually the word they use). Are there no scenarios where one could conceivably expect the code in finally never to get executed? A creative response is usually an “infinite loop” in the try or in the catch (if it gets called). Novice candidates are easily confused when they consider a premature termination of the process (or thread). After all, one would expect some consistency in the behavior of code. So why shouldn’t finally always get executed, even in a process termination scenario?

It’s not difficult to see that there is a power struggle between the terminating party and the code/process in question. If finally always executed, there would be no guarantee of termination. We cannot guarantee both that finally will always execute and that termination will always succeed. One or both must have weak guarantees (or at least one must be weaker than the other). When push comes to shove, we (as users or administrators) want to have full control over our machines, so we choose to have the ultimate magic wand to kill and terminate any misbehaving (or just undesirable) process.

The story is a little bit different when it comes to aborting individual threads, however. Whereas the operating system can terminate a process with a sweeping gesture, in the managed world things are more controlled. The CLR can see to it that any thread that is about to get terminated (by a call to Thread.Abort()) is terminated cleanly, respecting all the language and runtime rules. This includes executing finally blocks as well as finalizers.

ThreadAbortException weirdness

When aborting a thread, the apparent behavior is one of an exception thrown from within the thread in question. When another thread invokes Thread.Abort() on our thread, a ThreadAbortException is raised in our thread’s code. This is called an asynchronous thread abort, as opposed to a synchronous abort, where a thread invokes Thread.CurrentThread.Abort() (invariably on itself). Other asynchronous exceptions include OutOfMemoryException and StackOverflowException.

The behavior, then, is what one would expect when an exception is raised: it bubbles up the stack, executing catch and finally blocks like any other exception. There are, however, a couple of crucial differences between ThreadAbortException and other exceptions (with the exception of StackOverflowException, which can’t be caught at all). First, this exception can’t be suppressed by simply catching it; it is automatically rethrown right after exiting the catch clause that caught it. Second, throwing it does not abort the running thread; an abort must be requested via a call to Thread.Abort().

The source of this behavior is the abort-requested flag, which is set when Thread.Abort() is invoked (but not when the exception is thrown directly). The CLR checks this flag at certain checkpoints and raises the exception there; to the aborted code, it can appear between any two machine instructions. Because the flag is only acted upon at those checkpoints, the exception will not get thrown while a finally block or unmanaged code is executing.
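A minimal sketch of the empty-try idiom that relies on this, assuming the .NET Framework abort semantics described above. NativeBuffer and its members are hypothetical names; the point is only that the allocation and the assignment that records it sit in the same finally block, so an ordinary (non-rude) abort cannot land between them.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical wrapper, used only for illustration.
public sealed class NativeBuffer : IDisposable
{
    private IntPtr _memory = IntPtr.Zero;

    public NativeBuffer(int bytes)
    {
        // The try block is intentionally empty. On .NET Framework an ordinary
        // Thread.Abort() is not delivered while a finally block is running, so
        // the allocation and the assignment below are not split by an abort.
        try { }
        finally
        {
            _memory = Marshal.AllocHGlobal(bytes);
        }
    }

    public bool IsAllocated
    {
        get { return _memory != IntPtr.Zero; }
    }

    public void Dispose()
    {
        // Same idiom for the release: free and clear in one finally block.
        try { }
        finally
        {
            if (_memory != IntPtr.Zero)
            {
                Marshal.FreeHGlobal(_memory);
                _memory = IntPtr.Zero;
            }
        }
    }
}

public static class EmptyTryDemo
{
    public static void Main()
    {
        using (var buffer = new NativeBuffer(64))
        {
            Console.WriteLine("Allocated: " + buffer.IsAllocated);
        }
    }
}
```

The idiom protects only against a normal abort. It does nothing if Dispose() is never reached, which is the gap that SafeHandle and critical finalizers are meant to close.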

So the expectation of the novice interviewee (and Mr. Uppal’s) was right after all. Except, it wasn’t. We have come full circle to the conflict between the purpose of aborting a thread and the possibility of ill-behaved code never giving up at all. I am being too generous when I label code that wouldn’t yield to a request to abort as “ill-behaved.” Because ThreadAbortException is automatically rethrown from catch clauses, the only way to suppress it is to explicitly call Thread.ResetAbort(), which clears the abort-requested flag. This is intentional, as developers are very much in the habit of writing catch-all clauses.
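The rethrow and the effect of Thread.ResetAbort() can be sketched as follows. This assumes .NET Framework; on .NET Core and .NET 5+ Thread.Abort() throws PlatformNotSupportedException instead, which the sketch merely records. AbortDemo and Run are hypothetical names, and a synchronous self-abort is used only to keep the example deterministic.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public static class AbortDemo
{
    // Returns the steps that actually ran, in order.
    public static List<string> Run()
    {
        var steps = new List<string>();
        try
        {
            try
            {
                // Synchronous abort: the thread aborts itself.
                Thread.CurrentThread.Abort();
            }
            catch (ThreadAbortException)
            {
                steps.Add("inner catch");
                // Leaving this block does not swallow the exception: the CLR
                // rethrows it because the abort is still pending.
            }
            steps.Add("after inner try"); // skipped while the abort is pending
        }
        catch (ThreadAbortException)
        {
            steps.Add("outer catch");
            Thread.ResetAbort(); // clears the pending abort request
        }
        catch (PlatformNotSupportedException)
        {
            // Assumption: a runtime without Thread.Abort (.NET Core, .NET 5+).
            steps.Add("abort not supported");
        }
        steps.Add("still running");
        return steps;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", Run()));
    }
}
```

Without the ResetAbort() call, the exception would keep propagating out of the outer catch as well, and the thread would end.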

AppDomain hosting

So far we’ve assumed that we just might need to terminate a process, no questions asked. But why would one need to abort individual threads within a process? The answer lies with hosting. In environments such as IIS or SQL Server, the server should be both fast and reliable. This led to a design that compartmentalizes processes beyond threads. An AppDomain groups processing units within a single process such that spawning new instances is fast (faster than spawning a complete new process), yet they can be unloaded on demand. When an AppDomain instance takes longer than the configured time (or consumes more of some resource than it should), it’s deemed ill-behaved and the server will want to unload it. Unloading includes aborting every thread within the AppDomain in question.

The problem is yet again one of conflict between guarantees. This time, though, the termination logic needs to play along, or else. When terminating a process, the complete process memory is released, along with all its system resources. If the managed code or the CLR doesn’t do that, the operating system will. In a hosted execution environment, the host wants to have full control over the lifetime of an AppDomain (with all its threads); at the same time, when it decides to purge one, it does not want to destabilize the process or, worse, itself or the system at large. When unloading an AppDomain, the server wants to give it a chance to clean up and release any shared resources, including files, sockets and synchronization objects (i.e. locks), to name but a few. This is because the process will continue running, hopefully for a very long time. Hence the behavior of ThreadAbortException, which runs every catch and finally as it should.

In return, any code that wants to play rough gets to call Thread.ResetAbort() and go on with its life, thereby defeating the control that the server enjoyed. The server invariably has the upper hand, of course. Once a second limit is exceeded after invoking Thread.Abort(), the server may, in the words of Tarantino, “go medieval” on the misbehaving AppDomain.

Rude Abort/Unload/Exit

When a thread that is asked to abort doesn’t play along, it warrants rudeness. The CLR allows a host to specify an escalation policy for such events, so that the host can escalate a normal thread abort into a rude thread abort. Similarly, a normal AppDomain unload or process exit may be escalated to a rude one.

But we all know that the server doesn’t want to be too inconsiderate. It wouldn’t want to jeopardize its stability in the wake of this arms race between it and the hosted AppDomain. For that, it wants more stringent guarantees from the code in question that it will not misbehave again when given half a chance. One such guarantee is that the finalization code will not have ill side-effects.

In a rude thread abort event, the CLR forgoes calling any finally blocks (except those marked as Constrained Execution Regions, or CER for short) as well as any normal finalizers. But unlike mere finally blocks, finalizers are a much more serious bunch. They are functions with consequences. Recall that finalization serves the purpose of releasing system resources. In a completely managed environment, with garbage collection and cleanup, the only resources that need special care are those that aren’t managed. In an ideal scenario, one wouldn’t need to implement finalizers at all. The case is different when we need to wrap a system resource that is not managed (this includes calls into native DLLs). All system resources that are represented by the framework are released precisely by means of a finalizer.

Admittedly, if we are developing standalone applications, as opposed to hosted ones, we don’t have to worry about the possibility of escalation and rude abort or unload. But then again, why should we worry about Thread.Abort() at all in such a case? Either our own code issues such a harsh request, which we should avoid like the plague in favor of more civil cancellation methods, such as raising events or setting shared flags (with some special care), or our code is in a library that may be called from either a standalone application or a hosted one. Only in the latter case must we worry and prepare for the possibility of rude abort/unload.
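As a sketch of the civil alternative: CancellationTokenSource is one framework-provided form of a shared flag. Choosing it here is an assumption of this sketch (the text above only speaks of events and shared flags), and CooperativeCancelDemo, Work and RunFor are hypothetical names.

```csharp
using System;
using System.Threading;

public static class CooperativeCancelDemo
{
    // Hypothetical worker: does small units of work until asked to stop
    // and returns how many units it completed.
    public static int Work(CancellationToken token)
    {
        int completed = 0;
        while (!token.IsCancellationRequested)
        {
            Thread.Sleep(5); // stand-in for one unit of real work
            completed++;
        }
        return completed;
    }

    // Runs the worker on its own thread and asks it to stop after a delay.
    public static int RunFor(int milliseconds)
    {
        int result = -1;
        using (var cts = new CancellationTokenSource())
        {
            var worker = new Thread(() => result = Work(cts.Token));
            worker.Start();
            Thread.Sleep(milliseconds);
            cts.Cancel();  // a request, not an interruption
            worker.Join(); // the worker exits at its next check of the token
        }
        return result;
    }

    public static void Main()
    {
        Console.WriteLine("Units completed: " + RunFor(100));
    }
}
```

The worker decides where it is safe to stop, so no cleanup code can be cut short; the price is that a worker which never checks the token never stops.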

Critical Finalization and CER

Dispose() is called in finally blocks, either manually or via the using statement, and a rude abort skips those blocks. So the only correct way to dispose objects during such an upheaval is to have finalizers on these objects. And not just any finalizer, but Critical Finalizers. This is the same technique used in SafeHandle to ensure that native handles are correctly released in the event of a catastrophic failure.
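A minimal sketch of a SafeHandle subclass, to show how little code is needed to get a critical finalizer. NativeMemoryHandle and Allocate are hypothetical names, and memory from Marshal.AllocHGlobal stands in for a real native handle. On .NET Framework, ReleaseHandle() overrides usually also carry a ReliabilityContract attribute, which is left out here for brevity.

```csharp
using System;
using System.Runtime.InteropServices;
using Microsoft.Win32.SafeHandles;

// Hypothetical handle type wrapping memory from Marshal.AllocHGlobal.
public sealed class NativeMemoryHandle : SafeHandleZeroOrMinusOneIsInvalid
{
    private NativeMemoryHandle() : base(true) // true: this object owns the handle
    {
    }

    public static NativeMemoryHandle Allocate(int bytes)
    {
        var result = new NativeMemoryHandle();
        // Allocate and record the pointer in one finally block
        // (the empty-try idiom), so the two steps stay together.
        try { }
        finally
        {
            result.SetHandle(Marshal.AllocHGlobal(bytes));
        }
        return result;
    }

    // Called by Dispose() or, failing that, by SafeHandle's critical finalizer.
    protected override bool ReleaseHandle()
    {
        Marshal.FreeHGlobal(handle);
        return true;
    }
}

public static class SafeHandleDemo
{
    public static void Main()
    {
        using (var memory = NativeMemoryHandle.Allocate(128))
        {
            Console.WriteLine("Valid handle: " + !memory.IsInvalid);
        }
    }
}
```

The using statement covers the normal path; the inherited critical finalizer covers the case where the finally block never runs.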

When things get serious, only finalizers marked as safe are called. Unsurprisingly, attributes are used to mark methods as safe. The contract between the CLR and the code is a strict one. First, we communicate how critical a function is in the face of asynchronous exceptions by marking its reliability. Next, we waive our right to allocate memory, which isn’t trivial, since allocation happens transparently in some cases, such as P/Invoke marshaling, locking and boxing. In addition, the only methods we can call from within a CER block are those with strong reliability guarantees. Virtual methods that aren’t prepared in advance cannot be called either.

This brings us full circle to RuntimeHelpers.PrepareConstrainedRegions(). This call tells the CLR to fully prepare the code that follows by allocating all necessary memory, ensuring there is sufficient stack space, and JITing the code, which in turn loads any assemblies we may need.

Here is sample code that demonstrates how this works in practice. When the ReliabilityContract attribute is commented out, the try block is executed before the finally block, which fails. With the ReliabilityContract attribute in place, however, the PrepareConstrainedRegions() call fails to allocate all necessary memory beforehand and therefore doesn’t even attempt to execute the try clause or the finally; instead, the exception is thrown immediately.
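A minimal sketch of the shape of such a sample, assuming .NET Framework (CERs are not supported on .NET Core and later, where these APIs are marked obsolete). It only shows where the attribute and the PrepareConstrainedRegions() call go; it does not try to provoke the allocation failure described above, and CerShapeDemo, Run and Cleanup are hypothetical names.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.ConstrainedExecution;

public static class CerShapeDemo
{
    public static bool CleanedUp;

    // The contract states that the method will not corrupt state and will
    // succeed, which lets the CLR prepare it together with the finally block.
    [ReliabilityContract(Consistency.WillNotCorruptState, Cer.Success)]
    private static void Cleanup()
    {
        CleanedUp = true; // no allocation, no locking, no boxing
    }

    public static void Run()
    {
        // Must immediately precede the try statement.
        RuntimeHelpers.PrepareConstrainedRegions();
        try
        {
            Console.WriteLine("in try (not constrained)");
        }
        finally
        {
            Cleanup(); // constrained region, prepared before the try starts
        }
    }

    public static void Main()
    {
        Run();
        Console.WriteLine("Cleaned up: " + CleanedUp);
    }
}
```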

There are three forms to execute code in Constrained Execution Regions (from the BCL Team Blog):

ExecuteCodeWithGuaranteedCleanup, a stack-overflow safe form of a try/finally.
A try/finally block preceded immediately by a call to RuntimeHelpers.PrepareConstrainedRegions. The try block is not constrained, but all catch, finally, and fault blocks for that try are.
As a critical finalizer – any subclass of CriticalFinalizerObject has a finalizer that is eagerly prepared before an instance of the object is allocated.
- A special case is SafeHandle’s ReleaseHandle method, a virtual method that is eagerly prepared before the subclass is allocated, and called from SafeHandle’s critical finalizer.

Conclusion

Normally, the CLR guarantees clean execution and cleanup (via finally and finalization code) even in the face of asynchronous exceptions and thread abort. However, it does reserve the right to take harsher measures if the host escalates things. Writing code in finally blocks to avoid dealing with the possibility of asynchronous exceptions, while not the best practice, will work. When we abuse this, by resetting abort requests and spending too long in finally blocks, the host will escalate things to a rude unload and will aggressively rip out the AppDomain with all its threads, bypassing finally blocks unless they are in a CER, as the above code did.

So, finally blocks are not executed when a rude abort/unload is in progress (unless in CER), when the process is terminated (by the operating system), when an unhandled exception is raised (typically in unmanaged code or in the CLR) or in background threads (IsBackground == true) when all foreground threads have exited.

March 18, 2013
Posted by Ashod Nakashian at 1:41 am
Best Practice, Guides, Programming
Tagged with: .Net, Async Exception, Constrained Execution Region, Exception Handling, Thread Abort
