Ashod Nakashian

Nov 11, 2013
 

I have recently built a home NAS / HTPC rig. Here is my experience from researching, building, and experimenting with it, for those interested in building a similar solution or improving upon it… or just for the curious. I ended up including more details and writing longer than I first planned, but I guess you can skip over the familiar parts. I’d certainly have appreciated reading a similar review or write-up while shopping and researching.


6x 3TB WD RED drives cut from the same stock.

I have included a number of benchmarks to give actual performance numbers. But first, the goals:

Purpose: RAID array with >= 8TiB usable space, to double as a media server.

Cost: minimum, with the highest data security.

Performance requirements: low write loads (backups, downloads, camera dumps, etc.), multiple reads at >= 100 Mbps with 2-3 clients.

TL;DR: ZFS on Linux, powered by an AMD A8 APU, 2x 4GB RAM and 2x 16GB flash drives as a RAID-1 boot drive, backed by 6x 3TB WD Red in Raidz2 (Raid-6 equivalent), for under $1400 including tax and shipping. This setup was the cheapest that met or exceeded my goals, and it has been very stable and performant. AMD’s APUs are plenty powerful for ZFS and for transcoding video, the RAM is sufficient for this array size, and flash drives are more than adequate for booting (with the caveat of low long-term reliability). ZFS is a very stable and powerful FS with advanced, modern features, at the cost of learning to self-serve. The build gives 10.7TiB of usable space with double parity and transparent compression and deduplication, with ample power to transcode video, at a TCO of under $125/TiB (~2x the raw disk cost). Reads/writes are sustained above 350MB/s with gzip-9 but without deduplication.

This is significantly cheaper than ready-made NAS solutions, even when ignoring all the advantages of ZFS and of generic Linux running on a modern quad-core APU (compare with the Atom or Celeron chips that often plague home NAS solutions).

Research on hardware vs. software RAID

The largest cost of a NAS is the drives. Unless one wants to spend an arm and a leg on h/w raid cards, that is. Regardless, the drives were my first decision point. Raid-5 gives 1-disk redundancy, but due to the long resilver time (rebuilding a degraded array), the chances of a secondary failure increase substantially during the days, and sometimes weeks, that resilvering takes. Drive failures aren’t random or independent occurrences: drives from the same batch, used in the same array, tend to have very similar wear and failure modes. As such, Raid-5 was not good enough for my purposes (high data security). Raid-6 gives 2-drive redundancy, but requires a minimum of 5 drives to make it worthwhile; with 4 drives, 2 of them used for parity, Raid-6 is about as good as a mirror. Mirrors, however, have the disadvantage that the array is lost if the 2 drives that fail happen to be the two copies of the same mirrored pair. Raid-6 is immune to this, but its IO performance is significantly poorer than mirroring.

Resilver time could be reduced significantly with true h/w raid cards. However, they cost upwards of $300 for the cheapest, and realistically $600-700 for the better ones. In addition, most give peak performance only with a backup battery, which costs more and, I expect, needs some maintenance. At this point I started comparing h/w with s/w performance. My personal experience with software data processing told me that a machine that isn’t under high load, or is dedicated to storage, should do at least as well as, if not better than, the raid card’s on-board processor. After all, even the cheapest CPUs are much more powerful, have ample fast cache, and with gigabytes of system RAM running at 1333 MHz or better, they should beat h/w raid easily. The main advantage of h/w raid is that, with the backup battery, it can flush cached data to disk even on power failure. But this assumes that the drives still have power! For a storage box, everything is powered from the same source, so the same UPS that keeps the drives spinning will also keep the CPU and RAM pumping data long enough to flush the cache to disk. (This is true provided no new data is being written while on battery power.) The trouble with software raid is that there is no abstraction of the disks to the OS, so it’s more work to maintain (the admin maintains the raid in software). Also, resilvering with h/w will probably be lighter on the system, as the card handles the IO without affecting the rest of the machine. But if I was willing to accept the performance penalty, I was probably still going to meet my goals even during resilvering. So I decided to go with software raid.

The best software solutions were the following: Linux raid using MD, BtrFS, or ZFS. The first is limited to traditional Raid-5 and Raid-6; it is straightforward to use, bootable and well supported, but it lacks modern features like deduplication, compression, encryption and snapshots. BtrFS and ZFS have these features, but are more complicated to administer. Also, BtrFS is still not production-ready, unlike ZFS. So ZFS it was; the feedback online on ZFS is great, too. One important note on software raid: it doesn’t play well with h/w raid cards, so if there is a raid controller between the drives and the system, it should be set to bypass the devices or work in JBOD mode. I’ll have more to say on ZFS in subsequent posts.

To reach 8TiB with 2-drive redundancy I had to go with either 4x4TB drives or 5x3TB. But with ZFS (RaidZ), growing an existing array is impossible. The only options are either to swap in larger drives (one at a time, resilvering after each, until all have the larger capacity and the new space becomes available to the pool), or to create a new vdev with a new set of drives and extend the pool. While simply adding a 6th drive to the existing 5 would have been a sweet deal, when the time comes I can replace all the drives with larger ones and enjoy both new-drive longevity and extra disk space. But first I had to have some headroom, so it was either 5x4TB or 6x3TB, both of which yield 12TB of usable space after double parity.

Which drives? The storage market was still recovering from the Thailand floods. Prices were coming down, but they were still not great. The cheapest 3TB drives are very affordable, but they are 7200 rpm: heat, noise and the power bill are all high with 6 of them in an enclosure, and they come with a 1-year warranty! Greens run cool and are low on power, but they are more expensive, and the warranty isn’t much better, if at all longer. WD Red drives cost ~$20 more than the Greens, come with all the advantages of a 5400 rpm drive, are designed for 24/7 operation and have a 3-year warranty. The only disadvantage of 5400 rpm drives is lower IOPS, but 7200 rpm doesn’t make a night-and-day difference anyway. Considering that it’s more likely than not that more than 1 drive will fail within 3 years, the $20 premium is a warranty purchase, if not payment for the advantages of a NAS drive over a desktop one. There was no 4TB Red to consider at the time (Seagate and WD had “SE” 4TB drives that cost substantially more); although the cost per GB would have been the same, it was outside my budget (~$800 for the drives) and I didn’t have immediate use for 16TB of usable space to justify the extra cost.

Computer hardware


AMD’s APU with its tiny heatsink and 2x 4GB DIMMs. Notable is the absence of a discrete GPU (it is on the CPU chip).

I wanted to buy the cheapest hardware that could do the job. I needed no monitor, just an enclosure with motherboard, RAM and CPU. The PSU had to be solid, to supply clean, stable power to the 6 drives. Rosewill’s Capstone is one of the best on the market at a very good price; the 450W version delivers 450W continuous, not peak (peak is probably in the ~600W range), and I only needed ~260W continuous plus headroom for the initial spin-up. The case had to be big enough for the 6 drives, plus front fans to keep the already cool-running drives even cooler (drive temperature is very important for data integrity and drive longevity). Motherboards with more than 6 SATA ports are fewer and typically cost significantly more than those with 6 or fewer. With 6 drives in raid, I was missing a boot drive. I searched high and low for a PCI-e SSD, but it seems there was nothing on offer at a good price, not even used (even the smallest ones were very expensive). The best deal was a WD Blue 250GB (platter) for ~$40, but that would take up a precious port, or cost me more for a motherboard with 7 SATA ports. My solution was to use flash drives: they are SSDs, and they come in all sizes and at all prices. I got two A-DATA 16GB drives for $16 each, thinking I’d keep the second one for personal use. It was only after I placed the order that I thought I should RAID-1 the two drives for better reliability.

With ZFS, RAM is a must, especially for deduplication (if desired). The recommendation is 1-2GB for each TB of storage. So far, I see ~145 bytes per block used in core (RAM), which for ~1.3TiB of user data in 10 million blocks comes to ~1382MB of RAM. Those 10 million blocks were used by fewer than 50K files (yes, mostly documentaries and movies at this point). The per-block requirement goes down as the share of duplicate blocks goes up, so it’s important to know how much duplication there is in those 1.3TiB. In this particular case, almost none (there were fewer than 70K duplicate blocks in those 10 million). So if this were all the data I had, I should disable dedup and save myself the RAM and processing time. But I still have ~5TiB of data to load with all of my backups, which surely contain a metric ton of dups. Bottom line: 1GB of RAM per 1TiB of data is a good rule of thumb, but it looks like a worst-case estimate here (and it would leave little room for file caching). So I’m happy to report that my 8GB of RAM will do OK all the way to 8TiB of user data, and realistically much more, as I certainly have duplicates. (Yes, I had to settle for 8GB, as the budget and this year’s 40-45% RAM price hike didn’t help; had last year’s downward price trend continued, I would have gotten 16GB for almost the same dough.) Updates on RAM and performance below.

CPU-wise, nothing could beat an AMD APU, which includes a Radeon GPU, in terms of price/performance. I could either go dual-core at $65, or quad-core at $100 with the L2 cache upgraded from 1MB to 4MB and a better GPU core. I went for the latter to future-proof video decoding and transcoding and to give ZFS ample cycles for compression, checksums, hashing and deduplication. The GPU in the CPU also loves high-clock RAM. After shopping for a good pair of DIMMs that run in dual-channel at 1600MHz, I found 1866MHz ones for $5 more that reportedly clock to over 2000MHz. So G.Skill wins the day yet again for me, as it did on my bigger machine with 4x 8GB of G.Skill @ 1866. I should add that my first choice in both cases had been Corsair, as I’ve been a fan for over a decade, but at least on my big build they failed me: they weren’t really stable in quad-channel (certainly not at the advertised frequency), whereas the G.Skill has overclocked from 1866MHz to 2040MHz with the CPU at 4.6GHz (that’s the big boy, though, not this NAS/HTPC).

Putting it all together

I got the drives for $150 each, and the rest of the PC cost me about $350. The two 16GB USB 3.0 flash drives are partitioned for swap and rootfs: swap on Raid-0, ext4 on Raid-1. Even though I could boot off of ZFS, I didn’t want to install the system on the raid array, in case I need to recover it; it also simplifies things. The flash drives are really just for booting: /home, /usr and /var go on ZFS. I can back everything up to other machines, and all I’d need is to dd the flash drive image onto a spare one to boot the machine in case of a catastrophic OS failure. I also keep a Linux rescue disk on another flash drive at hand at all times. The rescue disk automatically detects MD partitions and will load mdadm and let me resilver a broken mirror. One good tip is to set mdadm to boot in degraded mode and rebuild, or to send an email to get your attention; you probably don’t want to be stuck with a rescue disk and a blank screen, resilvering the boot raid.

The 6 Red drives run very quietly and are just warmer than the case metal (with ambient at ~22C), enough that they don’t feel metallic-cold during sustained writing, thanks to two 120mm fans blowing on them. Besides those, no other fans are used. The AMD comes with an unassuming little cooler that is quieter than my 5-year-old laptop. The A-Data drives in raid give upwards of 80MB/s of sustained reads (averaged over a full-drive dd read) and drop to ~19MB/s of sustained writes; bursts reach 280MB/s reading and a little over 100MB/s writing. Ubuntu Minimal 13.04 was used (which comes with kernel 3.8), the kernel was upgraded to 3.11, and ZFS pool version 28 was installed (the latest available for Linux; Solaris is at 32, which has transparent encryption). After creating the Raidz-2 zpool, 16.2TiB of raw disk space is reported, with 10.7TiB usable (excluding parity). The system boots faster than the monitor (a Philips 32″ TV) turns on, that is to say, in a few seconds. The box is connected with a Cat-5 cable to the router, which assigns it a static IP (just for my sanity).

I experimented with ZFS for over a day, just to learn to navigate it while reading up on it, before scratching the pool for the final build. Most info online is out of date (from 2009, when deduplication was all the rage in FS circles), so care must be taken when reading about ZFS. Checking out the code, building it and browsing it certainly helps. For example, online articles will tell you that Fletcher4 is the default checksum (it is not) and that one should use it to improve dedup performance (instead of the much slower sha256), but the code reveals that deduplication defaults to, and forces, sha256 checksums, and that sha256 is the default for the on-disk integrity checksums as well. Therefore, switching to Fletcher4 will only increase the risk in on-disk integrity checking, without affecting deduplication at all (Fletcher4 was removed from the dedup code when a severe endianness bug was found). With dedup enabled, Fletcher4 should only make things slower, because both checksums must now be computed (without dedup, Fletcher4 should improve performance at the cost of data security, as Fletcher4 is known to have a much higher collision rate than sha256).

ZFS administration is reasonably easy, and it does all the mounting transparently for you. It also has smb/nfs sharing administration built in, as well as quota and ACL support. You can set up as many filesystems as necessary, anywhere you like. Each filesystem looks like a folder with subfolders (the difference between the root of a filesystem nested within another and a plain subfolder is not obvious). The advantage is that each filesystem has its own settings (inherited from the parent by default) and statistics (except for dedup stats, which are pool-wide). Raid performance was very good. I didn’t do extensive tests, but sustained reads reached 120MB/s. Ingesting data from 3 external drives connected over USB 2.0 runs at 100GB/hour using rsync. Each drive writes into a different filesystem on the same zpool: one is copying RAW/JPG/TIF images (7-8MB each) with gzip-9, and two are copying compressed video (~1-8GB) and SHN/FLAC/APE audio (~20-50MB) with gzip-7. Deduplication is enabled and the checksum is SHA256. ZFS has background integrity checking and auto-rebuild of any corrupted data on disk, which has a non-negligible impact on write rates: the Red drives can do no more than ~50 random read IOPS and ~110 random write IOPS, but for the aforementioned load each levels out at ~400 IOPS per drive, since most writes are sequential. These numbers fluctuate with smaller files, where IOPS drops to 200-250 per drive and average ingestion falls to about a third, at ~36GB/hour. This is mostly due to FS overhead on reads and writes, which forces much higher seek rates than sequential writes would. The CPU runs at ~15-20% user and ~50% kernel, leaving ~25% idle on each of the 4 cores at peak times, and drops substantially otherwise. iostat shows about 30MB/s of sustained reading from the source drives combined, and writes on the Reds averaging 50MB/s with spikes of 90-120MB/s (this includes parity, which is 50% of the data, plus updates of FS structures, checksums, etc.)


2x 16GB flash drives in RAID-1 as boot drive and HDMI connection.

UPDATE: It’s been 3 days since I wrote the above. I now have over 2TiB of data ingested (I started fresh after the first 1.3TiB). The drives sustain a very stable ~6000KB/s of writes and anywhere between 200 and 500 IOPS each (depending on how sequential they are); typically it’s ~400 IOPS and ~5800KB/s. This translates into ~125GiB/hour (about 85GiB/hour of user-data ingestion), including parity and FS overhead. Even though the write rate goes down with gzip-9 and highly compressible data, I’m now writing from 4 threads and the drives are saturated at the aforementioned rates, so at this point I’m fairly confident the ingestion bottleneck is the drives. Still, 85GiB/hour is decent for the price tag. I haven’t done any explicit performance tests, because that was never among my goals. I’m curious to see raw read/write performance, but this isn’t a raw raid setup, so the filesystem overhead is always in the equation, and it will vary considerably as data fills up and the dedup tables grow; the numbers wouldn’t be representative. Still, I do plan to run some tests once I’ve ingested my data and have a real-life system with actual data.

Regarding compression, high-compression settings affect only performance. If the data is not compressible, the original data is stored as-is and no penalty is incurred for reading it (I read the code), nor is there extra storage overhead (incompressible data typically grows a bit when compressed). So for archival purposes the only penalty is slower ingestion. Unless the data will be modified often, slow-but-good compression is a good compromise, as it yields a few percentage points of compression even on mp3 and jpg files.

With Plex Media Server installed on Linux, I can stream full-HD movies over WiFi (thanks to my Asus dual-band router) transparently while ingesting data. I haven’t tried heavy transcoding (say, HD to iPhone), nor have I installed a window manager on Linux (AMD APUs show up in forums with Linux driver issues, but that’s mostly old and about 3D games, etc.). Regarding the overhead of dedup, I can disable dedup per filesystem and remove the overhead for folders that don’t benefit from it anyway. So it’s very important to design the hierarchy correctly and to create filesystems around file types. Worst-case scenario: upgrade to 16GB of RAM, which is the limit for this motherboard (I didn’t feel the need to pay an upfront premium for a board with a 32GB maximum).

I haven’t planned for a UPS. Some are religious about availability and avoiding hard power cuts; I’m more concerned about the environmental impact of batteries than anything else. ZFS is very resilient to hard reboots, not least thanks to its transactional, copy-on-write design, its data checksums and its background scrubbing (validating checksums and rebuilding transparently). I have had two hard power cycles that recovered transparently. I also know that ztest, a developer test tool, does all sorts of crazy corruptions and kills, and it’s reported that no corruption was found in a million test runs.

Conclusion

For perhaps anything but the smallest NAS needs, a custom build will be cheaper and more versatile. The cost is the lack of any warranty of satisfaction (the responsibility is on you) and the possibility of ending up with something underpowered, or worse. Maintenance might be an issue as well, but from what I gather, ready-made NAS solutions are known to be quite problematic, especially when anything goes wrong, such as a failed drive, buggy firmware or flaky management software. ZFS has proved, so far at least, to be fantastic, especially since deduplication and compression really do work well and increase data density without compromising integrity. I also plan to make good use of snapshots, which can be configured to auto-snapshot at a preset interval, for backups and code. The only thing I miss in ZFS on Linux is transparent encryption (Solaris got it, but it hasn’t been allowed to trickle down yet). Otherwise, I couldn’t be more satisfied (except maybe with 16GB RAM, or larger drives… but I would settle for 16GB RAM for sure).

Parts

PCPartPicker part list: http://pcpartpicker.com/p/1GqNv
Price breakdown by merchant: http://pcpartpicker.com/p/1GqNv/by_merchant/
Benchmarks: http://pcpartpicker.com/p/1GqNv/benchmarks/

CPU: AMD A8-5600K 3.6GHz Quad-Core Processor ($99.99 @ Newegg)
Motherboard: MSI FM2-A75MA-E35 Micro ATX FM2 Motherboard ($59.99 @ Newegg)
Memory: G.Skill Sniper Series 8GB (2 x 4GB) DDR3-1866 Memory ($82.99 @ Newegg)
Storage: Western Digital Red 3TB 3.5″ 5400RPM Internal Hard Drive ($134.99 @ Newegg)
Storage: Western Digital Red 3TB 3.5″ 5400RPM Internal Hard Drive ($134.99 @ Newegg)
Storage: Western Digital Red 3TB 3.5″ 5400RPM Internal Hard Drive ($134.99 @ Newegg)
Storage: Western Digital Red 3TB 3.5″ 5400RPM Internal Hard Drive ($134.99 @ Newegg)
Storage: Western Digital Red 3TB 3.5″ 5400RPM Internal Hard Drive ($134.99 @ Newegg)
Storage: Western Digital Red 3TB 3.5″ 5400RPM Internal Hard Drive ($134.99 @ Newegg)
Case: Cooler Master HAF 912 ATX Mid Tower Case ($59.99 @ Newegg)
Power Supply: Rosewill Capstone 450W 80 PLUS Gold Certified ATX12V / EPS12V Power Supply ($49.99 @ Newegg)
Other: ADATA Value-Driven S102 Pro Effortless Upgrade 16GB USB 3.0 Flash Drive (Gray) Model AS102P-16G-RGY
Other: ADATA Value-Driven S102 Pro Effortless Upgrade 16GB USB 3.0 Flash Drive
Total: $1147.89
(Prices include shipping, taxes, and discounts when available.)
(Generated by PCPartPicker 2013-09-22 12:33 EDT-0400)

May 03, 2013
 

I would probably never have crossed paths with, or heard about, the five-year-old boy who shot his two-year-old sister dead, nor either of their parents. Odds are, most people on the planet wouldn’t have known about them had it not been for the story that hit the news.

Unbeknownst to me, I had gotten into an argument with a pro-gun advocate who hid his affection for guns rather well, all the while I thought we were having a casual conversation. As tragic as this is, and as a parent I can identify with the grief of losing a child, I cannot feel sad, any more than I can understand what a parent must feel knowing it wasn’t an accident out of their control, but rather precisely a consequence of their upbringing. “No, it’s sad. It’s very sad,” I was told. To me, sad is that nearly 9 million children die every single year of malnutrition and other trivially-curable complications or diseases, I answered. “You aren’t sad for the 9 million dying children?”

No, I’m not sad. You know what Stalin said? He said, ‘a single death is sad, millions dead is a statistic.’

Yes, that was the response I got.

And before I knew it, I got the argument for guns: Cars kill more people than guns, but you don’t want to ban cars, do you?

Before we get too worked up, let’s separate these two points: emotions formed on hearing stories of unknown individuals, as tragic as they may be, are one thing; not having good arguments to defend a position, and repeating bad ones instead, is a completely different matter.

Hume Rolls in his Grave

Kentucky State Police Trooper Billy Gregory said “in this part of the country, it’s not uncommon for a five-year-old to have a gun or for a parent to pass one down to their kid.” Regardless of whether I am pro-gun or not, I must recognize one thing: guns are dangerous.

“Passing down” guns to a five-year-old inherently and inevitably implies taking a certain risk: the risk of the gun going off, whether intentionally or accidentally. Failing to recognize this simple fact is akin to covering one’s face when losing control of one’s car. There might be multiple ways to resolve a problem, but ignoring it cannot be one of them.

If I start drinking and gambling, I shouldn’t expect anyone to be surprised when I lose everything and end up on the streets. I shouldn’t expect anyone to feel sad for my stupidity and bad choices; they might as well laugh at my surprise at the outcome. If I hand my twelve-year-old the car keys, should I or anyone else find it odd when they crash the car and damage people and property? Should I expect pity from others if the car crashes into my house, damaging it and injuring me? Similarly, guns and children do not produce an infinite range of outcomes: there are very few things we should expect to come of that marriage, and we can only hope it will be playground fun. But hoping is no precaution.

I find it borderline humorous that people systematically give their children guns and then the whole world gapes at the death of a child. I find it inevitable, unless steps are taken to prevent it. Like everyone else, I have limited energy, emotional and otherwise, and I prefer to spend it on preventable causes that affect countless more children, equally fragile, equally lovable and equally rightful to life.

Just because we find it easier to write off millions of deaths as statistics doesn’t make it right. I’m sure Stalin had other apt utterances worthy of quoting in light of the massacres, deportations and cold-blooded killings that he sanctioned. But should we take comfort in the coldness of the indifference that we may feel at the death of millions of children who, like the victim in this case, haven’t yet seen their fifth birthday? Does Stalin’s ludicrous indifference have any bearing on how one feels or should feel?

Stalin’s quote was at once shocking and baffling to me. I didn’t know whether using it was an excuse and justification for one’s feelings, or lack thereof, or a Freudian slip. Either way, just because something is a certain way doesn’t mean that is how it ought to be, morally speaking. Perhaps we should start feeling sad about these children of have-not parents. The children dying of famine have only nature and our inaction to blame. The five-year-old who killed his sister, in contrast, has his parents to blame for preferring to buy him a gun (or at least allowing him to own one) instead of a multitude of other things they could have done, not least buying him a book to read and learn from, in the hope of bettering himself and his society.

I cannot feel sad for the decisions of others, any more than I can prevent them from taking those decisions. The same couldn’t be said of the children dying of malnutrition and lack of clean water. In the latter case I can prevent it, and my (collectively our, really) inaction to save even one more child is sad indeed.

At least in one sense he was right, though. We cannot begin to imagine anything in the millions, but a single child with a picture in the news is readily reachable. That only speaks to our limitations as human beings, though, and hopefully not to our inhumanity.

Guns and Cars

I have heard many decent arguments for crazy things, including keeping slaves and leaving women out of the workforce (and typically in the kitchen), among others. Here “decent” doesn’t mean acceptable or justifiable, rather that the point, in and of itself, has some merit. Such arguments fail because, taken in the full context of the issue, a single argument for or against something as complex as these topics simply doesn’t carry enough weight.

Slavery had many benefits to slaves, not least a steady income, job security and living space. And at least some women wouldn’t mind, given half a chance, being relieved of the burden that providing for oneself and one’s family entails. Women aren’t unique in wishing for an easier lifestyle than working forty-hour weeks.

But these arguments fail to resolve the issue one way or the other because they are incomplete. They shed light on a single aspect, and a very narrow one at that. Cars do kill people, perhaps many more people than guns (clearly we are ignoring wars here). I’ve read numbers as high as 500,000 annual deaths from car accidents.

Do I want to ban cars for the huge losses they cause? Yes, and in a sense we already do. Traffic and licensing laws have evolved in response to both the dangers inherent in driving and the exploding number of cars and motorists. Driving under the influence of alcohol (above a certain allowance, if any) is a grave offense in many states and countries, and can be a felony if others are injured. Multiple offenses typically result in a revoked license and often a jail sentence.

More importantly, the argument is weak and irrelevant because it appeals to one’s disposition, biases and tendency to underestimate the perils of cars. Indeed, many of us cringe upon hearing about spiders and snakes, let alone seeing one, yet will jaywalk in heavy traffic, sometimes with children in tow.

We should avoid driving whenever we can, and we should have better laws, education, more responsible drivers and car owners, and better traffic rules to minimize the risks. But we should also do the same for guns. Giving them to kids should simply be an offense no less severe than letting a minor drive your car. Having a gun gifted to a child, by a maker calling it “My First Rifle,” and then pretending that the gun will be locked in a safe, is simply refusing to see things for what they are. Children are attached to their toys, and I guess that’s the point of manufacturing guns for them in the first place: they are expected to remain loyal gun owners for many more years.

All this pretends that guns have benefits to society on an equal footing with cars, enough to justify their risks. I am not willing to give even cars such a blank license, and will demand improving the situation to avoid unnecessary injury and loss of life from car accidents. But the onus of proving that the benefits of guns are even remotely comparable to those of cars is certainly not on me. Let alone the benefit of guns to kids.

When someone kills another person with their car, they cannot claim that injury is a risk we’ve come to accept and so escape prosecution, any more than a gun owner can claim the same. But let’s not pretend that owning guns is a right without restrictions, because being responsible is all about restrictions, toward oneself first and then toward others.

We will never take responsibility if we don’t see the inherent dangers of our choices, if we don’t understand that both car and gun deaths are preventable and both of our choosing, and as long as we view our actions vicariously.

Evidently, the grandmother of the now-dead two-year-old has a different understanding of cause and effect than mine. “It was God’s will. It was her time to go, I guess. I just know she’s in heaven right now and I know she’s in good hands with the Lord,” she said.

I feel sorry for the five-year-old for having the parents he has, and I can only hope he will not repeat the same mistakes when the roles are reversed.

Apr 20, 2013
 

I just got my Raspberry Pi a couple of weeks ago from Newark/Element14. First impressions are great!

All I needed was an SD card and the HDMI cable that came with the TV set. I downloaded “wheezy” and placed it on the SD card, hooked up my favorite toy, the “remote control” (a.k.a. Logitech K400), and I had a complete system with little effort. I powered the Pi from the TV with a micro-USB cable. Flawless first boot.

Out of the box it has everything essential, including web browsers, Python IDLE and Scratch, a puzzle-like game-development platform that the kids love. Hook a Cat5 to the router and you have internet. Apt-get mplayer and Gnash (an open flash player). Video is unusable without hardware acceleration, which requires license fees to enable the codecs, though. Internet radio, however, is very much within reach: depending on the bitrate and codec, it uses anywhere between 25% and 75% of the CPU. Oh, and I had to get the stream IP from the .pls files on Shoutcast, but that’s trivial to wrap in a script that wgets, seds and spawns mplayer.

As a toy, experimentation lab/playground, Swiss Army knife embedded system, or educational platform, it’s very hard to beat. At $25 for model A and $35 for model B, it does more than it costs.

I couldn’t be happier with exchanging my hard-earned cash for the Pi, so you can imagine my surprise when I got a call from Newark asking for feedback on the delivery and my order (both of which were fantastic), and whether I minded them sending me a discount code by email. To top it off, I was encouraged to share it with you.

Here is a snippet of the thank-you email:

Thank you for your most recent order! We appreciate your business and as a thank you we would like to extend to you a 15% off voucher. The Voucher code is THANKS1C and it can be used as many times as you would like through April 30th, 2013. After May 1st, use code THANKS2 which is good through June 1st 2013. Feel free to share with whomever you would like.  When placing an order over our website, please type in the voucher code on the shopping cart page. If you are ordering by phone, simply give the code to the representative processing your order. There are lots of ways to take advantage of this limited offer from our award winning website http://canada.newark.com  to our knowledgeable and friendly team waiting to take your call at 1 800 463 9275. You may also send your quote to quote@newark.com or email a purchase order to order@newark.com.

THANKS1C (through April 30th, 2013)

THANKS2 (from May 1st through June 1st 2013)

Note:

Discount applies to the first price-break quantities only. Discount cannot be combined with other offers, promotions, quantity discounts, or contract pricing. Contractual considerations with a small number of manufactures may reduce or prevent a voucher discount on selected items including test equipment; Cannot be used on Raspberry PI. call us with questions relating to voucher exclusions. Non-catalog items are not subject to a voucher discount.

It’s reasonable that the discount doesn’t apply to the Pi, considering that it’s sold at cost by a non-profit. So share the codes, use them, and I hope you give the Raspberry Pi a spin; let me hear your thoughts. It’s well worth it.

Apr 08, 2013
 

In part 4 we replaced Parallel.ForEach with Tasks. This allowed us to run the processes on ThreadPool threads but wait only in the main thread, so the worker threads were never blocked on waits: they executed the child process, processed the output callbacks and exited promptly, while the main thread waited for everything to finish. To do this, we needed to break the Run logic away from the Wait. In addition, we had to keep the Process instances alive so that the callbacks would work and we could detect the end correctly.

This worked well for us, except for the nagging problem that we can’t use Parallel.ForEach. Or rather, if we did, even accidentally, we’d deadlock as we did before. Also, our wrapper isn’t readily reusable in other classes without explicitly separating the Run call from the Wait on separate threads, as we did. Clearly this isn’t a bulletproof solution.

It might be tempting to think that the Process class is a thin wrapper around the OS process, but in fact it’s rather complex. It does a lot for us, and we shouldn’t throw it away. It’d be great if we could avoid the problem with the ThreadPool altogether. Remember, the reason we’re using it is that the synchronous version deadlocked as well.

What if we could improve the synchronous version?

Synchronous I/O Take 2

Recall that the deadlock happened because, to read until the end of the output stream, we need to wait for the process to exit. The process, in its turn, won’t exit until it has written all of its output, which we’re reading. Since we’re reading only one stream at a time (either StandardOutput or StandardError), if the buffer of the one we’re not reading gets full, we deadlock.
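
For reference, this is roughly the shape of the deadlock-prone synchronous pattern (a minimal sketch only; the child process name is made up):

    var psi = new ProcessStartInfo("child.exe") // Hypothetical child process.
    {
        UseShellExecute = false,
        RedirectStandardOutput = true,
        RedirectStandardError = true,
    };

    using (var process = Process.Start(psi))
    {
        // If the child fills its stderr pipe while we're blocked here on stdout,
        // the child blocks on writing and we block on reading: deadlock.
        string stdOut = process.StandardOutput.ReadToEnd();
        string stdErr = process.StandardError.ReadToEnd();
        process.WaitForExit();
    }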

The solution would be to read a little from each. This would work, except for the little problem that if we read from a stream that doesn’t have data, we block until it does. This is exactly the situation we were trying to avoid: a deadlock happens when the other stream’s buffer is full, so the child process blocks on writing to it, while we are blocked reading from the stream that has no data yet.

Peek to the Rescue?

The Process class exposes both the StandardOutput and StandardError streams, which have a Peek() function that returns the next character without removing it from the buffer, and -1 when there is no data to read. So we could postpone reading until Peek() returns > -1.
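
Something like the following is what one might attempt (a sketch only, assuming `process` is an already-started Process with both streams redirected, and built on the assumption that Peek() returns promptly when nothing is buffered):

    var stdOut = new StringBuilder();
    var stdErr = new StringBuilder();

    // Hypothetical polling loop; draining whatever remains after exit is omitted.
    while (!process.HasExited)
    {
        if (process.StandardOutput.Peek() > -1)
            stdOut.Append((char)process.StandardOutput.Read());

        if (process.StandardError.Peek() > -1)
            stdErr.Append((char)process.StandardError.Read());

        Thread.Sleep(1); // Don't spin when neither stream has data.
    }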

Alas, this won’t work. As many have pointed out, StreamReader.Peek can block! Which is ironic, considering that one would typically use it to poll the stream.

It seems we have no more hope in the synchronous world. We have no getter to query the available data, as with NetworkStream.DataAvailable, and Length will throw an exception, as it needs a seekable stream (which we don’t have). So we’re back to the asynchronous world.

The Solution: No Solution!

I was almost sure I had found an answer with direct access to the StandardOutput and StandardError streams. After all, these are just wrappers around Win32 pipes. Async I/O in .Net is really a wrapper around the native Windows async infrastructure, so, in theory, we don’t need all the layers that Process and other classes add on top of the raw interfaces. Asynchronous I/O in Windows works by passing an event object (typically manual-reset) and a callback function. All we really need is for the event to get triggered. Lo and behold, we also get a wait handle in the IAsyncResult that Stream’s BeginRead returns. So we could wait on it directly, as these are triggered by the file-system drivers, after issuing async reads like this:

  var outAsyncRes = process.StandardOutput.BaseStream.BeginRead(outBuffer, 0, BUFFER_LEN, null, null);
  var errAsyncRes = process.StandardError.BaseStream.BeginRead(errBuffer, 0, BUFFER_LEN, null, null);
  var events = new[] { outAsyncRes.AsyncWaitHandle, errAsyncRes.AsyncWaitHandle };
  WaitHandle.WaitAny(events, waitTimeMs);
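
(For context, this snippet assumes the process was started with both standard streams redirected, and that outBuffer and errBuffer are byte[BUFFER_LEN] scratch buffers for the raw pipe reads.)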

Except, this wouldn’t work either. There are two reasons why, one the OS’s fault and one .Net’s.

Async Event Not Triggered by OS

The first issue is that Windows doesn’t always signal this event. You read that right. In fact, a comment in the FileStream code reads:

                // Consider uncommenting this someday soon - the EventHandle 
                // in the Overlapped struct is really useless half of the
                // time today since the OS doesn't signal it. [...]

True, the state of the event object is not changed if the operation finishes before the function returns.

.Net Callback Interop via ThreadPool

Because the event isn’t signaled in all cases, .Net needs to signal it manually when the overlapped I/O callback is invoked. You’d think that would save the day. Alas, the overlapped I/O callback doesn’t call into managed code directly. The CLR handles such callbacks in a special way; in fact it knows about file I/O, and the .Net wrappers aren’t written in pure P/Invoke, but with support from the CLR as well as standard P/Invoke.

Because the system can’t invoke managed callbacks, the solution is for the CLR to do it itself. Of course this needs to be done in a responsive fashion, without blocking all of the CLR for each callback invocation. What better solution than to queue a work item on the ThreadPool that invokes the .Net callback? This callback signals the event object we got in the IAsyncResult and, if one was set, calls the delegate that we passed to the BeginRead call.

So we’ve come full circle, back to the ThreadPool and the original dilemma.

Conclusion

The task at hand seemed simple enough. Indeed, even after 5 posts, I still feel the frustration of not finding a generic solution that is both scalable and abstract. We tried simple synchronous I/O, then switched to async, only to find that our waits can’t be satisfied because they are serviced by the very worker threads we are blocking while waiting for the reads to complete. This took us to a polling strategy that, once again, failed because the classes we are working with don’t allow us to poll without blocking. The OS could have saved the day, but because we’re in the belly of .Net, we have to play by its rules, and that means the OS callbacks must be serviced on the same worker threads we are blocking. The ThreadPool is another stubbornly designed, process-wide object that doesn’t allow us to gracefully, and more importantly thread-safely, query and modify it.

This means that we either need to explicitly use Tasks and the ProcessExecutor class we designed in the previous post or we need to roll our own internal thread pool to service the waits.

It is very easy to overlook a cyclic dependency such as this, especially in the wake of abstraction and separation of responsibilities. The complexity involved was the perfect setup for overlooking the subtle assumptions and expectations of each part of the code: the process spawning, the I/O readers, Parallel.ForEach and, ultimately, the generic and omnipresent ThreadPool.

The only hope for solving similar problems is to first find them. By incorrectly assuming (as I did) that any thread-safe function can be wrapped in Parallel.ForEach, patting ourselves on the back for the marvels and simplicity of modern programming languages, and taking pride in our silently-failing achievement, we only miss the opportunity to do so. Only by testing our code and verifying our assumptions, with skepticism and cynicism rather than confidence and pride, do we stand a chance of finding out the sad truth, at least in the (hopefully) rare cases where we fail. Or where our abstraction fails, at any rate.

I can only wonder, with a grin, about other similar cases in the wild, and how they run slower than their brain-dead, one-at-a-time counterparts.

Apr 08, 2013
 

Apparently there is an ongoing war between FEMEN, a feminist organization that protests topless, and a group calling themselves Muslim Women Against Femen, among others.

What interests me is the rhetoric and politics at play. FEMEN is not exactly known for taking the diplomatic road to getting its voice across. Whatever their methods, whether one agrees with them or not, or whether they are effective, considering that so far they have won huge backlash and controversy more than anything else, the response is certainly interesting.

The row ignited when Amina Tyler, a 19-year-old FEMEN activist, posted topless pictures of herself on FEMEN-Tunisia’s Facebook page with “Fuck your morals” written in English and “My body is my own, not anyone’s honor” in Arabic. A preacher called Adel Almi was quoted by the Tunisian newspaper Kapitalis as saying that she deserves 80 to 100 lashes according to sharia law, but added that the gravity of her actions, which he believed might encourage other women to do the same and bring an “epidemic” and “catastrophes,” merits death by stoning.

FEMEN has reacted by protesting, as any feminist human should do. A photo of a man apparently kicking a topless FEMEN member protesting in front of the Great Mosque of Paris was posted in a piece on The Guardian‘s Comment is Free section. A video of that protest, with the man in question, is available on Vimeo.

Muslim Women Against Femen has responded with an open letter listing eight objections of Muslimahs (Muslim women) to FEMEN. In addition, a number of women (including a child) have posted photos of themselves on Facebook holding slogans to the same effect, and interviews with supporters appeared on The Huffington Post.

What is amazing is that between the preacher, Muslim Women Against Femen, and the women with the slogans, not a single reference is made to the young woman responsible for the debate in the first place. Amina Tyler is missing from the picture and there is good reason to fear for her life.

The Muslimahs who took it upon themselves to defend their rights and religion, some of whom are self-identified feminists, apparently forgot to pay Tyler even the lip service of denouncing any harm that could be inflicted on her. Instead, they took the opportunity to first generalize and speak for all “Muslim women, women of colour and women from the Global South,” as if appointed spokespersons. Second, the opportunity to lash out at “racist, imperialist, capitalist, white-supremacist, colonialists” was not missed, never mind that they are irrelevant here.

Some of the slogans in the self-portraits read “Do I look oppressed?” and a child of perhaps five held one that read “Shame on ‘FEMEN’ Hijab is my right!” I do not see anyone taking anyone’s rights away from them; if anything, Amina’s rights are the ones at risk. Least of all the rights of a child who, beyond welfare, health and education, can barely demand any rights that adults don’t grant her. Which raises the question of who, if not her parents, gave her the “right” to wear a hijab, when she clearly is in no position to choose to wear one any more than the blue jeans that she has on. Besides, if these women are not oppressed, they need not answer the calling. (Although I do wonder whether that will remain the case if and when they decide to change their outfits.)

The open letter doesn’t offer any more wisdom from the thirteen or so university students who wrote it. It is rife with ignorant accusations, misguided rhetoric, contradictions and ad-hominem attacks on the protesters. If they are to be believed, “FEMEN is a colonial, racist rubbish disguised as ‘women’s liberation’,” “rubbing shoulders with far-right, racist and Islamophobic groups is just ANTI-FEMINIST beyond bounds” and “EXTREMELY DANGEROUS.” And members of FEMEN “don’t really care about violence and harm being inflicted upon women, you only care about that when it is perpetrated by brown men with long beards who pray five times a day.” Which is quite interesting, considering that the group has focused most of its activity on protesting against “sex tourists, religious institutions, international marriage agencies, sexism and other social, national and international topics.” In fact, had the authors checked their facts, they would have run into FEMEN’s four-point goals on their MySpace page, which explicitly target “Ukrainian women” and “Ukraine.” Not to mention that, if anything, Ukraine was a victim of Russian colonialism. (But I guess all “white” people are “racist” and “colonialist.”)

Completely oblivious to the irony that it was a Muslim (male) preacher who called for the lashing and stoning of Amina Tyler, the authors declare “we don’t have to conform to your customs of protest to emancipate ourselves. Our religion does that for us already, thank you very much.”

In a particularly off-topic, below-the-belt swing, the epistle’s writers threw a punch at the physique of the topless protesters, saying that “not all of us are white, skinny, physically non-disabled [sic] and willing to whip off our tops merely for press attention.” Perhaps alluding to being white and skinny as undesirable attributes in women (according to whom?). To add insult to injury, they advised them to “check yourselves before you go into the streets again.”

Perhaps the most salient remark is that the authors “understand that it’s really hard for a lot of you white colonial ‘feminists’ to believe, but- SHOCKER! – Muslim women and women of colour can come with their own autonomy, and fight back as well!” The takeaway seems to be that while encouraging Muslim women to protest against oppression is “racist,” saying “white colonial ‘feminists'” is not. (Again, never mind that Islam is not a race, but white is.)

They advise the protesters to “Take aim at male supremacy, not Islam,” perhaps oblivious to the fact that some male supremacists are also Muslim. They conclude the letter with the banner “SMASH THE WHITE SUPREMACIST CAPITALIST PATRIARCHY! POWER TO THE MUSLIMAHS!”

One has to wonder: what do colonialism and racism have to do with defending the right of a woman to post her own topless photos online without getting stoned to death?

One could give them the benefit of the doubt and assume they simply didn’t understand what the issue was, had these women not been from Birmingham and other “western,” “colonial,” “capitalist” and “imperial” cities, instead of the “Global South” and the Middle East. Not only did they fail to address the subject matter, that of a teenage woman getting death threats for no other reason than posting her own photos online, but they muddled the “messed up world” that “we live in” (as they called it) even further. Silence in the face of injustice and oppression is itself a form of injustice and oppression. Do they really believe Amina deserves to be stoned to death? That she is a “white colonial racist” herself, to be fought? Or is anyone who doesn’t agree with their ultra-conservative views an enemy to be chased away?

Muslim Women Against Femen are not oppressed, but they wouldn’t lift a finger to support a sister under death threat. They wouldn’t recognize her right to control her body and her outfit. They wouldn’t even speak out against the preacher who is openly encouraging others to take matters into their own hands and apply the harsh punishment of lashing, nay, stoning her. Instead, they took to grinding the ax of “white colonialism.” Like the preacher, they feel threatened and fear an “epidemic” of sorts.

“Nudity DOES NOT liberate me and I DO NOT need saving,” read one of the slogans. Perhaps, but you aren’t implying that Amina doesn’t need or deserve saving either, are you?

What feminist ignores the needs of another woman under attack by male preachers calling for her stoning? Apparently those who believe the protests of FEMEN are a “crusade.”

Meanwhile, a petition to prosecute those who threatened Amina Tyler’s life has already collected more than 110,000 signatures. At least there are still some left who differentiate between agreeing with someone’s opinion and agreeing to their right to have an opinion.

Apr 05, 2013
 

In part 3 we found out that executing Process instances in Parallel.ForEach could starve the ThreadPool and cause a deadlock. In this part, we’ll attempt to solve the problem.

What we’d like is a function or class that executes an external process and reads its output, such that it can be used concurrently, in Parallel.ForEach or with Tasks, without ill effects and without the user taking special precautions to avoid these problems.

A Solution

A naïve solution is to make sure there is at least one thread available in the pool when blocking on the child process and its output. In other words, the number of threads in the pool must be larger than the MaxDegreeOfParallelism of the ForEach loop. This only works when there are no other users of the ThreadPool; only then can we control these two numbers to guarantee the inequality. Even then, we might need a large number of threads in the pool. But the problem is inescapable, as we can’t control the complete process at all times. To make matters worse, the APIs for changing the number of workers in the pool are non-atomic and will always have race conditions when changing these process-wide settings.
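
For completeness, the workaround looks roughly like this (a sketch only; the numbers are illustrative, and RunAndWaitForProcess is a hypothetical helper that runs a process and blocks on its output):

    // Fragile workaround: keep more pool threads around than the loop can block
    // at once. Any other ThreadPool user still breaks this arithmetic.
    const int maxParallelism = 4;

    int workers, iocp;
    ThreadPool.GetMinThreads(out workers, out iocp);
    ThreadPool.SetMinThreads(Math.Max(workers, maxParallelism * 2), iocp);

    Parallel.ForEach(
        pathArgs,
        new ParallelOptions { MaxDegreeOfParallelism = maxParallelism },
        pair => RunAndWaitForProcess(pair.Key, pair.Value));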

Broadly speaking, there are three solutions. The first is to replace Parallel.ForEach with something else. The second is to replace the Process class with our own, such that we have a more flexible design that avoids wasting a thread on waiting for the callback events. The third is to replace the ThreadPool. The last can be done simply by having a private pool for waiting on the callback events. That is, unless we can find a way to make the native Process work with Parallel.ForEach without worrying about this issue.

Of course, the best solution would be for the ThreadPool to be smarter and increase the number of available threads. It does eventually do so when we set our wait functions to wait indefinitely, but it takes way too long (in my case about 12 seconds) before the ThreadPool realizes that no progress is being made while it still has tasks scheduled for execution. It (correctly) assumes that there might be some interdependency between the running-but-not-progressing threads and the tasks waiting for execution.

The third solution is overly complicated, and I’d find very little reason to defend it. Thread pools are rather complicated beasts, and unless we can reuse an existing one, it’d be overkill to develop our own. The first two solutions look promising, so let’s try them out.

Replacing Parallel.ForEach

Clearly the above workarounds aren’t bulletproof. We can easily end up in situations where we time out, so it’d be a Red Queen’s race. A better approach is to attack the root of the problem, namely, to avoid wasting ThreadPool threads on waiting for the processes. This can be done if we make the wait more efficient by combining the waits of multiple processes. In that case, it wouldn’t matter whether we used a single ThreadPool thread or a dedicated one.

To that end, we need to do two things. First, we need to convert the single function into an object that we can move around, because we’ll need to reference its locals directly. Second, we need to separate the wait from all the other setup code.

Here is a wrapper that is disposable, and wraps cleanly around Process.

        public class ProcessExecutor : IDisposable
        {
            public ProcessExecutor(string name, string path)
            {
                _name = name;
                _path = path;
            }

            public void Dispose()
            {
                Close();
            }

            public string Name { get { return _name; } }
            public string StdOut { get { return _stdOut.ToString(); } }
            public string StdErr { get { return _stdErr.ToString(); } }

            // Returns the internal process. Used for getting exit code and other advanced usage.
            // May be proxied by getters. But for now let's trust the consumer.
            public Process Processs { get { return _process; } }

            public bool Run(string args)
            {
                // Make sure we don't have any old baggage.
                Close();

                // Fresh start.
                _stdOut.Clear();
                _stdErr.Clear();
                _stdOutEvent = new ManualResetEvent(false);
                _stdErrEvent = new ManualResetEvent(false);

                _process = new Process();
                _process.StartInfo = new ProcessStartInfo(_path)
                {
                    Arguments = args,
                    UseShellExecute = false,
                    RedirectStandardOutput = true,
                    RedirectStandardError = true,
                    ErrorDialog = false,
                    CreateNoWindow = true,
                    WorkingDirectory = Path.GetDirectoryName(_path)
                };

                _process.OutputDataReceived += (sender, e) =>
                {
                    _stdOut.AppendLine(e.Data);
                    if (e.Data == null)
                    {
                        var evt = _stdOutEvent;
                        if (evt != null)
                        {
                            lock (evt)
                            {
                                evt.Set();
                            }
                        }
                    }
                };
                _process.ErrorDataReceived += (sender, e) =>
                {
                    _stdErr.AppendLine(e.Data);
                    if (e.Data == null)
                    {
                        var evt = _stdErrEvent;
                        lock (evt)
                        {
                            evt.Set();
                        }
                    }
                };

                _sw = Stopwatch.StartNew();
                _process.Start();
                _process.BeginOutputReadLine();
                _process.BeginErrorReadLine();
                _process.Refresh();
                return true;
            }

            public void Cancel()
            {
                var proc = _process;
                _process = null;
                if (proc != null)
                {
                    // Invalidate cached data to requery.
                    proc.Refresh();

                    // Cancel all pending IO ops.
                    proc.CancelErrorRead();
                    proc.CancelOutputRead();

                    Kill();
                }

                var outEvent = _stdOutEvent;
                _stdOutEvent = null;
                if (outEvent != null)
                {
                    lock (outEvent)
                    {
                        outEvent.Close();
                        outEvent.Dispose();
                    }
                }

                var errEvent = _stdErrEvent;
                _stdErrEvent = null;
                if (errEvent != null)
                {
                    lock (errEvent)
                    {
                        errEvent.Close();
                        errEvent.Dispose();
                    }
                }
            }

            public void Wait()
            {
                Wait(-1);
            }

            public bool Wait(int timeoutMs)
            {
                try
                {
                    if (timeoutMs < 0)
                    {
                        // Wait for process and all I/O to finish.
                        _process.WaitForExit();
                        return true;
                    }

                    // Timed waiting. We need to wait for I/O ourselves.
                    if (!_process.WaitForExit(timeoutMs))
                    {
                        Kill();
                    }

                    // Wait for the I/O to finish.
                    var waitMs = (int)(timeoutMs - _sw.ElapsedMilliseconds);
                    waitMs = Math.Max(waitMs, 10);
                    _stdOutEvent.WaitOne(waitMs);

                    waitMs = (int)(timeoutMs - _sw.ElapsedMilliseconds);
                    waitMs = Math.Max(waitMs, 10);
                    return _stdErrEvent.WaitOne(waitMs);
                }
                finally
                {
                    // Cleanup.
                    Cancel();
                }
            }

            private void Close()
            {
                Cancel();
                var proc = _process;
                _process = null;
                if (proc != null)
                {
                    // Dispose in all cases.
                    proc.Close();
                    proc.Dispose();
                }
            }

            private void Kill()
            {
                try
                {
                    // We need to do this for a non-UI process,
                    // or one that must be forcibly cancelled.
                    var proc = _process;
                    if (proc != null && !proc.HasExited)
                    {
                        proc.Kill();
                    }
                }
                catch
                {
                    // Kill will throw when/if the process has already exited.
                }
            }

            private readonly string _name;
            private readonly string _path;
            private readonly StringBuilder _stdOut = new StringBuilder(4 * 1024);
            private readonly StringBuilder _stdErr = new StringBuilder(4 * 1024);

            private ManualResetEvent _stdOutEvent;
            private ManualResetEvent _stdErrEvent;
            private Process _process;
            private Stopwatch _sw;
        }

Converting this to use Tasks is left as an exercise for the reader.

Now, we can use this in a different way.

        public static void Run(List<KeyValuePair<string, string>> pathArgs, int timeout)
        {
            var cts = new CancellationTokenSource();
            var allProcesses = Task.Factory.StartNew(() =>
            {
                var tasks = pathArgs.Select(pair => Task.Factory.StartNew(() =>
                {
                    string name = Path.GetFileNameWithoutExtension(pair.Key);
                    var exec = new ProcessExecutor(name, pair.Key);
                    exec.Run(pair.Value);
                    cts.Token.Register(exec.Cancel);
                    return exec;
                })).ToArray();

                // Wait for individual tasks to finish.
                foreach (var task in tasks)
                {
                    if (task != null)
                    {
                        task.Result.Wait(timeout);
                        task.Result.Cancel();
                        task.Result.Dispose();
                        task.Dispose();
                    }
                }
            });

            // Cancel, if we timed out.
            allProcesses.Wait(timeout);
            cts.Cancel();
        }

This time, we fire each process executor in a separate task (still on a ThreadPool worker thread), but we wait in a single thread. The worker threads run, queue their async reads, and return from exec.Run(), freeing the thread, at which point the queued async reads can execute.

Notice that we are creating the new tasks in a worker thread. This is so that we can limit the wait on all of them (although we’d really need a timeout that scales with their number; for illustration purposes the above code is sufficient). Now we have the luxury of waiting on all of them with a single Wait call, and we can also cancel all of them using the CancellationTokenSource.

The above code typically finishes in about 5-6 seconds (and as low as 4006ms) for 500 child processes on 12 cores, each echoing a short string and exiting.

One thing to keep in mind when using this code: if we use ProcessExecutor as a member of another class, that class should also expose Run and Wait separately. If it simply calls Run followed by Wait in the same function, both will execute on the same thread, and we will hit the same problem as before once a number of them are executed on the ThreadPool. The abstraction will leak again!

In addition, this puts the onus on the consumer. Our ProcessExecutor class is not bulletproof on its own.
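
To make this concrete, here is a sketch of such a consumer (the class and method names are hypothetical):

        // A hypothetical consumer of ProcessExecutor. Exposing Start and Wait
        // separately lets the caller start many jobs first and only block later,
        // from a single thread.
        public class CompilerJob : IDisposable
        {
            private readonly ProcessExecutor _exec;

            public CompilerJob(string compilerPath)
            {
                _exec = new ProcessExecutor("compiler", compilerPath);
            }

            // Good: starts the child and returns immediately.
            public void Start(string args)
            {
                _exec.Run(args);
            }

            // Good: the caller decides when, and on which thread, to block.
            public bool WaitForCompletion(int timeoutMs)
            {
                return _exec.Wait(timeoutMs);
            }

            // Bad: Run followed by Wait on the same thread reintroduces the
            // original problem once many of these are queued on the ThreadPool.
            // public bool Compile(string args, int timeoutMs)
            // {
            //     _exec.Run(args);
            //     return _exec.Wait(timeoutMs);
            // }

            public void Dispose()
            {
                _exec.Dispose();
            }
        }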

In the next and final part we’ll go back to the Process class to try and avoid the deadlock issue without assuming a specific usage pattern.

Apr 042013
 

In part 2 we discovered that executing Process instances in Parallel.ForEach reduced performance and caused our waits to time out. In this part we’ll dive deep into the problem and find out what is going on.

Deadlock

Clearly the problem has to do with the Process class and/or with spawning processes in general. Looking more carefully, we notice that besides the spawned process we also have two asynchronous reads. We have intentionally requested these to be executed on separate threads. But that shouldn’t be a problem, unless we have misused the async API.

It is reasonable to suspect that the approach used for the async reading is at fault. This is where I ventured to look at the Process class code. After all, it’s a wrapper over the Win32 API, and it might make assumptions that I was ignoring or contradicting. Regrettably, that didn’t help me figure out what was going on, apart from prompting me to write the aforementioned post.

Looking at the BeginOutputReadLine() function, we see it creating an AsyncStreamReader, which is internal to the .Net Framework and then calls BeginReadLine(), which presumably is where the async action happens.

        [System.Runtime.InteropServices.ComVisible(false)] 
        public void BeginOutputReadLine() {

            if(outputStreamReadMode == StreamReadMode.undefined) { 
                outputStreamReadMode = StreamReadMode.asyncMode;
            } 
            else if (outputStreamReadMode != StreamReadMode.asyncMode) {
                throw new InvalidOperationException(SR.GetString(SR.CantMixSyncAsyncOperation));
            }

            if (pendingOutputRead)
                throw new InvalidOperationException(SR.GetString(SR.PendingAsyncOperation)); 

            pendingOutputRead = true;
            // We can't detect if there's a pending sychronous read, tream also doesn't. 
            if (output == null) {
                if (standardOutput == null) {
                    throw new InvalidOperationException(SR.GetString(SR.CantGetStandardOut));
                } 

                Stream s = standardOutput.BaseStream; 
                output = new AsyncStreamReader(this, s, new UserCallBack(this.OutputReadNotifyUser), standardOutput.CurrentEncoding); 
            }
            output.BeginReadLine(); 
        }

Within the AsyncStreamReader.BeginReadLine() we see the familiar asynchronous read on Stream using Stream.BeginRead().

        // User calls BeginRead to start the asynchronous read
        internal void BeginReadLine() { 
            if( cancelOperation) {
                cancelOperation = false; 
            } 

            if( sb == null ) { 
                sb = new StringBuilder(DefaultBufferSize);
                stream.BeginRead(byteBuffer, 0 , byteBuffer.Length,  new AsyncCallback(ReadBuffer), null);
            }
            else { 
                FlushMessageQueue();
            } 
        }

Unfortunately, I had incorrectly assumed that this async read was executed on one of the I/O Completion Port threads of the ThreadPool. That is not the case: ThreadPool.GetAvailableThreads() always returned the same number for completionPortThreads (incidentally, the workerThreads value didn’t change much either, but I didn’t notice that at first).
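
The probe itself isn’t shown in the original code, but something along these lines, dropped inside the parallel body, is enough to watch the pool:

        // Purely diagnostic: report how many pool threads are currently free.
        int workerThreads, completionPortThreads;
        ThreadPool.GetAvailableThreads(out workerThreads, out completionPortThreads);
        Console.WriteLine("Available worker threads: {0}, IOCP threads: {1}",
                          workerThreads, completionPortThreads);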

A breakthrough came when I started changing the maximum parallelism (i.e. maximum thread count) of Parallel.ForEach.

        public static void ExecAll(List<KeyValuePair<string, string>> pathArgs, int timeout, int maxThreads)
        {
            Parallel.ForEach(pathArgs, new ParallelOptions { MaxDegreeOfParallelism = maxThreads },
                             arg => Task.Factory.StartNew(() => DoJob(arg.Value.Split(' '))).Wait() );
        }

I thought I should increase the maximum number of threads to resolve the issue. Indeed, for certain values of MaxDegreeOfParallelism I could never reproduce the problem (all processes finished swiftly, with no timeouts). For everything else, the problem was reproducible most of the time; nine times out of ten I’d get timeouts. To my surprise, however, the problem went away when I reduced MaxDegreeOfParallelism!

The magic number was 12. Yes, the number of cores at my dev machine’s disposal. If we limit the number of concurrent ForEach executions to fewer than 12, everything finishes swiftly; otherwise, we get timeouts and finishing ExecAll() takes a long time. In fact, with maxThreads=11, 500 process executions finish in under 8500ms, which is very commendable. However, with maxThreads=12, each batch of 12 processes waits until it times out, so finishing all 500 takes several minutes.

With this information, I tried increasing the ThreadPool thread limit using ThreadPool.SetMaxThreads(). But it turns out the defaults are already 1023 worker threads and 1000 I/O Completion Port threads, as reported by ThreadPool.GetMaxThreads(). I had assumed that if the available thread count was lower than required, the ThreadPool would simply create new threads until it reached the configured maximum.
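
For reference, the limits can be queried like this (the values in the comments are the defaults quoted above; the minimums, which are what actually matter here, typically default to the core count):

        // Query the pool limits. The maximums are the defaults quoted above;
        // the minimums (the number of threads the pool will create without delay)
        // typically default to the number of cores.
        int maxWorker, maxIocp, minWorker, minIocp;
        ThreadPool.GetMaxThreads(out maxWorker, out maxIocp);   // 1023 / 1000 by default
        ThreadPool.GetMinThreads(out minWorker, out minIocp);   // core count by default
        Console.WriteLine("Max: {0}/{1}  Min: {2}/{3}", maxWorker, maxIocp, minWorker, minIocp);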

ThreadPool-Deadlock

Diagram showing deadlock (created by www.gliffy.com)

Putting It All Together

The assumption that Parallel.ForEach can execute its body on the ThreadPool as if that body were a black box is clearly flawed. In our case the body initiates asynchronous I/O, which needs threads of its own. Apparently, these come not from the I/O completion port pool but from the worker thread pool. In addition, the number of threads in this pool initially equals the number of cores on the machine. Even worse, the pool resists creating new threads until absolutely necessary. Unfortunately, in our case that is too late, as our waits are already timing out. What I left until this point (both for dramatic effect and to let you, the reader, find the solution) is that the timeouts were happening on the StandardOutput and StandardError streams. That is, even though the child processes had exited long ago, we were still waiting to read their output.

Let me spell it out, if it’s not obvious: each call that spawns and waits for a child process executes on a ThreadPool worker thread and occupies it exclusively until the waits return. The async stream reads on StandardOutput and StandardError need to run on some thread. Since they are queued to run on ThreadPool worker threads, they starve when we use up all the available threads in the pool waiting for them to finish. The read waits then time out, because we have a deadlock.

This is a case of Leaky Abstraction: our “execute on the ThreadPool” black box failed miserably when the code being executed itself depended on the ThreadPool. Specifically, by using up all the available threads in the pool, we left none for the code that depends on the ThreadPool. We shot ourselves in the proverbial foot. Our abstraction failed.

In the next part we’ll attempt to solve the problem.

Apr 032013
 

In part 1 we discovered a deadlock in the synchronous approach to reading the output of Process and we solved it using asynchronous reads. Today we’ll parallelize the execution in an attempt to maximize efficiency and concurrency.

Parallel Execution

Now, let’s complicate our lives with some concurrency, shall we?

If we are to spawn many processes, we could (and should) utilize all the cores at our disposal. Thanks to Parallel.For and Parallel.ForEach this task is made much simpler than otherwise.

        public static void ExecAll(List<KeyValuePair<string, string>> pathArgs, int timeout)
        {
            Parallel.ForEach(pathArgs, arg => ExecWithAsyncTasks(arg.Key, arg.Value, timeout));
        }

Things couldn’t be any simpler! We pass a list of executable paths and their arguments as KeyValuePair and a timeout in milliseconds. Except, this won’t work… at least not always.

First, let’s discuss how it will not work, then let’s understand the why before we attempt to fix it.

When Abstraction Backfires

The above code works like a charm in many cases. When it doesn’t, a number of waits time out. This is unacceptable, as we wouldn’t know whether we got all the output or only part of it unless we get a clean exit with no timeouts. I first noticed this issue in a completely different way. I was looking at Process Explorer (if you’re not using it, start now and I promise not to tell anyone) to see how amazingly faster things are with that single ForEach line. I was expecting to see a dozen or so child processes (on a 12-core machine) spawning and vanishing in quick succession. Instead, and to my chagrin, most of the time I saw just one child! One!

After many trials, much head-scratching and reading, it became clear that the waits were timing out even though the children had clearly finished and exited. Indeed, because a typical process runs in far less time than the timeout, the parallelized code was now slower than the sequential version. This wasn’t obvious at first, and I reasonably suspected that some children were taking too long, or that they had too much to write to the output pipes, which could be deadlocking (a problem that wasn’t unfamiliar to me).

Testbed

To troubleshoot something as complex as this, one should start with a clean test case with a minimum number of variables. This calls for a dummy child that does exactly as it’s told, so that I could simulate different scenarios. One such scenario is not spawning any children at all and just testing the Parallel.ForEach with some in-proc task (i.e. a local function that does work similar to that of a child).

using System;
using System.Threading;

namespace Child
{
    class Program
    {
        static void Main(string[] args)
        {
            if (args.Length < 2 || args.Length % 2 != 0)
            {
                Console.WriteLine("Usage: [echo|fill|sleep|return] ");
                return;
            }

            DoJob(args);
        }

        private static void DoJob(string[] args)
        {
            for (int argIdx = 0; argIdx < args.Length; argIdx += 2)
            {
                switch (args[argIdx].ToLowerInvariant())
                {
                    case "echo":
                        Console.WriteLine(args[argIdx + 1]);
                        break;

                    case "fill":
                        var rd = new Random();
                        int bytes = int.Parse(args[argIdx + 1]);
                        while (bytes-- > 0)
                        {
                            // Write as many random characters as requested.
                            Console.Write((char)rd.Next('a', 'z'));
                        }
                        break;

                    case "sleep":
                        Thread.Sleep(int.Parse(args[argIdx + 1]));
                        break;

                    case "return":
                        Environment.ExitCode = int.Parse(args[argIdx + 1]);
                        break;

                    default:
                        Console.WriteLine("Unknown command [" + args[argIdx] + "]. Skipping.");
                        break;
                }
            }
        }
    }
}

Now we can give the child process commands to change its behavior, from dumping data to its output to sleeping to returning immediately.

Once the problem is reproduced, we can narrow it down to pinpoint the source. Running the exact same commands in the same process (i.e. without spawning another process) results in no problems at all. Calling DoJob 500 times directly in Parallel.ForEach finishes in under 500ms (often under 450ms), so we can be sure Parallel.ForEach is working fine.
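
The in-proc variant referred to above might look like this (ExecAllInProc is my name for it); the listing that follows wraps the same call in a Task.

        public static void ExecAllInProc(List<KeyValuePair<string, string>> pathArgs)
        {
            // Same work, same ForEach, but no child processes and no Task wrapper.
            Parallel.ForEach(pathArgs, arg => DoJob(arg.Value.Split(' ')));
        }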

        public static void ExecAll(List<KeyValuePair<string, string>> pathArgs, int timeout)
        {
            Parallel.ForEach(pathArgs, arg => Task.Factory.StartNew(() => DoJob(arg.Value.Split(' '))).Wait() );
        }

Even executing as a new task (within the Parallel.ForEach) doesn’t make any noticeable difference in time. The reason for this good performance when running the jobs in new tasks is probably that the ThreadPool scheduler inlines the task when we call Wait(). That is, both the Task.Factory.StartNew() call and the DoJob() call ultimately execute on the ThreadPool, and because Task is designed specifically to utilize it, calling Wait() on the task tells the scheduler to run the next job in the queue, which in this case is the very task we are waiting on. Since the caller of Wait() happens to be running on the ThreadPool, it simply executes the task itself instead of scheduling it on a different thread and blocking. Dumping Thread.CurrentThread.ManagedThreadId from before the Task.Factory.StartNew() call and from within DoJob shows that both indeed execute on the same thread. The overhead of creating and scheduling a Task is negligible, so we don’t see much of a change in time over 500 executions.
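
The thread-id dump is not reproduced in the original listing; roughly, it amounts to this:

        Parallel.ForEach(pathArgs, arg =>
        {
            // Printed from the ForEach body, i.e. the scheduling thread.
            Console.WriteLine("Scheduling on thread {0}", Thread.CurrentThread.ManagedThreadId);
            Task.Factory.StartNew(() =>
            {
                // Printed from inside the task; the same id as above, because
                // Wait() inlines the task on the calling ThreadPool thread.
                Console.WriteLine("DoJob on thread {0}", Thread.CurrentThread.ManagedThreadId);
                DoJob(arg.Value.Split(' '));
            }).Wait();
        });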

All this is great and comforting, but still doesn’t help us resolve the problem at hand: why aren’t our processes spawned and executed at the highest possible efficiency? And why are they timing out?

In the next part we’ll dive deep into the problem and find out what is going on.

Apr 022013
 

I’ve mentioned in a past post that it was conceived while reading the source code for the System.Diagnostics.Process class. This post is about the reason that pushed me to read the source code in an attempt to fix the issue. It turned out that this was yet another case of Leaky Abstraction, which is a special interest of mine.

As it turned out, this post ended up being way too long (even for me). I don’t like installments, but I felt it was worth trying, as the size was prohibitive for single-post consumption. As such, I’ve split it up into 5 parts, so that each part is around 1,000 words or less. I’ll post one part a day.

To give you an idea of the scope and subject of what’s to come, here is a quick overview. In part 1 I’ll lay out the problem: we want to spawn processes, read their output and kill them if they take too long. Our first attempt uses simple synchronous I/O to read the output, and we discover a deadlock, which we then solve using asynchronous I/O. In part 2 we parallelize the code and discover reduced performance and yet another deadlock; we create a testbed and set about investigating the problem in depth. In part 3 we find the root cause and discuss the mechanics of how and why we hit such a problem. In part 4 we discuss solutions and develop a generic one (with code). Finally, in part 5 we see whether or not a generic solution could work, before we summarize and conclude.

Let’s begin at the very beginning. Suppose you want to execute some program (call it child), get all its output (and error) and, if it doesn’t exit within some time limit, kill it. Notice that there is no interaction and no input. This is how tests are executed in Phalanger using a test runner.

Synchronous I/O

The Process class conveniently exposes the underlying pipes to the child process as the stream instances StandardOutput and StandardError. And, like many, we too might be tempted to simply call StandardOutput.ReadToEnd() and StandardError.ReadToEnd(). That would work… until it doesn’t. As Raymond Chen noted, it works as long as the data fits in the internal pipe buffer. The problem with this approach is that we are asking to read until we reach the end of the data, which can only happen with certainty when the child process we spawned exits. However, when the buffer of the pipe the child writes its output to is full, the child has to wait until there is free space in the buffer before it can write more. But, you say, what if we always read and empty the buffer?

Good idea, except that we need to do it for both StandardOutput and StandardError at the same time. In the StandardOutput.ReadToEnd() call we read every byte arriving in the buffer until the child process exits. While we have drained the StandardOutput buffer (so the child can’t possibly be blocked on that), if the child fills the StandardError buffer, which we aren’t reading yet, we will deadlock. The child won’t exit until it finishes writing to the StandardError buffer (which is full because no one is reading it); meanwhile, we are waiting for the process to exit so we can be sure we have read StandardOutput to the end before we return (and start reading StandardError). The same problem exists for StandardOutput if we read StandardError first, hence the need to drain both pipe buffers as they are fed, not one after the other.
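
For reference, the naive synchronous version being described looks something like this (a sketch, not the actual test-runner code):

        public static string ExecSync(string path, string args, int timeoutMs)
        {
            using (var process = new Process())
            {
                process.StartInfo = new ProcessStartInfo(path)
                {
                    Arguments = args,
                    UseShellExecute = false,
                    RedirectStandardOutput = true,
                    RedirectStandardError = true,
                    CreateNoWindow = true
                };
                process.Start();

                // Blocks until the child closes its stdout, i.e. until it exits.
                // If the child fills the (unread) stderr pipe in the meantime,
                // both sides wait on each other: deadlock.
                string output = process.StandardOutput.ReadToEnd();
                string error = process.StandardError.ReadToEnd();

                process.WaitForExit(timeoutMs);
                return output + error;
            }
        }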

Async Reading

The obvious (and only practical) solution is to read both pipes at the same time, using separate threads. To that end there are mainly two approaches: the pre-4.0 approach (async events), and the 4.5-and-up approach (tasks).

Async Reading with Events

The code is reasonably straightforward, as it uses .Net events. We have two manual-reset events and two delegates that get called asynchronously when a line is read from each pipe. We get null data when we hit the end of the stream (i.e. when the process exits) for each of the two pipes.

        public static string ExecWithAsyncEvents(string path, string args, int timeoutMs)
        {
            using (var outputWaitHandle = new ManualResetEvent(false))
            {
                using (var errorWaitHandle = new ManualResetEvent(false))
                {
                    using (var process = new Process())
                    {
                        process.StartInfo = new ProcessStartInfo(path);
                        process.StartInfo.Arguments = args;
                        process.StartInfo.UseShellExecute = false;
                        process.StartInfo.RedirectStandardOutput = true;
                        process.StartInfo.RedirectStandardError = true;
                        process.StartInfo.ErrorDialog = false;
                        process.StartInfo.CreateNoWindow = true;

                        var sb = new StringBuilder(1024);
                        process.OutputDataReceived += (sender, e) =>
                        {
                            sb.AppendLine(e.Data);
                            if (e.Data == null)
                            {
                                outputWaitHandle.Set();
                            }
                        };
                        process.ErrorDataReceived += (sender, e) =>
                        {
                            sb.AppendLine(e.Data);
                            if (e.Data == null)
                            {
                                errorWaitHandle.Set();
                            }
                        };

                        process.Start();
                        process.BeginOutputReadLine();
                        process.BeginErrorReadLine();

                        process.WaitForExit(timeoutMs);
                        outputWaitHandle.WaitOne(timeoutMs);
                        errorWaitHandle.WaitOne(timeoutMs);

                        process.CancelErrorRead();
                        process.CancelOutputRead();

                        return sb.ToString();
                    }
                }
            }
        }

We can certainly improve on the above code (for example, we should cap the total wait at timeoutMs), but you get the point from this sample. There is also no error handling, and no killing of the child process when it times out without exiting.
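
One way to cap the total wait, as a sketch, is to track elapsed time with a Stopwatch and shrink each successive wait accordingly:

        var sw = Stopwatch.StartNew();
        process.WaitForExit(timeoutMs);

        // Give the remaining budget, if any, to the I/O waits.
        int remainingMs = (int)Math.Max(timeoutMs - sw.ElapsedMilliseconds, 0);
        outputWaitHandle.WaitOne(remainingMs);

        remainingMs = (int)Math.Max(timeoutMs - sw.ElapsedMilliseconds, 0);
        errorWaitHandle.WaitOne(remainingMs);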

Async Reading with Tasks

A much simpler and cleaner approach is to use the System.Threading.Tasks namespace/framework to do all the heavy lifting for us. As you can see, the code is cut in half and is much more readable, but we need Framework 4.5 or newer for this to work (my target is 4.0, but for comparison purposes I gave it a spin). The results are the same.

        public static string ExecWithAsyncTasks(string path, string args, int timeout)
        {
            using (var process = new Process())
            {
                process.StartInfo = new ProcessStartInfo(path);
                process.StartInfo.Arguments = args;
                process.StartInfo.UseShellExecute = false;
                process.StartInfo.RedirectStandardOutput = true;
                process.StartInfo.RedirectStandardError = true;
                process.StartInfo.ErrorDialog = false;
                process.StartInfo.CreateNoWindow = true;

                var sb = new StringBuilder(1024);

                process.Start();
                var stdOutTask = process.StandardOutput.ReadToEndAsync();
                var stdErrTask = process.StandardError.ReadToEndAsync();

                process.WaitForExit(timeout);
                stdOutTask.Wait(timeout);
                stdErrTask.Wait(timeout);

                // Collect what the async reads produced.
                sb.Append(stdOutTask.Result);
                sb.Append(stdErrTask.Result);
                return sb.ToString();
            }
        }

Again, a healthy dose of error handling is in order, but it is left out for illustration purposes. A point worth mentioning is that we can’t assume we have read the streams in full by the time the child exits. There is a race condition, and we still need to wait for the I/O operations to finish before we can read the results.

In the next part we’ll parallelize the execution in an attempt to maximize efficiency and concurrency.

Mar 182013
 

I don’t need to see the source code of an API to code against it. In fact, I actively discourage depending (even psychologically) on the inner details of an implementation. The contract should be sufficient. Of course, I’m assuming a well-designed API with good (or at least decent) documentation. But often reality is more complicated than the ideal.

Empty try{}

While working with System.Diagnostics.Process in the context of Parallel.ForEach things became a bit too complicated. (I’ll leave the gory details to another post.) What prompted this post was a weird pattern that I noticed while browsing Process.cs, the source code for the Process class (to untangle said complicated scenario).

RuntimeHelpers.PrepareConstrainedRegions();
try {} finally {
   retVal = NativeMethods.CreateProcessWithLogonW(
		   startInfo.UserName,
		   startInfo.Domain,
		   password,
		   logonFlags,
		   null,            // we don't need this since all the info is in commandLine
		   commandLine,
		   creationFlags,
		   environmentPtr,
		   workingDirectory,
		   startupInfo,        // pointer to STARTUPINFO
		   processInfo         // pointer to PROCESS_INFORMATION
	   );
   if (!retVal)
	  errorCode = Marshal.GetLastWin32Error();
   if ( processInfo.hProcess!= (IntPtr)0 && processInfo.hProcess!= (IntPtr)NativeMethods.INVALID_HANDLE_VALUE)
	  procSH.InitialSetHandle(processInfo.hProcess);
   if ( processInfo.hThread != (IntPtr)0 && processInfo.hThread != (IntPtr)NativeMethods.INVALID_HANDLE_VALUE)
	  threadSH.InitialSetHandle(processInfo.hThread);
}

This is from StartWithCreateProcess(), a private method of Process. So no surprises that it’s doing a bunch of Win32 native API calls. What stands out is the try{} construct. But also notice the RuntimeHelpers.PrepareConstrainedRegions() call.

Thinking of possible reasons for this, I suspected it had to do with run-time guarantees. The RuntimeHelpers.PrepareConstrainedRegions() call is part of Constrained Execution Regions (CER). So why the need for an empty try if we have the PrepareConstrainedRegions call? Regrettably, I conflated the two. In reality, the empty try construct has nothing to do with CER and everything to do with execution being interrupted by a ThreadAbortException.

A quick search hit Siddharth Uppal’s The empty try block mystery where he explains that Thread.Abort() never interrupts code in the finally clause. Well, not quite.

Thread.Abort and finally

A common interview question, after going through the semantics of try/catch/finally, is to ask the candidate whether finally is always executed (since that’s usually the wording used). Are there no scenarios where one could conceivably expect the code in finally never to execute? A creative response is an “infinite loop” in the try or in the catch (if it gets called). Novice candidates are easily confused when they consider premature termination of the process (or thread). After all, one would expect some consistency in the behavior of code; so why shouldn’t finally always get executed, even in a process-termination scenario?

It’s not difficult to see that there is a power struggle between the terminating party and the code/process in question. If finally always executed, there would be no guarantee of termination. We cannot guarantee both that finally will always execute and that termination will always succeed; one or both must have weaker guarantees than the other. When push comes to shove, we (as users or administrators) want full control over our machines, so we choose to keep the ultimate magic wand to kill and terminate any misbehaving (or just undesirable) process.

The story is a little different when it comes to aborting individual threads, however. Whereas at the process level the operating system can terminate with a sweeping gesture, in the managed world things are more controlled. The CLR sees to it that any thread about to be terminated (by a call to Thread.Abort()) is shut down cleanly, respecting all the language and runtime rules. This includes executing finally blocks as well as finalizers.

ThreadAbortException weirdness

When aborting a thread, the apparent behavior is that of an exception thrown from within the thread in question. When another thread invokes Thread.Abort() on our thread, a ThreadAbortException is raised from within our thread’s code. This is called an asynchronous thread abort, as opposed to a synchronous abort, where a thread invokes Thread.CurrentThread.Abort() (necessarily on itself). Other asynchronous exceptions include OutOfMemoryException and StackOverflowException.

The behavior, then, is exactly as one would expect when an exception is raised. The exception bubbles up the stack, executing catch and finally blocks as one would expect from any other exception. There are, however, a couple of crucial differences between ThreadAbortException and other Exceptions (with the exception of StackOverflowException, which can’t be caught at all). First, this exception can’t be suppressed by simply catching it – it is automatically rethrown right after exiting the catch clause that caught it. Second, throwing it does not abort the running thread (it must be done via a call to Thread.Abort()).

The source of this behavior of ThreadAbortException is the abort-requested flag, which is set when Thread.Abort() is invoked (but not when the exception is thrown directly). The CLR checks this flag at certain checkpoints and raises the exception there; an asynchronous exception could in principle be raised between any two machine instructions, but the checkpoints guarantee that it will not be thrown while a finally block or unmanaged code is executing.

So the expectation of the novice interviewee (and Mr. Uppal) was right after all. Except it wasn’t. We are back, full circle, to the conflict between the purpose of aborting a thread and the possibility of ill-behaved code never giving up at all. And I am being too generous when I label code that wouldn’t yield to an abort request as “ill behaved.” Because ThreadAbortException is automatically rethrown from catch clauses, the only way to suppress it is to explicitly call Thread.ResetAbort(), which clears the abort-requested flag. This is intentional, as developers are in the habit of writing catch-all clauses very frequently.
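
A minimal illustration of the rethrow behavior (this assumes the .NET Framework; Thread.Abort() is no longer supported on .NET Core and later):

var worker = new Thread(() =>
{
    try
    {
        while (true) { Thread.Sleep(10); }
    }
    catch (ThreadAbortException)
    {
        Console.WriteLine("Caught the abort.");
        // Without an explicit Thread.ResetAbort() here, the exception is
        // automatically rethrown the moment this catch block exits.
    }
    finally
    {
        Console.WriteLine("The finally block still runs.");
    }
});

worker.Start();
Thread.Sleep(100);
worker.Abort();
worker.Join();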

AppDomain hosting

So far we’ve assumed that we just might need to terminate a process, no questions asked. But why would one need to abort individual threads within a process? The answer lies with hosting. In environments such as IIS or SQL Server, the host should be both fast and reliable. This led to compartmentalizing processes beyond threads: an AppDomain groups processing units within a single process such that spawning new instances is fast (faster than spawning a whole new process), while still allowing them to be unloaded on demand. When an AppDomain instance takes longer than the configured time (or consumes more of some resource than it should), it’s deemed ill-behaved and the host will want to unload it. Unloading includes aborting every thread within the AppDomain in question.

The problem is yet again a conflict between guarantees. This time, though, the terminated code needs to play along, or else. When a process is terminated, its entire memory is released along with all its system resources; if the managed code or the CLR doesn’t do that, the operating system will. In a hosted execution environment, the host wants full control over the lifetime of an AppDomain (with all its threads), yet when it decides to purge it, it does not want to destabilize the process or, worse, itself or the system at large. When unloading an AppDomain, the host wants to give it a chance to clean up and release any shared resources, including files, sockets and synchronization objects (i.e. locks), to name but a few. This is because the process will continue running, hopefully for a very long time. Hence the behavior of ThreadAbortException, which executes every catch and finally as it should.

In return, any code that wants to play rough gets to call Thread.ResetAbort() and go on with its life, thereby defeating the control the host enjoyed. The host invariably has the upper hand, of course. Once a second time limit is exceeded after invoking Thread.Abort(), the host may, in the words of Tarantino, “go medieval” on the misbehaving AppDomain.

Rude Abort/Unload/Exit

When a thread that has been asked to abort doesn’t play along, it warrants rudeness. The CLR allows a host to specify an escalation policy for such events, so that a normal thread abort escalates into a rude thread abort. Similarly, a normal AppDomain unload or process exit may be escalated to a rude one.

But we all know the host doesn’t want to be too inconsiderate. It wouldn’t want to jeopardize its own stability in this arms race with the hosted AppDomain. For that, it wants some guarantees; more stringent guarantees from the code in question that it will not misbehave again when given half a chance. One such guarantee is that the finalization code will not have ill side-effects.

In a rude thread abort, the CLR forgoes calling any finally blocks (except those marked as Constrained Execution Regions, or CER for short) as well as any normal finalizers. But unlike mere finally blocks, finalizers are a much more serious bunch; they are functions with consequences. Recall that finalization serves the purpose of releasing system resources. In a completely managed environment, with garbage collection and cleanup, the only resources that need special care are those that aren’t managed. In an ideal scenario, one wouldn’t need to implement finalizers at all. The case is different when we need to wrap an unmanaged system resource (including when invoking native DLLs). All system resources that the framework wraps are released precisely via a finalizer.

Admittedly, if we are developing standalone applications, as opposed to hosted ones, we don’t have to worry about escalation and rude aborts or unloads. But then again, why should we worry about Thread.Abort() at all in such a case? Either our own code could issue such a harsh request, which we should avoid like the plague in favor of more civil cancellation methods such as raising events or setting shared flags (with some special care); or our code is in a library that may be called from either a standalone application or a hosted one. Only in the latter case must we worry about, and prepare for, the possibility of a rude abort/unload.

Critical Finalization and CER

Dispose() is called from finally blocks, either manually or via the using clause. So the only correct way to dispose of objects during such an upheaval is to give them finalizers; and not just any finalizers, but critical finalizers. This is the same technique used in SafeHandle to ensure that native handles are correctly released in the event of a catastrophic failure.
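
The shape of that technique, as a sketch (SafeNativeHandle and NativeMethods are illustrative names; CloseHandle is the standard kernel32 call):

using System;
using System.Runtime.ConstrainedExecution;
using System.Runtime.InteropServices;
using Microsoft.Win32.SafeHandles;

// An illustrative SafeHandle wrapper around a raw Win32 handle.
public sealed class SafeNativeHandle : SafeHandleZeroOrMinusOneIsInvalid
{
    public SafeNativeHandle() : base(true) { }

    // Runs as a critical finalizer: eagerly prepared, and called even
    // during a rude AppDomain unload.
    protected override bool ReleaseHandle()
    {
        return NativeMethods.CloseHandle(handle);
    }
}

internal static class NativeMethods
{
    [DllImport("kernel32.dll", SetLastError = true)]
    [ReliabilityContract(Consistency.WillNotCorruptState, Cer.Success)]
    public static extern bool CloseHandle(IntPtr handle);
}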

When things get serious, only finalizers marked as safe are called. Unsurprisingly, attributes are used to mark methods as safe. The contract between the CLR and the code is a strict one. First, we declare how critical a function is in the face of asynchronous exceptions by marking its reliability. Next, we give up the right to allocate memory, which isn’t trivial, since allocation happens transparently in cases such as P/Invoke marshaling, locking and boxing. In addition, the only methods we can call from within a CER block are those with strong reliability guarantees, and virtual methods that aren’t prepared in advance cannot be called either.

This brings us full circle to RuntimeHelpers.PrepareConstrainedRegions(). This call tells the CLR to fully prepare the code that follows: allocating all the necessary memory, ensuring there is sufficient stack space, and JITing the code, which in turn loads any assemblies we may need.

Here is sample code that demonstrates how this works in practice. With the ReliabilityContract attribute commented out, the try block executes before the finally block, which then fails. With the ReliabilityContract attribute in place, however, the PrepareConstrainedRegions() call fails to allocate all the necessary memory beforehand and therefore doesn’t even attempt to execute the try clause or the finally; instead, the exception is thrown immediately.
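
The original sample listing isn’t reproduced here; a minimal sketch of the pattern it describes (the class name and console messages are mine) looks like this:

using System;
using System.Runtime.CompilerServices;
using System.Runtime.ConstrainedExecution;

class CerDemo
{
    static void Main()
    {
        // Eagerly prepare the catch/finally blocks of the try that follows:
        // JIT them, load their assemblies and commit the resources they need.
        RuntimeHelpers.PrepareConstrainedRegions();
        try
        {
            Console.WriteLine("In the try block.");
        }
        finally
        {
            Cleanup();
        }
    }

    // The reliability contract is what allows Cleanup to be prepared up front
    // and trusted to run inside the constrained region.
    [ReliabilityContract(Consistency.WillNotCorruptState, Cer.Success)]
    static void Cleanup()
    {
        Console.WriteLine("In the finally block.");
    }
}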

There are three forms to execute code in Constrained Execution Regions (from the BCL Team Blog):

  • ExecuteCodeWithGuaranteedCleanup, a stack-overflow safe form of a try/finally.
  • A try/finally block preceded immediately by a call to RuntimeHelpers.PrepareConstrainedRegions. The try block is not constrained, but all catch, finally, and fault blocks for that try are.
  • As a critical finalizer – any subclass of CriticalFinalizerObject has a finalizer that is eagerly prepared before an instance of the object is allocated.
    • A special case is SafeHandle’s ReleaseHandle method, a virtual method that is eagerly prepared before the subclass is allocated, and called from SafeHandle’s critical finalizer.

Conclusion

Normally, the CLR guarantees clean execution and cleanup (via finally and finalization code) even in the face of asynchronous exceptions and thread aborts. However, it reserves the right to take harsher measures if the host escalates things. Writing code in finally blocks to avoid dealing with the possibility of asynchronous exceptions, while not best practice, will work. When we abuse this, by resetting abort requests and spending too long in finally blocks, the host will escalate to a rude unload and aggressively rip out the AppDomain with all its threads, bypassing finally blocks, unless they are in a CER, as in the code above.

So, finally blocks are not executed when a rude abort/unload is in progress (unless in CER), when the process is terminated (by the operating system), when an unhandled exception is raised (typically in unmanaged code or in the CLR) or in background threads (IsBackground == true) when all foreground threads have exited.
