22 February 2015

The System Programmer's Creed

10    The Kernel is my supervisor, I shall not enter Ring Zero.

20    It maketh me to write well-formed code; it leadeth me to respect the hardware.

30    It restoreth my stability: it leadeth me down the paths of protected memory for the sake of its drivers.

40    Yea, though I walk through the valley of the shadow of panics, I shall fear no segfault: for the Kernel is with me; its spinlocks and its syscalls comfort me.

50    It prepareth an API before me in the presence of mine kludges: it escalates my privileges; my nice value is negative twenty.

60    Surely uptime and free memory shall follow me all the days of my career; and I will code in harmony with the Kernel forever.

21 February 2015

The Death of Hardware RAID... Good Riddance!

There was a point in the not-too-distant past when hardware-backed RAID storage was an indispensable part of the toolkit of any server administrator.  The basic idea of RAID (Redundant Array of Inexpensive Disks, for the uninitiated) is to mitigate the risk of data loss from hard disk failures.  You have a set of disks, i.e. an array, and if one (or sometimes more than one, depending on the configuration) fails, your data is still available... often in a manner transparent to the operating system and/or end user.

Like everything else in computing, RAID moved in generations, with different configurations in different generations having different capabilities:

  • RAID 1: The earliest arrangements that could be considered RAID were shipped in 1983, though the actual term RAID wouldn't be coined for several years.  These earliest configurations consisted of mirrored pairs of drives, i.e. two drives with the same contents that appeared as a single drive to the system.  For two drives of the same size, the total amount of available storage is that of a single drive.  RAID 1, also known as mirroring, is still in common use today for some scenarios.
  • RAID 2: This early implementation was a highly performant mode that striped individual bits across drives, increasing the available capacity while still providing error correction via Hamming codes stored on one or more dedicated drives.
  • RAID 3: A rare implementation using a dedicated parity drive, in which bytes (as opposed to the individual bits of RAID 2) are striped across drives; this level was never widely deployed.
  • RAID 4: Very similar to both RAID 2 and RAID 3, except that entire blocks, instead of bits or bytes, are the unit checksummed and striped across drives.
  • RAID 5: The de facto RAID mode for server admins for many years, this level is novel in striping both data and parity across all drives, which improves performance and ensures that all drives wear evenly, unlike levels 2 through 4.  A minimum of three drives is required for RAID 5, with the capacity of approximately two drives available for data (a short sketch after this list shows how the parity math works).  RAID 6 is almost identical to RAID 5, save that parity is doubly redundant, allowing up to two drives to fail without compromising data.
There have also been various stacked / combined levels, one of the most notable of which is referred to as RAID 10, a stripe of mirror pairs.
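To make the parity trick concrete, here's a minimal Python sketch of RAID-5-style recovery.  It's illustrative only: real arrays work on fixed-size stripes and rotate the parity block across drives, but the core arithmetic really is just XOR.

    # Parity is the XOR of the data blocks, so any single lost block
    # can be rebuilt from the surviving blocks plus parity.
    def xor_blocks(a: bytes, b: bytes) -> bytes:
        return bytes(x ^ y for x, y in zip(a, b))

    data = [b"AAAA", b"BBBB", b"CCCC"]    # one block per data drive
    parity = b"\x00" * 4
    for block in data:
        parity = xor_blocks(parity, block)

    # The drive holding data[1] dies; rebuild its block:
    rebuilt = xor_blocks(xor_blocks(data[0], data[2]), parity)
    assert rebuilt == data[1]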

All of these methods also have some serious shortcomings:
  • While all of these systems address the problem of data availability, only the most advanced (and therefore most expensive) implementations give a tinker's d*** about data integrity.  A recent, well-written article by Jim Salter, including some handy experiments, documents the problems of RAID and bit rot.  The short version: RAID protects against complete disk failure, not against subtle disk corruption (a small sketch after this list illustrates the difference).
  • Hardware RAID is (or was, until very recently) both expensive to implement, requiring dedicated controllers, and tedious to maintain, requiring experienced personnel and complex drivers for every supported platform.  It was formerly a hobby of hardware and software manufacturers to gang up on IT professionals by only allowing certain hardware to work with certain software in certain supported deployments.  Of course, those days are long gone, right? *wink, wink, nudge, nudge*
  • Software RAID, while virtually free to implement and much easier than hardware to maintain, suffers from significant performance problems.  It's also very rarely portable across operating systems.
After all of these years, the relationship between IT folks and RAID has turned sour like a shotgun marriage...
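To see the bit-rot problem in miniature, here's a quick Python sketch; crc32 stands in for the stronger hashes that real storage systems use.  A plain mirror holds two copies of every block, but when they silently diverge it has no way to tell which one is right- a stored checksum does:

    import zlib

    def write_mirrored(data: bytes):
        # Two identical copies, plus a checksum of the original data.
        return [bytearray(data), bytearray(data)], zlib.crc32(data)

    copies, checksum = write_mirrored(b"important payload")
    copies[0][0] ^= 0x01      # a single silent bit-flip on one side

    # The mirror alone just sees two differing copies: which is right?
    assert copies[0] != copies[1]

    # With the checksum, the good copy is identifiable (and repairable):
    good = next(c for c in copies if zlib.crc32(bytes(c)) == checksum)
    assert bytes(good) == b"important payload"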

So what's the alternative?  Enter next-generation file systems.  That's right- file systems.  Traditionally, file systems just organized files, and it was up to volume management solutions like RAID to keep the file systems consistent and happy.  As the problems with RAID were exposed, and personal computing devices (where RAID is rarely an option) became more widely used, file systems themselves became more resilient to compensate.

The next generation of file systems is a bit different.  The solutions combine both volume management, typically including RAID-like capability, and file systems into a single all-encompassing data storage mechanism.  This allows them to do all sorts of amazing things around data storage, with tricks to both save space and protect data integrity, all while inducing a negligible performance penalty when properly provisioned.  Examples include:
  • The cross-platform ZFS, originally developed by Sun Microsystems and sent out in an open-source lifeboat before the company sank, was the pioneer of next-generation filesystems.  It supports mirroring, striping, and RAID-Z (like RAID 5), and also supports nesting of those different types.  It also supports block checksums, in-line block-level deduplication, live snapshots and rollback (see the copy-on-write sketch after this list), filesystem clones, transparent compression, transparent encryption in some implementations, differential backup and restore, etc. ad nauseam...  Short version: it's awesome, there's a Free & Open Source version available via the OpenZFS project, and it's available on most *nix platforms.
  • Linux's Btrfs has a similar feature set, with integrated volume management and RAID-like capabilities, plus snapshots, plus transparent compression (notably with file-level granularity).  Deduplication is done outside of the write path, to conserve memory.  Btrfs also has a special mode of operation for solid-state drives which reduces unnecessary writes, in addition to the TRIM command also supported by some ZFS implementations.  While not as mature as ZFS, Btrfs is quite stable and usable in most environments, and seems to be preferred over ZFS for Linux usage.
  • A lesser-known option, HAMMER (no connection to the nefarious baddies in the Marvel universe), is available only on DragonFly BSD at the present time.  It has a somewhat limited feature set compared to the previous two options, but still does deduplication, checksums, etc.  RAID-like functionality is not yet implemented directly, but rather through a streaming backup mechanism.
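As a taste of how the snapshot and clone tricks work, here's a toy Python sketch of copy-on-write bookkeeping.  It's purely illustrative- no real filesystem lays things out this way- but it shows why snapshots are nearly free: blocks are never overwritten, so a snapshot is just a saved copy of the name-to-block map.

    class CowStore:
        def __init__(self):
            self.blocks = {}   # block id -> contents, never overwritten
            self.live = {}     # file name -> current block id
            self.next_id = 0

        def write(self, name, data):
            self.blocks[self.next_id] = data   # always a fresh block
            self.live[name] = self.next_id
            self.next_id += 1

        def snapshot(self):
            return dict(self.live)             # copy the map, not the data

        def read(self, name, view=None):
            return self.blocks[(view or self.live)[name]]

    store = CowStore()
    store.write("config", b"v1")
    snap = store.snapshot()                     # near-instant, near-free
    store.write("config", b"v2")                # the old block is untouched
    assert store.read("config") == b"v2"
    assert store.read("config", snap) == b"v1"  # the snapshot still sees v1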
These next-generation filesystem-and-volume-manager combos all attempt to address the shortcomings of the traditional filesystem / volume manager model, and do so rather effectively.  But hang on a second, you ask: we've heard about Solaris, the BSDs, and Linux.  What about the two major consumer operating systems: Mac OS X and Windows?

While Apple's default HFS+ file system is practically a geriatric patient in technological terms (having not changed much structurally since the mid-90s), there is a stable, well-maintained, and free build of OpenZFS available for Mac OS X.  Recently, some work has even been done to make it bootable.  It's not officially blessed (pun intended- please tell me that somebody gets it) by Apple, but it does the trick for those who want and/or need it.

Windows users are almost out of luck!  The most capable file system offered by Microsoft at this time is ReFS which, when combined with Windows Storage Spaces, isn't terrible... but it's nothing amazing either.  Even an article by a Microsoft Kool-Aid drinker fanboy advocate admits (starting at paragraph 16, for those looking for it) that its feature set is unimpressive compared to those other solutions.  The only selling point that he can offer over ZFS is reduced memory footprint... which any server admin knows is practically a farce when talking about Windows- quite possibly the most notorious modern OS when it comes to memory thrashing.  It's also still a discrete filesystem / volume manager solution, with all of the inherent limitations of that design.  Nevertheless, it still has significant advantages over hardware RAID, like the ability to detect bit-flips with opt-in (seriously?  you have to turn it on manually, from a command line?) block-level checksums.

While there are those out there still defending hardware RAID, they must do so with the caveat that it should no longer be considered the panacea it was regarded as for many years.  If you're a server admin in 2015, and you're still using hardware RAID as your go-to redundancy solution without any hesitation, you should probably take a good, long look in the mirror and reevaluate your life.  Start by asking which of the solutions above could bring amazing data integrity goodness to your microcosm.

09 February 2015

Selecting the Right Virtualization

In the not-too-distant past, talking about a product "ecosystem" around virtualization would have been like talking about snow in the Sahara- it just didn't exist.  In 2015, though, things have changed.  There are so many different types of virtualization that making the right selection for a particular usage need can be a good way to wind up in the mental ward of your local hospital.  Broadly speaking, here are the different types "summed up," though there is much more that could be said:

  • Imitate hardware - a complete "computer within a computer", high overhead
    • Emulation - instruction-for-instruction imitation of CPU, devices, memory, etc.
      • Pros: very accurate, reliable, and secure
      • Cons: very slow
    • Virtualization - take some shortcuts where possible, imitate the rest
      • Pros: reasonably fast
      • Cons: can sometimes be breached, not all hardware can be used
  • Imitate an operating system - just enough isolation to fool programs, low overhead
    • Full environment - "container"
      • Pros: highly flexible and "multi-purpose"
      • Cons: difficult to set up and maintain
    • Single application - "sandbox"
      • Pros: simple maintenance and deployment
      • Cons: keeping many of them organized is a chore in its own right
There are also several technologies that combine both categories, such as QEMU's "user mode" wrappers that provide architecture instruction set translation like emulators, but don't emulate any hardware and so behave like sandboxes.
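Caricaturing the pros and cons above as code, the broad shape of the choice looks something like the following Python sketch.  It's only a rough stand-in- the flowchart below weighs far more factors- and the criteria names are mine, not anybody's official taxonomy:

    def pick_isolation(accuracy_is_paramount, need_full_machine, single_app):
        if accuracy_is_paramount:
            return "emulation"       # very accurate and secure, but very slow
        if need_full_machine:
            return "virtualization"  # reasonably fast; occasional breach risk
        if single_app:
            return "sandbox"         # simple to deploy; herds get messy
        return "container"           # flexible and multi-purpose, harder to set up

    print(pick_isolation(False, False, True))   # -> sandbox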

I've tried just about every solution under the sun at this point (pun not intended; if you get the reference then you're my new best friend), and the following flowchart is based on my own experiences of what works and what doesn't in any given situation.  Note that I do not attempt to account for very special-purpose cases like cloud computing infrastructure- these are orders of magnitude greater in complexity, and would require a chart far more complex than what can fit here:

[Flowchart: choosing among emulation, virtualization, containers, and sandboxes]

Feel free to use / distribute / wipe your a** with the chart as you choose.  It's also worth noting that two of my personal favorites, FreeBSD jails and Solaris containers, are not listed here.  They require some higher-level specialist and/or institutional knowledge that your typical organization will likely not have on staff.  They're also more powerful than any of the other container / sandbox solutions mentioned above, with the *possible* exception of Parallels Virtuozzo.  Sorry, lxc and Docker- you're just not there quite yet, in my own humble opinion.

08 February 2015

Comparison of OpenSolaris Derivatives

After Oracle effectively neutered the OpenSolaris project, many spin-offs of the public source code popped up, like oh-so-many mushrooms in a fallow field.  Many short-lived derivatives have come and gone, and there have been some surprising changes in the ecosystem.  I decided that it was time for a re-evaluation of the available variants.

It should be noted that the goal here is to compare these various OpenSolaris spin-offs against each other, not against other FOSS operating systems.  As such, I focus primarily on the quantifiable traits that differentiate them.  Things that they all have in common are omitted.  Traits like the software packaging system in use are not quantitative, only qualitative, so that's not of interest here (though it is a serious consideration for usability).  Similarly, capabilities relative to other operating system families like GNU/Linux, the BSDs, or Windows are outside the scope of this comparison.

In considering each variant, I looked at the following attributes:

  • Download ease - could a download be quickly and readily obtained from the variant's main page?
  • Version maintenance - has an official release been made within the last 365 days?
  • Documentation availability - could installation and setup documentation be found easily from the download page?
  • Documentation maintenance - does the available documentation cover up to the most recent official release?
  • Boot capability - does the variant support EFI boot on x86-64 "out of the box"?
  • Disk label recognition - can the variant read GPT disk labels, at least in a non-boot capacity?
  • VirtIO support as a KVM guest - VirtIO block devices, network devices, memory ballooning, CPU hotplugging, and serial devices are all considered
  • Ability to act as a KVM host - pretty self-explanatory, all via the Joyent illumos-kvm project

Each attribute is worth a total of one point- all are boolean, one or zero, with the exception of VirtIO support, where each of the five components counts for 1/5 of a point.  The total ranking is then expressed as a percentage and a letter grade (a short sketch after the table makes the arithmetic concrete).  Here is the summary (full results available in this PDF):

Name              Total   Grade        Version Tested
SmartOS           85%     B            3783
OpenSXCE          85%     B            2014.05
OmniOS            85%     B            151012
Oracle Solaris    65%     D            11.2
DilOS             60%     D            1.3.7
OpenIndiana       58%     F            151a
Dyson             58%     F            1327
illumian          58%     F            1
Belenix           0%      Incomplete   N/A

Non-Solaris Comparisons:
Fedora            100%    A            21
FreeBSD           88%     B            10.1

I've included the same criteria applied to Fedora 21 and FreeBSD 10.1 for reference only- their capabilities aren't intended to be a part of this analysis.
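For the curious, the scoring arithmetic boils down to the following Python sketch.  The attribute values in the example are purely hypothetical, and the letter-grade cutoffs are assumed to be the standard 90/80/70/60 scale, which matches the grades in the table:

    def letter_grade(pct):
        for cutoff, grade in ((90, "A"), (80, "B"), (70, "C"), (60, "D")):
            if pct >= cutoff:
                return grade
        return "F"

    def score(boolean_attrs, virtio_fifths):
        # boolean_attrs: the seven one-or-zero attributes.
        # virtio_fifths: how many of the five VirtIO components work (0-5).
        points = sum(boolean_attrs) + virtio_fifths / 5.0
        pct = round(100.0 * points / 8.0)
        return pct, letter_grade(pct)

    # A hypothetical variant passing six of seven checks, with 4/5 VirtIO:
    print(score([1, 1, 1, 1, 1, 1, 0], 4))      # -> (85, 'B')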

As we can see, the clear leaders under this evaluation are OpenSXCE, Joyent's SmartOS, and OmniTI's OmniOS, all with a "B" grade at 85%.  Next with a "D" grade are stock Oracle Solaris and DilOS.  OpenIndiana, OS Dyson, and Illumian are close behind those two, though still technically with failing grades.  Belenix could not be evaluated due to the main website being unavailable- it's apparently in the process of moving to a GitHub-hosted website, and the move is not yet complete.

While there are a few stars shining out in this bunch, as somebody who remembers the "golden days" of the OpenSolaris project, I can only reach one conclusion: the FOSS Solaris movement has been fractured.  The best contenders receive a "B" on the evaluation, and two of those three have significant backing from large IT companies.  It seems that no single OpenSolaris-derived project has the community backing required to bring it to premier status, and I'm concerned that without such support, the only FOSS System V derivative may be quickly headed towards fork-and-die oblivion...

03 February 2015

Write once, run anywhere: or, how to ruin a great idea

This is another one of those ideas in technology that sounds absolutely wonderful until you read the fine print.  The concept of "write once, run anywhere" was put forth by Sun (albeit with slightly different wording) in the late paleolithic, i.e. January of 1996, to coincide with the first public release of Java (JDK 1.0).

In theory, an application could be coded once, compiled into a platform-neutral executable form, then distributed as a bundle.  The "pre-compiling" of the application would also allegedly give significant performance improvements over an interpreted solution.  Here's where Java went horribly, miserably, and disastrously wrong, though- the language and compiled byte-code were platform-neutral, but the virtual machines in which they would eventually run were decidedly not consistent from one vendor or platform to the next.

There are dozens of different Java virtual machines out there, from the stock Sun (now Oracle) "J2SE," enterprise "J2EE," and embedded "J2ME" editions, to IBM's in-house Java VM used by applications like the Lotus suite, to OpenJDK, which is part open-sourced Sun / Oracle code and part community replacements for the formerly encumbered bits.  Then, of course, there's the non-Java VM for which one writes in Java: Dalvik (the main runtime environment on Android devices until 5.0 'Lollipop').  Predictably, each of these supports a slightly (or drastically) different feature set in byte-compiled programs, to say nothing of differences between versions of the same VM.

At this point, you're probably asking yourself, "hey, why does Guy hate Java?"  The answer is straightforward- I don't.  The language itself has quite a bit to recommend it.  The language solution, however, is practically a perfect parable of how not to deploy a suite of application development tools.

So what's the alternative?  For client / server or distributed applications, develop a good cross-platform standardized, interpreted API for the client-side part, and leave the compiled code server-side.  For purely client-side applications, write them in an interpreted language (try Python).

JavaScript and HTML 5 are far from perfect, but they're at least heading in the right direction.  Implementation incompatibilities between browsers are shrinking weekly, and the HTML 5 ecosystem is moving towards a reasonably harmonious and tolerable state.  Yeah, you still have to account for IE vs. Firefox vs. Safari vs. Chrome in your code, but the differences are becoming fewer in number.  That's progress!  Our favorite WORA suite, in contrast, sinks further into oblivion with each new release.  It's in such bad shape, in fact, that recent versions include a mechanism to automagically launch obsolete VM versions for specific sites (think IE compatibility mode for Java).  It uses deployment rulesets to establish URL <-> VM pairs for the Java browser plugin to reference.

In summary: if you're primarily a Java developer, don't go anywhere.  We do need and will continue to need you for years to come as we sort out this mess.  If you're in that category, however, and you're less than 50 years old, make sure to learn another language as well.  I hear that Fortran has a good market...