Thursday, March 15, 2007

Enduring bits - Unix

Over the past few decades, certain software creations have withstood the test of time. They have not only retained their original charm but also inspired many derivatives (and imitations :-)) built around their basic architectural elements. Unix is one of them. It gave us the concept of a unified namespace, files as simple sequences of bytes, standard input/output, pipes, a rich job control language in the form of the shell, and a simple syscall interface wrapped in a standard C library.

The software came with a complete toolchain that any programmer could use to look into the innards, extend it or modify it in various ways. Back in the early eighties, when I first got my hands on the Unix distribution, the kernel was only about 60,000 lines of code and quite remarkable for its elegant implementation of the architectural elements. I am still amazed that just a couple of hours of reading Kernighan and Pike's book, The Unix Programming Environment, is sufficient for someone to put together complex pipelines for searching, editing and sorting text.
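For anyone who has not seen such a pipeline, here is a rough sketch of the idea, written in Python only so that it is self-contained. It wires three standard tools together through pipes, roughly what the shell does for `grep -i unix | tr a-z A-Z | sort`. The choice of tools, the sample text and the function name `run_pipeline` are my own assumptions for illustration, not something taken from the book, and it assumes a POSIX system with grep, tr and sort on the PATH.

```python
import os
import subprocess


def run_pipeline(text):
    """Search, edit and sort text by chaining grep, tr and sort with pipes."""
    # Fix the locale so that sort orders lines the same way everywhere.
    env = dict(os.environ, LC_ALL="C")

    # Search: keep only the lines that mention "unix", ignoring case.
    grep = subprocess.Popen(["grep", "-i", "unix"], stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE, env=env)
    # Edit: turn lowercase letters into uppercase.
    tr = subprocess.Popen(["tr", "a-z", "A-Z"], stdin=grep.stdout,
                          stdout=subprocess.PIPE, env=env)
    # Sort: order the surviving lines.
    sort = subprocess.Popen(["sort"], stdin=tr.stdout,
                            stdout=subprocess.PIPE, env=env)

    # The children hold their own copies of the pipe ends; close ours so
    # that end-of-file propagates down the chain.
    grep.stdout.close()
    tr.stdout.close()

    grep.stdin.write(text.encode())
    grep.stdin.close()

    out, _ = sort.communicate()
    grep.wait()
    tr.wait()
    return out.decode().splitlines()


if __name__ == "__main__":
    sample = "unix pipes\nplan 9\nUnix files\n"
    print(run_pipeline(sample))  # → ['UNIX FILES', 'UNIX PIPES']
```

Each tool only reads bytes from its standard input and writes bytes to its standard output, which is why they can be combined without knowing anything about each other.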

While hardware capability has gone up a thousandfold, the kernel and the syscall libraries remain simple, flexible and elegant. Of course, the number of variants has grown into the thousands today, but that is a testament to the strong conceptual integrity and sound architectural foundation.

There is one quirk in Unix that has never been fixed. The unified namespace that works so well with serial, parallel, sound, storage and other devices breaks down when it comes to network interfaces. Network devices don't register themselves in the standard device tables. They use their own set of tables. There is no special file type that caters to network devices or ports. There is no /dev/eth0 or /dev/tcp/53 or /dev/udp/80. Network devices use their own set of I/O system calls.

I can transfer a file to disk with "cp hello.txt /media/sda/hello.txt" and not have to worry about the physical structure on the disk, but I cannot stream a file to another machine over a network with "cp hello.txt /dev/tcp/9000". I can restrict a bunch of users from using a CD-ROM, but there is no /dev/eth1 whose permissions I can set on an owner or group basis.
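To make the asymmetry concrete, here is a minimal sketch in Python. Copying to disk needs nothing more than a destination path, while sending the same bytes over TCP needs a host, a port and the separate socket calls. The function names (`copy_to_file`, `stream_to_socket`, `receive_once`, `demo`) are hypothetical, and the loopback listener on a kernel-chosen port exists only so that the example runs on a single machine.

```python
import os
import shutil
import socket
import tempfile
import threading


def copy_to_file(src, dst):
    # Disk destination: a path in the namespace is all we need.
    shutil.copyfile(src, dst)


def stream_to_socket(src, host, port):
    # Network destination: there is no path to open, so we go through
    # the socket calls instead (create_connection wraps socket + connect).
    with socket.create_connection((host, port)) as conn, open(src, "rb") as f:
        conn.sendall(f.read())


def receive_once(listener, sink):
    # Hypothetical helper: accept one connection and collect its bytes.
    conn, _ = listener.accept()
    with conn:
        while True:
            chunk = conn.recv(4096)
            if not chunk:
                break
            sink.append(chunk)


def demo():
    workdir = tempfile.mkdtemp()
    src = os.path.join(workdir, "hello.txt")
    dst = os.path.join(workdir, "copy.txt")
    with open(src, "wb") as f:
        f.write(b"hello, world\n")

    copy_to_file(src, dst)

    # Stand-in for "another machine": a listener on the loopback interface.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0 lets the kernel pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]

    received = []
    receiver = threading.Thread(target=receive_once, args=(listener, received))
    receiver.start()
    stream_to_socket(src, "127.0.0.1", port)
    receiver.join()
    listener.close()

    with open(dst, "rb") as f:
        on_disk = f.read()
    return on_disk, b"".join(received)


if __name__ == "__main__":
    on_disk, over_network = demo()
    print(on_disk == over_network)
```

The bytes that arrive are identical in both cases. What differs is how the destination is named and opened, which is exactly the part the namespace does not cover.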

Is it because the network stuff was designed in Berkeley on the West Coast while the namespace stuff was done by Unix Systems Lab on the East Coast? When teams cannot (or should not) communicate amongst themselves on a regular basis, architectural convergence is difficult to achieve (see Conway's Law).

The video drivers and X11 form another subsystem that diverges horribly from the clean namespace interface. Taking a screenshot should have been as simple as copying /dev/window to a disk file. Instead, we now have hundreds of system calls, and each subsystem comes with its own set of tools.

Plan 9 set out to fix the problem, but it was hobbled by a restrictive license till a few years back. It continues to plod along. I guess Unix is so well-entrenched in the market that people are willing to put up with its quirks rather than take up something radically different. But then, the distributed operations intrinsic to Plan 9 could tilt the balance in its favor with the arrival of multi-core chips.

Perhaps, somewhere, some student could be cooking up a disruptive innovation with Plan 9.

Monday, March 12, 2007

Buzzwords and oxymorons

Of late, there has been a deluge of buzzwords and oxymorons in the technical press. Here is an example from EETimes (emphasis mine):
Intel on Monday introduced its first solid-state drive, a device that uses NAND flash memory for common PC or embedded application operations, instead of the slower spinning platters common in traditional hard-disk drives.

The Z-U130 offers a faster storage alternative for locating boot code, operating systems, and commonly accessed libraries. The drive, which has no moving parts and is available in 1-Gbyte to 8-Gbyte densities....
A solid-state memory is built with silicon chips and doesn't have a motor, so where does a drive or a disk come into it? This is no printer's devil. The article goes on to talk about a drive with no moving parts. What exactly were the news editors doing when they checked this article?

I can understand companies using terms like pen drives or thumb drives to push their flash memory sticks. They are trying to sell into a market dominated by disk drives. But professional journals like those of the IEEE and ACM have no excuse for using terms like USB drive for purely solid-state memory sticks. The disease seems to afflict only flash memory with USB interfaces. Other terms like SecureDigital, CompactFlash or MMC are treated properly.

Mmmm... should a flash stick develop i/o errors, will a few drops of mustard oil get it going again smoothly :-)?