2023-02-21

Pipelines Considered Harmful

Today, we’re gonna have a closer look at pipes and what i consider “harmful” about them. Because i just love spicy takes that make a certain group of people with unreasonably strong opinions about computers really angry.

UNIX pipelines, or just pipes for short, have been around for over half a century. The mere fact that they still see widespread use today is a testament to the genius behind the idea. However, they have some limitations that we have apparently just accepted as something we have to live with.

What Is A Pipe?

Before we dive in, though, let’s briefly go over what pipelines are and what makes them so useful, just to make sure we’re all on the same page. I know that you know what a pipe is, but i promise the following paragraphs are relevant to my point.

Pipelines are an IPC mechanism that (at least in this form) first appeared in the UNIX operating system. They consist of a chain of multiple processes, where the standard output of the first one feeds into the standard input of the next one, and so on.

In conjunction with the basic UNIX shell tools like grep(1) or sort(1), and even powerful stream processors like awk(1), they enable composing sophisticated scripts within just a single line. This embrace of modularity over monolithic designs is a core aspect of the UNIX philosophy. And it makes perfect sense from the perspective of a programmer like Douglas McIlroy (the dude who came up with pipes) because programming just so happens to be all about abstraction and keeping things modular.
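To make the composition part concrete, here is a minimal sketch of such a one-liner. The function name most_common_shell and the passwd-style sample lines are made up for illustration; only the tools themselves are standard.

```shell
# Read passwd-style lines on stdin and print the most common login shell.
# Every stage is its own process; each one reads the previous stage's stdout.
most_common_shell() {
	cut -d: -f7 |      # field 7 is the login shell
		sort |         # group identical shells next to each other
		uniq -c |      # collapse each group into "<count> <shell>"
		sort -rn |     # highest count first
		head -n 1 |    # keep only the winner
		awk '{ print $2 }'
}

# Hypothetical sample data, not taken from a real system.
printf '%s\n' \
	'root:x:0:0:root:/root:/bin/sh' \
	'alice:x:1000:1000::/home/alice:/bin/zsh' \
	'bob:x:1001:1001::/home/bob:/bin/zsh' |
	most_common_shell
```

With the sample input above, this prints /bin/zsh. None of the six programs knows anything about the others; all they share is a stream of text.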

Plaintext Is Simple, Stupid

An interesting aspect of UNIX command-line tools is that they operate predominantly on plaintext. And this is no coincidence. Peter H. Salus summarized McIlroy’s documentation of the UNIX philosophy as follows:

Write programs that do one thing and do it well.
Write programs to work together.
Write programs to handle text streams, because that is a universal interface.

What i’m obviously referring to is the third point, the one about plaintext. And i’m not saying it’s wrong, but i want you to take a moment to think about this. At the time, the most complicated data you probably ever had to deal with was the output of ls -l. Since then, the variety of data we have to deal with has exploded.

To give a practical example of where i’m headed, i have a small script that runs whenever i open up a new shell. It fetches my canary message using curl(1), and extracts its last-modified HTTP header. It then compares this header with the current time and, if it is more than six days in the past, prints a message to the standard output reminding me to update the canary.

Sounds simple, right? I must admit there is probably a more elegant solution, but the entire point of pipelines is to avoid having to spend too much time on such seemingly simple tasks. Also, the version below is somewhat simplified to make it easier to read; my original version is a true one-liner. Anyway, here goes:

SIX_DAYS=518400
CANARY_DATE=`curl -I https://fef.moe/canary.txt |
	grep -i last-modified |
	cut -c 16-` # cut off the "last-modified: " part
[ $(echo `date +%s` - `date -d "$CANARY_DATE" +%s` | bc) -ge $SIX_DAYS ] &&
	echo "update your canary ffs"

This works. However, i believe it is unnecessarily complex. It is the year 2023, and for some reason we still use the same shells as in the 1970s. Sure, zsh makes your life way easier, but no matter what you use, you’re still limited to piping text from one program to another.

Yes, HTTP response headers can be represented as plaintext; here is the output from the curl command in the example above:

$ curl -I https://fef.moe/canary.txt
HTTP/2 200
server: nginx/1.22.1
date: Tue, 21 Feb 2023 15:11:30 GMT
content-type: text/plain; charset=utf-8
content-length: 3347
last-modified: Tue, 21 Feb 2023 01:35:24 GMT
vary: Accept-Encoding

(remaining headers omitted for brevity)

The goto Of Data

But do we have to do it like that?

The thing that makes pipelines “harmful” to me is that plaintext is just too dynamic. It could be anything. Parsing is a wildly complex problem of computer science, so why would we do that if there were an easier way?
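To illustrate how much can hide behind a single line of text, here is a sketch of what a more defensive version of the grep and cut step might look like. header_value is a hypothetical helper, not part of any real tool. It accounts for three things a fixed character offset silently ignores: header names are case-insensitive, the whitespace after the colon is optional, and header lines on the wire (in HTTP/1.x) end in CRLF rather than just LF.

```shell
# header_value NAME: read raw HTTP response headers on stdin and print
# the value of the first header called NAME (compared case-insensitively).
header_value() {
	awk -v name="$1" '
		{
			sub(/\r$/, "")                      # drop the CR of a CRLF line ending
			colon = index($0, ":")
			if (colon == 0) next                # status line or blank line
			key = tolower(substr($0, 1, colon - 1))
			if (key != tolower(name)) next      # some other header
			value = substr($0, colon + 1)
			gsub(/^[ \t]+|[ \t]+$/, "", value)  # whitespace around the value is optional
			print value
			exit
		}
	'
}

# Hand-written sample response with different casing and no space after the colon.
printf 'HTTP/1.1 200 OK\r\nLast-Modified:Tue, 21 Feb 2023 01:35:24 GMT\r\n\r\n' |
	header_value last-modified
```

This prints Tue, 21 Feb 2023 01:35:24 GMT for the sample input. That is a dozen lines of awk just to read one field reliably, and it still does not handle headers that appear more than once.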

In his original article, Dijkstra considered the goto statement harmful because it lacks the clear structure of constructs like if statements and loops. This might be a little far-fetched, but you could view plaintext as the “goto of data”.

Getting Spicy

Computers have become powerful enough that performance pretty much isn’t an issue for command-line tools anymore. Except for dnf; that one is just horrible.

Microsoft have been exploring other ways of piping data from one program to another: objects. Now, i don’t really like PowerShell because it has several other poor design decisions, and the fact that it is a Microsoft invention is generally a gigantic red flag. However, i do believe that we could learn something from it. Why can’t my shell script look something like this:

SIX_DAYS=518400
CANARY_DATE=`curl -I https://fef.moe/canary.txt`.headers['last-modified']
[ $(echo `date +%s` - `date -d "$CANARY_DATE" +%s` | bc) -ge $SIX_DAYS ] &&
	echo "update your canary ffs"

I’m not saying this syntax is good; what i care about is the idea that program output can have a machine-readable structure, kind of like a JSON object. If someone were to design a specification for UNIX objects and included a way to transform these objects back into plaintext for backwards compatibility, i believe we would have a much easier time dealing with computers. Basically, something like this:

struct attribute {
	char *name;
	char *value;
};

struct object {
	struct attribute *attributes;
	char *(*to_string)(struct object *);
};

The to_string method would simply transform the object into the old-school textual representation. An actual implementation would of course have to deal with nested attributes and so on, but again, this is about the basic idea and not the details.
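The same idea can be faked in today’s shell to get a feel for it. The sketch below assumes a made-up wire format (one attribute per line, name and value separated by a tab); get_attr and to_string are hypothetical helpers and not an existing standard.

```shell
# Assumed record format: one attribute per line, "<name><TAB><value>".

# get_attr NAME: print the value of attribute NAME from the record on stdin.
get_attr() {
	awk -F '\t' -v name="$1" '$1 == name { print $2; exit }'
}

# to_string: render the record on stdin as old-school "name: value" text.
to_string() {
	awk -F '\t' '{ print $1 ": " $2 }'
}

# Sample record built by hand from two of the headers shown earlier.
record=$(printf 'server\tnginx/1.22.1\nlast-modified\tTue, 21 Feb 2023 01:35:24 GMT\n')

printf '%s\n' "$record" | get_attr last-modified
printf '%s\n' "$record" | to_string
```

The first call prints just the date, without any grep or cut; the second prints the two familiar header lines. Of course this is still text under the hood, which is exactly why a real specification would be needed to do it properly.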

Computer Science Is A Process

Don’t get me wrong. As i clearly stated at the beginning of this article, i find the idea of pipelines as they exist today nothing short of ingenious. However, the field of programming is for some reason incredibly conservative. Why wouldn’t we challenge ideas of the past, even if they work well for most use-cases? All that this “if it ain’t broke, don’t fix it” mentality has given us is an inability to spot shortcomings and limitations of existing technology. And that is, in my opinion, just sad.

I’m not saying we should throw away everything. Just that maybe we should reevaluate our design decisions a little more often, and not be afraid of trying wild new concepts.

tags: tech