fuzzy notepad

[blog] The curious case of the switch statement

Sometimes, I lie awake at night thinking about programming languages.

That’s all the intro I’ve got here, sorry. I felt like writing about the switch statement for some reason.

Some history

1958: ALGOL 58

The earliest incarnation I can find is in ALGOL 58. The original description of ALGOL 58 is a fascinating read — it’s written like a math paper, with literal text in italics and heavy use of subscripts. Character classes are even named by Greek letters, with λ representing letters and so on. The example program at the end is completely incomprehensible, with almost every variable being a single letter and labels forming their own entire column on the left side. I guess that last bit came from FORTRAN.

I’m not surprised by this, what with ALGOL’s being originally conceived as a universal way to express mathematical algorithms — “algebraic” is right in the name, after all — but seeing it in practice is a little surreal. I can’t imagine a programming language being described this way nowadays.

ALGOL 58’s version of switch is a little different from the one we’ve all come to know and love. It’s not a statement at all; it’s a declaration. Switches are their own kind of thing, in the same way that variables and labels are different kinds of thing. (Though, curiously, it seems you can return a label — but not a switch! — from a procedure in ALGOL 58.)

1
switch I := (D₁, D₂, ..., Dₙ)

A switch is used with the go to statement, along with a subscript indicating which Dᵢ to select. go to I[2] would thus be equivalent to go to D₂, except that the subscript could be computed at runtime.

Each Dᵢ is a “designational expression”, which is a funny way of saying “a thing you can use with go to” — either a label or another switch-plus-subscript.

This very first appearance of switch was a very simple form of computed goto: a list of places to jump to that you could index at runtime.

Please also enjoy the two example uses of the go to statement from the original report.

go to HADES

go to exit [(i ↑ 2 ↓ -j ↑ 2 ↓ +I)/2], where exit refers to the declaration
switch exit :=(D₁, D₂, ~~~, Dₙ)

1960: ALGOL 60

ALGOL 58 was soon superseded by ALGOL 60, which has a slightly more modern-looking description. The basic idea behind switch as a declaration stayed the same, but it learned a few new tricks.

Inline if ... then ... else ... became a form of arithmetic expression, meaning it could be used as a subscript, even on a switch. Along the same lines, anywhere a “designational expression” was expected — including within a switch declaration or as the argument to go to — an inline if could be substituted. The above-linked report includes these examples:

switch S:=S1,S2,Q[m], if v>-5 then S3 else S4

switch Q:=p1,w

It also clarifies something I wasn’t sure about from the ALGOL 58 report: the expressions in a switch are evaluated when the jump is made, not when the switch is declared!

Finally, both switches and individual labels could be passed around as arguments. Don’t see that much any more.

1963: CPL

CPL — originally Cambridge Programming Language, then later Combined Programming Language when London joined in — was heavily inspired by ALGOL 60, but sought to expand beyond mathematical algorithms and into more general programming. It was described somewhat informally by a paper in 1963, but never saw much widespread use.

CPL dropped switches entirely, in favor of having labels themselves be a kind of storable variable. The paper gives an example, where “Switch” is the name of a variable and not a keyword:

label Switch = x < 0 → L1, L2

go to Switch

(Here, →, is effectively the ternary operator, the equivalent to C’s ?:. a → b, c means b if a is true, c otherwise.)

Since the first statement is a regular assignment, the right-hand side is evaluated immediately. The paper doesn’t delve very deeply into this, but since labels were first-class values, you could presumably choose a label using whatever logic you wanted and then jump to it. Arrays could contain any type, too, so I assume the direct equivalent to ALGOL 60’s switch was merely an array of labels. Certainly much more flexible. If you’re cringing right now, don’t worry — you can do this in C, too! Keep readin’.

CPL also allowed referring to (and thus jumping to) a label in the middle of a different scope, as long as the label was prefixed with the containing block’s label, as BLOCK at LABEL. How convenient!

1966: ALGOL W

ALGOL W was a big step towards something recognizable as a modern language. It brought us such revolutionary innovations as null, a value that crashes your program.

It also dropped the switch kind in favor of a new compound statement, case, which was sort of like the older switch without the labels. case looked like this:

1
2
3
4
5
case X of begin
    WRITE("X is 1");
    WRITE("X is 2");
    WRITE("X is 3")
end

Yes, case was a regular block, and the expression indicated which one of its statements to run. Any individual statement could of course be a bare begin ... end block wrapping multiple sub-statements, so a case wasn’t limited to only executing one literal line of code. Still, that seems like it would be impossible to read with more than two or three cases.

Similar to ALGOL 60’s inline if ... else ..., ALGOL W had an inline case ... of:

1
case EXPR of (A, B, C, ...)

This evaluated to one of A, B, etc., as determined by the value of EXPR. In most modern languages, you’d do this by indexing an array literal.

1967: BCPL

CPL was a “large” language for the time, which is a strange thing to say when the paper describing it is the shortest I’ve read today. I can’t very well look at it through 1963 eyes, so I’ll take Wikipedia’s word that it was too big to be fully implemented until a good few years later.

In the meantime, someone else at Cambridge thought there were some good ideas in there, so he lopped off all the parts that were hard to implement and called the result “BCPL”, for “Bootstrap CPL” (as the original compiler was written in BCPL) and later “Basic CPL”.

BCPL was the origin of a few interesting ideas we now take for granted: it was the first language to delimit blocks with curly braces (or the $( and $) digraphs), and possibly the first to compile to an intermediate object code before compiling to machine code (which made porting much easier, and is roughly how LLVM and GCC work). It also treated everything as integers and had no type-checking, so, make of that what you will.

As far as I can tell, BCPL was also the origin of the modern switch statement. The BCPL reference manual describes the following block:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
switchon EXPR into {
    ...
    ...
    case CONST:
    ...
    ...
    default:
    ...
    ...
}

Boy, doesn’t that look familiar! It works exactly as you’d expect, and the manual even mentions that the exact compilation may vary depending on the number of cases. I don’t think break worked in switchon, but since the behavior is described solely as a jump and nothing else, you’d have needed something to prevent fallthrough.

Curiously, BCPL also had label-type variables, so it seems that switchon was a novel addition. BCPL was first implemented only a year after ALGOL W arrived with the case statement, but I can’t find any further details on what inspired this particular design.

1969: B

Now we’re nearing the history that most computer people are familiar with. B confused the next three generations’ worth of budding programmers by changing assignment from := to =, but made up for it by introducing compound assignment operators, but then ruined it again by inventing post/pre-increment/decrement.

The B manual describes the switch statement as “the most complicated statement in B”. The general syntax is switch EXPR STATEMENT — the statement doesn’t even have to be a block, but may be. I believe it would look like this:

1
2
3
4
5
6
7
8
switch X {
    case 1;
    printf("X is 1*n");
    case 2;
    printf("X is 2 (or 1)*n");
    case 3;
    printf("X is 3 (or 2, or 1)*n");
}

Most notably, case seems to have become a statement all its own, rather than a funny kind of label. At least, I think so. The manual wasn’t scanned particularly well, and the punctuation after case is underlined, so it might be a semicolon or a colon with a bit of lint under it.

In a step backwards from BCPL, I see no mention of a default label; if none of the cases match, the entire block is just skipped.

Again, since these are just jumps and break hasn’t been overloaded to work here yet (in fact, it didn’t seem to exist in the language at all!), there’s unavoidable fallthrough. The manual contains an example program that uses a switch, and every case ends with an explicit goto out of the block.

Oh, and B used * for escape characters in strings. Imagine what might’ve been.

1972: C

And then we had C and that’s the end of the story. I don’t own a copy of The C Programming Language First Edition (gasp!), but several people have confirmed to me that even the original version of C had the modern switch statement, complete with the break overload.

The modern switch statement

As we’ve seen, the canonical switch statement has had basically the same form for 49 years. The special case labels are based on syntax derived directly from fixed-layout FORTRAN on punchcards in 1957, several months before my father was born.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
switch (EXPR) {
    case 1:
        ...
        break;
    case 2:
        ...
        break;
    case 3:
        ...
        break;
    default:
        ...
}

I hate it.

Okay, “hate” is a strong word. I’m not a huge fan. From a language design standpoint, it sticks out like a sore thumb. I’ve never seen a particularly pleasing design for it, and it baffles me that we still have dedicated syntax for this weird thing — and frequent requests to add it to languages that lack it.

Some objections

It’s not like anything else in the language. Consider how almost every other kind of block works, especially in languages like C where a block and a single statement are interchangeable.

1
2
3
4
stmt ... {
    ...;
    ...;
}

if, do, while, for, even function definitions all work pretty much the same way: the block is just some stuff, and the exact contents are irrelevant. The semantics are always of the form: something happens, then the block runs, then something else happens.

switch changes the rules. The block introduced by a switch is allowed to use special syntax that can’t appear anywhere else. A programmer sprinkles this unique syntax — which sorta-kinda-half slices the block into multiple blocks — throughout, and the switch header inspects the block’s contents. It’s invasive in a way nothing else is.

It splits an expression in half. Semantically, a switch asks: which of these values is equal to this expression? That is, for some expression x and a set of values a, b, c, …, which of x == a, x == b, x == c, … is true?

Those comparisons are lost in a switch, obscured by a construct designed to obscure them. This is C, remember, the language that makes you explicitly write out all the parts of a for, yet it has a statement whose sole purpose is to hide some == from you. That’s weird!

Overloading break is clumsy. Granted, C overloads just about every keyword it has; static alone has at least thirteen meanings.

Normally, break means to immediately jump to the end of a looping block and ignore its loop condition. That distinguishes it from continue. A switch is not a loop.

This would make perfect sense if you were allowed to break out of any arbitrary block. Perl 5, for example, has roughly equivalent next and last statements that you can use even inside bare blocks (though, curiously, not inside an if). If Perl 5 had a switch, last should sensibly jump out of it, since that’s the default behavior. In C, this is a special case… to fix a problem that shouldn’t exist in the first place:

Fallthrough by default is a horrible idea. The vast majority of the time, you don’t want fallthrough. So why is fallthrough the default, and the common case needs opting into?

Yes, I know switch is fundamentally just a jump. But it’s a jump to a thing that looks like a label yet has a non-opaque value in it… unless such a label doesn’t exist, in which case it’s a jump to a label with a reserved name… unless that doesn’t exist either, in which case it’s a jump past the end of the block.

Oh! A jump past the end of the block. switch can already do that! Why not do it automatically after each case?

It’s so useful for parsers!” I hear you cry. Hang on! This behavior wouldn’t even affect 99% of cases that want fallthrough. Consider this tiny tokenizer in a hypothetical world where break isn’t necessary:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
switch (ch) {
    case ' ':
    case '\t':
    case '\n':
        tokentype = WHITESPACE;

    case '(':
        tokentype = OPENPAREN;

    ...
}

This would still work exactly as you want. When ch be a space, control would jump to tokentype = WHITESPACE;. Why? Because labels can only mark a statement, not other labels — and case is a label, not a statement! Jumping to a label always means jumping to the next statement that appears after it. Effectively, that assignment has three labels.

How would you create an empty case, then? The same way you create any other empty case: a lone semicolon.

It’s not really all that useful. The C-style switch can only match one thing at a time. If you want to match multiple literals, you can use multiple cases. That’s all you get.

It’s handy if you’re writing a parser, I suppose. By hand. Via recursive-descent. In C. In 1972.

It might also be useful for dealing with really big enums that indicate a lot of different behavior.

Offhand, I can’t think of anything else. I sympathize with the frustrations of people who reflexively want to reach for it in a language that doesn’t have it, but I find it hard to believe that it’s the best tool for the job very often.

It’s not even useful as an optimization any more; I’m pretty sure our compilers are plenty smart enough by now to recognize an if tree and turn it into a jump table or binary search or whatever. It’s one of those things that feels like it’ll make a program faster, but almost certainly won’t make any difference whatsoever.

It spread. A few language designers in the 90s made a point of inheriting all of C’s mistakes: nulls, bitwise operators with nonsense precedence, C-style for loops, and of course the switch statement.

I’m looking at you, Java and JavaScript.

What’s particularly curious is that neither language has goto — so why does either language need special computed-jump syntax? I’ve seen a switch in JavaScript maybe once.

Also, just like C, both languages have had preposterous amounts of effort put into their compilation. They also both have better “string” primitives than C and regexes built in, making switch not nearly as useful for writing parsers; JavaScript doesn’t even have enums, and Java only got them fairly recently.

Potentially fixing it

None of those are intended as huge complaints, and I’m not proposing we start a petition to remove switch from the next version of C or anything. I merely object… aesthetically. (And if we were really going to change C to match my aesthetic preferences, we’d start with something much more important, such as ditching the braces.)

People occasionally ask for it in Python, and that gets me thinking about how it might be made to fit there. Python doesn’t have labels, so presumably this would have entire blocks.

1
2
3
4
5
6
7
switch x:
    case 1:
        print("x is 1")
    case 2:
        print("x is 2")
    default:
        print("dunno")

The double indentation here is awful. I also don’t much like the nebulous scope inside switch. What is that? Can anything other than a case block go there? Every other kind of block in Python can contain completely arbitrary code, so this would be like nothing else in the language.

Okay, so let’s try outdenting the cases, like a try with multiple except clauses.

1
2
3
4
5
6
7
switch x:
case 1:
    print("x is 1")
case 2:
    print("x is 2")
default:
    print("dunno")

This is more consistent with the rest of the language, but now the block immediately following switch is empty, which is disallowed everywhere else. Maybe the default case could go there?

1
2
3
4
5
6
switch x:
    print("dunno")
case 1:
    print("x is 1")
case 2:
    print("x is 2")

I could kind of get behind this, but it’s backwards from how most people think about switch (the default case is virtually always written last) as well as backwards from elif and except, the other two kinds of block that can be stacked this way.

A bigger problem is fallthrough — these are now distinct blocks, not labels, so my previous argument doesn’t apply. A couple languages have the convention of using continue to mean explicit fallthrough, which would give us:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
switch x:
    print("dunno")
case 1:
    print("x is 1")
case 2:
    print("x is 2")
case 3:
    continue
case 4:
    continue
case 5:
    print("x is 3, 4, or 5")

If the goal here was to save typing and/or vertical space, I don’t think we’re succeeding.

Alternatively, a completely empty case block could be treated as meaning fallthrough, and a no-op block could use pass. That would leave us with empty blocks again, though.

Now I’m stuck. I only had so much syntax to work with in the first place, and it doesn’t fit the weird expectations of switch. Which is probably why Python doesn’t have a switch block.

Here’s another question: what order is the comparison done in? That is, given switch s and case c, do we check s == c or c == s? You might argue the former because s appears first, and I wouldn’t disagree, but that’s not a particularly solid reason. And what if you wanted it the other way? It shouldn’t matter, but it can, because Python has operator overloading.

Alternatives

A few languages have put interesting spins on switch, and they probably have better ideas than I do. Here are some I happen to know about.

I note that not a single one of these alternatives has default fallthrough.

Bourne shell

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
case $x in
1)
    echo 'x is 1'
    ;;
2)
    echo 'x is 2'
    ;;
3|4|5)
    echo 'x is 3, 4, or 5'
    ;;
*)
    echo 'dunno'
    ;;
esac

The syntax is totally weird, but perhaps no more weird than anything else in shell.

I believe each case must have a terminator, so there’s no risk of accidental fallthrough. The use of | to separate multiple patterns takes care of the multiple-value case.

* isn’t a special symbol meaning “default”; each case is actually a pattern, so the * is a wildcard that will match anything not matched thusfar. I approve.

Visual Basic

This is .NET, but I believe it’s been the same since before then.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
Select Case x
    Case 1
        Debug.WriteLine("x is 1")
    Case 2
        Debug.WriteLine("x is 2")
    Case 3 To 5
        Debug.WriteLine("x is 3, 4, or 5")
    Case Else
        Debug.WriteLine("dunno")
End Select

Leaving aside Visual Basic’s insistence on Writing Everything In Title Case, wow, this is weird.

The case expressions have their own subsyntax that I don’t think exists anywhere else in the language. Each one is a comma-separated list of one or more expressions, a To b range expressions, or Is <= expression comparison thing. That last one is extra weird, especially since the Is is optional, so you can just write Case > 12. It’s convenient, sure, but what a strange thing to implement solely for one kind of block.

Case Else is probably the worst way to spell that. And it looks like the Case after Select is optional? What is even going on here.

Ruby

Ruby has a case statement, or maybe it’s a method, who can even tell.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
case x
when 1
    puts "x is 1"
when 2
    puts "x is 2"
when 3, 4, 5
    puts "x is 3, 4, or 5"
else
    puts "dunno"
end

I heartily approve of the use of else here.

when has only slightly special syntax; listing multiple values separated by commas allows matching against any of them. You can do some other tricks, though, because Ruby doesn’t actually test with normal equality; it uses the terribly-named === operator, which is a slightly fuzzy kind of matching deemed to be helpful sometimes.

For example, the last when above could be written:

1
2
when 3..5
    puts "x is 3, 4, or 5"

The .. operator constructs a Range, and Range implements handwave-equality by checking for inclusion, as well.

This seems a little hokey to me — what if you want to case on a Range? — but I can’t find an exhaustive list of every non-equality use of === in core Ruby. Maybe it works out okay in practice.

Perl 6

Perl 5, historically, did not have a switch statement. Perl 6, in its quest to include literally every single language feature ever conceived, sought to remedy this.

As the documentation explains,

The given statement is Perl 6’s topicalizing keyword in a similar way that switch topicalizes in languages such as C.

Yes, exactly.

Perl 6’s answer to switch is composed of three parts.

given EXPR { ... } assigns EXPR to $_. You might know $_ from Perl 5, where it’s the “default” variable in a number of cases. In Perl 6, it acts much the same way, except that it became much more defaultier. For example, you can do a method call without an invocant: .foo is equivalent to $_.foo.

when EXPR { ... } performs a “smart match” of $_ against EXPR using the ~~ operator. As in Ruby, the precise semantics vary, and are about what you’d expect except when they’re not. If the match succeeds, the block runs, and then control breaks out of the containing block. If the match fails, nothing happens.

default { ... } runs a block and then breaks out of the containing block.

Used together, you get:

1
2
3
4
5
6
given $x {
    when 1 { say "x is 1" }
    when 2 { say "x is 2" }
    when 3 | 4 | 5 { say "x is 3, 4, or 5" }
    default { say "dunno" }
}

The magical thing about this is that none of these parts are required to be used together. A when or default block can appear anywhere, and when will merrily match against whatever happens to be in $_. Any arbitrary code can appear inside a given. Even that 3 | 4 | 5 thing? That’s not special when-only syntax; that’s a junction, a first-class type of value that is all three numbers simultaneously. Perl 6 is truly the realization of Perl 5’s mission: to be startlingly consistent, and also just plain startling.

If you need more complex conditions, Perl 6 has a kind of implicit lambda thing when you write an expression containing * as an atom. I don’t remember exactly how it works, but you can write when * > 12.

Fallthrough is possible, too: a proceed statement will immediately leave a when or default block and continue on after the block, rather than leaving the enclosing block. It’s only allowed inside those two blocks, alas.

Perl 5

Not to be outdone by… itself?, Perl 5 has two switch statements.

The first is a module in the standard library, Switch. Much like Perl 6’s smartmatch, it hardcodes a number of different comparisons depending on the shapes of the operands — including automatically calling functions and considering it a match if the function returns true. Otherwise, it looks fairly similar to the C construct:

1
2
3
4
5
6
switch ($x) {
    case 1 { say "x is 1" }
    case 2 { say "x is 2" }
    case [3..5] { say "x is 3, 4, or 5" }
    else { say "dunno" }
}

The [3..5] is the regular Perl range operator .., inside a regular Perl arrayref. One of the hardcoded comparisons is that if one side is an arrayref and the other is a scalar, case tests for array containment.

Since this isn’t an actual language feature and is implemented in a module… you don’t want to know how it works. Really. Don’t ask. (It rewrites your source code when you import it.)

The native equivalent came with Perl 5.10, the first of the regular Perl 5 releases. You have to opt into it by asking for a Perl version of 5.10 or later — use v5.20; or similar — and then you have pretty much the same given, when, and default as Perl 6. Perl 5 even has smartmatch.

It appears that all three blocks, as well as smartmatch itself, were later marked “experimental” and produce warnings when you try to use them. I’ve heard quiet grumblings about smartmatch’s unpredictability since it was first added to Perl 5.10, which is why Ruby’s similar thing gave me pause. Seriously, y’all, just add an in operator; it’ll solve most of this problem.

Rust

Rust has a match block with its own sub-syntax:

1
2
3
4
5
6
match x {
    1 => println!("x is 1"),
    2 => println!("x is 2"),
    3 | 4 | 5 => println!("x is 3, 4, or 5"),
    _ => println!("dunno"),
}

Like most block constructs in Rust, match can be used as an expression as well. Each statement can likewise be a braced block instead of a single statement.

The stuff on the left side of the => is a pattern, a special syntax with a few bells and whistles. Matching against any of several values with | is one, but you can also give a range with ..., do destructuring assignment, express more complex conditions, and… some other stuff.

A really nice thing here is that _ isn’t special-cased to mean “default”; it’s only “destructuring” to a single variable, which can’t possibly fail. (The name _ is special in the language overall, in that Rust will never create a variable named _, but this isn’t unique to patterns; you can use _ in other forms of destructuring to mean “I don’t actually need this part”.)

My favorite thing about Rust’s match is that it forces the matching to be exhaustive — every possible value of the expression must match one of the patterns you give. If you try to match an enum but forget one of the possible values, your code will not compile.

My least favorite thing is the sprinkle of commas down the right side, which is a travesty.

And others

A lot of languages have switch statements, but most of them are fairly similar and I’m not going to go through them all! I’m sure plenty of commenters will mention their favorites.

In particular, the functional programming sphere makes very heavy use of pattern matching, often much more powerfully than Rust. I believe Ruby’s case came from Pascal, the Betamax to C’s VHS, so it’s a glimpse at what we could’ve had.

Duff's device

Finally, I can’t very well talk about switch without mentioning Duff’s device, the greatest argument against C’s switch syntax. If you haven’t seen it before, have fun unraveling this.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
send(to, from, count)
register short *to, *from;
register count;
{
    register n = (count + 7) / 8;
    switch (count % 8) {
    case 0: do { *to = *from++;
    case 7:      *to = *from++;
    case 6:      *to = *from++;
    case 5:      *to = *from++;
    case 4:      *to = *from++;
    case 3:      *to = *from++;
    case 2:      *to = *from++;
    case 1:      *to = *from++;
            } while (--n > 0);
    }
}

If you like when I write words, you can fund future wordsmithing (and other endeavors) by throwing a couple bucks at my Patreon!

(illus. by Rumwik)

Comments