August 23, 2011
|
3
A note from the Editor in Chief:
Scientific American is celebrating its 166th year. With its history as the longest-continuously published magazine in the U.S., it’s probably no surprise that it has touched the lives and career paths of many readers—including the scientists who write articles for us and whose work we cover. So, as often happens, when I met Peter Norvig, director of research for Google, while we were serving as judges for the Google Science Fair, we got to chatting about Scientific American. He mentioned how influential the magazine had been for him personally. And while the most inspiring article to him proved right in many ways, it also ended up being wrong in others. I said something like, “That’s really interesting. I’d like to know what those are—and I’ll bet others would, too. Would you like to write about that for us?”
I am delighted to share Norvig’s fascinating and detailed response. If the programming details whet your appetite for more of his thinking, you may even want to consider signing up for his fall introduction-to-artificial-intelligence course at Stanford, which he and colleague Sebastian Thrun are offering for free online.
From time to time, we’ll offer other insights about Scientific American—both from scientists who are working in the research world that the magazine shares with its readers, as well as from our staff and contributors, who can give you the inside scoop on new projects we’re working on. As always, we’d love to have your comments and feedback. And if you’ve been inspired by Scientific American, please share your story with us: submit@sciam.com
— Mariette DiChristina
Systems Analysis and Programming: Thoughts from the Attic
As a teenager in the early 1970s, I enjoyed going up to the attic and looking through old stacks of Scientific American magazines. (We didn’t have Wikipedia back then.) Most coveted were the September issues, which were dedicated to a single topic area. I remember reading issues about Life (1961), The Universe (1956) and Mathematics (1964).
![]() Life (1961) |
![]() The Universe (1956) |
![]() Mathematics (1964) |
![]() Information (1966) |
But the issue that had the biggest effect on me was the September 1966 on Information (which I read about 40 years ago). The issue featured a terrific collection of authors who are now acknowledged as pioneering leaders in computer science: Evans and Sutherland explaining computer hardware; Fano and Corbato on operating systems; Tony Oettinger describing his natural language parser; and the two giants of my own subfield (Artificial Intelligence), McCarthy and Minsky, on Information Theory and AI. If I had somehow been able to comprehend everything in this issue, I could have cut a decade’s time off my education in Computer Science.
In this essay I’ll concentrate on one article from the issue: Christopher Strachey‘s contribution on “System Analysis and Programming.” At the time I had seen only a few snippets of BASIC code—nothing more than a few lines. This short article by Strachey was my first introduction to a high-level programming language and the first non-trivial program I’d ever read: a checkers-playing program. When I rediscovered this article recently, I was surprised to find two things: (1) the programming language and programming style are thoroughly modern, and (2) there is a serious mismatch between the design and the implementation, or the systems analysis and programming as Strachey calls it. Let’s look at what Strachey did surprisingly well, what he got wrong, and what he didn’t include at all.
The Good, The Bad and the Missing
Although Strachey wrote his article more than 40 years ago, it was surprisingly prescient and forward-thinking in many ways. We’ll start with some mostly good parts:
“The trouble, I think, is that so many educational processes put a high premium on getting the correct answer the first time. If you give the wrong answer on an examination question, you lose your mark and that is the end of the matter. If you make a mistake in writing your program—or indeed, in many other situations outside a classroom—it is by no means a catastrophe; you do, however, have to find your error and put it right; maybe it would be better if more academic teaching adopted this attitude also.”
How much better would the world be today if everyone had adopted Strachey’s approach 45 years ago?
Now we’ll focus a bit more on some of the negatives, which mostly have to do with a mismatch between the description of the program in the text and the implementation in the code:
For example, here’s how Strachey implements ChosenPosition, the function that chooses the best position to move to from the current position P:
ChosenPosition(P) = value of
§ let L = LegalPositionsFrom(P)
if Null(L) then Resign
let (p,v) = BestPosition(NIL, - ∞,L,0)
result is p §
BestPosition(P,V,L,d) = Null(L) → (P,V), value of
§ let (p, l) = Next(L)
let v = - PositionValue(p,d + 1)
result is (v > V) → BestPosition(p,v,l,d),
BestPosition(P,V,l,d) §
A reader needs to recognize that the first two arguments to BestPosition are holding the best position and value found so far, and the third argument, L, is a list of the remaining positions left to consider. After studying the function for a bit, one realizes that it is computing the maximum-valued position according to the function PositionValue (with the complication that PositionValue gives the value for the player to move, so you need to negate the value of the position to your opponent). It takes some mental effort to arrive at this understanding.
I would prefer an approach that makes it immediately obvious that ChosenPosition is computing the maximum according to PositionValue, so I would code it as follows:
ChosenPosition(P) = Max(LegalPositions(P), PositionValue)
Here I am assuming a slightly different representation of positions where the depth is encoded as part of the position (and you can handle the negation of opponent scores by noting whether the depth is odd or even). The function Max takes two arguments: a list of items and a function that computes the value of an item. Max returns the item with the highest value according to this function. I think this version of ChosenPosition is easier to understand. The function max in this form is built-in to the language Python, but could easily be defined in CPL as follows:
Max(Items, ValueFunction) = value of
§ (Best, BestVal) = (NIL, -∞)
while Items do §
(Item, Val) = (Head(Items), ValueFunction(Head(Items)))
if Val > BestVal then (Best, BestVal) := (Item, Val)
Items := Rest(Items) §
result is Best §
My implementation does two things differently than Strachey. First, separation of concerns: I’ve abstracted out the idea of finding a maximum from the specifics of findings the legal position with the maximum value, while Strachey conflates the two ideas. Second, re-use: Once I have defined Max, I can use it again and again.
Now let’s shift gears a bit and consider, from a modern viewpoint, what Strachey left out that would be covered in a discussion of systems analysis today.
Now we’ll dive right into Strachey’s Checkers Program itself.
The Checkers Program
Here is the code as presented by Strachey on page 117. I have done my best to copy the original verbatim (including copying the inconsistent usage of spaces around punctuation):
ChosenPosition(P) = value of
§ let L = LegalPositionsFrom(P)
if Null(L) then Resign
let (p,v) = BestPosition(NIL, - ∞,L,0)
result is p §
BestPosition(P,V,L,d) = Null(L) → (P,V), value of
§ let (p, l) = Next(L)
let v = - PositionValue(p,d + 1)
result is (v > V) → BestPosition(p,v,l,d),
BestPosition(P,V,l,d) §
PositionValue(P,d) = Terminal(P,d) → TerminalValue(P), value of
§ let L = LegalPositionsFrom(P)
let (p,v) = BestPosition(NIL,- ∞,L,d)
result is v §
LegalPositionsFrom(P) = value of
§ let L = RemainingPositionList(P,Capture,5)
result is Null(L)→RemainingPositionList(P,NonCapture,5),L §
RemainingPositionList(P,C,s) =
PartialPositionList(P,C,s,FindMoveList(P,C,s))
PartialPositionList(P,C,s,L) =
Null(L)→( (s = -5)→NIL,
RemainingPositionList(P,C,NextMoveDirection(s) ),
value of
§ let Φ = SingleDigitFrom(L)
let Ip = MakeMove(P,C,s,Φ)
let l = (C = Capture)→CaptureTree(Ip),
FinalPosition(Ip)
result is Join(l,PartialPositionList(P,C,s, L - Φ))§
NextMoveDirection(s) = (s = 5) → 4, ( (s = 4) → - 4, -5)
FindMoveList(P,C,s) = value of
§ let (X,Y,K,σ) = P
let Empty = ~ X ∧ ~ Y ∧ Board
let ψ = (C = Capture) → (Shift(Empty,σs) ∧ Y), Empty
let Φ = Shift(ψ, σs) ∧ X
result is (s > 0) → Φ, Φ ∧ K §
MakeMove(P,C,s,Φ) = value of
§ let (X,Y,K,σ) = P
let ψ = (C = Capture) → Shift(Φ, - σs), NIL
let θ = (C = Capture) → Shift(ψ, - σs),
Shift(Φ, - σs)
let Xk = Null(Φ ∧ K) → (θ ∧ LastRows), (θ - Φ)
result is ((X - Φ + θ), (Y - ψ), (K - ψ ∧ K + Xk),σ,θ)§
FinalPosition(Ip) = value of
§ let (X,Y,K,σ,Φ) = Ip
result is (Y,X,K, - σ)§
CaptureTree(Ip) = value of
§ let L = PartialCapturePositionList(Ip)
result is Null(L) → (FinalPosition(Ip) ),
CombineCaptureTrees(L) §
PartialCapturePositionList(Ip) = value of
§ let (X,Y,K,σ,Φ) = Ip
let P = (X,Y,K,σ)
result is MinList(PCP(P,Φ,5),PCP(P,Φ,4),
PCP(P,Φ ∧ K, - 4), PCP(P,Φ ∧ K, - 5) )§
PCP(P,Φ,s) = value of
§ let (X,Y,K,σ) = P
let ψ = Shift(Φ, - σs) ∧ Y
let Empty = ~ X ∧ ~ Y ∧ Board
let θ = Shift(ψ, - σs) ∧ Empty
let Xk = Null(Φ ∧ K) → (θ ∧ LastRows), (θ - Φ)
result is Null(θ) → NIL,
((X - Φ + θ),(Y - ψ),(K - ψ ∧ K + Xk),σ,θ)§
CombineCaptureTrees(L) = Null(L) → NIL, value of
§ let (Ip,l) = Next(L)
result is Join(CaptureTree(Ip),CombineCaptureTrees(l) )§
Representing Positions
Strachey uses a clever representation for board positions, one that goes back to Arthur Samuel’s 1959 program and article. He numbers the positions as follows:

Note that this representation is not dense; some numbered squares (13, 22, 31) are not on the board. The advantage of this approach is that every square is either 4 or 5 positions away from its diagonal neighbors. If he had used the standard checkers notation, with squares numbered 1 to 32, then some squares would have neighbors 4 or 5 positions away, and some 3 or 4.
Given this numbering, a board position is a four-tuple (X, Y, K, σ), where σ is the player to move, and X,Y and K are bit patterns with a 1 in the square number where there is a piece. The X bit pattern represents the player’s pieces. Y represents the opponent’s pieces; and K represents the kings (of either player).
A Translator for CPL
Strachey didn’t have a CPL compiler when he wrote the article in 1966. The first CPL compiler emerged around 1970, and by the 1980s they were gone. So I had to create my own before I could run and debug the checkers program. I had three choices to make: (1) What CPL syntax is legal and what does it mean? (2) What parser-generator system should I use to create the CPL translator? (3) What language should I translate into? A full compiler—a translator into machine language—is a rather complex project, but translating into a high-level language that is very similar to CPL is much easier.
(1) Most of CPL is straightforward, but I got some help by reading the 1963 and 1968 articles on CPL. I learned that “A block consists of one or more definitions followed by one or more commands, the whole being enclosed within section brackets.” The checkers program, however, does not follow this definition of block, because in the first function definition, ChosenPosition, we have a block that has a definition (let L), followed by a command (if Null(L)), followed by another definition. According to the 1963 paper, we should have a second block for this second definition. So either Strachey made a mistake in the checkers program or he was working with a different version of CPL then. I decided to accept definitions anywhere, not just at the start of blocks. That means that “value of” and “result is” are no-ops; the result of a function is always the last expression. I’m not sure if that was intended for CPL, but this interpretation works for all the code in the checkers program.
Another thing I learned from the 1963 paper is that CPL has two kinds of variable names: small names, which are single lower-case letters (or Greek letters, I presume from the checkers program), and large names, which start with an upper-case letter and continue with any letters. Small names need not be delimited by spaces, and it seems that implicit multiplication is allowed, so “σs” is equivalent to “σ * s.”
(2) For the parser generator, I settled on YAPPS2 (Yet Another Python Parsing System), mostly because the author of the system, Amit Patel, is a friend who sits nearby, so if I ran into trouble I knew I’d be able to walk down the hall and get help. (It turns out his system was very easy to use, and I didn’t need any help. Good job, Amit!)
(3) CPL is a flexible mostly-functional language, so it seems that Lisp would be a good target language. But CPL uses infix notation; I’d have to get all the operator precedence right to generate Lisp code with the parens in the right place. So I decided to generate Python code instead, and to ignore the problem of operator precedence: I would translate the CPL code “(K – ψ ∧ K + Xk)” into the Python code “(K – psi & K + Xk)” and hope that the operator precedence just works out. But there is one problem in using Python: it has a strict separation between expressions and statements (statements cannot appear inside expressions), and CPL does not have this strict distinction. Therefore, I generate code in a subset of Python that uses only expressions. For example, given the CPL function:
LegalPositionsFrom(P) = value of
§ let L = RemainingPositionList(P,Capture,5)
result is Null(L)→RemainingPositionList(P,NonCapture,5),L §
the most straightforward translation to Python is:
def LegalPositionsFrom(P):
L = RemainingPositionList(P, Capture, 5)
return (RemainingPositionList(P, NonCapture, 5) if Null(L) else L)
That would work fine for this function. But in general, a CPL block can be an expression embedded inside another expression. Python statements (such as “L = ...” and “return ...”) can’t appear inside expressions. So we won’t use them; instead we will translate a CPL block into a Python lambda, which we will then immediately call, using the syntax “(lambda …)().” No arguments are passed in this call to the lambda (because it is just a block of code); we can, however, introduce new local variables within the block by defining optional arguments to the lambda expression, with their default values. So we have:
def LegalPositionsFrom(P):
return (lambda L=RemainingPositionList(P, Capture, 5):
(RemainingPositionList(P, NonCapture, 5) if Null(L) else L))()
Notice that there is one exception to the “everything is an expression” rule: every function definition is a statement of the form “def f(x): return expr.”
The 1963 article also made it clear that CPL has a logical data type that is distinct from the integer data type. In languages like C and Python, the bit pattern 0b1111 is identical to the integer 15; they are both of type int. But in CPL the two are different. I implemented the CPL logical data type with a Logical class in Python. The reason I need this is that the expression int(0) - int(1) should be -1 (because here the minus sign is ordinary integer subtraction), but Logical(0) - Logical(1) should be Logical(0) (because here the minus sign is the logical-bitset-difference operator). The Logical class has the usual set of operators (and, or, not, etc.) and has a constructor that takes as input either a bit pattern or a list of set bits. Thus, we can use either Logical(0b1111) or Logical([0,1,2,3]) to represent the logical bitset in which the four rightmost bits are true.
With those choices made, I was ready to write the grammar of CPL in the YAPPS formalism, with the translation into Python. The original code uses Greek letters and math symbols (§,V, ∧, →); I chose to use HTML entities for these (rather than Unicode). You can see the grammar in the file cpl.g.
The Rest of the Checkers Program
Scientific American limited Strachey in his page count, and so didn’t show the whole program. He left out some of the basic primitives of CPL (such as Shift, which does a logical shift of a bit string) and some of the parts of the checkers program that didn’t seem crucial (such as printing the board). You can see these missing parts in the file checkers.py or as an annotated web page.
Debugging the Checkers Program
Here are the bugs/misunderstandings I discovered as I implemented the CPL translator and checkers program:
Overall, this is a very impressive piece of software for 1966. Other than the trivial missing parenthesis (and I’m thinking maybe that was the fault of the Scientific American typesetter, not of Strachey), there is only one real error in the code, with the handling of kings. (And it is possible that Strachey’s expression is in fact correct, but I have the wrong impression of what he meant.)
The correspondence between software and documentation is sloppy, however. The number of arguments on functions is documented wrong; not all documented functions are implemented; there is confusion between what is a list and what is a string of bits. If only Strachey the analyst and Strachey the programmer had iterated a few times, the article would have been even more of a work of beauty!
Add a Comment
You must log in or register as a ScientificAmerican.com member to submit a comment.
“checkers has about 1020 positions to evaluate”?
I trust you meant 10**20 (10 to the 20th), although a more precise estimate, not including kings, would be something less 3**32 which is about 10**15 (each of the 32 board positions can be either white, black or empty). Some percentage of the 3**32 board positions are illegal or impossible, so perhaps “only” about 10**10 valid ones. Toss kings into the mix – you now have 5**32 (10**22) positions – and you’re in the 10**20 range after tossing out the illegal ones.
As an aside, I wonder if there is a set of moves that allow black and white to fully reverse their starting positions, and thus would not have any jumps?
Link to thisIt was a curious article even at the time. A significant split between Systems Analysis and Software was (and still is) convenient for large projects, but the fashion for a complete split never really took hold except for large business projects. (The most curious aspect of the fashion in that form is that, contrary to expectations, it appeared to inhibit software reuse rather than enable it as might have been expected – but that’s another large nut that still remains to be cracked). Similarly, the over-emphasis restricting software design to “object orientation” is a fashion that came rather later.
Link to thisFortran’s lack of Loops as such is of course ‘only’ a documentation issue – a pain but not a limitation. But it is extremely surprising that a non-Fortran language should have eschewed loops, as Algol had loops since before 1960, and even pre-release versions of Basic (at that stage purely designed as a teaching language) had loops by 1961 when I first met this.
To my mind, the article was prescient in two major respects: acceptance that code would always need debugging, and (perhaps) in setting an exemplar for incomplete adoptions of known good practice.
On mmason’s aside:
Link to this“Reversing the starting position” requires co-operation, but is straightforward. As an example, you could start by swapping the pairs (two black, two white) at the right side of the board (as viewed by black).
Black keeps his pair as close to the edge of the board as possible, white brings his pair as near the centre as possible. Once these are swapped you can start on the next pair.