the read, eval, print loop

This page describes the procedure at the heart of the Lisp interpreter, the READ-EVAL-PRINT loop. While understanding is not strictly necessary to use little b, this material will help you make sense of the syntax, and give you a good foundation for Lisp programming. This section has a lot of technical details which you may wish to skip over for the time being. If so, skim this page and proceed to the next section where we get down to brass tacks and see some actual little b code.

Lisp is presented rather differently here than is usual. Certain "advanced" topics are deliberately presented first. We aim to demystify. The goal is to allow you to understand how a sequence of characters is converted, step by step, into computational meaning. This explanation would not be possible were the mechanisms of Lisp not so clear, simple and meaningful.

We only need to understand 3 transformations, each meaningful and useful in themselves: how text is converted to data structures (reading), how a data structure describes a computation (evaluation), and how the result of computation is converted back to text (printing).

Other languages hide the reading step and the data structures which represent code. In Lisp, however, we can control reading and manipulate code before it is executed. This enables a powerful technique called metaprogramming, in which complex problems are solved by writing programs which generate programs. Many other languages barely dream of such power. However, in Lisp this is not just possible - it's easy.

reading

Reading is a procedure for breaking up a sequence of characters into meaningful structures. It is implemented as a Lisp function, named READ, which consumes characters from a source and creates a corresponding data structure.α

READ produces two types of objects: lists and atoms. Lists are of fundamental importance. They are a simple, incredibly flexible data structure. Atoms are the rest of the computational universe: strings, numbers, symbols, you name it. If it's not a list, it's an atom.

The amazing innovation of Lisp is to use lists also to represent code. (Lisp is an acronym for LISt Processor.) Thus metaprogramming is a simple matter of manipulating lists, as we shall see later.

In the example, an average is computed of the variable VAL and the number 4. We will concentrate on the second line, the text "(avg val 4)" which describes a list containing 3 atoms: the two symbols, AVG and VAL, and the number 4.β First we will see how READ generates this list. Next, we will see how this list describes a computation (the average of VAL and 4).

the readtable The reader knows what kind of entity to construct based on information in a data structure called a READTABLE. The table associates a function (a reader function) with every character. READ reads a character, consults the READTABLE, and calls the associated reader function. As it does so, it continuously updates the position from which reading occurs. The next read position is always the subsequent character (never a previous one).

You can think of each character as an instruction which tells the reader to create a certain type of data structure. For example, the alphanumeric characters (A-Z, a-z,0-9) and several others (~!@$%^&*-_+<>) are associated with a function called a token reader. In the "average" example, when READ encounters the character r, it invokes the token reader. (We'll take up the issue of what happens when the initial parenthesis is read in a just moment.)

the token reader The token reader is a function responsible for creating either symbols or numbers. It collects all characters until it encounters a terminator, a character which signals the end of the token. Whitespace, parentheses and a few other characters are token terminators. It then examines the collected characters to determine whether they represent a symbol or a number. For symbols, it converts the characters to uppercase, and creates (or finds) a symbol object, a special Lisp data structure, with that name. For example, "AVG", "val", "+", "1+", "/ab/1" all result in symbol objects being constructed. "1", "3/4", "+1", "3.4E+12" are numbers.

In the example, "avg" becomes the symbol AVG, and "val" becomes the symbol VAL. Symbols are found or created based on the name read by the token reader. Since "val" was just read in the previous line, in "(defvar val 10)", when the reader encounters "val" in the second line, it finds, rather than creates, the symbol object named "VAL".

The token reader, like all other reader functions stops reading when certain characters (whitespace and others) are encountered. Each reader function terminates differently. For example, the string reader which is triggered when the double quote, ", is encountered, does not stop on whitespace, but continues until it encounters another double quote. On completion, a reader function returns a data structure representing the thing that was read.

the list reader During its operation, a reader function can transfer control to another reader function. In fact, this is just how lists are built. In the example, READ detects the left parenthesis and invokes the list reader. The list reader, in turn, repeatedly invokes the READ function until it encounters a right parenthesis. Thus on the second line, the list reader is invoked with the rest of the characters in the stream: "avg val 4)", it encounters a, which is associated with the token reader. The token reader, which stops at whitespace, returns the symbol AVG, and transfers control back to the list reader, which stores the result, and calls READ again. Thus, the token reader is called a total of 3 times, returning AVG, VAL and the number 4.

The list reader constructs a list which holds these objects; when it encounters the closing parenthesis, it finishes and returns the new list. The read step, which began with the open parenthesis at the beginning of the second line, is complete, and the REPL proceeds to the evaluation step. The figure below shows the details of how lists are constructed and represented.

list structure and syntax

This section explains how lists are represented. We will approach the problem from the perspective of the reader: how to construct a data structure from a sequence of characters. Developing a facility for visualizing and mentally manipulating list structures is very useful for effective Lisp programming. It takes some practice, but in time you will be able to "see" the objects implied by the syntax (as in the adjacent diagram) much like you can't help but see the words implied by the pixels of this page.

CONS Lists are made of data structures called CONSes or CONS3 pairs which are linked together in daisy chain fashion. These objects have two storage cells called the FIRST and REST - or "CAR" and "CDR" - of the list.4 The diagram on the right uses =R=> to show how the reader converts text to objects.

Unless a closing parenthesis is encountered, a new CONS is created each time a new item is read. The new item is stored in the CAR (aka FIRST) of the CONS pair. When "(1" is read, a CONS pair is created with 1 in CAR. The CDR (aka REST) will store a pointer to the next CONS pair, which will in turn hold an item in its CAR, another CONS pair in its CDR, and so on. The CDR always holds the rest of the list. See how the daisy chain is constructed? The reader creates CONSes and links them as it goes along.

NIL When a closing parenthesis is encountered, no new CONS is created; instead, a very special Lisp object, NIL, is stored in the CDR. In this context, NIL represents the end of the list. In fact, it represents the empty list. Conceptually, when the rest of the list is empty, we've reached the end of the list. If the list reader encounters a closing parenthesis before reading any items, NIL is returned. That is, () =R=> NIL. In fact, NIL is the only representation of an empty list; it is the empty list.

NIL is also a symbol. In fact, it is a constant symbol, and its value is itself. That is, NIL=>NIL (=> indicates evaluation). Branching logic in Lisp (i.e., the if special operator) treats NIL as false, and all other objects as true. All comparisons (>,<,>=,etc.) and any functions you will write return NIL to indicate a test has failed, or any other object to indicate true: a list, a number (even 0), an array, etc. The special constant symbol T (and T=>T), is used indicate the canonical boolean value true. A number of Lisp functions return a list representing objects which satisfy some property; when nothing satisfies the function, the empty list is returned, so these functions nicely double as predicates.

To summarize, NIL is both the empty list, a symbol which indicates FALSE, and the list terminator. You can test this for yourself using the type predicate, TYPEP. Both (TYPEP NIL 'SYMBOL) and (TYPEP NIL 'LIST) return T, a constant representing a boolean true value.

the dot When a dot appears in a list, after at least one item has been read, it indicates that the next item read should be placed in the CDR of the current CONS. In this context, it is referred to as "the consing dot". It is an error for a dot to appear as the first item in a list, or alone. Thus, (1 . 2), represents a single CONS pair, with 1 in the CAR and 2 in the CDR. (See the diagram). (1 2 . 3) represents 2 CONS pairs; the second contains 2 in the CAR, and 3 in the CDR. A series of linked CONS pairs in which the last CDR contains a non-NIL object is referred to as a dotted list because it is represented with a dot between the last two elements.

list functions Although we've looked at this process from the perspective of the reader, it is important to understand that lists can also be created using Lisp functions. In fact, this is how the reader itself creates them. LIST and LIST* construct a lists and dotted lists, respectively; CONS makes a CONS pair.

Note that reading (LIST 1 2) only creates a list containing a symbol and 2 numbers - like reading (AVG VAL 4) computes a list, not the number 3; evaluating (LIST 1 2) creates a new list containing the numbers 1 and 2. Let us now turn our attention to evaluation to see how this works.

evaluation

The result of reading is the generation of a data structure. In the REPL, this is referred to as a "form", a technical term which means that the data structure is to be evaluated. Evaluation is a procedure which treats the form as code to be executed; execution results in some object(s) (the value(s)) being computed. This object is said to be the value(s) of the form. For example, 1 is the value of (AVG VAL 4).

The point of evaluation is to substitute an object (the form) with what it stands for - its value. Symbols and lists are very important forms which may have values other than themselves, and thus represent means of computing something from something else. Every other type of object (e.g., numbers, strings, arrays, user-defined data structures) is considered self-evaluating.

self-evaluation Self-evaluating forms are the simplest - the value of such a form is itself. That is, the evaluation procedure does nothing but return the object that was read. For example 1=>1, "hello"=>"hello", etc. (I use the double stem arrow to indicate evaluation; I'll sometimes add a letter to indicate a different process - e.g., =R=> for reading. The left hand side corresponds to what is typed in at the command prompt, and the right hand side to the object printed out as a result.) Interestingly, early Lisp implementors toyed with the idea of alternate evaluation mechanisms for some objects like vectors, but this idea was eventually dropped (link).

symbol evaluation When a symbol is used to hold a value, it is called a variable. A value may be any object, even another symbol or the symbol itself (NIL and T for example). In the example, the symbol VAL is defined as a variable which stores the value 2. In other words, evaluating VAL will yield 2: VAL=>2. Nothing too complicated.

how a symbol is associated with a value

assignment The most straight-forward way to assign a value to a symbol is to use the SETQ special operator. As in (SETQ symbol value).

You will usually encounter the SETF macro, (SETF place value), which provides a generalized interface to a wide variety of value assignment operations. Under the hood, SETF uses SETQ when place is a symbol. The little b infix assignment operator, {place := value} is the same as the corresponding SETF form.

It is correct, though usually not required in the interpreted environment, to declare a variable before using it. This is done with the macros DEFVAR, DEFPARAMETER or DEFCONSTANTδ. These macros establish what is called a top-level binding for the variable, and assign a value to the variable.

binding A binding is "an association between a name and that which the name denotes", according to the Lisp hyperspec. You may think of it as a space (like the CAR and CDR of CONSes) where a symbol stores an object. The space where a symbol's value is stored is called its value binding.

However, unlike the CAR and CDR storage cells of CONSes, Lisp allows you to establish bindings to be temporary. Variables may be manipulated in a local context without altering values assigned in an external context. This is done with the special operators LET, LET*, MULTIPLE-VALUE-CALL, and PROGV or macros built upon them.

LET and LET* are probably the ones you will encounter most frequently. In the adjacent code, LET is used to create a value binding for a variable, X. Inside the LET block X may be manipulated, without affecting the value outside the block. Another type of binding which we will encounter shortly is the function binding, which you may think of as a separate storage area where a symbol stores a function.

We say that the symbol VAL is bound (to the value 2), whereas the symbol AVG is unbound. Trying to evaluate it would cause an error since it has no associated value. AVG does, however, have a function binding, which comes into play when the list (AVG VAL 4) is evaluated. We say that AVG is "fbound" to the average function. Next, we'll see how lists are used to represent code.

list evaluation List evaluation depends on the first element of the list, called the operator. (This is usually a symbol, but may also be a list called a lambda expression.) The rest of the elements of the list are called the arguments:

(operator ...arguments...)

The operator governs how the list is evaluated. There are three kinds of operators: functions, special operators, and macros, each specifying a different evaluation mechanism. The operator must be associated with a piece of code, via a function binding (discussed in the figure below). If not, an error is triggered.

how a symbol is associated with a function

Symbols may be associated with functions in much the same way that they are associated with values via value bindings. This association is called a function binding. It may be regarded as a storage location in the symbol where a function is stored. A symbol which has a function binding is said to be FBOUND, and is referred to as an operator.

definition The usual way to define a function is to use the DEFUN form:

(DEFUN name lambda-list body)

DEFUN is analagous to the DEFVAR and DEFPARAMETER forms used for defining a value bindings. Defun creates a top-level function binding, creating a block of code which is stored in the symbol's function storage cell.

binding Like value bindings, function bindings can be temporarily over-ridden. This is done with the Lisp special operators, FLET and LABELS. Like LET and LET*, these operators associate functions with symbols within a block of code.

The LABELS operator has the same syntax as FLET, with the additional feature that functions may call themselves recursively, as well as calling functions defined later in the LABELS block.

function form evaluation Lists beginning with a symbol (the operator) which names a function are called function forms. The evaluator uses a simple mechanism to compute the value: each of the arguments is evaluated in sequence from left to right, and the code associated with the operator is invoked with those values. Functions are of fundamental importance in Lisp, so it will be worthwhile taking the effort to understand how they work.

definition DEFUN is the workhorse which defines functions. You should become familiar with it. The AVG function from our example could be defined as follows. The DEFUN form associates the AVG symbol with a function object, code which performs the addition of two numbers followed by division by 2:

(DEFUN AVG (X Y)
  (/ (+ X Y) 2)

We use the function by evaluating the list (AVG VAL 4). The function evaluation mechanism dictates that arguments will be evaluated in left to right order, so VAL and 4 are evaluated in turn to yield 2 and 4.

DEFUN is a macro. Don't worry about the mechanism of how the list (DEFUN ...) is evaluated. For now, just concentrate on its meaning: a function is being defined. The symbol AVG is being associated with a block of code: (/ (+ X Y) 2). This is called the body of the function. Though there is only one form here, DEFUN allows any number of Lisp forms. The value of the last form is the result.

lambda list The second argument, (X Y), is called the "lambda list". It names variables which will hold the values of the arguments when a function is invoked. In our example, X and Y are bound to 2 and 4 when (AVG VAL 4) is computed. AVG is fbound to a function object created by DEFUN. To be clear about nomenclature, we say X and Y are parameters, VAL and 4 are arguments, and 2 and 4 are the values of the arguments. For functions, "the parameters are bound to the values of the arguments".

"Invoking the function" means that the parameters are bound to the argument values, and the body is evaluated, in a manner equivalent to:

(LET* ((X VAL)
       (Y 4))
  (/ (+ X Y) 2)))

evaluation order The forms in the function body are evaluated in order. For AVG, there is only a single form, (/ (+ X Y) 2). The symbol / is also a function, so the arguments (+ X Y) and 2 are evaluated in left to right order. The symbol + names another function (like /, this is also defined by the Lisp environment), so X and Y are also each evaluated in left to right order, and the addition function is invoked with input of 2 and 4, giving 6. Note that the entire addition is completed before evaluating the second argument of /, the number 2. Numbers are self-evaluating, so 2=>2. In the final step the / function is invoked with 6 and 2, yielding 3. A function returns the value(s) of the last form of the body. So (AVG VAL 4) returns 3. The adjacent figure shows this sequence of steps.

semantics Notice that evaluation proceeds from the inner-most elements of the list to the outermost. That is, X and Y are evaluated before (+ X Y) which is evaluated before (/ (+ X Y) 2). This is called eager evaluation, inner tree reduction or call-by-value semantics. The arguments must be determined before the function is called. Other languages, notably Haskell, take the opposite approach whereby computations are deferred until the last possible moment. This alternate mechanism, called lazy evaluation, outer tree reduction or call-by-name semantics, provides some elegant solutions to problems involving recursive data structures, though computational efficiency is compromised.

Each mechanism is incomplete for practical purposes, and requires some lazy or eager facilities, respectively to make things work. The IF special operator, which we will see in a moment, is a good example of a lazy mechanism in Lisp: arguments are evaluated on if a test succeeds. All this is unimportant for the practical goal of using Lisp, but is a good example of how choice of mechanism lies at the heart of language design. Macros and special operators provide complementary capabilities to the strict eager evaluation mechanism of functions.

computation as substitution The function evaluation mechanism is simple, yet enormously powerful. It is based on a fundamental model of computation called the lambda calculus, which describes rules for substitution.

Substitution of one thing for another is arguably what computation is all about. You may think of Lisp as a series of substitutions: text is substituted with data structures which represent code (reading), code with other code (macroexpansion), code with values (evaluation), and finally objects are converted to text and output (printing).

The lambda calculus is a surprisingly deep mathematical result. One practical and powerful technique it makes possible is higher-order programming, whereby functions can return other functions as values, a topic we won't discuss here.

AVG is defined in terms of other Lisp functions. However, there is nothing that prevents them from being defined in some other language. In fact, the + and / functions might well be described in machine code for sake of efficiency. In principle, the body of a function is opaque to the user. What matters is that arguments are evaluated from left to right, and one or more results is returned.

special form evaluation Lisp's function evaluation mechanism, though powerful, is incomplete. And it is for this reason that Lisp has special operators. There are few of them; and each is designed to add some specific capability that could not otherwise be achieved using other mechanisms of the language.

When the symbol at the beginning of a list is defined as a special operator, a special syntax or evaluation mechanism (or both) is used. This is defined on a case-by-case basis in the Lisp spec. It is not possible for the user to define new special operators - they are part of the underlying language definition. We have already seen an examples of a few special operators - LET, and SETQ earlier. The complete set is shown in the adjacent figure.

Branching logic also cannot be described using function semantics. For this reason, IF is also a special operator. It is written (IF test then else), where ELSE is optional. IF always evaluates test. If test is true, then is evaluated. Otherwise, else is evaluated. There is no way to achieve this with the mechanisms of function evaluation, so IF must be defined as a special behavior of the language.

Another very important special operator is the QUOTE operator. Quote takes a single argument which it prevents from being evaluated. The quote reader macro provides a shorthand for writing quoted forms using the ' character: 'X is converted to (QUOTE X). For example, (QUOTE MYVAR) => MYVAR, and (QUOTE (* 1 2)) => (* 1 2).

macro form evaluation Macros allow you to substitute a form with another form before the value is calculated. This procedure, which I denote with =M=>, is termed macro expansion, and may be defined for both symbols and lists. The former is termed a symbol macro and the latter a compound macro, or just "macro" (since symbol macros are rarely used).

symbol =M=> form => value
(operator ...args...) =M=> form => value

Note that the form computed by macro expansion could itself be a macro form, in which case it would need to be macroexpanded in turn before the value could be calculated.

Macros are defined much the way variables and functions are. In fact, it is even possible to have local macro definitions (analogous to local function and variable bindings with LET and FLET, for example). However, an explanation really requires space of its own, so we'll deal with macros on the next page.

printing

The result of evaluation is some value, a Lisp object, which could be a number, a string, a list or other data structure. When evaluation of a top-level form (one that is directly processed by the REPL) completes, the result is printed. Results of non-toplevel forms (i.e., subforms of top-level forms, or the forms comprising the bodies of functions) are not automatically printed. Printing is the simplest of the three parts of the REPL. Each object has associated with it a function which determines what text to display. The output is sent to a place called the standard-output. If the REPL interacts with a command line, the value is printed right after the user's input. When the REPL is processing a text file, there's usually some standard location that the results are dumped to (e.g., an output window).

One of the principles of Lisp printing is that, if it's possible, the printed text, when read (i.e., by the reader) would produce a similar object. For instance, evaluating (LIST 1 2 3) produces a list of three elements, printed (1 2 3). If the reader were to read "(1 2 3)", it would produce a similar (but not identical) object. The object which was read is not identical because in this case the CONS pairs which make up the list would actually be different objects. In some cases, objects have no READable representation - that is, there is no reader syntax which could be used to produce the object.

An example of this might be the READTABLE object, which can be printed by evaluating READTABLE. In this case, an "unreadable" form is printed. This is text beginning with #< and ending with >. On Lispworks and CLisp, the readtable object prints out something like #<READTABLE 201076B4>. The hexadecimal number is some information identifying the particular object, perhaps its memory address.

read similar printing Readable objects are things which the reader could generate. Symbols are printed as the uppercase text which would produce the symbol. Lists as the text which would produce a similar list. "(list 'a)" evaluates to a list containing the symbol A, which is printed as (A). I refer to this type of print semantics as "reader-similar", since the object is similar to to what the reader would produce if it read the printed representation of the object.

eval identical printing Little b objects, called concepts, follow a different printing philosophy, which is called eval identical printing. This means that the printed representation of the object is text which if read, then evaluated would produce the same object.

As an example, the eval identical printing of the symbol A might be 'A or (QUOTE A) (as opposed to just A), since evaluating 'A or (QUOTE A) would produce the symbol object, whereas evaluating A would return the value of the symbol. There is no way to define eval identical printing for lists, since the CONS pairs representing the list will always be different, each time text is evaluated.

Eval similar printing is another possibility - like how Lisp provides read similar printing. For example, the list produced by evaluating (LIST 'A) could be printed as (LIST 'A) or '(A) or `(A), etc. Observe that evaluating any of these forms would produce another list containing only the symbol A. The new list has the same structure, although it is consists of a different CONS pair object. For this reason we say it is similar, but not identical/.

In order to preserve consistency with existing Lisp usage, little b does not implement eval similar/identical printing for symbols and lists. Currently, only concept objects are printed eval identically.

The value of eval identical/similar printing is that the computer prints back in code which can be used at the command line. The user and interpreter enter a dialogue in which they share the same language. Amongst other things, this means the user can examine and manipulate objects produced by the interpreter by cutting and pasting bits of printed text and evaluating at the command prompt. We'll cover this printing philosophy in more detail when we come to names.

summary: evaluation of forms

form type value examples
self-evaluating the form itself 1 => 1,
"ABC" => "ABC",
#(1 2) => #(1 2)
symbol the value to which the symbol is bound VAL => 10,
 when VAL is bound to 10
function
(fn ...args)
result of evaluating each argument (once only), and applying the function to the arguments (APPEND (LIST 1 2) (LIST 3 4)), (1)
 arguments are evaled in order:
 (LIST 1 2) => (1 2)
 (LIST 3 4) => (3 4)
 therefore,
(1) => (1 2 3 4)
special
(special-op ...args)
value, and whether arguments are evaluated, is determined on a case-by-case basis for each special operator (IF T 1 (ERROR "TEST FAILED")) => 1,
(IF NIL 1 (ERROR "TEST FAILED")) causes an error,
(QUOTE X) => X
macro
(macro ...args)
the form is subjected to macro expansion (=M=>), and the result is evaluated (=>). (WHEN (DO-TEST) (DO-X) (DO-Y) (DO-Z)) (1)
=M=> (IF (DO-TEST) (PROGN (DOX) (DOY) (DOZ))) (2)
=> result of evaluating (2)

Now, let's take a look at some little b code.

αThe Lisp reader is conceptually similar to readers for markup languages like XML and HTML. For instance, when a browser reads the sequence of characters "<html><body>...</body></html>", it creates connected data structures: an html structure which contains a body structure. The matching angle bracket tags mark the beginning and end of each structure. Lisp uses parenthesis - e.g., (html (body ...)) - rather than matching tags, which makes the syntax terser. And unlike markup languages, Lisp has a second phase of processing, called evaluation, in which the data is treated as code, and is executed. Thus, Lisp can be thought of as a data language and as a programming language because in Lisp, code is data.

βNotice that reading does not compute the value implied; that is handled by a later step called evaluation.

γCONS is derived from the word construct, and is also the name of the function which constructs CONS pairs.

δ Many other languages - Prolog, ML, Haskell, for example - use this approach for representing lists, and sometimes refer to the "head" and "tail", rather than the FIRST and REST or CAR and CDR.

6Paul Graham has written some excellent essays on Lisp and other topics. Beating the averages makes the case for why Lisp macros are unique in the world of computing - and always will be. His book, Ansi Common Lisp gives a lucid treatment of macros.

macros