Intro to Lisp
This is the third time I’ve rewritten the lisp page of my website from scratch. The first version was pretty good and fleshed out, but I felt it failed to convey why exactly I find a language that’s half a century old by now so exciting. I tried to fix that in the second version, narrating some of the historical background and listing its progeny, yet that felt closer to an excited rant than a meaningful argument (also, I never finished it). So, I’m trying again.
Lisp is a programming language. It was first conceived in 1959, and implemented quickly afterwards. It is thus considered the second high-level language, after FORTRAN. It is often called a ‘metaprogramming language’, because while its core does very little, most of what it does do makes it easy to extend the language to solve any problem. Thus, it pioneered a lot of concepts that are now fundamental to how we use computers, and spawned a lot of projects that ended up shaping how we use technology. Listing the properties of the language is difficult, because it predates most technical terms that are in use to describe languages - in fact, many of them emerged specifically to distinguish different implementations. Because of this difficulty, I will use ‘Lisp’ interchangeably - both to refer to some general concepts regarding data/state/program execution that were introduced by the original paper from ’59 (and the so-called ‘lambda papers’ that expanded on their implementation), and as a catch-all for programming languages that implement those features (or derive from them) - including the one that I will be implementing in later chapters of this series.
In short, my fascination stems from the fact that the aforementioned ‘metaprogramming’ aspect makes the language incredibly simple, but is also easily extended by various other concepts from within the language itself, allowing for an easy and flexible exploration of the possibilities of computing.
Since there are a lot of general topics to talk about, this series (yeah, it’s a series now) will be split in the following chapters (implemented as individual subpages):
- the introduction - you are here
- history of the language - with a long preface on the history of symbolic computation
- basics/fundamentals - aforementioned “general concepts”
- intro to functional programming (and various other paradigms) - how to use those basics to actually accomplish anything
- implementing an interpreter - ie a thing that can run that code, which is what started all this madness
- implementing a compiler - ie a thing that transforms our code into something that can run on hardware, requiring a much more in-depth analysis of what things are and how they work
So far, only the chapter on “fundamentals” has been finished, presented below.
How Lisp works
As noted in the intro, I use ‘Lisp’ to vaguely refer to a set of concepts and paradigms introduced in a series of papers throughout the last century, and a group of languages that implement those concepts (including one I made myself). This page outlines those concepts, along with some implementation details on my interpreter. It is split into sections on syntax, encoding, evaluation, environments/closures, and garbage collection.
Syntax
Lisp is a ‘functional language’. There’s much talk about what exactly that means and why it’s important, but in the original formulation, it meant that Lisp was made to model how mathematical expressions are formulated. For example:
x^3+2y^2
For us, this is a concise and usable notation, but viewed as a list of symbols on a page, it gets rather complicated to figure out what is what. For starters, let’s write the exponentiation as repeated multiplication (not quite equivalent, but multiplication is commutative, which helps make my point clearer):
x*x*x+2*y*y
Evaluating this is complicated by rules of precedence. For example, if we had only read x*x*x+2, we could not perform that addition, and would have to finish reading the entire sequence before we could start working on it. Instead, let us reorder the sequence, and say that each operator operates on the two values before it:
x x x * * 2 y y * * +
This is called ‘reverse Polish notation’ (long story short, there was this guy called Lukasiewicz who really hated parentheses and wanted a syntax that doesn’t need them - this is his ‘Polish notation’, mirrored), and works surprisingly well - many languages fundamentally use this syntax (Forth, Funge, PostScript, dc (one of the oldest programs!), even synth by yours truly), as it is easy and efficient to implement. In fact, at a fundamental level, this is exactly how a CPU computes!
The reason it works so well is that it allows us to be specific about intermediary computations. With our original polynomial x^3+2y^2, it didn’t matter which part we computed first, or how, or where we kept that computation - arguably, this ambiguity is what makes it so flexible, but also difficult to compute rigorously. Instead, reverse Polish notation uses a stack - a list of values that we either append values to, or take from the end of. x puts the value of x on the stack, * takes the last two values and pushes their product, etc. That is also why we used * twice, and how we could apply + to two values calculated so far apart - when we call * the first time, it takes the last two xs and pushes the result, the second one takes that result and the other x. When + is called, the stack contains the 2y^2 on top, and x^3 below it. Complicated as it can be for us humans to keep track of which applies to what, this is significantly easier to work with if you go through the expression a single symbol at a time, like a computer does.
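To make the stack mechanics concrete, here is a minimal C sketch of evaluating the postfix sequence above (C is my assumption here - nothing about the idea depends on the language). The values 4 and 5 for x and y are the same ones the examples further down this page end up using, so the printed result is 114:

#include <stdio.h>

/* Postfix (stack) evaluation of  x x x * * 2 y y * * +
 * Each value is pushed; each operator pops two values and pushes the result.
 * Since * and + are commutative, the pop order doesn't matter here. */
double stack[64];
int top = 0;

void push(double v) { stack[top++] = v; }
double pop(void)    { return stack[--top]; }

int main(void) {
    double x = 4, y = 5;                        /* example values */
    push(x); push(x); push(x);
    push(pop() * pop()); push(pop() * pop());   /* x*x*x on the stack */
    push(2); push(y); push(y);
    push(pop() * pop()); push(pop() * pop());   /* 2*y*y on the stack */
    push(pop() + pop());
    printf("%g\n", pop());                      /* 114 */
    return 0;
}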
We’ll do one more step. The above trick regarding ‘which applies to what’ only works if we know exactly how many arguments each function takes. This makes it convenient (and fast!) if we have a fixed set of functions as eg in a CPU, but it sure would be useful if we didn’t have that limitation and could work on arbitrary expressions. To Lukasiewicz' dismay, this is easily solved with liberal application of parentheses:
((x x x *) (2 y y *) +)
Now, instead of having to repeat * for every additional argument, we enclose all its arguments in parentheses. This also has the side effect of making it a lot easier to read at a glance - an operator always comes right before a closing paren, and everything back to the matching opening paren are its arguments (a lot of editors (including the one I am using right now to write these nested sidenotes) automatically highlight matching parentheses).
Note: This also allows us to multiply no numbers at all, as in (*). Strictly speaking, the result should be undefined, as it makes no sense in the mathematical system we are modelling, but for convenience, we’d rather have it do something meaningful. For ‘mathematical’ functions, we will have such expressions evaluate to their identity - multiplying a number by 1 or adding 0 to it doesn’t change the number, so (*) is 1 and (+) is 0. This idea of ‘have the default be something meaningful’ is only applied where it is easy to implement - for many primitives (like lambda), implementing such a ‘default behavior’ would require a lot more code, and in many cases (eg defun), it isn’t clear what that behavior would even be.
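To illustrate why the identity is the convenient default, here is a small C sketch (the product function is purely hypothetical, not part of any real implementation): n-ary multiplication written as a loop that starts from 1, so the zero-argument case falls out for free:

#include <stdio.h>

/* n-ary multiplication as a loop starting from its identity, 1.
 * With n == 0 the loop never runs and we get the identity back -
 * exactly the behavior we want for (*). Addition works the same with 0. */
int product(const int *args, int n) {
    int acc = 1;
    for (int i = 0; i < n; i++)
        acc *= args[i];
    return acc;
}

int main(void) {
    int xs[] = {2, 3, 4};
    printf("%d\n", product(xs, 3));  /* 24 */
    printf("%d\n", product(xs, 0));  /* 1, the (*) case */
    return 0;
}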
The last upgrade we will do is to flip the order - instead of the operator coming at the end of a parenthesized expression (also known as a ‘list’), we will put it at the front (this operator-first form is Lukasiewicz’s original ‘Polish notation’):
(+ (* x x x) (* 2 y y))
And believe it or not, that’s Lisp syntax - all of it! Every function works like this, even the weirder built-in ones, and that last example is in fact a valid Lisp expression one could run in pretty much any Lisp implementation - most of them will happily complain that you haven’t told them what x or y is!
Encoding
I’d gladly tell you how we could inform the Lisp what x and y are supposed to be, but first I’d like to explain the more practical issue of how such lists are encoded - in other words, how exactly would the above expression be laid out in the bits and bytes of the computer?
First off, the obvious solution - exactly the same. The 1989 IOCCC entry does precisely this - store the expressions exactly as they are entered, and operate on those. Once again, easy as it may be for us humans to parse that, it is not at all trivial to do in a computer, not to speak of efficiency.
To store expressions in a manner computers can work with conveniently, we will need at least two datatypes - one for lists, and one for all other symbols (in Lisp lingo, ‘atoms’). Some lispers parse numbers separately from nonnumerical atoms, some also differentiate lists by purpose - we’ll be keeping it simple.
The first order of business is figuring out how to differentiate the two. Internally, the computer knows only numbers - how does it know if a number is supposed to be a list or an atom? In my first implementation, I differentiated by size - if we can only store a thousand different lists, any number between zero and a thousand is referring to one of them, and all bigger numbers refer to other things (in our case, atoms). This works out well enough, but does not adapt well to other setups. If we changed the amount of available memory, we could not reuse our code. Even worse, what if we didn’t have a fixed amount of memory - what if, once we had a thousand lists, we magically acquired space for a thousand more (ya know, since malloc exists)? Or much worse, what if our memory was not contiguous - what if we stored lists and atoms in the same memory?
Memory (or rather, memory references, or if you’re being fancy, ‘pointers’) is still just numbers that tell us where to find what we are looking for. Take that number, multiply it by two, and if it points to an atom, add one. Now, if a number is even, it refers to a list, and if it’s odd, an atom - and we can divide it by two (rounding down) to figure out where the list or atom is. This is known as ‘bit tagging’ (since memory is binary, the above is done by manipulating a single bit), and it also trivially expands to multiple datatypes by using larger moduli (say, if we had four base types, we could multiply by four).
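A sketch of that tagging scheme in C (names like make_atom_ref are mine, purely for illustration):

#include <assert.h>

/* The reference is the index shifted left by one bit; the low bit says 'atom'.
 * Even references are lists, odd references are atoms, and shifting right
 * (the 'divide by two') recovers the index either way. */
typedef unsigned long ref;

ref make_list_ref(unsigned long index)  { return index << 1; }        /* even */
ref make_atom_ref(unsigned long index)  { return (index << 1) | 1; }  /* odd  */
int is_atom(ref r)                      { return r & 1; }
unsigned long index_of(ref r)           { return r >> 1; }

int main(void) {
    ref a = make_atom_ref(7);
    ref l = make_list_ref(7);
    assert(is_atom(a) && !is_atom(l));
    assert(index_of(a) == 7 && index_of(l) == 7);
    return 0;
}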
Next problem: how are lists and atoms actually stored? Let us start with atoms. Each Lisp atom is a sequence of characters. We could treat those as numbers - picolisp does just that - but numbers in a computer can only get so big. On most computer architectures, we can only encode up to four symbols in one number - and while it’s possible to write programs using words no longer than FUCK, it is quite inconvenient. So instead, we use ‘pointers’ - each number refers to where the actual data is stored. This puts no limit on how long the data actually is - for practical reasons, we will say that each atom ends with a null byte. Another convenient side effect of this is that we do not need to store atoms more than once - in parsing the example expression of the last chapter, we use the atom x three times, but only need to store it once. This also makes checking for equality easier, since instead of having to compare two atoms character by character, we can simply compare their references. This does come at the expense of having to remove duplicates when parsing the expression, but in practice this ends up being the more efficient choice (and yes, I did steal that idea from SectorLISP - more on that in the section on garbage collection at the end).
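Here is what that interning could look like, as a hedged C sketch (fixed-size table, linear search, no error handling - just enough to show that equality becomes pointer comparison):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Every distinct spelling is stored exactly once; intern() always hands back
 * the canonical copy, so two atoms are equal iff their pointers are equal. */
#define MAX_ATOMS 1024
static char *atoms[MAX_ATOMS];
static int n_atoms = 0;

const char *intern(const char *name) {
    for (int i = 0; i < n_atoms; i++)
        if (strcmp(atoms[i], name) == 0)
            return atoms[i];                 /* seen before: reuse it */
    char *copy = malloc(strlen(name) + 1);   /* first time: store a copy */
    strcpy(copy, name);
    return atoms[n_atoms++] = copy;
}

int main(void) {
    const char *a = intern("x");
    const char *b = intern("x");
    const char *c = intern("y");
    printf("%d %d\n", a == b, a == c);       /* 1 0 - equality is pointer equality */
    return 0;
}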
And how are lists stored? As with all things computer memory, they will have to be made up of numbers (telling us where to find stuff) laid out one after the other. The obvious solution would be to, for each list, put down references to its contents one after another, copying the trick we used for atoms of a zero denoting the end of a list (and making sure we don’t put anything in the first byte of memory). This is called a ‘cdr-coded list’, and like before, the most obvious solution turns out to be wrong. While it takes very little memory and is very easy to traverse, it is very difficult to manipulate. Say we wanted to insert an element into a list somewhere. We’d have to shift all elements that come after it, which is a lot of effort for a single element. And not only that, but things are no longer where they were, so we’d have to go through all memory and update all references! All that for a single element!
Note: as implied above, cdr-coded memory is easy and efficient, as long as you don’t need it to be mutable. ExhiBotanist events are written in Lisp syntax, and this is precisely how they are actually encoded inside the final binary.
As anyone who’s done a bit of programming would know, the above is roughly equivalent to an array - laid out in memory in a line, fast to access, but tedious to insert into. Its opposite is the linked list - each element stores some value, and also a reference to the next element (or zero if there isn’t one - have a guess where this practice started). Since in our case the ‘value’ is a reference to an atom or another list, both fields of an element are the same kind of thing - references. In Lisp, such an element is known as a ‘cons cell’, its first field as the ‘car’, and the second as the ‘cdr’ - names taken from the address and decrement parts of the IBM 704 machine word used in the first implementation. In fact, cons, car, and cdr are still the functions you’d use to build those cells and get at those values in Lisp, and they are our first primitive functions!
This indeed allows us to implement lists - the car of a cons cell is the first element of a list, the cdr is the rest of the list (so the car of the cdr is the second element (if the cdr is a valid list), and the car of the cdr of the cdr is the third, etc etc - some dialects implement those as additional functions called cadr, caddr etc). This is also the main reason we decided to put the operator as the first element in a list - now, any expression to be evaluated is a list, the car of that list is the operator (usually a function name), and the cdr is a list of its arguments! More importantly, this also allows us to express arbitrary structures, not just lists - binary trees are trivial (a cons cell is literally a binary node), trees of arbitrary order are easy (eg for a ternary tree, for each node, its car is its first child, the car of its cdr is the second, and the cdr of the cdr the third), etc. Inodes, used to store files and folders in a filesystem, function quite similarly.
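As a sketch, here are a cons cell and the three primitives in C - the two fields are plain pointers here, whereas a real implementation would store the tagged references from above:

#include <stdio.h>
#include <stdlib.h>

typedef struct cell {
    void *car;   /* first element                 */
    void *cdr;   /* rest of the list (NULL = end) */
} cell;

cell *cons(void *car_, void *cdr_) {
    cell *c = malloc(sizeof *c);
    c->car = car_;
    c->cdr = cdr_;
    return c;
}
void *car(cell *c) { return c->car; }
void *cdr(cell *c) { return c->cdr; }

int main(void) {
    /* the list (+ x y), with strings standing in for interned atoms */
    cell *expr = cons("+", cons("x", cons("y", NULL)));
    printf("%s %s\n", (char *)car(expr), (char *)car(cdr(expr)));   /* + x */
    return 0;
}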
Evaluation
A neat little side effect that ended up having profound implications for how Lisp works and how computing has been done ever since is that Lisp is ‘homoiconic’ - meaning, there is no difference between the programs that express computation and the data this computation is done upon. We will see below what that buys us.
Fundamentally, evaluation is about calculating the ‘value’ of an ‘expression’ (I mean, that’s why it’s called ‘evaluation’). To take the example expression from before, what is the value of (+ (* x x x) (* 2 y y))? As we can guess (and any Lisp will gladly complain to you about), this depends on the values of x and y. In mathematical syntax, this is expressed as f(x,y)=x^3+2y^2; in Lisp, we reuse the lambda calculus that Lisp was founded on, and use a primitive function called lambda - it defines a function, its first argument is the list of the function’s arguments, and its second argument is the body (/expression) of that function. In our case, that would be (lambda (x y) (+ (* x x x) (* 2 y y))). And how would we call this function? Remember when we said ‘the first element of an expression is a function, and the rest its arguments’? This still holds if the first element is not an atom - ((lambda (x y) (+ (* x x x) (* 2 y y))) 4 5) is a valid function application, and (in most lispers) evaluates to 114.
Note: lambda thus allows us to specify our own functions (we’ll get to how to name them in the next chapter). These user-defined functions require evaluating an expression with so-called ‘free’ variables, as opposed to the ‘primitive’ functions I’ve mentioned multiple times before, which we have to hardcode into our interpreter and are not expressed as in-language constructs (usually - metacircular evaluators (mentioned below) are an exception to this).
Note: if we wanted to keep with the pattern of the arithmetic functions, (lambda) should return the identity function, ie a function that evaluates to whatever we give it as an argument. However, the result of no-argument arithmetic was undefined to begin with and we only specified our values for convenience, while implementing this would require more code. So, if we try to use a function with no arguments and no body, we will simply crash.
But how do we get 114? How exactly do we figure out that x has the value 4? We took lambda from the lambda calculus, and likewise, our lambdas work based on substitution - we know x is 4, so we can go through the body of the lambda and replace every x with 4. Conceptually, that’s how Lisp computes, but there are a couple of relevant implementation details. First off, what exactly are we substituting? Say, what if the 4 wasn’t a 4, but some other expression? Take the following:
((lambda (x y) (+ (* x x x) (* 2 y y))) (* 2 2) (+ 1 1 1 1 1))
The end result is still 114, but what is x? Following the above, we would take the arguments and put them into the lambda body (this is known as ‘call by name’):
((lambda (x y) (+ (* x x x) (* 2 y y))) (* 2 2) (+ 1 1 1 1 1))
= (+ (* (* 2 2) (* 2 2) (* 2 2)) (* 2 (+ 1 1 1 1 1) (+ 1 1 1 1 1)))
Damn, that expression is so long, I didn’t even bother writing it out and copy-pasted it from the previous one instead! This form is easier to reason about mathematically (implementations of the lambda calculus use it, for example John Tromp’s binary lambda calculus), but it should be obvious that it is quite inefficient to work with. It would make so much more sense to just calculate that the first argument is 4 and the second is 5 and use those, no? (this is called ‘call by value’)
As it turns out, neither option really works on its own. Calling by name is not only inefficient, it doesn’t let us compute anything - a + would never get to add numbers, only to substitute them. Clearly, at least some expressions (say, the primitive ones) have to be called by value. On the other hand, if we called everything by value, we would be evaluating every argument before calling a function. This might not immediately seem like a problem, but it makes conditionals impossible - we cannot have code that is not evaluated (which also defeats the whole purpose of homoiconicity - we cannot have programs that modify other programs, since that requires the ‘other programs’ to not be evaluated while we operate on them). So clearly, some expressions have to be called by name.
So what’s the minimum amount of work we have to do here? We could call everything by name, and have only one function that actually evaluates its arguments - let’s call it eval. It just so happens that to implement eval, we’d have to implement the entire language in it (since the completely substituted expression is, in fact, an expression of our language). This is how many Lisp implementations are conceptualised (technically called a ‘metacircular evaluator’ - William Byrd has done some cool work with those), but it doesn’t actually solve our problem, it just puts another layer of abstraction on it.
Could we instead call everything by value, but have a single function that does not evaluate its argument(s)? Let’s call it quote, since in literature, quoting is the act of inserting other text without explicitly making it a part of its surroundings. This actually works out pretty well - so well, in fact, that many implementations add a shorthand for quote, so instead of (quote + 2 3) we could also write '(+ 2 3). Please note how this creates a list that is not evaluated as a function call - in essence, encapsulating the difference between code and data. We will not be implementing that apostrophe, though, as it makes parsing much more complicated. For convenience, however, we will also call some other primitives by name - we already have to make our code capable of recognising a special form whose arguments it does not evaluate, so what’s a couple more? For example, it would be quite annoying to quote the body of a lambda every time, or to specify that the clauses of cond should not be evaluated unless their conditions are met. There’s been a lot of debate regarding which primitive functions should evaluate their arguments and which shouldn’t, and all the side effects that follow (aptly started by a paper called “CONS should not EVAL its arguments”). The ‘main’ side effect is that we don’t compute values unless we need them, but that gets complicated - more on that below under efficiency.
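To make the shape of this concrete, here is a hedged C sketch of the dispatch inside eval - the point being that special forms are recognised before any argument is touched. The cell type and intern() are the ones sketched under ‘Encoding’; lookup(), eval_args() and apply() are only declared, since this is about the order of operations, not a working interpreter:

typedef struct cell { void *car, *cdr; } cell;

int         is_atom(void *expr);               /* the tag check from 'Encoding'  */
const char *intern(const char *name);
void       *lookup(void *atom, void *env);     /* atom -> its value (or itself)  */
cell       *eval_args(cell *args, void *env);  /* evaluate each argument in turn */
void       *apply(void *fn, cell *args, void *env);

void *eval(void *expr, void *env) {
    if (is_atom(expr))
        return lookup(expr, env);              /* 2 is 2, x is whatever it's bound to */

    cell *list = expr;
    if (list->car == intern("quote"))
        return list->cdr;                      /* called by name: return the data untouched */
    /* lambda, cond, defun are checked here the same way, arguments untouched */

    void *fn = eval(list->car, env);           /* everything else is called by value */
    return apply(fn, eval_args(list->cdr, env), env);
}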
Note: we could, of course, also go a bit apeshit with how we denote and interpret arguments, and have each argument of every function explicitly note whether it is evaluated or not, or even whether it’s treated as an individual argument or as part of an argument list (which was the inspiration for C’s varargs and how main handles command-line arguments). If this sounds like your cup of tea, you might like how picolisp handles arguments (and picolisp being a low-level Lisp in general). ‘Varargs’ is actually probably the easiest way to implement this, ie ‘a function gets a list of arguments, which we will thus treat as a list named args’, however there isn’t much to this beyond a neat conceptual model that is occasionally useful, eg if we want functions that accept an arbitrary number of arguments, like the arithmetic ones. We will not be implementing anything for this - if you want a function that can take any number of arguments, write a function that takes one argument, and make that argument a list.
Note: ah, speaking of picolisp, we will copy its behavior that quote returns all its arguments, and not just the first one (as you might have guessed from the above examples). This is quite unusual for lispers and will require some modifications to code you copy from stackoverflow, but it elegantly lets us quote more complicated expressions, and also means that a no-argument (quote) returns nil without needing any special handling.
Speaking of cond, this is how we implement conditionals. We could have used if (in (if X Y Z), X is evaluated, and if it evaluates to anything but nil, the entire expression evaluates to Y, otherwise to Z) or pattern matching (which Haskell took and went wild with), but both turn out to be easily implementable with cond (implementing cond with either of the others is possible, but a good bit more tedious). cond takes a list of clauses. For each clause, it evaluates its car, and if that isn’t nil, the entire cond evaluates to (the value of) that clause’s cdr; otherwise it continues with the next clause.
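Continuing the sketch from above, cond could look roughly like this - each clause is a cons cell holding a condition in its car and a body in its cdr, and nothing gets evaluated until its condition has already passed (NIL and the end-of-list handling are simplified assumptions here):

typedef struct cell { void *car, *cdr; } cell;
extern void *eval(void *expr, void *env);
extern void *NIL;                              /* the false / empty value */

void *eval_cond(cell *clauses, void *env) {
    for (cell *c = clauses; c != NULL; c = c->cdr) {
        cell *clause = c->car;                 /* (condition . body)           */
        if (eval(clause->car, env) != NIL)     /* evaluate the condition,      */
            return eval(clause->cdr, env);     /* and only then touch the body */
    }
    return NIL;                                /* no clause matched */
}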
Note: in case ‘if it isn’t nil’ didn’t imply so, nil is also a primitive function: it returns nil, ie a zero byte, ie ‘false’, if its argument isn’t nil, and a truth value otherwise. It thus serves as the boolean ‘not’ operator. Anything besides nil is considered a true value, though for clarity we might want it to specifically return the atom t, which, if treated as a variable, has the value t.
Regarding efficiency: call-by-name might seem useless and inefficient (since we have quote for metaprogramming, and everything else just makes expressions longer), but it lets us optimise code in ways otherwise impossible. Take the humble for loop, in Lisp usually implemented as a macro (a macro is a piece of code that doesn’t do any computation itself, but restructures other code). A for loop that iterates over a thousand integers would initialise a loop variable, evaluate some user code, increment the variable (add 1 to it), check if it matches something, and if it doesn’t, loop again (I am not giving an example here, since this would require imperative operators (‘set variable to value’) that we do not have). Notably, every iteration requires us to increment the variable - thus, over and over, we call (set var (+ var 1)). The + might also be polymorphic, meaning we actually have multiple primitives, and choose which one to call depending on the arguments (which makes no sense for us, since our language is not typed, but it is easy to imagine a dialect where we might want (+ 2 3) to evaluate to 5 and (+ a b) to evaluate to ab - in which case we’d have one primitive for arithmetic and one for strings, and choose which one to use depending on the arguments). If we called the for loop (specifically, the macro, ie without any user code inside) by name, it would result in a significantly larger expression, but in this expression, we could remove the polymorphic addition (since we now know the arguments will always be integers), and could even optimise the addition itself (since we now know the second argument will always be 1, and most computer architectures have special instructions for this case).
We could have done this by hand, of course, but that defeats the purpose - we would then need a primitive for every conceivable function, in which case what’s even the point of defining our own functions? The above optimisations require an optimiser, sure, but they can operate on all aspects of the language. For example, the aforementioned ‘lazy evaluation’ means that when we pass an argument to a function, we don’t actually bother to calculate its value until it is needed - we can also combine this with memoization, ie storing that computed value so we don’t have to compute it again next time - thus avoiding the inefficiency inherent in calling by name. More interestingly, since we do not actually compute values until we need them, we can use this to implement infinite structures - in Haskell (which, friendly reminder, is a Lisp dialect), [x*x|x<-[1..]] is the list of all square numbers, all infinitely many of them. [a..b] is a list of numbers from a to b - but, since we didn’t give it a b, it would keep going indefinitely. The rest says that we should take numbers from that list, square them, and make a new list from those squares (the page on ‘functional programming fundamentals’ explains how and why map works in more detail). Usually, this wouldn’t be possible, since we’d first have to finish making that infinite list, but Haskell is ‘lazy’ in that it remembers how to make that list, but doesn’t actually do so until it has to.
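For the curious, the usual way this ‘remember how, compute later, compute at most once’ trick is implemented is a memoised thunk. A hedged C sketch of the general idea (this is the concept of call-by-need, not how Haskell literally does it):

#include <stdio.h>

/* A thunk remembers how to compute a value, computes it at most once when
 * forced, and caches the result for every later use. */
typedef struct {
    int  forced;            /* computed yet?             */
    int  value;             /* cached result once forced */
    int (*compute)(void);   /* how to compute it         */
} thunk;

int force(thunk *t) {
    if (!t->forced) {
        t->value  = t->compute();   /* the work happens here, and only once */
        t->forced = 1;
    }
    return t->value;
}

int expensive(void) { puts("computing..."); return 42; }

int main(void) {
    thunk t = {0, 0, expensive};
    /* nothing is computed until someone actually needs the value */
    printf("%d\n", force(&t));   /* prints "computing..." then 42 */
    printf("%d\n", force(&t));   /* prints 42 only - the work is not repeated */
    return 0;
}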
Environments and closures
As noted above, substitution is how Lisp works semantically, and there are a couple implementation issues, of which evaluation of arguments is one. The other (main) one is that substitution is just too damn inefficient. We want to calculate the value of an expression, so first we have to go through that expression and modify it. But also, the very act of ‘going through the expression’ requires us to call some other code, which in turn needs our expression to be substituted into it, which in turn requires calling some other code, et cetera ad absurdum. And even if we got around that - substituting something in an expression would require us to create another expression which takes at least the same amount of memory - in other words, we’d be needing more memory for every single evaluation step!
All the way back at the beginning of the chapter on evaluation, I wrote that the purpose is to calculate the value of an expression. We figured out how to deal with compound expressions like (+ (* x x x) (* 2 y y)), but 2 by itself is also a valid expression - how would we ‘calculate’ that its value is the number 2? Or that * denotes the multiplication primitive? Or that (inside the lambda of our examples), x has the value 4? This is done via environments - instead of substituting variables for their values, we keep an environment that lists all variables and what their values are. Thus, when we evaluate the lambda, we create a new environment, in which we bind x to 4 and y to 5, and evaluate the body of the lambda inside this new environment.
But wouldn’t this have the dreadful implication that we’d have to supply a definition for every number, every possible atom, in order to know that 2 evaluates to 2? Not really - we will say that any atom (here functioning as a variable name) that we cannot find in the environment simply evaluates to itself (aptly called a ‘self-evaluating form’). Once again, this is not how most Lisps handle this - usually, an undefined variable either evaluates to nil or is an error. However, we are already mangling the language a bit by not differentiating between ‘symbols’ and ‘numbers’ - ((lambda (2) (* 2 2)) 3) is a valid expression in our language, and happily returns 9. In addition, the primitive functions are not too special either, which usually has more disastrous consequences. For a more clever implementation (like my in-progress lisp system, which I hope to remember to link in the last chapter when I finish it), we might want numbers to be self-evaluating, and have functions be bound to numbers, which in turn tell us where to find the compiled code of that function. Unfortunately, this only works if we are actually dealing with compiled code. So instead, we will have all atoms be self-evaluating, and we’ll check for primitives manually.
Environments as explained above have two more implications. First, we have a global environment that encompasses all others. If we define a function factorial, this is the environment where we bind that name to the lambda of the function (remember how we described lambdas as ‘anonymous functions’?). This definition of a function is done with the defun primitive (and people say naming things is hard). The other implication is that environments are recursive - when the factorial function calls the * primitive, that primitive (being a function) is evaluated within an environment. That environment is a part of the environment of the factorial function, which defines its arguments, and the factorial environment is in turn a part of the global environment. This can, as most things in life, also be accomplished with a linked list - the environment is a list of variable names and the values they have, ended off with a reference to the environment above it (or nil for the global one). Traditionally, lispers are a bit cleverer with this and limit which environments a given expression has access to, so eg the + primitive would not be able to access the arguments of factorial. As with argument evaluation for primitives, this can have some interesting consequences, but we won’t be bothering with that. Likewise, we will let defun bind any symbol to any value (ie it evaluates its arguments, but assumes the first one will evaluate to an atom), and we will make it operate on the global environment only. This will disappoint a lot of programmers used to the ‘classical’ definition where we can use it to modify any environment, however that seems a bit redundant, since we already have a tool for that - it’s called ‘function arguments’.
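Here is a hedged C sketch of that environment chain - bindings in a linked list, each environment pointing at its parent, lookup walking outwards with the newest binding winning. (Plain ints and strcmp keep it self-contained; with interned atoms the comparison would just be a pointer comparison, and a closure would simply be a function value holding a pointer to one of these env structs.)

#include <stdio.h>
#include <string.h>

typedef struct binding {
    const char     *name;
    int             value;      /* plain ints so the sketch stays runnable */
    struct binding *next;
} binding;

typedef struct env {
    binding    *bindings;
    struct env *parent;         /* NULL for the global environment */
} env;

int lookup(env *e, const char *name, int *out) {
    for (; e != NULL; e = e->parent)                    /* walk outwards...        */
        for (binding *b = e->bindings; b; b = b->next)  /* ...newest binding first */
            if (strcmp(b->name, name) == 0) { *out = b->value; return 1; }
    return 0;                                           /* unbound */
}

int main(void) {
    binding gx = {"x", 10, NULL};
    env global = {&gx, NULL};
    binding lx = {"x", 4, NULL};
    binding ly = {"y", 5, &lx};
    env call   = {&ly, &global};     /* the environment of a lambda call */

    int v;
    lookup(&call, "x", &v);   printf("%d\n", v);   /* 4  - the argument shadows the global x */
    lookup(&global, "x", &v); printf("%d\n", v);   /* 10 - the global binding is untouched   */
    return 0;
}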
A more interesting behavior arises when we combine this with metaprogramming, ie functions that operate on functions. The function (lambda (x y) (+ x y)) takes two numbers and returns their sum, and if we evaluated ((lambda (x y) (+ x y)) 4 5) we’d get 9. However, what about (lambda (x) (lambda (y) (+ x y)))? It accomplishes the same thing, but instead it is a function of one argument that returns another function of one argument, and that second function then gives us the result. This is called ‘currying’, and is another neat trick Haskell uses to optimise code, in part because it works great with the ‘pass all arguments as a list’ idea that I mentioned before, and because it means that you only ever have to figure out functions of single arguments. Sidenote: currying is named not after the dish, but after the mathematician H. Curry, who came up with a lot of neat tricks regarding the lambda calculus - and his first name was Haskell, which the language is named after.
But back to the problem - how would we evaluate this? If we think of lambdas as being implemented via substitution, we can see that it should work, but how do we accomplish the same with environments? The trick is in what lambda does. It defines a function, yes, it denotes what its arguments and its body are, but it also stores its environment. In other words, when we call (defun sum (lambda (x y) (+ x y))), the defun evaluates its arguments, and when the lambda is evaluated, the resulting object also contains a reference to the environment said lambda was evaluated in. Thus, even if we redefined what + was later, when sum is called, its body is evaluated in an environment that still binds it to the original meaning.
That sounds complicated. Let us do an example instead.
(defun x 5)
This sets x equal to five. In other words, it creates a binding in the global environment that declares that the value of the atom x is the atom 5.
(defun add5 (lambda (y) (+ x y)))
This defines a global function called add5 that takes an argument and adds x to it. Note that we do not pass x as an argument - it is looked up in the environment.
(defun x 10)
This sets x equal to ten. In other words, if we now asked our lisper for the value of x, it will look it up, find the binding we just created, and return 10.
So what will (add5 5) evaluate to? Surprisingly, 10! When we defined add5, it evaluated the lambda expression, resulting in a function. Cleverly, the lambda expression made sure that function included a reference to the global environment as it was at that moment - still containing the binding of x to 5 - and it is able to use that binding even though we have since redefined x in the global environment!
Garbage collection
I was torn on whether I should put this here or in the implementation chapter that follows, but the text regarding closures settled it. I phrased defun as “creating a binding in the global environment”. Importantly, this still happens even if a binding for that variable already exists. This is precisely how closures can reference the values of variables that have been overwritten - their bindings still exist, but are just overshadowed by later bindings. However, if we don’t actually make a closure, this has the side effect of creating bindings that we don’t need and can’t use - one of various kinds of ‘memory garbage’. Likewise, intermediary values, for example those created by cond testing its conditions, will remain in memory even though they are no longer in use.
Creatively, the act of cleaning up this useless memory is called garbage collection. Ideally, we’d rather not do it, as it takes time and can be unpredictable, which is why most fast/low-level programming languages require the user to handle memory manually. However, since we’ve taken a more functional approach (we don’t really have the concept of ‘memory’ coded inside the language - though the chapter on functional programming explains how we can introduce it), we cannot throw out our own garbage, and require a collector.
Well, not really. We can also just not deal with it. But just like in real life, it will accumulate, clog up our processing, and eventually lead to ruin. This does not matter for small programs or prototyping, hence the collector’s implementation will be optional.
For starters, we’ll need something more specific for memory allocation. Until now I haven’t brought it up, and the assumption has been that we have a pointer that tells us where free memory begins, and if we need more memory, we can just take some - and if there isn’t any left, the program crashes. With garbage collection, we can recycle memory (unfortunately not a technical term), however this newly freed memory won’t be all in one chunk, but spread out throughout all the memory we have access to. So instead of a block, we will keep a linked list of free memory cells - since memory will be organised in cons cells and we have to use functions that operate on those, we might as well. Thus, if we want memory, we take elements from that list, and if we free some, we put it back on it.
But how do we actually figure out which memory we can recycle, and which is still in use? It just so happens that all the memory we are using is either linked in the current expression, or in the global environment - since, by definition, we are not using anything else. We can also be clever and wait until we have finished computing all expressions (eg after we have printed results in a REPL, or between elements in a file) to collect garbage, making our job a bit easier. In either case, we have a pointer to something that links to everything else we care about (eg the current expression, which in turn links to its environment (the surrounding expression), which links to its closures, etc etc). We can then recursively follow everything it links to, ‘marking’ it as we go, which involves setting a bit somewhere in the pointer we wouldn’t otherwise use. This way, if we are about to follow a pointer that is already marked, we know we have already been there, and don’t fall into a loop. Once we’ve marked everything we can access, we go through all available memory. All entries that are still unmarked we can put in the free list to be reused, and all marked entries we unmark to make sure the pointers are correct once again. This is fittingly called a ‘mark-and-sweep’ garbage collector.
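A hedged C sketch of the two phases over a fixed pool of cells - indices instead of raw pointers, the mark kept in its own field rather than a spare pointer bit, atoms and the allocation side left out, purely to show the mark and sweep passes:

#define POOL 1024
#define NIL  (-1)

typedef struct { int car, cdr, marked; } cell;
static cell pool[POOL];
static int  freelist = NIL;               /* free cells, threaded through cdr */

void mark(int ref) {
    if (ref == NIL || pool[ref].marked) return;   /* already visited: no loops */
    pool[ref].marked = 1;
    mark(pool[ref].car);
    mark(pool[ref].cdr);
}

void sweep(void) {
    for (int i = 0; i < POOL; i++) {
        if (pool[i].marked) {
            pool[i].marked = 0;           /* keep it, reset for the next collection */
        } else {
            pool[i].cdr = freelist;       /* recycle: thread it onto the free list  */
            freelist    = i;
        }
    }
}

void collect(int current_expr, int global_env) {
    freelist = NIL;
    mark(current_expr);                   /* everything reachable from here...      */
    mark(global_env);                     /* ...or from here survives               */
    sweep();
}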
This doesn’t really solve the issue of the shadowed bindings. In practice, this isn’t much of a problem (since we’ll tend to write code that doesn’t make a lot of those, and all intermediary values will be cleaned up once the expressions are terminated, since those expressions are then no longer linked to - isn’t it convenient how we decided to instantiate temporary variables as function arguments, and it automagically solved this issue for us?). If we really wanted to tackle it, we’d need an extra pass between the mark and sweep phases where we go through the environments only and check whether bindings are shadowed, via a temporary list (that we can take from and return to the free list as needed - we don’t update that one until after this step). Though, I do admit that I cannot elaborate much on this, as I’ve never had the need to actually implement it.
A ‘mark-and-sweep’ garbage collector is a kind of tracing garbage collector, since it traces references to values (in the ‘mark’ phase). It is neat in that it is pretty easy to implement, but it can be rather clumsy - Java uses something similar (specifically, a generational tracing gc - in addition to the above, it also sorts references by age, since new ones tend to expire quicker (eg intermediary values to primitives) and thus need to be checked more often), and it causes noticeable ticks whenever it runs. In comparison, reference counting garbage collectors take a different approach - for every object, we keep a counter telling us how many times it is referenced. Whenever we create or expire a reference, we update that counter - if it reaches 0, we know that object is no longer being referenced, and it can be safely deleted. Python uses this (once again coupled with generations, since it improves performance). Reference counters are convenient because they are generally faster and more efficient, and also concurrent - the garbage collection is done while our code is being evaluated, so we do not have to figure out when we can interrupt our code and introduce a brief lag while we recycle memory. Unfortunately, they also have a few downsides - while tracing gcs only need a single bit per cell (a 32-bit processor would still be able to reference over two billion objects even without being allowed this bit), refcounting ones need an additional integer for every object (well, if you are already doing weird object-oriented stuff like Python does, this probably wouldn’t make much of a difference anyway). Refcounters also have to be atomic (meaning that the counter update and the optional deletion have to always be treated as a single operation and may not be interrupted by scheduling or similar), have a much harder time dealing with cycles (a notable problem in Python’s case), and are quite a bit more complicated to implement.
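The reference-counting side, again as a hedged C sketch of the general idea (the retain/release names are borrowed convention, not any particular runtime):

#include <stdlib.h>

typedef struct obj {
    int refcount;
    struct obj *child;        /* whatever this object references, if anything */
} obj;

obj *retain(obj *o) {         /* a new reference appears */
    if (o) o->refcount++;
    return o;
}

void release(obj *o) {        /* a reference goes away */
    if (o && --o->refcount == 0) {
        release(o->child);    /* dropping us drops whatever we referenced */
        free(o);
    }
    /* note: two objects referencing each other never reach zero -
     * the cycle problem mentioned above */
}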
Also notable is SectorLISP’s garbage collector (I mean, pretty much everything about that project is noteworthy, but we are currently in the section on garbage collectors - check out the link, they explain a lot of stuff in detail (I probably should have read that instead of trying to figure out how the IOCCC entry works, huh)). It was the inspiration for my tracing collector. In just 40 bytes of assembly, they coded a function that, once an expression has been evaluated to its final value, can ‘extract’ that value from all the working memory the evaluation required, and then ‘pack’ it back into memory, adjusting all references accordingly. This ‘packing’ has the benefit that all memory that ends up being relevant is contiguous, without gaps - therefore SectorLISP doesn’t need a linked list to keep track of free memory, and gets by with just a single pointer! However, while small and efficient, this requires a very good grip on how exactly Lisp cells use memory, and I decided it was too complicated to include in our implementation.