Lisp interpreter
We know enough about how Lisp works and what we could use it for. It sure would be grand if we actually had something to make that happen. I’ll explain how to write one, plus a suite of modifications we could implement to varying effects (type tagging, garbage collectors, foreign calls, etc).
The interpreters will be in C. It is a language high-level enough as to be portable and conveniently usable (don’t worry, we’ll deal with assembly in the next article), whilst still being simple and hardware-oriented enough as to require us to think about the specifics of the implementation (such as parsing input, encoding data, and memory management) - details that most evaluators tend to glance over.
Edit 20250727: Yeah so I was mostly done with this file before I came up with a tiny tweak I could add to the interpreter, and now it’s been over a year and there’s a lot more stuff going on. The concepts page will be fun to amend, but I’ll probably have to rewrite this file from scratch yet again. Yet, it’s the most recent and complete version I have, so here it is.
Outline
Before we get started, we should know what exactly it is we will be making. This will be a Lisp interpreter, meaning our program will take Lisp code as input and do whatever that code specifies. This will require us to keep track of data, evaluation, and environments. Being homoiconic (‘code and data are the same thing in our language’), we don’t need much to store stuff - the expressions we are evaluating and the data they operate on can use the same memory, and with a bit of care, so can the environments they use. We need said environments to keep track of the values of variables (a problem that gets significantly more complicated in the compiler). The evaluation is easy - the execution order of the interpreter mirrors that of the statement being interpreted, meaning we don’t need to think much about it (in practical terms, this means the C callstack is the one used to accomplish recursion etc).
We will not be tagging pointers nor collecting garbage in the initial implementation, to keep it simple. We will have separate memory for atoms and lists - this works out well, since we will be implementing a purely functional subset of Lisp (ie no set!
or similar). We will kludge together some arithmetic primitives to show the principle, but it will not be a particularly powerful language in that regard - type tagging would allow us to do a lot more there.
Memory
Well, let’s get started. We will need some standard libraries, some memory for atoms and lists, and pointers (rather, indices) to that memory telling us how much of it is still available. Implementing pointers as array indices is convenient, because it allows us to use a simple int
type for all internal data (well, except atoms - we could, but we’d be wasting a lot of memory, and would need extra conversion routines):
#include<stdio.h>
#include<stdlib.h> ; exit() on memory overflows
#define MEMSIZE 10000
int lm,ls,mem[MEMSIZE];
char str[MEMSIZE],la;
We have memory, so let’s write some routines for using it - implemented as define
macros, since it makes little sense to write entire functions for simple parts like this. We’d like car
and cdr
for list traversal, iscons
and isstr
telling us whether an element is an atom or a list, and getstr
that takes our internal representation and returns a C-style null-terminated ASCII string (since our atomic memory is already implemented with those, that’s just a matter of returning a valid pointer). Since we don’t tag pointers (technically, everything inside the list memory is a pointer), we’ll need some other way of figuring out what they point to. Since we have a fixed amount of memory anyway, we could say that any value less than MAX is pointing to something inside list memory, and anything above that is inside the atomic memory - as if the two were laid out one after another in memory (which they also happen to be, but trying to make use of that fact will require some nasty code tricks that are not worth it here). As for car
and cdr
, a pointer to a concell points to its first element, and the second is placed right after - meaning, implementing the two takes a simple dereferencing.
#define car(x) mem[x]
#define cdr(x) mem[x+1]
#define iscons(x) (0<x&&x<MEMSIZE)
#define isstr(x) (MEMSIZE<x&&x<2*MEMSIZE)
#define getstr(x) (str+x-MAX)
We’ll need a way to make use of this memory. Lists are a bit easier - the cons
primitive allocates memory for a concell (via the lm
index), then puts new values inside that memory. We can conveniently also check for memory overflows here:
int cons(int car,int cdr){
mem[lm++]=car;mem[lm++]=cdr;
if(lm>MEMSIZE){printf("out of cons memory!\n");exit(-1);}
return lm-2;
}
Atomic memory is harder - appending a new string to our existing memory is easy enough, however the signifying property of atoms is that they only occur once - meaning, if we try inserting the atom defun-macro
when it is already present in memory, we should return a reference to the previous instance instead of making a new one.
To do this, we will put the string we want to internalize (technical term for the above) at the end of our string memory, without advancing the memory pointer (the ls
in the first piece of code that tells us how much atomic memory we’ve used up). intern()
can then go through all previous atoms and try matching it (essentially reimplementing strcmp
). If we did not find the new string, we advance our memory pointer to capture it, making it a proper atom. If we did find a match, however, we return a reference to that without advancing the pointer - meaning that it is not considered part of memory, and gets overwritten the next time we try interning a string. The code for this was inspired by SectorLISP and looks a bit messy:
int intern(){
for(int =0;i<ls;i++){
char c=str[i++];
for(int j=0;;j++){
if(c!=str[ls+j])break;
if(!c)return i-j-1+MEMSIZE;
c=str[i++];
}while(str[i++]);
}
; no match found, intern new atom
int ret=ls;while(str[ls++]);
if(ls>MEMSIZE){printf("out of atomic memory!\n");exit(-1);}
return ret+MEMSIZE;
}
Speaking of atomic memory, we’d like it to contain atoms for the built-in functions when our interpreter is started. This is technically not necessary (they will get filled in as we encounter them during execution), but putting them in ourselves on a blank slate means we know exactly where they will be in memory - which makes it a lot easier to check for them in the body of the interpreter. While we’re at it, let’s combine this with some more general initialisations like initializing memory and the pointers:
void init(){
const char initmem[71]="nil\0t\0defun\0lambda\0cond\0atom\0"
"eq\0car\0cdr\0cons\0quote\0print\0mem\0+\0-\0*\0/\0%\0";
lm=2,ls=71;
for(int i=0;i<MAXMEM;i++)mem[i]=str[i]=0;
for(int i=0;i<ls;i++)str[i]=initmem[i];
}
In Lisp, nil
is a special value meaning ‘end of list’. We’d have to encode this somehow. Since it is supposed to be nothing, it would be convenient if we could encode it as a pointer whose value is zero. But oh noes, that is still a valid pointer! What if there was something in that memory?! Once again, the solution to this is stupidly simple: don’t put anything in that memory. Our memory is initialised to all zeroes (ie everything points) to nil, and by setting the initial memory pointer to 2 (ie skipping the first concell - since concells each store two values, they take two indices), we make sure the nil
concell remains empty (well okay, we actually end up putting stuff there later on, but as long as we make sure to check for nil, it all works out). Likewise, we make “nil” (meaning, the letters N then I then L then a null byte) the first atom into the atomic memory - thus, if we made an atom with zero offset, it’ll be treated as the atom nil
(meaning, it’ll be mapped to the string “nil” - ideally we’d keep the two concepts separate, but it’s helpful to have a meaningful value for both. Also, I was not arsed to recalculate the hardcoded magic constants below.)
Parsing
The code/data we are to operate on is supplied as a string. We’ll need a method of converting it into the nested lists of atoms that Lisp internally works with. For this we need a parser. Usually this is split into a tokenizer and the like, but we won’t need that for a simple syntax like ours. We will, however, need some helper functions:
FILE*fd;
char getnext(){return la=getc(fd);}
int iswhite(char a){return a==' '||a=='\n'||a=='\t';}
char white(){while(iswhite(la))getnext();return la;}
The char la
we declared all the way at the beginning is the ‘lookahead character’ - ie, we store the last byte we read so we don’t have to keep asking for it. getnext
is the function that actually asks for it, using a file pointer - we set that one in our main
function to point to wherever our code is. The unfortunately named iswhite
function tells us whether a character is whitespace - spaces, tabs, and newlines, all characters we don’t actually consider parts of our code. white
uses that function to skip over them. Here is a slight modification to the last function from the ExBot source to also skip over comments (denoted with ;
):
char white(){
while(1){
if(iswhite(la))getnext();else
if(la==';')while(la!=' ')getnext();
else break;}
}
Now, for the actual parser. Technically it also includes a tokenizer - however, the only tokens we care about are “parenthesis” and “not that”, so that part’s comparatively easy.
If the next token is a closing paren, the list we are currently reading is over - we return zero to nil-terminate it. We will also treat EOF (end-of-file) as a closing paren in order to close off any lists that might still be open by the end of input. Otherwise, we allocate memory for a concell - whatever else we parse, it will be part of a list, thus it will need a list entry.
If the next token is an opening paren, the next thing is a list. Fortunately, we can simply call the parse function recursively to parse it and get a pointer. We allocated memory first because otherwise it would’ve been overwritten by the recursive call.
Everything else we treat as an atom. We fetch until the next whitespace or parenthesis (that’s the tokenizer part), putting input at the end of string memory, then calling intern
from before to take care of the rest.
Regardless of what we parsed in, we put it in our allocated list entry, and call parse
recursively to get us the rest of the list in the cdr
of the entry. Then we return said entry. In code, all that looks as follows:
int parse(){
if(lm>=MEMSIZE||ls>=MEMSIZE){printf("out of parse memory!\n");exit(-1);}
white();if(la==')'||la==EOF){getnext();return 0;} // nil-terminate
int ret=lm;lm+=2; // allocate one concell
if(la=='(')getnext(),mem[ret]=parse(); // recurse for lists
else{char*s=str+ls; // atomic - put here
while(!iswhite(la)&&la!='('&&la!=')'&&la!=EOF&&ls<MAX)
*s++=la,getnext();
*s=0;mem[ret]=intern(); // null-terminate, then intern
}
mem[ret+1]=parse(); // recurse for rest of list
return ret;
}
Printing
We can convert input, would be nice if we also had output. In this case it’s slightly more complicated than usual since we have environments, which in some cases can end up being circular. This can easily be handled by just passing the environment as well, and adding some code to make sure we don’t output it. We’ll also use a wrapper to add a newline at the end of input, since we output everything on a single line, and that would otherwise mess up output.
void pprint_(int expr,int env){
if(!i){printf("<nil>");return;}
if(isstr(i)){printf("<%s>",getstr(i));return;}
if(i==env){printf("<env %i>)",i);return;}
if(i<env)env=i; // to make sure closures dont loop
putchar('('); XXX
}
void pprint(int expr,int env){pprint_(expr,env);putchar(10);}
merging code from lisp/v1 and spc/lisp, see to it that it makes sense also, wrapper func for newline, and see if i didnt have indenting code somewhere
Wrapper
We’ll jump ahead a bit, assume we have written the eval
function, and see what main
looks like. This way, we’ll be done with everything else and can focus entirely on the implementation of eval
for the rest of the article.
The main
function is the one that puts it all together and lets us actually use the code. It calls init
to set up memory, does its own little set-up routine, and drops into a REPL - ‘read-eval-print-loop’, which reads one unit of code, evaluates it, prints its result, and repeats for the next one (or exits if there’s no more input). The setup in main
keeps track of the current ‘global’ environment and the current expression, and it also conveniently handles input, allowing us to either run interactively in a console, or executing code from a file.
int main(int argc,char**argv){
init();
fd=argc>1?fopen(argv[1],"r"):stdin;
if(!fd){printf("cannot read file\n");return -1;}
while(1){
if(argc==1)printf("> "),fflush(0);
if(feof(fd))break;
getnext();white();getnext();
pprint(
eval(
parse(),
mem[0]),
mem[0]);
}fclose(fd);putchar(10);return 0;
}
Let’s take a look at that mem[0]
up there. The paragraph on ‘variable dereferencing’ below goes into more detail on how the values of variables are stored, but long story short, we use ‘environments’ for this, and every expression can only be evaluated in an appropriate environment. Since we made sure to never use the first two memory units (in order to encode nil
), we have two unused units of memory - let’s assign the first of those the role of the ‘global environment’, containing all variables that are not otherwise contained in a function (meaning, all the stuff we defun
) (more on how and why of environments in the paragraph on ‘closure calls’).
Eval
Ho boy here we go, finally some code that actually does something interesting. eval
is the function that takes an expression and an environment, and figures out what the value of that expression is in that environment. My reference implementation is about a hundred lines of code- which is really not a lot (the Emacs source is 777.642 lines), but it is still a lot to take in at once, so I’ll be splitting it in parts.
Variable dereferencing
First off, environments. The ‘why’ should be pretty clear by now. The ‘what’ is a list of variables and their values (also known as ‘variable bindings’). This makes the ‘how’ pretty easy: we keep a list matching variable names to their values. There’s a couple ways to do this (William Byrd has done some fun stuff with those), but we’ll be keeping it easy - each entry in that list is a concell, its car
is a variable name (ie an atomic symbol), its cdr
that variable’s value.
Note: Byrd implements environments as nested closure functions. This is convenient in the sense that if you have closures, you don’t need extra code for environments, but it is inconvenient because it requires closures before the concept of an environment. We struggle around the distinction more in the compiler, but for now, we will be defining environments in a way that is easy to handle for the interpreter (ie with loops in C).
So, to begin, let us write an eval
that takes a variable name and goes through the environment, returning its value if it finds one, or the variable name itself if not (since symbols are self-evaluating):
int eval(int expr,int env){
if(!expr)return 0;
if(isstr(expr)){ // variable dereferencing
// XXX use car/cdr
for(int bind=env;bind;bind=cdr(bind))
if(car(car(bind))==expr)
return cdr(car(bind));
return expr; // if no binding, return self
}
// --- stuff for evaluating lists ---
// if we cannot eval, complain and return nil
printf("> wtf:");pprint(expr,env);return 0;
}
Well, that wasn’t hard! Unfortunately, the bulk of our code will be replacing that innocuous stuff for evaluating lists
comment.
Function calls
How do we evaluate lists? The (very concise) syntax of Lisp tells us that in a list, the first expression is the operator (ie function to apply), and the remaining elements are the operands (ie the arguments to that function). So, the first thing we do is evaluate the first element of the list. The result could be a primitive operator (we manually inserted atoms for the built-ins to simplify checking for these), another variable (ideally, we’d keep dereferencing until we get something usable, but for now, we can also just abandon evaluation), or a compound lambda
expression (aka a closure). lambda
s are a bit special in how they are called, so we will leave those for later. In almost all cases, we have to also evaluate the arguments to the operator. We will first check whether we have one of the special cases and handle those manually, then we will evaluate all arguments and have all other functions work on those. In code, we have the following:
int eval(int expr,int env){
// --- stuff for variable dereferencing ---
int op=eval(car(expr),env);
if(isstr(op)){ // builtins
op-=MEMSIZE; // move index w/in string memory - easier to check
// check special cases
switch(op){
case 45:break; // quote
case 6:break; // defun
case 12:break; // lambda
case 19:break; // cond
}
// not a special case. eval args
int args=cdr(expr),res;
if(args){args=lm;
for(int t=cdr(expr);t;t=cdr(t)){
res=lm;lm+=2;
if(lm>MAX){printf("out of evlist memory!\n");exit(-1);}
car(res)=eval(car(t),env);
cdr(t)=lm;
}cdr(t)=0;
}
// check regular cases
switch(op){
case 51:break; // print
case 57:break; // printmem
case 40:break; // cons
case 32:break; // car
case 36:break; // cdr
case 24:break; // atom
case 29:break; // eq
case 61:break; // add
case 63:break; // mult
case 65:break; // sub
case 67:break; // div
case 69:break; // mod
default:printf("not primfn: %s\n",str+op);return 0;
}
}
if(car(op)==12+MEMSIZE){ // lambda
// --- handle closures ---
}
// --- fallthrough ---
}
Wonderful! Technically, we don’t even need all the break
s since the code there would return
values directly, and we won’t get to fall through. Now we only need the code for all that.
Builtins
Quote
Let’s do quote
first, since it’s easiest. To quote
an expression, we simply return it:
case 45: // quote
return car(cdr(expr));
I told you it was easy. We need the cdr
to skip the quote
keyword, and the car
to only return the first argument. PicoLisp instead returns all arguments, which should not be hard to modify.
Cond
Recall that cond
takes a list of pairs - the first element of a pair is a condition, and if that condition does not evaluate to nil
, we return the (evaluated) second element. This is also relatively straight-forward to accomplish with a for
loop:
case 19: // cond
for(int cond=cdr(expr);cond;cond=cdr(cond))
if(eval(car(car(cond)),expr))
return eval(car(cdr(car(cond))),env);
return 0;
The car
s and cdr
s get a bit more involved, but otherwise it’s pretty simple.
Defun
Now things get interesting. defun
modifies the global environment, which we do not have access to. Fortunately (as mentioned previously), said global environment is always in the same place in our memory - meaning, we can write directly to it:
case 6: // defun
mem[0] =
cons(
cons(
car(cdr(expr)),
eval(
car(cdr(cdr(expr))),
env)),
mem[0]);
return car(cdr(expr));
Lambda
Now things get complicated. lambda
has two arguments, a list of variables, and a function body to evaluate to. The issue is that not every variable inside the function body is also a function argument. So-called ‘free variables’ would have to be looked up in the environment - however, that look-up would have to happen then we evaluate the lambda
, not when we call the resulting closure! If this is confusing, the page regarding concepts of Lisp explains it in more detail (and with examples!). It suggests solving this by going through the function body and replacing all free variables with their values, then explains that this is very inefficient and stupid. Instead, it suggests storing the values separately, and storing a reference to that within the closure.
We do not collect garbage in this implementation. This is pretty bad, since it means we keep gobbling up memory we might not need, but here it ends up working out for us, since it means a variable we’ve defined never gets removed or redefined, merely overshadowed (ie we create a new binding that gets found first) (this is also why I opted to write a fully functional interpreter first - if we had assignment or a variable number of expressions in the body, this wouldn’t work either). This means that to create a closure, we put a reference to the current environment into the lambda
. There’s clever ways to do this, but instead we will mangle our syntax a little - the first argument of a lambda
are the names of its arguments, the second is the function body, and we will manually insert a third argument that is a reference to the environment it was created in. We can also save up on a bit of memory by replacing the nil
that terminates the expression with that environment - it means we don’t need to allocate more memory, but it violates the principle that ‘lists are always nil
-terminated’ - hence why we needed extra checks in the pprint
function. Oh, also, if there already happens to be an environment there, we won’t do this - that would mean we’d already closed it before, and we should not overwrite its environment.
That was a lot of technical talk. The code to accomplish it is significantly shorter:
case 12: // lambda
if(!cdr(cdr(cdr(expr))))
cdr(cdr(cdr(expr)))=env;
return expr;
Car
Now for the easier functions. car
takes its first argument, and returns its car
:
case 32: // car
return car(car(args));
I told you these were easier.
Cdr
cdr
does the same as car
, but returns the cdr
of its first argument:
case 36: // cdr
return cdr(car(args));
Cons
cons
takes its first two arguments, and cons
es them:
case 40: // cons
return cons(car(args),car(cdr(args)));
Atom
atom
tells us whether its argument is an atomic symbol (in most dialects of Lisp, it’s usually called something like symb?
). Unfortunately, it is slightly more complicated than the previous ones, as the value that the isstr
macro returns in C is not immediately usable in Lisp (well, it is, but we’d rather not pass arbitrary stuff). Fortunately, it is easy to make it passable by replacing it with a reference to the atom t
:
case 24: // atom
return isstr(car(args))?MEMSIZE+4:0;
Eq
Oh, the debates there have been on how equivalence should be defined in Lisp! We will be implementing the simplest pointer comparison, equivalent to Scheme’s eq?
:
case 29: // eq
return car(args)==car(cdr(args))?MEMSIZE+4:0;
Arithmetic helpers
We tossed in arithmetics to have something useful for our Lisper to operate on. Unfortunately, our current implementation does not actually distinguish whether its atoms are numeric (‘consisting of digits’) or not. The solution is easy: we stick our fingers in our ears, start loudly singing ‘The Eternal Flame’, and naively assume that anything passed to an arithmetic primitive will in fact be a number. This will break in fascinating manners in a lot of cases, but it will do the right thing with the right inputs.
We will need two helper functions to convert between the digit-by-digit-based representation we use for atoms and actual numeric types - in other words, convert between ASCII and integers. Unfortunately, some moron decided that atoi
should be a reserved function in C for some reason [XXX funny, but unnecessarily hostile if you dont get the joke], so we shall have to name our function latoi
and litoa
.
Joke explanation: as you can imagine, C already features a function to accomplish this, and it’s called atoi
. We will be reimplementing it ourselves because it’s simple and fun.
latoi
(atom to integer) is the easier one. We successively take digits (defined as a byte between 48 (ASCII code for ‘0’) and 57 (ASCII code for ‘9’) inclusive). With each digit, the running result is multiplied by 10 (our base), and the value of our digit is added to it. We also check whether the first byte is 45 (ASCII code for ‘-’), and remember to invert the result if so.
int latoi(int index){
char*s=getstr(index),inv=0;
int ret=0;
if(*s='-')inv=1,s++;
while('0'<=*s&&*s<='9')
ret=ret*10+*s++ -'0';
return inv?-ret:ret;
}
litoa
(integer to atom) is somewhat more complicated, because given a number, it is significantly harder to calculate the first digit (#include<math.h>\nfloor((double)x/exp(10.0,log10((double)x)))
) than it is to calculate the last (x%10
). We can circumvent this in two ways - we either store the number backwards (relative to the calculation - thus reversing it twice), or we store it as we calculate it and then invert it. Storing it backwards requires us to know the length of the number first, which requires either more inconvenient math (exp(10.0,log10((double)x))
), or annoying code duplication to make sure we pass through the string correctly, not to speak of the actual conversion.
We’ll take the easier way of storing the string as we calculate it, and then using some esoteric shuffles to reverse the string in-place. Which looks as follows:
int litoa(int x){
char*s=str+ls,swap,i;
if(!x)*s++='0';
while(x)*s++=(x%10)+'0',x/=10;
*s=0,i=s-(str+ls);
for(int j=0;j<i>>1;j++){
swap=str[ls+j];
str[ls+j]=str[ls+i-j-1];
str[ls+i-j-1]=swap;
}return intern();
}
Arithmetic primitives
With this, we can finally implement the arithmetic primitives in eval
. To do so, we initialise a return variable to some value (see principles for an explanation of ‘which’ and ‘why’) (for the ‘return variable’, we will be clever and save on some lexical memory by reusing the int res
that we used to evaluate arguments), then go through all arguments, using the helpers to convert them to numbers and combining with the return value appropriately. We then convert the return value to an atomic value that the lisper can understand, and return:
case 61: // add
for(res=0;args;args=cdr(args))
res+=latoi(car(args));
return litoa(res);
case 65: // mult
for(res=1;args;args=cdr(args))
res*=latoi(car(args));
return res;
// etc
Foreign functions
The keen would have noticed that we defined primitive atoms print
and mem
, and case-checks for them in eval
. These aren’t ‘standard’ Lisp functions, and shouldn’t really be here (ie, feel free to skip them) - but they are somewhat convenient, so we’ll add them anyway. They also showcase how we could add bindings to external (relative to the program being interpreted) code.
print
is a wrapper around pprint
, and lets us output intermittent values:
case 51: // print
pprint(car(args),env);
return car(args);
In case our interpreter breaks or misbehaves, it is very useful to be able to take a look at its memory and see what exactly it is doing. mem
does exactly this, outputting memory and letting us figure out what is happening:
case 57: // mem
printf("%i cons, %i atoms, env is %i\n",lm,ls,env);
for(res=0;res<lm;res++){
printf("%i:",i);
if(isstr(mem[i]))printf("'%s' ",getstr(mem[i]));
else printf("%i ",mem[i]);
}printf("\n\n");res=0;
while(str[res]&&res<ls){
printf("%i:<",res);
while(str[res])putchar(str[res++]);
printf("> ");res++;
}printf("\n\n");return car(args);
Closure calls
Phew, that was something! One more block of code, and we’re done! We explained how we create closures, but we haven’t figured out how to call them.
Closures are user-defined functions, ie we make them via lambda
. This means that any closure has an associated environment, letting us reference the values of variables as they were at the time the closure was created (once again, I’ve written more about why we are doing this in other places). With that, all we need to do is extend this environment to include the ‘formal arguments’, binding them to the variable names the function specified, and evaluate the body in that new environment.
To help with the last paragraph, here’s an illustrative graphic, rendered in glorious ASCII art:
; lambda function
; / \
; function / names of \ closure env
; name / formal args \ / wedged here
; | | | body X
; v | / \ / \ v |
(defun add2 (lambda (x y) (+ x y) ))
(add2 4 5) ; <- function call
; ^ \ /
; | \ function arguments
; |
; function name
And here’s the code for it:
if(car(op)==12+MEMSIZE){ // lambda application
int closenv=cdr(cdr(cdr(op))); // the env 'lambda gave us
int names=car(cdr(op)); // the argument names the function wants
int vals=cdr(exp); // the arguments we pass it
while(names){
closenv=
cons( // extend closure env
cons( // by new binding:
car(names), // bind this name
eval(car(vals),env)), // to this value
closenv);
names=cdr(names);vals=cdr(vals);
}
return eval(car(cdr(cdr(op))),closenv);
}
Addons
Well, believe it or not, the above is all we need for a functioning Lisp interpreter. It’s not very efficient, nor capable, but it does the right thing more often than not. That being said, we’ll now look at some ways to extend that basic layout to make it just a bit more useful.
Garbage collection
Garbage collection is, besides tail-call optimisation, the thing that makes functional languages practical. Evaluating arguments for ‘standard’ functions (ie not one of the special cases defun lambda quote cond
), evaluating the conditionals of cond
, and making environments for closures (ie all primitives whose implementation calls cons()
) all allocate values in our memory that we do not use once we’ve calculated the value of the expression. Thus, if we run our program for anything more complicated than the simplest test cases, it will quickly run out of memory - despite having plenty of memory, which happens to be taken up by temporary values that we are no longer using.
The concepts page goes in more detail on garbage collectors (and I’ll be quite honest, I don’t quite get them either yet) (<- fyi, that means I’ll probably rewrite the entire pages from scratch once I hit enlightenment). For now, we will be implementing a simple XXX gc.
Garbage collectors, as the name implies, go through our memory, and collect all the ‘garbage’ memory (usually, temporary values that we no longer use) so we can use them somewhere else later. To keep things simple, we will do garbage collection in the repl - meaning, we evaluate some code, and once we’re done, we clean up after it. This is not very useful, as we usually run out of memory while evaluating said expressions, but it still helps somewhat, as it ‘resets’ the use of temporary memory in-between expressions. That sounds pretty easy - and it would be, if it wasn’t for a slight complication - some builtins (defun
) modify the global envronment - and we’re not allowed to touch that one.
The easiest solution would be to keep global values separate from temporary ones. This is not as trivial as it sounds - evaluating the value we defun
happens in temporary memory, so we’d have to (recursively) copy it over once we’re done. It’s also rather inconvenient to keep track of multiple segments of memory that otherwise store the same kind of data - imagine how closures would jump around. How would we even keep track of where the closure environment is?
Downsides aside, the idea of recursively copying values around will come in handy (note: recursively copying, since for a given concell, its car/cdr will almost never be after it in memory, but we can call the copying function recursively with them as values). Why don’t we use the same memory? We evaluate our expression, then we recurse through the global environment we want to keep, copying it somewhere else, and then we free all memory for future use except that bit where we put the global environment? This has three issues. The ‘memory we keep’ will be after the freed memory - and our use of lm
requires free memory to come at the end. With some pointer witchcraft during the recursion, this can be solved by doing the above, but then copying the ‘memory we keep’ to the beginning of our memory (this would be solved a lot more elegantly by using a linked list of free cells, but that’ll require some more auxilliary code - I’ll probably end up using it for a later version of the gc anyway tho). The second issue is that it cannot handle circular data, and closures usually end up being circular. This can be solved even easier by marking each cell we’ve passed, noting where we put it (this makes more sense if you mess around with the code). The third issue is that this is very inefficient - we have global memory, we put some stuff on top, we then shift the entire global memory after the stuff on top and then back again. [etc, collect only new stuff, requires keeping track of how much globmem weve taken]
inefficient, noncircular