Edit 070325: Currently reworking this, please ignore the mess
History
Think back to the late ‘50s. The US is butthurt about losing to some farmers, some white boy is misappropriating black culture again, Israel is pulling an Israel while Hungary is having a crisis, the Russkies managed to toss a metal ball so hard it stayed up and the Yanks are spinning up the propaganda machine about it, and, relatively recently, we figured out we could replace the light bulbs the nerds were playing with at the universities with amalgamations of metal. They were smaller, used less power, and could be made much faster, but the main question remained - how the hell do we tell these contraptions what we want them to do? We had figured out that, instead of rewiring the thing from scratch every time, we could feed it strips of paper telling it what to do - but how do we express what we want on those strips?
There was the straightforward bottom-up approach. Figure out what all the things it could do are, name them all (for convenience), and just supply a list of those to do one after another. This is very useful, as it gives you direct access to every single thing a computer can do, and still exists in the form of assembly languages. However, for most tasks, it proved way too basic (no pun intended), too low-level to be useful.
Computing was still considered a branch of mathematics back then, and the mathematicians were hard at work figuring out the top-down approach - how to turn all the fancy mathematical and theoretical formulas and equations into something a computer could work with. FORTRAN introduced a lot of concepts that made it significantly more convenient to work on computers, but fundamentally it was still built up from the same one-arithmetic-after-another that the earlier assemblers were made from.
A nerd working on AI by the name of John McCarthy quite liked how it treated lists, but was dissatisfied with the primitive program structure and flow. A coworker of his, Steve Russell, realised that the theoretical model of computation McCarthy had come up with could in fact be implemented - not as a sophisticated machine that took the extended mathematical notation and translated it to machine code, as was the wisdom at the time, but as a much simpler program that took code and just did whatever the code was supposed to do. In other words, the first interpreter.
General
Lisp, at its core, is a metalanguage - it doesn’t let you do stuff as much as it gives you the tools to make something that does stuff. Its recursive conditionals and its representation of symbolic information, be it code or data, as lists, proved to be simple, elegant, flexible, and yet complete enough to adapt to anything we thought to throw at it, and, as the years came and went, it remained at the forefront of computing.
There’s an old adage that there is no innovation in computing - we are just collectively rediscovering what some autist cooked up in their basement in the ‘70s. Painfully often does this prove to be the case, and in almost all cases, it was cooked from either Lisp, or a derivative thereof. In most cases, there are languages more suited to the task, yet looking back, Lisp was used to fit a task never before worked on, and was then morphed and evolved into those exact languages to fit the problem better. Thus the closely correlated adage that every sufficiently sophisticated program, be it an operating system, compiler, mailing client, web server, or (god forbid) a text editor, contains at least half a buggy implementation of Common Lisp.
Even without necessarily being useful to solve a problem, merely knowing how one could solve it in Lisp is a great help in actually solving it. To quote Eric Raymond: “Lisp is worth learning for the profound enlightenment experience you will have when you finally get it; that experience will make you a better programmer for the rest of your days, even if you never actually use Lisp itself a lot.” It is not without reason that the Wizard Book of Programming is dedicated to “the spirit that lives inside the computer” - for if it spoke to us, this would certainly be its language. (XXX quote - put at the list below and figure out how to arrange later)
Specifics
(Some) concepts introduced by Lisp:
- Tree-like structures. Linear arrays and pointers were of course both familiar concepts, yet Lisp was the first to introduce a structure in which any element may itself branch into further structure.
- Conditionals. The concept of an if-then-else was by now quite familiar, yet the idea of a list of conditions, each associated with an appropriate response, was quite novel (and a very direct inspiration to many a language’s switch statement)
- Similarly, convenient async. Interrupt vectors instructing the computer how to handle specific “outside events” were also a familiar assembly concept, yet, due to “functional programming” having no side effects (ie any function can be evaluated without affecting anything outside of it - in pure Lisp at least, side-eyeing Common Lisp hard here), Lisp was much more flexible - the attempt to adapt this flexibility to browser/internet-based software resulted in the development of JavaScript
- Similarly, recursion. Also not unfamiliar, but in assembly, it required extensive busywork setting up and freeing stack frames (which C later automated), and in Fortran, it simply didn’t work, as each function had a frame, instead of each function call (which is situationally more efficient - this is how many lisp implementations optimise tail recursion)
- Automatic memory management, in the form of a garbage collector. No longer is the programmer responsible for keeping track of available memory and where and how data is laid out in it, no longer are they clogging it up with references they forgot to remove, or corrupting it by reaching out of bounds and causing the segfaults that plague the nightmares of any C/C++ developer! Rust, like Java before it, tries to reintroduce this convenience; unfortunately, both are bound to fail, as the power of having direct access to memory carries with it the responsibility over that memory.
- Dynamic typing and higher order / first class functions. Lisp “symbols” (otherwise known as variables) have a value, yet lisp does not care what exactly that value is, meaning they can be atoms, other symbols, complex data structures or entire functions, and none of this requires any syntactic distinction. This is part of what makes Lisp’s syntax so terse and flexible in expressiveness.
- Self-hosting. If we have a program that is good at making other programs work, why not make that program run itself? This introduces a chicken-and-egg problem that will be practically resolved later (if I ever get to writing that part, that is)
- The REPL (read-eval-print loop) as a means of interfacing with a computer system. Instead of writing large files, passing them through the computer, and waiting to find out there was a mistake somewhere, this allows the user to write a line of code and immediately see its results. In addition, it also allows for dynamic work on data as needed, instead of having it all be encapsulated in programs that complete their work in its entirety. Smalltalk evolved as an extreme case of the latter two points, wherein the entire computer system was accessible and modifiable from within itself.
XXX most programming is about getting to a result (journey not destination). lisp, being functional, gets you the result, but obscures the computation required to get there. this is beneficial, as it allows you to think a lot more and more abstractly, and it is detrimental, as it gives you almost no access to the underlying structures, and is therefore also really hard to do efficiently. - maybe also segue that into my latest enlightenment?
- A Lisp programmer knows the value of everything, but the cost of nothing. - Alan Perlis, Epigrams on Programming
- reference the comment abt garbage collectors
XXX summarise/reformat the following:
- leaps and bounds in compilers and interpreters
- the first virtual machines and dedicated hardware for them
- object-based systems
- jit execution and memory management
- graphical/windowing interfaces
- extensive text editing
- mathematical
- logical
- messaging, and browser-based/async derivatives
- educated AI
- distributed supercomputing
- protein/nanomachine synthesis
- quantum computing
- trees, automatic memory management, dynamic typing, conditionals, higher-order/first-class functions, recursion (details here), selfhosting, repl, async
I had already written a page about Lisp over a year ago, and I’ve made enough progress during that year to warrant a rewrite. I’ll once again explain how to implement a basic lisper, which should be a useful tool to mess around with to implement weird ideas of your own. I’ll give some examples regarding functional programming (ie how one could use such a thing), and (if I get to it) a list of further reading.
Lisp specs
We will be implementing a Lisp interpreter (in C), ie a program that takes Lisp code and does whatever that code is meant to do (I am assuming some basic experience with C here). In order to accomplish that, it would do us well to understand how Lisp works.
Lisp is a functional language. There is a list of definitions regarding what that even means; those tend to focus on various implications without actually getting to the core of what it actually is. In the simplest (original) interpretation, “functional” programs are modeled after mathematical functions.
Let’s take f(x,y) = x^3 + y^2 - 5 as an example. We humans have little trouble reading and understanding this equation, but for computers it’s a mess to figure out which numbers and symbols go together, what that = means, etc. There’s been sufficient progress in that regard (WolframAlpha is quite good at figuring them out), but it’s not exactly the small, flexible, easy-to-use program we want (ignoring issues of licensing).
Computers are imperative in nature, meaning they execute one instruction after the next. “Imperative” is a bit of a misnomer - “iterative” might’ve been better, but it’s too late to challenge that convention now. Here’s an assembly program that implements our function from above:
f:
sub sp, sp, #16
str w0, [sp, 12]
str w1, [sp, 8]
ldr w0, [sp, 12]
mul w1, w0, w0
ldr w0, [sp, 12]
mul w1, w1, w0
ldr w0, [sp, 8]
mul w0, w0, w0
add w0, w1, w0
sub w0, w0, #5
add sp, sp, 16
ret
Why, that’s a veritable mess! Not only is it impossible to tell where everything goes and what everything means, but it’s also the farthest thing from flexible - it only works on a specific type of computer (an arm64 processor), it doesn’t tell us what its signature is (ie how we call it - the f(x,y)= part of the above), and it requires specific care (x and y have to be placed in specific registers and have to be 32-bit signed integers - that’s what the w registers hold). It sure is fast, but can’t we make it any easier to use?
One useful aspect is the so-called stack (the sp in the above stands for “stack pointer”). It lets us put values somewhere and retrieve them later. It’s how C passes its arguments to functions - which is why it shows up in the above. We could write a language around that - in fact, this is what Befunge does (ignoring the dimensional chicanery). Our program, expressed in Befunge, reads as follows:
5xxx**yy*+-
A lot more concise, for sure, and we can now tell what each symbol does - x puts x on the stack, + adds … something? The issue here is that we cannot tell at a glance how the data and operations come together - the - and the 5 belong together, so why are they on opposite ends of the code?
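To make the stack shuffling concrete, here’s a trace of that program for x = 2, y = 3 (with our hand-waved convention that x and y push their values, and that - subtracts the newer value from the older one):

```
instr   stack afterwards
5       5
xxx     5 2 2 2
**      5 8
yy      5 8 3 3
*       5 8 9
+       5 17
-       12
```

and indeed, 2^3 + 3^2 - 5 = 12.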
Fortunately, math has long had a solution for these kinds of syntactic ambiguities, in the form of brackets. Clearly, everything inside a pair of brackets belongs together, and anything applied to it is applied after we figure out what the stuff in the brackets means. To fit in with this, we move the operator (“the thing being done”) to the first place inside the brackets (ie, f(x,y) becomes (f x y) and x*y becomes (* x y)). Thus, our original equation (assuming we don’t have an exponentiation function, to match the above examples) is now:
(- (+ (* x x x) (* y y)) 5)
That sure is a lot of brackets (Lisp is infamous for this), but the brackets are the only syntax we need to declare almost anything. In fact, we can also add the f(x,y)= part:
(defun f (lambda (x y) (- (+ (* x x x) (* y y)) 5)))
We now know what that = means - it DEfines a FUNction! And what is all that math nonsense? Why, it’s a generic function (generic functions in Lisp are called “lambdas”, in homage to lambda calculus) - we even list its arguments!
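To convince ourselves the bracketed form still means the same thing, we can reduce it by hand for x = 2, y = 3, innermost brackets first - which is more or less what our interpreter will end up doing:

```
(- (+ (* 2 2 2) (* 3 3)) 5)
(- (+ 8 9) 5)
(- 17 5)
12
```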
This exceedingly simple syntax (everything is in brackets, and the first thing in the brackets tells you what to do) is part of what makes Lisp so powerful:
- It can be adapted to anything. In SpaceCore, I use it to put together spaceships, in ExhiBotanist, it determines what everything is and where and what happens when the player runs into it.
- It doesn’t distinguish its contents. After executing the above line, f is set equal to some list. Whether that list is a function or just data depends on what we do with it - meaning, we could write a function that takes another function (treating it as data), and gives us its derivative (section 2.3.2 of “Structure and Interpretation of Computer Programs” describes just such a setup). In fact, this is how a lot of functionality is implemented in Lisp.
- We could take the previous point to an extreme and implement the language itself as a structure inside the language (technically known as a “meta-circular evaluator”). This opens immense possibilities, but is slightly outside of the scope of what we’ll do here.
The only other thing we need is knowing what those “functions” we apply to data are. We will be using defun lambda cond eq car cdr cons atom quote + - * / % - the first two we know (though they are more of a convenience), some primitives for operating on lists, and some arithmetics that we don’t strictly need but will be useful to play around with and showcase what our language is capable of. [XXX]:
- defun takes an atom and some other value, and creates a global variable with the name of that atom and the value of the rest of the expression. This is only done for global variables, as local ones are much easier to create with a closure instead (there’s a section further down explaining variable binding)
- lambda takes a list, possibly empty, and some expression. It describes a function - the list gives us the variable names the function takes as input, and the expression is the value of the function. More on how it’s implemented below.
- cond takes a list, each element of that list consisting of two parts. The first part is a condition, and the second is an expression. It goes through the list, and when a condition evaluates as anything but nil, it evaluates and returns the corresponding expression (ignoring all others).
- eq takes two arguments, and returns non-nil if they evaluate to the same expression
- car and cdr return either the first element of their argument, or everything but. The next section explains why.
- cons attaches an element to a list, likewise explained in the next section
- atom returns non-nil if its argument evaluates to an atom (ie, isn’t a list)
- quote returns its arguments unevaluated. This is needed, as functions will otherwise evaluate all their arguments - (sum (1 2 3 4 5)) will fail because the interpreter will first try to evaluate 1(2,3,4,5), and fail because 1 is not a function. Instead, (sum (quote 1 2 3 4 5)) will first evaluate quote(1,2,3,4,5), which returns a list of five numbers, which then gets summed up. People familiar with Lisp will notice the unusual syntax - most implementations of quote only return their first argument, ie in our case it would only sum the number 1, and we should have written (sum (quote (1 2 3 4 5))) instead. We will not be doing this - our quote returns all its arguments. This is both easier to use and more efficient to implement.
- + - * / % will sum, subtract, multiply, divide, or take the modulus of their arguments. Commutative functions (+ *) treat all members equally, while noncommutative ones (- / %) operate between the first and all other elements - ie, (- 10 2 3) subtracts 2 and 3 from 10, giving us 5.
Parser
Now, off to implementing! The first step is to make the computer understand the “nested lists of stuff” idea from above. For this, we will need two basic data types:
- The concell. A concell contains two references to other things - for our purposes, these will usually be “an item from a list” and “everything else in the list”. As a sidenote, the end of a list is denoted by the second reference being set to 0. For archaic reasons, these two elements are called car and cdr (after “Contents of the Address/Decrement part of Register”, the two halves of the IBM 704 machine word originally used to implement them). Concells will be implemented as a list of integers - any integer we refer to will be a car, and the next integer will be the corresponding cdr.
- The atom. Here, the name is actually helpful - an “atom” is something indivisible, which in our case are the symbols/functions/variables/etc that are contained inside the lists. Atoms will be implemented as a large string buffer that we reference arbitrarily - as each atom will be null-terminated, we can simply read from whichever spot we referenced to the next null-byte.
A small issue is distinguishing references. A concell can point to another concell or an atom - how do we tell which one it is? This is usually done via tagging, but we won’t be doing that yet. We will have a limited number of concells, so we could just say that whenever a reference is pointing to a nonexistent concell, it is actually pointing to an atom. Metaphorically, we are putting the string buffer with the atoms immediately after the integer array for the concells - if we count past the end of one, we will be in the other.
“Parsing” is the act of taking text and breaking it up into parts the computer can understand. In our case, this is rather easy - we have a parse function which gives us a reference to the next thing in the input. An open bracket starts a list (meaning, we call the function recursively), a close bracket ends it (meaning, we return 0 to signify there is nothing more in the current list), we put atoms in the string buffer, and once we’ve parsed one item, we also call the function recursively to parse the rest of the list (since “the rest of the list” is technically the same as “a list” without the starting bracket). For convenience, we’ll also add two functions that 1. give us the next byte of input, and 2. skip over whitespace (since for our purposes, (a b) and ( a b ) are the same thing). In code, this looks as follows:
#define MAX 10000 // length of data
int mem[MAX],lm=0,ls=0; // concell memory, plus variables for keeping track how much we've used
char str[MAX];int la=0;FILE*fd=0; // atom memory, lookahead character (an int, so it can hold EOF), and file we read data from
// the following are for convenience
#define car(x) mem[x]
#define cdr(x) mem[x+1]
#define isstr(x) (x>=MAX) // >= because the very first atom sits at offset 0
int getnext(){return la=getc(fd);}
void white(){while(la==' '||la=='\n'||la=='\t')getnext();}
int cons(int a,int b){ // creates a single concell
int ret=lm;lm+=2;
car(ret)=a;cdr(ret)=b;
return ret;
}
int parse(){
white(); // skip whitespace
if(la==')' || la==EOF){
// if closing bracket (or end of file), the list is over. null-terminate
getnext();
return 0;
}
int ret=lm;lm+=2; // reserve space for the current concell (called ret)
if(la=='('){ // if open bracket, parse a new list
getnext();
car(ret)=parse();
}else{ // otherwise we're dealing with an atom
char*s=str+ls; // figure out where to put it
while(la!=' ' && la!='\n' && la!='\t' && la!='(' && la!=')' && la!=EOF && s-str<MAX-1)
*s++=la,getnext();
*s=0; // null-terminate atom
car(ret)=intern();
}
cdr(ret)=parse(); // parse rest of list
return ret;
}
There is one itsy-bitsy detail missing - the intern() function (towards the end). If we look at the math function from above, (lambda (x y) (- (+ (* x x x) (* y y)) 5)) contains x four times. However, instead of storing it four times, we could store it once, and have all “instances” of it refer to the same place. This not only saves memory, but it makes the code more efficient - instead of having to compare the values of two pointers (“we have two references to string memory, are the things being referenced the same?”), we can compare the pointers themselves. In practice, this means we put each new string at the end of the string buffer, but without advancing the top of the buffer. We can now use that top of the buffer to refer to the new string, and compare it to all other strings in the buffer. If we find a match, we return a reference to that instead, otherwise we advance the buffer (ie the new atom is now a valid atom that we want to keep around) and return a reference to the new atom. In code:
int intern(){
for(int i=0;i<ls;){ // foreach previous string
int start=i,j=0; // remember where the candidate begins
while(str[i]==str[ls+j] && str[i])i++,j++; // compare byte by byte
if(str[i]==str[ls+j])return start+MAX; // both hit the null-byte - full match, return reference
while(str[i++]); // mismatch - reel to beginning of next string
}
// no match found, intern new atom
int ret=ls; // cache beginning of new item
while(str[ls++]); // update stack pointer
return ret+MAX;
}
Notice how all returns in the above code add MAX to whatever they are returning. This is done to distinguish the result - an atom - from a reference to a concell, as described previously.
To check if our code actually works, we will implement a print function. It should take in data and produce output representing what it looks like. Due to the homoiconicity of Lisp (“code and data are represented the same way”), this means that the output should be more or less the same as the input, barring spacing and the like.
void print(int a){
if(!a){printf("()");return;} // empty list
if(isstr(a)){printf("%s",str+a-MAX);return;} // string - untag and print
// if we made it here, argument is a cons (or invalid)
putchar('('); // print opening bracket
while(a){ // while there are elements left
print(car(a)); // print element of list
a=cdr(a); // advance list index
if(a)putchar(' '); // space out elements - if clause makes sure we dont insert a space between the last element and the closing bracket
}putchar(')'); // print closing bracket
}
In practice, I’ve also found it useful to have a separate printmem utility that prints the actual memory being used (ie a memory dump), which I can then go through by hand to figure out how things are stored and what is happening. Fortunately, this is relatively simple, as we are simply printing the contents of two arrays:
void printmem(){
// foreach in cons memory, print its index and value
for(int i=0;i<lm;i++)printf("%i:%i ",i,mem[i]);
// foreach in string memory
for(int i=0;i<ls;i++){
printf("%i:<",i); // print index and open bracket
// we advance the index manually because null-terminated strings will otherwise mess with parsing
while(str[i])putchar(str[i++]);
printf("> "); // close bracket
}
}
Integer parsing
One slight issue is that we store all atoms as strings. This means that numbers are lists of digits, and we do not know what their values are! This could be alleviated via type tagging, yet we won’t be implementing that yet. Instead, we will implement two functions to convert between digit strings and integers as needed. C already provides these as atoi/itoa, yet we will be implementing our own, to reduce library use and to make them work with our present datatypes (we’d otherwise have to do some nasty data shuffling to make the library functions cooperate with our atoms).
int latoi(int index){
char*s=str+index-MAX; // the atom we will be converting
char inv=0; // whether there was a minus sign - functionally a boolean
int ret=0; // the result
if(*s=='-'){inv=1;s++;} // if the first character is a minus sign, set flag
while('0'<=*s && *s<='9') // while the characters are valid digits
ret=ret*10+(*s++)-'0'; // multiply result by 10 and add current digit
// current digit is the current character minus the character for '0'
if(inv)ret*=-1; // if negative, match sign
return ret;
}
int litoa(int x){
char*s=str+ls; // figure out where to put new atom
if(x<0)*s++='-',x*=-1; // inverse of latoi case for the sign
if(!x){*s++='0';*s=0;return intern();} // if x==0, we can simplify a lot
while(x)*s++=(x%10)+'0',x/=10; // inverse of latoi case
*s=0; // null-terminate
// oh noes! our string is actually in reverse right now!
// to fix this, we can simply swap the order of digits
int len=s-(str+ls); // length of new number
int b=(str[ls]=='-'); // index of the first digit (skips the minus sign if needed)
for(int j=0;j<(len-b)/2;j++){ // for half the digits
// swap with the other half (via intermediary var)
char swap=str[ls+b+j];
str[ls+b+j]=str[ls+len-j-1];
str[ls+len-j-1]=swap;
}return intern(); // dont forget to call intern to make sure we reuse references if possible!
}
Bindings
Before we continue, there’s one somewhat more theoretical concept we have to cover. Lisp is descended from lambda calculus, which did not really have variables in the modern sense. It did have “function application”, which did technically use them, but instead of dealing with “values”, it simply substituted whatever the variable was supposed to be into the rest of the expression. This usually isn’t all that useful in practice (for a counterexample, see John Tromp’s binary lambda calculus), so we will use variables with values - specifically, closure-based dynamic scope. In other words, we will keep a list of all variable names and their values (the “dynamic” part - we only look up the value when we need it), and when a function binds new variables, we make a new list with those bindings (called a closure) and attach the global list to it.
XXX how closures are attached to lambdas
A small aside here - we create closures to keep track of what the variables in a function are - so how do we handle the global ones? Certain implementations (eg the metacircular one mentioned way above) don’t really have global scope, as the program itself is running inside another lisper, but that’s not what we are doing. Instead, we will use the defun macro, whose only purpose is to bind variables in the global scope.
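A sketch of the shape this gives our environment (drawn in traditional dotted-pair notation, which isn’t exactly what our print function will show): a list of closures, each closure a list of name-value bindings, with the global closure always at the tail:

```
( ((x . 2) (y . 3))             ; closure created by calling (f 2 3)
  ((f . (lambda ...)) ...) )    ; global closure, grown by defun
```

Looking up a variable walks the closures front to back, so a binding made by the current call shadows a global one of the same name.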
Evaluation
Finally, we made it! The eval function, the core of any Lisper. As input it takes an expression (and an environment) and returns the value of that expression. It’s a bit long and complicated, so first, an overview.
Atoms will be looked up in the environment, and if not found, will evaluate to themselves. We can go through the environment with a pair of for-loops; the comparison is easy enough, as intern makes sure two equal atoms also have the same reference (meaning, we don’t have to compare every single character of the atoms, but can simply check if their references point to the same place).
If the argument is a list, it is a function application, and the first element of the list is the function. First, a small loop figures out what that function is - if we have (f 2 3), we are not trying to apply the atom f, we want the function that is bound to that name. Then, we make a new environment where we bind the arguments provided to the variable names listed in the lambda declaration, and evaluate the body of the lambda in this new environment.
Here again, we have a distinction - is it a “primitive” function, ie one we implement in the interpreter, or a user-defined one? This is easy - primitive functions are atoms and we can case-check them, user-defined functions are lists and are handled with a closure and another evaluation. More interestingly, some primitive functions (quote defun lambda cond) do not evaluate all of their arguments, while all the others do. This is also rather easy to handle - we split the case-checking into two parts. First, we check whether the function is one of the four special cases, and if so handle it appropriately. Then we evaluate all of the arguments to the function, and proceed with checking all other commands. Some implementations (most notably PicoLisp) allow for user-defined functions with unevaluated arguments, however we will not be doing this - it requires a lot of complexity in the interpreter, and we can accomplish the same effect with some generous application of quote.
All together, it looks as follows:
int eval(int exp,int env){
if(!exp)return 0;
if(isstr(exp)){ // if expression is atomic
for(int cloj=env;cloj;cloj=cdr(cloj)) // foreach closure
for(int bind=car(cloj);bind;bind=cdr(bind)) // foreach binding in current closure
if(car(car(bind))==exp) // if name of binding is equal to current expression
return cdr(car(bind)); // return value of binding
return exp; // if no binding found, return self
}
// otherwise expression is a list
int op=car(exp); // and its first element is the function
for(;;){
int prev=op;
op=eval(op,env);
if(!isstr(op))break; // op is list, ie lambda application
if(op==prev)break; // atom evaluates to itself, ie builtin
}
if(isstr(op)){ // builtin
switch(op-MAX){ // remove tag
case 45:return cdr(exp); // quote - return arguments unevaluated
case 6: // defun - we modify the environment
car(env)=cons(
cons(
car(cdr(exp)),
eval(car(cdr(cdr(exp))),env)
),car(env)
);return car(cdr(exp)); // return name of new binding, for convenience
case 12: // lambda
if(!cdr(cdr(cdr(exp)))){ // if the lambda doesnt have a closure already
return cons(
op,cons(
car(cdr(exp)),
cons(car(cdr(cdr(exp))),env)
));
}return exp;
case 19: // cond
for(int cond=cdr(exp);cond;cond=cdr(cond)) // foreach item in list
if(eval(car(car(cond)),env)) // if its car evaluates to true (ie not nil)
return eval(car(cdr(car(cond))),env); // evaluate and return the corresponding expression
return 0; // if none of the conditions evaluated to true, return nil
}
// all other builtins evaluate their arguments, so lets do that
int args=cdr(exp),res; // res also reused for arithmetics
if(args){args=lm; // XXX manual memory management - bad!
int prev=0;
for(int t=cdr(exp);t;t=cdr(t)){
res=cons(0,0); // reserve concell
if(prev)cdr(prev)=res; // append to result list
prev=res; // update result list
car(res)=eval(car(t),env); // fill out reserved concell
}
}
switch(op-MAX){
case 51:print(args);return args; // print - nonstandard
case 57:printmem();return 0; // mem - nonstandard
case 40:return cons(car(args),car(cdr(args))); // cons
case 32:return car(car(args)); // car
case 36:return cdr(car(args)); // cdr
case 24:return isstr(car(args))?MAX+4:0; // atom
case 29:return car(args)==car(cdr(args))?MAX+4:0; // eq
// arithmetics
case 61: // +
for(res=0;args;args=cdr(args))
res+=latoi(car(args));
return litoa(res);
case 65: // *
for(res=1;args;args=cdr(args))
res*=latoi(car(args));
return litoa(res);
case 63: // -
res=latoi(car(args));
for(args=cdr(args);args;args=cdr(args))
res-=latoi(car(args));
return litoa(res);
case 67: // /
res=latoi(car(args));
for(args=cdr(args);args;args=cdr(args))
res/=latoi(car(args));
return litoa(res);
case 69: // %
res=latoi(car(args));
for(args=cdr(args);args;args=cdr(args))
res%=latoi(car(args));
return litoa(res);
}return 0; // if op wasnt builtin, return nil - not ideal, but makes sure nothing breaks
}else{ // lambda application
// make closure
int cloj=0,names=car(cdr(op)),vals=cdr(exp);
for(;names;names=cdr(names),vals=cdr(vals)) // foreach variable name and lambda argument XXX breaks if not enough vals
cloj=cons(cons(car(names),eval(car(vals),env)),cloj); // bind value of argument to variable name
cloj=cons(cloj,cdr(cdr(cdr(op)))); // bind new closure to parent environment
return eval(car(cdr(cdr(op))),cloj); // evaluate body of lambda in new environment
}
}
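To tie the pieces together, here’s roughly what happens when we feed ((lambda (x) (* x x)) 3) to eval (annotations ours, not actual interpreter output):

```
((lambda (x) (* x x)) 3)
; op = (lambda (x) (* x x)) - evaluating it hits the lambda case,
; which attaches the current environment: (lambda (x) (* x x) . env)
; op is now a list, so this is a lambda application:
;   x gets bound to (eval 3 env) = 3, giving the closure ((x . 3))
;   which is chained onto the lambda's captured environment
(* x x)   ; the body, evaluated in that new environment
; * is a builtin, so its arguments get evaluated - x looks up to 3
(* 3 3)
9
```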
Init
Of course, for the “magic numbers” in the case checks above to work, the atomic memory has to be preloaded with the names for the built-in commands. For this, we will use a simple function to initialise memory as follows:
void init(){
const char initmem[]="nil\0t\0defun\0lambda\0cond\0atom\0"\
"eq\0car\0cdr\0cons\0quote\0print\0mem\0+\0-\0*\0/\0%\0";
lm=2;ls=71; // length of initmem
for(int i=0;i<MAX;i++)mem[i]=str[i]=0;
for(int i=0;i<ls;i++)str[i]=initmem[i];
}
Interface
Now we have pretty much everything we need to make our lisp interpreter work. The only thing missing is the main function to put it all together. We'll do the good old REPL loop, where we fetch a line of input, process (evaluate) it, and output the result. For convenience, we'll also allow the user to specify a file to be used as input - this could in theory also be done on the command line as ./lisp < infile, but handling it here means one less thing to worry about. Since the evaluation needs an environment, we'll also have to initialise one here:
int main(int argc,char**argv){
init();
int env=cons(0,0); // empty environment
fd=argc>1?fopen(argv[1],"r"):stdin; // if we supplied a file, use it as input, else use stdin
while(1){
if(feof(fd))break; // stop processing if end of file reached
if(argc==1)printf("> "); // print prompt if we're reading from stdin
getnext();white();getnext(); // skip opening bracket (since parse already processes a list) and any leading whitespace
print(eval(parse(),env),env); // the repl itself, in all its glory
}fclose(fd); // clean-up
}
And just like that, we have a functioning lisp interpreter!
Functional programming
Now that we have a functioning lisper, what do we do with it? Well, we could of course use it for math, as we derived from the original examples:
> (* 2 (+ 4 5))
18
> (- 100 (* 2 3 4))
76
We could do conditionals:
> (cond ((eq 2 2) a) (t b))
a
> (cond ((eq 2 3) a) (t b))
b
We could do in-place lambdas:
> ((lambda (x y) (+ (* x x)(* y y))) 2 3)
13
But of course, this is more convenient if we use defun
to instead bind those functions to a name:
> (defun sqdist (lambda (x y) (+ (* x x)(* y y))))
sqdist
> (sqdist 2 3)
13
We could combine those with conditionals:
> (defun iszero (lambda (x) (cond ((eq x 0) t) (t ()))))
iszero
> (iszero 15)
()
> (iszero 0)
t
However, the really neat part begins when you realise that, because of the dynamic binding of variables, we can use functions from within those functions themselves:
> (defun fac (lambda (x)
(cond
((eq x 0) 1)
(t (* x (fac (- x 1))))
)))
fac
> (fac 3)
6
> (fac 5)
120
And there you go: a recursively defined factorial function, which calls itself to compute its result! In the case of (fac 5), it calculates 5*4*3*2*1 = 120, yet instead of spelling out all those numbers, it calls itself for ever smaller arguments (5! = 5*(4!) and so on).
Now, to do something a bit different. The sqdist function above calculates the (squared) magnitude of a two-dimensional vector according to the Pythagorean theorem (squared, because we don't have a square root routine). However, it only does so for two-dimensional vectors - ie it only works for x and y, no more and no fewer arguments. Could we use the recursive trick to make it work for arbitrarily many arguments? Of course we can! The trick is to pass the arguments to the function as a list (instead of individually), and to then process that list a single element at a time, with the car of that list (ie its first element) being the element processed, and its cdr being the rest, passed on to the recursive call:
> (defun sqdist (lambda (x)
(cond
((eq x ()) 0)
(t (+ (* (car x) (car x)) (sqdist (cdr x))))
)))
sqdist
> (sqdist (quote 2))
4
> (sqdist (quote 3 4 5))
50
Hurray! The function works! The quote is required so that the arguments are handed to the function as a list rather than being evaluated (strictly speaking, the expression does get evaluated - it's just that the function applied is quote, which skips evaluating its arguments). Here I copied a small convention from PicoLisp, wherein quote returns all of its arguments unevaluated instead of just the first one - in pretty much any other lisp, we'd have had to write (sqdist (quote (3 4 5))) instead (well, really (sqdist '(3 4 5)), since they define a shorthand for the quote function, but supporting that would have complicated our interpreter a lot).
Closures
An interesting side effect of how we defined environments (closures) and the lambda
to work is that we can store variables inside functions (in a process known as currying, or more specifically, partial application):
> (defun curryadd (lambda (x) (lambda (y) (+ x y))))
curryadd
> ((curryadd 2) 3)
5
> (defun addfive (curryadd 5))
addfive
> (addfive 15)
20
Like a surprising number of other things in Lisp, this “fun little trick” ended up founding its own extremely elaborate (and nowadays, incredibly obnoxious and counterproductive) discipline of “object-oriented programming”. It meant we could have a function (the “constructor”) which takes arguments and puts them into another function (the “object”), which we can then store and query (originally by “passing messages”, ie calling it with arguments describing what we want it to do - a concept mostly lost on modern software, although Smalltalk got very far with it), and thus represent abstract, composite datatypes!
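To sketch the idea anyway (fair warning: this exact transcript is untested, and it uses numbers as “messages”, since our quote hands back a list rather than a bare atom, which makes symbolic messages awkward to compare with eq):

```
> (defun makepoint (lambda (x y) (lambda (m) (cond ((eq m 0) x) (t y)))))
makepoint
> (defun p (makepoint 3 4))
p
> (p 0)
3
> (p 1)
4
```

The constructor makepoint tucks x and y away in the closure of the function it returns; asking the “object” p for message 0 yields the first coordinate, anything else the second. It's the same curried-variable trick as above, just squinted at until it looks like a datatype.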
Integration
And it turns out we can use that same idea to make the interpreter interact with the outside world in more complex ways. Memory-mapping also exists as an option, though there you run into the issue that the language has now become imperative.
XXX todo: basic math, defun, functional programming, closures, oop, integration (how to make it interact with outside stuff) - much of this can probably reuse stuff from the original page; maybe also vm bytecode and compilation?
XXX should prolly explain what each of the builtins does somewhere
Optimisations
comments, garbage collector, tagging, binary numbers
(written 070325, 110325, 160325)