%% Oh Emacs, this is a -*- Makefile -*-, so give me tabs. \documentclass{article} \usepackage{axiom} \title{\File{src/boot/Makefile} Pamphlet} \author{Timothy Daly \and Gabriel Dos~Reis} \begin{document} \maketitle \begin{abstract} \Tool{Axiom} is built in layers. The first layer is contructed into an image called {\bf bootsys}. The \Tool{bootsys} image is used to translate Boot code to Common Lisp code. Since a Boot coded interpreter is needed to translate the code for the Boot coded interpreter we have a ``boot-strapping'' problem. In order to get the whole process to start we need certain files kept in common lisp form. This directory contains those files. \end{abstract} \eject \tableofcontents \eject \section{Introduction} \label{sec:intro} The Scratchpad language is implemented by using a mixture of Lisp and a more convenient language for writing Lisp called \emph{Boot}. This document contains a description of the Boot language, and some details of the resulting Lisp programs. The description of the translation functions available are at the end of this file. The main difference between Lisp and Boot is in the syntax for the application of a function to its argument. The Lisp format [[(F X Y Z)]], means, when [[F]] is a function, the application of [[F]] to its arguments [[X]], [[Y]], and [[Z]], is written in Boot as [[F(X,Y,Z)]]. When [[F]] is a special Lisp word it will be written in Boot by using some other syntactic construction, such as spelling in CAPITAL LETTERS. Boot contains an easy method of writing expressions that denote lists, and provides an analogous method of writing patterns containing variables and constants which denote a particular class of lists. The pattern is matched against a particular list at run time, and if the list belongs to the class then its variables will take on the values of components of the list. Similarly, Boot provides an easy way of writting discriminated unions or algebraic types, and pattern matching as found in ML. A second convenient feature provided by Boot is a method of writing programs that iterate over the elements of one or more lists and which either transform the state of the machine, or produce some object from the list or lists. \section{Boot To Common Lisp Translaters} \label{sec:boot-to-cl} The Boot to Common Lisp translation is organized in several separate logical phases. At the moment, those phases are not really separate; but from a logical point of view, it is better to think of them that way. \subsection{The Boot Includer} \label{sec:boot-to-cl:includer} The Boot Includer is the module that reads Boot codes from source files. The details of the Includer, as well as the grammar of the include files are to be found in \File{includer.boot} \subsection{The Scanner} \label{sec:boot-to-cl:scanner} The tokenization process is implemented in \File{scanner.boot}. Further details about keywords and reserved identifiers are available in \File{tokens.boot}. \subsection{Piling} \label{sec:boot-to-cl:piling} The Boot language uses layout to delimit blocks of expressions. After the scanner pass, and before the parser pass is another pass called \emph{piling}. The piling pass inserts tokens to unambiguously delimit the boundaries of piles. This is implemented in \File{pile.boot} \subsection{The Parser} \label{sec:boot-to-cl:piling} The Boot parser is implemented in \File{parser.boot}. It is a hand-written recursive descent parser based on \emph{parser combinators} methodology. Thoe files also implicitly defines the grammar of the Boot language. \subsection{The Transformer} \label{sec:boot-to-cl:transfo} As observed earlier, the Boot language was originally defined as a syntactic sugar over Common Lisp. Consequently, it semantics is defined by tranformation to Lisp. The transformers are defined in \File{ast.boot}. \subsection{Utils} \label{sec:boot-to-cl:utils} Finally, the file \File{translator.boot} is a pot-pourri of many utility functions. It also contains the entry points to the Boot translater. \section{Boot} \label{sec:boot} \subsection{Lines and Commands} If the first character of a line is a closing parenthesis the line is treated as a command which controls the lines that will be passed to the translater rather than being passed itself. The command [[)include filename]] filemodifier will for example be replaced by the lines in the file [[filename filemodifier]]. If a line starts with a closing parenthesis it will be called a command line, otherwise it will be called a plain line. The command lines are \begin{verbatim} name as written Include )include filename filemodifier IncludeLisp )includelisp filename filemodifier If )if bootexpression Else )else ElseIf )elseif bootexpression EndIf )endif Fin )fin Say )say string Eval )eval bootexpression EvalStrings )evalstrings bootexpression Package )package packagename SimpleLine::= PlainLine | Include | IncludeLisp |Say | Eval | EvalStrings | Package \end{verbatim} A [[PlainLine]] is delivered to the translater as is. An [[Include]] delivers the lines in the file filename.filemodifier, treated as boot lines. An [[IncludeLisp]] delivers the lines in the specified file, treated as Lisp lines. The only comments allowed in lisp files that are included in this way require that the semicolon is at the beginning of the line. A [[Say]] outputs the remainder of the line to the console, delivering nothing to the translater. An [[Eval]] translates the reminder of the line, assumed to be written in Boot, to Lisp, and evaluates it, delivering nothing to the translater. An [[EvalStrings]] also translates and evaluates the rest of the line but this time assumes that the Boot expression denotes a list of strings which are then delivered to the translater instead of the EvalString line. The strings are treated as Boot lines. It is also possible to include or exclude lines based upon some condition which is the result of translating and evaluating the boot expression that follows an )if or )elseif command. This construction will be called a Conditional. A file will be composed from SimpleLines and Conditionals. A file is either terminated by the end of file or by a Fin line. \begin{verbatim} Components ::=(SimpleLine | Conditional)* File ::= Components ( Fin | empty) A conditional is bracketed by an If and an EndIf. Conditional ::= If Components Elselines EndIf \end{verbatim} If the boot expression following the )if has value true then the Components are delivered but not the ElseLines, otherwise the Components are ignored ,and the ElseLines are delivered to the translater. In any case the lines after the EndIf are then processed. \begin{verbatim} ElseLines ::= Else Components | ElseIf Components ElseLines | empty \end{verbatim} When the Elselines of a Conditional is being included then if an "Else Components" phrase is encountered then the following Components are included otherwise if an "ElseIf Components ElseLines" phrase is encountered then the boot expression following the )elseif is evaluated and if true the following Components are included, if false the following ElseLines is included. \subsection{Boot syntax and semantics} The semantics of Boot was originally defined by translation to Lisp. Ideally, we would like to give it a self-contained semantics, without explicitly referring to Lisp, or if we must we should use lambda calculus. \subsubsection{Source character set} \label{sec:boot:char-set} ???What is the source character set??? That of Common Lisp? \subsubsection{Identifiers} \label{sec:boot:identifier} The standard identifiers start with a letter ([[a-z]] or [[A-Z]]) dollar sign ([[$]]), question mark ([[?]]), or the percent sign ([[\%]]), and are followed by any number of letters, digits, single quotes([[']]), question marks, or percent signs. It is possible however, by using the escape character ([[\_]]), to construct identifiers that contain any characters except the blank or newline character. The rules in this case are that an escape character followed by any non-blank character will start an identifier with that character. Once an identifier has been started either in this way or by a letter, [[$]], or [[%]], then it may be continued either with a letter, digit, quote , question mark or percent sign, or with an escape character followed by any non-blank character. Certain words having the form of identifiers are not classified as such, but are reserved words. They are listed below. An identifier ends when a blank or end of line is encountered, or an escape character followed by a blank or end of line, or a character which is not a letter, digit, quote, question mark or percent sign is found. Two identifiers are equal if the strings produced by replacing each escape followed by a character by that character are equal character by character. \subsubsection{Numbers} \label{sec:boot:number} Integers start with a digit ([[0-9]]) and are followed by any number of digits. The syntax for floating point numbers is \begin{verbatim} <.I | I. | I.I> <+ | - | empty> I \end{verbatim} where I is an integer. \subsubsection{Strings} \label{sec:boot:string} Strings of characters are enclosed by double quote signs. They cannot span two or more lines and an escape character within a string will include the next character regardless of its nature. The meaning of a string depends somewhat on the context in which it is found, but in general a bare string denotes the interned atom making up its body whereas when it is preceded by a single quote (') it denotes the string of characters enclosed. \subsubsection{S-expressions} \label{sec:boot:s-expression} An s-expression is preceded by a single quote and is followed by a Lisp s-expression. \begin{verbatim} sexpression ::=identifier | integer | MINUS integer | float | string | QUOTE sexpression | parenthesized sexpression1 sexpression1 ::=sexpression (DOT sexpression | sexpression1)| empty \end{verbatim} There are two ways to quote an iddentifier: either 'name or "name", which both give rise to (QUOTE name). However a string that is a component of an sexpression will denote the string unless it is the sole component of the s-expression in which case it denotes a string i.e. '"name" gives rise to "name" in Lisp rather than (QUOTE "name"). \subsubsection{Keywords} \label{sec:boot:keyword} The table of key words follows, each is given an upper case name for use in the description of the syntax. \begin{verbatim} as written name and AND by BY case CASE cross CROSS else ELSE for FOR if IF in IN is IS isnt ISNT of OF or OR repeat REPEAT return RETURN structure STRUCTURE then THEN until UNTIL where WHERE while WHILE . DOT : COLON , COMMA ; SEMICOLON * TIMES ** POWER / SLASH + PLUS - MINUS < LT > GT <= LE >= GE = SHOEEQ ^ NOT ^= NE .. SEG # LENGTH => EXIT := BEC == DEF ==> MDEF ( OPAREN ) CPAREN (| OBRACK |) CBRACK [ OBRACK ] CBRACK suchthat BAR ' QUOTE | BAR \end{verbatim} \subsubsection{Primary} \label{sec:boot:primar-expr} \begin{verbatim} constant::= integer | string | float | sexpression \end{verbatim} The value of a constant does not depend on the context in which it is found. \begin{verbatim} primary::= name | constant | construct | block | tuple | pile \end{verbatim} The primaries are the simplest constituents of the language and either denote some object or perform some transformation of the machine state, or both. The statements are the largest constituents and enclosing them in parentheses converts them into a primary. An alternative method of grouping uses indentation to indicate the parenthetical structure. A number of lines whose first non-space characters are in the same column will be called a \emph{pile}. The translater first tokenizes the lines producing identifier, key word, integer, string or float tokens, and then examines the pile structure of a Boot program in order to add additional tokens called [[SETTAB]], [[BACKTAB]] and [[BACKSET]]. These tokens may be considered as commands for creating a pile. The [[SETTAB]] starts a new line indented from the previous line and pushes the resulting column number on to a stack of tab positions. The [[BACKTAB]] will start a new line at the column position found at the head of the stack and removes it from the stack. The [[BACKSET]] has the same effect as a [[BACKTAB]] immediately followed by a [[SETTAB]]. The meaning of a sequence of tokens containing [[SETTAB]], [[BACKTAB]], and [[BACKSET]] is the same the sequence in which each [[SETTAB]] is replaced by [[OPAREN]] , each [[BACKTAB]] is replaced by [[CPAREN]], and each [[BACKSET]] is replaced by [[SEMICOLON]]. By construction the [[BACKTABS]] and [[SETTABS]] are properly nested. \begin{verbatim} listof(p,s)== p | p s ... s p parenthesized s ::= OPAREN s CPAREN piled s ::= SETTAB s BACKTAB blockof s ::= parenthesized (listof (s,SEMICOLON)) pileof s ::= piled (listof (s,BACKSET )) \end{verbatim} A pileof s has the same meaning as a blockof s. There is however a slight difference because piling is weaker than separation by semicolons. In other words the pile items may be listof(s,SEMICOLON). In other words if statements::= listof(statement,SEMICOLON) then we can have a pileof statements which has the same meaning as the flattened sequence formed by replacing all [[BACKSET]]'s by [[SEMICOLON]]'s. A blockof statement is translated to a compound statement e.g. in the absence of any exits, (a;b;c;d) is translated to (PROGN a b c d). \subsubsection{Selectors} \label{sec:boot:selector} \begin{verbatim} selector::= leftassociative(primary, DOT) \end{verbatim} A selector [[a.b]] denotes some component of a structure, and in general is translated to [[(ELT a b)]]. There are some special identifiers that may be used in the [[b]] position to denote list components, of which more later. The [[DOT]] has a greater precedence than juxtaposition and is left associative, For example \begin{verbatim} a.b.c is grouped as (a.b).c which is translated to (ELT (ELT a b) c) application ::= selector selector ... selector \end{verbatim} Application of function to argument is denoted by juxtaposition. A sequence of selectors is right associative and so [[f g h x]] is grouped as [[f(g(h x))]]. The applications [[f x]] and [[f(x)]] mean the application of [[f]] to [[x]] and get translated to the Lisp [[(f x)]]. The application of a function to the empty list is written [[f()]], meaning the Lisp [[(f)]]. [[f(x,y,z)]] gets translated to the Lisp [[(f x y z)]]. Common Lisp does not permit a variable to occur in operator position, so that when f is a variable its application has to be put in argument position of a [[FUNCALL]] or [[APPLY]]. [[f(x,y,z)]] has to be replaced by [[FUNCALL(f,x,y)]] which gets translated to the Lisp [[(FUNCALL f x y z)]]. In Common Lisp each symbol might refer to two objects a function and a non-function. In order to resolve this ambiguity when a function symbol appears in a context other than operator position it has to be preceded by the symbol [[FUNCTION]]. Also it is possible to produce the function type symbol from the non-function symbol by applying [[SYMBOL-FUNCTION]] to it. Certain reserved words called infixed operators namely [[POWER]], [[TIMES]], [[SLASH]], [[PLUS]], [[MINUS]], [[IS]], [[EQ]], [[NE]] , [[GT]], [[GE]], [[LT]], [[LE]], [[IN]], [[AND]], [[OR]], indicate application by being placed between their 2 arguments. Infixed application may be either right- or left-associative. \begin{verbatim} rightassociative(p,o)::= p o p o p o ... o p == p o (p o (p o ... o p))) leftassociative(p,o)::= p o p o p o ... o p == (((p o p) o p) o ...) o p exponent ::= rightassociative(application,POWER) reduction ::= (infixedoperator |string | thetaname) SLASH application \end{verbatim} In a reduction the application denotes a list of items and operator [[SLASH]] application accumulates the list elements from the left using the operator \begin{verbatim} e.g. +/[a,b,c] means (((0+a)+b)+c) \end{verbatim} Only certain operators are provided with values when the list is empty they are [[and]], [[or]], [[+]], [[*]], [[max]], [[min]], [[append]], [[union]]. However any function can be used as an operator by enclosing it in double quotes. In this case the reduction is not applicable to an empty list. \begin{verbatim} multiplication ::= rightassociative(exponent,TIMES|SLASH) | reduction minus ::= MINUS multiplication | multiplication arith ::= leftasscociative(minus,PLUS | MINUS) is ::= arith | arith (IS | ISNT) pattern comparison ::= is (EQ | NE | GT | GE | LT | LE | IN) is | is and ::= leftassociative (comparison,AND) return ::= and | RETURN and expression ::= leftassociative(return,OR) \end{verbatim} The infixed operators denote application of the function to its two arguments. To summarize, the infixed operators are, in order of decreasing precedence strengths. \begin{verbatim} . juxtaposition ** * / + - is = ^= > >= < <= in and or \end{verbatim} \subsubsection{Conditionals} \label{sec:boot:conditional} \begin{verbatim} conditional ::= IF where THEN where | IF where THEN where ELSE where IF a THEN b is translated to (COND (a b)) and IF a THEN b else c is translated to (COND (a b) (T c)) statement::= conditional | loop | expression \end{verbatim} \subsubsection{Loops} \label{sec:boot:iteration} \begin{verbatim} loop ::= crossproduct REPEAT statement | REPEAT statement iterator ::= forin | suchthat | until | while iterators ::= iterator iterator ... iterator crossproduct ::=rightassociative(iterators,CROSS) suchthat ::= BAR where while ::= WHILE expression until ::= UNTIL expression forin ::= for variable IN segment | for variable IN segment BY arith segment::= arith | arith SEG arith | arith SEG \end{verbatim} A loop performs an iterated transformation of the state which is specified by its statement component and its iterators. The forin construction introduces a new variable which is assigned the elements of the list which is the value of the segment in the order in which they appear in the list . A segment of the form [[arith]] denotes a list, and segments of the form [[arith SEG arith]] and [[arith SEG]] denote terminating and non-terminating arithmetic progressions. The [[BY arith]] option is the step size, if omitted the step is [[1]]. Two or more [[forin]]'s may control a loop. The associated lists are scanned in parallel and a variable of one [[forin]] may not appear in the segment expression that denotes the list in a second [[forin]]. Such a variable may however occur in the conditions for filtering or introduced by a [[suchthat]], or for termination introduced by a while iterator, and in the statement of the loop. The [[forin]] variables are local to the statement, the conditions that follow a [[while]] or [[suchthat]] in the same list of iterators and have no meaning outside them. The loop will be terminated when one of its [[forin]] lists is null, or if the condition in a [[while]] is not satisfied. The list elements are filtered by all the [[suchthat]] conditions. The ordering of the iterators is irrelevant to the meaning, so it is best to avoid side effects within the conditions for filtering and termination. It is possible to control a loop by using a \emph{cross-product} of iterators. The iteration in the case [[iterators1 CROSS iterators2]] is over all pairs of list items one from the list denoted by iterators1 and the other from the list denoted by iterators2. In this case the variables introduced [[forin]] statements in [[iterators1]] may be used in [[iterators2]]. \subsubsection{Lists} \label{sec:boot:list} Boot contains a simple way of specifying lists that are constructed by [[CONS]] and [[APPEND]], or by transforming one list to another in a systematic manner. \begin{verbatim} construct ::= OBRACK construction CBRACK construction ::= comma | comma iteratortail iteratortail ::= REPEAT iterators | iterators \end{verbatim} A construct expression denotes a list and may also have a list of controlling iterators having the same syntax as a loop. In this case the expression is enclosed in brackets and the iterators follow the expression they qualify, rather than preceding it. In the case that there are no iterators the construct expression denotes a list by listing its components separated by commas, or by a comma followed by a colon. In the simple case in which there are no colons the Boot expression [a,b,c,d] translates to the Lisp [[(LIST a b c d)]] or [[(CONS a (CONS b (CONS c (CONS d NIL))))]]. When elements are separated by comma colon, however, the expression that follows will be assumed to denote a list which will be appended to the following list, rather than consed. An exception to this rule is that a colon preceding the last expression is translated to the expression itself. If it immediately preceded by a CONS then it need not denote a list. For example: \begin{verbatim} [] is translated to the empty list NIL [a] is translated to the 1-list (LIST a) or (CONS a NIL) [:a] is translated to a [a,b] is translated to the 2-list (LIST a b) or (CONS a (CONS b NIL)) [:a,b] is translated to (APPEND a (CONS b NIL)) [a,:b] is translated to (CONS a b) [:a,:b] is translated to (APPEND a b) [:a,b,c] is translated to (APPEND a (CONS b (CONS c NIL))) [a,:b,c] is translated to (CONS a (APPEND b (CONS c NIL))) [a,b,:c] is translated to (CONS a (CONS b c)) \end{verbatim} If the construct expression has iterators that control the production of the list the resulting list depends on the form of the comma expression. i.e. \begin{verbatim} construction ::= comma iteratortail \end{verbatim} If the comma expression is recognised as denoting a list by either preceding it by a colon, or having commas at top level as above, then the successive values are appended. If not then the successive values are consed. e.g. \begin{verbatim} [f i for i in x] denotes the list formed by applying f to each member of the list x. [:f i for i in 0..n] denotes the list formed by appending the lists f i for each i in 0..n. \end{verbatim} \subsubsection{Patterns} \label{sec:boot:pattern} \begin{verbatim} is ::= arith | arith IS pattern \end{verbatim} The pattern in the proposition [[arith IS pattern]] has the same form as the construct phrase without iterators. In this case, however it denotes a class of lists rather than a list, and is composed from identifiers rather than expressions. The proposition is translated into a program that tests whether the arith expression denotes a list that belongs to the class. If it does then the value of the is expression is true and the identifiers in the pattern are assigned the values of the corresponding components of the list. If the list does not match the pattern the value of the is expression is false and the values of the identifier might be changed in some unknown way that reflects the partial success of the matching. Because of this uncertainty, it is advisable to use the variables in a pattern as new definitions rather than assigning to variables that are defined elsewhere. \begin{verbatim} pattern::= identifier | constant | [ patternlist ] \end{verbatim} The value of [[arith IS identifier]] is [[true]] and the value of [[arith]] is assigned to the [[identifier]]. [[(PROGN (SETQ identifier arith) T)]] The expression [[arith IS constant]] is translated to [[(EQUAL constant arith)]]. The expression arith [[IS [ pattenlist ] ]] produces a program which tests whether arith denotes a list of the right length and that each patternitem matches the corresponding list component. \begin{verbatim} patternitem ::= EQ application | DOT | pattern | name := pattern \end{verbatim} If the [[patternitem]] is [[EQ application]] then the value is true if the component is [[EQUAL]] to the value of the application expression. If the [[patternitem]] is [[DOT]] then the value is [[true]] regardless of the nature of the component. It is used as a place-holder to test whether the component exists. If the patternitem is pattern then the component is matched against the pattern as above. If the [[patternitem]] is [[name:=pattern]] then the component is matched against the pattern as above, and if the value is [[true]] the component is assigned to the name. This last provision enables both a component and its components to be given names. \begin{verbatim} patternlist ::= listof(patternitem,COMMA)| listof(patternitem,COMMA) COMMA patterntail patterntail patterncolon ::= COLON patternitem patterntail ::= patterncolon | patterncolon COMMA listof(patternitem,COMMA) \end{verbatim} The [[patternlist]] may contain one colon to indicate that the following patternitem can match a list of any length. In this case the matching rule is to construct the expression with [[CONS]] and [[APPEND]] from the pattern as shown above and then test whether the list can be constructed in this way, and if so deduce the components and assign them to identifiers. The effect of a pattern that occurs as a variable in a for iterator is to filter the list by the pattern. \begin{verbatim} forin ::= for pattern IN segment \end{verbatim} is translated to two iterators \begin{verbatim} for g IN segment | g IS pattern \end{verbatim} where [[g]] is an invented identifier. \begin{verbatim} forin ::= for (name:=pattern) IN segment \end{verbatim} is translated to two iterators \begin{verbatim} for name IN segment BAR name IS pattern \end{verbatim} in order to both filter the list elements, and name both elements and their components. \subsubsection{Assignments} \label{sec:boot:assignment} A pattern may also occur on the left hand side of an assignment statement, and has a slightly different meaning. The purpose in this case is to give names to the components of the list which is the value of the right hand side. In this case no checking is done that the list matches the pattern precisely and the only effect is to construct the selectors that correspond to the identifiers in the pattern, apply them to the value of the right hand side and assign the selected components to the corresponding identifiers. The effect of applying [[CAR]] or [[CDR]] to arguments to which they are not applicable will depend on the underlying Lisp system. \begin{verbatim} assignment::= assignvariable BECOMES assignment| statement assignvariable := OBRACK patternlist CBRACK | assignlhs \end{verbatim} The assignment having a pattern as its left hand side is reduced as explained above to one or more assignments having an identifier on the left hand side. The meaning of the assignment depends on whether the identifier starts with a dollar sign or not, if it is and whether it is followed by [[:local]] or [[:fluid]]. If the identifier does not start with a dollar sign it is treated as local to the body of the function in which it occurs, and if it is not already an argument of the function, a declaration to that effect is added to the Lisp code by adding a [[PROG]] construction at top level within the body of the function definition. Note also the all local variables and fluid variables are treated this way, resulting in initialization to [[nil]] before execution of the body of the function. Consequently care must be exercised when assigning to Lisp special global variables. If you do not want that implicitly initialization to [[nil]], then use the explicit [[SETQ]] Lisp special form in an application syntax. If such an identifier assignment does not occur in the body of a function but in a top level expression then it is also treated as a local. The sole exception to this rule is when the top level expression is an assignment to an identifier in which case it is treated as global. If the left hand side of an assignment is an identifier that starts with a dollar sign it will not be classified as a local but will be treated as non-local. If it is also followed by [[:local]] then it will be treated as a declaration of a [[FLUID]] (VMLisp) or [[SPECIAL]] variable (Common Lisp) which will be given an initial value which is the value of the right hand side of the assignment statement. The [[FLUID]] or [[SPECIAL]] variables may be referred to or assigned to by functions that are applied in the body of the declaration. If the left hand side of an assignment statement is an identifier that does not start with a dollar sign followed by [[:local]] then it will also be treated as a [[FLUID]] or [[SPECIAL]] declaration, however it may only be assigned to in the body of the function in which the assignment it occurs. \begin{verbatim} assignment::= assignvariable BECOMES assignment | statement assignvariable := OBRACK patternlist CBRACK | assignlhs assignlhs::= name | name COLON local | name DOT primary DOT ... DOT primary \end{verbatim} If the left hand side of an assignment has the form \begin{verbatim} name DOT primary DOT ... DOT primary \end{verbatim} the assignment statement will denote an updating of some component of the value of name. In general [[name DOT primary := statement]] will get translated to [[(SETELT name primary statement)]] or [[(SETF (ELT name primary) statement)]] There are however certain identifiers that denote components of a list which will get translated to statements that update that component (see appendix) e.g. \begin{verbatim} a.car:=b is translated to (SETF (CAR a) b) in Common Lisp. \end{verbatim} The iterated [[DOT]] is used to update components of components and e.g \begin{verbatim} a.b.c:=d is translated to (SETF (ELT (ELT a b)c) d) exit::= assignment | assignment EXIT where \end{verbatim} The exit format [[assignment EXIT where]] is used to give a value to a blockof or pileof statements in which it occurs at top level. The expression \begin{verbatim} (a =>b;c) will be translated to if a then b else c or (COND (a b) (T c)) \end{verbatim} If the exit is not a component of a blockof or pileof statements then \begin{verbatim} a=>b will be translated to (COND (a b)) \end{verbatim} \subsubsection{Definitions} Functions may be defined using the syntax \begin{verbatim} functiondefinition::= name DEF where | name variable DEF where variable ::= parenthesized variablelist | pattern variableitem ::= name| pattern | name BECOMES pattern | name IS pattern variablelist ::= variableitem | COLON name | variableitem COMMA variablelist \end{verbatim} Function definitions may only occur at top level or after a [[where]]. The [[name]] is the name of the function being defined, and the most frequently used form of the [[variable]] is either a single name or a parenthesized list of names separated by commas. In this case the translation to Lisp is straightforward, for example: \begin{verbatim} f x == E or f(x)==E is translated to (DEFUN f (x) TE) f (x,y,z)==E is translated to (DEFUN f (x y z) TE) f ()==E is translated to (DEFUN f () TE) \end{verbatim} where [[TE]] is the translation of [[E]]. At top level \begin{verbatim} f==E is translated to (DEFUN f () TE) \end{verbatim} The function being defined is that which when applied to its arguments produces the value of the body as result where the variables in the body take on the values of its arguments. A pattern may also occur in the variable of a definition of a function and serves the purpose, similar to the left hand side of assignments, of naming the list components. The phrase \begin{verbatim} name pattern DEF where is translated to name g DEF (pattern:=g;where) \end{verbatim} similarly \begin{verbatim} name1 name2 := pattern DEF where or name1 name2 is pattern DEF where are both translated to name1 name2 DEF (pattern:=name2;where) \end{verbatim} similarly for patterns that occur as components of a list of variables. order \begin{verbatim} variablelist ::= variableitem | COLON name | variableitem COMMA variablelist \end{verbatim} The parenthesized [[variablelist]] that occurs as a variable of a function definition can contain variables separated by commas but can also have a comma colon as its last separator. This means that the function is applicable to lists of different sizes and that only the first few elements corresponding to the variables separated by commas are named, and the last name after the colon denotes the rest of the list. Macros may be defined only at top level, and must always have a variable \begin{verbatim} macrodefinition::= name variable MDEF where \end{verbatim} The effect of a [[macrodefinition]] is to produce a Lisp macro which is applied to arguments that are treated as expressions, rather than their values, and whose result if formed by first substituting the expressions for occurrences of the variables within the body and then evaluating the resulting expression. \subsubsection{Where Clauses} \label{sec:boot:where-clause} Expressions may be qualified by one or more function definitions using the syntax \begin{verbatim} where ::= exit | exit WHERE qualifier qualifier ::= functiondefinition | pileof (functiondefinition) | blockof functiondefinition \end{verbatim} The functions may only be used within the expression that is qualified. This feature has to be used with some care, however, because a where clause may only occur within a function body, and the component functions are extruded, so to speak, from their contexts renamed, and made into top level function definitions. As a result the variables of the outer function cannot be referred to within the inner function. If a qualifying function has the format [[name DEF where]] then the [[where]] phrase is substituted for all occurences of the name within the expression qualified. If an expression is qualified by a phrase that is not a function definition then the result will be a compound statement in which the qualifying phrase is followed by the qualified phrase. \subsubsection{Tuples} \label{sec:boot:tuples} Although a tuple may appear syntactically in any position occupied by a primary it will only be given meaning when it is the argument to a function. To denote a list it has to be enclosed in brackets rather than parentheses. A tuple at top level is treated as if its components appeared at top level in the order of the list. \begin{verbatim} tuple::= parenthesized (listof (where,COMMA)) \end{verbatim} \subsubsection{Blocks and Piles} \label{sec:boot:block} \begin{verbatim} block::= parenthesized (listof (where,SEMICOLON)) pile::= piled (listof (listof(where,SEMICOLON),BACKSET)) A block or a pile get translated to a compound statement or PROGN \end{verbatim} \subsubsection{Top Level} \label{sec:boot:top-level} \begin{verbatim} toplevel ::= functiondefinition | macrodefinition | primary \end{verbatim} \subsubsection{Translation Functions} \label{sec:boot:translation} \begin{verbatim} (boottocl "filename") translates the file "filename.boot" to the common lisp file "filename.clisp" \end{verbatim} \begin{verbatim} (bootclam "filename") translates the file "filename.boot" to the common lisp file "filename.clisp" \end{verbatim} producing, for each function a hash table to store previously computed values indexed by argument list. The function first looks in the hash table for the result if there returns it, if not computes the result and stores it in the table. \begin{verbatim} (boottoclc "filename") translates the file "filename.boot" to the common lisp file "filename.clisp" with the original boot code as comments \end{verbatim} \begin{verbatim} (boot "filename") translates the file "filename.boot" to the common lisp file "filename.clisp", compiles it to the file "filename.bbin" and loads the bbin file. \end{verbatim} \begin{verbatim} (bo "filename") translates the file "filename.boot" and prints the result at the console \end{verbatim} \begin{verbatim} (stout "string") translates the string "string" and prints the result at the console \end{verbatim} \begin{verbatim} (sttomc "string") translates the string "string" to common lisp, and compiles the result. \end{verbatim} \begin{verbatim} (fc "functionname" "filename") attempts to find the boot function functionname in the file filename, if found it translates it to common lisp, compiles and loads it. \end{verbatim} \begin{verbatim} BOOT_-COMPILE_-DEFINITION_-FROM_-FILE(fn,symbol) is similar to fc, fn is the file name but symbol is the symbol of the function name rather than the string. (fn,symbol) \end{verbatim} \begin{verbatim} BOOT_-EVAL_-DEFINITION_-FROM_-FILE(fn,symbol) attempts to find the definition of symbol in file fn, but this time translation is followed by EVAL rather than COMPILE \end{verbatim} \begin{verbatim} (defuse "filename") Translates the file filename, and writes a report of the functions defined and not used, and used and not defined in the file filename.defuse \end{verbatim} \begin{verbatim} (xref "filename") Translates the file filename, and writes a report of the names used, and where used to the file filename.xref \end{verbatim} \subsection{Reserved identifiers} \label{sec:boot:reserved-identifiers} The following identifiers are reserved by Boot. \begin{verbatim} and append apply atom car cdr cons copy croak drop exit false first function genvar IN is isnt lastNode LAST list member mkpf nconc nil not NOT nreverse null or otherwise PAIRP removeDuplicates rest reverse setDifference setIntersection setPart setUnion size strconc substitute take true PLUS MINUS TIMES POWER SLASH LT GT LE GE SHOEEQ NE T \end{verbatim} The following identifiers designate special accessor functions in Boot. \begin{verbatim} setName setLabel setLevel setType setVar setLeaf setLeaf setDef aGeneral aMode aTree aValue attributes cacheCount cacheName cacheReset cacheType env expr CAR mmCondition mmDC mmImplementation mmSignature mmTarget mode op opcode opSig CDR sig source streamCode streamDef streamName target \end{verbatim} \section{The Makefile} \label{sec:Makefile} When all of the native object files are produced we construct a lisp image that contains the boot translator, called [[bootsys]], which lives in the [[$(axiom_target_bindir)]] directory. This [[bootsys]] image is critical for the rest of the makefiles to succeed. There are two halves of this file. the first half compiles the .lisp files that live in the src/boot directory. the second half compiles the .clisp files (which are generated from the .boot files). It is important that the .clisp files are kept in the src/boot directory for the boot translator as they cannot be recreated without a boot translator (a bootstrap problem). An important subtlety is that files in the boot translator depend on the file npextras. there are 3 macros in npextras that must be in the lisp workspace (\verb$|shoeOpenInputFile| |shoeOpenOutputFile| memq$). \subsection{Environment} \label{sec:Makefile:env} \subsubsection{Lisp Images} \label{sec:Makefile:env:lisp-images} We will use create and use several lisp images during the build process. We name them here for convenience. \paragraph{[[AXIOM_LOCAL_LISP]].} First we create a Lisp image that contains at least three macros for translating Boot source files. We do this by loading \File{initial-env.lisp} in [[AXIOM_LISP]], and saving the resulting image. That image is then used to build the bootstrapping Boot translator. <>= AXIOM_LOCAL_LISP_sources = initial-env.lisp AXIOM_LOCAL_LISP = ../lisp/base-lisp$(EXEEXT) @ \paragraph{[[BOOTSYS_FOR_TARGET]].} The [[$(BOOTSYS_FOR_TARGET)]] image is the final Boot translator image, produced after several bootstrap stages. That is the result of running the \Tool{Make} target [[all-boot]]. <>= BOOTSYS_FOR_TARGET = $(axiom_target_bindir)/bootsys$(EXEEXT) @ \section{Proclaim optimization} \label{sec:proclaim} GCL, and possibly other common lisps, can generate much better code if the function argument types and return values are proclaimed. In theory what we should do is scan all of the functions in the system and create a file of proclaim definitions. These proclaim definitions should be loaded into the image before we do any compiles so they can allow the compiler to optimize function calling. GCL has an approximation to this scanning which we use here. The first step is to build a version of GCL that includes gcl\_collectfn. This file contains code that enhances the lisp compiler and creates a hash table of structs. Each struct in the hash table describes information that about the types of the function being compiled and the types of its arguments. At the end of the compile-file this hash table is written out to a ".fn" file. The second step is to build axiom images (depsys, interpsys, AXIOMsys) which contain the gcl\_collectfn code. The third step is to build the system. This generates a .fn file for each lisp file that gets compiled. The fourth step is to build the proclaims.lisp files. There is one proclaims.lisp file for boot (boot-proclaims.lisp), interp (interp-proclaims.lisp), and algebra (algebra-proclaims.lisp). To build the proclaims file (e.g. for interp) we: \begin{verbatim} (a) cd to obj/linux/interp (b) (yourpath)/axiom/obj/linux/bin/lisp (c) (load "sys-pkg.lsp") (d) (mapcar #'load (directory "*.fn")) (e) (with-open-file (out "interp-proclaims.lisp" :direction :output) (compiler::make-proclaims out)) \end{verbatim} Note that step (c) is only used for interp, not for boot. The fifth step is to copy the newly constructed proclaims file back into the src/interp diretory (or boot, algebra). In order for this information to be used during compiles we define <>= PROCLAIMS=(load "$(srcdir)/boot-proclaims.lisp") @ \section{Special Commands} \label{sec:special-commands} We are working in a build environment that combines Makefile technology with Lisp technology. Instead of invoking a command like {\bf gcc} and giving it arguments we will be creating Lisp S-expressions and piping them into a Lisp image. The Lisp image starts, reads the S-expression from standard input, evaluates it, and finding an end-of-stream on standard input, exits. \section{The Boot Compiler} \label{sec:boot-compiler} This section describes the set of object files that make the Boot compiler. \subsection{The Bootstrap files} \label{sec:boot-compiler:bootstrap} This is a list of all of the files that must be loaded to construct the boot translator image. <>= boot_objects = initial-env.$(FASLEXT) $(boot_sources:.boot=.$(FASLEXT)) boot_SOURCES = \ initial-env.lisp.pamphlet \ $(addsuffix .pamphlet, $(boot_sources)) pamphlets = Makefile.pamphlet $(boot_SOURCES) @ [[$(boot_sources)]] is a list of the boot file targets. If you modify a boot file you'll have to explicitly build the clisp files and merge the generated code back into the pamphlet by hand. The assumption is that if you know enough to change the fundamental bootstrap files you know how to migrate the changes back. This process, by design, does not occur automatically (though it could). The Boot compiler, [[bootsys]], is built from a set of source files written in Boot. Note that the order is important as earlier files will contain code needed by later files. <>= boot_sources = tokens.boot includer.boot scanner.boot \ pile.boot ast.boot parser.boot translator.boot boot_clisp = $(boot_sources:.boot=.clisp) boot_data = $(boot_sources:.boot=.data) boot_fn = $(boot_sources:.boot=.fn) @ These source files use macros defined in the first set, and they be compiled in an environment where those macros are present. The Boot source file for [[bootsys]] are automatically extracted --- only during bootstrap --- from the pamphlets into the current build directory. When bootstrapping, they are the inputs to the stage0, stage1 [[bootsys]] compilers. <>= .PRECIOUS: %.boot %.boot: $(srcdir)/%.boot.pamphlet $(axiom_build_document) --tangle $< @ Since the Boot language is defined as a syntactic sugar over Lisp (a reasonably tasty sugar), the the second set of source files (written in Boot) is first translated to Lisp, and the result of that translation is subsequently compiled to native object files. Partly for bootstrapping reasons, and partly because Axiom (therefore Boot) is not yet widespread, the pamphlets for the source files written in Boot currently keep a cache of their translated versions. Hopefully the maintainance of that cache will be unnecessary as the build machinery becomes more and more improved, and Axiom gets in widespread use. <>= boot_cached_clisp = $(boot_sources:.boot=.clisp) @ \section{Bootstrapping Boot} \label{sec:bootstrapping} When the system is configured for bootstrap, we build the Boot compiler --- [[bootsys]] --- in three steps: \begin{enumerate} \item a stage-0 Boot compiler, built from the cached (Lisp) source files; \item a stage-1 Boot compiler, built the original Boot source files using the stage-0 Boot compiler; \item and a stage-2 Boot compiler, built from original Boot source files using the stage-2 Boot compiler. \end{enumerate} Notice that in last two steps, the source file written in Boot are first translated to Lisp using the freshly built Boot compiler, and the resulting Lisp files subsequently compiled to natve object files. Ideally, we should also compare the intermediate Lisp source files from stage 1 and 2 to detect possible miscompilation. We don't do that for the moment. \subsection{Compiling the Boot source files} \label{sec:bootstrapping:source-files} We compile the Boot compiler source files written in Boot only at stage 1 and 2 (when bootstrapping). As explained earlier, the compilation of these files proceeds in two steps: \begin{enumerate} \item Translate the Boot source files to Lisp code, \item compile the resulting Lisp source files to native object code. \end{enumerate} <>= ## Dependency for various modules. ## FIXME: This should be automatically extracted from the ## Boot source file at packaging time. %/tokens.($FASLEXT): %/initial-env.$(FASLEXT) %/includer.$(FASLEXT): %/tokens.$(FASLEXT) %/scanner.$(FASLEXT): %/tokens.$(FASLEXT) %/includer.$(FASLEXT) %/pile.$(FASLEXT): %/scanner.$(FASLEXT) %/includer.$(FASLEXT) %/ast.$(FASLEXT): %/includer.$(FASLEXT) %/parser.$(FASLEXT): %/ast.$(FASLEXT) %/scanner.$(FASLEXT) %/includer.$(FASLEXT) %/translator.$(FASLEXT): %/parser.$(FASLEXT) %/ast.$(FASLEXT) \ %/pile.$(FASLEXT) %/scanner.$(FASLEXT) \ %/includer.$(FASLEXT) <> @ \subsection{Building [[bootsys]]} \label{sec:bootstrapping:build-bootsys} \subsection{The various bootstrapping stages} \label{sec:bootstrapping:stages} The bootstrapping phase is carried out in three stages: \begin{itemize} \item[Stage 0] we compile the cached Lisp translations of the Boot codes. Currently, these translations are functionally equivalent to the final \Tool{bootsys} we get out of the bootstrap. Ideally, this should just be powerfull enough to translate the \Tool{bootsys} Boot codes. The compilation of thee Lisp code is done with the Lisp image [[$(AXIOM_LOCAL_LISP)]]. \item[Stage 1] Using the \Tool{bootsys} built from the previous stage (\eg{} from cached Lisp translations), we build a new \Tool{bootsys} from the Boot codes proper. \label{sec:bootstrapping:stages} \item[Stage 2] Finally, we build another (and final) \Tool{bootsys} image using the \Tool{bootsys} from Stage 1. This is the \Tool{bootsys} image that is used to build the rest of the Axiom system. \end{itemize} Stage 1 and Stage 2 are structurally identical. Ideally, we should be doing a bootstrap compare. Although all the \Tool{bootsys} images are powerful enough to compile Boot codes directly, we don't use them for compilation. Instead, we the fresh, clean, [[$(AXIOM_LOCAL_LISP)]] image. The reason is that the process of compiling a Boot source file may have the side effect of loading a module in the compiler (as by-product of resolving module dependencies). But such module will contain objects already present in the compiler and being used. Consequently, we must use a fresh image to guarantee clean and reproductible build and semantics. Notice that only the compilation of \Tool{bootsys} itself needs that care. The rest of the Axiom system should use \Tool{bootsys} to compile Boot codes, instead of manually going through the Lisp translation phase. \subsubsection{Stage 0} \label{sec:bootstrapping:stages:stage-0} We build the stage-0 Boot compiler from the cached Lisp souces code. <>= .PRECIOUS: stage0/%.clisp .PRECIOUS: stage0/%.$(FASLEXT) stage0_boot_clisp = $(addprefix stage0/, $(boot_clisp)) stage0_boot_objects = $(addprefix stage0/, $(boot_objects)) stage0/stamp: stage0/bootsys$(EXEEXT) @rm -f $@ @$(STAMP) $@ stage0/bootsys$(EXEEXT): $(stage0_boot_objects) $(AXIOM_LOCAL_LISP) -- --make --main="|AxiomCore|::|topLevel|"\ --output=$@ --load-directory=stage0 \ $(stage0_boot_objects) .PHONY: mk-stage0-dir mk-stage0-dir: @[ -d stage0 ] || $(mkinstalldirs) stage0 $(stage0_boot_objects): $(AXIOM_LOCAL_LISP) stage0/%.$(FASLEXT): stage0/%.clisp $(AXIOM_LOCAL_LISP) -- --compile \ --load-directory=stage0 --output=$@ $< stage0/%.clisp: $(srcdir)/%.boot.pamphlet mk-stage0-dir $(axiom_build_document) --tangle=$*.clisp --output=$@ $< %/initial-env.$(FASLEXT): initial-env.lisp mk-%-dir $(AXIOM_LOCAL_LISP) -- --compile --output=$@ $< @ \subsubsection{Stage 1} \label{sec:bootstrapping:stages:stage-1} <>= .PRECIOUS: stage1/%.$(FASLEXT) .PRECIOUS: stage1/%.clisp stage1/stamp: stage1/bootsys$(EXEEXT) rm -f $@ $(STAMP) $@ stage1/bootsys$(EXEEXT): $(addprefix stage1/, $(boot_objects)) $(AXIOM_LOCAL_LISP) -- --make --main="|AxiomCore|::|topLevel|" \ --output=$@ --load-directory=stage1 \ $(addprefix stage1/, $(boot_objects)) stage1/%.$(FASLEXT): stage1/%.clisp $(AXIOM_LOCAL_LISP) -- --compile \ --load-directory=stage1 $< stage1/%.clisp: %.boot stage0/stamp mk-stage1-dir stage0/bootsys -- --translate --output=$@ $< .PHONY: mk-stage1-dir mk-stage1-dir: @[ -d stage1 ] || $(mkinstalldirs) stage1 @ \subsubsection{Stage 2} \label{sec:bootstrapping:stages:stage-2} <>= .PRECIOUS: stage2/%.$(FASLEXT) .PRECIOUS: stage2/%.clisp stage2/stamp: stage2/bootsys$(EXEEXT) @echo Building stage 2 $(STAMP) $@ stage2/bootsys$(EXEEXT): $(addprefix stage2/, $(boot_objects)) $(AXIOM_LOCAL_LISP) -- --make --main="|AxiomCore|::|topLevel|" \ --output=$@ --load-directory=stage2 \ $(addprefix stage2/, $(boot_objects)) stage2/%.$(FASLEXT): stage2/%.clisp $(AXIOM_LOCAL_LISP) -- --compile \ --load-directory=stage2 $< stage2/%.clisp: %.boot stage1/stamp mk-stage2-dir stage1/bootsys -- --translate --output=$@ $< .PHONY: mk-stage2-dir mk-stage2-dir: @[ -d stage2 ] || $(mkinstalldirs) stage2 @ <>= <> <> <> @ \section{Making the documentation} \label{sec:doc} \subsection{Compiling Lisp files without deps from pamphlets} <>= .PRECIOUS: %.lisp initial-env.lisp: initial-env.lisp.pamphlet $(axiom_build_document) --tangle $< @ \subsection{boot from pamphlet} <>= .PRECIOUS: %.boot %.boot: $(srcdir)/%.boot.pamphlet $(axiom_build_document) --tangle $< @ \section{Making the documentation} <>= COMPILE_LISP = \ $(axiom_build_document) --tag=lisp --mode=compile --output=$@ BOOT_TO_LISP = \ $(axiom_build_document) --tag=boot --mode=translate \ --use=./prev-stage/bootsys $< @ \section{Cleanup} <>= mostlyclean-local: @rm -f $(AXIOM_LOCAL_LISP) @rm -f $(BOOTSYS_FOR_TARGET) @rm -rf prev-stage @rm -rf stage0 stage1 stage2 @rm -f *.data *.fn @rm -f stamp clean-local: mostlyclean-local @rm -f $(boot_sources) @rm -f *.clisp *.lisp distclean-local: clean-local @ \section{Global variables} The Boot implementation uses a number of global variables for communication between several routines. Some of them follow the syntactic convention of starting their names with [[$]]. Some don't. \subsection{[[$linepos]]} \subsection{[[$f]]} \subsection{[[$r]]} \subsection{[[$ln]]} \subsection{[[$sz]]} \subsection{[[$n]]} \subsection{[[$floatok]]} \subsection{[[$bfClamming]]} \subsection{[[$GenVarCounter]]} \subsection{[[$inputstream]]} \subsection{[[$stack]]} \subsection{[[$stok]]} \subsection{[[$ttok]]} \subsection{[[$op]]} \subsection{[[$wheredefs]]} \subsection{[[$typings]]} \subsection{[[$returns]]} \subsection{[[$bpCount]]} \subsection{[[$bpParentCount]]} \subsection{[[$lispWordTable]]} \subsection{[[$bootUsed]]} \subsection{[[$bootDefinedTwice]]} \subsection{[[$used]]} \subsection{[[$letGenVarCounter]]} \subsection{[[$isGenVarCounter]]} \subsection{[[$inDefIS]]} \subsection{[[$fluidVars]]} \subsection{[[$locVars]]} \subsection{[[$dollarVars]]} \section{The Makefile} <<*>>= <> subdir = src/boot/ .PHONY: all-ax all-boot all: all-ax all-boot all-ax all-boot: stamp stamp: $(BOOTSYS_FOR_TARGET) @rm -f stamp $(STAMP) $@ $(BOOTSYS_FOR_TARGET): stage2/bootsys$(EXEEXT) $(INSTALL_PROGRAM) stage2/bootsys$(EXEEXT) $(axiom_build_bindir) <> <> <> <> @ \eject \begin{thebibliography}{99} \bibitem{1} src/boot/boothdr.lisp.pamphlet \bibitem{2} src/boot/includer.boot.pamphlet \bibitem{3} src/boot/pile.boot.pamphlet \bibitem{4} src/boot/scanner.boot.pamphlet \bibitem{5} src/boot/exports.lisp.pamphlet \bibitem{7} src/boot/translator.boot.pamphlet \bibitem{8} src/boot/parser.boot.pamphlet \bibitem{9} src/boot/tokens.boot.pamphlet \bibitem{10} src/boot/ast.boot.pamphlet \end{thebibliography} \end{document}