
lect13c-macros
macros

Q's, for future semesters:

- define-struct, vs just struct:
    (struct ship (x y dx dy))  ; Cf. `define-struct`
    (ship 4 7 23 80)           ; Cf. `make-ship`
    (ship-x ...)               ; (same)

- asteroids: physics/geom of acceleration, rotation, facing too distracting?
- project: 
    use racket imperative scanner, vs reading in entire input as sexpr?
    test harness too abstract?

Macros

Macros are code which generates other code (in the same source-language).

The C pre-processor

C has a primitive macro system, based purely on modifying text. (It doesn't let you work at the level of syntax trees at all.) Before compiling a C program, the "C pre-processor" (cpp) makes a first pass in which it performs textual substitution. (Use 'gcc -E' to show the results of pre-processing only.)

#include <stdio.h>

#define PI 2.14159+1
#define RADIANS_PER_DEGREE 2*PI/360
#define MAX(a,b)  a >= b ? a : b
#define swap(x,y) {int tmp;  tmp = x; x = y; y = tmp;}

int main() {
  double p = 360 * RADIANS_PER_DEGREE;
  printf( "The max is: %g.\n", 1/MAX(p++,PI) );


  const int ETM = 15;  // #business days between "every third Monday"
  //const int EOF = 10;  // days in the pay period: "Every Other Friday".
  // This previous line doesn't compile:
  //    error: expected identifier or '(' before '-' token
  // Because the pre-processor is oblivious to C, we get weird error messages.



  int a = 5, b = 99;
  printf( "Before swap: a=%d, b=%d.\n", a, b );
  swap(a,b);
  printf( "After  swap: a=%d, b=%d.\n", a, b );

  // Happily: swap(++a,b) gives a syntax-error.

  // Uh-oh!
  int tmp = 5;
  b = 99;
  printf( "Before swap: tmp=%d, b=%d.\n", tmp, b );
  swap(tmp,b);
  printf( "After  swap: tmp=%d, b=%d.\n", tmp, b );
  }

(We talk about pitfalls of using the preprocessor.)
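
One such pitfall is operator precedence: because the pre-processor pastes text, the + inside PI above does not stay grouped once the expansion lands inside a larger expression. A minimal sketch, where PI_RAW, PI_PAREN, twice_raw, twice_paren and check_precedence are hypothetical names introduced only for this illustration:

```c
#include <assert.h>

/* Same unparenthesized body as PI in the listing above. */
#define PI_RAW   2.14159+1
/* Hypothetical repair: parenthesize the whole body. */
#define PI_PAREN (2.14159+1)

/* Expands to 2 * 2.14159+1, which parses as (2 * 2.14159) + 1. */
static double twice_raw(void)   { return 2 * PI_RAW; }

/* Expands to 2 * (2.14159+1), the intended product. */
static double twice_paren(void) { return 2 * PI_PAREN; }

/* Self-check: the two expansions land on opposite sides of 6. */
static void check_precedence(void) {
    assert(twice_raw()   < 6.0);   /* roughly 5.28 */
    assert(twice_paren() > 6.0);   /* roughly 6.28 */
}
```

Calling check_precedence() from main passes both assertions. Parenthesizing the body (and each parameter) is the usual defence, though it does nothing about arguments being evaluated more than once.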

Note that one relatively-more-robust use of the preprocessor is conditional compilation:

#define USING_WINDOWS_VISTA

#ifdef USING_WINDOWS_XP
// code which *only* gets compiled on some platforms
#define MAX_DISK_SIZE 300
  void writeErrorLog( const char *msg ) { /* XP-specific code */ }
#else
  void writeErrorLog( const char *msg ) { /* (non-XP)-specific code */ }
#endif

or (from Wikipedia)

#if VERBOSE >= 2
  printf("trace message");
#endif

Overall, C's preprocessor is a failed experiment: rather than a tool which does text-substitution (and knows nothing about the language -- scoping, variables, etc.), it's better to have features like named constants, module systems which search a library for declarations but don't actually include the code, and real functions (which you can suggest that the compiler inline). Conditional compilation can be done by having built-in functions like "getCurrentOS" plus an aggressive compiler.
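
As a sketch of that alternative within C itself (C99 or later), a named constant and an inline function can take the place of #define. The names MAX_MACRO, max_int, next_value, calls and check_max are hypothetical, made up for this illustration:

```c
#include <assert.h>

/* A named constant that the compiler itself knows about. */
enum { MAX_DISK_SIZE = 300 };

/* Even a carefully parenthesized macro still pastes each argument twice. */
#define MAX_MACRO(a,b) ((a) >= (b) ? (a) : (b))

/* A real function evaluates each argument exactly once;
   'inline' is only a suggestion to the compiler. */
static inline int max_int(int a, int b) { return a >= b ? a : b; }

/* Hypothetical helper with a visible side effect: it counts its calls. */
static int calls = 0;
static int next_value(void) { return ++calls; }

static void check_max(void) {
    calls = 0;
    int via_macro = MAX_MACRO(next_value(), 0);
    assert(calls == 2 && via_macro == 2);     /* the argument ran twice */

    calls = 0;
    int via_function = max_int(next_value(), 0);
    assert(calls == 1 && via_function == 1);  /* the argument ran once */
}
```

The macro version is the same trap as MAX(p++,PI) in the first listing: the side effect happens once for the comparison and once more for the result.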

Further abilities (and other sorts of pitfalls) are mentioned on Wikipedia.

Possible reasons to want to write a macro:

You want to write a function, but one which short-circuits:
(define (implies a b) (or (not a) b))
However, this doesn't short-circuit, since it's a regular function: whenever implies is called, the second argument gets evaluated before we ever reach the above code.

We want:
(define-syntax-rule (implies a b) (or (not a) b))
This is a use of macros to achieve (a limited form of) lazy evaluation.
You want your own little-language (example tomorrow), but don't want to give up all the stepper/debugger/syntax-coloring/comment already provided.
You are dissatisfied with the syntax of your favorite language, and want to extend it with new syntax. For example, in racket I find it handy to have “an anonymous function of exactly one argument, where that argument is named ____” (reminiscent of ‘fill-in-the-blank’). I can write my own special form “λ1”:
; The function 'add3':
(λ1 + 3 ____)
; Extract all the positive numbers from 'nums':
(filter (λ1 > ____ 0) nums)
Since λ1 isn't built-in to racket, we have to write it ourselves. Note that it's not possible to implement λ1 as a normal racket function, because we'd get an error like “____ unbound”.
In java, I wish I could add a keyword 'gettable' that would auto-create the default getter for me. Similarly, I want a default constructor. [We'll see reflection to do this, in a couple of lectures.]
When using the delegate pattern, I want to have the forwarded methods auto-declared.
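
The short-circuiting point from the first item can also be seen in C, where a macro's arguments are pasted in unevaluated while a function's arguments are evaluated before the call. A minimal sketch; implies_fn, IMPLIES, noisy_true, evaluations and check_implies are hypothetical names for this illustration:

```c
#include <assert.h>

/* Hypothetical helper: returns true and records that it was evaluated. */
static int evaluations = 0;
static int noisy_true(void) { ++evaluations; return 1; }

/* Regular function: both arguments are evaluated before the body runs. */
static int implies_fn(int a, int b) { return !a || b; }

/* Macro: b is pasted into the || expression, so it can be skipped. */
#define IMPLIES(a,b) (!(a) || (b))

static void check_implies(void) {
    evaluations = 0;
    int by_function = implies_fn(0, noisy_true());
    assert(by_function == 1 && evaluations == 1);  /* b was evaluated anyway */

    evaluations = 0;
    int by_macro = IMPLIES(0, noisy_true());
    assert(by_macro == 1 && evaluations == 0);     /* b was never evaluated */
}
```

Both versions compute the same truth value; only the macro avoids running the second argument when the first is false.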

Note that macros in racket are kind of nice: because S-expressions correspond to syntax trees, your macro can go ahead and work at the level of syntax.

better macros

How well does this C pre-processor macro work?

#define swap(a,b)   int tmp = a; a = b; b = tmp;

Well, just to be safe, I'd add parentheses around each use of a and b (in case the user calls something complicated like swap(aStruct.aField, b[3+7])). I'm not sure this is strictly needed, but I certainly can't prove to myself that it's safe to omit them (and I can prove to myself that it can't hurt to include them).
Note that I'm not worried about swap(3,x): this will cause a syntax error after expanding the macro (“3 = tmp” is illegal), but that's okay, since calling swap(3,x) should cause an error. (Giving clear error messages in terms of the original source code is more difficult, though.)
Statements like swap(x++,++y) are sensible, but trying to use cpp to write them has a couple of problems:
C doesn't allow “++” on the left-hand-side of an assignment, so our macro generates non-legal C code;
even if it did generate legal code, it's likely that the variables x and y would get incremented multiple times, since the macro repeated its input several times.
What does
if (n>0) swap(n,m)
expand to? Only the first of the three generated statements ends up under the if. Oops!

How can we fix it? Wrapping the body in curly-brackets seems to help, but it doesn't really: If you add curly brackets to swap, then what does
if (n>0) swap(n,m); else ++n;
expand to? Oops!

The standard hack is to remember to wrap all your macros in do { … } while(0).
(Examples from 15 subtle flaws in C/C++ macros).
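
A minimal sketch of that hack; SWAP_INT, swap_tmp_ and bump_or_swap are hypothetical names, and the macro only handles int lvalues:

```c
#include <assert.h>

/* The do { ... } while (0) wrapper makes the expansion a single statement
   that still needs the caller's trailing semicolon. */
#define SWAP_INT(x, y) \
    do { int swap_tmp_ = (x); (x) = (y); (y) = swap_tmp_; } while (0)

/* Swaps when *n is positive, otherwise increments *n. */
static void bump_or_swap(int *n, int *m) {
    if (*n > 0)
        SWAP_INT(*n, *m);   /* the ';' here completes the do/while */
    else
        ++*n;
}
```

With this wrapper the if/else above compiles and behaves as written. The temporary is still not hygienic: a caller with its own variable named swap_tmp_ would be captured, so the unusual name only makes a clash less likely.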

Also, observe how the interaction between the semicolon in the macro and the one at the macro call is all a bit haphazard and error-prone. This is another consequence of doing macros at the string level, rather than at the syntax-tree level.
This macro only works if you can declare tmp in the middle of a block of code (C does allow this, but not all imperative languages do).
The real nail in the coffin: What if somebody else already was using a variable named 'tmp'? And moreover, what would swap(tmp,b) expand to?

We say this macro is non-hygienic, because the introduced-variable 'tmp' can conflict with existing variables.
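
To see the capture concretely: in the sketch below, SWAP_BRACED is a hypothetical variant of the braced swap macro (its temporary is initialized to 0 only so that the example never reads an uninitialized variable), and swap_attempt is a made-up helper.

```c
#include <assert.h>

#define SWAP_BRACED(x, y) { int tmp = 0; tmp = x; x = y; y = tmp; }

/* Tries to swap *first and *second through locals named tmp and b. */
static void swap_attempt(int *first, int *second) {
    int tmp = *first;
    int b = *second;
    /* Expands to { int tmp = 0; tmp = tmp; tmp = b; b = tmp; }
       where every 'tmp' inside the braces is the macro's own variable. */
    SWAP_BRACED(tmp, b);
    *first = tmp;    /* the outer tmp was never touched */
    *second = b;     /* b was assigned its own value back */
}
```

Starting from 5 and 99, both values come back unchanged: the macro's tmp shadows the caller's tmp, so nothing is swapped and the compiler has no error to report.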

(One of the most powerful things about regular functions is that they have their own scope; programming where all variables are global (e.g. assembly, early BASIC, early FORTRAN, …) means it's hard to write large programs which don't use conflicting variable names. It's a giant step backwards to add global namespaces.)
In C++, they added “call by reference”:
void swap( int& a, int& b ) {
  // a,b are actually references to the real variables.
  int tmp = a;
  a = b;
  b = tmp;
  }
Note that tmp is a regular int, whereas a,b are something different: reference-ints. You cannot compile swap(n+3,47), as you'd hope.
In Pascal, reference-parameters are preceded with the keyword “var” (rather than having the type followed by an ampersand): procedure swap( var a, b : integer ).

In C, we can (kinda) write swap as a true function, using pointers and the & (address-of) operator (to fake pass-by-reference).

      void betterSwap( int *a, int *b ) {
        int tmp;
        tmp = *a;
        *a = *b;
        *b = tmp;
        }

     // call this by:
     int a = 5, b = 99;
     betterSwap(&a,&b);

In PL/SQL:

  create or replace procedure pr_Demo(
    n   in  number,
    ans out varchar
    )
  is
  begin
    ans := 'HI';
  end;

But racket has no "address-of" operator. There, we really do want this to be an inlined method, except that we don't want tmp to conflict with any other variable. racket macros do that, with hygiene:
(define-syntax-rule (swap a b)
  (let* {[tmp a]}
    (begin (set! a b)
           (set! b tmp))))

(define tmp 5)
(define b 99)
(swap tmp b)
b    ; 5
tmp  ; 99
define-syntax-rule means “run this on the syntax-tree, before calling eval!” That's what a (true) macro is.
Note that if looking for on-line tutorials, you'll also see more basic building blocks “syntax-rules” and “define-syntax”; those are both more general-purpose. define-syntax-rule is a good starting point for macros.

It turns out there is also a variant, syntax-rules, that allows pattern-matching (just like prolog!). Here's an example, based on this page; it re-writes let as lambda, as we discussed earlier:

(define-syntax my-let
  (syntax-rules ()
    ( (my-let ((id expr) ...) body)      ; before
      ((lambda (id ...) body) expr ...)  ; after
      )))

After doing this,

(my-let {[a 3]
         [id1 5]
         [c 7]
        }
  (+ a id1 c))
; =>
((lambda (a id1 c) (+ a id1 c)) 3 5 7)
; =>
15

Challenge: write define-and-pass-by-reference, which is just like define except that all arguments are actually passed by reference. That is,

(define-and-pass-by-reference (swapper a b)
   …)

Have this macro turn around and generate code which

Some personal favorite macros:

assert* (making a bunch of asserts all at once),
define/opt (combines define with lambda/opt),
define/case (combines define with case-lambda),
letdef* (combines let* with define)
when-not (an improved name for unless, though it's probably a human-interface issue to introduce redundant names)
begin1 (like begin0, except that it returns the 2nd value from a list-of-expressions)
test (has been obviated by check-expect)

A personal favorite macro: I often find myself writing functions of one argument, like

  ; Suppose I have a list 'data' and a threshold 'n';
  ; grab all the elements of 'data' bigger than 'n':
  (filter (lambda (x) (> x n)) data)

I do things like this so often, I'd like to have my own abbreviated version for creating a function:

  (filter (l1 > x n) data)

Note that x is an introduced variable. (We want it to shadow any existing ¹ variable name!) The code for this is actually pretty involved, using racket's hygienic macro system to introduce non-hygienic macros. Therefore, the code is given only as an un-explained footnote ²

Little Languages

But macros can serve more than just programmers: they can be used to introduce entire scripting languages, inside (say) racket! Here is an example, taken from Krishnamurthi's manifesto Swine Before Perl: Suppose we want to implement finite state machines. We could have our non-programming expert friends learn our own programming language plus our own libraries, or we could write a program which takes in strings and interprets them. But there is a third approach: let them write something that looks like an FSM spec, but is translated directly into racket. For example, here's a canonical parity-checker FSM, which checks that the input contains an even number of a's and an even number of b's:

BEGIN stoplight 

start state is s00

from s00 on input a go to s10
from s00 on input b go to s01

from s01 on input a go to s11
from s01 on input b go to s00

from s10 on input a go to s00
from s10 on input b go to s11

from s11 on input a go to s01
from s11 on input b go to s10

Remember, this is an example program some user might write using our hypothetical automaton-language.

(Okay, you might decide to make a less verbose language, like with lines like “transition: s00 a s10”. Or you might make your work easier by requiring the user to use a parenthesized grammar, as shown below. It's up to you.)

(define parity
  (automaton s00 
             (s00 T : (a -> s10)
                      (b -> s01))
             (s01 F : (a -> s11)
                      (b -> s00))
             (s10 F : (a -> s00)
                      (b -> s11))
             (s11 F : (a -> s01)
                      (b -> s10))))

(parity '(a a b a b a))    ; yields true
(parity '(a a b a b a b))    ; yields false

Regardless of your input format, the first thing for you (as a macro-writer) to figure out is what code should this macro generate?

You can see what a racket program would do: have a variable for the current-state, and then (depending on that state and (first input)) recur with a new value for the current state. Actually, it might even make four functions, s00, s01, s10, s11; for example s10 would be a function which, given a list starting with a, will (tail-)recur on s00, and given a list starting with b, will (tail-)recur on s11.

(define parity
     (letrec {[s00 (lambda (input)
                       (if (empty? input)
                           #t 
                           (case (first input)
                             [(a) (s10 (rest input))]
                             [(b) (s01 (rest input))]
                             [else false])))]  ; No listed transition: fail.
              [s01 (lambda (input)
                       (if (empty? input)
                           #f
                           (case (first input)
                             [(a) (s11 (rest input))]
                             [(b) (s00 (rest input))]
                             [else false])))]  ; No listed transition: fail.
              [s10 (lambda (input)
                       (if (empty? input)
                           #f
                           (case (first input)
                             [(a) (s00 (rest input))]
                             [(b) (s11 (rest input))]
                             [else false])))]  ; No listed transition: fail.
              [s11 (lambda (input)
                       (if (empty? input)
                           #f
                           (case (first input)
                             [(a) (s01 (rest input))]
                             [(b) (s10 (rest input))]
                             [else false])))]  ; No listed transition: fail.
              }
       s00))

The macro that can auto-do-this is:

(define-syntax automaton
  (syntax-rules (-> :)
    [(automaton init-state
                (state accepting? : (cndn -> new-state) 
                                    ...)
                ...)
     (letrec ([state (lambda (input)
                       (if (empty? input)
                           (symbol=? 'accepting? 'T)
                           (case (first input)
                             [(cndn) (new-state (rest input))]
                             ...
                             [else false])))]  ; No listed transition: fail.
              ...)
       init-state)]))

Note that we can take a non-parenthesized language, and implement it in racket with a two-to-three stage process:

Implement regular racket functions to implement the language;
Make a syntax for the language that isn't straight-racket, but is fully parenthesized. Use macros that convert this parenthesized code (AST, really) to the regular racket code.
Finally, make a reader that converts non-parenthesized syntax into the nice, easy-to-handle parenthesized structure

brainfudge

Dynamic scope: (Lisp; Perl.)

class Foo() {
  int x = 3;

  void foo() {
    ++x;        // If dynamically scoped: refers to 'most recently declared' x
    print(x);   // in a *dynamic* sense!
    }

  void haha() {
    foo();
    }

  void lala() {
    int x = 9;
    foo();
    }

  void gaga() {
    int x = 5;
    foo();
    }

  }


class Hoo() {
  int x;
  Foo.foo();
  }

¹ Actually, I use the variable name __ instead of x, reminiscent of "fill in the blank (with the argument)": (filter (l1 > __ n) data) ↩

  (define-syntax (l1 stx)
    (syntax-case stx ()
      [(src-l1 proc args ...)
       ; This nested syntax-case is so that __ has the same
       ; context/scope as proc,args.  (That is,  we
       ; don't want to introduce a hygienic __ that is really
       ; different from any __ occurring in proc,args.)
       ; So create a syntax-object __ that takes context from src-l1,
       ; and then return a lambda which binds that __.
       ;
       (syntax-case (datum->syntax #'src-l1 '__) ()
         [__ (syntax/loc #'src-l1 (lambda (__) (proc args ...)))])]))

; There might be a cleaner way of writing the above

You can see more examples of syntax-case in ProgLangs Chpt. 36 ("Macros as Compilers"). ↩