Effects of Side effects

Recently I was working on a project that involved the usage of PayPal REST API SDK.

To give a little bit of background, starting in PHP 7.x (I believe 7.2), the language will throw a warning if you use sizeof on a variable that does not implement Countable. Arrays implement Countable, but obviously not primitive types, such as integer for example:

php > var_dump( sizeof( [] ) );
int(0)
php > var_dump( sizeof( 1 ) );
PHP Warning:  sizeof(): Parameter must be an array or an object that implements Countable in php shell code on line 1
int(1)

So, now depending on how we use sizeof, it can trigger the warning which is a kind of a side-effect. In PayPal, before they provided a fix for it, what they did was:

if (sizeof($v) <= 0 && is_array($v))

After:

if (is_array($v) && sizeof($v) <= 0)

The only difference in the code above is the order of operations. The second code snippet won’t trigger the warning. However, for the first one, obviously if the variable $v does not implement Countable, you will get the warning.

Mathematically, the logical AND operation is commutative, i.e. a AND b = b AND a. In some cases, it can also be commutative in programming, up until the point side effects are involved.

So a AND b may not necessarily be b AND a if there are side effects involved. There’s a good reason for this “trade-off”, since side effects can be expensive so if evaluating only the first one returns false there’s no need to evaluate the rest of them to waste resources (since we know that false AND a AND ... is false). The side effect here is PHP throwing warnings to stdout on sizeof.

Creating our own ‘struct’ macro in Racket

One thing that was always interesting to me about Lisps in general is their minimality, which means that with a couple of starting points you can implement almost anything in the language, even parts of the language itself.

SICP in chapter 2 asks us to re-implement car, cons, and cdr (related post).

That’s a neat way to represent structured data.

Now, for something more interesting, how about we re-implement the struct syntax, by providing our own create-struct macro that will generate a data type with fields, and procedures for retrieving them?

For example, if we did the following:

(create-struct person (firstname lastname))

This macro should generate these procedures for us:

  1. person – to construct an object of such type, which accepts 2 parameters in this case
  2. person? – to check if a variable is really of that struct type, which accepts 1 parameter
  3. person-firstname – to retrieve the first field of a person object, which accepts 1 parameter
  4. person-lastname – to retrieve the second field of a person object, which accepts 1 parameter

That is, the code above should generate something similar to this:

(define (person firstname lastname) (list 'person firstname lastname))
(define (person? p) (and (list? p) (>= (length p) 1) (equal? (car p) 'person)))
(define (person-firstname p) (list-ref p 1))
(define (person-lastname p) (list-ref p 2))

So that when we evaluate it, we get:

> (person-firstname (person "Hello" "World"))
"Hello"
> (person-lastname (person "Hello" "World"))
"World"
> (person? (person "Hello" "World"))
#t

Sounds fun?

We will use non-hygienic macros just for fun, since I find them easier to write code with and understand them.

So, our code starts with:

#lang racket

; Non-hygienic macros support.
(require mzlib/defmacro)

Let’s start by just implementing our constructor person initially:

(define-macro (create-struct struct-name struct-fields)
  (list 'begin
        (list 'define struct-name
              (list 'lambda struct-fields (cons 'list (cons (list 'quote struct-name) struct-fields))))))

Testing it:

> (person "Hello" "World")
'(person "Hello" "World")

Seems to be working fine.

To get a better grasp for the code above, what I usually do is change it to a function, and call it to see what it returns. But testing it this way can be tricky as it won’t have the variables in context, so you will have to quote allthethings:

#lang racket
(require mzlib/defmacro)

(define (create-struct struct-name struct-fields)
  (list 'begin
        (list 'define struct-name
              (list 'lambda struct-fields (cons 'list (cons (list 'quote struct-name) struct-fields))))))

(create-struct 'person '(firstname lastname))
; returns '(begin (define person (lambda (firstname lastname) (list 'person firstname lastname))))

Looking good.

Now, let’s proceed by implementing person?:

(define-macro (create-struct struct-name struct-fields)
  (append
   (list 'begin
         (list 'define struct-name
               (list 'lambda struct-fields (cons 'list (cons (list 'quote struct-name) struct-fields))))
         (list 'define (string->symbol (string-append (symbol->string struct-name) "?"))
               (list 'lambda '(x) (list 'and (list 'list? 'x) (list '>= (list 'length 'x) 1) (list 'equal? (list 'car 'x) (list 'quote struct-name))))))))

Testing it:

> (person? (person "Hello" "World"))
#t
> (person? '(1 2 3))
#f
> (person? '(person 2 3))
#t

We will have to re-use the string->symbol magic we did, so it’s good to even have that as a macro:

(define-macro (symbol-append . x)
  (list 'string->symbol (list 'apply 'string-append (list 'map 'symbol->string (cons 'list x)))))

(symbol-append 'a 'b 'c) ; returns 'abc

Looks good. Now, let’s try to use it in our macro by changing that line to:

(list 'define (symbol-append struct-name '?)

We get the following:

. symbol-append: unbound identifier in module (in the transformer environment, which does not include the macro definition that is visible to run-time expressions) in: symbol-append

After some searching through the docs, I found out that it’s related to compilation phases. Note that in REPL mode it would not make any difference.

In order to define a function that we can use in our macro, we can use define-for-syntax. It acts just like define, except that the binding is at phase level 1 instead of phase level 0.

(define-for-syntax (symbol-append . x)
  (string->symbol (apply string-append (map symbol->string x))))

Now our code works. Phew!

To finalize our macro, we also have to implement the getters. We could try with the following code:

(define struct-name 'test)
(define struct-fields '(a b c))

(map (lambda (field index)
       (list 'define
             (symbol-append struct-name '- field)
             (list 'lambda (list 'ctx) (list 'list-ref 'ctx index))))
     struct-fields
     (range 1 (+ 1 (length struct-fields))))

What this code does is it will map through all the struct-fields, create a definition for them named by using our symbol-append helper concatenated with their field name, and then the body of that function should just use list-ref. Note how we added another field to our map (index) so that we know what to list-ref to.

Running the code above produces:

'((define test-a (lambda (ctx) (list-ref ctx 1))) (define test-b (lambda (ctx) (list-ref ctx 2))) (define test-c (lambda (ctx) (list-ref ctx 3))))

Looks good. Now, let’s try to merge the code above in our macro, so that the final definition is:

(define-macro (create-struct struct-name struct-fields)
  (append
   (list 'begin
         (list 'define struct-name
               (list 'lambda struct-fields (cons 'list (cons (list 'quote struct-name) struct-fields))))
         (list 'define (symbol-append struct-name '?)
               (list 'lambda '(x) (list 'and (list 'list? 'x) (list '>= (list 'length 'x) 1) (list 'equal? (list 'car 'x) (list 'quote struct-name))))))
   (map (lambda (field index)
          (list 'define
                (symbol-append struct-name '- field)
                (list 'lambda (list 'ctx) (list 'list-ref 'ctx index))))
        struct-fields
        (range 1 (+ 1 (length struct-fields))))))

Whoops, again a compilation phases error:

range: unbound identifier in module (in phase 1, transformer environment) in: range

By visiting the same docs as above, we just need to do the following:

; We need to do this so that the bindings for `range` are provided in context for the macro.
(require (for-syntax racket))

And that concludes our macro. Let’s give it a few tries now:

(create-struct person (firstname lastname))
(define test-p1 (person "Test" "Person"))
(person-firstname test-p1)
(person-lastname test-p1)
(person? test-p1)

(struct person-2 (firstname lastname))
(define test-p2 (person-2 "Test" "Person"))
(person-2-firstname test-p2)
(person-2-lastname test-p2)
(person-2? test-p2)
#|
Produces:
"Test"
"Person"
#t
"Test"
"Person"
#t
|#

This post re-implements the struct keyword just for fun. It also demonstrates how tricky compilation and runtime phases can be.

Bonus: Serialization works pretty well with a structure like this. If you try to serialize a struct you will see it binds the data to the current executing module:

#lang racket
(require racket/serialize)

(serializable-struct person (firstname lastname) #:transparent)
(serialize (person "Hello" "World"))
; Produces the following:
; '((3) 1 (('anonymous-module . deserialize-info:person-v0)) 0 () () (0 "Hello" "World"))

There is a good reason for the behaviour above, since a fish structure used by one programmer in one module is likely to be different from a fish structure used by another programmer.

However, you can pass #:prefab with a struct to achieve similar functionality with our struct macro above, that is, generate a “global” serialized value:

#lang racket
(require racket/serialize)

(struct person (firstname lastname) #:prefab)
(serialize (person "Hello" "World"))
; Produces the following:
; '((3) 0 () 0 () () (f person "Hello" "World"))

Refactoring using mathematical properties of min

Today I refactored a small piece of code.

Before:

$coupon_amount = self::get_coupon_prop( $coupon, 'amount' ) - $already_applied_coupons;

if ( isset( WC()->cart ) ) {
	$coupon_amount = min( WC()->cart->subtotal - $already_applied_coupons, $coupon_amount );
}

if ( $coupon_amount < 0 ) {
	return $discount;
}

After:

$coupon_amount = self::get_coupon_prop( $coupon, 'amount' );

if ( isset( WC()->cart ) ) {
	$coupon_amount = min( WC()->cart->subtotal, $coupon_amount );
}

$coupon_amount -= $already_applied_coupons;

if ( $coupon_amount < 0 ) {
	return $discount;
}

This was intuitive to me, but I wanted to prove that it really does the same thing. For that, we need to use the property min(a - c, b - c) = min(a, b) - c.

The usual definition for min is:
min(a, b) =  \left\{  	\begin{array}{ll}  		a,  & \mbox{if } a < b \\  		b,  & \mbox otherwise  	\end{array}  \right.

Now we can use proof by cases to prove our property which will allow us to confidently refactor our code. There are two cases we can assume:

1. Case a - c > b - c:
In this case, min(a - c, b - c) = b - c, and since from a - c > b - c we can conclude that a > b, then min(a, b) = b.
So, we have that b - c = b - c, which proves this case.

2. Case a - c \le b - c:
In this case, min(a - c, b - c) = a - c, and since from a - c < b - c we can conclude that a < b, then min(a, b) = a.
So, we have that a - c = a - c, which proves this case.

Thus, the property min(a - c, b - c) = min(a, b) - c is proven. 🙂

There are also some other properties like min(a, a) = a and min(a, b) = min(b, a) but you can try doing these yourself. Stuff with max is proven analogously.

Dafny – programming language for formal specifications

Dafny is an imperative compiled language that supports formal specification through preconditions, postconditions, loop invariants.

I found out about it from The Great Theorem Prover Showdown, while reading Hacker News one day.

It only seems to target C#, but can automatically prove properties about functions nevertheless.

I finished Dafny Tutorial and it was pretty straight-forward. The tutorial should be easy to follow for anyone with basic knowledge with the C programming language (and mathematical logic). The examples for the Dafny tutorial can be ran from the browser so you do not have to install it locally to go through it.

The reason why I like this programming language is because it seems much simpler than e.g. Coq or Idris that are based on dependent types / Calculus of Constructions (however, note that it still shares some stuff such as algebraic data types, or inductive types).

The tutorial starts with functions, pre-post conditions and assertions, and then proceeds with loop invariants. It proves basic properties for abs/max, etc.

For example, abs takes an integer and produces an integer greater than or equal to zero. Here’s the proof for that:

method Abs(x: int) returns (y: int)
ensures 0 <= y
{
y := 0;
}

With ensures we set the post-conditions (alternatively, requires is for pre-conditions). Pretty simple and neat, right? 🙂

Like pre- and postconditions, an invariant is a property that is preserved for each execution of the loop. Here’s an example for an invariant:

method m(n: nat)
{
var i := 0;
while i < n
invariant 0 <= i
{
i := i + 1;
}
}

However if you go through the tutorial you will see by the exercises that the challenge in picking a good loop invariant is finding one that is preserved with each loop execution, but also one that lets you prove what you need.

Finally, example for quantifiers:

method m()
{
assert forall k :: k < k + 1;
}

The tutorial takes only a couple of hours to finish, and for anyone interested in programming languages like this I highly recommend you go through it 🙂

The conclusion paragraph of the tutorial states it well:

Even if you do not use Dafny regularly, the idea of writing down exactly what it is that the code does is a precise way, and using this to prove code correct is a useful skill. Invariants, pre- and post conditions, and annotations are useful in debugging code, and also as documentation for future developers. When modifying or adding to a codebase, they confirm that the guarantees of existing code are not broken. They also ensure that APIs are used correctly, by formalizing behavior and requirements and enforcing correct usage. Reasoning from invariants, considering pre- and postconditions, and writing assertions to check assumptions are all general computer science skills that will benefit you no matter what language you work in.

For all the tutorial exercises, you can check out my Dafny Git repository.

Lisp interpreter

Here’s a fun exercise:

1. Write a Lisp interpreter in PHP (done a few years ago here)
2. Write a Lisp interpreter in Lisp (done here)

This made me realize why DSLs are much easier to do in a Lisp than in any other programming language.

It has to do with the underlying structure of the programming language. I believe that in a Lisp we have complete control over the abstract syntax tree and it is much easier to tokenize/parse/evaluate. We just read a string and then have a function that recursively evaluates the AST produced.

Feel free to compare the code and browse around, and let me know your thoughts in the comments section.