If 1 and 2 don’t make much sense, I found this picture on Twitter to be useful:
For $s$ being either $\ast$ or $\square$, the Calculus of Constructions has the following inference rule:

$$\frac{\Gamma \vdash A : s}{\Gamma,\ x : A \vdash x : A}$$

What this rule means is that $x$ can only be of type $A$, if $A$ is of type $\ast$, or $A$ is of type $\square$.
The questions that I had were:
What are the (dis)advantages of limiting $A$ to be of type $\ast$ or $\square$?
Alternatively, what are the (dis)advantages of allowing infinite subtyping (allowing terms to be used as types)?
Why do we have these base sorts $\ast$ and $\square$?
First of all, we can already “emulate” something like “infinite subtyping” in Coq. For example, to “construct” foo : 3 : Nat : Type we can implement F : nat -> Type, then we have that F 3 : Type. Further, we can implement F' : F 3 -> Type, so F' foo is possible. It’s still of type Type but it kinda works.
But what happens if we lift this restriction? Well, that’s not an easy thing to do. At the very least we need some kind of well-formedness condition in place of the premise $\Gamma \vdash A : s$, otherwise we can get judgements whose right-hand side is an arbitrary, even ill-typed, term. So plainly lifting that restriction not only allows arbitrary terms that are not types, it even allows ill-typed terms which do not type-check.
So foo : 3 : Nat : Type kind of makes no sense. To the left, we have the values we actually compute on; types classify those, and can in turn be classified. In other words, a : b means that a is an element of the “collection” b, but some of these elements are not collections.
So the reason we have the types/values distinction is computation, and we can’t really bring the two to the same level, even though dependent types try to blur the line between them. We can have values in types, and types in values, but there’s still a difference.
This is why that judgement rule is so important 🙂
As usual, somebody had already considered this 🙂 I started a discussion on #coq@freenode and ##dependent@freenode and got some great answers. Thanks to pgiarrusso, lyxia, and mietek.
I was watching an interesting video, Opening Keynote by Edward Kmett at Lambda World 2018. Note in the beginning of that video how in Haskell you’d put code comments in place of proofs.
Enter Idris.
As a short example, we can abstract a proven Monoid as an interface:
interface ProvenMonoid m where
mappend : m -> m -> m
mempty : m
assocProof : (a, b, c : m) -> mappend a (mappend b c) = mappend (mappend a b) c
idProofL : (a : m) -> mappend mempty a = a
idProofR : (a : m) -> mappend a mempty = a
Do Nats make a Monoid? Sure they do, and here’s the proof for that:
implementation ProvenMonoid Nat where
mappend x y = x + y
mempty = 0
-- Proofs that this monoid is really a monoid.
assocProof = plusAssociative
idProofL = plusZeroLeftNeutral
idProofR = plusZeroRightNeutral
Note that plusAssociative, plusZeroLeftNeutral, and plusZeroRightNeutral are already part of Idris’ prelude. In the next example we’ll do an actual proof ourselves instead of re-using existing proofs.
What about List of Nats? Yep.
implementation ProvenMonoid (List Nat) where
mappend x y = x ++ y
mempty = []
-- Proofs that this monoid is really a monoid.
assocProof [] b c = Refl
assocProof (a::as) b c = rewrite assocProof as b c in Refl
idProofL a = Refl
idProofR [] = Refl
idProofR (x::xs) = rewrite idProofR xs in Refl
Let’s assume that there’s an algorithm running in production that relies on Monoids. What we don’t want is for associativity to break, because if it does, then the “combine” function applied to all of the produced results will likely be wrong.
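To make the stakes concrete, here is a small Python sketch (the names combine_chunked, mappend, and mempty are mine, not from the post): a chunked reduce, like one a parallel pipeline would use, agrees with a plain left fold only because mappend is associative and mempty is an identity.

```python
from functools import reduce

# Our monoid: natural numbers under addition (mirroring the Idris example).
mempty = 0

def mappend(x, y):
    return x + y

def combine_chunked(xs, chunk_size):
    """Reduce each chunk separately, then reduce the per-chunk results.

    Re-associating the work like this is only valid because mappend is
    associative and mempty is a left/right identity; if either law broke,
    the result could differ from a plain left fold.
    """
    chunks = [xs[i:i + chunk_size] for i in range(0, len(xs), chunk_size)]
    partials = [reduce(mappend, chunk, mempty) for chunk in chunks]
    return reduce(mappend, partials, mempty)

xs = list(range(1, 101))
assert combine_chunked(xs, 7) == reduce(mappend, xs, mempty) == 5050
```

If mappend were, say, subtraction, the chunked version and the left fold would disagree, which is exactly the kind of bug the Idris proofs rule out at compile time.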
🙂
EDIT: Updated first example per u/gallais’s comment.
A binary relation $\le$ is a partial order if the following properties are satisfied:
Reflexivity, i.e. $a \le a$
Transitivity, i.e. if $a \le b$ and $b \le c$, then $a \le c$
Antisymmetry, i.e. if $a \le b$ and $b \le a$, then $a = b$
Let’s abstract this in Idris as an interface:
interface PartialOrder (a : Type) (Order : a -> a -> Type) | Order where
total proofReflexive : Order n n
total proofTransitive : Order n m -> Order m p -> Order n p
total proofAntisymmetric : Order n m -> Order m n -> n = m
The interface PartialOrder accepts a Type and a relation Order on it. Since the interface has more than one parameter, we specify that Order is a determining parameter (the parameter used to resolve the implementation). Each function corresponds to one of the properties above.
This interface is too abstract as it is, so let’s build a concrete implementation for it:
implementation PartialOrder Nat LTE where
proofReflexive {n = Z} = LTEZero
proofReflexive {n = S _} = LTESucc proofReflexive
proofTransitive LTEZero _ = LTEZero
proofTransitive (LTESucc n_lte_m) (LTESucc m_lte_p) = LTESucc (proofTransitive n_lte_m m_lte_p)
proofAntisymmetric LTEZero LTEZero = Refl
proofAntisymmetric (LTESucc n_lte_m) (LTESucc m_lte_n) = let IH = proofAntisymmetric n_lte_m m_lte_n in rewrite IH in Refl
Now you can go and tell your friends that the data type LTE on Nats makes a partial order 🙂
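For contrast, here is what we would settle for in a language without proofs: a Python sketch (is_partial_order is my name, not from the post) that merely spot-checks the three laws on a finite sample. It is a test, not a proof; the Idris version above holds for all values by construction.

```python
def is_partial_order(elems, leq):
    """Spot-check reflexivity, transitivity, and antisymmetry of leq
    on a finite sample of elements. Passing is evidence, not proof."""
    elems = list(elems)
    reflexive = all(leq(n, n) for n in elems)
    transitive = all(
        leq(n, p)
        for n in elems for m in elems for p in elems
        if leq(n, m) and leq(m, p)
    )
    antisymmetric = all(
        n == m
        for n in elems for m in elems
        if leq(n, m) and leq(m, n)
    )
    return reflexive and transitive and antisymmetric

assert is_partial_order(range(20), lambda a, b: a <= b)     # <= is a partial order
assert not is_partial_order(range(20), lambda a, b: a < b)  # < is not reflexive
```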
According to Wikipedia, the bisection method in mathematics is a root-finding method that repeatedly bisects an interval and then selects a subinterval in which a root must lie for further processing. We can use this method to find zeroes of some continuous function.
Image used from Wikipedia
Finding zeroes of a function is a very useful concept in practice. Since the function can be almost anything, this means that we can solve just about any equation: to solve $g(x) = h(x)$, find a zero of $f(x) = g(x) - h(x)$.
For a given continuous function $f$ and an interval $[a, b]$ where $f(a) \cdot f(b) < 0$ (that is, the function changes sign for some value in that interval), the algorithm thus tries to minimize $|f(c)|$ such that $f(c) \approx 0$, where $c = \frac{a + b}{2}$.
The algorithm is as follows:
0. Set the interval $[a, b]$
1. Calculate $c = \frac{a + b}{2}$
2. If $|f(c)|$ is sufficiently small, stop
3. Else if $f(a) \cdot f(c) < 0$, then set $b = c$
4. Else, set $a = c$
5. Goto 1
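The steps above translate almost directly into code. Here is a minimal Python sketch (the name bisect and the eps tolerance are my choices), assuming f(a) and f(b) have opposite signs:

```python
def bisect(f, a, b, eps=1e-6):
    """Approximate a zero of the continuous function f on [a, b],
    assuming f(a) and f(b) have opposite signs."""
    while True:
        c = (a + b) / 2          # calculate the midpoint
        if abs(f(c)) < eps:      # f(c) is sufficiently small: stop
            return c
        if f(a) * f(c) < 0:      # sign change in [a, c]
            b = c
        else:                    # sign change in [c, b]
            a = c

# Approximating sqrt(5) as the zero of f(x) = x^2 - 5 on [2, 4]:
root = bisect(lambda x: x * x - 5, 2, 4)
assert abs(root - 5 ** 0.5) < 1e-3
```

Running it on $f(x) = x^2 - 5$ over $[2, 4]$ retraces the interval-halving of Example 1 below, narrowing in on $\sqrt{5}$.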
As always, definitions are best understood with examples, so we’ll give a few examples.
Example 1: Approximating square root
We will start by approximating the value of $\sqrt{5}$. The function we want to find a zero for is then $f(x) = x^2 - 5$, since $f(\sqrt{5}) = 0$. We have that $f(2) = -1 < 0$ and $f(4) = 11 > 0$, thus our interval is $[a, b] = [2, 4]$.
Let's apply the algorithm a few times now:
a        b
2        4
2        3
2        2.5
2        2.25
2.125    2.25
2.1875   2.25
So, 2.21875 is a close approximation of $\sqrt{5} \approx 2.23607$. We can make it even closer by applying the algorithm a couple more times.
Example 2: Finding a bad commit in a sequence of commits
As a second example, let’s imagine a function that represents a range of Git commits, where one of the commits is bad. Let’s pick such a function at random, maybe $f(x) = 34.5 - x$. We will also agree that the function returns a value greater than 0 if the software is functioning at that commit, and a value less than or equal to 0 otherwise.
How will you find the bad commit? Brute-force (linear) search is one way, but it may not be as efficient as bisecting. You could also graph the function and look at its zeroes, but that’s cheating a bit, and usually impossible in practice with git-bisect, since we cannot evaluate the predicate (here, “the software is functioning”) at every commit up front, and thus cannot construct a graph of such a function over the commits. Since we’re cool, we’ll use the bisection method.
Let’s start at random, knowing at least one good and one bad commit. Maybe picking the interval $[a, b] = [50, 10]$ is a good start. We have that $f(50) \le 0$ and $f(10) > 0$.
Our task is to find which commit caused the software to break. Let's apply the same algorithm.
a     b
50    10
50    30
40    30
So, in just 3 steps we found that the bad commit was 35.
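Over discrete commits, the same search becomes a binary search for the boundary between good and bad. Here is a Python sketch (find_first_bad is my name; the health function f is the assumed stand-in from above, not anything git-specific), assuming every commit before the first bad one is good and every commit from it onward is bad:

```python
def find_first_bad(is_good, good, bad):
    """Binary-search for the first bad commit, given one known-good and one
    known-bad commit. Assumes good commits and bad commits form two
    contiguous runs, with the good run coming first."""
    while abs(bad - good) > 1:
        mid = (good + bad) // 2
        if is_good(mid):
            good = mid   # boundary lies after mid
        else:
            bad = mid    # boundary lies at or before mid
    return bad

# Commit c is good when f(c) > 0, with the assumed f(x) = 34.5 - x:
f = lambda c: 34.5 - c
assert find_first_bad(lambda c: f(c) > 0, good=10, bad=50) == 35
```

The first three midpoints it tests (30, 40, 35) are exactly the table rows above.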
Git-bisect
git-bisect is a very useful command. Given a sequence of commits, it allows you to find the first commit that satisfies some property.
The way bisect works is that it will find zeroes of the function that classifies each commit as good or bad. So, given an interval of commits $[a, b]$, it will return the first commit $c$ such that the given property holds for it.
The fastest way to do that is to use bisection, which we explained earlier. Git uses good and bad for bisecting left and right.
For example, let’s assume we have a repository with three commits, where test.txt holds a list of strings. For the sake of this demo, let’s assume that the only acceptable string in that list is “Hello World!”. So the latest commit is now broken. In order to find which commit broke this, we can use bisect as follows:
bor0@boro:~$ git bisect start
bor0@boro:~$ git bisect bad 8d28e4a
bor0@boro:~$ git bisect good faf3b15
Bisecting: 0 revisions left to test after this (roughly 0 steps)
[c9b527bbd44542fb7d69df15ce82919055b36578] Second commit
bor0@boro:~$ cat test.txt
Hello World!
Hello Worldd!
bor0@boro:~$ git bisect bad
c9b527bbd44542fb7d69df15ce82919055b36578 is the first bad commit
commit c9b527bbd44542fb7d69df15ce82919055b36578
Author: Boro Sitnikovski <buritomath@gmail.com>
Date: Sun Oct 21 20:10:44 2018 +0200
Second commit
:100644 100644 980a0d5f19a64b4b30a87d4206aade58726b60e3 0c5b693e0f16f325e967f6482d4f9fe02159472b M test.txt
bor0@boro:~$
It was easy in this case, since we had 3 commits. But if you had 1000 commits, it would only take about 10 good or bad choices (since $2^{10} = 1024$), which is cool.
So our property was “this commit breaks the software”, and so our function (the result of git bisect) returned c9b527bbd44542fb7d69df15ce82919055b36578, the first commit where the property was satisfied.
As a conclusion, it’s interesting to think that we’re actually finding a zero of a function when we use git-bisect 🙂
Lambda calculus is a formal system for representing computation. As with most formal systems and mathematics, it relies heavily on substitution.
We will start by implementing a subst procedure that accepts an expression e, a source src, and a destination dst, and replaces all occurrences of src with dst in e.
Next, based on this substitution, we need to implement a beta-reduce procedure that, for a lambda expression ((lambda (x) body) arg), will reduce to body[x := arg], that is, body with all occurrences of x within it replaced by arg.
Our procedure will consider 3 cases:
Lambda expression that accepts zero args – in which case we just return the body without any substitutions
Lambda expression that accepts a single argument – in which case we substitute every occurrence of that argument in the body with what’s passed to the expression and return the body
Lambda expression that accepts multiple arguments – in which case we substitute every occurrence of the first argument in the body with what’s passed to the expression and return a new lambda expression
Before implementing the beta reducer, we will implement a predicate lambda-expr? that returns true if the expression is a lambda expression, and false otherwise.
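The implementations themselves are not shown in this excerpt; here is a rough Python sketch of the same idea (nested tuples standing in for S-expressions; the names are my transliterations of the Scheme ones). Like the original, subst is naive about free and bound variables:

```python
def is_lambda_expr(e):
    """True if e looks like ('lambda', (params...), body)."""
    return isinstance(e, tuple) and len(e) == 3 and e[0] == 'lambda'

def subst(e, src, dst):
    """Replace every occurrence of src in e with dst (no capture handling)."""
    if e == src:
        return dst
    if isinstance(e, tuple):
        return tuple(subst(x, src, dst) for x in e)
    return e

def beta_reduce(f, arg):
    """Apply the lambda expression f to a single argument arg."""
    assert is_lambda_expr(f)
    _, params, body = f
    if len(params) == 0:                        # zero args: return the body as-is
        return body
    new_body = subst(body, params[0], arg)      # substitute the first parameter
    if len(params) == 1:                        # last parameter: return the body
        return new_body
    return ('lambda', params[1:], new_body)     # multiple args: partial application

succ = ('lambda', ('n', 'f', 'x'), ('f', ('n', 'f', 'x')))
zero = ('lambda', ('f', 'x'), 'x')
print(beta_reduce(succ, zero))
# ('lambda', ('f', 'x'), ('f', (('lambda', ('f', 'x'), 'x'), 'f', 'x')))
```

This mirrors the Racket session that follows, one beta step at a time.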
> (beta-reduce '(lambda (n f x) (f (n f x))) '(lambda (f x) x))
'(lambda (f x) (f ((lambda (f x) x) f x)))
It seems that we can further apply beta reductions to simplify that expression. For that, we will implement lambda-eval that will recursively evaluate lambda expressions to simplify them:
> ; Church encoding: 1 = succ 0
> (lambda-eval '((lambda (n f x) (f (n f x))) (lambda (f x) x)))
'(lambda (f x) (f x))
> ; Church encoding: 2 = succ 1
> (lambda-eval '((lambda (n f x) (f (n f x))) (lambda (f x) (f x))))
'(lambda (f x) (f (f x)))
> ; Church encoding: 3 = succ 2
> (lambda-eval '((lambda (n f x) (f (n f x))) (lambda (f x) (f (f x)))))
'(lambda (f x) (f (f (f x))))
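The recursive evaluation can be sketched the same way in Python (self-contained, with the helpers it needs; all names are my transliterations of the Scheme ones, and again there is no handling of variable capture):

```python
def is_lambda_expr(e):
    return isinstance(e, tuple) and len(e) == 3 and e[0] == 'lambda'

def subst(e, src, dst):
    # naive substitution: no free/bound variable handling
    if e == src:
        return dst
    if isinstance(e, tuple):
        return tuple(subst(x, src, dst) for x in e)
    return e

def beta_reduce(f, arg):
    _, params, body = f
    if len(params) == 0:
        return body
    new_body = subst(body, params[0], arg)
    if len(params) == 1:
        return new_body
    return ('lambda', params[1:], new_body)

def lambda_eval(e):
    """Repeatedly beta-reduce until no redex remains
    (assuming the expression terminates)."""
    if not isinstance(e, tuple):
        return e                                    # a bare variable
    if is_lambda_expr(e):
        return ('lambda', e[1], lambda_eval(e[2]))  # normalize the body
    head = lambda_eval(e[0])                        # an application
    args = [lambda_eval(a) for a in e[1:]]
    while args and is_lambda_expr(head):
        head = lambda_eval(beta_reduce(head, args.pop(0)))
    if args:                                        # stuck: variable at the head
        return tuple([head] + args)
    return head

succ = ('lambda', ('n', 'f', 'x'), ('f', ('n', 'f', 'x')))
zero = ('lambda', ('f', 'x'), 'x')
one = lambda_eval((succ, zero))
two = lambda_eval((succ, one))
print(one)   # ('lambda', ('f', 'x'), ('f', 'x'))
print(two)   # ('lambda', ('f', 'x'), ('f', ('f', 'x')))
```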
There’s our untyped lambda calculus 🙂
There are a couple of improvements we could make, for example implementing define within the system so we can bind variables to values. Another neat addition would be to extend the system with a type checker.
EDIT: As noted by a reddit user, the substitution procedure is not considering free/bound variables. Here’s a gist that implements that as well.