Lambda calculus with generalized abstraction

Lately I’ve been thinking about this: What do we get if we change lambda calculus’s abstraction from (λx.M) to (λN.M), where M and N are lambda terms, and x is a variable?

That is (following the structure from my previous post), our new syntax becomes:

<var>   ::= x
<abs>   ::= λ<term>.<term>
<app>   ::= <term><term>

<term>  ::= <var> | <abs> | <app>
<value> ::= <abs>

In Haskell code:

type VarName = String
 
data Term =
    TmVar VarName
    | TmAbs Term Term
    | TmApp Term Term deriving (Eq, Show)

With the new list of inference rules:

Name	Rule
E-App1	$\frac{t_1 \to t_1'}{t_1t_2 \to t_1't_2}$
E-App2	$\frac{t_2 \to t_2'}{v_1t_2 \to v_1t_2'}$
E-App3	$\frac{t_2 \to t_2', v_1t_2' \to v_1'}{(\lambda v_1.t_{12}) \ t_2 \to \lambda v_1'.t_{12} }$
E-AppAbs	$(\lambda x.t_{12})v_2 \to [x \mapsto v_2]t_{12}$

So the evaluator looks something like:

eval : Term -> Term
eval (TmApp (TmAbs (TmVar x) t12) v2)      = subst x t12 v2 -- E-AppAbs
eval (TmApp (TmAbs v1@(TmAbs _ _) t12) t2) = let t2' = eval t2 in let v1' = eval (TmApp v1 t2') in TmAbs v1' t12 -- E-App3
eval (TmApp v1@(TmAbs _ _) t2)             = let t2' = eval t2 in TmApp v1 t2' -- E-App2
eval (TmApp t1 t2)                         = let t1' = eval t1 in TmApp t1' t2 -- E-App1
eval (TmVar x)                             = TmVar x -- E-Var
eval x                                     = error ("No rule applies: " ++ show x)

subst is the same as in the previous post.

So with this system we could “control” the “arguments” in the abstraction through application.
We could represent λx.x and λy.x through a single abstraction, i.e. (λ(λx . x) . x) so we get to either pass x or y to it:

Main> let f1 = TmAbs (TmVar "x") (TmVar "x")
Main> let f2 = TmAbs (TmVar "y") (TmVar "x")
Main> let f3 = TmAbs (TmAbs (TmVar "x") (TmVar "x")) (TmVar "x")
Main> eval (TmApp f3 (TmVar "x")) == f1
True
Main> eval (TmApp f3 (TmVar "y")) == f2
True

Is this useful or not? I didn’t know 🙂 so as always I referred to ##dependent@freenode and learned a bunch of new stuff:

 <pgiarrusso> I suspect many metaprogramming facilities would allow generating `𝜆 foo. x` by picking `foo`, but I don't see any reason to support this specifically
 <pgiarrusso> to be sure, I'm not necessarily saying "there is no point"
 <pgiarrusso> also, for your example, it seems `if foo then λx.x else λy.x` would work as well
 <pgiarrusso> my best answer is, when you write a function, you want to know what are the function params, to refer to them

Some stuff on papers:

 <pgiarrusso> *now*, I'm sure that strange variants of LC exist which do make sense but look surprising
 <pgiarrusso> but I think people usually start from something that is (1) useful but (2) hard, and then build a formal system to support that
 <pgiarrusso> s/people/researchers/
 <bor0> how do they know that it's useful?
 <pgiarrusso> sometimes you do go the opposite way, but that's harder
 <pgiarrusso> bor0: take a PL paper, read the introduction, see the arguments they make
 <bor0> I don't understand how that determines the usefulness of the paper itself?
 <bor0> or the idea even
 <pgiarrusso> bor0: not sure what paper to pick, but http://worrydream.com/refs/Hughes-WhyFunctionalProgrammingMatters.pdf might be an example
 <pgiarrusso> lots of that paper is an argument that lazy evaluation is interesting and useful
 <pgiarrusso> it does not *prove* that it's useful, as that's not something you can prove
 <pgiarrusso> but that paper (and many others) show something that is hard to do, arguably desirable, and give a "better" way of doing it

Bonus 🙂

 <mietek> bor0: <gallais> Someone was talking about a lambda calculus with full terms instead of mere variables in binding position IIRC
 <mietek> bor0: <gallais> I just came across this work: https://academic.oup.com/jigpal/article/26/2/203/4781665
 <bor0> wow thanks mietek!
 <bor0> interesting how when humans think about a certain topic they almost always have overlapping ideas
 <bor0> (:philosoraptor: it just confirms Wadler's statement that mathematics is discovered, not invented, I guess)
 <bor0> but I am happy that I had thought of that, even if for 2 years late (article was published in 2017)

Update: Thanks to /u/__fmease__ for noticing an error in the E-App3 LaTeX rule.

Writing a lambda calculus type-checker in Haskell

This tutorial serves as a very short and quick summary of the first few chapters of TAPL.

My previous post provided an implementation for the untyped lambda calculus.

This post is focused on building the typed lambda calculus – lambda calculus with a type-checker.

3. Typed lambda calculus

For evaluation, the typed lambda calculus will use the same functions as the untyped, with the only change that every TmAbs will carry a type along. So we have:

data Term =
...
    | TmAbs VarName Type Term
...

The corresponding functions (eval, subst, freeVars) need to have the pattern matching clauses updated, and type passed in context where applicable.

3.1. Syntax

Per BNF, the syntax is:

<type> ::= TyVar | TyArr <type> <type>

In Haskell code:

data Type =
   TyVar
   | TyArr Type Type deriving (Eq, Show)

That is, well-formed terms in the lambda calculus are either variables, or functions.

3.2. Inference rules

The type rules for E-AppAbs and the syntax for abstraction (as we have seen) is changed in that it introduced the type information.

Here are the typing rules that we will now implement:

Name	Rule
T-Var	$\frac{a \text{ : }T \in \Gamma}{\Gamma \vdash a \text{ : }T}$
T-Abs	$\frac{\Gamma, a \text{ : }T_1 \vdash t_2 : T_2}{\Gamma \vdash \lambda x : T_1.t_2 : T_1 \to T_2}$
T-App	$\frac{\Gamma \vdash t_1 : T_2 \to T_1, \Gamma \vdash t_2 : T_2}{\Gamma \vdash t_1 t_2 : T_1 }$

We will re-use the environment code from the first tutorial, namely TyEnv, addType, and getTypeFromEnv. The Haskell implementation of the rules in the table above:

typeOf :: TyEnv -> Term -> Either String Type
typeOf env (TmVar n) =
    case getTypeFromEnv env n of
        Just t -> Right t
        _      -> error $ "No var found in env: " ++ show n
typeOf env (TmAbs n t1 te) =
    let newEnv = addType n t1 env
        t2 = typeOf newEnv te in
        case t2 of
            Right t2 -> Right $ TyArr t1 t2
            _        -> Left "Unsupported type for TmAbs"
typeOf env (TmApp t1 t2)   =
    let t1' = typeOf env t1
        t2' = typeOf env t2 in
        case t1' of
           Right (TyArr a b) | Right a == t2' -> Right b
           Right (TyArr a _)                  -> Left $ "Type mismatch between " ++ show t1' ++ " and " ++ show t2'
           _                                  -> Left "Unsupported type for TmApp"

Playing around with it a bit:

Main> typeEnv = addType "x" TyVar []
Main> typeOf typeEnv (TmVar "x")
Right TyVar
Main> let lcid = TmAbs "x" TyVar (TmVar "x") in typeOf typeEnv lcid
Right (TyArr TyVar TyVar)
Main> let lcid = TmAbs "x" TyVar (TmVar "x") in typeOf typeEnv (TmApp lcid (TmVar "x"))
Right TyVar

However, the function lcid only accepts a single variable and returns it. It will not type-check if we pass an abstraction to it. Thus, there are infinite id‘s that we must define in the simply typed lambda calculus: $\lambda x:\text{TyVar}.x$ , $\lambda x:\text{TyVar}\to\text{TyVar}.x$ , etc. This problem can be fixed by implementing the polymorphic lambda calculus.

3.3. Bonus: Recursion

Given the standard semantics, the simply typed lambda calculus is strongly normalizing: that is, well-typed terms always reduce to a value.

This is because recursion is not allowed by the typing rules: it is impossible to find types for fixed-point combinators and the looping term $\Omega =(\lambda x.~x~x)(\lambda x.~x~x)$ .

Recursion can be added to the language by either having a special operator ${\mathtt {fix}}_{\alpha }$ of type $(\alpha \to \alpha )\to \alpha$ or adding general recursive types, though both eliminate strong normalization.

Since it is strongly normalising, it is decidable whether or not a simply typed lambda calculus program halts: in fact, it always halts. We can therefore conclude that the language is not Turing complete.

3.4. Conclusion

Now that we have entered the typed world of lambda calculus, where do we go from here?

The lambda cube was introduced by the logician Hendrik Barendregt in 1991. The simply typed lambda calculus is found in the lower-left corner.

The other extensions include:

$\lambda 2$ – System-F (polymorphism): terms allowed to depend on types
$\lambda \text{P}$ – Lambda-P (dependent types): types depending on terms
$\lambda \omega$ – System-F omega (type operators): types depending on types, corresponding to type operators

The system in the upper-right corner is called Calculus of Constructions and is the most powerful system, both on the logical and computational side.

Writing a lambda calculus evaluator in Haskell

This tutorial serves as a very short and quick summary of the first few chapters of TAPL.

My previous post was a general overview of how we can design an evaluator and a type checker.

This post is more focused on building the lambda calculus from scratch. It provides an interesting overview of some design decisions particularly for the lambda calculus.

2. Untyped lambda calculus

Lambda calculus is a formal system for expressing computation. The grammar rules are divided in two parts: function abstraction and function application. Function abstraction defines what a function does, and function application “computes” a function. For example $f(x) = x + 1$ is a function abstraction and $f(3)$ is a function application. The equality sign = is replaced with a dot, and instead of writing $f(x)$ we write $\lambda x$ . To represent $f(x) = x + 1$ we write $\lambda x.x + 1$ .

It was introduced by the mathematician Alonzo Church in the 1930s as part of his research of the foundations of mathematics.

2.1. Syntax

Our syntax, per BNF is defined as follows:

<var>   ::= x
<abs>   ::= λ<variable>.<term>
<app>   ::= <term><term>

<term>  ::= <var> | <abs> | <app>
<value> ::= <abs>

Note that a value in the system is just an abstraction. In Haskell code:

type VarName = String

data Term =
    TmVar VarName
    | TmAbs VarName Term
    | TmApp Term Term deriving (Eq, Show)

We set variables to additionally hold VarName (which is a string in this case) for pretty printing and comparison of variables.

2.2. Inference rules (evaluator)

The list of inference rules:

It looks like we have a new notation here: $[x \mapsto v_2]t_{12}$ means that we change all occurences of $x$ with $v_2$ within $t_{12}$ .

So the evaluator looks something like:

eval :: Term -> Term
eval (TmApp (TmAbs x t12) v2@(TmAbs _ _)) = subst x t12 v2 -- E-AppAbs
eval (TmApp v1@(TmAbs _ _) t2)            = let t2' = eval t2 in TmApp v1 t2' -- E-App2
eval (TmApp t1 t2)                        = let t1' = eval t1 in TmApp t1' t2 -- E-App1
eval _ = error "No rule applies"

Note how we pattern match against v2@(TmAbs _ _) to detect a variable. We do that because we didn’t “treat” variables separately in our data definition.

The only thing left for us to implement is subst, and we should be good.

TAPL defined it for us in 5.3.5:

We can convert the definitions to Haskell code easily:

-- subst VarName in Term to Term
subst :: VarName -> Term -> Term -> Term
subst x (TmVar v) newVal
    | x == v    = newVal
    | otherwise = TmVar v
subst x (TmAbs y t1) newVal
    | x == y                                  = TmAbs y t1
    | x /= y && (y `notElem` freeVars newVal) = TmAbs y (subst x t1 newVal)
    | otherwise                               = error $ "Cannot substitute '" ++ show x ++ "' in term '" ++ show (TmAbs y t1) ++ "'"
subst x (TmApp t1 t2) newVal = TmApp (subst x t1 newVal) (subst x t2 newVal)

But what is freeVars? Well, it’s the list of free variables in an expression. For example, in $\lambda x.x \ y$ , we have that x is bound (because it appears in an argument in an abstraction), and y is free. TAPL already defined it as FV in 5.3.2:

FV(\lambda x.t_1) = FV(t_1) \setminus \{x\}

Implementation in Haskell is also straight-forward with this:

freeVars :: Term -> [VarName]
freeVars (TmVar x) = [x]
freeVars (TmAbs x t1) = nub (freeVars t1) \\ [x]
freeVars (TmApp t1 t2) = freeVars t1 ++ freeVars t2

Running a few examples:

Main> let lctest1 = (TmAbs "f" (TmAbs "x" (TmVar "x")))
Main> let lctest2 = (TmAbs "f" (TmAbs "x" (TmVar "y")))
Main> eval $ TmApp lctest1 lctest2
TmAbs "x" (TmVar "x")
Main> eval $ TmApp lctest2 lctest1
TmAbs "x" (TmVar "y")
Main> eval (TmApp (eval $ TmApp lctest2 lctest1) lctest1)
TmVar "y"

Note, however:

Main> eval $ TmApp (TmAbs "x" (TmAbs "y" (TmVar "y"))) (TmAbs "x" (TmVar "y"))
*** Exception: Cannot substitute '"x"' in term 'TmAbs "y" (TmVar "y")'

This happens by design, because we chose our system to work up to renaming of bound variables (the otherwise case in subst for abstraction). So, subst isn’t total. Church fixed this issue by using alpha conversion – which is just renaming bound variables. So for example $\lambda x.x$ could be re-written as $\lambda y.y$ .

Now we can “fix” our evaluation error by alpha conversion:

Main> eval $ TmApp (TmAbs "x" (TmAbs "z" (TmVar "z"))) (TmAbs "x" (TmVar "y"))
TmAbs "z" (TmVar "z")

There’s an alternative approach to fix this issue: De Bruijn indices.

2.3. Bonus: De Bruijn indices

The point with De Bruijn indices is that they remove the need of alpha conversion. They allow that by removing the variable argument in the lambda abstraction, and rely on integers (indices) to refer to variables. The index actually denotes the number of binders that are in scope between that occurrence and its corresponding binder.

For example, we have that $\lambda f . \lambda x . x = \lambda \lambda 0$ because the distance from the x in the application to the abstraction is 0. Similar reasoning for $\lambda f . \lambda x . f = \lambda \lambda 1$ , etc.

We start with the following data definitions:

type VarNameB = Int
type Indices = (VarName, Int)

data TermB =
    TmVarB VarNameB
    | TmAbsB TermB
    | TmAppB TermB TermB deriving (Eq, Show)

We will also create a function that will convert an old term Term to a De Bruijn’s term TermB:

bruijn :: Term -> Maybe TermB
bruijn e = go [] e where
    go :: [Indices] -> Term -> TermB
    go m (TmAbs x e)  = TmAbsB $ let m' = map (\(x, y) -> (x, y + 1)) m in go ((x, 0) : m') e
    go m (TmApp e e') = TmAppB (go m e) (go m e')
    go m (TmVar x)    = TmVarB $ findByFst m x
    findByFst :: [Indices] -> VarName -> Int
    findByFst ((x, y) : xs) x' = if x' == x then y else findByFst xs x'
    findByFst _ _              = error "Variable not found"

Running it:

Main> bruijn (TmAbs "f" (TmAbs "x" (TmVar "x")))
TmAbsB (TmAbsB (TmVarB 0))

However:

Main> bruijn (TmAbs "f" (TmAbs "x" (TmVar "y")))
TmAbsB (TmAbsB (TmVarB *** Exception: Variable not found

This could be fixed by the function returning a Maybe TermB, or exposing go to consider the list of free variables and do the proper substitution. However, doing this or writing an evaluator for this system is left as an exercise to the reader 😀

2.4. Conclusion

The lambda calculus has a very minimal system, yet is a bit trickier to implement than the previous system we’ve seen. When we introduce function abstraction and application, we introduce complexity. We needed to account for contextual substitution.

An interesting question would be: is there a system that is simpler to implement, yet as powerful as lambda calculus? Maybe combinatory logic? But it’s not really pleasant to use as it’s not readable to us. It’s almost as what we essentially did is choose the minimal “most” readable system, and solve the side-effects (i.e. solving the issue of alpha conversion) of reading/parsing those systems that we humans have as a limitation. I mean, if it was easy for humans to parse SKIKSIKSKS easily, we wouldn’t use lambda calculus (which requires us as language designers to solve all the side-effects due to complexity), right?

In the next tutorial we’ll be implementing a type-checker for the lambda calculus – the typed lambda calculus.

Writing a simple evaluator and type-checker in Haskell

This tutorial serves as a very short and quick summary of the first few chapters of TAPL. Huge thanks to the ##dependent@freenode community (pgiarrusso, lyxia, mietek, …) for clearing all my questions, which provided a good sense of the design of such systems.

1. Untyped simple language

1.1. Syntax

Our syntax, per BNF is defined as follows:

<term>  ::= <bool> | <num> | If <bool> Then <expr> Else <expr> | <arith>
<bool>  ::= T | F | IsZero <num>
<num>   ::= O
<arith> ::= Succ <num> | Pred <num>

For simplicity, we merge all them together in a single Term.

data Term =
    T
    | F
    | O
    | IfThenElse Term Term Term
    | Succ Term
    | Pred Term
    | IsZero Term
    deriving (Show, Eq)

1.2. Inference rules (evaluator)

The semantics we use here are based on so called small-step style, which state how a term is rewritten to a specific value, written $t \to v$ . In contrast, big-step style states how a specific term evaluates to a final value, written $t \Downarrow v$ .

Evaluation of a term is just pattern matching the inference rules. Given a term, it should produce a term:

eval :: Term -> Term

The list of inference rules:

Name	Rule
E-IfTrue	$\text{If T Then } t_2 \text{ Else } t_3 \to t_2$
E-IfFalse	$\text{If F Then } t_2 \text{ Else } t_3 \to t_3$
E-If	$\frac{t_1 \to t'}{\text{If }t_1 \text{ Then } t_2 \text{ Else } t_3 \to \text{If }t' \text{ Then } t_2 \text{ Else } t_3}$
E-Succ	$\frac{t_1 \to t'}{\text{Succ }t_1 \to \text{ Succ } t'}$
E-PredZero	$\text{Pred O} \to \text{O}$
E-PredSucc	$\text{Pred(Succ } k \text {)} \to k$
E-Pred	$\frac{t_1 \to t'}{\text{Pred }t_1 \to \text{ Pred } t'}$
E-IszeroZero	$\text{IsZero O} \to \text{T}$
E-IszeroSucc	$\text{IsZero(Succ } k \text {)} \to \text{F}$
E-IsZero	$\frac{t_1 \to t'}{\text{IsZero }t_1 \to \text{ IsZero } t'}$

As an example, the rule E-IfTrue written using big-step semantics would be $\frac{t_1 \Downarrow \text{T}, t2 \Downarrow v}{\text{If T} \text{ Then } t_2 \text{ Else } t_3 \Downarrow t_2}$ .

Given the rules, by pattern matching them we will reduce terms. Implementation in Haskell is mostly “copy-paste” according to the rules:

eval (IfThenElse T t2 t3) = t2
eval (IfThenElse F t2 t3) = t3
eval (IfThenElse t1 t2 t3) = let t' = eval t1 in IfThenElse t' t2 t3
eval (Succ t1) = let t' = eval t1 in Succ t'
eval (Pred O) = O
eval (Pred (Succ k)) = k
eval (Pred t1) = let t' = eval t1 in Pred t'
eval (IsZero O) = T
eval (IsZero (Succ t)) = F
eval (IsZero t1) = let t' = eval t1 in IsZero t'
eval _ = error "No rule applies"

1.3. Conclusion

As an example, evaluating the following:

Main> eval $ Pred $ Succ $ Pred O
Pred O

Corresponds to the following inference rules:

             ----------- E-PredZero
             pred O -> O
       ----------------------- E-Succ
       succ (pred O) -> succ O
------------------------------------- E-Pred
pred (succ (pred O)) -> pred (succ O)

However, if we do:

Main> eval $ IfThenElse O O O
*** Exception: No rule applies

It’s type-checking time!

2. Typed simple language

2.1. Type syntax

In addition to the previous syntax, we create a new one for types, so per BNF it’s defined as follows:

<type> ::= Bool | Nat

In Haskell:

data Type =
    TBool
    | TNat

2.2. Inference rules (type)

Getting a type of a term expects a term, and either returns an error or the type derived:

typeOf :: Term -> Either String Type

Next step is to specify the typing rules.

Name	Rule
T-True	$\text{T : TBool}$
T-False	$\text{F : TBool}$
T-Zero	$\text{O : TNat}$
T-If	$\frac{t_1\text{ : Bool}, t_2\text{ : }T, t_3\text{ : }T}{\text{If }t_1 \text{ Then } t_2 \text{ Else } t_3\text{ : }T}$
T-Succ	$\frac{t\text{ : TNat }}{\text{Succ } t \text{ : TNat}}$
T-Pred	$\frac{t\text{ : TNat }}{\text{Pred } t \text{ : TNat}}$
T-IsZero	$\frac{t\text{ : TNat }}{\text{IsZero } t \text{ : TBool}}$

Code in Haskell:

typeOf T = Right TBool
typeOf F = Right TBool
typeOf O = Right TNat
typeOf (IfThenElse t1 t2 t3) =
    case typeOf t1 of
        Right TBool ->
            let t2' = typeOf t2
                t3' = typeOf t3 in
                if t2' == t3'
                then t2'
                else Left "Types mismatch"
        _ -> Left "Unsupported type for IfThenElse"
typeOf (Succ k) =
    case typeOf k of
        Right TNat -> Right TNat
        _ -> Left "Unsupported type for Succ"
typeOf (Pred k) =
    case typeOf k of
        Right TNat -> Right TNat
        _ -> Left "Unsupported type for Pred"
typeOf (IsZero k) =
    case typeOf k of
        Right TNat -> Right TBool
        _ -> Left "Unsupported type for IsZero"

2.3. Conclusion

Going back to the previous example, we can now “safely” evaluate (by type-checking first), depending on type-check results:

Main> typeOf $ IfThenElse O O O
Left "Unsupported type for IfThenElse"
Main> typeOf $ IfThenElse T O (Succ O)
Right TNat
Main> typeOf $ IfThenElse F O (Succ O)
Right TNat
Main> eval $ IfThenElse T O (Succ O)
O
Main> eval $ IfThenElse F O (Succ O)
Succ O

3. Environments

Our neat language supports evaluation and type checking, but does not allow for defining constants. To do that, we will need kind of an environment which will hold information about constants.

type TyEnv = [(String, Type)] -- Type env
type TeEnv = [(String, Term)] -- Term env

We also extend our data type to contain TVar for defining variables, and meanwhile also introduce the Let ... in ... syntax:

data Term =
    ...
    | TVar String
    | Let String Term Term

Here are the rules for variables:

Name	Rule
Add binding	$\frac{\Gamma, a \text{ : }T}{\Gamma \vdash a \text{ : }T}$
Retrieve binding	$\frac{a \text{ : }T \in \Gamma}{\Gamma \vdash a \text{ : }T}$

Haskell definitions:

addType :: String -> Type -> TyEnv -> TyEnv
addType varname b env = (varname, b) : env

getTypeFromEnv :: TyEnv -> String -> Maybe Type
getTypeFromEnv [] _ = Nothing
getTypeFromEnv ((varname', b) : env) varname = if varname' == varname then Just b else getTypeFromEnv env varname

Analogously for terms:

addTerm :: String -> Term -> TeEnv -> TeEnv
addTerm varname b env = (varname, b) : env

getTermFromEnv :: TeEnv -> String -> Maybe Term
getTermFromEnv [] _ = Nothing
getTermFromEnv ((varname', b) : env) varname = if varname' == varname then Just b else getTermFromEnv env varname

3.1. Inference rules (evaluator)

At this point we stop giving mathematical inference rules, but it’s a good homework if you want to do it 🙂

eval' is exactly the same as eval, with the following additions:
1. New parameter (the environment) to support retrieval of values for constants
2. Pattern matching for the new Let ... in ... syntax

eval' :: TeEnv -> Term -> Term
eval' env (TVar v) = case getTermFromEnv env v of
    Just ty -> ty
    _       -> error "No var found in env"
eval' env (Let v t t') = eval' (addTerm v (eval' env t) env) t'

We will modify IfThenElse slightly to allow for evaluating variables:

> eval' env (IfThenElse T t2 t3) = eval' env t2
> eval' env (IfThenElse F t2 t3) = eval' env t3
> eval' env (IfThenElse t1 t2 t3) = let t' = eval' env t1 in IfThenElse t' t2 t3

Copy-pasting the others:

eval' env (Succ t1) = let t' = eval' env t1 in Succ t'
eval' _ (Pred O) = O
eval' _ (Pred (Succ k)) = k
eval' env (Pred t1) = let t' = eval' env t1 in Pred t'
eval' _ (IsZero O) = T
eval' _ (IsZero (Succ t)) = F
eval' env (IsZero t1) = let t' = eval' env t1 in IsZero t'

Also since we modified IfThenElse to recursively evaluate, we also need to consider base types:

eval' _ T = T
eval' _ F = F
eval' _ O = O

3.2. Inference rules (type)

typeOf' is exactly the same as typeOf, with the only addition to support env (for retrieval of types for constants in an env) and the new let syntax.

typeOf' :: TyEnv -> Term -> Either String Type
typeOf' env (TVar v) = case getTypeFromEnv env v of
    Just ty -> Right ty
    _       -> Left "No type found in env"
typeOf' _ T = Right TBool
typeOf' _ F = Right TBool
typeOf' _ O = Right TNat
typeOf' env (IfThenElse t1 t2 t3) =
    case typeOf' env t1 of
        Right TBool ->
            let t2' = typeOf' env t2
                t3' = typeOf' env t3 in
                if t2' == t3'
                then t2'
                else Left $ "Type mismatch between " ++ show t2' ++ " and " ++ show t3'
        _ -> Left "Unsupported type for IfThenElse"
typeOf' env (Succ k) =
    case typeOf' env k of
        Right TNat -> Right TNat
        _ -> Left "Unsupported type for Succ"
typeOf' env (Pred k) =
    case typeOf' env k of
        Right TNat -> Right TNat
        _ -> Left "Unsupported type for Pred"
typeOf' env (IsZero k) =
    case typeOf' env k of
        Right TNat -> Right TBool
        _ -> Left "Unsupported type for IsZero"
typeOf' env (Let v t t') = case typeOf' env t of
   Right ty -> typeOf' (addType v ty env) t'
   _        -> Left "Unsupported type for Let"

Some examples:

Main> let termEnv = addTerm "a" O $ addTerm "b" (Succ O) $ addTerm "c" F []
Main> let typeEnv = addType "a" TNat $ addType "b" TNat $ addType "c" TBool []
Main> let e = IfThenElse T (TVar "a") (TVar "b") in (eval' termEnv e, typeOf' typeEnv e)
(O,Right TNat)
Main> let e = IfThenElse T (TVar "a") (TVar "c") in (eval' termEnv e, typeOf' typeEnv e)
(O,Left "Type mismatch between Right TNat and Right TBool")
Main> let e = IfThenElse T F (TVar "c") in (eval' termEnv e, typeOf' typeEnv e)
(F,Right TBool)
Main> let e = (Let "y" (TVar "a") (Succ (TVar "y"))) in eval' e termEnv 
Succ O
Main> let e = (Let "y" (TVar "a") (Succ (TVar "y"))) in typeOf' e typeEnv
Right TNat

4. Product and union

Products and unions are awesome, so we will implement them! We extend the value constructors as follows:

data Term =
    ...
    | Pair Term Term
    | EitherL Term
    | EitherR Term

…and for the type checker:

data Type =
    ...
    | TyVar String       -- constants, use later
    | TyPair Type Type   -- pairs, use later
    | TyEither Type Type -- sum, use later

We add handling to the evaluator:

eval' _ (Pair a b) = Pair a b
eval' env (EitherL a) = eval' env a
eval' env (EitherR a) = eval' env a

…and the type checker:

> typeOf' env (Pair a b) =
>     let a' = typeOf' env a
>         b' = typeOf' env b in
>         case a' of
>             Right ta -> case b' of
>                 Right tb -> Right $ TyPair ta tb
>                 Left _ -> Left "Unsupported type for Pair"
>             Left _  -> Left "Unsupported type for Pair"
> typeOf' env (EitherL a) = case (typeOf' env a) of
>     Right t -> Right $ TyEither t (TyVar "x")
>     _       -> Left "Unsupported type for EitherL"
> typeOf' env (EitherR a) = case (typeOf' env a) of
>     Right t -> Right $ TyEither (TyVar "x") t
>     _       -> Left "Unsupported type for EitherR"

Where TyVar "x" represents a polymorphic variable, and is unhandled since our system has no real support for polymorphism.

Main> let e = IfThenElse T (Pair (TVar "a") (TVar "b")) O in (eval' termEnv e, typeOf' typeEnv e)
(Pair (TVar "a") (TVar "b"),Left "Type mismatch between Right (TyPair TNat TNat) and Right TNat")
Main> let e = IfThenElse T (Pair (TVar "a") (TVar "b")) (Pair O O) in (eval' termEnv e, typeOf' typeEnv e)
(Pair (TVar "a") (TVar "b"),Right (TyPair TNat TNat))
Main> eval' termEnv (EitherL (TVar "a"))
O
Main> typeOf' typeEnv (EitherL (TVar "a"))
Right (TyEither TNat (TyVar "x"))

However, note for union, we have the following:

Main> let e = IfThenElse T (EitherL O) (EitherR F) in (eval' termEnv e, typeOf' typeEnv e)
(O,Left "Type mismatch between Right (TyEither TNat (TyVar \"x\")) and Right (TyEither TBool (TyVar \"x\"))")

This can be fixed by implementing a new function typeEq, that extends the Eq of Type, which would always pass the check for our fake polymorphic variable. It could look something like:

typeEq :: Type -> Type -> Bool
typeEq (TyEither x (TyVar "x")) (TyEither (TyVar "x") y) = True
typeEq x y = x == y

5. Conclusion

The evaluator and the type checker almost live in two separate worlds — they do two separate tasks. If we want to ensure the evaluator will produce the correct results, the first thing is to assure that the type-checker returns no error. It was really interesting to understand the mechanism of type-checking in-depth 🙂

Another interesting observation is how pattern matching the data type is similar to the hypothesis part of the inference rules. The relationship is due to the Curry-Howard isomorphism. When we have a formula $a \vdash b$ (a implies b), and pattern match on a, it’s as if we assumed a and need to show b.

TAPL starts with untyped lambda calculus and proceeds to typed lambda calculus, but is focusing only on type environment, while leaving the variables environment. While this is a good approach, I feel producing a system like in this tutorial is a good first step before jumping to lambda calculus.

My next steps are to finish reading TAPL, and continue playing with toy languages. The code used in this tutorial is available as a literate Haskell file. I also had some fun with De Bruijn index.

Bonus: Haskell is based on the typed lambda calculus, while Lisp is based on the untyped lambda calculus. It is interesting to note the difference in writing a type-checker in the typed lambda calculus vs the untyped one. Here’s an implementation in Lisp for sections 1 and 2.

Igpay Igpay Atinlay

The other day I watched an interesting episode of Teen Titans Go with my kids. What caught our attention is the song that was part of the episode. You can listen to it here.

Despite the song being fun for my kids to listen to, my daughter asked me if those words made any sense at all. Do they? 🙂

At first I thought it was a simple rotation system. For example, “test” would turn to “estt”, and “scheme” would turn to “chemes”. This is what I told my daughter without any further analysis and we had fun playing with the words.

But then after a while we heard the song again and it seems our system isn’t what was used. Next thinking was, they would rotate the words, and append “ay” at the end of them. So I found the transcript and I noted this works, but you need to keep rotating until you reach a vowel. That’s it! I quickly sat and wrote an encoder in Scheme:

(define (word-to-piglatin str)
  (letrec [(vowels '(#\a #\e #\i #\o #\u))
           (l (string->list str))]
    (cond ((> (length (set-intersect vowels l)) 0)
           (if (member (car l) vowels)
               (list->string (append l '(#\a #\y)))
               (word-to-piglatin (list->string (append (cdr l) (list (car l)))))))
          (else (error "no vowels")))))

Now, just run it:

> (let ([song "pig pig latin is the dopest its real simple i will explain pig pig latin is the dopest say ay always at ends of words pig pig latin is the dopest"])
    (string-join (map word-to-piglatin (string-split song " ")) " "))
"igpay igpay atinlay isay ethay opestday itsay ealray implesay iay illway explainay igpay igpay atinlay isay ethay opestday aysay ayay alwaysay atay endsay ofay ordsway igpay igpay atinlay isay ethay opestday"

Ahh, close, but not really. If you look at the song lyrics, some words do not match. Specifically “isway” vs “isay”. So I thought maybe for some ending letters we append “ay” and for others we append “way”.

Constructing a table was the next step:

Ends in letter | Append
       a       | way
       b       | ay
       c       | ay
       d       | ay|way
       e       | way
       f       | ay|way
       g       |
       h       | ay|way
       i       | way
       j       |
       k       | way
       l       | ay|way
       m       | ay|way
       n       | ay|way
       o       |
       p       | ay|way
       q       |
       r       | ay
       s       | ay|way
       t       | ay|way
       u       |
       v       | ay
       w       | ay
       x       |
       y       | ay|way
       z       |

Nope, for some words it’s “ay”, for some it’s “way”, and for some it’s both. Also, for “on” it’s sometimes “onway” and sometimes “onay”. So it’s pretty random, but was still fun to look into! 🙂

EDIT: Thanks to my coworker Bartosz Budzanowski who told me that Pig Latin is really a thing! The rules are clearly stated on its Wikipedia page. Still, reverse engineering it was fun 🙂

Code updated:

(define (word-to-piglatin str [init #t])
  (letrec [(vowels '(#\a #\e #\i #\o #\u))
           (l (string->list str))]
    (cond ((> (length (set-intersect vowels l)) 0)
           (if (member (car l) vowels)
               (list->string (append l (if init '(#\w #\a #\y) '(#\a #\y))))
               (word-to-piglatin (list->string (append (cdr l) (list (car l)))) #f)))
          (else (error "no vowels")))))

Bonus: Haskell code

WordToPiglatin :: String -> String
WordToPiglatin "" = ""
WordToPiglatin s
    | isVowel (head s) = s ++ "way"
    | any isVowel s    = rotateUntil (isVowel . head) s ++ "ay"
    | otherwise        = s
    where
    isVowel c = c `elem` "aeiouAEIOU"
    rotateUntil _ [] = []
    rotateUntil p s = let res = rotate s in if p res then res else rotateUntil p res
    rotate (x:xs) = xs ++ [x]

Formula	Explanation
$[x \mapsto s]x = s$	Renaming all occurences of “x” to “s” within “x” is just “s”
$[x \mapsto s]y = y$	Renaming all occurences of “x” to “s” within “y” (which doesn’t contain any “x”) is just “x”
$[x \mapsto s](\lambda y . t_1) = \lambda y.[x \mapsto s]t_1, y \neq x$	In an abstraction, as long as the variable in the argument is different, we can recursively substitute the terms
$[x \mapsto s](t_1 t_2) = [x \mapsto s]t_1 [x \mapsto s]t_2, y \neq x \land y \notin FV(s)$	In an application, we recursively substitute all bound variables

Formula	Explanation
$FV(x) = \{x\}$	A single variable is free
$FV(\lambda x.t_1) = FV(t_1) \setminus \{x\}$	Free variables in a term within a lambda abstraction are all free variables of the term except the one bound by the abstraction
$FV(t_1 t_2) = FV(t_1) \cup FV(t_2)$	Free variables of application is the union of the free variables of both terms