Brief introduction to Machine Learning with Gradient Descent

In Mathematics, there are many curves. From a constant curve $f(x) = b$ , to a linear one $f(x) = ax + b$ , to a quadratic one $f(x) = ax^2 + bx = c$ to …, you name it.

Now, imagine that there is a huge data set with some points (think ordered pairs for simplicity), but not all points are defined. Essentially, machine learning is all about coming up with a curve (based on a chosen model) by filling gaps.

Once we agree on the model we want to use (curve we want to represent), we start with some basic equation and then tweak its parameters until it perfectly matches the data set points. This process of tweaking (optimization) is called “learning”. To optimize, we associate a cost function (the error or delta between the values produced by the equation and the data set) and we need to find for which parameters this cost is minimal.

Gradient descent is one algorithm for finding the minimum of a function, and as such it represents the “learning” part in machine learning. I found this video by StatQuest, along with this video by 3Blue1Brown to be super simple explaining these concepts, and naturally, this article will be mostly based on them.

In this article I will assume some basic set theory, and also derivatives. Further, through example, we will:

Define what the curve (model) should be
Come up with a data set
Do the “learning” using gradient descent

Continue reading →

Customer-driven engineering

We’re in the process of renovating our house with my wife.

We have a bunch of “engineers” working on it: electricians, plumbers, wall painters. These guys have direct contact with the customers. What they are working on will be immediately visible to the customer. They must be thinking about the customer all the time, and satisfy their needs. There is no management, customer support, release dates in between (I’m sure there is for larger objects). What they are doing is visible at the spot and they must make sure it looks right, otherwise the customer will be mad.

This sounds very similar to what freelancing software engineers are doing. On the other hand, software engineers that work at big companies have huge layers before they reach the customers. As a consequence, most of them even rarely think about their customers at all.

Lately I’ve been thinking about our customers more and more. What we are building as software engineers are decisions that were discussed with other smart people at meetings, etc. But I think there is still value in constantly asking yourself: “what will the customer think about this?”, “what will their reaction be on what I’m building right now”, etc.

Do you ever think about your customers, or only your code like most of us? 🙂

Lambda calculus with generalized abstraction

Lately I’ve been thinking about this: What do we get if we change lambda calculus’s abstraction from (λx.M) to (λN.M), where M and N are lambda terms, and x is a variable?

That is (following the structure from my previous post), our new syntax becomes:

<var>   ::= x
<abs>   ::= λ<term>.<term>
<app>   ::= <term><term>

<term>  ::= <var> | <abs> | <app>
<value> ::= <abs>

In Haskell code:

type VarName = String
 
data Term =
    TmVar VarName
    | TmAbs Term Term
    | TmApp Term Term deriving (Eq, Show)

With the new list of inference rules:

Name	Rule
E-App1	$\frac{t_1 \to t_1'}{t_1t_2 \to t_1't_2}$
E-App2	$\frac{t_2 \to t_2'}{v_1t_2 \to v_1t_2'}$
E-App3	$\frac{t_2 \to t_2', v_1t_2' \to v_1'}{(\lambda v_1.t_{12}) \ t_2 \to \lambda v_1'.t_{12} }$
E-AppAbs	$(\lambda x.t_{12})v_2 \to [x \mapsto v_2]t_{12}$

So the evaluator looks something like:

eval : Term -> Term
eval (TmApp (TmAbs (TmVar x) t12) v2)      = subst x t12 v2 -- E-AppAbs
eval (TmApp (TmAbs v1@(TmAbs _ _) t12) t2) = let t2' = eval t2 in let v1' = eval (TmApp v1 t2') in TmAbs v1' t12 -- E-App3
eval (TmApp v1@(TmAbs _ _) t2)             = let t2' = eval t2 in TmApp v1 t2' -- E-App2
eval (TmApp t1 t2)                         = let t1' = eval t1 in TmApp t1' t2 -- E-App1
eval (TmVar x)                             = TmVar x -- E-Var
eval x                                     = error ("No rule applies: " ++ show x)

subst is the same as in the previous post.

So with this system we could “control” the “arguments” in the abstraction through application.
We could represent λx.x and λy.x through a single abstraction, i.e. (λ(λx . x) . x) so we get to either pass x or y to it:

Main> let f1 = TmAbs (TmVar "x") (TmVar "x")
Main> let f2 = TmAbs (TmVar "y") (TmVar "x")
Main> let f3 = TmAbs (TmAbs (TmVar "x") (TmVar "x")) (TmVar "x")
Main> eval (TmApp f3 (TmVar "x")) == f1
True
Main> eval (TmApp f3 (TmVar "y")) == f2
True

Is this useful or not? I didn’t know 🙂 so as always I referred to ##dependent@freenode and learned a bunch of new stuff:

 <pgiarrusso> I suspect many metaprogramming facilities would allow generating `𝜆 foo. x` by picking `foo`, but I don't see any reason to support this specifically
 <pgiarrusso> to be sure, I'm not necessarily saying "there is no point"
 <pgiarrusso> also, for your example, it seems `if foo then λx.x else λy.x` would work as well
 <pgiarrusso> my best answer is, when you write a function, you want to know what are the function params, to refer to them

Some stuff on papers:

 <pgiarrusso> *now*, I'm sure that strange variants of LC exist which do make sense but look surprising
 <pgiarrusso> but I think people usually start from something that is (1) useful but (2) hard, and then build a formal system to support that
 <pgiarrusso> s/people/researchers/
 <bor0> how do they know that it's useful?
 <pgiarrusso> sometimes you do go the opposite way, but that's harder
 <pgiarrusso> bor0: take a PL paper, read the introduction, see the arguments they make
 <bor0> I don't understand how that determines the usefulness of the paper itself?
 <bor0> or the idea even
 <pgiarrusso> bor0: not sure what paper to pick, but http://worrydream.com/refs/Hughes-WhyFunctionalProgrammingMatters.pdf might be an example
 <pgiarrusso> lots of that paper is an argument that lazy evaluation is interesting and useful
 <pgiarrusso> it does not *prove* that it's useful, as that's not something you can prove
 <pgiarrusso> but that paper (and many others) show something that is hard to do, arguably desirable, and give a "better" way of doing it

Bonus 🙂

 <mietek> bor0: <gallais> Someone was talking about a lambda calculus with full terms instead of mere variables in binding position IIRC
 <mietek> bor0: <gallais> I just came across this work: https://academic.oup.com/jigpal/article/26/2/203/4781665
 <bor0> wow thanks mietek!
 <bor0> interesting how when humans think about a certain topic they almost always have overlapping ideas
 <bor0> (:philosoraptor: it just confirms Wadler's statement that mathematics is discovered, not invented, I guess)
 <bor0> but I am happy that I had thought of that, even if for 2 years late (article was published in 2017)

Update: Thanks to /u/__fmease__ for noticing an error in the E-App3 LaTeX rule.

Writing a lambda calculus type-checker in Haskell

This tutorial serves as a very short and quick summary of the first few chapters of TAPL.

My previous post provided an implementation for the untyped lambda calculus.

This post is focused on building the typed lambda calculus – lambda calculus with a type-checker.

3. Typed lambda calculus

For evaluation, the typed lambda calculus will use the same functions as the untyped, with the only change that every TmAbs will carry a type along. So we have:

data Term =
...
    | TmAbs VarName Type Term
...

The corresponding functions (eval, subst, freeVars) need to have the pattern matching clauses updated, and type passed in context where applicable.

3.1. Syntax

Per BNF, the syntax is:

<type> ::= TyVar | TyArr <type> <type>

In Haskell code:

data Type =
   TyVar
   | TyArr Type Type deriving (Eq, Show)

That is, well-formed terms in the lambda calculus are either variables, or functions.

3.2. Inference rules

The type rules for E-AppAbs and the syntax for abstraction (as we have seen) is changed in that it introduced the type information.

Here are the typing rules that we will now implement:

Name	Rule
T-Var	$\frac{a \text{ : }T \in \Gamma}{\Gamma \vdash a \text{ : }T}$
T-Abs	$\frac{\Gamma, a \text{ : }T_1 \vdash t_2 : T_2}{\Gamma \vdash \lambda x : T_1.t_2 : T_1 \to T_2}$
T-App	$\frac{\Gamma \vdash t_1 : T_2 \to T_1, \Gamma \vdash t_2 : T_2}{\Gamma \vdash t_1 t_2 : T_1 }$

We will re-use the environment code from the first tutorial, namely TyEnv, addType, and getTypeFromEnv. The Haskell implementation of the rules in the table above:

typeOf :: TyEnv -> Term -> Either String Type
typeOf env (TmVar n) =
    case getTypeFromEnv env n of
        Just t -> Right t
        _      -> error $ "No var found in env: " ++ show n
typeOf env (TmAbs n t1 te) =
    let newEnv = addType n t1 env
        t2 = typeOf newEnv te in
        case t2 of
            Right t2 -> Right $ TyArr t1 t2
            _        -> Left "Unsupported type for TmAbs"
typeOf env (TmApp t1 t2)   =
    let t1' = typeOf env t1
        t2' = typeOf env t2 in
        case t1' of
           Right (TyArr a b) | Right a == t2' -> Right b
           Right (TyArr a _)                  -> Left $ "Type mismatch between " ++ show t1' ++ " and " ++ show t2'
           _                                  -> Left "Unsupported type for TmApp"

Playing around with it a bit:

Main> typeEnv = addType "x" TyVar []
Main> typeOf typeEnv (TmVar "x")
Right TyVar
Main> let lcid = TmAbs "x" TyVar (TmVar "x") in typeOf typeEnv lcid
Right (TyArr TyVar TyVar)
Main> let lcid = TmAbs "x" TyVar (TmVar "x") in typeOf typeEnv (TmApp lcid (TmVar "x"))
Right TyVar

However, the function lcid only accepts a single variable and returns it. It will not type-check if we pass an abstraction to it. Thus, there are infinite id‘s that we must define in the simply typed lambda calculus: $\lambda x:\text{TyVar}.x$ , $\lambda x:\text{TyVar}\to\text{TyVar}.x$ , etc. This problem can be fixed by implementing the polymorphic lambda calculus.

3.3. Bonus: Recursion

Given the standard semantics, the simply typed lambda calculus is strongly normalizing: that is, well-typed terms always reduce to a value.

This is because recursion is not allowed by the typing rules: it is impossible to find types for fixed-point combinators and the looping term $\Omega =(\lambda x.~x~x)(\lambda x.~x~x)$ .

Recursion can be added to the language by either having a special operator ${\mathtt {fix}}_{\alpha }$ of type $(\alpha \to \alpha )\to \alpha$ or adding general recursive types, though both eliminate strong normalization.

Since it is strongly normalising, it is decidable whether or not a simply typed lambda calculus program halts: in fact, it always halts. We can therefore conclude that the language is not Turing complete.

3.4. Conclusion

Now that we have entered the typed world of lambda calculus, where do we go from here?

The lambda cube was introduced by the logician Hendrik Barendregt in 1991. The simply typed lambda calculus is found in the lower-left corner.

The other extensions include:

$\lambda 2$ – System-F (polymorphism): terms allowed to depend on types
$\lambda \text{P}$ – Lambda-P (dependent types): types depending on terms
$\lambda \omega$ – System-F omega (type operators): types depending on types, corresponding to type operators

The system in the upper-right corner is called Calculus of Constructions and is the most powerful system, both on the logical and computational side.

Writing a lambda calculus evaluator in Haskell

This tutorial serves as a very short and quick summary of the first few chapters of TAPL.

My previous post was a general overview of how we can design an evaluator and a type checker.

This post is more focused on building the lambda calculus from scratch. It provides an interesting overview of some design decisions particularly for the lambda calculus.

2. Untyped lambda calculus

Lambda calculus is a formal system for expressing computation. The grammar rules are divided in two parts: function abstraction and function application. Function abstraction defines what a function does, and function application “computes” a function. For example $f(x) = x + 1$ is a function abstraction and $f(3)$ is a function application. The equality sign = is replaced with a dot, and instead of writing $f(x)$ we write $\lambda x$ . To represent $f(x) = x + 1$ we write $\lambda x.x + 1$ .

It was introduced by the mathematician Alonzo Church in the 1930s as part of his research of the foundations of mathematics.

2.1. Syntax

Our syntax, per BNF is defined as follows:

<var>   ::= x
<abs>   ::= λ<variable>.<term>
<app>   ::= <term><term>

<term>  ::= <var> | <abs> | <app>
<value> ::= <abs>

Note that a value in the system is just an abstraction. In Haskell code:

type VarName = String

data Term =
    TmVar VarName
    | TmAbs VarName Term
    | TmApp Term Term deriving (Eq, Show)

We set variables to additionally hold VarName (which is a string in this case) for pretty printing and comparison of variables.

2.2. Inference rules (evaluator)

The list of inference rules:

It looks like we have a new notation here: $[x \mapsto v_2]t_{12}$ means that we change all occurences of $x$ with $v_2$ within $t_{12}$ .

So the evaluator looks something like:

eval :: Term -> Term
eval (TmApp (TmAbs x t12) v2@(TmAbs _ _)) = subst x t12 v2 -- E-AppAbs
eval (TmApp v1@(TmAbs _ _) t2)            = let t2' = eval t2 in TmApp v1 t2' -- E-App2
eval (TmApp t1 t2)                        = let t1' = eval t1 in TmApp t1' t2 -- E-App1
eval _ = error "No rule applies"

Note how we pattern match against v2@(TmAbs _ _) to detect a variable. We do that because we didn’t “treat” variables separately in our data definition.

The only thing left for us to implement is subst, and we should be good.

TAPL defined it for us in 5.3.5:

We can convert the definitions to Haskell code easily:

-- subst VarName in Term to Term
subst :: VarName -> Term -> Term -> Term
subst x (TmVar v) newVal
    | x == v    = newVal
    | otherwise = TmVar v
subst x (TmAbs y t1) newVal
    | x == y                                  = TmAbs y t1
    | x /= y && (y `notElem` freeVars newVal) = TmAbs y (subst x t1 newVal)
    | otherwise                               = error $ "Cannot substitute '" ++ show x ++ "' in term '" ++ show (TmAbs y t1) ++ "'"
subst x (TmApp t1 t2) newVal = TmApp (subst x t1 newVal) (subst x t2 newVal)

But what is freeVars? Well, it’s the list of free variables in an expression. For example, in $\lambda x.x \ y$ , we have that x is bound (because it appears in an argument in an abstraction), and y is free. TAPL already defined it as FV in 5.3.2:

FV(\lambda x.t_1) = FV(t_1) \setminus \{x\}

Implementation in Haskell is also straight-forward with this:

freeVars :: Term -> [VarName]
freeVars (TmVar x) = [x]
freeVars (TmAbs x t1) = nub (freeVars t1) \\ [x]
freeVars (TmApp t1 t2) = freeVars t1 ++ freeVars t2

Running a few examples:

Main> let lctest1 = (TmAbs "f" (TmAbs "x" (TmVar "x")))
Main> let lctest2 = (TmAbs "f" (TmAbs "x" (TmVar "y")))
Main> eval $ TmApp lctest1 lctest2
TmAbs "x" (TmVar "x")
Main> eval $ TmApp lctest2 lctest1
TmAbs "x" (TmVar "y")
Main> eval (TmApp (eval $ TmApp lctest2 lctest1) lctest1)
TmVar "y"

Note, however:

Main> eval $ TmApp (TmAbs "x" (TmAbs "y" (TmVar "y"))) (TmAbs "x" (TmVar "y"))
*** Exception: Cannot substitute '"x"' in term 'TmAbs "y" (TmVar "y")'

This happens by design, because we chose our system to work up to renaming of bound variables (the otherwise case in subst for abstraction). So, subst isn’t total. Church fixed this issue by using alpha conversion – which is just renaming bound variables. So for example $\lambda x.x$ could be re-written as $\lambda y.y$ .

Now we can “fix” our evaluation error by alpha conversion:

Main> eval $ TmApp (TmAbs "x" (TmAbs "z" (TmVar "z"))) (TmAbs "x" (TmVar "y"))
TmAbs "z" (TmVar "z")

There’s an alternative approach to fix this issue: De Bruijn indices.

2.3. Bonus: De Bruijn indices

The point with De Bruijn indices is that they remove the need of alpha conversion. They allow that by removing the variable argument in the lambda abstraction, and rely on integers (indices) to refer to variables. The index actually denotes the number of binders that are in scope between that occurrence and its corresponding binder.

For example, we have that $\lambda f . \lambda x . x = \lambda \lambda 0$ because the distance from the x in the application to the abstraction is 0. Similar reasoning for $\lambda f . \lambda x . f = \lambda \lambda 1$ , etc.

We start with the following data definitions:

type VarNameB = Int
type Indices = (VarName, Int)

data TermB =
    TmVarB VarNameB
    | TmAbsB TermB
    | TmAppB TermB TermB deriving (Eq, Show)

We will also create a function that will convert an old term Term to a De Bruijn’s term TermB:

bruijn :: Term -> Maybe TermB
bruijn e = go [] e where
    go :: [Indices] -> Term -> TermB
    go m (TmAbs x e)  = TmAbsB $ let m' = map (\(x, y) -> (x, y + 1)) m in go ((x, 0) : m') e
    go m (TmApp e e') = TmAppB (go m e) (go m e')
    go m (TmVar x)    = TmVarB $ findByFst m x
    findByFst :: [Indices] -> VarName -> Int
    findByFst ((x, y) : xs) x' = if x' == x then y else findByFst xs x'
    findByFst _ _              = error "Variable not found"

Running it:

Main> bruijn (TmAbs "f" (TmAbs "x" (TmVar "x")))
TmAbsB (TmAbsB (TmVarB 0))

However:

Main> bruijn (TmAbs "f" (TmAbs "x" (TmVar "y")))
TmAbsB (TmAbsB (TmVarB *** Exception: Variable not found

This could be fixed by the function returning a Maybe TermB, or exposing go to consider the list of free variables and do the proper substitution. However, doing this or writing an evaluator for this system is left as an exercise to the reader 😀

2.4. Conclusion

The lambda calculus has a very minimal system, yet is a bit trickier to implement than the previous system we’ve seen. When we introduce function abstraction and application, we introduce complexity. We needed to account for contextual substitution.

An interesting question would be: is there a system that is simpler to implement, yet as powerful as lambda calculus? Maybe combinatory logic? But it’s not really pleasant to use as it’s not readable to us. It’s almost as what we essentially did is choose the minimal “most” readable system, and solve the side-effects (i.e. solving the issue of alpha conversion) of reading/parsing those systems that we humans have as a limitation. I mean, if it was easy for humans to parse SKIKSIKSKS easily, we wouldn’t use lambda calculus (which requires us as language designers to solve all the side-effects due to complexity), right?

In the next tutorial we’ll be implementing a type-checker for the lambda calculus – the typed lambda calculus.

Formula	Explanation
$[x \mapsto s]x = s$	Renaming all occurences of “x” to “s” within “x” is just “s”
$[x \mapsto s]y = y$	Renaming all occurences of “x” to “s” within “y” (which doesn’t contain any “x”) is just “x”
$[x \mapsto s](\lambda y . t_1) = \lambda y.[x \mapsto s]t_1, y \neq x$	In an abstraction, as long as the variable in the argument is different, we can recursively substitute the terms
$[x \mapsto s](t_1 t_2) = [x \mapsto s]t_1 [x \mapsto s]t_2, y \neq x \land y \notin FV(s)$	In an application, we recursively substitute all bound variables

Formula	Explanation
$FV(x) = \{x\}$	A single variable is free
$FV(\lambda x.t_1) = FV(t_1) \setminus \{x\}$	Free variables in a term within a lambda abstraction are all free variables of the term except the one bound by the abstraction
$FV(t_1 t_2) = FV(t_1) \cup FV(t_2)$	Free variables of application is the union of the free variables of both terms