foldl and foldr

First, we start with the definitions:
foldr:

foldr f z (x:xs) = f x (foldr f z xs) (*)
foldr _ z [] = z (**)

foldl:

foldl f z (x:xs) = foldl f (f z x) xs (*)
foldl _ z [] = z (**)

foldr starts from the right and goes on to the left (right-associative), while foldl does exactly the opposite.
As usual, it’s best to see with examples. Assume generic list of [a, b, c] and generic function f:

Case foldr:

foldr f z (x:xs) = f x (foldr f z xs)
=> foldr f z [a, b, c]
= f a (foldr f z [b, c]) (by (*))
= f a (f b (foldr f z [c]) (by (*))
= f a (f b (f c (foldr f z []))) (by (*) and (**))
= f a (f b (f c z))

Case foldl:

foldl f z (x:xs) = foldl f (f z x) xs
=> foldl f z [a, b, c] (by (*))
= foldl f (f z a) [b, c] (by (*))
= foldl f (f (f z a) b) [c] (by (*))
= foldl f (f (f (f z a) b) c) [] (by (*) and (**))
= (f (f (f z a) b) c)

So we’ve shown the associativity of both foldr and foldl.
Why do we need both foldr and foldl? Because of associativity of course. There are some cases where we’d want to use foldr, and some cases where we’d want to use foldl.
Consider some f (binary function) which is not associative (e.g. ‘-‘). Try evaluating:


*Main> foldr (-) 0 [1,2,3]
2
*Main> foldl (-) 0 [1,2,3]
-6

So, what really inspired me to write this article is one of the examples given in LYAH:


solveRPN x = foldl foldingFunction [] $ words x
where foldingFunction (x:y:ys) "*" = (x * y):ys
foldingFunction (x:y:ys) "+" = (x + y):ys
foldingFunction (x:y:ys) "-" = (y - x):ys
foldingFunction xs numberString = read numberString:xs

This is a very elegant solution. I tried writing a RPN calculator some time ago in JS (imperative style) and it involved a lot of checks for side cases, while in Haskell where we have the definitions of all functions that we are using, there is no room for side cases. We know exactly what we are trying to evaluate.

If you want a thorough explanation of the solveRPN function, you can check it at http://learnyouahaskell.com/functionally-solving-problems, while here I’ll only give a brief explanation.

Our foldingFunction acts as a stack storage, and accepts two parameters (the first parameter being the initial parameter that we pass (an empty stack), and the second one being the current element taken from the list), as foldl expects. So, in the where clause we are actually pattern matching on foldingFunction .

In the case where we have two (or more) numbers stored into our stack (x:y:ys) and we arrive at “*” as the current element of the input list, then we multiply those two numbers and store the result back in the stack. Likewise is the definition for “+” and “-“. However, if we match none of the specified operators, then we just store it back into our stack as a number.

Class and Instance

Today, we want a generic Animal class, so that we can instantiate animals out of it in a easy way. How can we do that?
First, we’ll want to create our generic class. We do it this way:

class Animal c where
doSomething :: c -> String
name :: c -> String

But we have a problem here. We can’t do

class Animal c where
doSomething :: String
name :: String

Because Haskell forces us to use the type that will be instantiated in our instances. We’ll cover on that later (*).
So let’s make our Cat and Dog instances now

data Dog = Dog
data Cat = Cat

instance Animal Dog where
doSomething _ = "Woof!"
-- name = "A dog" -- you can't do this
name _ = "A dog"

instance Animal Cat where
doSomething _ = "Meow!"
name _ = "A cat"

Now we can do some fancy stuff like

*Main> doSomething Dog
"Woof!"
*Main> doSomething Cat
"Meow!"
*Main> name Dog
"A dog"
*Main> name Cat
"A cat"
*Main>

So, basically, what we just wrote is “syntactic sugar” for

data DogOrCat = Dog' | Cat'

name' Dog' = "A dog"
name' Cat' = "A cat"

doSomething' Dog' = "Woof!"
doSomething' Cat' = "Meow!"

But the problem with this approach is that, whenever we want to instantiate another animal, we’d have to change the already existing structure (DogOrCat). We don’t want that.
We also risk in calling name’ for an Animal which may not have an instance. Instance forces you to instantiate all functions in the Class.
(*) Basically, this is what triggered this post: I had a discussion with a friend of mine today, and he said that you can’t do something like this in Haskell:

data Dog = Dog { name :: String; age :: Int, bark :: BarkKind }
data Cat = Cat { name :: String, age :: Int, meow :: MeowKind } -- error.

Which is correct. The reason for this is the namespace. name and age (declared in Dog) are global functions, and we are trying to re-declare them in Cat.
He wanted this kind of structure in order to be able to parse some JSON stuff.
I headed over to #haskell and had a discussion, where one guy had this implementation as a suggestion:

instance FromJSON Dog where
parseJSON (Object v) =
Dog v .: "name"
v .: "bites"
parseJSON _ = mzero

instance FromJSON Cat where
parseJSON (Object v) =
Cat v .: "name"
v .: "likes-humans"
parseJSON _ = mzero

This is using the Aeson library. Basically, that’s what Aeson did. They declared some generic class, so that we can make instances out of it in a elegant way where types will not conflict with each other.

Int List Monoid implementation in C

Here is an implementation in C of the monoid ([Int], ++, []):

/*
* Experimental fun. List Monoid implementation in C by Boro Sitnikovski
* 26.12.2013
*
*
boro@boro:~/test$ gcc list.c
boro@boro:~/test$ ./a.out
A = [ 1 2 3 ]
B = [ 5 6 7 ]
Head of list A: 1
Head of list B: 5
B ++ A = [ 5 6 1 2 3 ]
A ++ B = [ 1 2 5 6 7 ]
123:A ++ B = [ 123 1 2 5 6 7 ]
boro@boro:~/test$ ghci
GHCi, version 7.6.3: http://www.haskell.org/ghc/ :? for help
Loading package ghc-prim ... linking ... done.
Loading package integer-gmp ... linking ... done.
Loading package base ... linking ... done.
Prelude> let a = [1, 2, 3]
Prelude> let b = [5, 6, 7]
Prelude> head a
1
Prelude> head b
5
Prelude> b ++ a
[5,6,7,1,2,3]
Prelude> a ++ b
[1,2,3,5,6,7]
Prelude> 123:a ++ b
[123,1,2,3,5,6,7]
Prelude>
*
*/

#include
#include

#define T int
#define EMPTY_LIST NULL

typedef struct ListMonoid {
T value;
struct ListMonoid *next;
} ListMonoid;

ListMonoid *A, *B;

void list_init() {
static ListMonoid a1, a2, a3;
static ListMonoid b1, b2, b3;

a1.next = &a2; a2.next = &a3; a3.next = EMPTY_LIST;
b1.next = &b2; b2.next = &b3; b3.next = EMPTY_LIST;

a1.value = 1;
a2.value = 2;
a3.value = 3;

b1.value = 5;
b2.value = 6;
b3.value = 7;

A = &a1;
B = &b1;
}

T list_head(ListMonoid *L) {
return L->value;
}

void list_append(ListMonoid **L, T value) {
ListMonoid *newL = (ListMonoid *)malloc(sizeof(ListMonoid));
newL->value = value;
newL->next = *L;
*L = newL;
}

void list_print(ListMonoid *L) {
printf("[ ");
while (L) {
printf("%d ", L->value);
L = L->next;
}
printf("]\n");
}

void list_free(ListMonoid *L) {
ListMonoid *tmp = L;
while (tmp) {
L = tmp->next;
free(tmp);
tmp = L;
}
return;
}

ListMonoid *list_add(ListMonoid *L1, ListMonoid *L2) {
ListMonoid *tmp, *L;
tmp = L = (ListMonoid *)malloc(sizeof(ListMonoid));

while (1) { // L1
tmp->value = L1->value;
L1 = L1->next;
if (!L1) {
tmp->next = NULL;
break;
}
tmp = tmp->next = (ListMonoid *)malloc(sizeof(ListMonoid));
}

while (1) { // L2
tmp->value = L2->value;
L2 = L2->next;
if (!L2) {
tmp->next = NULL;
break;
}
tmp = tmp->next = (ListMonoid *)malloc(sizeof(ListMonoid));
}

tmp->next = NULL;

return L;

}

int main() {

ListMonoid *L;

list_init();

printf("A = ");
list_print(A);

printf("B = ");
list_print(B);

printf("Head of list A: %d\n", list_head(A));
printf("Head of list B: %d\n", list_head(B));

L = list_add(B, A);
printf("B ++ A = ");
list_print(L);
list_free(L);

L = list_add(A, B);
printf("A ++ B = ");
list_print(L);
list_free(L);

L = list_add(A, B);
list_append(&L, 123);
printf("123:A ++ B = ");
list_print(L);
list_free(L);


return 0;
}

Definition of a Monad

A monad is just a monoid in the category of endofunctors, what’s the problem?

A monoid is…

  • A set, S
  • An operation, • : S × S -> S
  • An element of S, e : 1 -> S

Examples:
0 + 7 == 7 + 0 == 7
(• ≡ +, S ≡ Natural Numbers, Identity ≡ 0)

[] ++ [1,2,3] == [1,2,3] ++ [] == [1,2,3]
(• ≡ ++, S ≡ [Int], Identity ≡ [])

{} union {apple} == {apple} union {} == {apple}
(• ≡ union, S ≡ {fruits}, Identity ≡ {})

…satisfying these laws:

  • (a • b) • c = a • (b • c), for all a, b and c in S
  • e • a = a = a • e, for all a in S

A monad is…

  • An endofunctor, T : X -> X
  • A natural transformation, μ : T × T -> T, where × means functor composition
  • A natural transformation, η : I -> T, where I is the identity endofunctor on X

…satisfying these laws:

  • μ(μ(T × T) × T)) = μ(T × μ(T × T))
  • μ(η(T)) = T = μ(T(η))

Tying it all together

  • a monad is a structure that defines a way to combine functions,
  • analogously to how a monoid is a structure that defines a way to combine objects,
  • where the method of combination is associative,
  • and where there is a special ‘No-op’ that can be combined with any Something to result in Something unchanged.

(Full article: http://stackoverflow.com/questions/3870088/a-monad-is-just-a-monoid-in-the-category-of-endofunctors-whats-the-problem)

fold vs map

Today I learned that map can be implemented in terms of fold, i.e. if we have fold (reduce) we already have map.

The source code for foldr is:

foldr f z []     = z
foldr f z (x:xs) = f x (foldr f z xs)

So, we can do something like:

Prelude> foldr ((:).(+1)) [] [1,2,3]
[2,3,4]

Instead of using (+1) here, we can compose any function.

We can also do

Prelude> foldr (+) 0 [1,2,3]
6

The beauty of fold is that for a given data of type A, it can return data of type B.
While for map, for a given data of type A it will always return data of type A.

As we’ve shown, fold can return a list, a number, or anything else that we specify. But this is not the case with map.

The reason is that fold has a more generic morphism (catamorphism), and this is what makes it more powerful than map (homomorphism).

Here are some examples:

Prelude> foldr ((++)) [] ["hello", "world"]
"helloworld"
Prelude> foldr ((++).(words)) [] ["hello world", "I am fine"]
["hello","world","I","am","fine"]

And for something more complex:

parsePackets :: String -> [String]
parsePackets [] = []
parsePackets x = a:parsePackets b
where
(a,b) = splitAt 4 x

*Main> parsePackets "hey test 123 packets aligned in 4 bytes"
["hey ","test"," 123"," pac","kets"," ali","gned"," in ","4 by","tes"]

And here’s an analogous example with fold:

*Main> fst $ foldr (\x (chunks,partial) -> if length partial == 3 then
((x:partial):chunks,"") else (chunks,x:partial)) ([],[])
"hey test 123 packets aligned in 4 bytes"
[" tes","t 12","3 pa","cket","s al","igne","d in"," 4 b","ytes"]
Prelude> foldr (\x (chunks, partial, all, cnt) -> if x == 3 || cnt == 3 then
(x:chunks, partial, x:all, cnt+1) else (chunks, x:partial, x:all, cnt+1))
([],[],[],0) [1,2,3,4,5,6,7,3]
([3,5,3],[1,2,4,6,7],[1,2,3,4,5,6,7,3],8)

So we can notice that it is quite analogous to an imperative loop, i.e. we can look at fold’s accumulator as a local variable.

We can also implement a fixed point combinator using fold, proving that iterations can be reduced to folds:

fix f = foldr (\_ -> f) undefined (repeat undefined)

The undefined is just a gap-filler. The expression ‘repeat undefined’ generates a list of unbounded length. We don’t care about its elements, so anything will do. The fact that the list is infinite means that we get as many applications of the f parameter as we need. Note that, because the list is infinite, foldr will never reach the empty list, so it does not matter what value we give for the base case argument. Nonetheless, the type-checker requires that the argument has a polymorphic type, so it is quite convenient to use undefined here also.1

So, now we can experiment some more with recursion:


Prelude> let fix f = foldr (\_ -> f) undefined (repeat undefined)
Prelude> let fact x = if x == 0 then 1 else x * fact (x-1)
Prelude> (fix (\x -> fact)) 4
24

1. Pope, Bernie. “Getting a Fix from the Right Fold“. The Monad.Reader (6): 5–16.