Capturing abstractions in PHP

Related to: Capturing abstractions in Lisp

I came across Yay today, which allows us to use macros in PHP. Cool, right?

Start with

composer require yay/yay:dev-master

Then you can use

vendor/bin/yay

to pre-process your code with macros.

Now we can implement lazy evaluation (kinda) in PHP!

We start with:

<?php
macro {
    __delay( ···expression )
} >> {
    function() { return ···expression; }
}

function force( $x ) {
    return $x();
}

function not_a_delay( $x ) {
    return function() use ( $x ) {
        return is_callable( $x ) ? $x() : "Not a callable function";
    };
}

echo "__delay start\n";
$x = __delay( printf( "The time function returns: %d\n", time() ) );
echo "__delay end\n";

echo "\n----------\n\n";

echo "force start\n";
echo 'Retval: ' . force( $x ) . "\n"; // print here!
echo "force end\n";

echo "\n----------\n\n";

echo "not_a_delay start\n";
$x = not_a_delay( printf( "The time function returns: %d\n", time() ) ); // print here!
echo "not_a_delay end\n";

echo "force start\n";
echo 'Retval: ' . force( $x ) . "\n";
echo "force end\n";

So if we look at the output:

boro@bor0:~/Desktop/test$ vendor/bin/yay test.php | php
__delay start
__delay end

----------

force start
The time function returns: 1494937031
Retval: 38
force end

----------

not_a_delay start
The time function returns: 1494937031
not_a_delay end
force start
Retval: Not a callable function
force end

The two big differences are the call to __delay and the call to not_a_delay (the // print here! comments mark where the printing actually happens). As we can see from the output, the expression passed to __delay got “delayed” and only ran when we forced it, while the expression passed to not_a_delay got evaluated eagerly, before the function was even called.
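For comparison, the abstraction the __delay macro mimics is built into Racket (and other Lisps) as delay and force, so the same experiment needs no preprocessing step. A minimal sketch, with the printf wording only illustrative:

#lang racket
(define x (delay (printf "The time function returns: ~a\n" (current-seconds))))
;; nothing has been printed yet -- the expression is delayed
(force x) ;; only now does the line get printed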

Relations between recursion, stacks, and queues

For the sake of discussion, let’s consider a recursive and an iterative function that convert a number to a list of its digits.

(define (number->list l)
  (if (= l 0)
      '()
      (cons
       (remainder l 10)
       (number->list (quotient l 10)))))

(define (number->list-iter l acc)
  (if (= l 0)
      acc
      (number->list-iter
       (quotient l 10)
       (cons (remainder l 10) acc))))

Calling the functions produces:

> (number->list 123)
'(3 2 1)
> (number->list-iter 123 '())
'(1 2 3)

Further, let’s consider the same function rewritten in JavaScript, implemented once with a stack and once with a queue instead of recursion:

function numbertolist_stack(n) {
    var stack = [n];

    while (stack.length) {
        var x = stack.pop();
        stack.push(x % 10); // push the last digit

        if (parseInt(x / 10)) {
            stack.push(parseInt(x / 10)); // push the rest of the number
        } else {
            break;
        }
    }

    return stack;
}

function numbertolist_queue(n) {
    var queue = [n];

    while (queue.length) {
        var flag = false;
        var x = queue.shift();

        if (parseInt(x / 10)) {
            queue.push(parseInt(x / 10)); // enqueue the rest of the number
        } else {
            flag = true;
        }

        queue.push(x % 10); // enqueue the last digit

        if (flag) {
            break;
        }
    }

    return queue;
}

So, now we have:

> numbertolist_stack(123)
[ 3, 2, 1 ]
> numbertolist_queue(123)
[ 1, 2, 3 ]

So, in terms of cons, we can conclude that recursion can be viewed as a stack, and tail recursion can be viewed as a queue. And if we don't want to work in terms of cons, we can rewrite the Scheme iterative function to use (append acc (list (remainder l 10))) instead of cons and achieve the same effect as number->list.
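A sketch of that variant (the function name is mine):

(define (number->list-iter-append l acc)
  (if (= l 0)
      acc
      (number->list-iter-append
       (quotient l 10)
       (append acc (list (remainder l 10))))))

> (number->list-iter-append 123 '())
'(3 2 1)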

However, note that most interpreters that implement tail call optimization probably wouldn’t use an actual queue, but only a fixed set of variables, in order to save space.

Another interesting observation is in terms of associativity (think foldr and foldl): foldr is implemented using a stack, and foldl using a queue.
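For instance, in Racket, folding cons over the same list from both directions gives the two orders we saw above:

> (foldr cons '() '(1 2 3))
'(1 2 3)
> (foldl cons '() '(1 2 3))
'(3 2 1)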

Similarly, we can implement depth-first search with a stack and breadth-first search with a queue.
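A rough sketch of that last point, with a small adjacency-list graph (the names graph, traverse, dfs and bfs are mine): the traversal skeleton is identical; the only difference is whether newly discovered nodes go to the front of the worklist (a stack, giving depth-first search) or to the back (a queue, giving breadth-first search).

(define graph '((a b c) (b d) (c d) (d)))
(define (neighbors n) (cdr (assq n graph)))

(define (traverse start combine)
  (let loop ([work (list start)] [seen '()])
    (cond [(null? work) (reverse seen)]
          [(memq (car work) seen) (loop (cdr work) seen)]
          [else (loop (combine (neighbors (car work)) (cdr work))
                      (cons (car work) seen))])))

(define (dfs start) (traverse start (lambda (new rest) (append new rest)))) ; new nodes in front: stack
(define (bfs start) (traverse start (lambda (new rest) (append rest new)))) ; new nodes at the back: queue

> (dfs 'a)
'(a b d c)
> (bfs 'a)
'(a b c d)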

IMHO, discovering relations like this is fun 🙂

Usefulness of algebraic structures

Let’s consider the partial order relation. 

Namely, it’s a relation that is reflexive, transitive, and antisymmetric.

One example of such a relation is ≤ on naturals.

Another example is ⊆ on sets.

But if we look at these two examples from a higher perspective, the first one is about comparing numbers and the second one is about comparing subsets of a set, e.g. 1 ≤ 2 and {1, 2} ⊆ {1, 2, 3}.

It might not be immediately obvious that these two examples have anything in common, but with partial orders (and algebraic structures in general) we can capture such abstractions.
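As a quick sanity check, here is a sketch in Racket (the helper names are mine) that tests the three properties of a relation over a finite carrier and confirms both examples on small instances:

(define (reflexive? rel xs)
  (for/and ([x xs]) (rel x x)))

(define (antisymmetric? rel xs)
  (for*/and ([x xs] [y xs])
    (or (not (and (rel x y) (rel y x))) (equal? x y))))

(define (transitive? rel xs)
  (for*/and ([x xs] [y xs] [z xs])
    (or (not (and (rel x y) (rel y z))) (rel x z))))

(define (partial-order? rel xs)
  (and (reflexive? rel xs) (antisymmetric? rel xs) (transitive? rel xs)))

> (partial-order? <= '(1 2 3))
#t
> (partial-order? subset? (list (set) (set 1) (set 1 2) (set 1 2 3)))
#t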

Recursive and iterative processes

Let’s consider the following code in Scheme:

#lang racket
(define (add-deferred x)
  (if (eq? x 0) 0 (+ x (add-deferred (- x 1)))))

(add-deferred 3)
;(add-deferred 3)                   =
;(3 + (add-deferred 2))             =
;(3 + (2 + (add-deferred 1)))       =
;(3 + (2 + (1 + (add-deferred 0)))) =
;(3 + (2 + (1 + 0))) = 6 ## evaluate now

(define (add-iter x acc)
  (if (eq? x 0) acc (add-iter (- x 1) (+ acc x))))

(add-iter 3 0)
;(add-iter 3 0) = ## evaluate
;(add-iter 2 3) = ## evaluate
;(add-iter 1 5) = ## evaluate
;(add-iter 0 6) = 6

And in C:

#include <stdio.h>

int add(int x) {
    if (x == 0) {
        return 0;
    }

    return x + add(x - 1);
}

int add_iter(int x, int acc) {
    if (x == 0) {
        return acc;
    }

    return add_iter(x - 1, acc + x);
}

int main() {
    printf("%d\n", add(3));
    printf("%d\n", add_iter(3, 0));
}

And then this quote from SICP:

In contrasting iteration and recursion, we must be careful not to confuse the notion of a recursive process with the notion of a recursive procedure. When we describe a procedure as recursive, we are referring to the syntactic fact that the procedure definition refers (either directly or indirectly) to the procedure itself. But when we describe a process as following a pattern that is, say, linearly recursive, we are speaking about how the process evolves, not about the syntax of how a procedure is written. It may seem disturbing that we refer to a recursive procedure such as fact-iter as generating an iterative process. However, the process really is iterative: Its state is captured completely by its three state variables, and an interpreter need keep track of only three variables in order to execute the process.

One reason that the distinction between process and procedure may be confusing is that most implementations of common languages (including Ada, Pascal, and C) are designed in such a way that the interpretation of any recursive procedure consumes an amount of memory that grows with the number of procedure calls, even when the process described is, in principle, iterative. As a consequence, these languages can describe iterative processes only by resorting to special-purpose “looping constructs” such as do, repeat, until, for, and while.
The implementation of Scheme does not share this defect. It will execute an iterative process in constant space, even if the iterative process is described by a recursive procedure. An implementation with this property is called tail-recursive. With a tail-recursive implementation, iteration can be expressed using the ordinary procedure call mechanism, so that special iteration constructs are useful only as syntactic sugar.

Now, since the C standard doesn’t guarantee tail call optimization (some compilers do perform it at higher optimization levels), both calls here will typically produce a recursive process, that is, the stack grows with each call, while with Scheme the stack grows only in the case of the non-iterative process.

Property of squares

Consider the following task:
Print the first 100 squares using at most one loop. The constraints are that you are not allowed to use multiplication or functions.

To tackle this one, we will use an interesting property of squares that I noticed while taking my kid to school. At the school, there was a table with all natural numbers listed from 1 to 100. While looking at it, I noticed that the gaps between consecutive squares follow a simple rule: the kth gap is 2k + 1.

So, for example, the gap from 1 to 4 is 3; this is the first gap, so k = 1 and 2*1 + 1 = 3. The gap from 4 to 9 is 5; this is the second gap, so k = 2 and 2*2 + 1 = 5. And so on.

Naturally, I tried to generalize this and arrived at the following recurrence relation:

n_0 = 1
n_k = n_{k - 1} + 2k + 1

Before we use this relation in our program, we will first prove its correctness.

We will prove that n_k = (k + 1)^2 by induction.

The base case is k = 0: n_0 = 1 = 1^2, which is true. Now, assume that n_k = (k + 1)^2. By the recurrence, n_{k + 1} = n_k + 2(k + 1) + 1 = n_k + 2k + 3, so n_{k + 1} = (k + 1)^2 + 2k + 3 = k^2 + 4k + 4 = (k + 2)^2. Thus the identity is proven.

Now, here is our simple program:

var s = 1;                 // s starts at 1^2
for (var i = 1; i <= 100; i++) {
    console.log(s);        // at this point s equals i^2
    s += i + i + 1;        // adding 2i + 1 gives (i + 1)^2, with no multiplication
}
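If we run it, the output begins and ends as expected (trimmed for brevity):

1
4
9
16
25
...
10000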