Write a mathematical proof generator for AoC 1-1 2022

For last year’s AoC, I decided to use 20 different programming languages. For the year before that, I used Haskell. This year I decided to not participate as I’m already busy with other stuff, but I thought I could try something different.

So the first task of AoC day 1 is essentially giving you a list of lists, and you need to sum each element and find the max. Here’s a one-liner in Python:

max([ sum(x) for x in L ])

But I thought to myself, why write code in a “normal” programming language to solve an easy problem? Why not complicate things a little bit and write a code that generates mathematical proof in Budge-TP? In this post, we will write a Python script that will solve AoC 1-1 by generating Budge-TP code.

Basically, Budge-TP is a theorem prover that allows you to specify formal systems (rules and axioms) and derive theorems based on that. It is a very low-level language; what assembly is to computers, Budge-TP is to mathematics.

Once you start writing proofs in Budge-TP, they tend to become very verbose. While this is good in that it gives insights as to how the internals of a system work, it comes at a price – code starts to get unreadable very fast.

Note that writing a codegen script for low-level languages is not a new idea. In fact, it is a very common approach to problems like these. For example, the theorem prover Coq already has a tactics language on top of lambda calculus, where terms tend to get unreadable fast as well!

For example, for the list [[1], [2], [1]] the codegen script automatically generates the following Budge-TP code:

## Derived theorems
# Representing all necessary numbers
tm1! : rTmS x=tm0!
tm2! : rTmS x=tm1!

# Representing all the lists from the input
tList1Step0! : rTmNil
tList1! : rMkList x=tm1!;y=tList1Step0!
tList2Step0! : rTmNil
tList2! : rMkList x=tm2!;y=tList2Step0!
tList3Step0! : rTmNil
tList3! : rMkList x=tm1!;y=tList3Step0!

# Representing all the sums of the lists
tList1SumStep0! : rSumInit z=tList1!
tList1SumStep1! : rSumRec x=tm0!;y=tm0!;z=tList1Step0! tList1SumStep0!
tList1SumStep2! : rSumRecSkip x=tm1!;z=tList1Step0! tList1SumStep1!
tElf1! : rSumNil x=tm1! tList1SumStep2!
tList2SumStep0! : rSumInit z=tList2!
tList2SumStep1! : rSumRec x=tm0!;y=tm1!;z=tList2Step0! tList2SumStep0!
tList2SumStep2! : rSumRec x=tm1!;y=tm0!;z=tList2Step0! tList2SumStep1!
tList2SumStep3! : rSumRecSkip x=tm2!;z=tList2Step0! tList2SumStep2!
tElf2! : rSumNil x=tm2! tList2SumStep3!
tList3SumStep0! : rSumInit z=tList3!
tList3SumStep1! : rSumRec x=tm0!;y=tm0!;z=tList3Step0! tList3SumStep0!
tList3SumStep2! : rSumRecSkip x=tm1!;z=tList3Step0! tList3SumStep1!
tElf3! : rSumNil x=tm1! tList3SumStep2!

# Combining all the sums into a single list
tElvesSumsStep0! : rTmNil
tElvesSumsStep1! : rMkList x=tElf3!;y=tElvesSumsStep0!
tElvesSumsStep2! : rMkList x=tElf2!;y=tElvesSumsStep1!
tElvesSums! : rMkList x=tElf1!;y=tElvesSumsStep2!

# Take the maximum of the list tElvesSums!
tElvesMaxStep0! : rMaxInit x=tElvesSums!
tElvesMaxStep1! : rLtZ x=tm0!
tElvesMaxStep2! : rMaxRecR x=tm1!;y=tElvesSumsStep2!;z=tm0! tElvesMaxStep1! tElvesMaxStep0!
tElvesMaxStep3! : rLtZ x=tm0!
tElvesMaxStep4! : rLtS x=tm0!;y=tm1! tElvesMaxStep3!
tElvesMaxStep5! : rMaxRecR x=tm2!;y=tElvesSumsStep1!;z=tm1! tElvesMaxStep4! tElvesMaxStep2!
tElvesMaxStep6! : rLtZ x=tm0!
tElvesMaxStep7! : rLtS x=tm0!;y=tm1! tElvesMaxStep6!
tElvesMaxStep8! : rMaxRecL x=tm1!;y=tElvesSumsStep0!;z=tm2! tElvesMaxStep7! tElvesMaxStep5!

tMaxElf : rMaxNil y=tm2! tElvesMaxStep8!

Running the code above will produce S(S(0)) which is the number 2. Cool!

But, what does the generated code for the example input in the task look like? Well, it’s already around 7MB:

$ ls -al aoc1-1.btp
-rw-r--r--  1 bor0  staff  7285128 Dec  2 01:00 aoc1-1.btp
$ du -h aoc1-1.btp
6.9M	aoc1-1.btp

Running this through Budge-TP’s evaluator takes some time:

$ time budge-tp.py aoc1-1.btp > output
real	0m11.983s
user	0m10.334s
sys	0m1.642s
$ du -h output
 24K	output
$ grep -o S output | wc -l # calculate S's
   24000

And what about the actual input for my AoC account? Well, it took around 30 seconds to generate the Budge-TP code by the Python codegen script, and will very likely take a lot longer to evaluate the Budge-TP code.

$ ls -al output
-rw-r--r--  1 bor0  staff  1838703814 Dec  2 01:05 output
$ du -h output
1.7G	output

But hey! If someone asks you to prove that your solution is correct, at least you can give them a valid mathematical proof in only 1.7GB 🙂

The codegen script in Python has several phases for the generation:

The first phase calculates the maximal possible number for the given input and represents all the numbers up to it within Budge-TP.
The second phase uses rMkList and rTmNil from my earlier list example.
The third phase uses sum rules based on the lists. I’d tackled these in a different commit earlier.
The fourth phase combines all the sums into a single list.
The fifth and most complex phase uses rules for calculating the maximum number in a list (together with number inequality)

Some cool next steps would be to implement something that will generate code for more general tasks and not just code that solves a specific task like we did in this case.

Well, that’s a wrap! I wish you a merry Christmas and a happy new year! 🙂