Tuply singleton

I noticed this while at a company team meetup. Whenever we were discussing which place we should eat food at, most of my coworkers were paying attention to two numbers: rating and number of votes.

For example, Restaurant #1 can have a 5.0-star rating with 1 vote, and Restaurant #2 can have a 3.9-star rating with 17 votes.

This metric makes a tuple, where the first element is the rating and the second element is the vote count. It is represented by, e.g. (3.9, 17) in Haskell (and is of type (Double, Int)), or [3.9, 17] (or { fst: 3.9, snd: 17 }) in JavaScript.

So we need two elements (or numbers) to capture this metric. What is a good way to combine them in a single element, such that both can be extracted at any time?

In Haskell terms, fst and snd are of type (a, b) -> a and (a, b) -> b respectively. Since we only care about numbers in this case, we’ll make our lives easier, and essentially we’re looking for fst :: Double -> Double and snd :: Double -> Double.

What is a good way to encode this? Let’s see. Consider we have a prime number $p$ together with some numbers $x_1 \in \mathbb{R}, x_2 \in \mathbb{N}$ – our metric is now $(x_1, x_2)$ .

We know for sure that $x_1 \cdot p^{x_2}$ makes a single number – the argument is as follows: number powered to a number makes a number, and a number multiplied by a number makes a number.

So using this information we can make a single number out of two numbers – that is easy. The trickier bit is extracting two numbers (correctly) out of a single number. Will the way we used work? Let’s check. Remember our number was $x_1 \cdot p^{x_2}$ , which is really just $x_1 \cdot \underbrace{p \cdot p \cdot \ldots \cdot p}_{x_2 \ \text{times}}$ .

To decode we just have to use division. The question is, what do we divide, and can we be sure that it will always produce the correct result? Well, first of all, we have to be careful about which prime number we pick.

We know that $0 \leq x_1 \leq 5$ , so if we pick 5 as a prime number, and we’re dealing with 5-star ratings then the encoding will break since we will have a number like $5 \cdot 5 \cdot ... \cdot 5$ so we can’t really extract any meaningful data out of it⭐️. So we need to make sure that $x_1 < p$ . Let's consider $p = 7$ : $x_1 \cdot \underbrace{7 \cdot 7 \cdot \ldots \cdot 7}_{x_2 \ \text{times}}$ . This is better.

To extract $x_1$ we have to keep dividing our number with 7 until it is less than or equal to 5 and count the number of times we did that. For example if we had $x_1 \cdot 7^5$ , we divide that number by 7 just 5 times and then we reach $\leq 5$ , since $0 \leq x_1 \leq 5$ . This gives our first element.

Now that we have the first element, the second element will be easy. To extract $x_2$ out of $x_1 \cdot 7^{x_2}$ , we divide the number by the first element (which we already know how to get), and then use logarithm with base 7 to get the $x_2$ out of $7^{x_2}$ .

Finally, why is it important that $p$ is a prime number? Well, if it wasn't prime we couldn't really extract it. Assume $p$ is a composite number in $x_1 \cdot p^{x_2}$ . This means that in some cases $p$ may have some factors equal to $x_1$ – e.g. $p = 8 = 4 \cdot 4$ and $x_1 = 4$ ). Earlier we already⭐️ showed why this breaks the encoding.

With all this information, let's write the code in Haskell:

 
mkPair :: Double -> Double -> Double
mkPair a b = if a > 5 || a < 0 then error "Out of bounds" else a * (7 ** b)

fst' :: Double -> Double
fst' x = if x > 5 then fst' (x / 7) else x

snd' :: Double -> Double
snd' x = logBase 7 (x / fst' x)

Some tests:

> let p = mkPair 5.0 1 in printf "Pair: %f, First: %f, Second: %f\n" p (fst' p) (snd' p)
Pair: 35.0, First: 5.0, Second: 1.0
> let p = mkPair 3.9 17 in printf "Pair: %f, First: %f, Second: %f\n" p (fst' p) (snd' p)
Pair: 907259004550107.3, First: 3.9, Second: 17.0

Of course, it may break for some numbers:

> let p = mkPair 4.9 10 in printf "Pair: %f, First: %f, Second: %f\n" p (fst' p) (snd' p)
Pair: 1384128720.1000001, First: 4.8999999999999995, Second: 10.000000000000002

But that's only due to computers' limitation in terms of floating-point numbers, not our mathematical logic.

This concept is very similar to Gödel numbering and FRACTRAN.