Deriving derivative and integral

To get here, we first had to work out the point-slope formula and then figure out limits. Derivatives are very powerful. This post was inspired by doing gradient descent on artificial neural networks, but I won’t cover that here. Instead, we will focus on the definition of a derivative itself.

So let’s get started. A secant is a line that passes through two points of a curve. In the graph below, the points are A = (x, f(x)) and A' = (x + dx, f(x + dx)).

To derive a formula for this, we can use the point-slope form of an equation of a line: y - y_0 = \frac {y_1 - y_0} {x_1 - x_0} (x - x_0).

Plugging in the values, we get: f(x) - f(x + dx) = - \frac {f(x + dx) - f(x)} {dx} (dx).

What is interesting about this formula using the secant is that, as we will see, it provides us with a neat approximation of f near x.
Let’s define f_{new}(x, dx) = \frac {f(x + dx) - f(x)} {dx}. So now we have: f(x + dx) = f(x) + f_{new}(x, dx) (dx).

Taking the limit of f_{new} as dx approaches 0 gives us the actual slope of f at x (slope in the sense of the equation of a line).

So, let’s define \lim_{dx \to 0} f_{new}(x, dx) = f'(x). This slope is actually our definition of a derivative. This definition lies at the heart of calculus.
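To see the definition in action, here is a small sketch (not from the original post; the function names are mine) that watches the difference quotient settle toward the true derivative for f(x) = x^2:

```javascript
// A sketch of the difference quotient f_new(x, dx) = (f(x + dx) - f(x)) / dx.
// For f(x) = x^2 the true derivative is f'(x) = 2x, so at x = 3 we expect 6.
function fNew(f, x, dx) {
  return (f(x + dx) - f(x)) / dx;
}

const square = (x) => x * x;

// As dx shrinks, the quotient approaches the slope at x = 3.
for (const dx of [1, 0.1, 0.001]) {
  console.log(dx, fNew(square, 3, dx));
}
```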

The image below (taken from Wikipedia) demonstrates this for h = dx.

Back to the secant approximation, we now have: f(x + dx) \approx f(x) + f'(x) (dx). This is an approximation rather than an equality because we took the limit of one term, f_{new}, while keeping dx finite in the rest. As dx \to 0, the approximation becomes exact.

For example, to calculate the square of 1.5, we let x = 1 and dx = 0.5. Additionally, if f(x) = x^2 then f'(x) = 2x. So f(1.5) = f(x + dx) \approx f(1) + f'(1) \cdot 0.5 = 1 + 2 \cdot 1 \cdot 0.5 = 2, while the true value is 2.25. That’s an error of just 0.25 for dx = 0.5. Algebra shows the error for this particular case to be dx^2; for dx = 0.1, the error is just 0.01.
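The worked example translates directly into code; a minimal sketch, assuming f(x) = x^2 so that f'(x) = 2x:

```javascript
// Secant approximation f(x + dx) ≈ f(x) + f'(x) * dx for f(x) = x^2.
function approxSquare(x, dx) {
  return x * x + 2 * x * dx; // f(x) + f'(x) * dx
}

const approx = approxSquare(1, 0.5); // approximates 1.5^2
const exact = 1.5 * 1.5;
console.log(approx, exact, exact - approx); // the error is dx^2 = 0.25
```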

Pretty cool, right?

Here are some of the many applications that show why derivatives are useful:

  • We can use the value of the slope to find minima/maxima via gradient descent
  • The slope tells us the rate of change of a quantity
  • We can find intervals of monotonicity
  • We can do neat approximations, as shown

Integrals allow us to calculate the area under a function’s curve. As an example, we’ll calculate the area of the function f(x) = x in the interval [0, 2]. Recall that the area of a rectangle with size w by h is w \cdot h. Our approach will be to construct many smaller rectangles and sum their area.

We start with the case n = 2 – two rectangles. We take the sample points x = (1, 2), which give us two rectangles with width and height (1, f(1)) and (1, f(2)) respectively – the width is constant because the interval is divided evenly. Summing the areas gives 1 \cdot f(1) + 1 \cdot f(2) = 3. Note that there’s an error, since the rectangles do not match the region under the curve exactly. The main idea is that the more rectangles we use, the smaller the error and the closer we get to the actual value.

We proceed with the case n = 4. We have the sample points x = (0.5, 1, 1.5, 2). Since we have four rectangles in the range [0, 2], each has width \frac{2 - 0}{4} = 0.5. The result is 0.5 [ f(0.5) + f(1) + f(1.5) + f(2) ] = 2.5.
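Both cases can be checked with a short sketch (the names are mine) that sums right-endpoint rectangles, matching the sample points used above:

```javascript
// Right-endpoint Riemann sum for f over [a, b] with n rectangles.
function riemannSum(f, a, b, n) {
  const dx = (b - a) / n; // constant width: the interval is divided evenly
  let sum = 0;
  for (let i = 1; i <= n; i++) {
    sum += f(a + i * dx) * dx; // width * height of the i-th rectangle
  }
  return sum;
}

const id = (x) => x; // f(x) = x
console.log(riemannSum(id, 0, 2, 2));    // the n = 2 case: 3
console.log(riemannSum(id, 0, 2, 4));    // the n = 4 case: 2.5
console.log(riemannSum(id, 0, 2, 1000)); // approaches the true area, 2
```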

These cases give us an idea of how to generalize. First, note the differences between consecutive points in x – when n = 2, the difference between any consecutive points is 1, and when n = 4, the difference is 0.5. Generalizing for n, the difference between x_i and x_{i+1} will be \frac{b-a}{n}. Generalizing the summation gives \sum_{i=0}^{n} f(x_i) \cdot \Delta x_i, and since we only consider evenly spaced intervals we have \Delta x_i = \Delta x for all i. Such a sum is called a Riemann sum, and its limit defines the integral: \int_{a}^{b} f(x)\,dx = \lim_{n \to \infty} \sum_{i=0}^{n} f(x_{i})\,\Delta x, where \Delta x = \frac{b - a}{n}. Also, since a is the starting point, we have x_i = a + i \cdot \frac{b-a}{n}.

Going back to the example, to find the integral of f(x) = x, we need to calculate \sum_{i=0}^{n}f(x_{i}) \frac{b - a}{n} = \frac{b - a}{n} \sum_{i=0}^{n}(a + i \cdot \frac{b-a}{n}). From here, we evaluate the inner sum: \sum_{i=0}^{n}(a + i \cdot \frac{b-a}{n}) = (n+1)a + \frac{(b - a)(n+1)}{2}. Plugging back in gives (b - a + \frac{b}{n} - \frac{a}{n}) (a + \frac{b - a}{2}).

Now we can take the limit of this as n \to \infty. Note that \lim_{n \to \infty} \frac{b}{n} = \lim_{n \to \infty} \frac{a}{n} = 0, so we are left with (b - a)(a + \frac{b-a}{2}), which finally gives \frac{b^2 - a^2}{2}. This is the area under the curve of f(x) = x over the interval [a, b].
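As a sanity check, here is a sketch (names are mine) comparing the closed form \frac{b^2 - a^2}{2} against a large-n Riemann sum, using the \sum_{i=0}^{n} convention from the derivation:

```javascript
// Riemann sum for f(x) = x on [a, b], summing i = 0..n as in the derivation.
function areaUnderIdentity(a, b, n) {
  const dx = (b - a) / n;
  let sum = 0;
  for (let i = 0; i <= n; i++) {
    sum += (a + i * dx) * dx; // f(x_i) * dx with x_i = a + i * dx
  }
  return sum;
}

const a = 1, b = 3;
const closedForm = (b * b - a * a) / 2; // (9 - 1) / 2 = 4
console.log(areaUnderIdentity(a, b, 100000), closedForm);
```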

Definition of the mathematical limit

This post assumes knowledge of mathematical logic and algebra.
We will stick to sequences for simplicity, but the same reasoning can be extended to functions. For sequences, there is only one direction to approach: infinity.

Informally, to take the limit of something as it approaches infinity is to determine the value it eventually settles toward (even though it may never reach it).

As an example, consider the sequence a_n = \frac {1} {n}. The first few elements are: 1, 0.5, 0.(3), 0.25, and so on.

Note that \frac {1} {\infty} is undefined, as infinity is not a real number. So here come limits to the rescue.

If we look at its graph, it might look something like this:

We can clearly see a trend: as n \to \infty, \frac {1} {n} tries to “touch” the horizontal axis, i.e. it gets arbitrarily close to 0. We write this as such: \lim_{n \to \infty} \frac {1} {n} = 0.

Formally, to say that \lim_{n \to \infty} a_n = a, we denote: (\forall \epsilon > 0) (\exists N \in \mathbb {N}) (\forall n \geq N) (|a_n - a| < \epsilon). Woot.

It looks scary but it’s actually quite simple. Epsilon is a tolerance around the limit a, N is an index from which the sequence starts to stay within that tolerance, and the absolute-value part says that every term from N onward lies within it.

So for every tolerance, we can find an index such that all elements after it stay within that tolerance.

Why does this definition work? Because no matter how small we make the tolerance, all elements after some N still fall within it, i.e. the values of the sequence get arbitrarily close to a.

As an example, let’s prove that \lim_{n \to \infty} \frac {1} {n} = 0. All we need to do is find a way to determine N in terms of \epsilon and we are done.

Suppose \epsilon > 0 is arbitrary. Let’s pick a natural number N > \frac {1} {\epsilon} (say, N = \lceil \frac {1} {\epsilon} \rceil + 1). Then for every n \geq N we have n > \frac {1} {\epsilon}, and therefore \frac {1} {n} < \epsilon.

Let’s see how it looks for \epsilon = \frac {1} {2}: for n > \frac {1} {\epsilon} = 2, we have a_n < \epsilon. This is obviously true since 1/4 < 1/3 < 1/2. So there’s our proof.
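The recipe “find N from epsilon” can be sketched in code (the helper name is mine; any natural number greater than 1/ε works):

```javascript
// Pick a natural N > 1/epsilon, so every term 1/n with n >= N is below epsilon.
function findN(epsilon) {
  return Math.ceil(1 / epsilon) + 1;
}

const eps = 0.5;
const N = findN(eps); // 3 for epsilon = 1/2
// Check that terms from index N onward stay within epsilon of the limit 0.
for (let n = N; n < N + 5; n++) {
  console.log(n, 1 / n, Math.abs(1 / n - 0) < eps);
}
```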

This bit, combined with the point-slope formula, leads to the derivative of a function, which will be covered in the next post.

Deriving point-slope from slope-intercept form

In this post we’ll derive the form of a linear equation between two points by simply knowing one thing:

Given y = mx + d, this line passes through the point A = (x, y).

The inspiration for this post is deriving the derivative.

So, let’s get started. We want to find an equation of a line that passes through A = (a, f(a)), B = (b, f(b)).

So we plug the points into the equation of a line to obtain the system:

\begin{cases} f(a) = ma + d \\ f(b) = mb + d \end{cases}

Solving for m and d – subtracting the first equation from the second gives f(b) - f(a) = m(b - a), so:

\begin{cases} m = \frac{f(b) - f(a)}{b - a} \\ d = f(a) - ma \end{cases}

Substituting back into y = mx + d gives y = mx + f(a) - ma, which we rearrange to conclude y - f(a) = m(x - a).
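The derivation can be double-checked with a tiny sketch (the names are mine) that solves for m and d and compares the two forms:

```javascript
// Line through (a, fa) and (b, fb): slope m and intercept d from the system.
function lineThrough(a, fa, b, fb) {
  const m = (fb - fa) / (b - a); // m = (f(b) - f(a)) / (b - a)
  const d = fa - m * a;          // d = f(a) - m a
  return { m, d };
}

const { m, d } = lineThrough(1, 2, 3, 6); // points (1, 2) and (3, 6)
console.log(m, d); // slope and intercept
// Slope-intercept and point-slope forms agree at any x, e.g. x = 5:
console.log(m * 5 + d, 2 + m * (5 - 1));
```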

In some of the next posts, we will derive the formula of a derivative of a function.

Researching automated theorem provers

Lately I’ve been messing around with automated theorem provers. Specifically Prover9.

The moment you open it in GUI mode, you will see two big text boxes – assumptions and goals. Naturally, when playing around, one starts with simple stuff.

We leave the assumptions blank and set P -> (Q -> P). (Prover9’s syntax for P \implies (Q \implies P)) as our goal. If we click Start, we see that it proves it immediately.

% -------- Comments from original proof --------
% Proof 1 at 0.00 (+ 0.00) seconds.
% Length of proof is 2.
% Level of proof is 1.
% Maximum clause weight is 0.
% Given clauses 0.

1 P -> (Q -> P) # label(non_clause) # label(goal). [goal].
2 $F. [deny(1)].

============================== end of proof ==========================

That was fun, but what stood out for me is that in mathematics we often take axioms for granted, i.e. we rarely think about the logic behind them. With Prover9, we need to express all of it using logic.

Prover9 has a general rule that variables start with lowercase u through z. Everything else is a constant.

To start, let’s define set membership. We say \in denotes set membership, and that’s it – it’s primitive, like an atom. So let’s say that member(x, y) denotes x \in y.

Now let’s define our sets A = {1}, B = {1, 2}, and C = {1}.

all x ((x = 1) <-> member(x, A)).
all x ((x = 1 | x = 2) <-> member(x, B)).
all x ((x = 1) <-> member(x, C)).

Now if we try to prove member(1, A), it will succeed. But it will not for member(2, A).

What do we know about the equality of two sets? They are equal if there doesn’t exist an element that is in the first and not in the second, or that is in the second and not in the first.

In other words:

set_equal(x, y) <-> - (exists z ((member(z, x) & - member(z, y)) | (- member(z, x) & member(z, y)))).

So, Prover9 can prove set_equal(A, C), but not set_equal(A, B).

Another example is that we can define the set of the naturals with just:

member(Zero, Nat).
member(x, Nat) -> member(Suc(x), Nat).

So, it can easily prove:
0: member(Zero, Nat)
1: member(Suc(Zero), Nat)
2: member(Suc(Suc(Zero)), Nat), etc.
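Putting the pieces together, a complete input file for the naturals example might look like the following sketch (standard Prover9 formulas-list syntax; not copied from a verified run):

```
formulas(assumptions).
  member(Zero, Nat).
  member(x, Nat) -> member(Suc(x), Nat).
end_of_list.

formulas(goals).
  member(Suc(Suc(Zero)), Nat).
end_of_list.
```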

Relation between strict and non-strict identities

First, we’ll start with some definitions:

== is a value check. For a and b to have the same value, we will say V(a, b). Thus a == b <=> V(a, b).

=== is a value-and-type check. For a and b to have the same type, we will say T(a, b). Thus a === b <=> V(a, b) and T(a, b).

Now, to prove a === b => a == b, suppose that a === b. By the definitions, we have as givens V(a, b) and T(a, b). So we can conclude that V(a, b), i.e. a == b.

The contrapositive form is a != b => a !== b, which also holds.

However, note that the converse form a == b => a === b doesn’t necessarily hold. To see why, suppose a == b, that is V(a, b). Now we need to prove that V(a, b) and T(a, b). We have V(a, b) as a given, but that’s not the case for T(a, b), i.e. the types may not match.

So, whenever you see a === b, you can safely assume that a == b is also true. Likewise, when you see a != b, you can safely assume that a !== b 🙂
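Since these identities come from JavaScript, here is a quick sketch of the relation in practice:

```javascript
// == checks value after coercion (V); === checks value and type (V and T).
const a = 1;
const b = "1"; // same value after coercion, different type

console.log(a == b);  // true:  V(a, b) holds
console.log(a === b); // false: T(a, b) fails

// Whenever === holds, == holds too (a === b => a == b):
const c = 1;
console.log(a === c, a == c); // both true
```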