Equation of the Day #18: Thoughts on a Fractal

In one of the first videos I watched on his channel, Grant Sanderson (3Blue1Brown) talks about fractals and fractal dimension. It’s a video that’s near and dear to my heart, as my final project for my first computer programming course in college was to compute the fractal dimension of the coastline of Crater Lake. As in his video, my program used the box-counting method to determine fractal dimension.[1] About halfway through the video, Grant shows three fractals and their dimensions, and he makes an interesting mistake.
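
For the curious, here is a rough Python sketch of the box-counting idea. This is not my original course project; the point set and box sizes below are placeholders.

    import numpy as np

    def box_counting_dimension(points, sizes):
        # Estimate the fractal dimension of a 2D point set by box counting.
        # points: an (N, 2) array of coordinates (e.g., a digitized coastline).
        # sizes:  box edge lengths to test (smaller boxes probe finer detail).
        counts = []
        for s in sizes:
            # Bin each point into a box of edge length s and count the occupied boxes.
            occupied = np.unique(np.floor(points / s), axis=0)
            counts.append(len(occupied))
        # The dimension is the slope of log(count) versus log(1/size).
        slope, _ = np.polyfit(np.log(1.0 / np.asarray(sizes)), np.log(counts), 1)
        return slope

    # Hypothetical usage: dim = box_counting_dimension(coastline_points, [1, 2, 4, 8, 16])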


Ah, but I’m getting ahead of myself – what is a fractal dimension? And for that matter, what is a fractal? If you don’t want to watch Grant’s video linked above (which you should, if you have the time), I’ll give a brief summary of the lesson Grant gives, but if you’ve seen his video then you may want to jump here.

A fractal is a shape that breaks some classical notions of geometry. Take, for example, the Sierpinski triangle, one of the simplest examples of a fractal.

Image source.

This shape is created iteratively. The instructions to create it are

  1. Start with a triangle.
  2. Make a triangle inside the original with its corners at the midpoints of each of its sides. Remove this center triangle, leaving three smaller copies of the one you started with.
  3. Repeat step 2 with the smaller triangles.

Technically, none of these triangles is the true fractal. The true fractal is the shape acquired when an infinite number of iterations has been performed. What happens to the area of this shape as we iterate? If we take the area of the initial triangle to be 1, then after one iteration the area is 3/4; after two we get an area of 9/16; after three it’s 27/64. After n iterations, the area is A_n = \left(\frac{3}{4}\right)^n. As we take the limit as n → ∞, the area approaches . . . zero. This gets even more squirrelly if we construct the same fractal with a single, continuous curve.

Image source.

We could do similar math to determine the length of the curve and see if it approaches a respectable limit. We’ll take the side length of the triangle the curve fills as length 1.[2] In the first iteration of the curve, we get a length of 3/2, 9/4 for the second, 27/8 for the third, 81/16 for the fourth, and so on. The pattern, then, for the length of the curve after n iterations is

L_n = \left(\dfrac{3}{2}\right)^n.

Taking the limit as n → ∞ we get a length of . . . infinity. We’re 0 for 2 in finding a way to characterize this curve outside of its iterative rules.

Now, not all fractals are pathological in the same way that the Sierpinski triangle is. Take the Koch snowflake, for example.

Image source.

This fractal bounds a finite area in the infinite iteration limit; however, its boundary becomes infinitely long and is nowhere smooth. No matter how far you zoom in, it will exhibit a jagged, bumpy appearance. This scenario is the one that the father of fractal geometry, Benoit Mandelbrot, wrote about in his paper “How Long Is the Coast of Britain? Statistical Self-Similarity and Fractional Dimension.” In it, he discusses the problem of the coastline paradox. The paradox is that if you attempt to measure the length of the coastline of any landmass, the length you end up with depends on the size of the ruler you measure with. Coastlines are like the Koch snowflake; they are rough and jagged even at tiny scales.[3] So, if you can’t measure the coastline’s length, what can you measure about it?

Enter the fractal dimension. I don’t want to spend too much time explaining fractal dimension here (Grant does an excellent job in his video), so I’ll summarize briefly. If you have a line and you double its length, its one-dimensional measure (length) doubles. Pretty straightforward. If you take a two-dimensional square and double its scale so that its side lengths are all doubled, you end up with an object whose two-dimensional measure – its area – is quadrupled. A box in three dimensions has its volume increase by a factor of eight when all its sides are scaled up by 2.


If we generalize to a d-dimensional measure M, the pattern of these transformations is

L\to 2^1 L,\quad A\to 2^2A,\quad V\to 2^3V \quad \Rightarrow\quad M \to 2^d M.

This means that if we constructed cubes or spheres in higher dimensions, even though we can’t visualize them, we could imagine how they scale and how their higher-dimensional measure gets affected by that scaling. In general, for an arbitrary scale factor s, the formulae would be

L\to s^1 L,\quad A\to s^2A,\quad V\to s^3V \quad \Rightarrow\quad M \to s^d M.

For each of these integer-dimensional objects, the dimension is the exponent to which the scale factor (two, in this case) is raised to give the factor by which the object’s measure grows when all side lengths are doubled.[4] The way we generalize this is to consider how the mass of such an object scales when we double all lengths associated with the object. Take the Sierpinski triangle.


The Sierpinski triangle is made up of three smaller copies of itself, each scaled down by a factor of two. Thus, we expect that a Sierpinski triangle with double the side lengths will have a mass three times larger than the original one. If we call the Sierpinski triangle’s measure M, then the pattern for the Sierpinski triangle is

M \to 3 M = 2^d M.

To determine the dimension d of the Sierpinski triangle, we need to find what power of the scaling factor (two) produces the number three. This is the problem that logarithms solve.

\begin{aligned}2^d &= 3\\ \log(2^d) &= \log(3)\\ d\log(2) &= \log(3)\\ d &= \frac{\log(3)}{\log(2)} = \log_2 3 \approx 1.585\\ \end{aligned}

By this math, we obtain a dimension of about 1.585. This is clearly not dimension one of a straight line, nor is it dimension two of a flat plane. It is in between, and this fits the definition of a fractal.[5]
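
If you want to play with this yourself, the calculation is a couple of lines of Python (the function name here is just my own, not anything standard):

    import math

    def similarity_dimension(copies, scale):
        # d solves scale**d == copies, i.e., d = log(copies) / log(scale).
        return math.log(copies) / math.log(scale)

    print(similarity_dimension(3, 2))  # Sierpinski triangle: ~1.585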

This brings us to the interesting error in Grant’s video. Here’s the image again.


The first two fractals are well analyzed and you can check their fractal dimensions here and here. Interestingly, Grant mislabels the first fractal – known as a pentaflake – with 1.668 as its dimension, while its real dimension is about 1.672. This is just a rounding error, as you get 1.668 when plugging in a scale factor of 1.62^2 = 2.624 instead of the closer 2.618. The third fractal, which Grant calls “DiamondFractal” in his code for the video, is one that I cannot find an analysis of anywhere. (If anyone is aware of such an analysis, let me know!) The number for the Diamond Fractal’s dimension always seemed pretty high to me, since it doesn’t seem much more fractured than the pentaflake, and thus their dimensions should be similar. So, I did what I tend to do with these mathematical problems – I took a deep dive.[6]

To analyze self-similar fractals, we need to know the number of copies that make up the whole and the scale factor between the smaller copies and the whole. The number of copies is straightforward with the Diamond Fractal – the whole is made up of four smaller copies of itself. The scale factor is a tougher nut to crack. For this one, it makes sense to look at the first few iterations of the fractal to get a feel for how it is constructed.


The instructions to create this fractal are

  1. Start with a diamond.
  2. Arrange four copies of the last iteration in a 2×2 grid.
  3. Rotate the whole structure 45°.
  4. Scale down to match the height of the last iteration.
  5. Repeat.

Using the visuals above we can determine what the scale factor is by comparing subsequent iterations. If we take the side of an individual square to be length 1, then the diagonal of each square is \sqrt{2} . This means that the width of the first iteration is \sqrt{2} units, while the width of the second iteration is three units. The scale factor, then, is 3/\sqrt{2} , right? Let’s see what the fractal dimension is with that scale factor.

\begin{aligned} \left(\frac{3}{\sqrt{2}}\right)^d &= 4\\ d &= \dfrac{\log(4)}{\log\left(\dfrac{3}{\sqrt{2}}\right)} \approx 1.843 \end{aligned}

Here’s where Grant got the value of 1.843! Mystery solved! And yet, this isn’t the end of the story. Let’s compare the third iteration to the second. The width of the third iteration is 5\sqrt{2} units, and we know the second iteration has a width of three units. Calculating the fractal dimension from these two iterations yields

\begin{aligned} \left(\frac{5\sqrt{2}}{3}\right)^d &= 4\\ d &= \dfrac{\log(4)}{\log\left(\dfrac{5\sqrt{2}}{3}\right)} \approx 1.617 \end{aligned}

It changed. We could have anticipated that, as 3/\sqrt{2} \ne 5\sqrt{2}/3 , but the fact that the fractal dimension changed so much is startling. In fact, it’s now a smaller value than the dimension of the pentaflake! When a self-similar shape has a changing fractal dimension at different iterations, the true fractal dimension is defined as the limit of the calculated dimension as the number of iterations taken approaches infinity. Since the scale factor is the only thing that changes in the calculation, let’s list them and see if there’s a pattern.

\dfrac{3}{\sqrt{2}}, \dfrac{5\sqrt{2}}{3}, \dfrac{16}{5\sqrt{2}}, \dfrac{26\sqrt{2}}{16}, \dots

Do you see the pattern? I didn’t either at first. It’s more obvious if we rationalize the denominator so that the square root of two can be pulled out front.

\dfrac{3}{2}\sqrt{2}, \dfrac{5}{3}\sqrt{2}, \dfrac{8}{5}\sqrt{2}, \dfrac{13}{8}\sqrt{2}, \dots

The numbers that make up the numerator and the denominator follow a specific pattern. They are, in fact, the Fibonacci numbers, and the pattern for the scale factor in each iteration is

\dfrac{a}{b} \sqrt{2} \to \dfrac{a+b}{a} \sqrt{2}.

Taking the limit of this sequence as we iterate infinitely results in an equality:

\dfrac{a}{b} = \dfrac{a+b}{a} = \phi = \dfrac{1+\sqrt{5}}{2}.

Thus, the scale factor approaches the product of the square root of two and the golden ratio. Let’s plug this in to the formula for fractal dimension.

d = \dfrac{\log 4}{\log(\phi\sqrt{2})} \approx 1.675.

This, then, is the true fractal dimension of the Diamond Fractal. Let’s pause and appreciate how beautiful this is. With nothing but squares and iteration, we’ve created a shape whose geometry popped out two prevalent constants of mathematics: the square root of two (by virtue of the squares’ diagonal) and, less expectedly, the golden ratio (by virtue of the recursive nature of the iteration). This is wonderful.
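
If you would rather see the convergence numerically than take my word for it, here is a small Python sketch that iterates the scale factor rule from above:

    import math

    # Scale factors follow (a/b)*sqrt(2) -> ((a+b)/a)*sqrt(2), starting from 3/sqrt(2) = (3/2)*sqrt(2).
    a, b = 3, 2
    for i in range(1, 21):
        scale = (a / b) * math.sqrt(2)
        d = math.log(4) / math.log(scale)  # four copies per iteration
        if i <= 4 or i == 20:
            print(f"iteration {i}: scale = {scale:.6f}, dimension = {d:.6f}")
        a, b = a + b, a

    # The limiting scale factor is phi * sqrt(2).
    phi = (1 + math.sqrt(5)) / 2
    print("limit:", math.log(4) / math.log(phi * math.sqrt(2)))  # ~1.675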

Not only that, we now know why Grant calculated the wrong dimension to start with – he didn’t pursue the pattern beyond the first iteration![7] I like this because it illustrates a point he makes later in the video with the helical shape whose dimension changes at different length scales. It’s just surprising that a similar hiccup appears with a 2D, self-similar fractal.

Now that we’re at the end of the analysis, I’ll leave the more ambitious of you with some homework. Can you prove that the scale factor is always the ratio of two sequential Fibonacci numbers times the square root of two? The proof is involved, but not impossible. A hint is that the next Fibonacci number depends on the two previous Fibonacci numbers. If you can prove that the number of squares along one of the axes of the Diamond Fractal has the same or similar dependence, you’re on your way to a proof.

Footnotes

1. I found that the Crater Lake coastline has a dimension d = 1.18, for those who are curious.

2. Note that this is different from assuming that the area is 1. Since this is a different scenario, though, we don’t need to stick to the same normalization for both triangles; it would only matter if we wanted to compare the two.

3. They aren’t rough at infinitesimal scales; at some point you hit atoms and molecules, but a cartographer won’t be measuring at such a fine scale in practice.

4. This also works for a point, which is zero-dimensional. Doubling all “lengths” of a point (which has no lengths) results in a scaling of the point’s measure by 1. In other words, scaling a point does nothing to it.

5. Technically, the definition of a fractal is a shape whose dimension exceeds its topological dimension.

6. This was actually a collaboration with my brother, Zach. He was the one who nailed the scale factor first, and thus deserves the credit.

7. I don’t blame Grant for this – he was just making a video to teach about fractals, and getting the true fractal dimension would have been a pain for his work schedule. It took my brother and me a couple days’ work puzzling it out to actually hit on the answer. The presentation here is very clean compared to the meandering and puzzling and sketching we did.

Equation of the Day #17: Confounding Compounding Interest

Eons ago in math class, I was taught the math behind compound interest. It’s an interesting problem with real world applications in financial investment; eventually, my class was shown an expression that looks like this,

\left(1 + \dfrac{1}{N}\right)^N,

and asked what this expression approaches when N → ∞. (I will set up the expression in a bit.) Naively, I expected that, since N becomes large, 1/N must be really small, so the inside of the parentheses approaches 1. Since 1 to any power is 1, it made sense to me that this expression approaches 1 as N becomes really large. But if you take your calculator and plug in values, you will find that this isn’t the case. In fact, for N = 1000, you get that

\left(1 + \dfrac{1}{1000}\right)^{1000} \approx 2.71692\dots.

Definitely not 1. In fact, this expression approaches a number which is widely recognized in mathematics today:

e = \displaystyle\lim_{N\to \infty} \left( 1 + \dfrac{1}{N} \right)^N \approx 2.71828\dots.
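
You can watch the convergence yourself with a few lines of Python:

    import math

    # The expression (1 + 1/N)^N creeps up toward e as N grows.
    for N in (1, 2, 4, 10, 100, 1000, 1_000_000):
        print(N, (1 + 1 / N) ** N)

    print("e =", math.e)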

Why is this? What does the strange expression at the top have to do with the base of the natural logarithm? My teenage self was utterly confused by this, and I wouldn’t be honest if I said this oddity hasn’t nagged at the back of my head since then. The reason for this confusion, I think, is that I’ve only ever looked at this problem as pure numbers – the expression I started with – but there is another way to see it. By the end of this article, I hope I can convince you that there is no way that the expression at the top could possibly be 1, and that it in fact must be the number e.

Let’s consider the problem of compound interest. Say you invest $1 in an account for which, after one year elapses, an incredibly generous bank pays you interest equal to 100% of the dollar you invested. Thus, after one year you end up with a total value of $2. If you had instead invested $5, you would end up with $10 after one year, and so on. If we call the principal investment P, then the total amount of money you have after one year is 2P. So far, so good. But in reality, interest rates are much lower. Say instead of a 100% interest rate, you have an interest rate of 5%. After one year has elapsed, your $1 will have accrued interest equal to 5% of that dollar, or 5 cents. So, after one year, your total amount of money from your investment is $1.05. Instead of turning $1 into $2, you’ve turned 20 nickels into 21 nickels. This return is not as good, so you want as high an interest rate as possible. To be more general, we’ll call the interest rate x. If you invest P dollars as principal, after one year you end up with your principal, P, plus the interest accrued, which is x times P. As a mathematical expression, this would be (1 + x)P. Perhaps you see a hint of things to come here.

Nothing I’ve said is particularly groundbreaking, so let’s spice things up a bit. Now let’s say the bank offers not just to apply interest to your account after one year but to apply interest halfway through the year as well. Your annual interest rate is still x, but this rate is divided up twice throughout the year. So, after half a year, you end up with your principal plus half the interest you would have had at the end of the year; in other words, you end up with your principal, P, and an interest of x/2 times P. A little algebraic manipulation gives

P + \dfrac{x}{2} P = \left(1 + \dfrac{x}{2}\right) P.

Then, at the end of the full year, you end up with the money you had halfway through the year, (1 + x/2)P, plus the interest accrued on this money value, (x/2)(1 + x/2)P. The result, after some algebraic manipulation, is

1\cdot\left(1 + \dfrac{x}{2}\right) P + \dfrac{x}{2} \left(1 + \dfrac{x}{2}\right) P = \left(1 + \dfrac{x}{2}\right) \left(1 + \dfrac{x}{2}\right) P = \left(1 + \dfrac{x}{2}\right)^2 P.

Let’s think of this in terms of our $1 principal at 100% interest. (We’ll tack the principal P back on at the end.) After half a year, you end up with $1.50, since half the interest rate is applied to the principal. After the full year, you end up with $2.25, or $1.50 plus half of $1.50 (75 cents). As a savvy investor, you realize that applying interest more often also nets you more cash, so you try to get the bank to compound its interest not just semiannually, but quarterly. The trend is much the same, and at the end of the year, with interest compounded four times, your $1 at 100% interest ends up being $2.44. For a general interest rate of x, your principal investment P becomes (1 + x/4)^4 P.

This is the origin of the first expression. If N is the number of times the interest is compounded, and the interest rate is 100% (x = 1), we get the expression I showed at the beginning. Plugging in numbers reveals to us that the return on investment indeed goes up, but it’s not clear from looking at the expression why it does so. So, let’s move away from the expression and instead view it by graphing the curve it describes.

First, let’s look at N = 1, or the relationship y = (1 + x/1)^1.

Remember, the x-axis represents the percent interest applied to the principal. This graph is a line with slope 1, shifted one unit from the origin. The shift is either up or to the left, depending on which direction you prefer. As we’ll see shortly, I want to consider it a shift to the left by one unit.

Let’s increment the compounding frequency from N = 1 to 2. How does our return after one year look?

The equation is y = (1 + x/2)^2. This, then, is a quadratic equation, so the shape above is a parabola. It curves upward, since the coefficient on x^2 is positive. This parabola is also shifted, since its vertex is not at the origin, and this time the shift is clearly to the left by two units. To recap, the N = 1 curve was a degree 1 polynomial shifted to the left by 1, and the N = 2 curve was a degree 2 polynomial shifted to the left by 2. Will this pattern hold for N = 3?

It seems so! This is a cubic function (degree 3), shifted to the left by 3. In fact, if we rearrange the original expression for compounded interest, we can glean this fact.

\begin{aligned} y &= \left( 1 + \dfrac{x}{N} \right)^N = \left( \dfrac{N}{N} + \dfrac{x}{N} \right)^N\\ &= \left(\dfrac{x + N}{N} \right)^N = \dfrac{1}{N^N}(x + N)^N\\ &= f(x + N)\\ \end{aligned}

The final expression is the one we want to chew on for a bit. What f(x + N) means is that we have an unshifted function, called f, whose input (x) has been increased by N units. This results in a shift of the output of f to the left by N units.

Why to the left? By increasing the input by a set amount, the function looks ahead in the input space. By looking ahead, a lower x value receives the output that would otherwise occur at a larger x, resulting in a leftward shift.

The purple function peaks before the gold function, since its inputs look ahead three units.
The value of the gold function at x = 0 occurs in the purple function at x = −3.

What is the unshifted function f(x) for this problem? It’s a polynomial of degree N with only one term (x^N) scaled down by the factor N^N,

f(x) = \dfrac{x^N}{N^N}.

Taking all of this together, we get this equation:

y = \left(1 + \dfrac{x}{N}\right)^N = \dfrac{(x+N)^N}{N^N} = f(x+N).

In words, the curve describing the money accrued after one year with a principal of $1 compounded N times at x percent interest is a polynomial of degree N shifted to the left by N units and scaled so that when x = 0, y = 1.[1]

The last statement is actually more obvious when the equation is in its original form, but you can check it easily enough by plugging in x = 0. The point (0, 1) is a fixed point when we vary N, and this is actually critical to understanding the limiting behavior of this function as N approaches infinity. As we compound the interest more times per year, the accrued value is described by an ever higher-degree polynomial function appropriately shifted and scaled to pass through (0, 1). In fact, forcing the curve through the point (0, 1) is the mathematical way of saying that an interest rate of 0% returns your principal ($1) with zero interest after one year, which is what should happen.

But let’s not get too far away from our goal: we want to know what happens when N gets really big. Let’s see what the function looks like when, say, N = 100.

That looks very similar to the elementary function exp(x). In fact, we can prove that in the limit N → ∞, the two functions are equal. What defines the function exp(x) is that

  1. The derivative of exp(x) is itself.
  2. exp(0) = 1.

Condition 2 is already met by our function y for all N. The condition left to prove, then, is that dy/dx = y in the limit N → ∞. Since y is just a polynomial, this is straightforward.

\begin{aligned} \dfrac{dy}{dx} &= \lim_{N\to\infty}\dfrac{d}{dx} \left(\dfrac{1}{N^N} (x+N)^N \right) \\[8pt] &= \lim_{N\to\infty}\dfrac{1}{N^N}\dfrac{d}{dx}(x+N)^N \\[8pt] &= \lim_{N\to\infty}\dfrac{N}{N^N} (x+N)^{N-1} \\[8pt] &= \lim_{N\to\infty}\dfrac{1}{N^{N-1}} (x+N)^{N-1} \\[8pt] &= \dfrac{1}{N^N} (x+N)^N \\[8pt] &= y \end{aligned}

The key step is noticing that N − 1 approaches N in the limit as N → ∞, hence the jump between the third- and second-to-last lines. So there we have it; if y describes money accrued after one year compounded N times at a rate of x percent, then y becomes exponential growth for continuously compounded interest (as N → ∞).
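
As a quick numerical sanity check (not a proof), here is the polynomial form next to exp(x) for a large N:

    import math

    # Compare y = (1 + x/N)^N with exp(x) at a few sample inputs.
    N = 1_000_000
    for x in (-1.0, 0.5, 1.0, 2.0):
        y = (1 + x / N) ** N
        print(f"x = {x:>4}: (1 + x/N)^N = {y:.6f}, exp(x) = {math.exp(x):.6f}")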

And now we can address the limit that kicked off this article. By framing this problem with a variable interest rate (x), we get an expression describing a polynomial curve instead of a single number for a single interest rate (100%). The limiting behavior of the value obtained at x = 1 simply falls out of the behavior of the curve rather than arising from some black magic number nonsense. We’re talking about a family of polynomials, each shifted left by a number of units equal to its degree and scaled to always pass through the point (0, 1); such a curve can only be identically 1 when its degree is zero, so taking the limit as the degree grows without bound means we can abandon the idea that y could equal 1 at any input other than x = 0.

Actually taking the limit at x = 1 yields exp(1) = e, no confusion necessary.

Additionally, this framing of the problem illustrates a known property of exponential growth, namely that exp(x) grows faster than any polynomial. The reason is clear: exp(x) is the limit of taking a specific polynomial to infinite degree. It’s not even something we have to prove; it just falls out of the limit. I think that’s wonderful.

(Made in Manim.)

This is also an excellent reminder that exp(x) is special, not the base e. In fact, putting e on a pedestal is what leads to the opening equation being taught in math class (and thus the introduction of confusion to the student) instead of the more intuitive expression involving a variable input x.

Finally, I want to address the fact that the expression for y only touches on how much money is accrued after a single year while varying the interest rate, x. If instead we wanted to frame this as the amount of money accrued after a time t with an interest rate r, we would just replace x with the product rt. If t is measured in years, r is the annual interest rate. With this shift in variables, N now represents the total number of times the interest was compounded over the entire time period t. In that case, we still end up with the same expressions, just with the horizontal axis scaled by the rate factor r, and we get the good old formula[2]

y = Pe^{rt}

for continuously compounded interest.

Footnotes

1. Shoutout to Kalid Azad at Better Explained for this kind of explanation.

2. Not to be confused with shampoo.

Equation of the Day #16: Spacetime

Special relativity is over a century old, and yet due to its odd nature it still confuses people. This is not necessarily our fault – we don’t really experience the effects of special relativity on human scales, since most of us can’t claim to have ever traveled or seen someone travel appreciably near the speed of light. Light speed is just incredibly fast compared to us, traveling one foot in about 1/1,000,000,000th of a second, or about a billion feet per second. Walking speed is about 5 feet per second, so it’s hard for us to even think of traveling anywhere close to light speed; indeed, most of us cannot even fathom what a billion feet looks like. For reference, the moon is, on average, 1.26 billion feet from Earth, so the Earth-Moon distance is a decent gauge of a billion feet. This means that light takes about 1.26 seconds to travel to the Moon from Earth.

Because special relativity is so inaccessible, the other mind-bending aspects of the model seem like they’re out of a fantasy setting. Discoveries made in the 19th century revealed that the speed of light in a vacuum is measured to be the same value by any observer moving at constant velocity. As long as a measurement device is traveling at a constant speed in a fixed direction, it will detect light traveling at 299,792,458 meters per second in a vacuum. Always. So if I’m driving down the highway at 60 mph (about 100 kilometers per hour), I don’t measure the light leaving my headlights as going 60 mph slower (about 299,792,430 meters per second); I still measure 299,792,458 meters per second! This is one of the two postulates of special relativity.

  • The laws of physics are identical for all non-accelerating observers/measurement devices.
  • The speed of light in a vacuum is the same for all observers/measurement devices, regardless of the motion of the light source.

This, understandably, perplexed scientists for years, as it throws our concepts of relative velocities out the window. After all, in everyday life, if you’re traveling on the highway at 70 mph, and the car next to you is traveling at 75 mph, you see the other car as moving 5 mph relative to your seat in your car.

Let’s see if we can reconcile these two seemingly contradictory ideas, starting from the assumption that the speed of light doesn’t differ in moving frames of reference, provided those reference frames are moving at constant velocity. If this assumption is false, an experiment will be able to disprove it. The thing that doesn’t change between reference frames is a speed, which by definition is a distance traveled over some interval of time. Perhaps what can change, then, are the distance traveled and the time interval over which that distance is traveled, but they do so in such a way that the overall speed, the speed of light, is invariant. This would be a way for our assumption to stay true. Can we figure out the degree to which the time and/or distance would be altered to preserve the invariant speed of light?

Let’s apply the idea of a light clock. A light clock is a set of two mirrors set a fixed distance apart between which light bounces back and forth. Each time the light hits one of the mirrors, a tick is registered on the clock. If we set the two mirrors to be exactly 299,792,458 meters apart, then each tick will be one second. If we set the two mirrors to be 29.9792458 cm apart (about 11.8 inches), the ticks would occur every nanosecond. This clock would be easier to construct, so let’s make a clock that ticks every nanosecond; therefore, after a billion ticks, the clock’s second reading would advance by one.

In the image above, the light clock will advance each time light reflects off of mirror A or mirror B. The length L is the distance between the mirrors, which we can take to be whatever length is needed for the desired ticking interval. At rest, the clock operates normally, which should come as no surprise. The time between ticks, then, is the distance traveled from one mirror to the next divided by the speed of light, c, or

\Delta t = L/c.

The reason for the symbol c for the speed of light is that it comes from the Latin word celer, which means fast/swift (like in the word accelerate). But what happens if the clock starts moving at a constant velocity? We’ll call the clock’s velocity v. In this moving state, the light’s motion looks different; it’s a diagonal line.

Now the light has traveled a distance that’s longer than the distance between the two mirrors! Since the speed of light is invariant, that means the clock ticked at a different rate than when it was at rest! Fortunately, we can use trigonometry to find this distance, since the distance D, the mirror separation L, and the travel distance (the bottom line segment) all form a right triangle. The travel distance is simply the speed of the mirrors, v, multiplied by the elapsed time between ticks. We’ll call this time between ticks Δt′ to distinguish it from the rest time (since we anticipate something different). The time between ticks for this clock will be

\Delta t' = D/c = \sqrt{L^2+v^2 (\Delta t')^2}/c.

You may have noticed that this equation is a bit self-referential. If we rearrange it so that all of the Δt′ terms are isolated, we end up with the relationship

\Delta t' = \dfrac{L/c}{\sqrt{1-v^2/c^2}} = \gamma \Delta t,

where I substituted in the rest frame ticking interval, and defined the Lorentz factor,

\gamma = \dfrac{1}{\sqrt{1-v^2/c^2}}.

The Lorentz factor is the quantity that tells you you’re working with special relativity. For small values of v/c, γ is approximately 1. That means for speeds that are really small compared to light speed, moving clocks don’t show a measurable discrepancy between their ticking rate and the ticking rate of a clock at rest. However, when v is a significant fraction of c, the Lorentz factor γ increases, becoming larger than 1 and approaching infinity as v approaches c.

What does this mean for our moving clock? It means that the time between ticks is longer for the moving clock, making the moving clock appear to run slow. This is the phenomenon of time dilation; moving clocks tick slower. Note that this has nothing to do with the clock being faulty in some way; this is just a consequence of the speed of light being the same for all observers.
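
Here is a tiny Python sketch of the numbers, using a hypothetical clock that ticks once per nanosecond in its own rest frame:

    import math

    C = 299_792_458.0  # speed of light in vacuum, m/s

    def lorentz_gamma(v):
        # gamma = 1 / sqrt(1 - v^2/c^2)
        return 1.0 / math.sqrt(1.0 - (v / C) ** 2)

    dt_rest = 1e-9  # seconds between ticks in the clock's rest frame
    for beta in (1e-6, 0.1, 0.5, 0.9, 0.99):
        gamma = lorentz_gamma(beta * C)
        print(f"v/c = {beta}: gamma = {gamma:.6f}, moving tick interval = {gamma * dt_rest:.4e} s")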

Now, because the time interval between events (in this case, ticks of a clock) changes due to the invariance of the speed of light, there must be some sort of trade-off involving distance, since

\text{speed} = \dfrac{\text{distance traveled}}{\text{travel time}}.

If moving clocks run slower, that means the travel time is shorter, so for the speed of light to remain unchanged for moving observers, the distance traveled by a moving object must also change. Think about a light clock oriented on its side, so that the travel direction is along the long side of the clock.

The speed of light is unchanged, so if the clock ticks at a slower rate, the light must traverse a shorter distance, and the factor by which the distance is shortened should be the same factor by which the travel time is dilated:

L' = \dfrac{L}{\gamma}.

This phenomenon is called length contraction. Note that the Lorentz factor is in the denominator, which ensures our distance has gotten smaller. If we wanted to, we could play similar games with the light clock to derive this result, though we would need to account for two ticks of the clock, since the sideways light clock has two time intervals associated with its ticks: a longer “forward tick” and a shorter “backward tick.” Nonetheless, the result is the same.

There are two special quantities that arise from time dilation and length contraction known as the proper time and proper length. The proper time is the time measured by a clock in its own rest frame, and the proper time interval is the ticking rate of a clock in its own rest frame. Everyone agrees on the proper time, because everyone should agree what time is measured by an object in its own rest frame. Similarly, proper length is the length of an object in its own rest frame, which all observers must also agree on. These are two more quantities that can be considered invariant, or unchanging with reference frame, in addition to the speed of light. In fact, quite a bit of special relativity consists of finding invariant quantities since they are so useful in connecting reference frames.

This give and take between space and time is what led to the idea that time and space are linked together such that they are one entity: spacetime. The thing that connects space and time together is the invariant speed of light. In a future entry, I’ll talk about visualizing spacetime and how time dilation and length contraction warp how different observers/detectors view events.

Equation of the Day #15: Rays and Triangles

The way that lenses and mirrors play with light is well described by a branch of physics known as geometric optics or ray optics, and while the equations governing electromagnetic waves (i.e. light) involve an understanding of vector calculus, the principles of ray optics are completely accessible with a knowledge of triangles and no calculus. There are a few basic ideas to cover first, though.

The first two ideas to pin down are the Law of Reflection and Snell’s Law of Refraction. The Law of Reflection states that a ray of light (or a wave) will reflect off of an interface at an angle equal to the angle of approach. Graphically, the law of reflection looks like this:

with θ_i = θ_r. The dashed line is known as the “normal line,” where normal means that it is perpendicular to the interface the ray is reflecting off of; it also intersects the point of contact of the ray and the interface. In reality, all reflections by light on all surfaces obey the law of reflection; however, many surfaces are rough at microscopic scales, so when light reflects off of those surfaces you don’t see your reflection. (For instance, white paper reflects most of the light that hits it, but you can’t see your reflection because the paper’s rough surface reflects light randomly.)

Snell’s Law of Refraction says that when light transfers from one medium to another (e.g. air to water), its path is bent according to the rule

n_1 \sin\theta_1 = n_2\sin\theta_2.

This deflection is caused by the fact that light takes the path of least time. So, in a uniform medium (such as air near Earth’s surface), light travels in straight lines. If the speed of light changes, however, this straight line path can be bent. The best analogy I’ve seen to explain this phenomenon is the following: consider a lawnmower moving from a sidewalk to tall grass.

When the first wheel hits the grass, it starts to move more slowly. This causes the lawnmower to start to turn until the next wheel enters the grass, at which point the deflection is complete. Since light is a wave, a similar effect occurs. The reverse is also true (the lawnmower on the right). On the macroscopic level, this appears as light rays deflecting.


Snell’s Law and Law of Reflection in one image. Source: Wikimedia Commons.
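
Here is a short Python sketch of Snell’s Law in action, using typical values for air (n ≈ 1.0003) and glass (n ≈ 1.5):

    import math

    def refraction_angle(theta_incident_deg, n1, n2):
        # Solve n1*sin(theta1) = n2*sin(theta2) for theta2, in degrees.
        s = n1 * math.sin(math.radians(theta_incident_deg)) / n2
        if abs(s) > 1.0:
            return None  # no transmitted ray (total internal reflection)
        return math.degrees(math.asin(s))

    # A ray passing from air into glass bends toward the normal.
    print(refraction_angle(30.0, 1.0003, 1.5))  # ~19.5 degrees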

These two laws form the basis of how curved mirrors and lenses work. Going forward, I think it’s a bit easier to work with focusing lenses, and then generalize the findings to mirrors.

Let’s consider parallel light rays entering a lens whose surfaces are spherically shaped. What do the outgoing rays look like according to Snell’s Law? Well, one of the special properties of spheres (and circles) is that all lines originating from the center of the sphere intersect the surface at right angles. So, the normal lines emanate from the center of the sphere. Let’s consider a convex lens, or a lens that bulges outward.

As the drawing above shows, the normal lines appear to converge on a single point within the lens. Since air has an index of refraction of nearly 1 (about 1.000293) and glass has an index of refraction around 1.5, the red light rays will bend toward the dashed normal lines upon entering the glass. If the other side of the lens is also convex, the light rays will converge even more, since leaving the glass will cause the light rays to deflect away from the normal lines.

We therefore call convex lenses converging lenses, since light rays entering the lens will be focused. Without getting too detailed about how Snell’s Law works at all points of the lens, we can make some sweeping statements about how curved lenses focus light if we make just a couple assumptions.

  • Light rays enter the lens nearly face-on, so angles of incidence are small.
  • The lens is spherical and thin.

These two assumptions lead to the three rules of ray tracing.

  1. Rays entering the lens parallel to the axis of symmetry of the lens converge on the focal point of the lens.
  2. Rays that enter the lens from the focal point emerge parallel to the lens’ axis of symmetry. (This is the reverse process of rule 1.)
  3. Rays that go through the direct center of the lens do not get deflected.

In fact, we will see that the first two rules lead to the third. Graphically, these ray tracings look like this:

Rays 1, 2, and 3 are represented by the yellow, magenta, and cyan rays in the image, respectively. The horizontal dashed line is called the optical axis and is the axis of rotational symmetry of the lens. The vertical dashed line represents the lens plane. The black arrow is the object being imaged by the lens, and the gray arrow on the right is the image of the black arrow focused by the lens. Finally, the quantity denoted as f is the focal length, which is the distance between the lens plane and the focal point. The question we want to answer is, “How do we predict where the image forms?” Looking at the above image, we can compare four triangles formed by the rays.

This is the previous image with more stuff going on, so let’s go through what’s new. The height of the object is h, and the height of the image is h′ (read “h prime”). The distance of the object from the lens plane is l, and the distance of the image from the lens plane is l′. In short, primed distances correspond to the image, while unprimed quantities correspond to the object. There are four triangles that are color coded to help in the following discussion. Since the two yellow triangles share two angles (one right angle and a pair of vertical angles), they are similar triangles. Therefore, the ratios of the short leg to the long leg for each triangle are equal, which can be rearranged as

-\dfrac{h'}{h} = \dfrac{f}{l-f}.

The negative sign is due to the fact that I’m considering h′, as drawn in the figure, to be a negative number, since it is below the axis of symmetry. By a similar argument, the purple triangles are also similar, and the ratio equality rearranges to be

-\dfrac{h'}{h} = \dfrac{l'-f}{f}.

Substituting the first expression into the second and doing a little more rearranging, we arrive at what’s known as the thin lens equation,

\dfrac{1}{f} = \dfrac{1}{l} + \dfrac{1}{l'}.

Simply put, if I know the focal length of the lens and the placement of my object, I can predict where the image will form using the reciprocal lengths shown above. This happens to work for all lenses, converging and diverging, and even for spherical mirrors, though the image is focused by reflection rather than transmission.
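
To make the thin lens equation concrete, here is a minimal Python sketch; the focal length and object distances are made-up numbers:

    def image_distance(f, l):
        # Solve 1/f = 1/l + 1/l' for the image distance l'.
        return 1.0 / (1.0 / f - 1.0 / l)

    def magnification(l, l_image):
        # Lateral magnification h'/h; a negative value means the image is inverted.
        return -l_image / l

    f = 0.10                         # focal length in meters (hypothetical)
    l = 0.30                         # object placed outside the focal length
    li = image_distance(f, l)
    print(li, magnification(l, li))  # 0.15 m, -0.5: a real, inverted, half-size image

    print(image_distance(f, 0.05))   # object inside f: -0.1 m, a virtual image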

Now, this derivation only used rays 1 and 2 from the ray tracing rules above. To show that ray 3 is valid, I’m going to look at the tangent of the two large triangles in the image which contain the angles θ and θ′. Looking at tan θ,

\tan\theta = \dfrac{h}{l} = -\dfrac{h'f}{l(l'-f)} = -\dfrac{h'f}{fl'} = -\dfrac{h'}{l'} = \tan\theta'.

Note that I used the thin lens equation and the geometric formulas above to substitute various quantities for each other, but the end result that we find is that θ and θ′ are the same! This means that the line connecting those two angles, i.e. the central ray, is undeflected through the lens!

But what happens if the object is placed within the focal length of the lens? Well, that means that 1/l is larger than 1/f, so 1/l′ (and therefore the image distance l′) is negative! The physical interpretation of a negative image distance is that the image forms on the same side of the lens that the object is sitting on. When this happens, you can’t touch the image or project it onto a wall; it lives on the other side of the lens from your eye, similarly to how your reflection in a flat mirror lives on the other side of the mirror. You can’t touch your reflection, since you would have to go through the mirror to do so; similarly, an image with a negative image distance can’t be touched since there’s a lens in the way. We call these virtual images, since you cannot touch them or project them onto a wall or screen. A virtual image always has to be viewed through the lens. An example of a virtual image is the image you see when you use a magnifying glass to zoom in on an object.

All these triangles came from working with converging, or focusing, lenses, but what if I have a diverging, or defocusing, lens? It turns out we get several triangles like before, but with one caveat: the focal length is treated as negative. This should make some intuitive sense: a positive focal length positively focuses light, and a negative focal length negatively focuses, or defocuses, light.

Instead of parallel light rays converging on the focal point, they leave the lens appearing to originate from the focal point. When all is said and done, the thin lens equation emerges, exactly as before, so long as you keep track of negative quantities. What’s interesting about diverging lenses is that the image they form is always virtual, no matter where you place the object. Mathematically, this is due to l always being positive and f always being negative. So, you can never project an image with a diverging lens alone.

An everyday application of this equation is in prescription eyeglasses. If you happen to be nearsighted, your eyeglass prescription is for diverging lenses; nearsighted eyes have too powerful a lens (too short a focal length), so the lenses used to correct them have to weaken the focusing power of the eye. Conversely, farsightedness is treated with converging lenses; such eyes need a boost in focusing power.

Finally, I want to leave you with a curious coincidence. If you have a curved mirror that is carved from a large sphere, you can construct ray diagrams similarly to how we constructed the lens ray diagrams. The only difference is that where the rays end up is determined by the Law of Reflection instead of Snell’s Law. If you assume that rays hit the mirror nearly face-on, the rules for ray tracing for curved mirrors are

  1. Rays incident on the mirror parallel to its axis of symmetry converge on the focal point of the mirror, which lies at half the mirror’s radius of curvature.
  2. Rays incident on the mirror from the focal point emerge parallel to the axis of symmetry. (This is the reverse process of rule 1.)
  3. Rays that go through the center of curvature of the mirror reflect back the way they entered.

Visually, these rays look like this:

If you play the similar triangles game that we played with lenses, you end up getting the mirror equation,

\dfrac{1}{f} = \dfrac{1}{l} + \dfrac{1}{l'},

which is identical to the thin lens equation! This allows telescopes, for instance, to use mirrors instead of lenses as the focusing elements to form images. To mathematically predict where diverging mirrors form images, simply use negative focal lengths, just like with lenses.

I hope I’ve laid out a nice foundation here, from which one could build the tools to understand cameras, film, telescopes, microscopes, and the many other instruments we have used to explore the vast reaches and smallest microcosms of the universe, as well as the tools the visual entertainment industry uses to record its productions. May it help you appreciate that small little lens on the back of your phone.

-A

Equation of the Day #14: The Harmonic Oscillator

“The career of a young theoretical physicist consists of treating the harmonic oscillator in ever-increasing levels of abstraction.”
– Sidney Coleman

One of the first physical systems students get to play with is the harmonic oscillator. An oscillator is a system with some property that varies repeatedly around a central value. This variation can be in displacement (such as a mass on a spring), voltage (like the electricity that comes out of the wall), or field strength (like the oscillations that make up light). When this variation has a fixed period, we call the motion harmonic oscillation, and the simplest harmonic oscillation is known as — you may have guessed — simple harmonic oscillation.

A simple harmonic oscillator is a system under the influence of a force proportional to the displacement of the system from equilibrium. Think of a mass on a spring; the further I pull the mass from the spring’s natural length, the more the spring pulls back, and if I push the mass so that the spring compresses, the spring will push against me. Mathematically, this is known as Hooke’s Law and can be written as

F=-k\Delta x

where F is the net force applied to the system, Δx is the displacement of the system from equilibrium, and k is some proportionality constant (often called the “spring constant”) which tells us how strongly the spring pushes or pulls given some displacement – a larger k indicates a stronger spring. If I let go of the mass when it is displaced from equilibrium, the mass will undergo oscillatory motion. What makes this “simple” is that we’re ignoring the effects of damping or driving the oscillation with an outside force.

How do we know that such a restoring force causes oscillatory motion? Utilizing Newton’s second law,

\begin{aligned} F &= ma\\ &= -kx\\ \Rightarrow \dfrac{d^2x}{dt^2} &= -\dfrac{k}{m} x \end{aligned}

The solution to this equation is sinusoidal,

x(t) = A\cos(\omega t +\phi),

where A is the amplitude of oscillation (the farthest the mass gets from equilibrium),

\omega \equiv \sqrt{\dfrac{k}{m}}

is the angular frequency of oscillation, and ϕ is the phase, which captures the initial position and velocity of the mass at time t = 0. The period is related to the angular frequency by

T=\dfrac{2\pi}{\omega}.

For this reason, harmonic oscillators are useful timekeepers, since they oscillate at regular, predictable intervals. This is why pendulums, coiled springs, and currents going through quartz crystals have been used as clocks. What other physical systems does this situation apply to? Well, if you like music, simple harmonic oscillation is what air undergoes when you play a wind, string, or membrane instrument. What you’re doing when you play an instrument (or sing) is forcing air, string(s), or electric charge (for electronic instruments) out of equilibrium. This causes the air, string(s), and voltage/current to oscillate, which creates a tone. Patch a bunch of these tones together in the form of chords, melodies, and harmonies, and you’ve created music. A simpler situation is blowing over a soda/pop bottle. When you blow air over the mouth of the bottle, you create an equilibrium pressure for the air above the mouth of the bottle. Air that is slightly off of this equilibrium will oscillate in and out of the bottle, producing a pure tone.


Image: Wikipedia
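
As a quick numerical example of the formulas above (the mass and spring constant are arbitrary):

    import math

    def angular_frequency(k, m):
        # omega = sqrt(k/m)
        return math.sqrt(k / m)

    k = 200.0   # spring constant, N/m (hypothetical)
    m = 0.5     # mass, kg (hypothetical)
    omega = angular_frequency(k, m)
    T = 2 * math.pi / omega         # period of one full oscillation
    print(omega, T, 1 / T)          # ~20 rad/s, ~0.31 s, ~3.2 Hz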

Now for the fun part: what happens when we enter the quantum realm? Quantum mechanics says that the energy of a bound system is quantized, and an ideal harmonic oscillator is always bound. The total energy of a harmonic oscillator is given by

\begin{aligned} E &= \dfrac{1}{2} mv^2 + \dfrac{1}{2} kx^2\\  &= \dfrac{1}{2m} \left(p^2 + (m\omega x)^2\right), \end{aligned}

where the first term is the kinetic energy, or energy of motion, and the second term is the potential energy, or energy due to location. I used the facts that p = mv and k = mω² to go from the first line to the second line. The quantum prescription says that p and x become mathematical operators, and the energy takes a role in the Schrödinger equation. For the harmonic oscillator, solving the Schrödinger equation yields the differential equation

\begin{aligned} \dfrac{\hbar}{m\omega}\dfrac{d^2\psi}{dx^2} + \left(\dfrac{2E}{\hbar\omega} - \dfrac{m\omega}{\hbar}\, x^2 \right) \psi(x) = 0 \end{aligned}

where ħ is the (reduced) Planck constant, and ψ is the quantum mechanical wave function. After solving this differential equation, the allowed energies turn out to be

E_n = \hbar\omega \left(n+\dfrac{1}{2}\right)

where n = 0, 1, 2, . . . is a nonnegative integer. Unlike the classical picture, the quantum states of the harmonic oscillator with definite energy are stationary and spread out over space, with higher energy states spread out more than lower energy states. There is a way, though, to produce an oscillating state of the quantum harmonic oscillator by preparing a superposition of pure energy states, forming what’s known as a coherent state, which actually does behave like the classical mass on a spring. It’s a weird instance of classical behavior in the quantum realm!


Classical simple harmonic oscillators compared to quantum wave functions of the simple harmonic oscillator.
Image: Wikipedia
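
Here is a small Python sketch of that energy ladder; the oscillator frequency is a made-up value:

    import math

    HBAR = 1.054571817e-34  # reduced Planck constant, J*s

    def oscillator_energy(n, omega):
        # E_n = hbar * omega * (n + 1/2)
        return HBAR * omega * (n + 0.5)

    omega = 2 * math.pi * 1e14  # angular frequency, rad/s (hypothetical)
    for n in range(4):
        print(n, oscillator_energy(n, omega), "J")
    # Successive levels are separated by the same amount, hbar * omega.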

An example of a quantum harmonic oscillator is a molecule formed by a pair of atoms. The bond between the two atoms gives rise to a roughly harmonic potential, which results in molecular vibrational states, like two quantum balls attached by a spring. Depending on the masses of the atoms and the strength of the bond, the molecule will vibrate at a specific frequency, and this frequency tells physicists and chemists about the stiffness of molecular bonds and the masses of the atoms they join. In fact, the quantum mechanical harmonic oscillator is a major topic of interest because the potential energy between quantum objects can often be approximated as a Hooke’s Law potential near equilibrium, even if the actual forces at play are more complex at larger separations.

Additionally, the energy structure of the harmonic oscillator predicts that energies are equally spaced by the amount ħω. This is a remarkable feature of the quantum harmonic oscillator, and it allows us to make a toy model for quantum object creation and annihilation. If we take the energy unit ħω as equal to the rest energy of a quantum object by Einstein’s E = mc², we can think of the quantum number n as being the number of quantum objects in a system. This idea is one of the basic results of quantum field theory, which treats quantum objects as excitations of quantum fields that stretch over all of space and time. This is what the opening quote is referring to; physicists start off learning the simple harmonic oscillator as classical masses on springs or pendulums oscillating at small angles, then they upgrade to the quantum treatment and learn about its regular energy structure, and then upgrade to the quantum field treatment where the energies are treated as a number of quantum objects arising from an omnipresent quantum field. I find it to be one of the most beautiful aspects of Nature that such a simple system recurs at multiple levels of our physical understanding of reality.

-A

Equation of the Day #13: Chaos

Chaos is complexity that arises from simplicity. Put in a clearer way, it’s when a deterministic process leads to complex results that seem unpredictable. The difference between chaos and randomness is that chaos is determined by a set of rules/equations, while randomness is not deterministic. Everyday applications of chaos include weather, the stock market, and cryptography. Chaos is why everyone (including identical twins who have the same DNA) has different fingerprints. And it’s beautiful.

How does simplicity lead to complexity? Let’s take, for instance, the physical situation of a pendulum. The equation that describes the motion of a pendulum is

\dfrac{d^2\theta}{dt^2} = -\dfrac{g}{l} \sin\theta

where θ is the angle the pendulum makes with the imaginary line perpendicular to the ground, l is the length of the pendulum, and g is the acceleration due to gravity. This leads to an oscillatory motion; for small angles, the solution of this equation can be approximated as

\theta(t) \approx A\cos\left( \sqrt{\dfrac{g}{l}} t\right)

where A is the amplitude of the swing (in radians). Very predictable. But what happens when we make a double pendulum, where we attach a pendulum to the bottom of the first pendulum?


Can you predict whether the bottom pendulum will flip over the top? (Credit: Wikimedia Commons)

It’s very hard to predict when the outer pendulum flips over the inner pendulum mass; however, the process is entirely determined by a set of equations governed by the laws of physics. And, depending on the initial angles of the two pendula, the motion will look completely different. This is how complexity derives from simplicity.

Another example of beautiful chaos is fractals. Fractals are structures that exhibit self-similarity, are determined by a simple set of rules, and have infinite complexity. An example of a fractal is the Sierpinski triangle.


(Image: Wikipedia)

The rule is simple: start with a triangle, then divide that triangle into four equal triangles. Remove the middle one. Repeat with the new solid triangles you produced. The true fractal is the limit when the number of iterations reaches infinity. Self-similarity happens as you zoom into any corner of the triangle; each corner is a smaller version of the whole (since the iterations continue infinitely). Fractals crop up everywhere, from the shapes of coastlines to plants to frost crystal formation. Basically, they’re everywhere, and they’re often very cool and beautiful.

Chaos is also used in practical applications, such as encryption. Since chaos is hard to predict unless you know the exact initial conditions of the chaotic process, a chaotic encryption scheme can be shared with everyone. One example of a chaotic map used to disguise data is the cat map. Each iteration is a simple matrix transformation of the pixels of an image. It’s completely deterministic, but it jumbles the image to make it look like garbage. In practice, this map is periodic, so as long as you apply the map repeatedly, you will eventually get the original image back. Another application of chaos is pseudorandom number generators (PRNGs), where a hard-to-predict initial value is manipulated chaotically to generate a “random” number. If you can manipulate the initial input values, you can predict the outcome of the PRNG. In the case of the Pokémon games, the PRNGs have been examined so thoroughly that, using a couple of programs, you can reliably capture or breed Pokémon with shiny appearances or perfect stats.
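
Here is a minimal Python sketch of the cat map acting on a square image stored as an array; the exact map convention varies from source to source, and this is one common choice:

    import numpy as np

    def arnold_cat_map(image):
        # One iteration of the cat map on an n-by-n image: each pixel is moved
        # by a simple matrix transformation, taken modulo n.
        n = image.shape[0]
        result = np.empty_like(image)
        for x in range(n):
            for y in range(n):
                result[(2 * x + y) % n, (x + y) % n] = image[x, y]
        return result

    # The map is a permutation of the pixels, so repeating it eventually
    # cycles back to the original image.
    img = np.arange(25).reshape(5, 5)
    print(arnold_cat_map(img))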

So that’s the beauty of chaos. Next time you look at a bare tree toward the end of autumn or lightning in a thunderstorm, just remember that the seemingly unpredictable branches and forks are created by simple rules of nature, and bask in its complex beauty.

-A

Equation of the Day #12: 12

A while ago I stumbled across this image (which I recreated and cleaned up a bit). It’s a beautiful image. Arranged around the edge is the circle of fifths, which in music is a geometric representation of the twelve tones of the Western scale arranged so the next note is seven semitones up (going clockwise in this figure). The notes are all connected in six different ways to the other notes in the “circle,” known as intervals, which are color-coded at the bottom. I thought, “Wow, this is a really cool way to represent this geometrically. How neat!” However, I found the original website that the image came from, and it’s a pseudoscience site that talks about the fractal holographic nature of the universe. While fractals do show up in Nature a lot, and there are legitimate theories[1] proposing that the Universe may indeed be a hologram, what their site is proposing is, to put it lightly, utter nonsense. But instead of tearing their website apart (which would be rather cathartic), I instead want to point out the cool math going on here, because that sounds more fun!

Looking at the bottom of the graphic, you’ll notice six figures. The first (in red) is a regular dodecagon, a polygon with twelve equal sides and angles. This shape is what forms the circle of fifths. The rest of the shapes in the sequence are dodecagrams, or twelve-pointed stars. The first three are stars made up of simpler regular polygons; the orange star is made up of two hexagons, the yellow is made up of three squares, and the green one is made up of four triangles. The final dodecagram (in purple) can be thought of as made up of six straight-sided digons, or line segments. These shapes point to the fact that twelve is divisible by five unique factors (not including itself): one set of twelve, two sets of six, three sets of four, four sets of three, and six sets of two! You could say that the vertices of the dodecagon finalize the set as twelve sets of one, but they’re not illustrated in this image. So really, this image has less to do with musical intervals and more to do with the number 12, which is a rather special number. It is a superior highly composite number, which makes it a good choice as a number base (a reason why feet are divided into twelve inches, for instance, or why our clocks have twelve hours on their faces).

The remaining dodecagram, drawn in cyan, is not made up of any simpler regular polygons because 12 and 5 share no common factors. If you pick a note in the circle of fifths to start on, you’ll notice that the two cyan lines that emanate from it connect to notes that are five places away on the “circle,” hence the connection to the number 5. In fact, it would be far more appropriate to redraw this figure with a clock face.

This new image should shed some more light on what’s really going on. The dodecagrams each indicate a different map from one number to another, modulo 12. The only reason this is connected to music at all is due to the fact that a Western scale has twelve tones in it! If we used a different scale, such as a pentatonic scale (with five tones, as the name would suggest), we’d get a pentagon enclosing a pentagram. Really, this diagram can be used to connect any two elements in a set of twelve. The total number of connecting lines in this diagram, then, is

\dbinom{12}{2} = T_{11} = \dfrac{1}{2} (12)(11) = 66

where the notation in parentheses is “12 choose 2,” the number of ways to pick a pair out of twelve items, and T_n is the nth triangular number. This figure is known in math as K_{12}, the complete graph with twelve nodes. And it’s gorgeous.
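If you want to draw K_{12} for yourself, a short sketch with matplotlib does the trick (the coloring-by-interval is my own styling choice):

```python
import itertools
import numpy as np
import matplotlib.pyplot as plt

# Twelve evenly spaced points on a circle, like the circle of fifths
angles = np.pi / 2 - np.arange(12) * 2 * np.pi / 12   # start at the top, go clockwise
points = np.column_stack((np.cos(angles), np.sin(angles)))

pairs = list(itertools.combinations(range(12), 2))
print(len(pairs))                                      # 66, i.e. "12 choose 2"

for i, j in pairs:
    # color each chord by its interval: how many steps apart the two nodes are
    interval = min((j - i) % 12, (i - j) % 12)
    plt.plot(points[[i, j], 0], points[[i, j], 1],
             color=plt.cm.viridis(interval / 6), linewidth=0.8)

plt.gca().set_aspect("equal")
plt.axis("off")
plt.show()
```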

So while this doesn’t really have anything to do with music or some pseudoscientific argument for some fancy-sounding, but ultimately meaningless, view on the universe, it does exemplify the beauty of the number 12, and has a cool application to the circle of fifths.

-A

[1] “Holographic principle.” Wikipedia.

Equation of the Day #11: The Fourier Transform

Image: a Gaussian and its Fourier transform.

Today I wanted to talk about one of my favorite equations in all of mathematics. However, I won’t do it justice without building up some framework that puts it into perspective. To start out, let’s talk about waves.

A wave, in general, is any function that obeys the wave equation. To simplify things, though, let’s look at repeating wave patterns.

The image above depicts a sine wave. This is the shape a vibrating string or air column takes when oscillating at a single, pure frequency; as such, sinusoidal waveforms are also known as “pure tones.” If you want to hear what a pure tone sounds like, YouTube is happy to oblige. But sine waves are not the only shapes a vibrating string can take. For instance, I could make a repeating pattern of triangles (a triangle wave),

or rectangles (a square wave),

Now, making a string take on these shapes may seem rather difficult, but synthesizing these shapes to be played on speakers is not. In fact, old computers and video game systems had synthesizers that could produce these waveforms, among others. But let’s say you only know how to produce pure tones. How would you go about making a square wave? It seems ridiculous; pure tones are curvy sine waves, and square waves are choppy with sharp corners. And yet a square wave does produce a tone when synthesized, and that tone has a pitch that corresponds to how tightly its pattern repeats — its frequency — just like sine waves.

As it turns out, you can produce a complex waveform by adding only pure tones. This was discovered by Jean-Baptiste Joseph Fourier, a French mathematician and physicist of the late 18th and early 19th centuries. What he discovered was that sine waves form a complete basis of functions, or a set of functions that can be used to construct other well-behaved, arbitrary functions. These sine waves are special, though: their frequencies must be harmonics of the lowest-frequency sine wave.


Image: Wikipedia

The image above shows a harmonic series of a string with two ends fixed (like those of a guitar or violin). Each frequency is an integer multiple of the lowest frequency (that of the top string, which I will call \nu_1 = 1/T, where \nu is the Greek letter “nu”), which means that the wavelength of each harmonic is an integer fraction of the longest wavelength. The lowest-frequency sine wave, or the fundamental, is given by the frequency of the arbitrary wave that’s being synthesized, and all other sine waves that contribute to the model will have harmonic frequencies of the fundamental. So, the tone of a trumpet playing the note A4 (440 Hz frequency) will be composed of pure tones whose lowest frequency is 440 Hz, with all other pure tones being integer multiples of 440 Hz (880, 1320, 1760, 2200, etc.). As an example, here’s a cool animation showing the pure tones that make up a square wave:


Animation: LucasVB on Wikipedia

As you can see in the animation, these sine waves do not all contribute equally; typically, instrument tones have louder low-frequency contributions than high-frequency ones, so the amplitude of each sine wave will be different. How do we determine the strengths of these individual frequencies? This is what Fourier was trying to determine, albeit for a slightly different problem. I mentioned earlier that sine waves form a complete basis of functions to describe any arbitrary function (in this case, periodic waveforms). These harmonics also have a handy property called orthogonality: when you integrate the product of two sine waves within a harmonic series over the period corresponding to the fundamental frequency (T = 1/\nu_1), the integral will be zero unless the two sine waves are the same. More specifically,

\displaystyle \int_{-T/2}^{T/2} \sin \left(\dfrac{m\tau t}{T}\right) \sin \left(\dfrac{n\tau t}{T}\right) dt = \begin{cases} 0, & m\ne n\\ T/2, & m = n \end{cases},

where \tau = C/r is the circle constant. Because of this trick, we can extract the amplitudes of each sine wave contributing to an arbitrary waveform. Calling the arbitrary waveform f(t) and the fundamental frequency 1/T,

\begin{aligned} f(t) &= \displaystyle\sum_{m=1}^{\infty} b_m \sin\left(\dfrac{m\tau t}{T}\right) \\[8pt] \displaystyle \int_{-T/2}^{T/2} f(t)\sin\left(\dfrac{n\tau t}{T}\right)\, dt &= \displaystyle\int_{-T/2}^{T/2}\sum_{m=1}^{\infty} b_m \sin\left(\dfrac{m\tau t}{T}\right) \sin\left(\dfrac{n\tau t}{T}\right)\, dt\\[8pt] &= \displaystyle\sum_{m=1}^{\infty} b_m \int_{-T/2}^{T/2}\sin\left(\dfrac{m\tau t}{T}\right) \sin\left(\dfrac{n\tau t}{T}\right)\, dt \\[8pt] &= \dfrac{T}{2} \, b_n \\[8pt] b_n &= \dfrac{2}{T} \displaystyle \int_{-T/2}^{T/2} f(t)\sin\left(\dfrac{n\tau t}{T}\right)\, dt. \end{aligned}

This is how we extract the amplitudes of each pure tone that makes up the tone we want to synthesize. The trick was subtle, so I’ll describe what happened there line by line. The first line shows that we’re breaking up the arbitrary periodic waveform f(t) into pure tones, a sum over sine waves with frequencies m/T, with m running over the natural numbers. The second line multiplies both sides of line one by a sine wave with frequency n/T, with n being a particular natural number, and integrating over one period of the fundamental frequency, T. It’s important to be clear that we’re only summing over m and not n; m is an index that takes on multiple values, but n is one specific value! The third line is just swapping the order of taking the sum vs. taking the integral, which is allowed since integration is a linear operator. The fourth line is where the magic happens; because we’ve integrated the product of two sine waves, we get a whole bunch of integrals on the right hand side of the equation that are zero, since m and n are different for all terms in the sum except when m = n. This integration trick has effectively selected out one term in the sum, in doing so giving us the formula to calculate the amplitude of a given harmonic in the pure tone sum resulting in f(t).
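To make the trick concrete, here’s a quick numerical sketch (a square wave stands in for f(t), and the integrals are done as plain Riemann sums):

```python
import numpy as np

tau = 2 * np.pi                          # circle constant
T = 1.0                                  # period of the waveform
t = np.linspace(-T / 2, T / 2, 100_000, endpoint=False)
dt = t[1] - t[0]
f = np.sign(np.sin(tau * t / T))         # the waveform to analyze: a square wave

# b_n = (2/T) * integral of f(t) sin(n tau t / T) dt over one period
b = [2 / T * np.sum(f * np.sin(n * tau * t / T)) * dt for n in range(1, 10)]
print(np.round(b, 4))                    # odd harmonics come out near 4/(n*pi); even ones near 0

# "Play back the recipe": rebuild the wave from its first nine pure tones
partial = sum(bn * np.sin(n * tau * t / T) for n, bn in enumerate(b, start=1))
i = np.argmin(np.abs(t - T / 4))         # sample the middle of the top plateau
print(round(partial[i], 3))              # ~1.06: already close to 1, with ripples near the jumps
```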

This formula that I’ve shown here is how synthesizers reproduce instrument sounds without having to record the instrument first. If you know all the amplitudes b_n for a given instrument, you can store that information on the synthesizer and produce pure tones that, when combined, sound like that instrument. To be completely general, though, this sum of pure tones, also known as a Fourier series, includes cosine waves as well. This allows the function to be displaced by any arbitrary amount, or, to put it another way, accounts for phase shifts in the waveform. In general,

\begin{aligned} f(t) &= \dfrac{a_0}{2} + \displaystyle\sum_{n=1}^{\infty} a_n \cos\left(\dfrac{n\tau t}{T}\right) + b_n \sin\left(\dfrac{n\tau t}{T}\right),\\[8pt] a_n &= \dfrac{2}{T} \displaystyle \int_{-T/2}^{T/2} f(t)\cos\left(\dfrac{n\tau t}{T}\right)\, dt,\\[8pt] b_n &= \dfrac{2}{T} \displaystyle \int_{-T/2}^{T/2} f(t)\sin\left(\dfrac{n\tau t}{T}\right)\, dt, \end{aligned}

or, using Euler’s formula,

\begin{aligned} f(t) &= \displaystyle\sum_{n=-\infty}^{\infty} c_n e^{in\tau t/T},\\[8pt] c_n &= \dfrac{1}{T} \displaystyle \int_{-T/2}^{T/2} f(t)e^{-in\tau t/T}\, dt. \end{aligned}

The collection of these coefficients is known as the waveform’s frequency spectrum. To show this in practice, here’s a waveform I recorded of me playing an A (440 Hz) on my trumpet and its Fourier series amplitudes,

Each bar in the c_n graph is a harmonic of 440 Hz, and the amplitudes are on the same scale for the waveform and its frequency spectrum. For a trumpet, all harmonics are present (even if they’re really weak). I admittedly did clean up the Fourier spectrum to get rid of noise around the main peaks to simplify the image a little bit, but know that for real waveforms the Fourier spectrum does have “leakage” outside of the harmonics (though the contribution is much smaller than the main peaks). The first peak is the fundamental, or 440 Hz, followed by an 880 Hz peak, then a 1320 Hz peak, a 1760 Hz peak, and so on. The majority of the spectrum is concentrated in these four harmonics, with the higher harmonics barely contributing. I also made images of the Fourier series of a square wave and a triangle wave for the curious. Note the difference in these spectra from each other and from the trumpet series. The square wave and triangle wave only possess odd harmonics, which is why their spectra look more sparse.

One of the best analogies I’ve seen for the Fourier series is that it is a recipe, and the “meal” that it helps you cook up is the waveform you want to produce. The ingredients are pure tones — sine waves — and the instructions are to do the integrals shown above. More importantly, the Fourier coefficients give us a means to extract the recipe from the meal, something that, in the realm of food, is rather difficult to do, but in signal processing is quite elegant. This is one of the coolest mathematical operations I’ve ever learned about, and I keep revisiting it over and over again because it’s so enticing!

Now, this is all awesome math that has wide applications to many areas of physics and engineering, but it has all been a setup for what I really wanted to showcase. Suppose I have a function that isn’t periodic. I want to produce that function, but I still can only produce pure tones. How do we achieve that goal?

Let’s say we’re trying to produce a square pulse.


One thing we could do is start with a square wave, but make the valleys larger to space out the peaks.

As we do this, the peaks become more isolated, but we still have a repeating waveform, so our Fourier series trick still works. Effectively, we’re lengthening the period T of the waveform without stretching it. Lengthening T causes the fundamental frequency \nu_1 to approach 0, which packs more and more harmonics into any given frequency range. We don’t want \nu_1 to be exactly zero, though, because then n\nu_1 will always be zero, and our Fourier series will no longer work. What we want is to take the limit as T approaches infinity and look at what happens to our Fourier series equations. To make things a bit less complicated, let’s look at what happens to the c_n treatment. Let’s reassign some values,

n/T \to \nu_n,\quad 1/T \to \Delta\nu,\\[8pt] \begin{aligned} \Rightarrow f(t) &= \displaystyle\sum_{n=-\infty}^{\infty} c_n e^{i\tau\nu_n t},\\[8pt] c_n &= \Delta \nu \displaystyle \int_{-T/2}^{T/2} f(t)e^{-i\tau\nu_n t}\, dt. \end{aligned}

Here, \nu_n are the harmonic frequencies in our Fourier series, and \Delta \nu is the spacing between harmonics, which is equal for the whole series. Substituting the integral definition of c_n into the sum for f(t) yields

\begin{aligned} f(t) &= \displaystyle\sum_{n=-\infty}^{\infty} e^{i\tau\nu_n t}\Delta \nu \displaystyle \int_{-T/2}^{T/2} f(t')e^{-i\tau\nu_n t'}\, dt'\\[8pt] f(t) &= \displaystyle\sum_{n=-\infty}^{\infty} F(\nu_n) e^{i\tau\nu_n t}\Delta \nu,\\[8pt] \end{aligned}

where

F(\nu_n) = \dfrac{c_n}{\Delta\nu} = \displaystyle\int_{-T/2}^{T/2} f(t')e^{-i\tau\nu_n t'}\, dt'.

The reason for the t' variable is to distinguish the dummy integration variable from the time variable in f(t). Now all that’s left to do is take the limit of the two expressions as T goes to infinity. In this limit, the \nu_n smear into a continuum of frequencies rather than a discrete set of harmonics, the sum over frequencies becomes an integral, and \Delta\nu becomes an infinitesimal, d\nu . Putting this together, we arrive at the equations

\begin{aligned} F(\nu) &= \displaystyle\int_{-\infty}^{\infty} f(t)e^{-i\tau\nu t}\, dt,\\[8pt] f(t) &= \displaystyle\int_{-\infty}^{\infty} F(\nu)e^{i\tau\nu t}\, d\nu. \end{aligned}

These equations are the Fourier transform and its inverse. The first takes a waveform in the time domain and breaks it down into a continuum of frequencies, and the second returns us to the time domain from the frequency spectrum. Giving the square pulse a width equal to a, a height of unity, and plugging it into the Fourier transform, we find that

\begin{aligned} F(\nu) &= \displaystyle\int_{-\infty}^{\infty} f(t)e^{-i\tau\nu t}\, dt\\[8pt] &= \displaystyle\int_{-a/2}^{a/2} e^{-i\tau\nu t}\, dt\\[8pt] &= \left . -\dfrac{e^{-i\tau\nu t}}{i\tau\nu}\right|_{t=-a/2}^{a/2} = \dfrac{e^{ia\tau\nu/2}-e^{-ia\tau\nu/2}}{i\tau\nu}\\[8pt] F(\nu) &= \dfrac{\sin(a\tau\nu/2)}{\tau\nu/2} = a\, {\rm sinc}(a\tau\nu/2). \end{aligned}

Or, graphically,

This is one of the first Fourier transform pairs that students encounter, since the integral is both doable and relatively straightforward (if you’re comfortable with complex functions). This pair is quite important in signal processing: if you swap the domains of the two functions, so that the square pulse lives in the frequency domain, it describes an ideal low-pass filter, and the sinc function is the corresponding response in the time domain. Thus, to build such a filter, you want an electrical component whose output reflects the sinc function on the right. (I swapped them here for the purposes of doing the easier transform first, but the process is perfectly reversible.)
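As a quick sanity check on that result, here’s a sketch that does the integral numerically at a few arbitrarily chosen frequencies (note that NumPy’s sinc is the normalized version, sin(πx)/(πx), which is why its argument looks different from the equation above):

```python
import numpy as np

tau = 2 * np.pi
a = 1.0                                            # width of the square pulse
t = np.linspace(-6, 6, 240_001)
dt = t[1] - t[0]
f = np.where(np.abs(t) <= a / 2, 1.0, 0.0)         # unit-height square pulse

for nu in (0.3, 0.7, 1.5):
    numeric = np.sum(f * np.exp(-1j * tau * nu * t)) * dt
    analytic = a * np.sinc(a * nu)                 # same as a*sinc(a*tau*nu/2) with the unnormalized sinc
    print(nu, round(numeric.real, 3), round(analytic, 3))
```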

Let’s look at the triangular pulse and its Fourier transform,

If you think the frequency domain looks similar to that of the square pulse, you’re on the right track! The frequency spectrum of the triangular pulse is actually the sinc function squared, but the integral is not so straightforward to do.
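It is easy to check numerically, though. Here’s a sketch assuming a triangle of base width a and unit height (the scale factors depend on that convention, which is my own choice):

```python
import numpy as np

tau = 2 * np.pi
a = 1.0                                                # base width of the triangular pulse
t = np.linspace(-6, 6, 240_001)
dt = t[1] - t[0]
f = np.clip(1 - np.abs(t) / (a / 2), 0, None)          # unit-height triangle on [-a/2, a/2]

for nu in (0.5, 1.3, 2.4):
    numeric = np.sum(f * np.exp(-1j * tau * nu * t)) * dt
    analytic = (a / 2) * np.sinc(a * nu / 2) ** 2      # a sinc squared, as claimed
    print(nu, round(numeric.real, 4), round(analytic, 4))
```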

And now, for probably the most enlightening example, the Gaussian bell-shaped curve,

The Fourier transform of a Gaussian function is itself, albeit with a different width and height. In fact, the Gaussian function is part of a family of functions which have themselves as their Fourier transform. But that’s not the coolest thing here. What is shown above is that a broad Gaussian function has a narrow range of frequencies composing it. The inverse is also true; a narrow Gaussian peak is made up of a broad range of frequencies. This has applications to laser operation, the limit of Internet download speeds, and even instrument tuning, and is also true of the other Fourier transform pairs I’ve shown here. More importantly, though, this relationship is connected to a much deeper aspect of physics. That a localized signal has a broad frequency makeup and vice versa is at the heart of the Uncertainty Principle, which I’ve discussed previously. As I mentioned before, the Uncertainty Principle is, at its core, a consequence of wave physics, so it should be no surprise that it shows up here as well. However, this made the Uncertainty Principle visceral for me; it’s built into the Fourier transform relations! It also turns out that, in the same way that time and frequency are domains related by the Fourier transform, so too are position and momentum:

\begin{aligned} \phi(p) &= \dfrac{1}{\sqrt{\tau\hbar}}\displaystyle\int_{-\infty}^{\infty} \psi(x)e^{-ipx/\hbar}\, dx,\\[8pt] \psi(x) &= \dfrac{1}{\sqrt{\tau\hbar}}\displaystyle\int_{-\infty}^{\infty} \phi(p)e^{ipx/\hbar}\, dp. \end{aligned}

Here, \psi(x) is the spatial wavefunction, and \phi(p) is the momentum-domain wavefunction.
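To see the broad-versus-narrow tradeoff in actual numbers, here’s a sketch that numerically transforms a narrow and a broad Gaussian and compares the results with the analytic transform, \sigma\sqrt{\tau}\, e^{-(\tau\nu\sigma)^2/2} (the widths and sample frequencies are arbitrary choices):

```python
import numpy as np

tau = 2 * np.pi
t = np.linspace(-40, 40, 320_001)
dt = t[1] - t[0]
nus = np.array([0.0, 0.1, 0.2, 0.4])                  # sample frequencies

for sigma in (0.5, 2.0):                              # a narrow and a broad Gaussian
    f = np.exp(-t**2 / (2 * sigma**2))
    numeric = np.array([np.sum(f * np.exp(-1j * tau * nu * t)).real * dt for nu in nus])
    analytic = sigma * np.sqrt(tau) * np.exp(-(tau * nus * sigma) ** 2 / 2)
    print(sigma, np.round(numeric, 4), np.round(analytic, 4))

# The broad (sigma = 2.0) Gaussian's spectrum dies off in nu four times faster than the
# narrow (sigma = 0.5) one: localized in time <-> spread out in frequency, and vice versa.
```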

Whew! That was a long one, but I hope I’ve done justice to one of the coolest — and my personal favorite — equations in mathematics.

UPDATE: In the above figures where I graph c_n vs. frequency, the label should really be |c_n|, since in general the c_n are complex numbers.

-A

Equation of the Day #10: Golden Pentagrams

Ah, the pentagram, a shape associated with a variety of different ideas, some holy, some less savory. But to me, it’s a golden figure, and not just because of how I chose to render it here. The pentagram has a connection with a number known as the golden ratio, which is defined as

\begin{aligned} \phi &= \dfrac{a}{b} = \dfrac{a+b}{a} \text{ for } a>b\\[8pt] &= \dfrac{1+\sqrt{5}}{2} \approx 1.618\ldots \end{aligned}

This number is tied to the Fibonacci sequence and the Lucas numbers and seems to crop up a lot in nature (although how much it crops up is disputed). It turns out that the various line segments present in the pentagram are in golden ratio with one another.

In the image above, the ratio of red:green = green:blue = blue:black is the golden ratio. The reason for this is not immediately obvious and requires a bit of digging, but the proof is fairly straightforward and boils down to a simple statement.

First, let’s consider the pentagon at the center of the pentagram. What is the angle at each corner of a pentagon? There’s a clever way to deduce this. It’s not quite clear what the interior angle is (that is, the angle on the inside of the shape at an individual corner), but it’s quite easy to get the exterior angle.

The exterior angle of the pentagon (which is the angle of the base of the triangles that form the points of the pentagram) is equal to 1/5 of a complete revolution around the circle, or 72°. For the moment, let’s call this angle 2θ. To get the angle that forms the points of the pentagram, we need to invoke the fact that the sum of all angles in a triangle must equal 180°. Thus, the angle at the top is 180° – 72° – 72° = 36°. This angle I will call θ. While I’m at it, I’m going to label the sides of the triangle x and s (the blue and black line segments from earlier, respectively).

We’re nearly there! We just have one more angle to determine, and that’s the first angle I mentioned – the interior angle of the pentagon. Well, we know that the interior angle added to the exterior angle must be 180°, since the angles both lie on a straight line, so the interior angle is 180° − 72° = 108° = 3θ. Combining the pentagon and the triangle, we obtain the following picture.

Now you can probably tell why I labeled the angles the way I did; they are all multiples of 36°. What we want to show is that the ratio x/s is the golden ratio. Looking at the obtuse triangle inside the pentagon, the ratio x/s is equal to the long side of the triangle divided by one of the short sides. This smaller triangle (with sides s, s, and x) is similar to the larger obtuse triangle extending to the point at the upper right (with sides x, x, and x + s), which we can tell by seeing that the three angles in each triangle are equal. That means the ratio of sides in the larger obtuse triangle will equal the ratio of sides in the smaller obtuse triangle. Setting these two ratios equal gives us

\dfrac{x}{s} = \dfrac{x+s}{x}.

But this is just the same equation we used at the top to define the golden ratio with the replacements of a with x and b with s. Therefore, x/s is indeed the golden ratio! Huzzah!

The reason the pentagram and pentagon are so closely tied to the golden ratio has to do with the fact that the angles they contain are multiples of the same angle, 36°, or one-tenth of a full rotation of the circle. Additionally, since pentagons appear in the regular dodecahedron (d12) and regular icosahedron (d20), the golden ratio abounds in them as well.

As a fun bonus fact, the two isosceles triangles are known as the golden triangle (all acute angles) and the golden gnomon (obtuse triangle), and are the two unique isosceles triangles whose sides are in golden ratio with one another.

So the next time you see the star on a Christmas tree, the rank of a military officer, or the geocentric orbit of Venus, think of the number that lurks within those five-pointed shapes.

-A

Edit (16-Feb-2021): I’ve updated the logic at the end to be purely geometric. The original algebraic argument is pasted below.

By invoking the law of sines on the two isosceles triangles in the image above, we can show that

\dfrac{x}{s} = \dfrac{\sin 2\theta}{\sin\theta} = \dfrac{\sin 3\theta}{\sin\theta}

This equation simplifies to sin 2θ = sin 3θ. Using the identities sin 2θ = 2 sin θ cos θ and sin 3θ = 3 sin θ − 4 sin³ θ, and dividing through by sin θ, we get a quadratic equation that we can solve for cos θ.

4\cos^2\theta - 2\cos\theta - 1 =0

Solving this equation with the quadratic formula (and keeping the positive root, since cos 36° is positive) yields

2\cos\theta = \dfrac{\sin 2\theta}{\sin\theta} = \phi,

which, when taken together with the equation for x/s, shows that x/s is indeed the golden ratio! Huzzah!
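And a quick numerical check of that result, just plugging in θ = 36°:

```python
import math

theta = math.radians(36)                       # the pentagram's point angle
phi = (1 + math.sqrt(5)) / 2                   # the golden ratio

print(2 * math.cos(theta))                     # 1.618033988...
print(math.sin(2 * theta) / math.sin(theta))   # the ratio x/s from the law of sines: same value
print(4 * math.cos(theta)**2 - 2 * math.cos(theta) - 1)   # ~0, so cos 36° solves the quadratic
print(phi)                                     # 1.618033988...
```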

Equation of the Day #9: The Uncertainty Principle

The Uncertainty Principle is one of the trickiest concepts for people learning quantum physics to wrap their heads around. In words, the Uncertainty Principle says “you cannot simultaneously measure the position and the momentum of a particle to arbitrary precision.” In equation form, it looks like this:

\Delta x \Delta p \ge \dfrac{\hbar}{2},

where \Delta x is the uncertainty of a measurement of a particle’s position, \Delta p is the uncertainty associated with its measured momentum, and ħ is the reduced Planck constant. What this equation says is that the product of these two uncertainties has to be greater than some constant. This has nothing to do with the tools with which we measure particles; this is a fundamental statement about the way our universe behaves. Fortunately, this uncertainty product is very small, since

\hbar \approx 0.000000000000000000000000000000000105457 \text{ J s}.

The real question to ask is, “Why do particles have this uncertainty associated with them in the first place? Where does it come from?” Interestingly, it comes from wave theory.

Take the two waves above. The one on top is very localized, meaning its position is well-defined. But what is its wavelength? For photons and other quantum objects, wavelength (λ) determines momentum,

p = \dfrac{h}{\lambda},

so here we see that a localized wave doesn’t really have a well-defined wavelength, and thus has an ill-defined momentum. In fact, this pulse is smeared over a continuous spectrum of wavelengths, and therefore momenta (much like how the “color” of white light is smeared over the colors of the rainbow). The second wave has a pretty well-defined wavelength, but where is it? It’s not really localized, so you could say it lies smeared over a set of points, but it isn’t really in one place. This is the heart of the uncertainty principle.

So why does this apply to particles? After all, particles aren’t waves. At the quantum level, however, objects no longer fit neatly into either category: particles have wavelike properties, and waves have particle-like properties. In fact, from here on I will refer to “particles” as “quantum objects” to rid ourselves of this confusing nomenclature. So, because waves exhibit this phenomenon – and quantum objects have wavelike properties – quantum objects also have an uncertainty principle associated with them.
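To put a number on this, here’s a sketch that builds a Gaussian wave packet (an arbitrary choice of shape and width on my part, in units where ħ = 1), Fourier-transforms it to get its momentum-space wavefunction, and multiplies the two spreads together:

```python
import numpy as np

hbar = 1.0                                      # work in units where hbar = 1
sigma = 0.7                                     # width of the packet (an arbitrary choice)

x = np.linspace(-15, 15, 6001)
dx = x[1] - x[0]
psi = np.exp(-x**2 / (4 * sigma**2))            # a Gaussian wave packet
psi /= np.sqrt(np.sum(np.abs(psi)**2) * dx)     # normalize it

# Momentum-space wavefunction: Fourier-transform psi. Overall constants don't matter
# here because we renormalize afterward.
p = np.linspace(-15, 15, 6001)
dp = p[1] - p[0]
phi = np.array([np.sum(psi * np.exp(-1j * pk * x / hbar)) * dx for pk in p])
phi /= np.sqrt(np.sum(np.abs(phi)**2) * dp)

delta_x = np.sqrt(np.sum(x**2 * np.abs(psi)**2) * dx)   # spread in position (<x> = 0 by symmetry)
delta_p = np.sqrt(np.sum(p**2 * np.abs(phi)**2) * dp)   # spread in momentum (<p> = 0 by symmetry)
print(delta_x, delta_p, delta_x * delta_p)              # ~0.70, ~0.71, ~0.50 = hbar/2
```

A Gaussian packet is the special case that saturates the bound; any other waveform gives a strictly larger product.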

However, this is arguably not the most bizarre thing about the uncertainty principle. There is another facet of the uncertainty principle that says that the shorter the lifetime of a quantum object (how long the object exists before it decays), the less you can know about its energy. Since mass and energy are equivalent via Einstein’s E = mc², this means that objects that exist for very short times don’t have a well-defined mass. It also means that, if you pulse a laser over a short enough time, the light that comes out will not have a well-defined energy, which means it will have a spread of colors (our eyes can’t see this spread, of course, but it matters a great deal when you want to use very precise wavelengths of light and very short pulses in the same experiment). In my graduate research, we used this so-called “energy-time” uncertainty to determine whether certain configurations of the hydrogen molecule, H₂, are long-lived or short-lived; the longer-lived states exhibit sharper spectral lines, indicating a more well-defined energy, while the short-lived states exhibit broader spectral lines, indicating a less well-defined energy.

So while we can’t simultaneously measure the position and momentum of an object to arbitrary certainty, we can definitely still use the uncertainty principle to glean information about the world of the very, very small.

-A