Equation of the Day #16: Spacetime

Special relativity is over a century old, and yet due to its odd nature it still confuses people. This is not necessarily our fault – we don’t really experience the effects of special relativity on human scales, since most of us can’t claim to have ever traveled or seen someone travel appreciably near the speed of light. Light speed is just incredibly fast compared to us, traveling one foot in about 1/1,000,000,000th of a second, or about a billion feet per second. Walking speed is about 5 feet per second, so it’s hard for us to even think of traveling anywhere close to light speed; indeed, most of us cannot even fathom what a billion feet looks like. For reference, the moon is, on average, 1.26 billion feet from Earth, so the Earth-Moon distance is a decent gauge of a billion feet. This means that light takes about 1.26 seconds to travel to the Moon from Earth.

Because special relativity is so inaccessible, the other mind-bending aspects of the model seem like they’re out of a fantasy setting. Discoveries made in the 19th century revealed that the speed of light in a vacuum is measured to be the same value by any observer moving at constant velocity. As long as a measurement device is traveling at a constant speed in a fixed direction, it will detect light traveling at 299,792,458 meters per second in a vacuum. Always. So if I’m driving down the highway at 60 mph (about 100 kilometers per hour), I don’t measure the light leaving my headlights as going 60 mph slower (about 299,792,430 meters per second); I still measure 299,792,458 meters per second! This is one of the two postulates of special relativity.

  • The laws of physics are identical for all non-accelerating observers/measurement devices.
  • The speed of light in a vacuum is the same for all observers/measurement devices, regardless of the motion of the light source.

This, understandably, perplexed scientists for years, as it throws our concepts of relative velocities out the window. After all, in everyday life, if you’re traveling on the highway at 70 mph, and the car next to you is traveling at 75 mph, you see the other car as moving 5 mph relative to your seat in your car.

Let’s see if we can reconcile these two seemingly contradictory ideas and start from the assumption that the speed of light doesn’t differ in moving frames of reference, provided those reference frames are moving at constant velocity. If this assumption is false, an experiment will be able to disprove it. The thing that doesn’t change between reference frames is a speed, which by definition is a distance traveled over some interval of time. Perhaps what can change, then, are the distance traveled and the time interval over which that distance is traveled, but they do so in such a way that the overall speed, the speed of light, is invariant. This would be a way for our assumption to stay true. Can we figure out the degree to which the time and/or distance would be altered to preserve the invariant speed of light?

Let’s apply the idea of a light clock. A light clock is a set of two mirrors set a fixed distance apart between which light bounces back and forth. Each time the light hits one of the mirrors, a tick is registered on the clock. If we set the two mirrors to be exactly 299,792,458 meters apart, then each tick will be one second. If we set the two mirrors to be 29.9792458 cm apart (about 11.8 inches), the ticks would occur every nanosecond. This clock would be easier to construct, so let’s make a clock that ticks every nanosecond; therefore, after a billion ticks, the clock’s second reading would advance by one.

In the image above, the light clock will advance each time light reflects off of mirror A or mirror B. The length L is the distance between the mirrors, which we can take to be whatever length is needed for the desired ticking interval. At rest, the clock operates normally, which should come as no surprise. The time between ticks, then, is the distance traveled from one mirror to the next divided by the speed of the light, c, or

\Delta t = L/c.

The symbol c for the speed of light comes from the Latin word celeritas, meaning swiftness or speed (the same root as in the word accelerate). But what happens if the clock starts moving at a constant velocity? We’ll call the clock’s velocity v. In this moving state, the light’s motion looks different; it’s a diagonal line.

Now the light has traveled a distance that’s longer than the distance between the two mirrors! Since the speed of light is invariant, that means the clock ticked at a different rate than when it was at rest! Fortunately, we can use trigonometry to find this distance, since the distance D, the mirror length L, and the travel distance (the bottom line segment) all form a right triangle. The travel distance is simply the speed of the mirrors, v, multiplied by the elapsed time between ticks. We’ll call this time between ticks Δt′ to distinguish it from the rest time (since we anticipate something different). The time between ticks for this clock will be

\Delta t' = D/c = \sqrt{L^2+v^2 (\Delta t')^2}/c.

You may have noticed that this equation is a bit self-referential: Δt′ appears on both sides. If we square both sides, collect the Δt′ terms on one side, and take the square root, we end up with the relationship

\Delta t' = \dfrac{L/c}{\sqrt{1-v^2/c^2}} = \gamma \Delta t,

where I substituted in the rest frame ticking interval, and defined the Lorentz factor,

\gamma = \dfrac{1}{\sqrt{1-v^2/c^2}}.

The Lorentz factor is the quantity that tells you you’re working with special relativity. For small values of v/c, γ is approximately 1. That means for speeds that are really small compared to light speed, moving clocks show no measurable discrepancy between their ticking rate and that of a clock at rest. However, when v is a significant fraction of c, the Lorentz factor γ increases, becoming larger than 1 and approaching infinity as v approaches c.

What does this mean for our moving clock? It means that the time between ticks is longer for the moving clock, making the moving clock appear to run slow. This is the phenomenon of time dilation; moving clocks tick slower. Note that this has nothing to do with the clock being faulty in some way; this is just a consequence of the speed of light being the same for all observers.
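To get a feel for the sizes involved, here is a small numerical sketch (my own illustration, not part of the original post) of the Lorentz factor and the dilated tick interval:

```python
import math

C = 299_792_458.0  # speed of light in a vacuum, m/s

def lorentz_factor(v: float) -> float:
    """Lorentz factor gamma = 1 / sqrt(1 - v^2/c^2) for a speed 0 <= v < c."""
    if not 0.0 <= v < C:
        raise ValueError("speed must satisfy 0 <= v < c")
    return 1.0 / math.sqrt(1.0 - (v / C) ** 2)

def dilated_tick(rest_tick: float, v: float) -> float:
    """Tick interval of a moving light clock: dt' = gamma * dt."""
    return lorentz_factor(v) * rest_tick

# A 1 ns rest-frame tick barely changes at highway speeds (~30 m/s)...
print(dilated_tick(1e-9, 30.0))
# ...but at 90% of light speed gamma is about 2.29, so the moving
# clock's ticks are more than twice as far apart.
print(dilated_tick(1e-9, 0.9 * C))
```

This makes it clear why we never notice time dilation in everyday life: at human speeds, γ differs from 1 only in roughly the fourteenth decimal place.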

Now, because the time interval between events (in this case, ticks of a clock) changes due to the invariance of the speed of light, there must be some sort of trade-off involving distance, since

\text{speed} = \dfrac{\text{distance traveled}}{\text{travel time}}.

If moving clocks run slower, less time elapses on them between events, so for the speed of light to remain unchanged for moving observers, the distance traveled must change correspondingly. Think about a light clock oriented on its side, so that the travel direction is along the long side of the clock.

The speed of light is unchanged, so if the clock ticks at a slower rate, the light must traverse a shorter distance, and the factor by which the distance is shortened should be the same factor by which the travel time is dilated:

L' = \dfrac{L}{\gamma}.

This phenomenon is called length contraction. Note that the Lorentz factor is in the denominator, which ensures our distance has gotten smaller. If we wanted to, we could play similar games with the light clock to derive this result, though we would need to account for two ticks of the clock, since the sideways light clock has two time intervals associated with its ticks: a longer “forward tick” and a shorter “backward tick.” Nonetheless, the result is the same.
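As a companion to the time dilation sketch above (again, my own illustration, not from the original post), length contraction is just a division by γ:

```python
import math

C = 299_792_458.0  # speed of light in a vacuum, m/s

def contracted_length(rest_length: float, v: float) -> float:
    """Length measured along the direction of motion:
    L' = L / gamma = L * sqrt(1 - v^2/c^2)."""
    return rest_length * math.sqrt(1.0 - (v / C) ** 2)

# At rest the full length is measured...
print(contracted_length(1.0, 0.0))
# ...but a meter stick flying by at 0.9c measures only about 0.44 m.
print(contracted_length(1.0, 0.9 * C))
```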

There are two special quantities that arise from time dilation and length contraction known as the proper time and proper length. The proper time is the time measured by a clock in its own rest frame, and the proper time interval is the ticking rate of a clock in its own rest frame. Everyone agrees on the proper time, because everyone should agree what time is measured by an object in its own rest frame. Similarly, proper length is the length of an object in its own rest frame, which all observers must also agree on. These are two more quantities that can be considered invariant, or unchanging with reference frame, in addition to the speed of light. In fact, quite a bit of special relativity consists of finding invariant quantities since they are so useful in connecting reference frames.

This give and take between space and time is what led to the idea that time and space are linked together such that they are one entity: spacetime. The thing that connects space and time together is the invariant speed of light. In a future entry, I’ll talk about visualizing spacetime and how time dilation and length contraction warp how different observers/detectors view events.

Equation of the Day #15: Rays and Triangles

The way that lenses and mirrors play with light is well described by a branch of physics known as geometric optics or ray optics, and while the equations governing electromagnetic waves (i.e. light) involve an understanding of vector calculus, the principles of ray optics are completely accessible with a knowledge of triangles and no calculus. There are a few basic ideas to cover first, though.

The first two ideas to pin down are the Law of Reflection and Snell’s Law of Refraction. The Law of Reflection states that a ray of light (or a wave) will reflect off of an interface at an angle equal to the angle of approach. Graphically, the law of reflection looks like this:

with θi = θr. The dashed line is known as the “normal line,” where normal means that it is perpendicular to the interface the ray is reflecting off of; it also intersects the point of contact of the ray and the interface. In reality, all reflections by light on all surfaces obey the law of reflection; however, many surfaces are rough at microscopic scales, so when light reflects off of those surfaces you don’t see your reflection. (For instance, white paper reflects most of the light that hits it, but you can’t see your reflection because the paper’s rough surface reflects light randomly.)

Snell’s Law of Refraction says that when light transfers from one medium to another (e.g. air to water), its path is bent according to the rule

n_1 \sin\theta_1 = n_2\sin\theta_2.

This deflection is caused by the fact that light takes the path of least time. So, in a uniform medium (such as air near Earth’s surface), light travels in straight lines. If the speed of light changes, however, this straight line path can be bent. The best analogy I’ve seen to explain this phenomenon is the following: consider a lawnmower moving from a sidewalk to tall grass.

When the first wheel hits the grass, it starts to move more slowly. This causes the lawnmower to start to turn until the next wheel enters the grass, at which point the deflection is complete. Since light is a wave, a similar effect occurs. The reverse is also true (the lawnmower on the right). On the macroscopic level, this appears as light rays deflecting.

Snell’s Law and Law of Reflection in one image. Source: Wikimedia Commons.
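These two laws are easy to play with numerically. The following sketch (an illustration of mine, not part of the original post) applies Snell’s law to find the refracted angle, including the case where no refracted ray exists:

```python
import math

def refraction_angle(n1: float, theta1_deg: float, n2: float):
    """Refraction angle from Snell's law n1 sin(theta1) = n2 sin(theta2).
    Returns the angle in degrees, or None when total internal reflection
    occurs (i.e., no refracted ray exists)."""
    s = n1 * math.sin(math.radians(theta1_deg)) / n2
    if abs(s) > 1.0:
        return None  # total internal reflection
    return math.degrees(math.asin(s))

# Air (n ~ 1.0) into glass (n ~ 1.5): the ray bends toward the normal,
# from 30 degrees down to about 19.5 degrees.
print(refraction_angle(1.0, 30.0, 1.5))
# Glass into air at a steep 60 degrees: total internal reflection.
print(refraction_angle(1.5, 60.0, 1.0))
```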

These two laws form the basis of how curved mirrors and lenses work. Going forward, I think it’s a bit easier to work with focusing lenses, and then generalize the findings to mirrors.

Let’s consider parallel light rays entering a lens whose surfaces are spherically shaped. What do the outgoing rays look like according to Snell’s Law? Well, one of the special properties of spheres (and circles) is that all lines originating from the center of the sphere intersect the surface at right angles. So, the normal lines emanate from the center of the sphere. Let’s consider a convex lens, or a lens that bulges outward.

As the drawing above shows, the normal lines appear to converge on a single point within the lens. Since air has an index of refraction of nearly 1 (about 1.000293) and glass has an index of refraction around 1.5, the red light rays will bend toward the dashed normal lines upon entering the glass. If the other side of the lens is also convex, the light rays will converge even more, since leaving the glass will cause the light rays to deflect away from the normal lines.

We therefore call convex lenses converging lenses, since light rays entering the lens will be focused. Without getting too detailed about how Snell’s Law works at all points of the lens, we can make some sweeping statements about how curved lenses focus light if we make just a couple assumptions.

  • Light rays enter the lens nearly face-on, so angles of incidence are small.
  • The lens is spherical and thin.

These two assumptions lead to the three rules of ray tracing.

  1. Rays entering the lens parallel to the axis of symmetry of the lens converge on the focal point of the lens.
  2. Rays that enter the lens from the focal point emerge parallel to the lens’ axis of symmetry. (This is the reverse process of rule 1.)
  3. Rays that go through the direct center of the lens do not get deflected.

In fact, we will see that the first two rules lead to the third. Graphically, these ray tracings look like this:

Rays 1, 2, and 3 are represented by the yellow, magenta, and cyan rays in the image, respectively. The horizontal dashed line is called the optical axis and is the axis of rotational symmetry of the lens. The vertical dashed line represents the lens plane. The black arrow is the object being imaged by the lens, and the gray arrow on the right is the image of the black arrow focused by the lens. Finally, the quantity denoted as f is the focal length, which is the distance between the lens plane and the focal point. The question we want to answer is, “How do we predict where the image forms?” Looking at the above image, we can compare four triangles formed by the rays.

This is the previous image with more stuff going on, so let’s go through what’s new. The height of the object is h, and the height of the image is h′ (read “h prime”). The distance of the object from the lens plane is l, and the distance of the image from the lens plane is l′. In short, primed distances correspond to the image, while unprimed quantities correspond to the object. There are four triangles that are color coded to help in the following discussion. Since the two yellow triangles share two angles (one right angle and a pair of vertical angles), they are similar triangles. Therefore, the ratios of the short leg to the long leg for each triangle are equal, which can be rearranged as

-\dfrac{h'}{h} = \dfrac{f}{l-f}.

The negative sign is due to the fact that I’m considering h′, as drawn in the figure, to be a negative number, since it is below the axis of symmetry. By a similar argument, the purple triangles are also similar, and the ratio equality rearranges to be

-\dfrac{h'}{h} = \dfrac{l'-f}{f}.

Substituting the first expression into the second and doing a little more rearranging, we arrive at what’s known as the thin lens equation,

\dfrac{1}{f} = \dfrac{1}{l} + \dfrac{1}{l'}.

Simply put, if I know the focal length of the lens and the placement of my object, I can predict where the image will form using the reciprocal lengths shown above. This happens to work for all lenses, converging and diverging, and even for spherical mirrors, though the image is focused by reflection rather than transmission.
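To make that prediction concrete, here is a small sketch (my own, not part of the original derivation) that solves the thin lens equation for the image distance:

```python
def image_distance(f: float, l: float) -> float:
    """Solve the thin lens equation 1/f = 1/l + 1/l' for the image
    distance l'. Positive l' means a real image; negative l' means
    a virtual image."""
    if l == f:
        raise ValueError("object at the focal point: rays emerge parallel")
    return 1.0 / (1.0 / f - 1.0 / l)

# Converging lens (f = +10 cm), object at 30 cm: real image near +15 cm.
print(image_distance(10.0, 30.0))
# Same lens, object inside the focal length (5 cm): virtual image at -10 cm.
print(image_distance(10.0, 5.0))
# Diverging lens (f = -10 cm): the image distance is always negative.
print(image_distance(-10.0, 30.0))
```

The second and third cases preview the discussion of virtual images below: a negative l′ means the image cannot be projected onto a screen.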

Now, this derivation only used rays 1 and 2 from the ray tracing rules above. To show that ray 3 is valid, I’m going to look at the tangent of the two large triangles in the image which contain the angles θ and θ′. Looking at tan θ,

\tan\theta = \dfrac{h}{l} = -\dfrac{h'f}{l(l'-f)} = -\dfrac{h'f}{fl'} = -\dfrac{h'}{l'} = \tan\theta'.

Note that I used the thin lens equation and the geometric formulas above to substitute various quantities for each other, but the end result that we find is that θ and θ′ are the same! This means that the line connecting those two angles, i.e. the central ray, is undeflected through the lens!

But what happens if the object is placed within the focal length of the lens? Well, that means that 1/l is larger than 1/f, so 1/l′ = 1/f − 1/l is negative, and the image distance l′ is negative! The physical interpretation of a negative image distance is that the image forms on the same side of the lens that the object is sitting on. When this happens, you can’t touch the image or project it onto a wall; no light actually converges at its location, similarly to how your reflection in a flat mirror lives on the other side of the mirror. You can’t touch your reflection, since you would have to go through the mirror to do so; likewise, an image with negative image distance can only be seen by looking through the lens. We call these virtual images, since you cannot touch them or project them onto a wall or screen. A virtual image always has to be viewed through the lens. An example of a virtual image is the image you see when you use a magnifying glass to zoom in on an object.

All these triangles came from working with converging, or focusing, lenses, but what if I have a diverging, or defocusing, lens? It turns out we get several triangles like before, but with one caveat: the focal length is treated as negative. This should make some intuitive sense: a positive focal length positively focuses light, and a negative focal length negatively focuses, or defocuses, light.

Instead of parallel light rays converging on the focal point, they leave the lens appearing to originate from the focal point. When all is said and done, the thin lens equation emerges, exactly as before, so long as you keep track of negative quantities. What’s interesting about diverging lenses is that the image they form is always virtual, no matter where you place the object. Mathematically, this is due to l always being positive and f always being negative. So, you can never project an image with a diverging lens alone.

An everyday application of this equation is in prescription eyeglasses. If you happen to be nearsighted, your eyeglass prescription is for diverging lenses; nearsighted eyes have too powerful a lens (too short a focal length), so the lenses used to correct them have to weaken the focusing power of the eye. Conversely, farsightedness is treated with converging lenses; such eyes need a boost in focusing power.

Finally, I want to leave you with a curious coincidence. If you have a curved mirror that is carved from a large sphere, you can construct ray diagrams similarly to how we constructed the lens ray diagrams. The only difference is that where the rays end up is determined by the Law of Reflection instead of Snell’s Law. If you assume that rays hit the mirror nearly face-on, the rules for ray tracing for curved mirrors are

  1. Rays incident on the mirror parallel to its axis of symmetry converge on the focal point of the mirror, which lies at half the mirror’s radius of curvature.
  2. Rays incident on the mirror from the focal point emerge parallel to the axis of symmetry. (This is the reverse process of rule 1.)
  3. Rays that go through the center of curvature of the mirror reflect back the way they entered.

Visually, these rays look like this:

If you play the similar triangles game that we played with lenses, you end up getting the mirror equation,

\dfrac{1}{f} = \dfrac{1}{l} + \dfrac{1}{l'},

which is identical to the thin lens equation! This allows telescopes, for instance, to use mirrors instead of lenses as the focusing elements to form images. To mathematically predict where diverging mirrors form images, simply use negative focal lengths, just like with lenses.

I hope I’ve laid out a nice foundation here, from which one could build the tools to understand cameras, film, telescopes, microscopes, and the many other instruments we have used not only to explore the vast reaches and smallest microcosms of the universe, but also to let the visual entertainment industry record its productions. May it help you appreciate that small little lens on the back of your phone.


Equation of the Day #14: The Harmonic Oscillator

“The career of a young theoretical physicist consists of treating the harmonic oscillator in ever-increasing levels of abstraction.”
Sidney Coleman

One of the first physical systems students get to play with is the harmonic oscillator. An oscillator is a system with some property that varies repeatedly around a central value. This variation can be in displacement (such as a mass on a spring), voltage (like the electricity that comes out of the wall), or field strength (like the oscillations that make up light). When this variation has a fixed period, we call the motion harmonic oscillation, and the simplest harmonic oscillation is known as — you may have guessed — simple harmonic oscillation.

A simple harmonic oscillator is a system under the influence of a force proportional to the displacement of the system from equilibrium. Think of a mass on a spring; the further I pull the mass from the spring’s natural length, the more the spring pulls back, and if I push the mass so that the spring compresses, the spring will push against me. Mathematically, this is known as Hooke’s Law and can be written as

F=-k\Delta x

where F is the net force applied to the system, Δx is the displacement of the system from equilibrium, and k is some proportionality constant (often called the “spring constant”) which tells us how strongly the spring pushes or pulls given some displacement – a larger k indicates a stronger spring. If I let go of the mass when it is displaced from equilibrium, the mass will undergo oscillatory motion. What makes this “simple” is that we’re ignoring the effects of damping or driving the oscillation with an outside force.

How do we know that such a restoring force causes oscillatory motion? Utilizing Newton’s second law,

\begin{aligned} F &= ma\\ &= -kx\\ \Rightarrow \dfrac{d^2x}{dt^2} &= -\dfrac{k}{m} x \end{aligned}

The solution to this equation is sinusoidal,

x(t) = A\cos(\omega t +\phi),

where A is the amplitude of oscillation (the farthest the mass gets from equilibrium),

\omega \equiv \sqrt{\dfrac{k}{m}}

is the angular frequency of oscillation, and ϕ is the phase, which captures the initial position and velocity of the mass at time t = 0. The period is related to the angular frequency by

T = \dfrac{2\pi}{\omega}.

For this reason, harmonic oscillators are useful timekeepers, since they oscillate at regular, predictable intervals. This is why pendulums, coiled springs, and currents going through quartz crystals have been used as clocks. What other physical systems does this situation apply to? Well, if you like music, simple harmonic oscillation is what air undergoes when you play a wind, string, or membrane instrument. What you’re doing when you play an instrument (or sing) is forcing air, string(s), or electric charge (for electronic instruments) out of equilibrium. This causes the air, string(s), and voltage/current to oscillate, which creates a tone. Patch a bunch of these tones together in the form of chords, melodies, and harmonies, and you’ve created music. A simpler situation is blowing over a soda/pop bottle. When you blow air over the mouth of the bottle, you create an equilibrium pressure for the air above the mouth of the bottle. Air that is slightly off of this equilibrium will oscillate in and out of the bottle, producing a pure tone.

Image: Wikipedia
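As a sanity check on the solution above, we can integrate Hooke’s law numerically and compare against A cos(ωt). This is an illustrative sketch of mine (the semi-implicit Euler integrator is my choice, not anything from the original text):

```python
import math

def simulate_sho(m: float, k: float, x0: float, steps: int, dt: float) -> float:
    """Integrate m x'' = -k x with the semi-implicit (symplectic) Euler
    method, released from rest at displacement x0; returns the final x."""
    x, v = x0, 0.0
    for _ in range(steps):
        v += -(k / m) * x * dt  # update velocity from Hooke's law force
        x += v * dt             # then update position with the new velocity
    return x

m, k, x0 = 1.0, 4.0, 0.1           # omega = sqrt(k/m) = 2 rad/s
omega = math.sqrt(k / m)
t, dt = 1.0, 1e-5
numeric = simulate_sho(m, k, x0, int(t / dt), dt)
analytic = x0 * math.cos(omega * t)  # A cos(omega t), with phi = 0
print(numeric, analytic)             # the two agree closely
```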

Now for the fun part: what happens when we enter the quantum realm? Quantum mechanics says that the energy of a bound system is quantized, and an ideal harmonic oscillator is always bound. The total energy of a harmonic oscillator is given by

\begin{aligned} E &= \dfrac{1}{2} mv^2 + \dfrac{1}{2} kx^2\\  &= \dfrac{1}{2m} \left(p^2 + (m\omega x)^2\right), \end{aligned}

where the first term is the kinetic energy, or energy of motion, and the second term is the potential energy, or energy due to location. I used the facts that p = mv and k = mω² to go from the first line to the second line. The quantum prescription says that p and x become mathematical operators, and the energy takes a role in the Schrödinger equation. For the harmonic oscillator, solving the Schrödinger equation yields the differential equation

\begin{aligned} \dfrac{\hbar}{m\omega}\dfrac{d^2\psi}{dx^2} + \left(\dfrac{2E}{\hbar\omega} - \dfrac{m\omega}{\hbar}\, x^2 \right) \psi(x) = 0 \end{aligned}

where ħ is the (reduced) Planck constant, and ψ is the quantum mechanical wave function. After solving this differential equation, the allowed energies turn out to be

E_n = \hbar\omega \left(n+\dfrac{1}{2}\right)

where n = 0, 1, 2, . . . is a nonnegative integer. Unlike the classical picture, the quantum states of the harmonic oscillator with definite energy are stationary and spread out over space, with higher energy states spread out more than lower energy states. There is a way, though, to produce an oscillating state of the quantum harmonic oscillator by preparing a superposition of pure energy states, forming what’s known as a coherent state, which actually does behave like the classical mass on a spring. It’s a weird instance of classical behavior in the quantum realm!

Classical simple harmonic oscillators compared to quantum wave functions of the simple harmonic oscillator.
Image: Wikipedia
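The energy ladder is easy to tabulate. A minimal sketch (mine, working in units where ħω = 1):

```python
def energy_level(n: int, hbar_omega: float = 1.0) -> float:
    """Energy of the n-th quantum harmonic oscillator state:
    E_n = hbar * omega * (n + 1/2), for nonnegative integer n."""
    if n < 0:
        raise ValueError("n must be a nonnegative integer")
    return hbar_omega * (n + 0.5)

# The levels start at the zero-point energy hbar*omega/2 (n = 0)
# and are equally spaced by hbar*omega.
levels = [energy_level(n) for n in range(4)]
print(levels)      # [0.5, 1.5, 2.5, 3.5]
spacings = [b - a for a, b in zip(levels, levels[1:])]
print(spacings)    # [1.0, 1.0, 1.0]
```

The equal spacing printed on the last line is exactly the feature exploited by quantum field theory, discussed below.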

An example of a quantum harmonic oscillator is a molecule formed by a pair of atoms. The bond between the two atoms gives rise to a roughly harmonic potential, which results in molecular vibrational states, like two quantum balls attached by a spring. Depending on the mass of the atoms and the strength of the bond, the molecule will vibrate at a specific frequency, and this frequency tells physicists and chemists about the bond-lengths of molecules and what those bonds are made up of. In fact, the quantum mechanical harmonic oscillator is a major topic of interest because the potential energy between quantum objects can often be approximated as a Hooke’s Law potential near equilibrium, even if the actual forces at play are more complex at larger separations.

Additionally, the energy structure of the harmonic oscillator predicts that energies are equally spaced by the amount ħω. This is a remarkable feature of the quantum harmonic oscillator, and it allows us to make a toy model for quantum object creation and annihilation. If we take the energy unit ħω as equal to the rest energy of a quantum object by Einstein’s E = mc2, we can think of the quantum number n as being the number of quantum objects in a system. This idea is one of the basic results of quantum field theory, which treats quantum objects as excitations of quantum fields that stretch over all of space and time. This is what the opening quote is referring to; physicists start off learning the simple harmonic oscillator as classical masses on springs or pendulums oscillating at small angles, then they upgrade to the quantum treatment and learn about its regular energy structure, and then they upgrade to the quantum field treatment, where the energies are treated as a number of quantum objects arising from an omnipresent quantum field. I find it to be one of the most beautiful aspects of Nature that such a simple system recurs at multiple levels of our physical understanding of reality.


Equation of the Day #13: Chaos

Chaos is complexity that arises from simplicity. Put in a clearer way, it’s when a deterministic process leads to complex results that seem unpredictable. The difference between chaos and randomness is that chaos is determined by a set of rules/equations, while randomness is not deterministic. Everyday applications of chaos include weather, the stock market, and cryptography. Chaos is why everyone (including identical twins who have the same DNA) has different fingerprints. And it’s beautiful.

How does simplicity lead to complexity? Let’s take, for instance, the physical situation of a pendulum. The equation that describes the motion of a pendulum is

\dfrac{d^2\theta}{dt^2} = -\dfrac{g}{l} \sin\theta

where θ is the angle the pendulum makes with the imaginary line perpendicular to the ground, l is the length of the pendulum, and g is the acceleration due to gravity. This leads to an oscillatory motion; for small angles, the solution of this equation can be approximated as

\theta(t) \approx A\cos\left( \sqrt{\dfrac{g}{l}} t\right)

where A is the amplitude of the swing (in radians). Very predictable. But what happens when we make a double pendulum, where we attach a pendulum to the bottom of the first pendulum?

Can you predict whether the bottom pendulum will flip over the top? (Credit: Wikimedia Commons)

It’s very hard to predict when the outer pendulum flips over the inner pendulum mass; however, the process is entirely determined by a set of equations governed by the laws of physics. And, depending on the initial angles of the two pendula, the motion will look completely different. This is how complexity derives from simplicity.
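To see this sensitivity to initial conditions concretely, here is a sketch that integrates a double pendulum with equal masses and equal arm lengths. The equations of motion used are a standard textbook form (an assumption on my part, not taken from this post), and the fourth-order Runge-Kutta integrator and the specific parameters are illustrative choices:

```python
import math

G = 9.81  # gravitational acceleration, m/s^2

def derivs(state, l=1.0):
    """Time derivatives of (theta1, theta2, omega1, omega2) for a double
    pendulum with equal masses and equal arm lengths l (a commonly quoted
    form of the equations of motion)."""
    t1, t2, w1, w2 = state
    d = t1 - t2
    den = l * (3.0 - math.cos(2.0 * d))
    a1 = (-3.0 * G * math.sin(t1) - G * math.sin(t1 - 2.0 * t2)
          - 2.0 * math.sin(d) * (w2 * w2 * l + w1 * w1 * l * math.cos(d))) / den
    a2 = (2.0 * math.sin(d) * (2.0 * w1 * w1 * l + 2.0 * G * math.cos(t1)
          + w2 * w2 * l * math.cos(d))) / den
    return (w1, w2, a1, a2)

def rk4_step(state, dt):
    """One fourth-order Runge-Kutta step."""
    k1 = derivs(state)
    k2 = derivs(tuple(s + 0.5 * dt * k for s, k in zip(state, k1)))
    k3 = derivs(tuple(s + 0.5 * dt * k for s, k in zip(state, k2)))
    k4 = derivs(tuple(s + dt * k for s, k in zip(state, k3)))
    return tuple(s + dt / 6.0 * (p + 2 * q + 2 * r + u)
                 for s, p, q, r, u in zip(state, k1, k2, k3, k4))

def run(theta1, theta2, t_end=10.0, dt=1e-3):
    """Release both arms from rest and return the state at t_end."""
    state = (theta1, theta2, 0.0, 0.0)
    for _ in range(int(t_end / dt)):
        state = rk4_step(state, dt)
    return state

# Two double pendula released from rest, differing by a millionth
# of a radian in the inner arm's starting angle:
a = run(2.0, 2.0)
b = run(2.0 + 1e-6, 2.0)
print(abs(a[0] - b[0]))  # the tiny difference has grown by orders of magnitude
```

Running the two nearly identical trajectories side by side is the standard demonstration of chaos: the microscopic difference in starting angle is amplified until the two motions are completely unrelated.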

Another example of beautiful chaos is fractals. Fractals are structures that exhibit self-similarity, are determined by a simple set of rules, and have infinite complexity. An example of a fractal is the Sierpinski triangle.

(Image: Wikipedia)

The rule is simple: start with a triangle, then divide that triangle into four equal triangles. Remove the middle one. Repeat with the new solid triangles you produced. The true fractal is the limit when the number of iterations reaches infinity. Self-similarity happens as you zoom into any corner of the triangle; each corner is a smaller version of the whole (since the iterations continue infinitely). Fractals crop up everywhere, from the shapes of coastlines to plants to frost crystal formation. Basically, they’re everywhere, and they’re often very cool and beautiful.
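A different but equivalent way to generate the Sierpinski triangle, fittingly for this post, is the so-called “chaos game”: repeatedly jump halfway toward a randomly chosen vertex of a triangle. This sketch (mine, not from the original text) generates such points:

```python
import random

def chaos_game_points(n: int, seed: int = 0):
    """Approximate the Sierpinski triangle via the 'chaos game':
    starting inside a triangle, repeatedly move halfway toward a
    randomly chosen vertex and record each visited point."""
    random.seed(seed)  # fixed seed so the run is reproducible
    vertices = [(0.0, 0.0), (1.0, 0.0), (0.5, 3 ** 0.5 / 2)]
    x, y = 0.25, 0.25
    points = []
    for _ in range(n):
        vx, vy = random.choice(vertices)
        x, y = (x + vx) / 2, (y + vy) / 2
        points.append((x, y))
    return points

pts = chaos_game_points(10_000)
# Every generated point stays inside the bounding triangle; plotting
# them reveals the Sierpinski pattern, holes and all.
print(len(pts), max(p[1] for p in pts) <= 3 ** 0.5 / 2)
```

Despite the randomness in choosing vertices, the points always avoid the removed middle triangles, tracing out the same fractal the deterministic subdivision rule produces.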

Chaos is also used in practical applications, such as encryption. Since chaos is hard to predict unless you know the exact initial conditions of the chaotic process, a chaotic encryption scheme can be made public. One example of a chaotic map used to disguise data is the cat map. Each iteration is a simple matrix transformation of the pixels of an image. It’s completely deterministic, but it jumbles the image to make it look like garbage. In practice, this map is periodic, so as long as you apply the map repeatedly, you will eventually get the original image back. Another application of chaos is pseudorandom number generators (PRNGs), where a hard-to-predict initial value is manipulated chaotically to generate a “random” number. If you can manipulate the initial input values, you can predict the outcome of the PRNG. In the case of the Pokémon games, the PRNGs have been examined so thoroughly that, using a couple of programs, you can capture or breed Pokémon with shininess and/or perfect stats.
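Here is a minimal sketch of the cat map’s periodicity, using one common form of Arnold’s cat map (the specific matrix is my assumption; implementations vary):

```python
def cat_map(x: int, y: int, n: int):
    """One iteration of a common form of Arnold's cat map on an
    n x n grid of pixel coordinates: (x, y) -> (2x + y, x + y) mod n."""
    return (2 * x + y) % n, (x + y) % n

def map_period(n: int) -> int:
    """Number of iterations after which every pixel of an n x n image
    returns to its starting position."""
    grid = [(x, y) for x in range(n) for y in range(n)]
    current, period = grid, 0
    while True:
        current = [cat_map(x, y, n) for x, y in current]
        period += 1
        if current == grid:
            return period

# A tiny 3x3 "image" is scrambled and then restored after 4 applications.
print(map_period(3))  # 4
```

The period depends on the image size in an irregular way, which is part of what makes the map feel chaotic even though it is completely deterministic.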

So that’s the beauty of chaos. Next time you look at a bare tree toward the end of autumn or lightning in a thunderstorm, just remember that the seemingly unpredictable branches and forks are created by simple rules of nature, and bask in its complex beauty.


Equation of the Day #12: 12

A while ago I stumbled across this image (which I recreated and cleaned up a bit). It’s a beautiful image. Arranged around the edge is the circle of fifths, which in music is a geometric representation of the twelve tones of the Western scale arranged so the next note is seven semitones up (going clockwise in this figure). The notes are all connected in six different ways to the other notes in the “circle,” known as intervals, which are color-coded at the bottom. I thought, “Wow, this is a really cool way to represent this geometrically. How neat!” However, I found the original website that the image came from, and it’s a pseudoscience site that talks about the fractal holographic nature of the universe. While fractals do show up in Nature a lot, and there are legitimate theories proposing that the Universe may indeed be a hologram, what their site is proposing is, to put it lightly, utter nonsense. But instead of tearing their website apart (which would be rather cathartic), I instead want to point out the cool math going on here, because that sounds more fun!

Looking at the bottom of the graphic, you’ll notice six figures. The first (in red) is a regular dodecagon, a polygon with twelve equal sides and angles. This shape is what forms the circle of fifths. The rest of the shapes in the sequence are dodecagrams, or twelve-pointed stars. The first three are stars made up of simpler regular polygons; the orange star is made up of two hexagons, the yellow is made up of three squares, and the green one is made up of four triangles. The purple dodecagram can be thought of as made up of six straight-sided digons, or line segments. These shapes point to the fact that twelve has five divisors other than itself (1, 2, 3, 4, and 6), giving five ways to split it into equal groups: one set of twelve, two sets of six, three sets of four, four sets of three, and six sets of two! You could say that the vertices of the dodecagon finalize the set as twelve sets of one, but they’re not illustrated in this image. So really, this image has less to do with musical intervals and more to do with the number 12, which is a rather special number. It is a superior highly composite number, which makes it a good choice as a number base (a reason why feet are divided into twelve inches, for instance, or why our clocks have twelve hours on their faces).

The final dodecagram in cyan is not made up of any simpler regular polygons because the number 12 is not divisible by five. If you pick a note in the circle of fifths to start on, you’ll notice that the two cyan lines that emanate from it connect to notes that are five places away on the “circle,” hence the connection to the number 5. In fact, it would be far more appropriate to redraw this figure with a clock face.

This new image should shed some more light on what’s really going on. The dodecagrams each indicate a different map from one number to another, modulo 12. The only reason this is connected to music at all is due to the fact that a Western scale has twelve tones in it! If we used a different scale, such as a pentatonic scale (with five tones, as the name would suggest), we’d get a pentagon enclosing a pentagram. Really, this diagram can be used to connect any two elements in a set of twelve. The total number of connecting lines in this diagram, then, is

\begin{pmatrix} 12\\2 \end{pmatrix} = T_{11} = \dfrac{1}{2} (12)(11) = 66

where the notation in parentheses is “n choose 2,” and T_n is a triangular number. This figure is known in math as K_{12}, the complete graph with twelve nodes. And it’s gorgeous.
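As a quick sanity check of that count, here’s a tiny sketch enumerating every connecting line between twelve notes (the note labels are just for flavor; any twelve distinct elements give the same count):

```python
from itertools import combinations
from math import comb

# Twelve notes arranged as a circle of fifths (labels only).
notes = ["C", "G", "D", "A", "E", "B", "F#", "Db", "Ab", "Eb", "Bb", "F"]

# Every line in the diagram connects one unordered pair of notes:
# these pairs are exactly the edges of the complete graph K_12.
edges = list(combinations(notes, 2))

print(len(edges))  # 66, the triangular number T_11 = (1/2)(12)(11)
```

The same count falls out of `comb(12, 2)` directly; enumerating the pairs just makes the "every node connected to every other node" structure explicit.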

So while this doesn’t really have anything to do with music or some pseudoscientific argument for some fancy-sounding, but ultimately meaningless, view on the universe, it does exemplify the beauty of the number 12, and has a cool application to the circle of fifths.


1 “Holographic principle.” Wikipedia.

Equation of the Day #11: The Fourier Transform


Today I wanted to talk about one of my favorite equations in all of mathematics. However, I won’t do it justice without building up some framework that puts it into perspective. To start out, let’s talk about waves.

A wave, in general, is any function that obeys the wave equation. To simplify things, though, let’s look at repeating wave patterns.

The image above depicts a sine wave. This is the shape of string and air vibration at a pure frequency; as such, sinusoidal waveforms are also known as “pure tones.” If you want to hear what a pure tone sounds like, YouTube is happy to oblige. But sine waves are not the only shapes that a vibrating string could make. For instance, I could make a repeating pattern of triangles (a triangle wave),

or rectangles (a square wave),

Now, making a string take on these shapes may seem rather difficult, but synthesizing these shapes to be played on speakers is not. In fact, old computers and video game systems had synthesizers that could produce these waveforms, among others. But let’s say you only know how to produce pure tones. How would you go about making a square wave? It seems ridiculous; pure tones are curvy sine waves, and square waves are choppy with sharp corners. And yet a square wave does produce a tone when synthesized, and that tone has a pitch that corresponds to how tightly its pattern repeats — its frequency — just like sine waves.

As it turns out, you can produce a complex waveform by adding only pure tones. This was discovered by Jean-Baptiste Joseph Fourier, a French mathematician and physicist working in the early 19th century. What he discovered was that sine waves form a complete basis of functions, or a set of functions that can be used to construct other well-behaved, arbitrary functions. However, these sine waves are special. The frequencies of these sine waves must be harmonics of the lowest frequency sine wave.

Image: Wikipedia

The image above shows a harmonic series of a string with two ends fixed (like those of a guitar or violin). Each frequency is an integer multiple of the lowest frequency (that of the top string, which I will call \nu_1 = 1/T , where \nu is the Greek letter “nu.”), which means that the wavelength of each harmonic is an integer fraction of the longest wavelength. The lowest frequency sine wave, or the fundamental, is given by the frequency of the arbitrary wave that’s being synthesized, and all other sine waves that contribute to the model will have harmonic frequencies of the fundamental. So, the tone of a trumpet playing the note A4 (440 Hz frequency) will be composed of pure tones whose lowest frequency is 440 Hz, with all other pure tones being integer multiples of 440 Hz (880, 1320, 1760, 2200, etc.). As an example, here’s a cool animation showing the pure tones that make up a square wave:

Animation: LucasVB on Wikipedia

As you can see in the animation, these sine waves will not add up equally; typically, instrument tones have louder low frequency contributions than high frequency ones, so the amplitude of each sine wave will be different. How do we determine the strengths of these individual frequencies? This is what Fourier was trying to determine, albeit for a slightly different problem. I mentioned earlier that sine waves form a complete basis of functions to describe any arbitrary function (in this case, periodic waveforms). This means that, when you integrate the product of two sine waves within a harmonic series over the period corresponding to the fundamental frequency (T = 1/\nu_1 ), the integral will be zero unless the two sine waves are the same. More specifically,

\displaystyle \int_{-T/2}^{T/2} \sin \left(\dfrac{m\tau t}{T}\right) \sin \left(\dfrac{n\tau t}{T}\right) dt = \begin{cases} 0, & m\ne n\\ T/2, & m = n \end{cases},

where \tau = C/r is the circle constant. Because of this trick, we can extract the amplitudes of each sine wave contributing to an arbitrary waveform. Calling the arbitrary waveform f(t) and the fundamental frequency 1/T,

\begin{aligned} f(t) &= \displaystyle\sum_{m=1}^{\infty} b_m \sin\left(\dfrac{m\tau t}{T}\right) \\[8pt] \displaystyle \int_{-T/2}^{T/2} f(t)\sin\left(\dfrac{n\tau t}{T}\right)\, dt &= \displaystyle\int_{-T/2}^{T/2}\sum_{m=1}^{\infty} b_m \sin\left(\dfrac{m\tau t}{T}\right) \sin\left(\dfrac{n\tau t}{T}\right)\, dt\\[8pt] &= \displaystyle\sum_{m=1}^{\infty} b_m \int_{-T/2}^{T/2}\sin\left(\dfrac{m\tau t}{T}\right) \sin\left(\dfrac{n\tau t}{T}\right)\, dt \\[8pt] &= \dfrac{T}{2} \, b_n \\[8pt] b_n &= \dfrac{2}{T} \displaystyle \int_{-T/2}^{T/2} f(t)\sin\left(\dfrac{n\tau t}{T}\right)\, dt. \end{aligned}

This is how we extract the amplitudes of each pure tone that makes up the tone we want to synthesize. The trick was subtle, so I’ll describe what happened there line by line. The first line shows that we’re breaking up the arbitrary periodic waveform f(t) into pure tones, a sum over sine waves with frequencies m/T, with m running over the natural numbers. The second line multiplies both sides of line one by a sine wave with frequency n/T, with n being a particular natural number, and integrating over one period of the fundamental frequency, T. It’s important to be clear that we’re only summing over m and not n; m is an index that takes on multiple values, but n is one specific value! The third line is just swapping the order of taking the sum vs. taking the integral, which is allowed since integration is a linear operator. The fourth line is where the magic happens; because we’ve integrated the product of two sine waves, we get a whole bunch of integrals on the right hand side of the equation that are zero, since m and n are different for all terms in the sum except when m = n. This integration trick has effectively selected out one term in the sum, in doing so giving us the formula to calculate the amplitude of a given harmonic in the pure tone sum resulting in f(t).
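The extraction formula is easy to check numerically. Here’s a minimal sketch using a plain midpoint-rule integral (my own choice of method) to recover the well-known coefficients of a unit square wave, whose odd harmonics have amplitude 4/(nπ) and whose even harmonics vanish:

```python
import math

T = 1.0  # period of the waveform

def f(t):
    """Unit square wave over one period: +1 on (0, T/2), -1 on (-T/2, 0)."""
    return 1.0 if t >= 0 else -1.0

def b(n, samples=100_000):
    """b_n = (2/T) * integral_{-T/2}^{T/2} f(t) sin(n tau t / T) dt."""
    dt = T / samples
    total = 0.0
    for k in range(samples):
        t = -T / 2 + (k + 0.5) * dt  # midpoint rule
        total += f(t) * math.sin(math.tau * n * t / T) * dt
    return 2.0 / T * total

# Odd harmonics come out to 4/(n*pi); even harmonics are (numerically) zero,
# matching the pure tones in the square wave animation.
print(b(1), b(2), b(3))
```

Note that `math.tau` is 2π, matching the circle-constant convention used throughout this post.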

This formula that I’ve shown here is how synthesizers reproduce instrument sounds without having to record the instrument first. If you know all the amplitudes bn for a given instrument, you can store that information on the synthesizer and produce pure tones that, when combined, sound like that instrument. To be completely general, though, this sequence of pure tones, also known as a Fourier series, also includes cosine waves as well. This allows the function to be displaced by any arbitrary amount, or, to put it another way, accounts for phase shifts in the waveform. In general,

\begin{aligned} f(t) &= \dfrac{a_0}{2} + \displaystyle\sum_{n=1}^{\infty} a_n \cos\left(\dfrac{n\tau t}{T}\right) + b_n \sin\left(\dfrac{n\tau t}{T}\right),\\[8pt] a_n &= \dfrac{2}{T} \displaystyle \int_{-T/2}^{T/2} f(t)\cos\left(\dfrac{n\tau t}{T}\right)\, dt,\\[8pt] b_n &= \dfrac{2}{T} \displaystyle \int_{-T/2}^{T/2} f(t)\sin\left(\dfrac{n\tau t}{T}\right)\, dt, \end{aligned}

or, using Euler’s identity,

\begin{aligned} f(t) &= \displaystyle\sum_{n=-\infty}^{\infty} c_n e^{in\tau t/T},\\[8pt] c_n &= \dfrac{1}{T} \displaystyle \int_{-T/2}^{T/2} f(t)e^{-in\tau t/T}\, dt. \end{aligned}

The collection of these coefficients is known as the waveform’s frequency spectrum. To show this in practice, here’s a waveform I recorded of me playing an A (440 Hz) on my trumpet and its Fourier series amplitudes,

Each bar in the cn graph is a harmonic of 440 Hz, and the amplitudes are on the same scale for the waveform and its frequency spectrum. For a trumpet, all harmonics are present (even if they’re really weak). I admittedly did clean up the Fourier spectrum to get rid of noise around the main peaks to simplify the image a little bit, but know that for real waveforms the Fourier spectrum does have “leakage” outside of the harmonics (though the contribution is much smaller than the main peaks). The first peak is the fundamental, or 440 Hz, followed by an 880 Hz peak, then a 1320 Hz peak, a 1760 Hz peak, and so on. The majority of the spectrum is concentrated in these four harmonics, with the higher harmonics barely contributing. I also made images of the Fourier series of a square wave and a triangle wave for the curious. Note the difference in these spectra from each other and from the trumpet series. The square wave and triangle wave only possess odd harmonics, which is why their spectra look more sparse.

One of the best analogies I’ve seen for the Fourier series is that it is a recipe, and the “meal” that it helps you cook up is the waveform you want to produce. The ingredients are pure tones — sine waves — and the instructions are to do the integrals shown above. More importantly, the Fourier coefficients give us a means to extract the recipe from the meal, something that, in the realm of food, is rather difficult to do, but in signal processing is quite elegant. This is one of the coolest mathematical operations I’ve ever learned about, and I keep revisiting it over and over again because it’s so enticing!

Now, this is all awesome math that has wide applications to many areas of physics and engineering, but it has all been a setup for what I really wanted to showcase. Suppose I have a function that isn’t periodic. I want to produce that function, but I still can only produce pure tones. How do we achieve that goal?

Let’s say we’re trying to produce a square pulse.

One thing we could do is start with a square wave, but make the valleys larger to space out the peaks.

As we do this, the peaks become more isolated, but we still have a repeating waveform, so our Fourier series trick still works. Effectively, we’re lengthening the period T of the waveform without stretching it. Lengthening T causes the fundamental frequency \nu_1 to approach 0, which adds more harmonics to the Fourier series. We don’t want \nu_1 to be zero, though, because then n\nu_1 will always be zero, and our Fourier series will no longer work. What we want is to take the limit as T approaches infinity and look at what happens to our Fourier series equations. To make things a bit less complicated, let’s look at what happens to the cn treatment. Let’s reassign some values,

n/T \to \nu_n,\quad 1/T \to \Delta\nu,\\[8pt] \begin{aligned} \Rightarrow f(t) &= \displaystyle\sum_{n=-\infty}^{\infty} c_n e^{i\tau\nu_n t},\\[8pt] c_n &= \Delta \nu \displaystyle \int_{-T/2}^{T/2} f(t)e^{-i\tau\nu_n t}\, dt. \end{aligned}

Here, \nu_n are the harmonic frequencies in our Fourier series, and \Delta \nu is the spacing between harmonics, which is equal for the whole series. Substituting the integral definition of cn into the sum for f(t) yields

\begin{aligned} f(t) &= \displaystyle\sum_{n=-\infty}^{\infty} e^{i\tau\nu_n t}\Delta \nu \displaystyle \int_{-T/2}^{T/2} f(t')e^{-i\tau\nu_n t'}\, dt'\\[8pt] f(t) &= \displaystyle\sum_{n=-\infty}^{\infty} F(\nu_n) e^{i\tau\nu_n t}\Delta \nu,\\[8pt] \end{aligned}


F(\nu_n) = \dfrac{c_n}{\Delta\nu} = \displaystyle\int_{-T/2}^{T/2} f(t')e^{-i\tau\nu_n t'}\, dt'.

The reason for the t' variable is to distinguish the dummy integration variable from the time variable in f(t). Now all that’s left to do is take the limit of the two expressions as T goes to infinity. In this limit, the \nu_n smear into a continuum of frequencies rather than a discrete set of harmonics, the sum over frequencies becomes an integral, and \Delta\nu becomes an infinitesimal, d\nu . Putting this together, we arrive at the equations

\begin{aligned} F(\nu) &= \displaystyle\int_{-\infty}^{\infty} f(t)e^{-i\tau\nu t}\, dt,\\[8pt] f(t) &= \displaystyle\int_{-\infty}^{\infty} F(\nu)e^{i\tau\nu t}\, d\nu. \end{aligned}

These equations are the Fourier transform and its inverse. The first takes a waveform in the time domain and breaks it down into a continuum of frequencies, and the second returns us to the time domain from the frequency spectrum. Giving the square pulse a width equal to a, a height of unity, and plugging it into the Fourier transform, we find that

\begin{aligned} F(\nu) &= \displaystyle\int_{-\infty}^{\infty} f(t)e^{-i\tau\nu t}\, dt\\[8pt] &= \displaystyle\int_{-a/2}^{a/2} e^{-i\tau\nu t}\, dt\\[8pt] &= \left . -\dfrac{e^{-i\tau\nu t}}{i\tau\nu}\right|_{t=-a/2}^{a/2} = \dfrac{e^{ia\tau\nu/2}-e^{-ia\tau\nu/2}}{i\tau\nu}\\[8pt] F(\nu) &= \dfrac{\sin(a\tau\nu/2)}{\tau\nu/2} = a\, {\rm sinc}(a\tau\nu/2). \end{aligned}

Or, graphically,

This is one of the first Fourier transform pairs that students encounter, since the integral is both doable and relatively straightforward (if you’re comfortable with complex functions). This pair is quite important in signal processing since, if you reverse the domains of each function, the square pulse represents an ideal low-pass frequency filter: it passes all frequencies below a cutoff and blocks everything above it. To build such a filter, you want an electrical component whose time-domain response reflects the sinc function on the right. (I swapped the domains here for the purposes of doing the easier transform first, but the process is perfectly reversible.)
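The closed form above can be sanity-checked by brute force. A minimal sketch (midpoint-rule integration; the pulse width a = 1 is my arbitrary choice):

```python
import cmath
import math

def ft_square_pulse(nu, a=1.0, samples=20_000):
    """Numerically integrate e^{-i tau nu t} over the pulse [-a/2, a/2]."""
    dt = a / samples
    total = 0.0 + 0.0j
    for k in range(samples):
        t = -a / 2 + (k + 0.5) * dt  # midpoint rule across the pulse
        total += cmath.exp(-1j * math.tau * nu * t) * dt
    return total

def sinc_form(nu, a=1.0):
    """The closed form a * sinc(a tau nu / 2), with sinc(x) = sin(x)/x."""
    x = a * math.tau * nu / 2
    return a if x == 0 else a * math.sin(x) / x
```

The numeric integral comes out purely real (the imaginary parts cancel by symmetry, since the pulse is even) and agrees with the sinc expression at every frequency you test.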

Let’s look at the triangular pulse and its Fourier transform,

If you think the frequency domain looks similar to that of the square pulse, you’re on the right track! The frequency spectrum of the triangular pulse is actually the sinc function squared, but the integral is not so straightforward to do.

And now, for probably the most enlightening example, the Gaussian bell-shaped curve,

The Fourier transform of a Gaussian function is itself, albeit with a different width and height. In fact, the Gaussian function is part of a family of functions which have themselves as their Fourier transform. But that’s not the coolest thing here. What is shown above is that a broad Gaussian function has a narrow range of frequencies composing it. The inverse is also true; a narrow Gaussian peak is made up of a broad range of frequencies. This has applications to laser operation, the limit of Internet download speeds, and even instrument tuning, and is also true of the other Fourier transform pairs I’ve shown here. More importantly, though, this relationship is connected to a much deeper aspect of physics. That a localized signal has a broad frequency makeup and vice versa is at the heart of the Uncertainty Principle, which I’ve discussed previously. As I mentioned before, the Uncertainty Principle is, at its core, a consequence of wave physics, so it should be no surprise that it shows up here as well. However, this made the Uncertainty Principle visceral for me; it’s built into the Fourier transform relations! It also turns out that, in the same way that time and frequency are domains related by the Fourier transform, so too are position and momentum:

\begin{aligned} \phi(p) &= \dfrac{1}{\sqrt{\tau\hbar}}\displaystyle\int_{-\infty}^{\infty} \psi(x)e^{-ipx/\hbar}\, dx,\\[8pt] \psi(x) &= \dfrac{1}{\sqrt{\tau\hbar}}\displaystyle\int_{-\infty}^{\infty} \phi(p)e^{ipx/\hbar}\, dp. \end{aligned}

Here, \psi(x) is the spatial wavefunction, and \phi(p) is the momentum-domain wavefunction.
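The Gaussian pair discussed above can also be verified by brute force. This sketch (my own normalization choices: a time-domain Gaussian exp(−t²/2σ²), integrated by midpoint rule over ±10σ) confirms both that the transform is again a Gaussian and that its frequency width goes like 1/σ, the reciprocal relationship behind the uncertainty principle:

```python
import cmath
import math

def ft_gaussian(nu, sigma, samples=40_000, halfspan=10.0):
    """Brute-force Fourier transform of exp(-t^2 / (2 sigma^2))."""
    dt = 2 * halfspan * sigma / samples
    total = 0.0 + 0.0j
    for k in range(samples):
        t = -halfspan * sigma + (k + 0.5) * dt
        total += (math.exp(-t * t / (2 * sigma ** 2))
                  * cmath.exp(-1j * math.tau * nu * t) * dt)
    return total.real  # imaginary part cancels by symmetry

def gaussian_pair(nu, sigma):
    """Closed form: a Gaussian again, with frequency width ~ 1/sigma."""
    return sigma * math.sqrt(math.tau) * math.exp(-(math.tau * sigma * nu) ** 2 / 2)
```

Doubling σ makes the time-domain Gaussian twice as broad and the frequency-domain Gaussian twice as narrow: a spread-out signal has a tightly concentrated spectrum, and vice versa.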

Whew! That was a long one, but I hope I’ve done justice to one of the coolest — and my personal favorite — equations in mathematics.

UPDATE: In the above figures where I graph cn vs. frequency, the label should really be |cn|, since in general cn are complex numbers.


Equation of the Day #10: Golden Pentagrams

Ah, the pentagram, a shape associated with a variety of different ideas, some holy, some less savory. But to me, it’s a golden figure, and not just because of how I chose to render it here. The pentagram has a connection with a number known as the golden ratio, which is defined as

\begin{aligned} \phi &= \dfrac{a}{b} = \dfrac{a+b}{a} \text{ for } a>b\\[8pt] &= \dfrac{1+\sqrt{5}}{2} \approx 1.618\ldots \end{aligned}

This number is tied to the Fibonacci sequence and the Lucas numbers and seems to crop up a lot in nature (although how much it crops up is disputed). It turns out that the various line segments present in the pentagram are in golden ratio with one another.
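As a tiny illustration of the Fibonacci connection (a standard fact, not something unique to the pentagram), ratios of consecutive Fibonacci numbers converge to the golden ratio:

```python
# Ratios of consecutive Fibonacci numbers approach the golden ratio.
phi = (1 + 5 ** 0.5) / 2

a, b = 1, 1
ratios = []
for _ in range(30):
    a, b = b, a + b
    ratios.append(b / a)

# Early ratios (2/1, 3/2, 5/3, ...) are rough; later ones agree with
# phi to many decimal places.
print(ratios[-1])
```

The same convergence holds for the Lucas numbers, which satisfy the identical recurrence with different starting values.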

In the image above, the ratio of red:green = green:blue = blue:black is the golden ratio. The reason for this is not immediately obvious and requires a bit of digging, but the proof is fairly straightforward and boils down to a simple statement.

First, let’s consider the pentagon at the center of the pentagram. What is the angle at each corner of a pentagon? There’s a clever way to deduce this. It’s not quite clear what the interior angle is (that is, the angle on the inside of the shape at an individual corner), but it’s quite easy to get the exterior angle.

The exterior angle of the pentagon (which is the angle of the base of the triangles that form the points of the pentagram) is equal to 1/5 of a complete revolution around the circle, or 72°. For the moment, let’s call this angle 2θ. To get the angle that forms the points of the pentagram, we need to invoke the fact that the sum of all angles in a triangle must equal 180°. Thus, the angle at the top is 180° – 72° – 72° = 36°. This angle I will call θ. While I’m at it, I’m going to label the sides of the triangle x and s (the blue and black line segments from earlier, respectively).

We’re nearly there! We just have one more angle to determine, and that’s the first angle I mentioned – the interior angle of the pentagon. Well, we know that the interior angle added to the exterior angle must be 180°, since the angles both lie on a straight line, so the interior angle is 180° − 72° = 108° = 3θ. Combining the pentagon and the triangle, we obtain the following picture.

Now you can probably tell why I labeled the angles the way I did; they are all multiples of 36°. What we want to show is that the ratio x/s is the golden ratio. Looking at the obtuse triangle inside the pentagon, the ratio of x/s is equal to the long side of the triangle divided by one of the short sides. This smaller triangle (with sides s, s, and x) is similar to the larger obtuse triangle extending to the point at the upper right (with sides x, x, and x + s), which we can tell by seeing that the three angles in each triangle are equal. That means the ratio of sides in the larger obtuse triangle will equal the ratio of sides on the smaller obtuse triangle. Taking these ratios and setting them equal gives us

\dfrac{x}{s} = \dfrac{x+s}{x}.

But this is just the same equation we used at the top to define the golden ratio with the replacements of a with x and b with s. Therefore, x/s is indeed the golden ratio! Huzzah!
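The result is easy to confirm numerically with the law of sines (the same approach as the algebraic argument in the edit appended below):

```python
import math

theta = math.radians(36)          # the pentagram's point angle
phi = (1 + math.sqrt(5)) / 2      # the golden ratio

# Law of sines on the golden gnomon: x/s = sin(2*theta)/sin(theta).
x_over_s = math.sin(2 * theta) / math.sin(theta)

print(x_over_s)  # matches phi = 1.618...
```

The ratio also satisfies the defining relation x/s = (x + s)/x to machine precision, which is exactly the golden ratio equation from the top of the post.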

The reason the pentagram and pentagon are so closely tied to the golden ratio has to do with the fact that the angles they contain are multiples of the same angle, 36°, or one-tenth of a full rotation of the circle. Additionally, since the regular dodecahedron (d12) has pentagonal faces and the regular icosahedron (d20) has five triangles meeting in a pentagonal arrangement at each vertex, the golden ratio abounds in them as well.

As a fun bonus fact, the two isosceles triangles are known as the golden triangle (all acute angles) and the golden gnomon (obtuse triangle), and are the two unique isosceles triangles whose sides are in golden ratio with one another.

So the next time you see the star on a Christmas tree, the rank of a military officer, or the geocentric orbit of Venus, think of the number that lurks within those five-pointed shapes.


Edit (16-Feb-2021): I’ve updated the logic at the end to be purely geometric. The original algebraic argument is pasted below.

By invoking the law of sines on the two isosceles triangles in the image above, we can show that

\dfrac{x}{s} = \dfrac{\sin 2\theta}{\sin\theta} = \dfrac{\sin 3\theta}{\sin\theta}

This equation just simplifies to sin 2θ = sin 3θ. With some useful trigonometric identities, we get a quadratic equation which we can solve for cos θ.

4\cos^2\theta - 2\cos\theta - 1 =0

Solving this equation with the quadratic formula yields

2\cos\theta = \dfrac{\sin 2\theta}{\sin\theta} = \phi,

which, when taken together with the equation for x/s, shows that x/s is indeed the golden ratio! Huzzah!

Equation of the Day #9: The Uncertainty Principle

The Uncertainty Principle is one of the trickiest concepts for people learning quantum physics to wrap their heads around. In words, the Uncertainty Principle says “you cannot simultaneously measure the position and the momentum of a particle to arbitrary precision.” In equation form, it looks like this:

\Delta x \Delta p \ge \dfrac{\hbar}{2},

where \Delta x is the uncertainty of a measurement of a particle’s position, \Delta p is the uncertainty associated with its measured momentum, and ħ is the reduced Planck constant. What this equation says is that the product of these two uncertainties has to be greater than some constant. This has nothing to do with the tools with which we measure particles; this is a fundamental statement about the way our universe behaves. Fortunately, this uncertainty product is very small, since

\hbar \approx 0.000000000000000000000000000000000105457 \text{ J s}.

The real question to ask is, “Why do particles have this uncertainty associated with them in the first place? Where does it come from?” Interestingly, it comes from wave theory.

Take the two waves above. The one on top is very localized, meaning its position is well-defined. But what is its wavelength? For photons and other quantum objects, wavelength (λ) determines momentum,

p = \dfrac{h}{\lambda},

so here we see a localized wave doesn’t really have a well-defined wavelength, and thus an ill-defined momentum. In fact, the wavelength of this pulse is smeared over a continuous spectrum of momenta (much like how the “color” of white light is smeared over the colors of the rainbow). The second wave has a pretty well-defined wavelength, but where is it? It’s not really localized, so you could say it lies smeared over a set of points, but it isn’t really in one place. This is the heart of the uncertainty principle.

So why does this apply to particles? After all, particles aren’t waves. However, at the quantum level, objects no longer fit into either category. So particles have wavelike properties and waves have particle-like properties. In fact, from here on I will refer to “particles” as “quantum objects” to rid ourselves of this confusing nomenclature. So, because waves exhibit this phenomenon – and quantum objects have wavelike properties – quantum objects also have an uncertainty principle associated with them.

However, this is arguably not the most bizarre thing about the uncertainty principle. There is another facet of the uncertainty principle that says that the shorter the lifetime of a quantum object (how long the object exists before it decays), the less you can know about its energy. Since mass and energy are equivalent via Einstein’s E = mc2, this means that objects that exist for very short times don’t have a well-defined mass. It also means that, if you pulse a laser over a short enough time, the light that comes out will not have a well-defined energy, which means that it will have a spread of colors (our eyes can’t see this spread, of course, but it means a big deal when you want to use very precise wavelengths of light in your experiment and short pulses at the same time). In my graduate research, we used this so-called “energy-time” uncertainty to determine whether certain configurations of the hydrogen molecule, H2, are long-lived or short-lived; the longer-lived states exhibit sharper spectral lines, indicating a more well-defined energy, and the short-lived states exhibit wider spectral lines, indicating a less well-defined energy.

So while we can’t simultaneously measure the position and momentum of an object to arbitrary certainty, we can definitely still use the uncertainty principle to glean information about the world of the very, very small.


Equation of the Day #8: Absolute Zero and Negative Temperatures

If you’re reading this indoors, the room you are currently sitting in is probably around 20°C, or 68°F (within reasonable error, since different people like their rooms warmer or colder or have no control over the temperature of the room they’re reading this entry in). But what does it mean to be at a certain temperature? Well, we often define temperature as an average of the movement of an ensemble of constituent particles – usually atoms or molecules. For instance, the temperature of a gas in a room is given as a relation to the gas’ rms molecular speed:

v_{\rm rms} = \sqrt{\langle v^2\rangle} = \sqrt{\dfrac{3kT}{m}},

where T is the absolute temperature (e.g. Kelvin scale), m is the mass per particle making up the gas, k is the Boltzmann constant, and the angular brackets mean “take the average of the enclosed quantity.” For reference, room temperature nitrogen (which makes up 78% of the atmosphere) has an rms speed of half a kilometer (one third of a mile) per second. But this definition is a specific case. In general, we need a more encompassing definition. There is a quantity that arises in thermodynamics known as entropy, which basically quantifies the disorder of a system. It is related to the number of ways to arrange the elements of a system without changing the energy.
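The half-a-kilometer-per-second figure for room-temperature nitrogen is easy to reproduce. A minimal sketch, using the exact SI values of the constants and the standard molar mass of N2 (about 28.0 g/mol):

```python
import math

k_B = 1.380649e-23    # Boltzmann constant, J/K (exact in the 2019 SI)
N_A = 6.02214076e23   # Avogadro constant, 1/mol (exact in the 2019 SI)

def v_rms(T, molar_mass):
    """Root-mean-square speed sqrt(3 k T / m) for one molecule.

    T in kelvin, molar_mass in kg/mol.
    """
    m = molar_mass / N_A  # mass per molecule, kg
    return math.sqrt(3 * k_B * T / m)

# Nitrogen (N2, ~0.0280 kg/mol) at room temperature, 293 K:
print(v_rms(293, 0.0280))  # about 510 m/s -- half a kilometer per second
```

Swapping in lighter molecules (hydrogen, helium) at the same temperature gives proportionally higher speeds, since v_rms scales as 1/√m.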

For instance, there are a lot of ways of having a messy room. You can have clothes on the floor, you can track mud into it, you can leave dishes and food everywhere. But there are very few ways to have an immaculately clean room, where everything is tidy and put in its proper place. Thus, the messy room has a larger entropy, while the clean room has very low entropy. It is this quantity that helps to define temperature generally. Denoting entropy as S, the more robust definition is

T \equiv \left( \dfrac{\partial E}{\partial S} \right)_V = \left( \dfrac{\partial H}{\partial S} \right)_P,

or, in words, temperature is defined as the change in energy divided by the corresponding change in entropy of something with fixed volume, which is equivalent to the change in enthalpy (heat content) divided by the change in entropy at a fixed pressure. Thus, if you increase the energy of an object and find that it becomes more disordered, the temperature is positive. This is what we are used to. When you heat up air, it becomes more disorderly because the particles making it up are moving faster and more randomly, so it makes sense that the temperature must be positive. If you cool air, the particles making it up slow down and it tends to become more orderly, so the temperature is still positive, but decreasing. What happens when you can’t pull any more energy out of the air? Well, that means that the temperature has gone to zero, and movement has stopped. Since the movement has stopped, the gas must be in a very ordered state, and the entropy isn’t changing. This state, in which all molecular motion has stopped, is called absolute zero.

It is impossible to reach absolute zero temperature, but it isn’t intuitive as to why at first. The main reason is due to quantum mechanics. If all atomic motion of an object stopped, its momentum would be known exactly, and this violates the Uncertainty Principle. But there is also another reason. In thermodynamics, there is a quantity related to temperature that is defined as

\beta = \dfrac{1}{kT}.

Since k is just a constant, β can be thought of as inverse temperature. This sends absolute zero to β being infinity! Now, this makes much more sense as to why achieving absolute zero is impossible – it means we have to make a quantity go to infinity! It turns out that β is the more fundamental quantity to deal with in thermodynamics for this reason (among others).

Now, you’re probably thinking, “Well, that’s all well and good, but, are you saying that this means that you can get to infinite temperature?” In actuality, you can, but you need a special system to be able to do it. To get temperature to infinity, you need β to go to zero. How do we do that? Well, once you cross zero, you end up with a negative quantity, so if we could somehow get a negative temperature, then we would have to cross β equals zero. But how do we get a negative temperature, and what would that be like? Well, we would need entropy to decrease when energy is added to our system.

It turns out that an ensemble of magnets in an external magnetic field would do the trick. See, when a compass is placed in a magnetic field, it wants to align with the field (call that direction north). But if I put some energy into the system (i.e. I push the needle), I can get the needle of the compass to point in the opposite direction (south). When less than half of the compasses are pointing opposite the external field, each time I flip a compass needle I’m increasing entropy (since the perfect order of all the compasses pointing north has been tampered with). But once more than half of those compasses are pointing south, I am decreasing the disorder of the system when I flip another magnet south! This means that the temperature must be negative! In practice, the compasses are actually molecules with an electric dipole moment or electrons with a certain spin (which act like magnets), but the same principles apply. So, β equals zero when exactly half of the compasses are pointing north and the other half are pointing south, which is when T is infinite, and it is at this infinity that the sign on T swaps.
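The compass-needle picture can be made quantitative with a toy two-level model (my own illustrative numbers: N needles, unit energy per flip, k = 1). Counting the arrangements gives the entropy, and a finite difference stands in for dS/dE:

```python
import math

N = 100      # number of compass needles (two-level systems)
eps = 1.0    # energy to flip one needle from north to south (arbitrary units)
k = 1.0      # Boltzmann constant (arbitrary units)

def entropy(n_south):
    """S = k ln(Omega), where Omega counts which needles point south."""
    return k * math.log(math.comb(N, n_south))

def beta(n_south):
    """beta = (1/k) dS/dE, from a centered finite difference in n."""
    dS = entropy(n_south + 1) - entropy(n_south - 1)
    dE = 2 * eps  # flipping from n-1 to n+1 south costs two units
    return dS / (k * dE)

print(beta(10), beta(50), beta(90))
# beta > 0 below half flipped, beta = 0 at exactly half (infinite T),
# beta < 0 above half: a population inversion at negative temperature.
```

The sign change at n = N/2 is exactly the crossing of β = 0 described above: adding energy past the halfway point makes the system more ordered, not less.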

A laser is a more realistic physical system that employs negative temperatures. For lasers to work, atoms or electrons are excited to a higher energy state. When the higher energy state is populated by more than half of the atoms or electrons in the system, a population inversion occurs, which puts the system at a negative temperature in the same way as the compass needles described above.

It’s interesting to note that negative temperatures are actually hotter than any positive temperature, since you have to add energy to get to negative temperature. One could instead plot the quantity −β on a number line, which gives a more intuitive picture: the smaller the value, the colder the object, while the infinities of absolute zero and “absolute hot” are preserved at the two ends.


Equation of the Day #7: E=mc^2 and the Pythagorean Theorem

I doubt I’m the first person to introduce you to either of these equations, but if I am, then you’re in for a treat! The first equation is courtesy of Albert Einstein,

E = mc^2.

Just bask in that simplicity. The constant c is the speed of light, which is a rather large number. In fact, light takes just over a second to travel to Earth from the moon, while the Apollo missions took three days. The fastest we’ve ever sent anything toward the moon was the New Horizons mission to Pluto, and it passed the moon after a little over eight-and-a-half hours. So, light is pretty fast, and therefore c^2 is a gigantic number. The E and m in this equation are energy and mass, respectively. If we ignore the c^2, which acts as a conversion rate, E=mc^2 says that energy and mass are equivalent, and that things with mass have energy as a result of that mass, regardless of what they’re doing or where they are in the universe. This equation is at the heart of radioactive decay, matter/antimatter annihilation, and the processes occurring at the center of the sun. Now, that all may sound foreign, but it’s at work constantly, and we take advantage of it. For instance, positron emission tomography scans, or PET scans, are used to image the inside of the body using a radioactive substance that emits positrons (the antimatter counterpart to the electron). These positrons annihilate with the electrons in your body and release light, which is then detected by a special camera. This information is then used to reconstruct an image of your insides.
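To get a feel for how large that conversion rate is, here’s a quick Python sketch; the one-gram mass is just an illustrative choice:

```python
# Sketch: how much energy E = mc^2 assigns to an everyday mass.
# The speed of light c is the exact SI value.

c = 299_792_458  # speed of light in m/s (exact, by definition)

def rest_energy(mass_kg):
    """Rest energy in joules for a mass in kilograms."""
    return mass_kg * c**2

print(rest_energy(0.001))  # one gram: about 9e13 J, roughly 21 kilotons of TNT
```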

As numerous as the phenomena covered by E=mc^2 are, it’s actually not the full story. After all, objects have energy that isn’t a result of their mass; they can have energy of motion (kinetic energy) and energy due to where they are in the universe (potential energy). Ignoring potential energy for the moment, you may be wondering how to include the kinetic energy in our simple equation. As it turns out, special relativity says that the energy contributions obey the Pythagorean theorem,

a^2 + b^2 = h^2,

which relates the three sides of a triangle whose largest angle is 90°.

(I’ve called the longest side h instead of c to avoid confusion with the speed of light.) In our example, the total energy E is the longest side, and the “legs” are the rest energy (mc^2) and the energy contribution due to momentum, written as pc.

In equation form,

E^2 = (pc)^2 + (mc^2)^2.
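A short Python sketch of this relation, treating the total energy as the hypotenuse of the triangle; the electron mass is the standard value, while the momentum is an arbitrary illustrative number:

```python
# Sketch of E^2 = (pc)^2 + (mc^2)^2: the total energy is the hypotenuse
# of a right triangle whose legs are pc and mc^2.

import math

c = 299_792_458  # speed of light, m/s

def total_energy(p, m):
    """Total energy in joules for momentum p (kg m/s) and mass m (kg)."""
    return math.hypot(p * c, m * c**2)  # Pythagorean hypotenuse

m_e = 9.1093837015e-31  # electron mass, kg
p = 1.0e-22             # an arbitrary momentum, kg m/s

E = total_energy(p, m_e)
print(E >= m_e * c**2)  # True: total energy is never below the rest energy
print(math.isclose(total_energy(0.0, m_e), m_e * c**2))  # True: at rest, E = mc^2
```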

This equation only holds for objects moving at constant velocity, but the geometric relationship between the total energy and its contributions is very simple to grasp when shown as a right triangle. In fact, this isn’t the only place in special relativity where Pythagorean relations pop up! The formulas for time dilation and length contraction, two other phenomena of special relativity, are governed by similar equations. For instance, time dilation follows the triangle relation

(c t_rest)^2 = (c t_mov)^2 + s^2,

where t_mov is the time elapsed according to a moving clock, t_rest is the time read by a clock at rest, and s is the distance that the moving clock has covered over the elapsed time. How can two clocks read different times? I’ll save that question for another day.
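Using the Pythagorean relation for this triangle, (c t_rest)^2 = (c t_mov)^2 + s^2, with s = v t_rest, a quick Python sketch shows the moving clock running slow; the 80%-of-light-speed example is an arbitrary illustration:

```python
# Sketch of the time-dilation triangle: c*t_rest is the hypotenuse, and
# the legs are c*t_mov and the distance s = v*t_rest, which rearranges to
# t_mov = t_rest * sqrt(1 - v^2/c^2).

import math

c = 299_792_458  # speed of light, m/s

def moving_clock_time(t_rest, v):
    """Time elapsed on a clock moving at speed v while t_rest passes at rest."""
    s = v * t_rest  # distance the moving clock covers
    return math.sqrt(t_rest**2 - (s / c) ** 2)

t_rest = 10.0  # seconds on the at-rest clock
v = 0.8 * c    # the moving clock travels at 80% of light speed
print(moving_clock_time(t_rest, v))  # about 6.0 s: the moving clock runs slow
```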