2025-12-12 23:31:14
The expression

√(1 + √(1 + √(1 + ⋯)))

converges to the golden ratio φ. Another way to say this is that the sequence defined by x0 = 1 and

xn = √(1 + xn−1)

for n > 0 converges to φ. This post will be about how it converges.
I wrote a little script to look at the error in approximating φ by xn and noticed that the error is about three times smaller at each step. Here’s why that observation was correct.
The ratio of the error at one step to the error at the previous step is

(√(1 + x) − φ)/(x − φ)

If x = φ + ε the expression above becomes

1/(2φ) − ε/(8φ³) + ⋯

when you expand as a Taylor series in ε centered at 0. This says the error is multiplied by a factor of about

1/(2φ) ≈ 0.309

at each step. The next term in the Taylor series is approximately −0.03ε, so the exact rate of convergence is slightly faster at first, but essentially the error is multiplied by 0.309 at each iteration.
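The original script isn't shown, but here is a minimal sketch of the kind of check described above: iterate xn = √(1 + xn−1) and print the ratio of successive errors, which settles toward 1/(2φ) ≈ 0.309.

```python
from math import sqrt

phi = (1 + sqrt(5)) / 2   # golden ratio

x = 1.0                   # x0 = 1
prev_err = x - phi
for n in range(1, 8):
    x = sqrt(1 + x)       # x_n = sqrt(1 + x_{n-1})
    err = x - phi
    print(n, err, err / prev_err)  # ratio tends to 1/(2 phi) ~ 0.309
    prev_err = err
```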
The post Golden iteration first appeared on John D. Cook.

2025-12-12 02:37:41
When I was a kid, I suppose sometime in my early teens, I was interested in music theory, but I couldn’t play piano. One time I asked a lady who played piano at our church to play a piece of sheet music for me so I could hear how it sounded. The music was in the key of A, but she played it in A♭. She didn’t say she was going to change the key, but I could tell from looking at her hands that she had.

I was shocked by the audacity of changing the music to be what you wanted it to be rather than playing what was on the page. I was in band, and there you certainly don’t decide unilaterally that you’re going to play in a different key!
In retrospect what the pianist was doing makes sense. Hymns are very often in the key of A♭. One reason is it’s a comfortable key for SATB singing. Another is that if many hymns are in the same key, that makes it easy to go from one directly into another. If a traditional hymn is not in A♭, it’s probably in a key with flats, like B♭ or D♭. (Contemporary church music is often in keys with sharps because guitarists like open strings, which leads to keys like A or E.)
The pianist wasn’t a great musician, but she was good enough. Picking her key was a coping mechanism that worked well. Unless someone in the congregation has perfect pitch, you can change a song from the key of D to the key of D♭ and nobody will know.
There’s something to be said for clever coping mechanisms, especially if they’re declared: “You asked for A. Is it OK if I give you B?” It’s better than saying “Sorry, I can’t help you.”
The post Just change the key first appeared on John D. Cook.

2025-12-10 23:08:20

Say you have a common 6-sided die and need to roll it until the sum of your rolls is at least 6. How many times would you need to roll?
If you had a 20-sided die and you need to roll for a sum of at least 20, would that take more rolls or fewer rolls on average?
According to [1], the expected number of rolls of an n-sided die for the sum of the rolls to be n or more equals

((n + 1)/n)ⁿ⁻¹
So for a 6-sided die, the expected number of rolls is (7/6)⁵ = 2.1614.
For a 20-sided die, the expected number of rolls is (21/20)¹⁹ = 2.5270.
The expected number of rolls is an increasing function of n, and it converges to e.
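The closed form is easy to evaluate directly; a small sketch (the function name expected_rolls is mine):

```python
from math import e

def expected_rolls(n):
    # ((n + 1)/n)**(n - 1): expected rolls of an n-sided die
    # for the running sum to reach n or more
    return ((n + 1) / n) ** (n - 1)

print(expected_rolls(6))          # (7/6)**5
print(expected_rolls(20))         # (21/20)**19
print(expected_rolls(10**6), e)   # approaches e for large n
```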

Here’s a little simulation script for the result above.
from numpy.random import randint

def game(n):
    s, i = 0, 0  # running sum and roll count
    while s < n:
        s += randint(1, n + 1)  # randint's upper bound is exclusive
        i += 1
    return i

print(sum(game(20) for _ in range(100_000)) / 100_000)
This produced 2.5273.
[1] Enrique Treviño. Expected Number of Dice Rolls for the Sum to Reach n. American Mathematical Monthly, Vol 127, No. 3 (March 2020), p. 257.
The post Rolling n-sided dice to get at least n first appeared on John D. Cook.
2025-12-10 22:33:53
There are numerous memes floating around with the words “Being weak is nothing to be ashamed of; staying weak is.” Or some variation. I thought about this meme in the context of weak derivatives.

The last couple posts have talked about distributions, also called generalized functions. The delta function, for example, is not actually a function but a generalized function, a linear functional on a space of test functions.
Distribution theory lets you take derivatives of functions that don’t have a derivative in the classical sense. View the function as a regular distribution, take its derivative as a distribution, and if that derivative is itself a regular distribution, the ordinary function representing it is called a weak derivative of the original function.
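As a concrete example (my illustration, not from the post): |x| has no classical derivative at 0, but sign(x) is its weak derivative. We can check this numerically through the integration-by-parts identity ∫ |x| φ′(x) dx = −∫ sign(x) φ(x) dx for a smooth, rapidly decaying test function φ.

```python
import numpy as np
from scipy.integrate import quad

# a smooth, rapidly decaying test function and its derivative
phi  = lambda x: np.exp(-(x - 1) ** 2)
dphi = lambda x: -2 * (x - 1) * np.exp(-(x - 1) ** 2)

# integration by parts: integral of |x| phi'(x) dx should equal
# -integral of sign(x) phi(x) dx if sign is the weak derivative of |x|
lhs, _ = quad(lambda x: abs(x) * dphi(x), -20, 20, points=[0, 1])
rhs, _ = quad(lambda x: -np.sign(x) * phi(x), -20, 20, points=[0, 1])
print(lhs, rhs)  # the two integrals agree
```

The interval [−20, 20] is wide enough that the Gaussian tails are negligible, and the `points` argument tells quad where the integrands are non-smooth.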
You can use distribution theory to complete a space of functions analogous to how the real numbers complete the rational numbers.
To show that an equation has a rational solution, you might first show that it has a real solution, then show that the real solution is in fact rational. To state the strategy more abstractly: to find a solution in a small space, first look for solutions in a larger space where solutions are easier to find, then check whether the solution you found lies in the smaller space.
This is the modern strategy for studying differential equations. You first show that a differential equation has a solution in a weak sense, then if possible prove a regularity result that shows the solution is a classical solution. There’s no shame in finding a weak solution. But from a classical perspective, there’s shame in stopping there.
2025-12-08 23:00:37
The previous post showed how we can take the Fourier transform of functions that don’t have a Fourier transform in the classical sense.
The classical definition of the Fourier transform of a function f requires the integral of |f| over the real line to be finite. This implies f(x) must approach zero as x goes to ∞ and −∞. A constant function won’t do, and yet we got around that in the previous post. Distribution theory even lets you take the Fourier transform of functions that grow as their arguments go off to infinity, as long as they don’t grow too fast, i.e. like a polynomial but not like an exponential.
In this post we want to take the Fourier transform of functions like sine and cosine. If you read that sentence as saying Fourier series, you have the right instinct for classical analysis: you take the Fourier series of periodic functions, not the Fourier transform. But with distribution theory you can take the Fourier transform, unifying Fourier series and Fourier transforms.
For this post I’ll be defining the classical Fourier transform using the convention

f̂(ω) = (1/2π) ∫ f(x) e^(−iωx) dx

and generalizing this definition to distributions as in the previous post.
With this convention, the Fourier transform of 1 is δ, and the Fourier transform of δ is the constant 1/(2π).
One can show that the Fourier transform of a cosine is a sum of delta functions, and the Fourier transform of a sine is a difference of delta functions.
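Explicitly, writing cosine and sine in terms of complex exponentials, and assuming the convention f̂(ω) = (1/2π) ∫ f(x) e^(−iωx) dx (my reading of the convention this post uses), the transform of e^(iω₀x) is δ(ω − ω₀), so

transform of cos ω₀x = (1/2)[δ(ω − ω₀) + δ(ω + ω₀)]
transform of sin ω₀x = (1/2i)[δ(ω − ω₀) − δ(ω + ω₀)]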
It follows that the Fourier transform of a Fourier series is a sum of delta functions shifted by integers. In fact, if you convert the Fourier series to complex form, the coefficients of the deltas are exactly the Fourier series coefficients.
2025-12-08 20:30:58
Suppose you have a constant function f(x) = c. What is the Fourier transform of f?
We will show why the direct approach doesn’t work, give two hand-wavy approaches, and then give a rigorous definition.
Unfortunately there are multiple conventions for defining the Fourier transform.
For this post, we will define the Fourier transform of a function f to be

f̂(ω) = (1/√(2π)) ∫ f(x) e^(−iωx) dx
If f(x) = c then the integral diverges unless c = 0.
The more concentrated a function is in the time domain, the more it spreads out in the frequency domain. And the more spread out a function is in the time domain, the more concentrated it is in the frequency domain. If you think this sounds like the Heisenberg uncertainty principle, you’re right: there is a connection.
A constant function is as spread out as possible, so it seems that its Fourier transform should be as concentrated as possible, i.e. a delta function. The delta function isn’t literally a function, but it can be made rigorous. More on that below.
The Fourier transform of the Gaussian function exp(−x²/2) is the same function, i.e. the Gaussian function is a fixed point of the Fourier transform. More generally, the Fourier transform of the density function for a normal random variable with standard deviation σ is proportional to the density function for a normal random variable with standard deviation 1/σ.
As σ gets larger, the density becomes flatter. So we could think of our function f(x) = c as some multiple of a Gaussian density in the limit as σ goes to infinity. The Fourier transform is then some multiple of a Gaussian density with σ = 0, i.e. a point mass or delta function.
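The fixed-point claim above can be checked numerically. Here is a minimal sketch (the helper ft is mine) that computes the transform under the (1/√(2π)) ∫ f(x) e^(−iωx) dx convention by splitting the integrand into real and imaginary parts:

```python
import numpy as np
from scipy.integrate import quad

def ft(f, w):
    # \hat{f}(w) = (1/sqrt(2 pi)) * integral of f(x) exp(-i w x) dx
    re, _ = quad(lambda x: f(x) * np.cos(w * x), -np.inf, np.inf)
    im, _ = quad(lambda x: -f(x) * np.sin(w * x), -np.inf, np.inf)
    return (re + 1j * im) / np.sqrt(2 * np.pi)

g = lambda x: np.exp(-x ** 2 / 2)
for w in [0.0, 0.5, 1.0, 2.0]:
    print(w, ft(g, w).real, g(w))  # transform matches the original Gaussian
```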
If f and φ are two well-behaved functions then

∫ f̂(x) φ(x) dx = ∫ f(x) φ̂(x) dx
In other words, we can move the “hat” representing the Fourier transform from one function to the other. The equation above is a theorem when f and φ are nice functions. We can use it to motivate a definition when the function f is not so nice but the function φ is very nice. Specifically, we will assume φ is an infinitely differentiable function that goes to zero at infinity faster than any polynomial.
Given a Lebesgue integrable function f, we can think of f as a linear operator via the map

φ ↦ ∫ f(x) φ(x) dx
More generally, we can define a distribution to be any continuous [1] linear operator from the space of test functions to the complex numbers. A distribution that can be defined by integral as above is called a regular distribution. When we say we’re taking the Fourier transform of the constant function f(x) = c, we’re actually taking the Fourier transform of the regular distribution associated with f. [2]
Not all distributions are regular. The delta “function” δ(x) is a distribution that acts on test functions by evaluating them at 0.
We define the Fourier transform of (the regular distribution associated with) a function f to be the distribution whose action on a test function φ equals the integral of the product of f and the Fourier transform of φ. When a function is Lebesgue integrable, this definition matches the classical definition.
With this definition, we can calculate that the Fourier transform of a constant function c equals

√(2π) c δ
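Here is a short sketch of that calculation, using the hat-moving identity above. The Fourier inversion formula evaluated at 0 says ∫ φ̂(ω) dω = √(2π) φ(0) for any test function φ. So the transform of the constant c acts on φ by

∫ c φ̂(ω) dω = √(2π) c φ(0),

which is exactly the action of √(2π) c δ on φ.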
Note that with a different convention for defining the Fourier transform, you might get 2π c δ or just c δ.
An advantage of the convention that we’re using is that the Fourier transform of the Fourier transform of f(x) is f(−x) and not some multiple of f(−x). This implies that the Fourier transform of √2π δ is 1 and so the Fourier transform of δ is 1/√2π.
[1] To define continuity we need to put a topology on the space of test functions. That’s too much for this post.
[2] The constant function doesn’t have a finite integral, but its product with a test function does because test functions decay rapidly. In fact, even the product of a polynomial with a test function is integrable.
The post Fourier transform of a flat line first appeared on John D. Cook.