2025-11-20 09:03:10
I mentioned in the previous post that the harmonic numbers Hn are never integers for n > 1. In the spirit of that post, I’d like to find the value of n such that Hn is closest to a given integer m.
We have two problems to solve. First, how do we accurately and efficiently compute harmonic numbers? For small n we can directly implement the definition. For large n, the direct approach would be slow and would accumulate floating point error. But in that case we could use the asymptotic approximation
from this post. As is often the case, the direct approach gets worse as n increases, but the asymptotic approximation gets better as n increases. Here γ is the Euler-Mascheroni constant.
The second problem to solve is how to find the value of n so that Hn comes closest to m without trying too many possible values of n? We can discard the higher order terms above and see that n is roughly exp(m − γ).
Here’s the code.
import numpy as np
gamma = 0.57721566490153286
def H(n):
if n < 1000:
return sum([1/k for k in range(1, n+1)])
else:
n = float(n)
return np.log(n) + gamma + 1/(2*n) - 1/(12*n**3)
# return n such that H_n is closest harmonic number to m
def nearest_harmonic_number(m):
if m == 1:
return 1
guess = int(np.exp(m - gamma))
if H(guess) < m:
i = guess
while H(guess) < m: guess += 1 j = guess else: j = guess while H(guess) > m:
guess -= 1
i = guess
x = np.array([abs(H(k) - m) for k in range(i, j+1)])
return i + np.argmin(x)
We can use this, for example, to find the closest harmonic number to 10.
>>> nearest_harmonic_number(10) 12366 >>> H(12366) 9.99996214846655
I wrote the code with integer values of m in mind, but the code works fine with real numbers. For example, we could find the harmonic number closest to √20.
>>> nearest_harmonic_number(20**0.5) 49 >>> H(49)**2 20.063280462918804
2025-11-20 07:48:52
József Kürschák proved in 1908 that the function
is never an integer for 0 < m < n. In particular, the harmonic numbers
are never integers for n > 1.
The function f(m, n) can get arbitrarily close to any integer value by taking m and n large enough, but it can never exactly equal an integer.
For this post, I’d like to look at how close f(m, n) comes to an integer value when 0 < m < n ≤ N for some large N, say N = 100,000.
The most naive way to approach this would be to compute f(m, n) for all m and n and keep track of which values were closest to integers. This would be wasteful since it would recompute the same terms over and over. Instead, we could take advantage of the fact that
Instead of working with f(m, n), it will be convenient to work with just its fractional part
because it won’t hurt to throw away the integer parts as we go. The values of m and n minimizing g(m, n) will be the values for which f(m, n) comes closest to an integer from above. The values of m and n maximizing g(m, n) will be the values for which f(m, n) comes closest to an integer from below.
We could calculate a matrix with all values of g(m, n), but this would take O(N²) memory. Instead, for each n we will calculate g(m, n), save the maximum and minimum values, then overwrite that memory with g(m, n + 1). This approach will only take O(N) memory.
When we compute f(m, n) for large values of n, can we rely on floating point arithmetic?
If N = 100,000, f(m, n) < 16 = 24. A floating point fraction has 53 bits, so we’d expect each addition to be correct to within an error of 2−49 and so we’d expect our total error to be less than 2−49N.
The following code computes the values of g(m, n) closest to 0 and 1.
import numpy as np
N = 100_000
f_m = np.zeros(N+1) # working memory
# best values of m for each n
min_fm = np.zeros(N+1)
max_fm = np.zeros(N+1)
n = 2
f_m[1] = 1.5
for n in range(3, N+1):
f_m[n-1] = 1/(n-1)
f_m[1:n] += 1/n
f_m[1:n] -= np.floor(f_m[1:n])
min_fm[n] = np.min(f_m[1:n])
max_fm[n] = np.max(f_m[1:n])
print(min(min_fm[3:]))
print(max(max_fm))
This reports a minimum value of 5.2841953035454026e-11 and a maximum value of 0.9999999996613634. The minimum value is closer to 0 than our (pessimistic) error estimate, though the maximum value is further from 1 than our error estimate.
Modifying the code a bit shows that the minimum occurs at (27134, 73756), and this the input to g that is within our error estimate. So we can be confident that it is the minimum, though we can’t be confident of its value. So next we turn to Mathematica to find the exact value of g(27133, 73756) as a rational number, a fraction with 32024 digits in the numerator and denominator, and convert it to a floating point number. The result agrees with our estimate in magnitude and to four significant figures.
So in summary
with an error on the order of 10−11, and this is the closest value of f(m, n) to an integer, for 0 < m < n ≤ 100,000.
2025-11-18 21:13:35
Five posts on Pythagorean triangles and Pythagorean triples
2025-11-18 16:22:52
The last couple posts have been about group pairings, specifically Tate pairings as they’re used in cryptography. This post will show that RSA encryption can be seen as a special case of pairing-based cryptography.
The idea comes from Ben Lynn’s 2007 dissertation. Lynn is the “L” in BLS signatures—one of the topics in his dissertations—and in BLS elliptic curves.
A pairing is a bilinear mapping from two groups to a third group
e: G1 × G2 → GT.
Here bilinear means that if P is an element of G1 and Q is an element of G2, and a and b are nonnegative integers, then
e(aP, bQ) = e(P, Q)ab.
There are more criteria for a pairing to be useful in cryptography, but we won’t need those for this post.
Ben Lynn’s dissertation mentions that exponentiation is a special case of pairing if you let G1 and GT be the multiplicative group of the integers mod r and let G2 be the additive group of integers mod (r − 1). Then you can define a pairing by
e(g, a) = ga.
Typically you can’t just write down a simple expression for a pairing, but in this case you can.
RSA encryption corresponds to r = pq where p and q are large primes. The product pq is made public but the factorization into p and q is held secret. A message [1] is encrypted by exponentiation mod n where the exponent is the public key. In Lynn’s notation, the message is g and the public key is a.
The security of RSA encryption depends on the fact that you can’t recover g from ga mod n unless you know a trapdoor, the factorization of n [2]. This is true of pairings more generally: it is not practical to recover the inputs to a pairing from the output unless you know a trapdoor.
[1] In practice, RSA isn’t used to encrypt entire messages. Instead, it is used to encrypt a key for a symmetric encryption algorithm such as AES, and that key is used to encrypt the message. This is done for efficiency.
[2] Or, more specifically, a private key that can easily be computed if you know the factorization of n. It’s conceivable that breaking RSA encryption is easier than factoring, but so far that does not appear to be the case.
The post RSA as a pairing first appeared on John D. Cook.2025-11-18 00:51:24
Given a point P on an elliptic curve E, and a random number a, aP means to add P to itself a times, using the addition on E. The point aP can be computed efficiently, even if a is a very large number [1]. However, if E has a large number of points, and if a is chosen at random from a large range, then it is not practical to compute a given P and aP.
This is the elliptic curve version of the discrete logarithm problem, and its presumed difficulty is the basis of the security of Diffie-Hellman key exchange.
With two-party Diffie-Hellman key exchange, two parties, Alice and Bob, generate random private keys a and b respectively. They agree on a point P on an elliptic curve E. Alice computes aP and sends it to Bob. Simultaneously Bob computes bP and sends it to Alice. Then Alice can compute
a(bP) = (ab)P
and Bob can compute
b(aP) = (ba)P = (ab)P.
Then both Alice and Bob know a shared secret, the point (ab)P on E, but neither party has revealed a private key.
You could extend the approach above to three parties, say adding Carol, but this would require extra communication: Alice could send (ab)P to Carol, which she could multiply by her private key c to obtain abcP. Similarly, everyone else could arrive at abcP. Each person has to do a computation, send and receive a message, do another computation, and send an receive another message.
Joux [2] came up with a way to do Diffie-Hellman key exchange with three people and only one round of sending and receiving messages. The set up uses a pairing e( , ) of two elliptic curve subgroups, G1 and G2, as in the previous post. Fix generators P ∈ G1 and Q ∈ G2. Each party multiplies P and Q by their private key and sends the results to the other two parties.
Alice receives bP from Bob and cQ from Carol. This is enough for her to compute
e(bP, cQ)a = e(P, Q)abc.
Similarly, Bob receives aP from Alice and cQ from Carol, enabling him to compute
e(aP, cQ)b = e(P, Q)abc.
And finally, Carol receives aP from Alice and bQ from Bob, enabling her to compute
e(aP, bQ)c = e(P, Q)abc.
So all three parties can compute the shared secret e(P, Q)abc. but no party knows the other parties’ private keys.
[1] If you want to multiply a point by 2100, for example, you don’t carry out 2100 additions; you carry out 100 doublings. Of course not every positive integer is a power of 2, but every positive integer is the sum of powers of 2, i.e. it can be written in binary. So as you’re doing your doublings, sum the terms that correspond to 1s in the binary representation of the number you’re multiplying by.
[2] Antoine Joux. A One Round Protocol for Tripartite Diffie–Hellman. Journal of Cryptology (2004) 17: 263–276.
The post Three-party Diffie-Hellman in one shot first appeared on John D. Cook.2025-11-17 04:27:09
Pairings can mean a variety of related things in group theory, but for our purposes a pairing is a bilinear mapping from two groups to a third group.
e: G1 × G2 → GT
Typically the group operation on G1 and G2 is written addititvely and the group operation on GT is written multiplicatively. In fact, GT will always be the multiplicative group of a finite field, i.e. GT consists of the non-zero elements of a finite field under multiplication. (The “T” stands for “target.”)
Here bilinear [1] means that if P is an element of G1 and Q is an element of G2, and a and b are nonnegative integers,
e(aP, bQ) = e(P, Q)ab.
There are a few provisos …
First, the pairing must be non-degenerate, i.e. e(P, Q) ≠ 1 for some P and Q.
Second, the pairing must be efficiently computable.
Third, the embedding degree must not be “too high.” This means that if GT is the multiplicative group of a field with pk elements, k is not too big. We will look at two examples in which k = 12.
The second and third provisos are important even though they’re not stated rigorously.
Cryptography often speaks of pairing elliptic curves, but in fact it uses pairings of prime-order subgroups of the additive groups of elliptic curves. Because the subgroups have prime order, they are cyclic, and so the pairing is determined by its value on a generator from each subgroup.
The previous post briefly mentioned a pairing between two elliptic curves, BN254 and alt_bn128, that is used in Ethereum and was used in Zcash in the original Sprout shielded protocol.
The elliptic curve BN254 is defined over the field Fp, the integers mod p, where
p = 21888242871839275222246405745257275088696311157297823662689037894645226208583.
and the elliptic curve alt_bn128 is defined over the field Fp[i], i.e. the field Fp, with an imaginary element i adjoined.
Both elliptic curves have a subgroup of order
r = 21888242871839275222246405745257275088548364400416034343698204186575808495617,
which is prime. So in the pairing the groups G1 and G2 are isomorphic to the integers mod r. The target group GT has order p12 − 1 and so the embedding degree k equals 12, and so the embedding degree is “not too high.”
Another example also comes from Ethereum and Zcash. Ethereum uses BN254 in smart contracts, but it uses BLS12-381 in its consensus layer. Zcash switched from BN254 to BLS12-381 in the Sapling release.
BLS12-381 is defined over a prime order field with on the order of 2381 elements and has embedding order 12, hence 12-381. The BLS stands for Paulo Barreto, Ben Lynn, and Michael Scott. Elliptic curve names often look mysterious, but they’re actually pretty descriptive. I discuss BLS12-381 in more detail here. As in the example above, BLS12-381 is defined over a field Fp and is paired with a curve over Fp[i], i.e. the same field with an imaginary element adjoined. The equation for BLS12-381 is
y² = x³ + 4
and the equation for the curve it is paired with is
y² = x³ + 4(1 + i)
As before the target group is the multiplicative group of a finite field of order p12.
[1] You’ll also see bilinearity defined by
e(P + Q, R) = e(P, R) e(Q, R)
and
e(P, R + S) = e(P, R) e(P, S).
These definitions are equivalent. To see that the definition here implies the definition at the top, write out aP as P + P + … + P etc.
Since we’re working in subgroups of prime order, there is a generator for each subgroup. Write out each element as a multiple of a generator, then the definition at the top implies the definition here.
The post Elliptic curve pairings in cryptography first appeared on John D. Cook.