The Irwin-Hall Distribution Is a B-Spline

I added the Irwin-Hall distribution to SciPy in spring 2024 (#20481), released in 1.14.0 as scipy.stats.irwinhall. The implementation is compact because the density function is a cardinal B-spline, and SciPy already has B-spline infrastructure.

Background

The Irwin-Hall distribution with parameter $n$ is the distribution of the sum of $n$ independent $\text{Uniform}(0,1)$ random variables:

$$X = \sum_{k=1}^n U_k$$

It’s supported on $[0, n]$ with mean $n/2$ and variance $n/12$. Irwin and Hall published on it independently in the same 1927 issue of Biometrika.J.O. Irwin, "On the frequency distribution of the means of samples...", and P. Hall, "The distribution of means for samples of size N...", both in Biometrika Vol. 19 No. 3–4, December 1927. The PDF for $x \in [0, n]$ is

$$f_X(x;n) = \frac{1}{(n-1)!}\sum_{k=0}^{\lfloor x \rfloor}(-1)^k\binom{n}{k}(x-k)^{n-1}$$

which is a piecewise polynomial of degree $n-1$ with integer knots. A prior PR from 2019 implemented this formula directly but went stale. I came across the feature request while looking at Rao’s spacing test, which involves the distribution of sums of uniform order statistics.

The B-spline formulation

A cardinal B-spline of degree $d$ with integer knots is defined as the $(d+1)$-fold convolution of $\mathbf{1}_{[0,1]}$ — the uniform density — with itself. The Irwin-Hall PDF is the $n$-fold self-convolution of this same density, so it’s a B-spline of degree $n-1$ with knots at $0, 1, \ldots, n$.

scipy.interpolate.BSpline can construct this, and the CDF and survival function fall out of it:

@staticmethod
def _cardbspl(n):
    t = np.arange(n + 1)
    return BSpline.basis_element(t)

def _pdf(self, x, n):
    return np.vectorize(
        lambda x, n: self._cardbspl(n)(x)
    )(x, n)

def _cdf(self, x, n):
    return np.vectorize(
        lambda x, n: self._cardbspl(n).antiderivative()(x)
    )(x, n)

def _sf(self, x, n):
    return np.vectorize(
        lambda x, n: self._cardbspl(n).antiderivative()(n - x)
    )(x, n)

The CDF is the B-spline’s antiderivative, which BSpline can compute. The SF uses symmetry around $n/2$ — without it, computing 1 - cdf(x) in the upper tail gives zero when cdf(x) is within a ULP of 1.0.Warren Weckesser caught this during review: irwinhall(10).sf(9.9) returns 0.0 without an explicit SF, instead of the correct $2.76 \times 10^{-17}$. The symmetry-based SF avoids the cancellation by evaluating the CDF from the other tail.

Numerical precision

B-spline evaluation via de Boor’s algorithm gives roughly 10 ULPs of precision, degrading with $n$. This is comparable to Boost’s cardinal_b_spline. For large $n$ the Irwin-Hall distribution converges to a normal by the CLT, so in practice you’d switch to a Gaussian approximation before the precision ceiling matters.The old trick of summing 12 uniform draws and subtracting 6 to approximate $\mathcal{N}(0,1)$: 12 is chosen because $n/12 = 1$ makes the variance come out clean. irwinhall(12) gives the exact distribution of that sum. Its excess kurtosis is $-0.1$ — slightly thinner tails than a Gaussian.

Moments

The $m$-th raw moment has a closed form in terms of Stirling numbers of the second kind:

$$E[X^m] = \frac{\left\lbrace{n+m \atop n}\right\rbrace}{\binom{n+m}{n}}$$

I didn’t start with this formula. My first version derived moments from the cumulants (the CGF is $n\log\!\left(\frac{e^t - 1}{t}\right)$, so the cumulants are simple). I removed that code after review because it wasn’t obviously better than the generic numerical approach. Later, rlucas7 pointed out the Stirling identity in a related issue. Since scipy.special.stirling2 already existed, adding it back was a one-liner using exact integer arithmetic.

The first four moments don’t need the Stirling formula since they follow from the uniform’s moments directly: mean $n/2$, variance $n/12$, zero skewness, excess kurtosis $-6/(5n)$. These are hardcoded in _stats.

Other details

Random variate generation is just the definition: generate a $(n, \ldots)$-shaped array of uniforms and sum along the first axis. My first version inherited the generic _rvs; Haberland suggested overriding it.

The Bates distribution (the mean of $n$ IID uniforms) is irwinhall(n, scale=1/n). No separate class.

The fit method raises NotImplementedError because generic MLE doesn’t converge when the shape parameter is constrained to positive integers.There was some back and forth about whether to raise unconditionally or fall back to super().fit when $n$ is fixed. Unconditional was simpler and the consensus.

Parameter validation rejects non-integer $n$ in _argcheck. I initially didn’t know how to do this for a continuous distribution; the pattern is more common among discrete distributions. Haberland pointed me to the right examples.

Review

The PR was open for about five weeks. Matt Haberland was the primary reviewer, with later input from Warren Weckesser and steppi. Most of the iteration was about SciPy conventions — getting vectorization to work for arbitrary array shapes, and sorting out Sphinx formatting and the fit question. My first cut only handled 1D arrays, and the reference formatting was fiddly. Weckesser raised the point of requiring tighter per-distribution numerical tests. Haberland pushed back: many existing distributions in scipy.stats have the same tail-precision problems, and holding new additions to stricter undocumented standards doesn’t make sense. That infrastructure issue is tracked in #17832.

The naming question — irwinhall vs uniformsum (what Mathematica uses) — came up in the required discussion forum post. The statistics literature uses the eponymous name and that’s what stuck.

PR #20481 · docs