In recent years, certain journals have published
history-of-astronomy research papers that attempt to find the reliability
of data-fits for two-unknown Gaussian cases.
None of the papers' authors has understood
how to solve Gaussian bivariate problems analytically
— or, indeed, by any means besides monovariate testing,
trial&error, or guesswork.
So it will be of assistance, for researchers without extensive scientific experience, to delineate here a simple path to solving two-unknown statistical problems.
[DIO has occasional experience in this area. Our non-metric photogrammetry on R.Peary's 1909 “North Pole” photos was an analytic least-squares simultaneous solution for 22 unknowns. See Scientific American 1990 June's coverage of our photogrammetry; or DIO 1.1 ‡4 [p.29].]
Likewise for 1975&2002
mis-attempts to solve the famous Hipparchos-Strabo
“klimata” data, the correct solution to which had already
in 1934 been discovered&proven
by the late eminent philologist Aubrey Diller (Indiana University).
So, below, we will use
this modern klimata controversy as a useful mathematical illustration
of how to apply bivariate statistics with felicitous ease.
[From antiquity to the Renaissance, scientists defined a “klima” (ancestor to the word “climate”) as a latitude L where the longest day M was some discrete value. Ancient tables of klimata (e.g., Almajest 2.6) were usually computed at convenient intervals, such as every quarter-hour of M. Klimata-circles are visible on numerous globes well into the Renaissance.]
We will now lay out a direct method that requires no knowledge of
how to compute a least-squares problem analytically.
And we start by supplying some upbeat news for the uninitiated: computing a two-unknown probability P is in some respects easier than for any other number of unknowns.
It helps that when computing
probability P from probability density pd
for even numbers of unknowns, we do not
need to use the difficult error-function.
Here, throughout, we compute P as
the cumulative probability
exterior to the locus of points with the same pd as
the point of interest
— exactly as is routinely done
for 1-dimensional statistical analysis.
Most textbooks give a formula for the bivariate probability density pd that is not only messy but requires specialized analytic investigation to determine the inputs: the two unknowns' standard deviations and their correlation. So some (especially historians & classicists with non-extensive math backgrounds) may find it a pleasant surprise to encounter a clear, accurate, & efficient method for finding P while avoiding all that bother & clutter.
We have a set of N values of a variable j for specified values of an independent variable t, and there is a proposed function f(x,y;t) that is intended to fit the values of j as closely as possible — for the best choices of two unknowns: x&y.
We compute function f 's value corresponding to each of the N values of t for which values of j are available. Each difference between a datum j and the computed function f(x,y;t) [which is supposed ideally to equal j] is called a “residual” Δ, with sign-convention such that Δ = j − f. The probability of any given pair of values of x&y is measured by the sum S of the residuals-squared:
At the x&y where S is minimized (values which could be found by trial without sophisticated analysis, as already noted), a subscript m is appended to x&y and to S.
Our quick&undirty exact solution-for-amateurs will start with a trivial calculation, finding the unknowns' normalized deviations from their best fitted values, via the relative Difference D between the sum S at the point of interest (x,y) vs that (Sm) at S's minimum (best-fit):
Then our solution for P is simply:
where (again) N is just the number of data.
That's all. It's that simple. And it's as accurate as any other method.
[It should be noted that all standard procedures must find the foregoing quantities (S & its minimum, etc) anyway, en route to their more elaborate solutions — e.g., finding standard deviations σ, correlation(s), pd, and so on. So if finding P is the goal (and it is obviously the main one), our equation makes P easily accessible at the outset, obviating any necessity for getting into complexities beyond. (The main advantage of sophisticated least-squares analysis is that, without trial&error's tedium, it will discover xm, ym, Sm, P, and all σs & ρs.)]
An example always helps. So let us examine
well-known case of the
data which are discussed and explanatorially tabulated in
DIO 16 
‡3 (Tables 1&2 [pp.20-21]).
These data Aubrey Diller discovered in 1934 were anciently
calculated via spherical trigonometry
for Earth-obliquity ε = 23°2/3.
Yet, in one of the statistically inelegant papers cited
it is proposed
(Journal for the History of Astronomy 33:15-19 )
[a] that Diller's obliquity ε should have been not 23°2/3 but 23°51'20", and
[b] that a hitherto-undetected constant A might have been added in antiquity to all Hipparchan klimata.
In the following probe of these proposals, we will cooperatively treat ε as an unknown and (to test the JHA's 2nd speculation) add another unknown A to the anciently standard sph trig equation for latitude L (in degrees) as a function of longest-day M (in hours), thus:
Comparing this function to the Hipparchos-Strabo klimata data:
we seek [a] its probability, also
[b] the best-fit choice
of ε&A, as well as
[c] a way of gauging
the probability of all other ε&A pairs,
Here, L, M, ε, & A take the respective rôles of j, t, x, & y in our earlier explanatory paragraph. The data L are the latitude values given by Strabo, for 13 distinct klimata M, ranging from 12h3/4 to 19h.
A table is provided (below) of the Hipparchos-Strabo klimata latitude L
data (in stades, consistently reported by Strabo in hundreds of stades),
adjacent to competing theories' computed L
(rounded here to 1 stade) and their residuals Δ,
with sign-convention Δ = Hipparchos-minus-calculation.
The last rows of the table display residual-sum S,
normalized r (see below),
and probability P.
[A long-orthodox but poorly-fitting 1975 Princetitute solution (Neugebauer Hist Ancient Math Astron 1975 pp.305-306 & 334) is also tabulated at far right, though it was rejected even by JHA 2002.]
|Klima||M||Hipp L||best-fit L||best-fit Δ||Diller L||Diller Δ||DIO L||DIO Δ||JHA L||JHA Δ||Princttt L||Princttt Δ|
One easily sees that the column which perfectly matches Strabo's figures (to the 100-stade precision of his reportage) is the DIO one, which realizes (e.g., DIO 4.2  p.55 n.6) that each L in degrees had previously (before conversion to stades at 700 /degree) been rounded to the nearest 1°/12 (5'), according to ancient geographical convention for klimata (Geographical Directory 1.23; c.160 AD).
Analysis or testing on our table's data finds S is minimized at
ε = 23°37'.60
(primes signify arcmin = 60ths of degrees)
& A = −2'.44 (−28 stades,
at Eratosthenes-Hipparchos-Almajest scale
of 1° = 700 stades); minimum Sm
= 11279 square stades or
82.86 square arcmin.
As noted above, JHA 2002
argues that Diller's values and data are both invalid, contending that
ε ought to be 23°51'20" (the Eratosthenes-Ptolemy value) and that
A looks like it ought to be +100 stades or 8'.57.
[JHA 2002 rejects Diller yet implicitly assents to Diller's central contention (that Hipparchos used spherical trig) when he says (ibid p.17) “one may well suspect that one or two modest changes in the intervals, through either scribal error or deliberate tampering, could have introduced systematic errors which would affect the value of the obliquity best fitting the data.” But JHA 2002 approaches the problem of solving for the speculated tampering by adducing Hipparchan data exterior to the Strabo data-set, rather than searching by math analysis, as we do here.]
We next test JHA 2002's proposed values, as well as Diller's, by our compact least-squares method.
For each Strabo M, one compares the computed L (found by substituting JHA 2001's ε&A into the above formula) to the L given by Strabo. (See Table.) The residuals Δ (Hipparchos L − computed L) are each squared and the sum of those Δ-squares is formed. This sum is: S = 539.8 square arcmin.
Our simple method computes as follows: relative Diff D = 539.8/82.86 − 1 = 5.598; multiply D by 11 (since N − 2 = 13 − 2 = 11); divide by 2; & invert the natural anti-log:
For the JHA proposal (ε = 23°51'20" & A = +100 stades), we find:
On the other hand, for Diller's values (ε = 23°2/3 & A = 0), we find S = 88.78 square arcmin, so D = 0.0718, and
(Both values are of course found in
the P row of our table.)
Given the contrast
between the P, there is no need for subtle analysis to choose between
the Diller-DIO & the JHA solutions.
[NB: Our table shows that the Diller-DIO fit's tightness actually exceeds even that of the least-squares-test's best-fit — and by a considerable margin, S being less by 25% at merely 62.24 square arcmin.
This dramatizes how spectacularly precise is the fit effected (to a problem of extraordinary sensitivity), by introduction of standard ancient geographical rounding of angles to 5' precision.]
We now temporarily return to general analysis (which will ultimately show the equivalence of the simply & the elaborately obtained values for P).
A contraction (followed by a generalization) of our simple method can be effected through the standard definition of “degrees of freedom”:
where N = the number of data, and U = the number of unknowns (2 in the bivariate case, by definition). Thus, the earlier equation is seen to be but one instance of the more general rule:
[One of the advantages of this rendition of our solution is that while it provides P only for the 2-unknown case, it supplies pdn (the normalized pd) for any number of unknowns. (Normalization in this application refers to re-scaling pd such that its total integral out to infinity equals unity.) However, keep in mind that P, the true measure of a point's probability, is the exterior volume under the pd surface — just as for the 1-unknown problem, the area under the normal curve's tails (likewise the region exterior to all [both] points of the same pd) is the true measure of a point's likelihood for that case. (Which is why the common 1-unknown problem's P is expressed with the error-function, instead of the much simpler function we've seen will express pd. The error-function is not analytically integrable, so it is customarily dealt with via series approximations — or tables pre-computed therefrom.)]
Despite its brevity, our simple equation
is not an approximation.
[Except insofar as most such problems only stay virtually Gaussian near the minimum point. But that warning applies equally to the sophisticated standard methods we will glance at below.]
The usual approach to bivariate analysis sets up an x-y-z coordinate frame where x & y are the unknowns, and the z-axis is for the probability density pd, which is an elliptical-cross-section Gaussian function (“normal surface”) on the x-y plane. In x-y space, the point providing the best fit (maximum pd, minimum S) is at x = xm & y = ym,
We will call the standard deviation of a single datum σ, which is:
In the case of the ancient klimata, the above equation gives σ = 2'.74 = 32.0 stades for testing JHA 2002's 2-unknown theory.
However, note that, though (actually somewhat because) Diller used but one unknown, the same equation for σ shows he has a tighter solution with his 1-unknown theory: σ = 2'.72 = 31.7 stades. (Less than the 32.0 stades yielded by 2-unknown analysis, since the 1-unknown computation of σ uses a bigger F: 13 − 1 = 12, instead of the F = 13 − 2 = 11 which we have used for 2 unknowns. The best 1-unknown least-squares fit is at ε = 23°39'.61±0'.62. Resort to the error function shows that (for normdev = [23°2/3 − 23°39'.61]/0'.62 = 0.63), 23°2/3's two-tailed probability P = 0.53. So from either 1-unknown or 2-unknown perspective, the Diller solution is easily compatible with the Hipparchos klimata data.
Returning to 2-unknown analysis:
We signify each unknown's standard deviation by σ subscripted by that unknown. Even when simplified by translation of the x-y origin to the best-fit position, the standard expression is still cumbersome:
where ρ is the correlation of x&y.
But let us try a more elegant approach, which will end up confirming the simple method given earlier, determining P, the sum of all probability on the ε-A plane outside the locus of points (on said plane) whose S and (thus pd) equals that of any ε-A point we wish to investigate (e.g., JHA 2002's ε = 23°51'20" & A = +100 stades).
The excess SS of the sum of the square residuals S above the minimum such sum, gauges the probability density of any point (ε,A). If we normalize the pd to pdn through dividing it by its value at the minimum point, we have:
In the present problem, the number of degrees of freedom F = 13 − 2 = 11; and we know how to find σ.
We note in passing that all this gives for Diller's ε & A (where, as already noted, S = 88.78 square arcmin & σ = 2'.74):
— where we see that our math (for density pdn) obviously parallels that of our simple method's earlier computation of cumulative probability P.
The surface generated by our latest pdn equation is (but for scale) the same as that for pd. Yet both bear the inconvenience that — because x&y are correlated (highly so for our test equation) — their elliptical cross-sections (which are, after all, loci of constant S and thus pd) are not oriented such that the major&minor axes are along the x&y axes (ε&A axes in the Diller-klimata investigation).
But that difficulty can be eliminated by rotating the x-y plane so that the new x'&y' axes are the eigenvectors of the old frame.
For the Strabo-Diller case we've been using as an example,
the required rotation is almost exactly 50°.
The matrix, which relates the unknowns' uncertainties (and correlation)
to σ, is diagonalized by a corresponding similarity transformation.
For the Diller-klimata case, the resulting matrix's diagonal elements are
in the rather dramatic ratio of c.100-to-1,
showing that the relative standard deviations of
the new unknowns (x', y') are in c.10-to-1 ratio.
[The quadratic secular equation is: E2 − 3.3126E + 0.1054 = 0. This produces eigenvalues of 3.280 & 0.03213 (ratio c.100-to-1), which are the diagonal elements of the newly transformed matrix.]
Correlation ρ — so inconveniently high in the former standard equation — is now mercifully zero, because the new normal surface is symmetric about both the x'&y' axes, reducing said equation to:
In passing, we check pd for JHA 2002's proposed
where x' & y' are 17'.36 & −3'.441, respectively,
and the corresponding standard deviations are 4'.91 & 0'.491, resp,
so the normalized deviations
are 3.54 & −7.00.
The pre-diagonalization norm-devs are 4.34 & 2.94, which are extremely misleading as to the actual odds here (a recommendation for our method, which is simpler and never misleading); this is due to high correlation: ρ = 0.980 (while in the new frame ρ = 0). Nonetheless, if pdn is computed by normalizing the more complex earlier standard formula, the result is the same as that computed from the formula provided just above. Likewise for Diller, where the pre-rotation normdevs are 0.753 & 0.644, we find pdn:
When the result is divided by the pd computed for the minimum point
(i.e., by 1/2π), this normalized pdn < 1 in 10 trillion,
which predictably agrees with P.
[A precise equality for 2 unknowns, which holds only crudely for other cases: at points far from the minimum-pt, P is about equal to pdn multiplied by the product F·D taken to the N − 2 power. (We ignore a proportionality constant of trivial effect on the exponent — and which cancels out anyway when computing P.)]
Our numerical example reminds us that further simplification of the situation can be effected by general normalization of the unknowns: adjust (divide) each new (primed) unknown by its own standard deviation.
It is obvious from the equation for un-normalized pd that we now have:
This new probability density is a radially isotropic (circular-cross-section) Gaussian surface on the normalized x'-y' plane. We exploit a well-known conversion to polar coordinates r&θ, where
Integrating over θ from 0 to 2π produces a probability density pd ' that is a function only of normalized r:
This is a much easier expression to deal with than the usual Cartesian pd.
For probability P, we integrate from any point's r to ∞, to find the volume (under the pd ' function) exterior to the (circular) locus of points with the same r (and thus pd '); this produces:
much simpler than the error function which is unavoidably involved in parallel integration for the familiar case of 1 unknown (or indeed any odd number of unknowns).
For Diller's ε = 23°2/3 & A = 0, where r = 0.889, the same formula gives
Returning to general analysis: using (with our definition of D) the fact that
— the very same compact shortcut earlier proposed here.
In passing, we may toss in a general asymptotic rule, useful when r is large: for any number U of unknowns,
where K is a constant of little effect in this context.
Question: Why do we find U − 2 in the exponent,
regardless of U's magnitude?
Answer: Because shells in hyperspace have area proportional to U − 1; and integrating, for a range of such shells at remote r values, will obviously knock this down another notch, producing an asymptotic expression whose exponent contains U − 2.
with dimensionless r related to v by the equation
where (in units consistent with v's) k is Boltzmann's constant; T, the Kelvin temperature; & m, the molecular weight.
The probability density pd for the same case can be precisely expressed as
which isn't analytically integrable in general, though of course the definite integral from 0 to ∞ is unity.
Retrospective Comment on the Foregoing:
The reason our various bivariate methods give consistent answers is that P, the proportion of the volume (beneath the normal surface) exterior to the locus of constant S, is invariant under our successive transformations: translation, rotation, normalization-rescaling.
Thus, our compact method is as accurate as any. While far simpler.