Without Loss of Precision

Condensed 2012

In recent years, certain journals have published
history-of-astronomy research papers that try to find the reliability
of data-fits for 2-unknown Gaussian cases.
None of the papers' authors has understood
how to solve Gaussian bivariate problems analytically
— or, indeed, by any means besides monovariate testing,
trial&error, or guesswork.

So it will be of assistance, for researchers without extensive
scientific experience, to delineate here
a simple path to solving two-unknown statistical problems.

[** DIO** has occasional experience in this area.
Our non-metric photogrammetry on R.Peary's 1909
“North Pole” photos
was an analytic least-squares simultaneous solution for 22 unknowns.
See

Likewise for 1975&2002
mis-attempts to solve the famous Hipparchos-Strabo
“klimata” data, the correct solution to which had already
in 1934 been discovered&proven
(** Klio 27**:258-269)
by the late eminent philologist Aubrey Diller (Indiana University).
So, below, we will use
this modern klimata controversy as a useful mathematical illustration
of how to apply bivariate statistics with felicitous ease.

[From antiquity to the Renaissance, scientists defined a “klima” (ancestor to the word “climate”) as a latitude

We will now lay out a direct method that requires no knowledge of
how to compute a least-squares problem analytically.

And we start by supplying some upbeat news for the uninitiated:
computing a two-unknown probability *P*
**is in some respects easier than for any other number of unknowns**.

It helps that when computing
probability *P* from probability density *pd*
for even numbers of unknowns, we do not
need to use the difficult error-function.
Here, throughout, we compute *P* as
**the cumulative probability
exterior to the locus of points with the same pd as
the point of interest**
— exactly as is routinely done
for 1-dimensional statistical analysis.

Most textbooks give a formula for the bivariate probability density *pd*
that is not only messy but requires specialized analytic
investigation to determine the inputs: the two unknowns' standard deviations
and their correlation. So some (especially historians & classicists
with non-extensive math backgrounds)
may find it a pleasant surprise to learn of
a clear, accurate, & efficient method for finding *P*
while avoiding all that bother & clutter.

We have a set of *N*
values of a variable *j* for specified values of
an independent variable *t*, and there is
a proposed function
*f*(*x*,*y*;*t*)
that is intended to fit the values of *j* as closely as possible
— for the best choices of two unknowns: *x*&*y*.

We compute function *f* 's value
corresponding to each of the *N*
values of *t* for which values of *j* are available.
Each difference between a datum *j* and the computed function
*f*(*x*,*y*;*t*)
[which is supposed ideally to equal *j*] is called
a “residual” Δ, with sign-convention such that
Δ = *j* − *f*.
The probability of any given pair of values of *x*&*y* is
measured by the sum *S* of the residuals-squared:

At the *x*&*y* where *S* is minimized (values which
could be found by trial without sophisticated analysis, as already noted),
a subscript m is appended to *x*&*y* and to *S*.

Our quick&undirty exact solution-for-amateurs will start
with a trivial calculation, finding
the unknowns' normalized deviations from their best fitted values,
via the relative Difference *D* between
the sum *S* at the point of interest (*x*,*y*)
vs that (*S*_{m}) at *S*'s minimum (best-fit):

Then our solution for *P* is simply:

where (again) *N* is just the number of data.
*That's all*. It's that simple. And it's as accurate as any other method.

[It should be noted that all standard procedures must find
the foregoing quantities (*S* & its minimum, etc) anyway,
*en route* to their more elaborate solutions
— e.g., finding standard deviations σ,
correlation(s), *pd*, and so on.
So if finding *P* is the goal (and it is obviously the main one), our
equation makes *P* easily accessible at the outset,
obviating any necessity for getting into complexities beyond.
(The main advantage of sophisticated least-squares analysis is that,
without trial&error's tedium, it will discover
*x*_{m}, *y*_{m}, *S*_{m},
*P*, and all σs & ρs.)]

An example always helps. So let us examine
the above-cited
well-known case of the
Hipparchos-Strabo klimata,
data which are discussed and explanatorially tabulated in
** DIO 16** [2009]
‡3 (Tables 1&2 [pp.20-21]).
These data Aubrey Diller discovered in 1934 were anciently
calculated via spherical trigonometry
for Earth-obliquity ε = 23°2/3.

Yet, in one of the statistically inelegant papers cited
earlier here,
it is proposed
(** Journal for the History of Astronomy 33**:15-19 [2002])

[a] that Diller's obliquity ε should have been not 23°2/3 but 23°51'20", and

[b] that a hitherto-undetected constant

In the following probe of these proposals, we will cooperatively treat ε as an unknown and (to test the

Comparing this function to the Hipparchos-Strabo klimata data:
we seek [a] its probability, also
[b] the best-fit choice
of ε&*A*, as well as
[c] a way of gauging
the probability of all other ε&*A* pairs,
e.g., Diller's.

Here, *L*, *M*, ε, & *A*
take the respective rôles of *j, t, x,* & *y*
in our earlier explanatory paragraph.
The data *L* are the latitude values given by Strabo,
for 13 distinct klimata *M*, ranging from 12h3/4 to 19h.

A table is provided (below) of the Hipparchos-Strabo klimata latitude *L*
data (in stades, consistently reported by Strabo in hundreds of stades),
adjacent to competing theories' computed *L*
(rounded here to 1 stade) and their residuals Δ,
with sign-convention Δ = Hipparchos-minus-calculation.
The last rows of the table display residual-sum *S*,
normalized *r* (see below),
and probability *P*.

[A long-orthodox but poorly-fitting 1975 Princetitute solution
(Neugebauer *Hist Ancient Math Astron* 1975 pp.305-306 & 334)
is also tabulated at far right, though it was rejected even
by ** JHA** 2002.]

Klima | M | Hipp L |
best-fit L | best-fit Δ | Diller L | Diller Δ | DIO L |
DIO Δ | JHAL | JHA Δ |
Princttt L | Princttt Δ |

Cinnamon | 12h3/4 | 8800 | 8812 | −12 | 8824 | −24 | 8808 | −8 | 8848 | −48 | 10202 | −1402 |

Meroë | 13h | 11600 | 11602 | −2 | 11609 | −9 | 11608 | −8 | 11611 | −11 | 12800 | −1200 |

Syene | 13h1/2 | 16800 | 16797 | 3 | 16797 | 3 | 16800 | 0 | 16764 | 36 | 17569 | −769 |

LowerEgypt | 14h | 21400 | 21399 | 1 | 21394 | 6 | 21408 | −8 | 21338 | 62 | 21800 | −400 |

Phoenicia | 14h1/4 | 23400 | 23469 | −69 | 23462 | −62 | 23450 | −50 | 23398 | 2 | 23726 | −326 |

Rhodos | 141/2 | 25400 | 25388 | 12 | 25380 | 20 | 25375 | 25 | 25309 | 91 | 25531 | −131 |

Hellespont | 15h | 28800 | 28798 | 2 | 28788 | 12 | 28817 | −17 | 28711 | 89 | 28800 | 0 |

Massalia | 15h1/4 | 30300 | 30305 | −5 | 30295 | 5 | 30275 | 25 | 30216 | 84 | 30273 | 27 |

Pontus | 15h1/2 | 31700 | 31692 | 8 | 31683 | 17 | 31675 | 25 | 31604 | 96 | 31644 | 56 |

Borysthenes | 16h | 34100 | 34144 | −44 | 34135 | −35 | 34125 | −25 | 34057 | 43 | 34100 | 0 |

Tanais | 17h | 38000 | 37981 | 19 | 37974 | 26 | 37975 | 25 | 37903 | 97 | 38000 | 0 |

S.LittleBritain | 18h | 40800 | 40752 | 48 | 40746 | 54 | 40775 | 25 | 40685 | 115 | 40800 | 0 |

N.LittleBritain | 19h | 42800 | 42761 | 39 | 42758 | 42 | 42758 | 42 | 42705 | 95 | 42800 | 0 |

Residual-Square Sums S(stades squared): | 11279 | 12084 | 8495 | 73470 | 4284816 | |||||||

Residual-Square Sums S(arcmin squared): | 82.86 | 88.78 | 62.24 | 539.8 | 31480 | |||||||

Isotropic Normalized Deviation r: | 0 | 0.89 | 7.8 | |||||||||

Probability P: | 1 | 0.674 | 10^{−13} |

One easily sees that the column which perfectly matches
Strabo's figures (to the 100-stade precision of his reportage) is
the ** DIO** one, which realizes (e.g.,

Analysis or testing on our table's data finds *S* is minimized at
ε = 23°37'.60
(primes signify arcmin = 60ths of degrees)
& *A* = −2'.44 (−28 stades,
at Eratosthenes-Hipparchos-** Almajest** scale
of 1° = 700 stades); minimum

As noted above, ** JHA** 2002
argues that Diller's values and data are both invalid, contending that
ε ought to be 23°51'20" (the Eratosthenes-Ptolemy value) and that

[

We next test ** JHA** 2002's proposed values, as well as
Diller's, by our compact least-squares method.

For each Strabo *M*, one compares the computed *L*
(found by substituting ** JHA** 2001's ε&

Our simple method computes as follows:
relative Diff *D* = 539.8/82.86 − 1 = 5.598;
multiply *D* by 11 (since *N* − 2 = 13 − 2 = 11);
divide by 2; & invert the natural anti-log:

For the ** JHA** proposal
(ε = 23°51'20" &

On the other hand, for Diller's values (ε = 23°2/3 & *A*
= 0), we find *S* =
88.78 square arcmin, so *D* = 0.0718, and

(Both values are of course found in
the *P* row of our table.)
Given the contrast
between the *P*, there is no need for subtle analysis to choose between
the Diller-** DIO** & the

[NB: Our table shows that the Diller-

This dramatizes how spectacularly precise is the fit effected (to a problem of extraordinary sensitivity), by introduction of standard ancient geographical rounding of angles to 5' precision.]

We now temporarily return to general analysis (which will ultimately show
the equivalence of the simply & the elaborately
obtained values for *P*).

A contraction (followed by a generalization) of our simple method can be effected through the standard definition of “degrees of freedom”:

where *N* = the number of data, and *U* = the number of unknowns
(2 in the bivariate case, by definition).
Thus, the earlier equation is
seen to be but one instance of the more general rule:

[One of the advantages of this rendition of our solution
is that while it provides *P* only for the 2-unknown case,
it supplies *pdn* (the normalized *pd*) for any number of unknowns.
(Normalization in this application refers to re-scaling
*pd* such that its total integral out to infinity equals unity.)
However, keep in mind that *P*, the true measure of
a point's probability,
is the exterior **volume** under the *pd* surface
— just as for the 1-unknown
problem, the **area** under the normal curve's tails
(likewise the region exterior to all [both] points of the same *pd*)
is the true measure of a point's likelihood for that case.
(Which is why the common 1-unknown problem's *P*
is expressed with the error-function, instead of the much
simpler function we've seen will express *pd*. The error-function
is not analytically integrable, so it is customarily dealt
with via series approximations — or tables pre-computed therefrom.)]

Despite its brevity, our simple equation
is not an approximation.

[Except insofar as most such problems only stay virtually Gaussian
near the minimum point. But that warning applies equally
to the sophisticated standard methods we will glance at below.]

The usual approach to bivariate analysis sets up an *x-y-z*
coordinate frame where *x* & *y* are the unknowns,
and the *z*-axis is for the probability density *pd*, which is
an elliptical-cross-section Gaussian function (“normal surface”)
on the *x-y* plane. In *x*-*y* space,
the point providing the best fit
(maximum *pd*, minimum *S*) is at
*x* = *x*_{m} & *y* = *y*_{m},

We will call the standard deviation of a single datum σ, which is:

In the case of the ancient klimata, the above equation gives σ = 2'.74
= 32.0 stades for testing ** JHA** 2002's
2-unknown theory.

However, note that, though (actually somewhat *because*) Diller used but
one unknown, the same equation
for σ shows he has a tighter solution with his
1-unknown theory:
σ = 2'.72 = 31.7 stades.
(Less than the 32.0 stades yielded by 2-unknown analysis,
since the 1-unknown computation of σ uses
a bigger *F*: 13 − 1 = 12, instead of the *F* = 13 − 2
= 11 which we have used for 2 unknowns. The best 1-unknown least-squares
fit is at ε = 23°39'.61±0'.62. Resort to the error function
shows that (for normdev = [23°2/3 − 23°39'.61]/0'.62 = 0.63),
23°2/3's two-tailed probability *P* = 0.53.
So from either 1-unknown or 2-unknown perspective,
the Diller solution is easily compatible with the Hipparchos klimata data.

Returning to 2-unknown analysis:

We signify each unknown's standard deviation by σ subscripted by that
unknown. Even when simplified by translation of the *x-y* origin
to the best-fit position, the standard expression is still cumbersome:

where ρ is the correlation of *x*&*y*.

But let us try a more elegant approach, which will end up confirming
the simple method given earlier, determining
*P*, the sum of all probability on the *ε-A* plane
outside the locus
of points (on said plane) whose *S* and (thus *pd*)
equals that of any *ε-A* point we wish to investigate (e.g.,
** JHA** 2002's
ε = 23°51'20" &

The excess *SS* of the sum of the square residuals *S* above
the minimum such sum, gauges the probability density of
any point (ε,*A*). If we normalize the *pd* to *pdn*
through dividing it by its value at the minimum point, we
have:

In the present problem, the number of degrees of freedom *F* =
13 − 2 = 11; and we know how to find σ.

We note in passing that all this
gives for Diller's ε & *A* (where, as already noted,
*S* = 88.78 square arcmin & σ = 2'.74):

— where we see that our math
(for density *pdn*) obviously parallels that of
our simple method's earlier computation of
cumulative probability *P*.

The surface generated by our latest *pdn* equation
is (but for scale) the same as that for *pd*.
Yet both bear the inconvenience that
— because *x*&*y* are correlated
(**highly** so for our test equation) —
their elliptical cross-sections (which are, after all,
loci
of constant *S* and thus *pd*)
are not oriented such that the major&minor axes are along
the *x*&*y* axes
(ε&*A* axes in the Diller-klimata investigation).

But that difficulty can be eliminated by rotating
the *x*-*y* plane so that the new *x*'&*y*' axes
are the eigenvectors of the old frame.

For the Strabo-Diller case we've been using as an example,
the required rotation is almost exactly 50°.

The matrix, which relates the unknowns' uncertainties (and correlation)
to σ, is diagonalized by a corresponding similarity transformation.
For the Diller-klimata case, the resulting matrix's diagonal elements are
in the rather dramatic ratio of c.100-to-1,
showing that the relative standard deviations of
the new unknowns (*x*', *y*') are in c.10-to-1 ratio.

[The quadratic secular equation is:
*E*^{2} − 3.3126*E* + 0.1054 = 0.
This produces eigenvalues of 3.280 & 0.03213 (ratio c.100-to-1),
which are the diagonal elements of the newly transformed matrix.]

Correlation ρ — so inconveniently high
in the former standard equation — is
now mercifully zero, because the new normal surface is symmetric
about both the *x*'&*y*' axes, reducing said equation to:

In passing, we check *pd* for ** JHA** 2002's proposed
ε&

The pre-diagonalization norm-devs are 4.34 & 2.94,
which are *extremely* misleading as to the actual odds here
(a recommendation for our method, which is simpler
and never misleading); this is due to high correlation:
ρ = 0.980 (while in the new frame ρ = 0).
Nonetheless, if *pdn* is computed by normalizing
the more complex earlier standard formula,
the result is the same as that computed from the formula
provided just above.
Likewise for Diller, where the pre-rotation
normdevs are 0.753 & 0.644, we find *pdn*:

as previously.

Returning to the ** JHA** values and σs:
for the primed-frame normdevs,
the post-rotation equation for
un-normalized

When the result is divided by the *pd* computed for the minimum point
(i.e., by 1/2π), this normalized *pdn* < 1 in 10 trillion,
which predictably agrees with *P*.

[A precise equality for 2 unknowns, which holds only crudely
for other cases: at points far from the minimum-pt, *P* is about
equal to *pdn*
multiplied by the product *F*·*D*
taken to the *N* − 2 power.
(We ignore a proportionality constant of trivial effect on the exponent
— and which cancels out anyway when computing *P*.)]

Our numerical example reminds us that further simplification of the situation can be effected by general normalization of the unknowns: adjust (divide) each new (primed) unknown by its own standard deviation.

It is obvious from the equation for un-normalized
*pd* that we now have:

This new probability density is a radially isotropic (circular-cross-section)
Gaussian surface on the normalized *x*'-*y*' plane.
We exploit a well-known conversion
to polar coordinates *r*&θ, where

Integrating over θ from 0 to 2π produces a probability density
*pd* ' that is a function
only of normalized *r*:

This is a much easier expression to deal with
than the usual Cartesian *pd*.

For probability *P*, we integrate from any point's *r* to ∞,
to find the volume (under the *pd* ' function) exterior to
the (circular) locus of points with the same *r*
(and thus *pd* '); this produces:

much simpler than the error function which is unavoidably involved in parallel integration for the familiar case of 1 unknown (or indeed any odd number of unknowns).

Applying it to the ** JHA** proposal is easy
since its

as before.

For Diller's ε = 23°2/3 & *A* = 0,
where *r* = 0.889, the same formula gives

Returning to general analysis: using (with
our definition of *D*) the fact that

plus the equation for σ
and that for *P* as a function of *r*, we have:

— the very same compact shortcut earlier proposed here.

In passing, we may toss in
a general asymptotic rule, useful when *r*
is large: for any number *U* of unknowns,

where K is a constant of little effect in this context.

Question: Why do we find *U* − 2 in the exponent,
regardless of *U*'s magnitude?

Answer: Because shells in hyperspace have area proportional
to *U* − 1; and integrating, for a range of such shells
at remote *r* values, will obviously knock this down another notch,
producing an asymptotic expression whose exponent contains *U* − 2.

A 3-dimensional isotropic example familiar to physicists is
the Maxwell Distribution for gas molecules' speeds *v*,
where *U* − 2 = 1, as we see from its asymptotic *P*:

with dimensionless *r* related to *v* by the equation

where (in units consistent with *v*'s) *k* is Boltzmann's constant;
*T*, the Kelvin temperature; & *m*, the molecular weight.

The probability density *pd* for the same case
can be precisely expressed as

which isn't analytically integrable in general, though of course the definite integral from 0 to ∞ is unity.

Retrospective Comment on the Foregoing:

The reason our various bivariate methods give consistent answers is
that *P*, the proportion of the volume (beneath the normal surface)
exterior to the locus of
constant *S*, is *invariant under our successive transformations*:
translation, rotation, normalization-rescaling.

Thus, our compact method is as accurate as any.
While far simpler.

Original Version Uploaded 2009/9/8.