Gravitational Attraction
What would happen if two people out in space a few meters apart, abandoned by their spacecraft, decided to wait until gravity pulled them together? My initial thought was that …
In #statistics #probability #education #science #sie #projects

I just added an example of simple model construction to my textbook, Statistical Inference for Everyone. It's a process I don't think I've ever seen in an intro stats book, but is common in scientific work. The idea is that you start off with a simple model, collect data, then notice where your simple model breaks, propose a new more complex model, and do the analysis again.
The entire data set I use is here, where I have the mass of US Pennies for several years:
| Year | Mass |
|---|---|
| 1960 | 3.133 |
| 1961 | 3.083 |
| 1962 | 3.175 |
| 1963 | 3.120 |
| 1964 | 3.100 |
| 1965 | 3.060 |
| 1966 | 3.100 |
| 1967 | 3.100 |
| 1968 | 3.073 |
| 1969 | 3.076 |
| 1970 | 3.100 |
| 1971 | 3.110 |
| 1972 | 3.080 |
| 1973 | 3.100 |
| 1974 | 3.093 |
| 1989 | 2.516 |
| 1990 | 2.500 |
| 1991 | 2.500 |
| 1992 | 2.500 |
| 1993 | 2.503 |
| 1994 | 2.500 |
| 1995 | 2.497 |
| 1996 | 2.500 |
| 1997 | 2.494 |
| 1998 | 2.512 |
| 1999 | 2.521 |
| 2000 | 2.499 |
| 2001 | 2.523 |
| 2002 | 2.518 |
| 2003 | 2.520 |
One starts this analysis loading the first part (earlier than 1975), and applying a model which states that there is a single "true" value. The best estimate of this value is the sample mean, and the posterior distribution is normal. A plot of this looks like

If you apply it to all the data, you get something that clearly looks ridiculous:

It is then that it makes sense to change the model to a two "true" values model.
With this model, we have separate means for the pre- and post-1975 data, and can look at the overlap of the credible intervals, or the posterior distribution of the difference, both of which clearly show a statistically significant difference.


This approach has several advantages over the typically methods used to teach this topic: