Estimating the Significance of Our Detections
(I am getting some strange results that suggest something is wrong in my algorithm. I am looking for a fix; this page should not yet be considered complete or trustworthy.)
We want to get a sense of the statistical significance of the signals that we think we may have found.
Let's start with the basics: a reduced chi-square test, where the reduced chi-square statistic is defined as \chi_r^2 = Sum((Obs - Exp)^2 / Obs_err^2) / (n_obs - n_params). (A small code sketch of this computation appears after the list below.)
- We start by just doing the computation for the 3 surveys of interest for which we have lots of SN (SDSS, SNLS, PS1MD), fitting each to a flat constant line
- The 'Obs' values are then the lumped-together residuals of all SN from all 3 surveys
- The expected values are just the best-fit constant offset for each survey
- n_obs is the sum of the numbers of SN from all 3 surveys (1057 in total)
- n_params is one constant offset for each survey, for a total of 3
- Running this analysis, we measure a reduced chi-square of \chi_r^2 = 0.73
- Nominally, this reduced chi-square value indicates that the data are consistent with the constant-offset model
- Looking at the results, I guess this is not all that surprising. Consider the following:
- The chi-square measure doesn't care whether the residuals are patterned in a particular way when the model we are comparing against is just flat
- If we were to randomly scramble the data points, we'd (I believe) lose the observable pattern and be left with randomly scattered data whose errors look consistent with a constant line.
- In other words, the reduced chi-square is not good at noticing patterned deviations.
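For concreteness, here is a minimal sketch of that reduced chi-square computation, assuming the per-survey mu residuals and their uncertainties are already loaded into lists of arrays (the function and variable names are placeholders, not the actual script):

```python
import numpy as np

def reduced_chi_square(residuals_by_survey, errors_by_survey):
    """Reduced chi-square of the lumped residuals against one constant offset per survey."""
    chi_sq, n_obs, n_params = 0.0, 0, 0
    for res, err in zip(residuals_by_survey, errors_by_survey):
        res, err = np.asarray(res), np.asarray(err)
        # Best-fit constant offset for this survey: the inverse-variance weighted mean
        offset = np.sum(res / err**2) / np.sum(1.0 / err**2)
        chi_sq += np.sum((res - offset)**2 / err**2)
        n_obs += len(res)
        n_params += 1  # one constant offset per survey
    return chi_sq / (n_obs - n_params)
```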
I have developed a technique to simultaneously fit various data sets to find the best fit params, according to some super model that should apply to all of the data. We can allow some subset of the parameters in the super model to vary independently between each survey and/or field. The other parameters in the model are forced to be the same for all data sets considered in the fit.
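To make the idea concrete, here is a rough sketch of how such a simultaneous fit can be set up, assuming each data set is an (x, y, yerr) triple and model(x, shared, indep) evaluates the super model (these names and the parameter-packing scheme are illustrative assumptions, not the actual code):

```python
import numpy as np
from scipy.optimize import minimize

def simultaneous_fit(surveys, model, shared_guess, indep_guess):
    """surveys: list of (x, y, yerr) arrays; model(x, shared, indep) -> predicted y.
    shared_guess: parameters forced to be the same for all data sets.
    indep_guess: parameters allowed to vary independently for each survey/field."""
    n_sh, n_ind = len(shared_guess), len(indep_guess)

    def total_chi_sq(p):
        shared = p[:n_sh]
        chi = 0.0
        for i, (x, y, yerr) in enumerate(surveys):
            indep = p[n_sh + i * n_ind : n_sh + (i + 1) * n_ind]
            chi += np.sum((y - model(x, shared, indep))**2 / yerr**2)
        return chi

    # One copy of the shared parameters, plus one copy of the independent parameters per survey
    p0 = np.concatenate([shared_guess, np.tile(indep_guess, len(surveys))])
    return minimize(total_chi_sq, p0)
```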
As a test case, I ran the technique to fit a canonical cosmological model with free H_0 and Omega_M (with the constraint that Omega_Lambda = 1.0 - Omega_M), to verify that we get something very close to the standard cosmology (a rough sketch of such a fit appears after the list below).
- I got results very close to the canonical cosmology, and a plot that gives a good match:
- I include all available SN, and force all SN to share the same fit parameters
- H0 = 70.354 (0.048) km/s/Mpc, OmM = 0.27508 (0.00015)
- The reduced chi-square of this fit is 0.995
- Even if I change the initial guess to something wildly different, the fit still converges back to these values
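As a rough illustration of what that test case looks like in code, here is a sketch of the flat-Lambda-CDM distance modulus and a curve_fit call, assuming arrays z, mu, and mu_err of redshifts, distance moduli, and their errors (this is an illustration, not necessarily the actual fitting script):

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import curve_fit

c_km_s = 299792.458  # speed of light in km/s

def mu_flat_lcdm(z, H0, OmM):
    """Distance modulus mu(z) for a flat cosmology with Omega_Lambda = 1 - Omega_M."""
    z = np.atleast_1d(z)
    E_inv = lambda zp: 1.0 / np.sqrt(OmM * (1.0 + zp)**3 + (1.0 - OmM))
    d_C = np.array([quad(E_inv, 0.0, zi)[0] for zi in z]) * c_km_s / H0  # comoving distance, Mpc
    d_L = (1.0 + z) * d_C                                                # luminosity distance, Mpc
    return 5.0 * np.log10(d_L) + 25.0

# With the SN data loaded into z, mu, mu_err (placeholders), the fit would look like:
# popt, pcov = curve_fit(mu_flat_lcdm, z, mu, p0=[70.0, 0.3], sigma=mu_err, absolute_sigma=True)
```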
Next, I measured the best-fit cosmological parameters for each survey, assuming a standard-looking cosmology:
- CANDELS: H0 = 50.26 (1659.23) km/s/Mpc, OmM = 1.0 (6.41)
- CFA1: H0 = 72.246 (14.01) km/s/Mpc, OmM = 0.0 (5.63)
- CFA2: H0 = 71.34 (5.40) km/s/Mpc, OmM = 0.0 (2.54)
- CFA3K: H0 = 71.45 (1.19) km/s/Mpc, OmM = 0.0 (0.454)
- CFA3S: H0 = 70.12 (3.97) km/s/Mpc, OmM = 1.0 (1.44)
- CFA4p1: H0 = 69.91 (4.48) km/s/Mpc, OmM = 1.0 (1.36)
- CFA4p2: H0 = 69.24 (8.75) km/s/Mpc, OmM = 0.703 (6.315)
- CSP: H0 = 68.76 (1.95) km/s/Mpc, OmM = 0.823 (0.594)
- HST: H0 = 81.77 (65.59) km/s/Mpc, OmM = 0.111 (0.011)
- PS1MD: H0 = 68.83 (0.57) km/s/Mpc, OmM = 0.392 (0.003)
- SDSS: H0 = 70.19 (0.49) km/s/Mpc, OmM = 0.260 (0.004)
- SNAP: H0 = 62.63 (3357.73) km/s/Mpc, OmM = 0.406 (2.84)
- SNLS: H0 = 70.61 (1.079) km/s/Mpc, OmM = 0.260 (0.001)
- (the uncertainties quoted here are the diagonal elements of the estimated covariance matrix reported by the Python fitting script)
So not all of these surveys settle on the standard cosmological values, but those that don't generally have enormous uncertainties. The only surveys with reasonably small uncertainties in both H0 and OmM are CFA3K, CSP, PS1MD, SDSS, and SNLS. Almost all of the surveys seem consistent with H0 ~ 69.0 and OmM ~ 0.26, and of those 5 well-constrained surveys, most are reasonably close to each other and to the best-fit values found when all surveys are fit together (see above). The most disparate is PS1MD, which prefers a significantly higher value of OmM (0.392 vs ~0.26). How much concern that point warrants, I don't know...
So the program seems pretty good at homing in on the standard cosmological parameters. Now let's start thinking about detecting alternative physical models. Most of these models are discussed on the wiki page 'Relating distance modulus residuals to physical sources'.
The reduced chi-square measurements from the canonical model are quite reasonable, in the sense that they are very close to 1.0, indicating that the data seem consistent with the canonical model without the need for additional small corrections.
With that caveat, we want to ask how likely it is that we would see the seemingly coherent effects in the residual data. The strategy we developed for that task is as follows (a code sketch of the shuffling procedure follows this list):
- Define a function to model the residual feature of interest
- Run the least-square fitter, with bounds chosen to focus on the feature of interest
- The bounds are necessary, as the algorithm tends to latch onto very small or very large features if left unbounded. This behavior underscores the fact that the effects we're interested in are pretty marginal
- Measure the difference between the sum-of-squares of the residuals about the fitted model and the sum-of-squares about a single constant-offset fit
- That difference between sum-of-squares is our statistic of interest
- Then, randomly shuffle the data points by the following technique:
- Put all the x-values into a list
- Put all the y-values into a list
- Randomly pair the x-values and the y-values to get the random data set
- In this particular case, the x-values are redshifts and the y-values are mu-residuals (but of course, we could do the same with, say, the x-values as extinctions)
- Then rerun the same fitting algorithm on the randomly shuffled points, and again measure the difference between the sum-of-squares
- By repeating this process many times, we can see the likelihood that we would get a coherent signal as or more significant than the one we see, if the effect in fact arose from random variation
- I should note that one could justifiably object to this approach since I am priming the method with a vision of the effect that I am looking for
- i.e., I notice a pattern, and then look for the likelihood that something like THAT PARTICULAR PATTERN would occur.
- I am not sure how to confront that issue
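Here is a minimal sketch of that shuffling procedure; the fit_and_sum_sq callable (which runs the bounded least-squares fit and returns its sum-of-squares) and the data arrays are placeholders:

```python
import numpy as np

def shuffle_significance(z, res, res_err, fit_and_sum_sq, n_trials=50, seed=None):
    """Permutation-style test: re-pair redshifts with mu residuals and refit each time.
    Because the constant-offset baseline is unchanged by shuffling (noted further below),
    comparing the feature-fit sums-of-squares directly is equivalent to comparing the
    differences from the constant fit."""
    rng = np.random.default_rng(seed)
    observed = fit_and_sum_sq(z, res, res_err)
    shuffled = []
    for _ in range(n_trials):
        # Randomly re-pair the x-values (redshifts) with the y-values (mu residuals);
        # each residual keeps its own error bar
        perm = rng.permutation(len(res))
        shuffled.append(fit_and_sum_sq(z, res[perm], res_err[perm]))
    shuffled = np.array(shuffled)
    # Fraction of shuffled data sets that the feature model fits at least as well as the real data
    p_value = np.mean(shuffled <= observed)
    return observed, shuffled, p_value
```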
So now let's try to implement this technique
- I first tried just fitting to the PS1MD data. I'll then try doing the same fit to all SN at once.
- The first fitting function that I have considered using is a Gaussian hump (see the code sketch at the end of this section):
- res(z, A, mu, sigma, shift) = A * exp(-(z - mu)^2 / (2 * sigma^2)) + shift
- I recognize that I should really do this fit on an underlying physical energy density perturbation, rather than this generic perturbation not physically grounded in anything
- I have been working on developing a technique for computing the perturbation due to a physical effect, but I am not doing so yet for the following reasons:
- Because my least-squares fitting technique must evaluate the fitting function many times, it would take a very long time to perform the fit on a true calculation of the residuals, as that requires numerical integration.
- I do want to take the time to do this fit properly, but I need to think a bit more on how to accomplish this task. I also want to confirm that the technique works before investing that extra time
- So with our function defined, I now choose the following bounds to focus on the region where we see the effect:
- A in [-1.0, 1.0], mu in [0.2, 0.8], sigma in [0.005, 0.02], shift in [-0.1, 0.1]
- Running my least-squares fitter on this function with these bounds, using the original data, we get the best-fit parameters:
- A = 0.07424 (0.0020), mu = 0.31355 (0.00005), sig = 0.01078 (0.00005), shift = -0.08780 (0.00010): sum-of-squares = 245.597
- We contrast that with the fit result when we instead fit a constant function:
- shift = -0.08081 (0.00008): sum-of-squares = 249.528
- Note that this will always be the least-squares result for a constant function, even for the shuffled data sets below: we have the same y-values with the same errors, just at different x-positions, so a fitting function that doesn't depend on x always yields the same best-fit constant and the same sum-of-squares
- Now we repeat the computation for a series of randomly sorted points. We find, after repeating the computation 50 times, that we get the following sum of square values: [247.89717216113996, 242.38686783914494, 242.56330126107252, 247.34833623707235, 240.64582956995488, 244.13477009608334, 243.20405574493819, 248.58734380687574, 245.82861818757868, 246.14086384686823, 238.56834043707499, 245.33349717343793, 235.97606326236163, 237.23132364458883, 243.50347475790005, 235.6803315392246, 246.44021928025577, 246.87449273806439, 246.41259207061935, 248.2197191958802, 246.22106722148496, 246.34272176855697, 240.79479048597759, 243.91099700115095, 245.33007394593494, 247.51231914014542, 236.4830305933657, 240.17707440196619, 236.2641349639828, 244.67200125471268, 247.50285028417264, 243.63956567054257, 245.89059383628799, 247.10919630763675, 245.53553512655165, 245.10279942826278, 236.32530781519819, 246.79165280999516, 233.99648833264644, 245.88461810922942, 236.35204181532276, 247.65829954562957, 241.46919489879372, 234.09104831096755, 241.0500126906727, 246.76557661876976, 241.3789217577953, 243.65232981170726, 241.82210499717272, 241.36782299060326, 241.51424517425522]
- Clearly, many of these are below the sum-of-squares value we find when we run the fit on the actual data
- That suggests the actual data are not any more indicative of a deviation from a single constant than a random re-arrangement of the same points.
- This result surprises me somewhat, as my eye clearly thinks the effect is there. I need to think further about whether this means there really is no statistically significant effect, or whether this method of measuring statistical significance is not really valid. More thought on this is needed...
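For completeness, here is a sketch of the Gaussian-hump fit in the form expected by the shuffling sketch above (whether my script's sum-of-squares is error-weighted as it is here is an assumption, and the starting guess is illustrative):

```python
import numpy as np
from scipy.optimize import curve_fit

def gaussian_hump(z, A, mu, sigma, shift):
    """Gaussian perturbation on top of a constant offset in the mu residuals."""
    return A * np.exp(-(z - mu)**2 / (2.0 * sigma**2)) + shift

def hump_fit_sum_sq(z, res, res_err):
    """Bounded least-squares fit of the hump; returns the weighted sum-of-squares of the fit."""
    bounds = ([-1.0, 0.2, 0.005, -0.1], [1.0, 0.8, 0.02, 0.1])  # A, mu, sigma, shift
    popt, _ = curve_fit(gaussian_hump, z, res, p0=[0.05, 0.3, 0.01, 0.0],
                        sigma=res_err, absolute_sigma=True, bounds=bounds)
    return np.sum((res - gaussian_hump(z, *popt))**2 / res_err**2)
```

This function could be passed directly as the fit_and_sum_sq argument of the shuffle_significance sketch above.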