Week 8
Testing Independence of Molecules against Target / Date Recorded
- Previously, I wrote code that plots equivalent width / airmass for particular air molecules (H\alpha, H\beta, H2O, O2) across all dates or all stars, to check that these values are independent of the star and of the observation date, as expected.
- The code did not account for the different gratings through which the equivalent widths were measured, so I adjusted it to separate the files by grating and see whether the independence from star/date became stronger.
- The code still outputs the best-fit line, the r^2 of the linear fit, and the covariance coefficients:
Independence of O2 against Star:
Independence of H2O against Star:
Independence of O2 against Date-Obs:
Independence of H\beta against Date-Obs:
Independence of H\alpha against Date-Obs:
- There are some significant outliers that appear to be skewing the fits, which is surprising because I have already sigma-clipped the data through five iterations; it may need more iterations.
- Also, I suspect the linear fit routine I use avoids returning a slope of exactly zero, so I can look into a linear fit that allows a zero slope.
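The per-grating separation described above might look something like the sketch below. It is not the code from this log: the `(grating, x, y)` record layout and the name `fit_by_grating` are made up for illustration, with `x` standing for the star or date index and `y` for equivalent width / airmass. Only `scipy.stats.linregress` and `numpy.cov` are real library calls.

```python
from collections import defaultdict

import numpy as np
from scipy import stats


def fit_by_grating(records):
    """Fit a separate line for each grating.

    `records` is a hypothetical iterable of (grating, x, y) tuples.
    Each grating needs at least two points with different x values.
    """
    groups = defaultdict(list)
    for grating, x, y in records:
        groups[grating].append((x, y))

    results = {}
    for grating, points in groups.items():
        x, y = np.array(points, dtype=float).T
        fit = stats.linregress(x, y)
        results[grating] = {
            "slope": fit.slope,
            "intercept": fit.intercept,
            "r2": fit.rvalue ** 2,       # r^2 of the linear fit
            "cov": np.cov(x, y)[0, 1],   # covariance between x and y
        }
    return results
```

Keeping the results in a dict keyed by grating makes it easy to compare slopes between gratings before deciding whether they can be pooled.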
Fixing Sigma Clipping, Linear Fit (p-values)
- I tried increasing the number of iterations to get rid of noticeable outliers (like the one in the chart below), but the outlier in the top right corner remained. I also tried the default setting, where the data keeps getting sigma-clipped until it converges.
- Below is the result of sigma clipping with a maximum of 100 iterations; the outlier still stayed.
- Instead of 5-sigma clipping, for this particular data set I switched to 4-sigma clipping, which removed the outlier in the top right corner.
- However, the linear fit does not pass through the remaining points, so something was wrong with my linear fit.
- It turned out that the linear fit was also being fitted through the clipped data points: I was passing it a masked array, and the mask was being ignored. I copied the unmasked elements into a new array and fitted that instead (this is very inefficient in running time; I can ask Eske about better ways to plot and fit masked arrays when he gets back). For the same plot, this gave a flatter, better fit.
- While working out what was wrong with the linear fit, I found that scipy.stats.linregress, which I'm using, also returns a p-value for the null hypothesis that the slope is 0. I took the p-values and, where they were greater than 0.05, concluded that the null hypothesis cannot be rejected, i.e. the data are consistent with a zero slope.
- Printed out the p-values for the linear fits and what conclusions can be made from them:
- Rerunning the independence tests, the slopes look much more horizontal, and the p-values show that most slopes are not significantly different from 0.
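A minimal sketch of the clip, fit, and p-value steps above, under stated assumptions. The clipping loop is a simple stand-in (median centre, standard-deviation spread, repeated until nothing new is rejected) and not necessarily the routine used in this log; `sigma_clip_mask` and `fit_unclipped` are hypothetical names. Selecting the surviving points with a boolean mask avoids copying elements one by one, and makes sure the clipped points never reach `scipy.stats.linregress`.

```python
import numpy as np
from scipy import stats


def sigma_clip_mask(values, sigma=4.0, max_iters=None):
    """Return a boolean array that is True where a point is rejected.

    With max_iters=None the loop runs until no new points are rejected.
    """
    values = np.asarray(values, dtype=float)
    rejected = np.zeros(values.shape, dtype=bool)
    iteration = 0
    while max_iters is None or iteration < max_iters:
        kept = values[~rejected]
        center, spread = np.median(kept), np.std(kept)
        # Once rejected, a point stays rejected, so the loop must terminate.
        new_rejected = rejected | (np.abs(values - center) > sigma * spread)
        if np.array_equal(new_rejected, rejected):
            break
        rejected = new_rejected
        iteration += 1
    return rejected


def fit_unclipped(x, y, sigma=4.0, alpha=0.05):
    """Clip on y, then fit only the surviving points.

    Returns the linregress result, the boolean keep mask, and whether the
    slope differs significantly from zero at level `alpha`.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    keep = ~sigma_clip_mask(y, sigma=sigma)
    fit = stats.linregress(x[keep], y[keep])
    # fit.pvalue tests the null hypothesis that the slope is zero.
    slope_is_significant = fit.pvalue < alpha
    return fit, keep, slope_is_significant
```

If `slope_is_significant` is False, the zero-slope hypothesis cannot be rejected at that level, which is the outcome expected when the molecule is independent of star or date.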
Chi-2
Chi-squared histogram of the wavelength solution (the conversion from pixels to wavelength), made separately for each type of grating
Parangle
Parangle -> Parangle rainbow appears to the north; histogram peak where data is good, and start guiding criteria