Week 8
Testing Independence of Molecules against Target / Date Recorded
...
- Some significant outliers still appear to be skewing the data, which is interesting because I have already sigma-clipped the data through five iterations; may need more iterations
- Also, I think the linear fit program I use avoids zero as a potential slope, so I can look into a linear fit that allows a zero slope (potentially)
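One way to tell whether more iterations could help is to count how many passes the clip needs before it stops removing points. Below is a minimal sketch in plain NumPy. It is not the clipping routine used for these plots: the function name `sigma_clip_count`, the median/standard-deviation choice, and the synthetic data are all assumptions for illustration.

```python
import numpy as np

def sigma_clip_count(values, sigma=5.0, max_iters=5):
    """Hypothetical helper: iteratively drop points more than `sigma`
    standard deviations from the median of the points still kept.

    Returns (keep_mask, passes_run, converged).
    """
    values = np.asarray(values, dtype=float)
    keep = np.ones(values.shape, dtype=bool)
    for passes in range(1, max_iters + 1):
        center = np.median(values[keep])
        spread = np.std(values[keep])
        new_keep = keep & (np.abs(values - center) <= sigma * spread)
        # A pass that removes nothing means the clip has converged.
        if new_keep.sum() == keep.sum():
            return keep, passes, True
        keep = new_keep
    return keep, max_iters, False

# Synthetic stand-in data (not the real measurements):
# 101 evenly spaced points plus one high value at the end.
values = np.concatenate([np.linspace(-1.0, 1.0, 101), [3.0]])

keep5, n5, conv5 = sigma_clip_count(values, sigma=5.0, max_iters=5)
keep100, n100, conv100 = sigma_clip_count(values, sigma=5.0, max_iters=100)
print("cap 5:   passes =", n5, "converged =", conv5, "kept =", keep5.sum())
print("cap 100: passes =", n100, "converged =", conv100, "kept =", keep100.sum())
```

If the clip reports convergence before reaching the cap, raising the cap cannot remove any more points; the threshold itself is then the only knob left.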
Next time:
...
Fixing Sigma Clipping, Linear Fit (p-values)
- Tried increasing the number of iterations to get rid of noticeable outliers (like the one in the chart below); the outlier in the top right corner remained. Also tried the default setting, where the data keeps getting sigma-clipped until it converges
- Below is the result of sigma clipping for a max of 100 iterations; the outlier still stayed
- Instead of 5-sigma clipping, for this particular data set I changed to 4-sigma clipping, which removed the outlier in the top right corner.
- However, the linear fit does not pass through the points, so something was wrong with my linear fit
- Found out that the linear fit I was using fitted the model through the clipped data points as well (since I was passing a masked array), so I copied the unmasked elements into a new array (this is very inefficient in running time; can ask Eske about better ways to plot / fit through masked arrays when he gets back). For the same plot, this gave a flatter, better fit
- While figuring out what was wrong with the linear fit, I came across the p-value that scipy.stats.linregress (which I'm using) returns: the p-value for the null hypothesis that the slope is 0. Took the p-values and said that if they were greater than 0.05, the null hypothesis cannot be rejected, i.e. the data are consistent with a zero slope.
- Printed out the p-values for the linear fits and what conclusions can be made from them:
- Rerunning the independence tests, the slopes look much more horizontal, and the p-values indicate that most slopes are not significantly different from 0
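The masked-array fix and the p-value check above can be sketched as follows. This is a minimal illustration on synthetic data, not the actual analysis code: the arrays, the `y > 10.0` mask, and the variable names are assumptions. It uses boolean indexing instead of an element-by-element copy and reads the `pvalue` from the `scipy.stats.linregress` result.

```python
import numpy as np
from scipy import stats

# Synthetic stand-in data (hypothetical): a flat trend with small wiggles
# and one high outlier in the top right corner.
x = np.arange(20, dtype=float)
y = 2.0 + 0.1 * np.sin(x)
y[-1] = 15.0

# Pretend a clipping step has masked the outlier.
y_clipped = np.ma.masked_array(y, mask=(y > 10.0))

# Fit through the underlying data, mask ignored (the behavior that caused the bad fit).
fit_all = stats.linregress(x, y_clipped.data)

# Fit through unmasked points only; one boolean index keeps x and y aligned.
keep = ~np.ma.getmaskarray(y_clipped)
fit_kept = stats.linregress(x[keep], y_clipped.data[keep])

# Two-sided test of the null hypothesis that the slope is 0.
alpha = 0.05
if fit_kept.pvalue > alpha:
    verdict = "cannot reject zero slope"
else:
    verdict = "slope differs from zero"

print("slope, mask ignored:", fit_all.slope)
print("slope, unmasked only:", fit_kept.slope)
print("p-value, unmasked only:", fit_kept.pvalue, "->", verdict)
```

Boolean indexing does the copy in a single vectorized step, which should be cheaper than copying elements one at a time. SciPy also ships `scipy.stats.mstats.linregress`, a masked-array-aware variant; whether it fits this pipeline has not been checked here.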
Chi-2
Chi-2 histogram of the wavelength solution (the conversion from pixels to wavelength), made separately for each type of grating
Parangle
Parangle -> parangle rainbow appears to the north; histogram peak where data is good, and start guiding criteria