Spline Fitting
- I reprocessed the data so that the spacing between neighboring data points is more consistent, then fit the spline through the data points.
- I went back to using UnivariateSpline and experimented with its parameters, especially the smoothing factor (s) and k, the degree of the smoothing spline.
- Other parameters (like the number of knots) had a smaller effect than the smoothing factor, possibly because UnivariateSpline chooses the number of knots itself based on s.
- For the graph above, the spline only had two knot locations.
- I found this website useful; it explains splines and knots more clearly:
- https://stats.stackexchange.com/questions/517375/splines-relationship-of-knots-degree-and-degrees-of-freedom
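A minimal sketch of how knots relate to the smoothing factor, assuming the fits use scipy.interpolate.UnivariateSpline. In that class the number of knots is not set directly: knots are added until the residual sum of squares drops below s, so a larger s usually leaves fewer knots. The data is synthetic and only for illustration.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

# Synthetic stand-in data (not the real spectra): a sine curve plus noise.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 200)
y = np.sin(x) + rng.normal(scale=0.1, size=x.size)

# UnivariateSpline adds knots until
#   sum((w * (y - spl(x)))**2) <= s
# holds, so the knot count is driven by s rather than chosen directly.
knot_counts = {}
for s in (2.0, 10.0, 50.0):
    spl = UnivariateSpline(x, y, k=3, s=s)
    knot_counts[s] = len(spl.get_knots())  # includes the two end points
    print(f"s = {s:5.1f} -> {knot_counts[s]} knots")
```

If only two knots come back, they are just the two end points, which means the "spline" is a single polynomial over the whole range.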
Changing Smoothing Factor
- With s = 0, the spline passes through every data point, so it fits closely but looks pretty choppy.
- Leaving s at its automatic (default) value gives the first graph, where the spline is not a strong fit.
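A sketch comparing s = 0 with the default s, again assuming scipy's UnivariateSpline and synthetic data. With s = 0 the spline interpolates every point. With the default (s=None), scipy sets s = len(w), which is simply the number of points when no weights are passed. That default assumes the noise in y has a standard deviation of about 1, so for data with much smaller errors it can over-smooth, which is one possible reason the automatic fit looked weak.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

# Synthetic data with small noise (standard deviation 0.1).
rng = np.random.default_rng(1)
x = np.linspace(0.0, 10.0, 50)
y = np.sin(x) + rng.normal(scale=0.1, size=x.size)

# s = 0: interpolating spline, passes through every point.
interp = UnivariateSpline(x, y, k=3, s=0)
# s = None (default): scipy uses s = len(x) here because no weights are given.
default = UnivariateSpline(x, y, k=3)

print("s = 0, max |spline - data|:", np.max(np.abs(interp(x) - y)))
print("default s, knots:", len(default.get_knots()))
print("default s, residual sum of squares:", default.get_residual())
```

Passing explicit weights w = 1/sigma (per-point error estimates) is the documented way to make the default s meaningful for a given dataset.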
Changing Degree of the Smoothing Spline (k)
- Setting k = 5 (the highest degree UnivariateSpline allows) immediately gave a much better fit. This is expected, since higher-degree polynomial pieces can usually follow the data points more closely.
- I printed out the spline coefficients, which I'm not 100% sure how to read yet (get_coeffs() returns B-spline coefficients rather than ordinary polynomial coefficients), and then also printed the locations of the knots.
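A minimal sketch for reading that printed output, assuming UnivariateSpline and synthetic data. get_knots() returns the distinct knots (including both end points), and get_coeffs() returns one weight per B-spline basis function, so there are len(knots) + k - 1 of them. Rebuilding the curve with scipy.interpolate.BSpline shows how the two fit together.

```python
import numpy as np
from scipy.interpolate import BSpline, UnivariateSpline

rng = np.random.default_rng(2)
x = np.linspace(0.0, 10.0, 100)
y = np.sin(x) + rng.normal(scale=0.1, size=x.size)

k = 5  # UnivariateSpline requires 1 <= k <= 5
spl = UnivariateSpline(x, y, k=k, s=2.0)

knots = spl.get_knots()    # distinct knots, end points included once
coeffs = spl.get_coeffs()  # B-spline weights, not polynomial coefficients
print("knots:", knots)
print("number of coefficients:", len(coeffs))  # len(knots) + k - 1

# Rebuild the same curve: repeat each end knot so it appears k + 1 times,
# then sum the degree-k B-spline basis functions weighted by coeffs.
t = np.concatenate([np.repeat(knots[0], k), knots, np.repeat(knots[-1], k)])
rebuilt = BSpline(t, coeffs, k)
print("max difference:", np.max(np.abs(rebuilt(x) - spl(x))))
```

Each coefficient mostly controls the curve near a few neighboring knots, which is why the list does not map onto a single polynomial formula.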
Applying to Other Datasets
- I applied the same fit to other datasets; the results were a bit too choppy.
- I set the smoothing factor to 10 and kept k at 5. I spaced out the points by keeping only points that are at least 1/3 of the largest interval width in the dataset away from the previously kept point, then changed this to 1/2.
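One way to implement the point-spacing rule, under my reading of it (a point is kept only if it is at least a fraction of the largest gap away from the previously kept point). thin_points is a hypothetical helper, not existing code.

```python
import numpy as np

def thin_points(x, y, fraction=0.5):
    """Hypothetical helper: keep a point only if it lies at least
    fraction * (largest gap between neighboring x values) beyond the
    previously kept point. Assumes x is sorted in increasing order."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    min_step = fraction * np.max(np.diff(x))
    keep = [0]  # always keep the first point
    for i in range(1, len(x)):
        if x[i] - x[keep[-1]] >= min_step:
            keep.append(i)
    keep = np.array(keep)
    return x[keep], y[keep]

x_thin, y_thin = thin_points([0.0, 0.1, 0.2, 1.0, 1.1, 3.0], [0, 1, 2, 3, 4, 5], fraction=0.5)
print(x_thin, y_thin)
```

With fraction=0.5 the example keeps x = 0, 1 and 3. This rule can drop the last data point if it sits too close to the previous kept one, which may matter at the edges of the fit.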
...
From these sample fits, I think the spline fits are more accurate than the original ones, although some still need improvement.
Week 8
Testing Independence of Molecules against Target / Date Recorded
- Previously, I had written code that plotted the equivalent width / airmass of particular absorption features (Hα, Hβ, H2O, O2) for all dates or all stars, to check whether these features are independent of the star or the date of observation, as expected.
- The code did not account for the different gratings used to measure the equivalent widths, so I modified it to separate the files by grating and check whether the independence from star/date was stronger within each grating.
- The code still outputs the best-fit line through the data, the r^2 of the linear fit, and the covariance coefficients:
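A sketch of the per-grating fit, assuming each measurement can be reduced to a (grating, x, equivalent width / airmass) record where x is a numeric star index or observation date. The records, grating names and fit_by_grating are hypothetical. It uses scipy.stats.linregress for the line and r^2, and np.cov for the covariance matrix of x and y (one possible reading of "covariance coefficients").

```python
import numpy as np
from scipy import stats

# Hypothetical records: (grating, x, ew_over_airmass). Values are made up.
records = [
    ("G1", 1.0, 0.50), ("G1", 2.0, 0.52), ("G1", 3.0, 0.49), ("G1", 4.0, 0.51),
    ("G2", 1.0, 0.80), ("G2", 2.0, 0.78), ("G2", 3.0, 0.83), ("G2", 4.0, 0.79),
]

def fit_by_grating(records):
    """Fit a straight line separately for each grating and return the
    slope, intercept, r^2 and 2x2 covariance matrix of (x, y)."""
    results = {}
    for grating in sorted({r[0] for r in records}):
        x = np.array([r[1] for r in records if r[0] == grating])
        y = np.array([r[2] for r in records if r[0] == grating])
        fit = stats.linregress(x, y)
        results[grating] = {
            "slope": fit.slope,
            "intercept": fit.intercept,
            "r2": fit.rvalue ** 2,
            "cov": np.cov(x, y),
        }
    return results

for grating, res in fit_by_grating(records).items():
    print(grating, f"slope={res['slope']:.4f}", f"r^2={res['r2']:.3f}")
```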
Independence of O2 against Star:
Independence of H2O against Star:
Independence of O2 against Date-Obs:
Independence of Hβ against Date-Obs:
Independence of Hα against Date-Obs:
- There are clearly some significant outliers that appear to be skewing the fits. This is interesting because I had already sigma-clipped the data for five iterations; maybe it needs more iterations.
- Also, I think the linear-fit routine I use may be avoiding zero as a possible slope, so I could look into a linear fit that allows a zero slope.
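A minimal iterative sigma-clipping sketch in plain NumPy. sigma_clip_mask is hypothetical; the clipping tool actually used isn't named here, and astropy.stats.sigma_clip or scipy.stats.sigmaclip may differ in details such as the center statistic. The point it illustrates: once a pass rejects nothing, every later pass is identical, so raising the iteration cap cannot remove an outlier that survives at convergence.

```python
import numpy as np

def sigma_clip_mask(values, sigma=3.0, maxiters=5):
    """Hypothetical helper: return a boolean mask of points kept.
    Each pass recomputes the mean and (population) standard deviation of the
    kept points and rejects points more than sigma std from the mean.
    maxiters=None keeps going until a pass rejects nothing."""
    values = np.asarray(values, dtype=float)
    keep = np.ones(values.size, dtype=bool)
    n_iter = 0
    while maxiters is None or n_iter < maxiters:
        center = values[keep].mean()
        spread = values[keep].std()
        new_keep = keep & (np.abs(values - center) <= sigma * spread)
        n_iter += 1
        if np.array_equal(new_keep, keep):
            break  # converged: further passes would give the same mask
        keep = new_keep
    return keep

values = [1.0, 1.1, 0.9, 1.0, 1.05, 0.95, 10.0]
print(sigma_clip_mask(values, sigma=2.0, maxiters=None))
print(sigma_clip_mask(values, sigma=3.0, maxiters=100))
```

With sigma=2 the value 10.0 is rejected, but with sigma=3 it survives even with maxiters=100. A single extreme point inflates the standard deviation it is judged against: with the population standard deviation, no point in a sample of n can lie more than sqrt(n - 1) standard deviations from the mean (about 2.45 for n = 7). If the per-grating samples are small, that may be worth checking.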
Fixing Sigma Clipping, Linear Fit (p-values)
- I tried increasing the number of iterations to get rid of noticeable outliers (like the one in the chart below), but the outlier in the top-right corner remained. I also tried the default settings, where the data keeps getting sigma-clipped until it converges.
- Below is the result with sigma clipping for a maximum of 100 iterations; the outlier still remained.
...
- After rerunning the independence tests, the slopes look much more horizontal, and the p-values show that for most fits the slope is not significantly different from 0.
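A sketch of the slope test, assuming scipy.stats.linregress, whose p-value is for the null hypothesis that the slope is zero. slope_test and the data are hypothetical. A large p-value means a zero slope cannot be ruled out, which is consistent with independence but does not prove the slope is exactly 0.

```python
import numpy as np
from scipy import stats

def slope_test(x, y, alpha=0.05):
    """Hypothetical helper: fit y = slope * x + intercept and test H0: slope == 0.
    Returns (slope, p_value, significant)."""
    fit = stats.linregress(x, y)
    return fit.slope, fit.pvalue, fit.pvalue < alpha

# Synthetic "independent" data: flat on average, with small noise.
rng = np.random.default_rng(3)
x = np.arange(30, dtype=float)
y = 0.5 + rng.normal(scale=0.02, size=x.size)

slope, p, significant = slope_test(x, y)
print(f"slope = {slope:.2e}, p = {p:.3f}, significantly non-zero: {significant}")
```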
Chi-2
09/09/2022
Chi-2 quality histogram
I looked at the chi2 values provided in the header of each data file, for each type of grating used to collect the data.
If a value is much different from 1, we cannot use that file's data.
I extracted the chi2 values from the headers of all the data files and made a histogram of them.
One file had a chi2 value of over 400, which I removed because it was skewing the data. This file, "spec_data_2022062800311.fits", was previously listed in faulty_files.csv for having no O2 data, and we can also see that its spectrum is messed up.
Honestly, the provided chi2 values look pretty high. Next time, I can check what the spectra and the individual molecule plots look like for the data files whose chi2 value is greater than a certain threshold.
I also printed out all the chi2 values.
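A sketch of the histogram step. read_chi2_values and chi2_histogram are hypothetical helpers, and the header keyword "CHI2" and the use of the primary header are assumptions to check against the actual files (astropy.io.fits.getheader is the astropy call for reading a header). The cutoff of 400 mirrors the outlier removal described above; the file names and values in the example are made up.

```python
import numpy as np

def read_chi2_values(paths, key="CHI2"):
    """Hypothetical: read one chi2 value per file from the primary FITS header.
    The keyword name "CHI2" is an assumption; check the real headers."""
    from astropy.io import fits  # imported lazily so the rest runs without astropy
    return {path: float(fits.getheader(path)[key]) for path in paths}

def chi2_histogram(chi2_by_file, max_chi2=400.0, bins=20):
    """Drop files whose chi2 exceeds max_chi2, then histogram the rest.
    Returns (counts, bin_edges, excluded_files)."""
    excluded = sorted(name for name, v in chi2_by_file.items() if v > max_chi2)
    kept = np.array([v for v in chi2_by_file.values() if v <= max_chi2])
    counts, edges = np.histogram(kept, bins=bins)
    return counts, edges, excluded

# Made-up values for illustration only.
counts, edges, excluded = chi2_histogram(
    {"file_a.fits": 1.2, "file_b.fits": 3.5, "file_c.fits": 410.0}, bins=5
)
print("excluded:", excluded)
print("counts:", counts)
```

The counts and edges can be drawn with matplotlib's plt.stairs(counts, edges).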
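A sketch of the threshold check planned above, assuming each file has already been reduced to (filename, grating, chi2). flag_high_chi2 is hypothetical, and the threshold is left as a parameter because no value has been chosen yet. Grouping by grating keeps the comparison within each grating.

```python
from collections import defaultdict

def flag_high_chi2(entries, threshold):
    """Hypothetical helper. entries: iterable of (filename, grating, chi2).
    Returns {grating: [filenames with chi2 > threshold]} so those spectra
    and their per-molecule plots can be inspected by eye."""
    flagged = defaultdict(list)
    for filename, grating, chi2 in entries:
        if chi2 > threshold:
            flagged[grating].append(filename)
    return dict(flagged)

# Made-up entries for illustration only.
entries = [("f1.fits", "G1", 1.1), ("f2.fits", "G1", 8.0),
           ("f3.fits", "G2", 12.0), ("f4.fits", "G2", 0.9)]
print(flag_high_chi2(entries, threshold=5.0))
```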
...