Spline Fitting
- I reprocessed the data such that the spaces between neighboring data points are more consistent and then ran the spline through the data points
- I went back to using the univariate spline and experimented with its parameters, especially the smoothing factor (s) and the degree of the smoothing spline (k)
- Other settings, such as changing the number of knots, didn't have as large an effect as the smoothing factor for some reason
- For the graph above, only got two spline knot locations.
- Found this useful page that explains splines and knots more clearly:
- https://stats.stackexchange.com/questions/517375/splines-relationship-of-knots-degree-and-degrees-of-freedom
Changing Smoothing Factor
- When s = 0, the spline passes through every data point, so we get a tight fit, but it looks pretty choppy.
- Taking the automatic (default) smoothing factor instead gives the first graph, where the spline is not a strong fit
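A minimal sketch of the comparison above, assuming the fits were made with SciPy's `UnivariateSpline`. The data here is synthetic stand-in data, not the real measurements. With no weights given, SciPy's default is s = number of points, which is only appropriate if the scatter in y is about 1; for data with smaller scatter the default smooths too much and can leave just the two boundary knots. A rough alternative is s ≈ (number of points) × (noise variance).

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

# Synthetic, evenly spaced stand-in data (NOT the real measurements).
rng = np.random.default_rng(0)
noise_sigma = 0.2
x = np.linspace(0.0, 10.0, 50)
y = np.sin(x) + rng.normal(scale=noise_sigma, size=x.size)

# s=0: the spline is forced through every point (pure interpolation).
interp = UnivariateSpline(x, y, k=3, s=0)

# s left at its default: with no weights, SciPy uses s = len(x).
default = UnivariateSpline(x, y, k=3)

# Hand-picked s, scaled to the assumed noise level: s ~ m * sigma**2.
manual = UnivariateSpline(x, y, k=3, s=x.size * noise_sigma**2)

for name, spl in [("s=0", interp), ("default s", default), ("manual s", manual)]:
    # get_knots() includes the two boundary knots;
    # get_residual() is the weighted sum of squared residuals of the fit.
    print(f"{name:10s} knots={len(spl.get_knots()):3d} "
          f"residual={spl.get_residual():.3f}")
```

Comparing the knot counts and residuals for the three fits should make it clearer why the smoothing factor dominates the other settings: SciPy adds knots only until the residual drops below s.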
Changing Degree of the Smoothing Spline (k)
- Setting k = 5 gave a much better fit, which is expected because higher-degree polynomial pieces have more freedom and can usually follow the data points more closely
- Printed out the coefficients of the spline function (not 100% sure how to read them yet) and the locations of the knots
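A small sketch for reading the printed output, again assuming SciPy's `UnivariateSpline` and synthetic stand-in data. Note that SciPy only accepts degrees 1 <= k <= 5. The values from `get_coeffs()` are B-spline coefficients (weights on the basis functions), not ordinary polynomial coefficients, and their count is tied to the knots: number of coefficients = number of knots returned by `get_knots()` + k - 1.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

# Synthetic stand-in data (NOT the real measurements).
rng = np.random.default_rng(1)
x = np.linspace(0.0, 10.0, 60)
y = np.sin(x) + rng.normal(scale=0.1, size=x.size)

fits = {}
for k in (1, 3, 5):
    # Same smoothing factor for every degree so only k changes.
    spl = UnivariateSpline(x, y, k=k, s=0.6)
    fits[k] = spl
    knots = spl.get_knots()    # boundary knots plus interior knots
    coeffs = spl.get_coeffs()  # B-spline coefficients, one per basis function
    print(f"k={k}: {len(knots)} knots, {len(coeffs)} coefficients, "
          f"residual={spl.get_residual():.3f}")
```

Because the coefficients multiply overlapping basis functions, a single coefficient cannot be read as "the slope" or "the curvature" of one segment; evaluating the spline (or its `derivative()`) at points of interest is usually more informative.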
Applying to Other Datasets
- Applied the function to other sets of data; the fits were a bit too choppy
- I set the smoothing factor to 10 and then kept k at 10. I spaced out the points by keeping only points separated by at least 1/3 of the greatest interval width in the data set, then changed this to 1/2
From these sample fits, I think the spline fit is more accurate than the original ones, although some may still need to be improved.
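The point-spacing step can be sketched as below. This is one possible reading of the rule, not necessarily the exact code used: a point is kept only if it lies at least `fraction` times the largest gap away from the previously kept point. The function name `thin_by_spacing` and the sample values are made up for illustration.

```python
import numpy as np

def thin_by_spacing(x, y, fraction=0.5):
    """Keep points at least fraction * (largest gap) from the last kept point.

    Hypothetical helper: assumes the spacing rule is applied left to right
    on x-sorted data, always keeping the first point.
    """
    order = np.argsort(x)
    x = np.asarray(x, dtype=float)[order]
    y = np.asarray(y, dtype=float)[order]
    min_gap = fraction * np.max(np.diff(x))
    keep = [0]
    for i in range(1, x.size):
        if x[i] - x[keep[-1]] >= min_gap:
            keep.append(i)
    return x[keep], y[keep]

# Made-up sample with clustered points and one wide gap (largest gap = 2.0).
x_raw = [0.0, 0.1, 0.2, 1.0, 1.1, 2.0, 4.0]
y_raw = [5.0, 5.1, 4.9, 6.0, 6.2, 7.0, 9.0]
x_thin, y_thin = thin_by_spacing(x_raw, y_raw, fraction=0.5)
print(x_thin, y_thin)
```

A larger `fraction` removes more of the clustered points, which gives the spline fewer closely spaced values to chase and should make the fit less choppy, at the cost of discarding data.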
Week 8
Testing Independence of Molecules against Target / Date Recorded
...
- There are definitely some significant outliers that appear to be skewing the data, which is interesting because I have already sigma clipped the data through five iterations; it may need more iterations
- Also, I think the linear fit program I use tries not to use zero as a potential slope, so I can look into a linear fit that allows a zero slope
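For reference, a minimal sketch of iterative sigma clipping in plain NumPy. This is an assumed implementation, not the one used in the notes; the function name, the 3-sigma cut, and the test values are illustrative. It centers on the median, and it stops early once an iteration removes nothing, so raising the iteration count only helps if points are still being rejected when the loop ends.

```python
import numpy as np

def sigma_clip_mask(values, sigma=3.0, max_iters=5):
    """Return a boolean mask that is True for the points kept after clipping."""
    values = np.asarray(values, dtype=float)
    keep = np.ones(values.size, dtype=bool)
    for _ in range(max_iters):
        center = np.median(values[keep])
        spread = np.std(values[keep])
        new_keep = keep & (np.abs(values - center) <= sigma * spread)
        if new_keep.sum() == keep.sum():
            break  # converged: this pass rejected no further points
        keep = new_keep
    return keep

# Made-up data: 200 well-behaved points plus two obvious outliers at the end.
rng = np.random.default_rng(3)
data = np.concatenate([rng.normal(0.0, 1.0, 200), [50.0, -40.0]])
mask = sigma_clip_mask(data, sigma=3.0, max_iters=5)
print(f"kept {mask.sum()} of {data.size} points")
```

If outliers survive, one thing to check is whether the spread estimate itself is being inflated by them; a tighter `sigma` or a more robust spread measure may matter more than extra iterations.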
Fixing Sigma Clipping, Linear Fit (p-values)
...
- Rerunning the independence tests, the slopes look much more horizontal, and the p-values confirm that most slopes are not statistically distinguishable from 0
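One way to run this kind of slope test is `scipy.stats.linregress`, whose p-value is for the null hypothesis that the slope is zero. The sketch below is an assumption about the approach, not the actual program used, and the data and the 0.05 cutoff are made up for illustration.

```python
import numpy as np
from scipy import stats

# Synthetic stand-in series (NOT the real measurements).
rng = np.random.default_rng(2)
x = np.arange(40, dtype=float)
y_flat = 5.0 + rng.normal(scale=1.0, size=x.size)             # no real trend
y_trend = 5.0 + 0.5 * x + rng.normal(scale=1.0, size=x.size)  # real trend

alpha = 0.05  # assumed significance level
results = {}
for label, y in [("flat", y_flat), ("trend", y_trend)]:
    res = stats.linregress(x, y)
    results[label] = res
    # Small p-value: reject "slope = 0"; large p-value: slope consistent with 0.
    verdict = "slope differs from 0" if res.pvalue < alpha else "consistent with 0"
    print(f"{label:5s} slope={res.slope:+.3f} +/- {res.stderr:.3f} "
          f"p={res.pvalue:.3g} -> {verdict}")
```

A large p-value does not prove independence; it only says the data cannot rule out a zero slope at the chosen significance level.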
Chi-2
09/09/2022
Chi-2 quality histogram
...
Looking at the chi2 values provided in the header of the data file, based on the type of grating that was used to collect the data
If the values are much different from 1, we cannot use the file's data
I extracted the chi2 values from the headers of all the data files and created a histogram of them
One of the files had a chi2 value of 400+, which I removed because it was skewing the data. This file, "spec_data_2022062800311.fits", was previously marked in faulty_files.csv for having no O2 data, but we also see that its spectrum looks wrong
The provided chi2 values look pretty high overall. Next time I can look at the spectrum graphs and the individual molecule graphs for the data files whose chi2 value is greater than a certain threshold
Also printed out all the chi2 values
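A rough sketch of the thresholding step planned above. The file names, chi2 values, and the threshold of 3.0 are all hypothetical placeholders, and reading the values out of the FITS headers is left out because the header keyword is not recorded in these notes.

```python
import numpy as np

# Hypothetical file names and chi2 values, for illustration only.
chi2_by_file = {
    "file_a.fits": 0.9,
    "file_b.fits": 1.3,
    "file_c.fits": 2.8,
    "file_d.fits": 6.5,
    "file_e.fits": 1.1,
}

def split_by_chi2(chi2_values, threshold):
    """Split files into (good, flagged) by comparing chi2 to a threshold."""
    good = {f: c for f, c in chi2_values.items() if c <= threshold}
    flagged = {f: c for f, c in chi2_values.items() if c > threshold}
    return good, flagged

threshold = 3.0  # assumed cutoff; the real value still has to be chosen
good, flagged = split_by_chi2(chi2_by_file, threshold)

# Histogram of the files that pass, so one extreme value cannot
# stretch the bins and hide the main peak.
counts, edges = np.histogram(list(good.values()), bins=5)
print("flagged for inspection:", sorted(flagged))
print("histogram counts:", counts)
```

The flagged list is the set of files whose spectrum and individual molecule graphs would be worth inspecting by eye.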
Printed chi2 values
Parangle -> Parangle rainbow appear to the north; histogram peak where data is good, and start guiding criteria