05 Sep 2022

Spline Fitting

I reprocessed the data so that the spacing between neighboring data points is more consistent, and then ran the spline through the data points
Image Added
I went back to using the univariate spline and played around a bit with its parameters, especially the smoothing factor (s) and k, the degree of the smoothing spline
- Other settings (like changing the number of knots) didn't have as large an effect as the smoothing factor, possibly because the smoothing factor itself controls how many knots get placed
For the graph above, I only got two spline knot locations.
Found this useful page that explains splines and knots a bit more clearly:
https://stats.stackexchange.com/questions/517375/splines-relationship-of-knots-degree-and-degrees-of-freedom

When s = 0, the spline passes through every data point, so we get a close fit, but it looks pretty choppy.
Image Added
Taking the automatic spline smoothing factor instead gives the first graph (where the spline is not a strong fit)

Image Added
Setting k = 5 immediately gave a much better fit, which is expected because higher-degree polynomial pieces are more flexible and can usually follow the data points more closely
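To make the effect of s and k concrete, here is a minimal sketch, assuming the spline is scipy's `UnivariateSpline`. The noisy sine curve is a made-up stand-in for the real data. One thing worth knowing: with no weights, scipy sets the default s to the number of data points, which is only a good choice when the scatter in y is around 1. For data with much smaller scatter that default is very loose, which could explain ending up with only two knots and a weak fit.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

# Synthetic stand-in for the real data (hypothetical): a noisy sine curve.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 60)
y = np.sin(x) + rng.normal(scale=0.1, size=x.size)

def fit_summary(s, k):
    """Fit a spline and return (number of knots, residual sum of squares)."""
    spline = UnivariateSpline(x, y, k=k, s=s)
    return len(spline.get_knots()), spline.get_residual()

# s=None asks scipy for its automatic smoothing factor.
for label, s, k in [("s=0, k=3", 0, 3),
                    ("default s, k=3", None, 3),
                    ("default s, k=5", None, 5)]:
    n_knots, rss = fit_summary(s, k)
    print(f"{label}: {n_knots} knots, residual = {rss:.4f}")
```

Trying a few values of s between 0 and the default is usually the quickest way to find a fit that is neither choppy nor too stiff.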
Printed out the coefficients of the spline function (not 100% sure how to read them), and then also printed the locations of the knots
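As a rough guide to reading that output, here is a small sketch, again assuming scipy's `UnivariateSpline` and using made-up data. The values from `get_coeffs()` are B-spline coefficients (one weight per basis function), not the coefficients of a single polynomial, and `get_knots()` returns the two end points of the data range plus any interior knots.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

# Made-up data for illustration only.
rng = np.random.default_rng(1)
x = np.linspace(0.0, 10.0, 50)
y = np.cos(x) + rng.normal(scale=0.05, size=x.size)

k = 3
spline = UnivariateSpline(x, y, k=k, s=0.5)

knots = spline.get_knots()    # end points of the range plus interior knots
coeffs = spline.get_coeffs()  # B-spline coefficients, one per basis function

print("knot locations:", knots)
print("number of coefficients:", len(coeffs))
# The counts are tied together: len(coeffs) == len(knots) + k - 1
print("len(knots) + k - 1 =", len(knots) + k - 1)
```

So a fit that reports only two knots is a single polynomial of degree k over the whole range, with k + 1 coefficients.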

Applied the function to other sets of data, but the fits were a bit too choppy
Image Added
I set the smoothing factor to 10 and then kept k at 10. I spaced out the points by keeping only points that are separated by at least 1/3 of the greatest interval width in the data set, and later changed this to 1/2
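One possible reading of the spacing rule, as a sketch only: walk through the sorted points and keep a point when it is at least some fraction of the largest gap away from the last kept point. The helper name `thin_by_spacing` and the synthetic data are hypothetical. The fit below uses k = 5 because scipy's `UnivariateSpline` only accepts degrees from 1 to 5.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

def thin_by_spacing(x, y, fraction=0.5):
    """Keep points at least `fraction` of the largest gap from the last kept point."""
    order = np.argsort(x)
    x = np.asarray(x, dtype=float)[order]
    y = np.asarray(y, dtype=float)[order]
    min_gap = fraction * np.max(np.diff(x))
    keep = [0]  # always keep the first point
    for i in range(1, x.size):
        if x[i] - x[keep[-1]] >= min_gap:
            keep.append(i)
    return x[keep], y[keep]

# Unevenly sampled synthetic data (hypothetical).
rng = np.random.default_rng(2)
x = np.sort(rng.uniform(0.0, 10.0, 200))
y = np.sin(x) + rng.normal(scale=0.1, size=x.size)

x_thin, y_thin = thin_by_spacing(x, y, fraction=0.5)
spline = UnivariateSpline(x_thin, y_thin, k=5, s=10)
print(f"kept {x_thin.size} of {x.size} points")
```

Raising the fraction from 1/3 to 1/2 keeps fewer points, which trades detail for a more even spacing.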

Image Added
Image Added
Image Added
Image Added
Image Added
Image Added
Image Added
Image Added

From these sample fits, I think the spline fits are more accurate than the original ones, although some may still need to be improved.

...

There are definitely some significant outliers that appear to be skewing the data. This is interesting because I have already sigma-clipped the data through five iterations, so it may need more iterations
Also, I think the linear fit program I use tries not to use zero as a potential slope, so I can look into a linear fit that allows a zero slope (potentially)
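To check whether five iterations is actually the limiting factor, a sketch like the one below reports whether the clipping converged before hitting the iteration cap. This is a plain NumPy version written for illustration, not necessarily the routine used here; the function name, the 3-sigma threshold, and the test data are all assumptions. `astropy.stats.sigma_clip` provides the same kind of clipping with a `maxiters` argument.

```python
import numpy as np

def sigma_clip_mask(values, sigma=3.0, max_iters=5):
    """Iteratively clip outliers; return (keep mask, converged flag)."""
    values = np.asarray(values, dtype=float)
    keep = np.ones(values.size, dtype=bool)
    for _ in range(max_iters):
        center = np.median(values[keep])
        spread = np.std(values[keep])
        new_keep = keep & (np.abs(values - center) <= sigma * spread)
        if new_keep.sum() == keep.sum():
            return keep, True   # nothing new was clipped
        keep = new_keep
    return keep, False          # stopped at the iteration cap

# Made-up sample: 200 well-behaved points plus three obvious outliers.
rng = np.random.default_rng(3)
data = np.concatenate([rng.normal(size=200), [15.0, -12.0, 20.0]])

keep, converged = sigma_clip_mask(data, sigma=3.0, max_iters=5)
print(f"kept {keep.sum()} of {data.size} points, converged: {converged}")
```

If the clipping has already converged, more iterations will not remove the remaining outliers, and a tighter sigma threshold would be the next thing to try.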

...

Rerunning the independence tests, the slopes look much more horizontal, and the p-values indicate that most slopes are not significantly different from 0
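For reference, one common way to run this kind of slope test is `scipy.stats.linregress`, whose p-value tests the null hypothesis that the slope is zero. The sketch below is an illustration with made-up data and a hypothetical helper name, and the 0.05 cutoff is an assumption; it is not necessarily the test used for these results.

```python
import numpy as np
from scipy.stats import linregress

def slope_test(x, y, alpha=0.05):
    """Return (slope, p-value, whether the slope differs significantly from 0)."""
    result = linregress(x, y)
    return result.slope, result.pvalue, result.pvalue < alpha

# Made-up data: one flat relation and one with a real trend.
rng = np.random.default_rng(4)
x = np.linspace(0.0, 1.0, 100)
y_flat = 2.0 + rng.normal(scale=0.5, size=x.size)
y_trend = 2.0 + 3.0 * x + rng.normal(scale=0.5, size=x.size)

for name, y in [("flat", y_flat), ("trend", y_trend)]:
    slope, pvalue, significant = slope_test(x, y)
    print(f"{name}: slope = {slope:.3f}, p = {pvalue:.3g}, significant = {significant}")
```

A large p-value means the data are consistent with a zero slope; it does not prove the slope is exactly zero.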

Week 8

09 Sep 2022

Chi-2 quality histogram

...

Looking at the chi2 values provided in the header of each data file, based on the type of grating that was used to collect the data
If the values are very different from 1, we cannot use the data from that file
I extracted the chi2 values from the headers of all the data files and created a histogram of them
One of the files had a chi2 value of 400+, which I removed because it was skewing the histogram. This file, "spec_data_2022062800311.fits", was previously marked in faulty_files.csv for having no O2 data, but we also see that its spectrum is messed up
Honestly, the provided chi2 values look pretty high. Next time I can check what the spectrum graphs and the individual molecule graphs look like for the data files with a chi2 value above a certain threshold
Also printed out all the chi2 values
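The extraction and histogram steps could look roughly like the sketch below. The header keyword `"CHI2"`, the cutoff of 400, the helper names, and the example values are all placeholders, since the notes do not give the real keyword or numbers. Reading the header uses `astropy.io.fits.getheader`, imported inside the function so the histogram part runs without any files.

```python
import numpy as np

def read_chi2(path, keyword="CHI2", ext=0):
    """Read one chi2 value from a FITS header (keyword name is a placeholder)."""
    from astropy.io import fits  # only needed when reading real files
    return float(fits.getheader(path, ext)[keyword])

def chi2_histogram(chi2_by_file, max_chi2=400.0, bins=10):
    """Drop files at or above max_chi2, then histogram the remaining values."""
    kept = {name: v for name, v in chi2_by_file.items() if v < max_chi2}
    rejected = sorted(set(chi2_by_file) - set(kept))
    counts, edges = np.histogram(list(kept.values()), bins=bins)
    return counts, edges, rejected

# Made-up values for illustration; real ones would come from read_chi2().
example = {
    "file_a.fits": 1.2,
    "file_b.fits": 3.4,
    "file_c.fits": 0.9,
    "file_d.fits": 412.0,
}
counts, edges, rejected = chi2_histogram(example)
print("rejected files:", rejected)
print("histogram counts:", counts)
```

Keeping the list of rejected file names makes it easy to cross-check them against faulty_files.csv later.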

Image Added
Image Added