Tuesday
File Reorganization
- Have a lot of Jupyter notebooks that are getting cluttered and hard to keep track of. Went through each one and made a directory of them, with a description of what each file does:
- README.md
Spline Fitting
- Worked on file "Continuum_Reduction_Ind.ipynb" to create a function that fits a spline to the spectrum from a FITS file. Made the function callable from other Python notebooks (a sketch of what such a function might look like is at the end of this list)
- Created file "Continuum_Reduction_All.ipynb" that fits a spline to all the FITS files and graphs them. Here are some sample spline graphs:
...
- Some of the spline curves don't look like the best fit; I need to do some more reading about splines to see whether I'm doing the right kind of spline fit (there are quite a few types, like bivariate splines, that I still need to explore).
- I can also get the coefficients of the spline fit, but I'm not sure what they represent. Can ask Stubbs / Eske more about spline fitting.
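- A minimal sketch of what that function might look like, assuming the spectrum is stored as a FITS binary table read with astropy; the column names ("wavelength", "flux") and the function name are placeholders, not the exact notebook code:

```python
import numpy as np
from astropy.io import fits
from scipy.interpolate import UnivariateSpline

def fit_continuum_spline(fits_path, hdu_index=1):
    """Fit a univariate spline to the spectrum in a FITS table (hypothetical layout)."""
    with fits.open(fits_path) as hdul:
        data = hdul[hdu_index].data
        wavelength = np.asarray(data["wavelength"], dtype=float)  # assumed column name
        flux = np.asarray(data["flux"], dtype=float)              # assumed column name
    # UnivariateSpline requires the x values to be increasing
    order = np.argsort(wavelength)
    spline = UnivariateSpline(wavelength[order], flux[order])
    return wavelength[order], flux[order], spline

# e.g. from another notebook: wl, fl, spl = fit_continuum_spline("some_spectrum.fits")
```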
...
Spline Fitting
- Working in Continuum_Reduction_Ind.ipynb
- Met with Chris, and he suggested that I take a few points (around 10) for each graph to get better spline fits that pass through the data points
- I am trying to automate the process of choosing points for the spline fit
- Looking into scipy.interpolate.UnivariateSpline, especially the w parameter, which lets me weight the points via a separate array (points with higher weights are more likely to be fitted closely). Trying to devise a weighting function such that points in dense clusters are weighted less than isolated points (like inflection points)
- Since the data is already organized into buckets (grouped by a range of 5 to reduce variability), I figured that if more data points fall in a bucket, the resulting averaged data point should get less weight (see the sketch after this list)
- I tried the weighting function weight = 1/(bucket length), which gives higher weight to smaller buckets, but the spline didn't look good:
- Tried weight = 1/pow(x, bucket_length) for different values of x with 1 <= x <= 2; the higher orders did not look very good either (e.g. x = 1.5 (left) and x = 2 (right))
- Maybe a better math function would work
- To do next time: choose points by hand and see how the fit goes; once the fit is good, try to design an algorithm around the points already in buckets
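- A rough sketch of the bucket-averaging and weighting idea on synthetic data; bucket_average and the variable names are mine, and only the w parameter is the actual scipy API:

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

# Stand-in data; in the notebook this would be the reduced spectrum
rng = np.random.default_rng(0)
wavelength = np.sort(rng.uniform(4000.0, 7000.0, 500))
flux = np.sin(wavelength / 400.0) + rng.normal(0.0, 0.1, wavelength.size)

def bucket_average(x, y, width=5.0):
    """Average points in fixed-width x buckets; also return each bucket's size."""
    edges = np.arange(x.min(), x.max() + width, width)
    idx = np.digitize(x, edges)
    centers, means, counts = [], [], []
    for b in np.unique(idx):
        mask = idx == b
        centers.append(x[mask].mean())
        means.append(y[mask].mean())
        counts.append(mask.sum())
    return np.array(centers), np.array(means), np.array(counts)

xb, yb, n = bucket_average(wavelength, flux)
w_inverse = 1.0 / n                # weight = 1 / (bucket length)
w_power = 1.0 / np.power(1.5, n)   # weight = 1 / pow(x, bucket_length) with x = 1.5
spline = UnivariateSpline(xb, yb, w=w_inverse)
```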
Spline Fitting
- I reprocessed the data so that the spacing between neighboring data points is more consistent, and then ran the spline through the data points
- I went back to using the univariate spline. Played around a bit with its parameters, especially the smoothing factor (s) and k, the degree of the smoothing spline
- Other parameters (like changing the number of knots) didn't have as large an effect as the smoothing factor, for some reason
- For the graph above, only got two spline knot locations.
- Found this useful website that explains splines and knots a bit more clearly:
- https://stats.stackexchange.com/questions/517375/splines-relationship-of-knots-degree-and-degrees-of-freedom
Changing Smoothing Factor
- When s = 0, the spline interpolates the data exactly, but it looks pretty choppy.
- Taking the automatic smoothing factor instead gives the first graph (where the spline is not a strong fit). A quick sketch comparing the two settings is below.
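- A minimal sketch of that comparison on synthetic data; per the scipy docs, s=0 forces the spline through every point, while s=None falls back to the automatic value (len(w)):

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

rng = np.random.default_rng(1)
x = np.linspace(0.0, 10.0, 60)
y = np.sin(x) + rng.normal(0.0, 0.2, x.size)

for s in (0, None):  # 0 = interpolate exactly (choppy); None = automatic smoothing
    spl = UnivariateSpline(x, y, s=s)
    print(f"s={s}: {len(spl.get_knots())} knots, residual={spl.get_residual():.3f}")
```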
Changing Degree of the Smoothing Spline (k)
- Setting k = 5 immediately gave a much better fit, which is expected because higher-degree polynomials can usually fit data points more closely
- Printed out the coefficients of the spline function (not 100% sure how to read them yet) and also the locations of the knots; a sketch is below
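- Sketch of the k = 5 fit plus reading back the spline internals; get_coeffs() and get_knots() are the real UnivariateSpline accessors, the data is synthetic:

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

rng = np.random.default_rng(2)
x = np.linspace(0.0, 10.0, 60)
y = np.sin(x) + rng.normal(0.0, 0.2, x.size)

spl = UnivariateSpline(x, y, k=5)          # quintic, the highest degree scipy allows
print("coefficients:", spl.get_coeffs())   # B-spline basis coefficients
print("knots:", spl.get_knots())           # knot positions (boundary knots not repeated)
```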
Applying to Other Datasets
- Applied the function to other sets of data; the fits are a bit too choppy
- I set the smoothing factor to 10 and kept k at 5 (scipy caps UnivariateSpline at k = 5). I spaced out the points by keeping only points separated by at least 1/3 of the greatest interval width in the data set, then changed that to 1/2 (sketch below)
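- A hypothetical version of that spacing step on synthetic data; thin_points is my name for it, and the s and k values match the entry above:

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

def thin_points(x, y, fraction=0.5):
    """Keep only points at least fraction * (largest gap) past the last kept point."""
    min_sep = fraction * np.max(np.diff(x))
    keep = [0]
    for i in range(1, len(x)):
        if x[i] - x[keep[-1]] >= min_sep:
            keep.append(i)
    keep = np.asarray(keep)
    return x[keep], y[keep]

rng = np.random.default_rng(3)
x = np.sort(rng.uniform(0.0, 10.0, 200))
y = np.sin(x) + rng.normal(0.0, 0.2, x.size)

xt, yt = thin_points(x, y, fraction=1/3)   # the 1/3 spacing, later changed to 1/2
spl = UnivariateSpline(xt, yt, s=10, k=5)  # s = 10 and k = 5, as above
```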
...