Tuesday
File Reorganization
- Have a lot of Jupyter notebooks that are getting cluttered and hard to keep track of. Went through each one and made a directory of them, with a description of what each file does:
- README.md
Spline Fitting
- Worked on file "Continuum_Reduction_Ind.ipynb" to create a function that fits a spline to the spectrum from a FITS file. Made the function callable from other Python notebooks (a sketch of what such a function might look like is at the end of this list)
- Created file "Continuum_Reduction_All.ipynb" that fits a spline to all the FITS files and graphs them. Here are some sample spline graphs:
...
- Some of the spline curves don't look like the best fit; I need to do some more reading about splines to see whether I'm doing the right kind of spline fit (there are quite a few types, like bivariate splines, that I still need to explore).
- I can also get the coefficients of the spline fit, but I'm not sure what they represent. Can ask Stubbs / Eske more about spline fitting.
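- A minimal sketch of what that function might look like, assuming the spectrum is stored as a FITS binary table read with astropy; the column names ("wavelength", "flux") and the function name are placeholders, not the exact notebook code:

```python
import numpy as np
from astropy.io import fits
from scipy.interpolate import UnivariateSpline

def fit_continuum_spline(fits_path, hdu_index=1):
    """Fit a univariate spline to the spectrum in a FITS table (hypothetical layout)."""
    with fits.open(fits_path) as hdul:
        data = hdul[hdu_index].data
        wavelength = np.asarray(data["wavelength"], dtype=float)  # assumed column name
        flux = np.asarray(data["flux"], dtype=float)              # assumed column name
    # UnivariateSpline requires the x values to be increasing
    order = np.argsort(wavelength)
    spline = UnivariateSpline(wavelength[order], flux[order])
    return wavelength[order], flux[order], spline

# e.g. from another notebook: wl, fl, spl = fit_continuum_spline("some_spectrum.fits")
```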
...
Spline Fitting
- Working in Continuum_Reduction_Ind.ipynb
- Met with Chris, and he suggested that I take a few points (around 10) for each graph to get better spline fits that pass through the data points
- I am trying to automate the process of choosing points for the spline fit
- Looking into scipy.interpolate.UnivariateSpline, especially the w parameter, which lets me weight the points via a separate array (points with higher weights are more likely to be fitted closely). Trying to devise a weighting function such that points in dense clusters are weighted less than isolated points (like inflection points)
- Since the data is already organized into buckets (grouped by a range of 5 to reduce variability), I figured that if more data points fall in a bucket, the resulting averaged data point should get less weight (see the sketch after this list)
- I tried the weighting function weight = 1/(bucket length), which gives higher weight to smaller buckets, but the spline didn't look good:
- Tried weight = 1/pow(x, bucket_length) for different values of x with 1 <= x <= 2; the higher orders did not look very good either (e.g. x = 1.5 (left) and x = 2 (right))
- Maybe a better math function would work
- To do next time: choose points by hand and see how the fit goes; once the fit is good, try to design an algorithm around the points already in buckets
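- A rough sketch of the bucket-averaging and weighting idea on synthetic data; bucket_average and the variable names are mine, and only the w parameter is the actual scipy API:

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

# Stand-in data; in the notebook this would be the reduced spectrum
rng = np.random.default_rng(0)
wavelength = np.sort(rng.uniform(4000.0, 7000.0, 500))
flux = np.sin(wavelength / 400.0) + rng.normal(0.0, 0.1, wavelength.size)

def bucket_average(x, y, width=5.0):
    """Average points in fixed-width x buckets; also return each bucket's size."""
    edges = np.arange(x.min(), x.max() + width, width)
    idx = np.digitize(x, edges)
    centers, means, counts = [], [], []
    for b in np.unique(idx):
        mask = idx == b
        centers.append(x[mask].mean())
        means.append(y[mask].mean())
        counts.append(mask.sum())
    return np.array(centers), np.array(means), np.array(counts)

xb, yb, n = bucket_average(wavelength, flux)
w_inverse = 1.0 / n                # weight = 1 / (bucket length)
w_power = 1.0 / np.power(1.5, n)   # weight = 1 / pow(x, bucket_length) with x = 1.5
spline = UnivariateSpline(xb, yb, w=w_inverse)
```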
Spline Fitting
- I reprocessed the data so that the spacing between neighboring data points is more consistent, and then ran the spline through the data points
- I went back to using the univariate spline. Played around a bit with its parameters, especially the smoothing factor (s) and k, the degree of the smoothing spline
- Other parameters (like changing the number of knots) didn't have as large an effect as the smoothing factor, for some reason
- For the graph above, only got two spline knot locations.
- Found this useful website that explains splines and knots a bit more clearly:
- https://stats.stackexchange.com/questions/517375/splines-relationship-of-knots-degree-and-degrees-of-freedom
Changing Smoothing Factor
- When s = 0, the spline interpolates the data exactly, but it looks pretty choppy.
- Taking the automatic smoothing factor instead gives the first graph (where the spline is not a strong fit). A quick sketch comparing the two settings is below.
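- A minimal sketch of that comparison on synthetic data; per the scipy docs, s=0 forces the spline through every point, while s=None falls back to the automatic value (len(w)):

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

rng = np.random.default_rng(1)
x = np.linspace(0.0, 10.0, 60)
y = np.sin(x) + rng.normal(0.0, 0.2, x.size)

for s in (0, None):  # 0 = interpolate exactly (choppy); None = automatic smoothing
    spl = UnivariateSpline(x, y, s=s)
    print(f"s={s}: {len(spl.get_knots())} knots, residual={spl.get_residual():.3f}")
```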
Changing Degree of the Smoothing Spline (k)
- Setting k = 5 immediately gave a much better fit, which is expected because higher-degree polynomials can usually fit data points more closely
- Printed out the coefficients of the spline function (not 100% sure how to read them yet) and also the locations of the knots; a sketch is below
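- Sketch of the k = 5 fit plus reading back the spline internals; get_coeffs() and get_knots() are the real UnivariateSpline accessors, the data is synthetic:

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

rng = np.random.default_rng(2)
x = np.linspace(0.0, 10.0, 60)
y = np.sin(x) + rng.normal(0.0, 0.2, x.size)

spl = UnivariateSpline(x, y, k=5)          # quintic, the highest degree scipy allows
print("coefficients:", spl.get_coeffs())   # B-spline basis coefficients
print("knots:", spl.get_knots())           # knot positions (boundary knots not repeated)
```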
Applying to Other Datasets
- Applied the function to other sets of data; the fits are a bit too choppy
- I set the smoothing factor to 10 and kept k at 5 (scipy caps UnivariateSpline at k = 5). I spaced out the points by keeping only points separated by at least 1/3 of the greatest interval width in the data set, then changed that to 1/2 (sketch below)
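- A hypothetical version of that spacing step on synthetic data; thin_points is my name for it, and the s and k values match the entry above:

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

def thin_points(x, y, fraction=0.5):
    """Keep only points at least fraction * (largest gap) past the last kept point."""
    min_sep = fraction * np.max(np.diff(x))
    keep = [0]
    for i in range(1, len(x)):
        if x[i] - x[keep[-1]] >= min_sep:
            keep.append(i)
    keep = np.asarray(keep)
    return x[keep], y[keep]

rng = np.random.default_rng(3)
x = np.sort(rng.uniform(0.0, 10.0, 200))
y = np.sin(x) + rng.normal(0.0, 0.2, x.size)

xt, yt = thin_points(x, y, fraction=1/3)   # the 1/3 spacing, later changed to 1/2
spl = UnivariateSpline(xt, yt, s=10, k=5)  # s = 10 and k = 5, as above
```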
...