
  • Some of the spline curves don't look like the best fit. I need to read more about splines to check whether I'm using the right kind of fit (there are quite a few types, such as bivariate splines, that I still need to explore).
  • I can also get the coefficients of the spline fit, but I'm not sure what they represent. I can ask Stubbs / Eske more about spline fitting.


Spline Fitting

  • Working in Continuum_Reduction_Ind.ipynb
  • Met with Chris, who suggested that I pick a few points (around 10) for each graph so that the spline fits pass through the data points better
  • I am trying to automate the process of choosing points for the spline fit
  • Looking into scipy.interpolate.UnivariateSpline, especially the w parameter, which lets me weight the points individually through a separate array (so points with higher weights are fitted more closely). I am trying to devise a weighting function such that points in dense clusters are weighted less than isolated points (such as inflection points)
    • Since the data is already organized into buckets (grouped over a range of 5 to reduce variability), I reasoned that the more data points fall in a bucket, the less weight the resulting averaged data point should get.
    • I tried weight = 1/(bucket length), which gives higher weight to smaller buckets, but the spline didn't look good.
    • I tried weight = 1/pow(x, bucket_length) for different values of x with 1 <= x <= 2, and the larger values of x did not look very good either (e.g. x = 1.5 (left) and x = 2 (right))
    • [Figures: spline fits with x = 1.5 (left) and x = 2 (right)]
    • Maybe a better weighting function would work
  • To do next time: choose points by hand and see how the fit goes; once the fit is good, try to design an algorithm around the points already in buckets
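
A minimal sketch of the bucket-weighting idea above, using synthetic data since the notebook data is not reproduced here. The helper name bucket_average, the test signal, and the random seed are assumptions made for illustration; only UnivariateSpline and its w parameter come from the notes.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline


def bucket_average(x, y, width=5.0):
    """Average points into buckets of the given width.

    Returns the mean x, mean y, and point count of each non-empty bucket.
    (Hypothetical helper; the notebook's own bucketing may differ.)
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    idx = np.floor((x - x.min()) / width).astype(int)
    xs, ys, counts = [], [], []
    for b in np.unique(idx):  # np.unique returns the bucket ids in sorted order
        mask = idx == b
        xs.append(x[mask].mean())
        ys.append(y[mask].mean())
        counts.append(mask.sum())
    return np.array(xs), np.array(ys), np.array(counts)


# Synthetic stand-in for the real data (assumption).
rng = np.random.default_rng(0)
x_raw = np.sort(rng.uniform(0.0, 100.0, 300))
y_raw = np.sin(x_raw / 15.0) + 0.1 * rng.normal(size=x_raw.size)

xb, yb, counts = bucket_average(x_raw, y_raw, width=5.0)

# The two weighting functions tried in the notes.
w_inverse = 1.0 / counts                 # weight = 1/(bucket length)
w_power = 1.0 / np.power(1.5, counts)    # weight = 1/pow(x, bucket_length), x = 1.5

spline_inverse = UnivariateSpline(xb, yb, w=w_inverse)
spline_power = UnivariateSpline(xb, yb, w=w_power)

print("knots with 1/n weights:     ", len(spline_inverse.get_knots()))
print("knots with 1/1.5**n weights:", len(spline_power.get_knots()))
```

One thing that may be worth checking: UnivariateSpline adds knots until sum((w[i] * (y[i] - spl(x[i])))**2) <= s, so making all the weights much smaller than 1 also makes that condition easier to satisfy and can leave the spline very smooth. Rescaling the weights (for example so that their mean is 1) or setting s explicitly might separate the effect of the relative weighting from the effect of the overall scale.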

Spline Fitting

  • I reprocessed the data so that the spacing between neighboring data points is more consistent and then ran the spline through the data points
  • [Figure: spline fit through the reprocessed data points]
  • I went back to using the univariate spline. I played around with the UnivariateSpline parameters, especially the smoothing factor (s) and the degree of the smoothing spline (k)
    • Other settings (like changing the number of knots) didn't have as large an effect as the smoothing factor, for some reason
  • For the graph above, I only got two spline knot locations.
  • Found this useful page that explains splines and knots more clearly:
  • https://stats.stackexchange.com/questions/517375/splines-relationship-of-knots-degree-and-degrees-of-freedom 

Changing Smoothing Factor

  • When s = 0, we get a spline that follows the data points closely (s = 0 makes it interpolate every point), but it looks pretty choppy.
  • [Figure: spline fit with s = 0]
  • Taking the default (automatic) smoothing factor instead gives the first graph, where the spline is not a strong fit.
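
A small sketch comparing the default smoothing factor with s = 0 and an intermediate value, on synthetic data (the signal, noise level, seed, and the value s = 2.0 are assumptions for illustration). According to the SciPy documentation, the default is s = len(w), i.e. the number of points when no weights are given, and knots are only added until the weighted sum of squared residuals drops below s.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

# Synthetic stand-in for one of the data sets (assumption).
rng = np.random.default_rng(1)
x = np.linspace(0.0, 10.0, 40)
y = np.sin(x) + 0.2 * rng.normal(size=x.size)

spline_default = UnivariateSpline(x, y)          # s=None: default smoothing
spline_interp = UnivariateSpline(x, y, s=0)      # s=0: interpolates every point
spline_mid = UnivariateSpline(x, y, s=2.0)       # illustrative value in between

for name, spl in [("default", spline_default),
                  ("s=0", spline_interp),
                  ("s=2.0", spline_mid)]:
    # get_knots() includes the two boundary knots;
    # get_residual() is the weighted sum of squared residuals.
    print(f"{name:8s} knots={len(spl.get_knots()):3d} "
          f"residual={spl.get_residual():.4f}")
```

If the y values are small in absolute terms, the default s can already be satisfied by a single polynomial piece, which would show up as only the two boundary knots. That may be what happened in the first graph, and scanning a few values of s between 0 and the default would be one way to look for a fit that is neither too stiff nor too choppy.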

Changing Degree of the Smoothing Spline (k)

  • [Figure: spline fit with k = 5]
  • Setting k = 5 immediately gave a much better fit, which is expected because a higher-degree piecewise polynomial has more flexibility to follow the data points
  • I printed the coefficients of the spline function (not 100% sure how to read them) and also the locations of the knots
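
A sketch of how the printed coefficients and knots relate, again on synthetic data (the signal, seed, and s = 1.0 are assumptions). As far as I understand the SciPy documentation, get_coeffs() returns B-spline coefficients, i.e. weights on B-spline basis functions defined by the knots, rather than per-segment polynomial coefficients. The reconstruction through scipy.interpolate.BSpline below assumes the boundary knots are repeated k extra times on each side, following the FITPACK convention.

```python
import numpy as np
from scipy.interpolate import BSpline, UnivariateSpline

# Synthetic stand-in for one of the data sets (assumption).
rng = np.random.default_rng(2)
x = np.linspace(0.0, 10.0, 60)
y = np.sin(x) + 0.1 * rng.normal(size=x.size)

fits = {}
for k in (1, 3, 5):
    spl = UnivariateSpline(x, y, k=k, s=1.0)
    fits[k] = spl
    knots = spl.get_knots()      # interior knots plus the two boundary knots
    coeffs = spl.get_coeffs()    # one coefficient per B-spline basis function
    print(f"k={k}: {len(knots)} knots, {len(coeffs)} coefficients")

# Rebuild the cubic fit from its knots and coefficients.
k = 3
cubic = fits[k]
knots = cubic.get_knots()
full_knots = np.concatenate(([knots[0]] * k, knots, [knots[-1]] * k))
rebuilt = BSpline(full_knots, cubic.get_coeffs(), k)

# Compare at interior sample points.
print("max difference:", np.max(np.abs(rebuilt(x[1:-1]) - cubic(x[1:-1]))))
```

In each case the number of coefficients should equal the number of knots returned by get_knots() plus k - 1, which is one way to sanity-check the printed arrays.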

Applying to Other Datasets

  • Applied the function to other sets of data; the result is a bit too choppy
  • [Figure: spline fit on another data set]
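  • I set the smoothing factor to 10 and kept k at 10 (as recorded; the SciPy documentation restricts UnivariateSpline to 1 <= k <= 5, so this value may be a typo for 5). I spaced out the points by taking only points separated by at least 1/3 of the greatest interval width in the data set, and later changed this to 1/2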
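
A rough sketch of the point-spacing step, under one possible reading of the note above: walk through the sorted points and keep a point only if it lies at least fraction * (largest gap between neighbors) beyond the last kept point. The function name thin_points, the synthetic data, and the seed are assumptions. The sketch uses k = 5 because UnivariateSpline documents k as limited to 1 <= k <= 5.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline


def thin_points(x, y, fraction=0.5):
    """Keep points at least fraction * (largest neighbor gap) apart.

    Hypothetical helper; the notebook's actual selection rule may differ.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    order = np.argsort(x)
    x, y = x[order], y[order]
    min_gap = fraction * np.max(np.diff(x))
    keep = [0]  # always keep the first point
    for i in range(1, x.size):
        if x[i] - x[keep[-1]] >= min_gap:
            keep.append(i)
    return x[keep], y[keep]


# Synthetic, unevenly spaced stand-in data (assumption).
rng = np.random.default_rng(3)
x_raw = rng.uniform(0.0, 100.0, 200)
y_raw = np.sin(x_raw / 15.0) + 0.1 * rng.normal(size=x_raw.size)

xt, yt = thin_points(x_raw, y_raw, fraction=0.5)
print(f"kept {xt.size} of {x_raw.size} points")

# Smoothing factor 10 as in the notes; k=5 is the largest degree accepted.
spline = UnivariateSpline(xt, yt, k=5, s=10)
print("knots:", len(spline.get_knots()))
```

Changing fraction from 1/3 to 1/2 keeps fewer, more widely spaced points, which tends to reduce the clustering that seemed to drive the choppy fits.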

[Figures: eight sample spline fits]

From these sample fits, I think the new spline fits are more accurate than the original ones, although some may still need to be improved.