Week 8 (09/05 - 09/10)

 

Spline Fitting

  • I reprocessed the data so that the spacing between neighboring data points is more consistent, then fit the spline through the resulting points.
  • I went back to using the univariate spline and experimented with its parameters, especially the smoothing factor (s) and the degree of the smoothing spline (k).
    • Other settings (such as changing the number of knots) did not have as large an effect as the smoothing factor, for reasons I have not worked out yet.
  • For the graph above, the fit produced only two spline knot locations.
  • Found this useful page that explains splines and knots more clearly:
  • https://stats.stackexchange.com/questions/517375/splines-relationship-of-knots-degree-and-degrees-of-freedom

Changing Smoothing Factor

  • With s = 0, the spline passes through every data point, but the curve looks quite choppy.
  • Using the automatic (default) smoothing factor instead gives the first graph, where the spline is not a strong fit.
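
A minimal sketch of the two smoothing settings, assuming the spline in question is SciPy's `scipy.interpolate.UnivariateSpline`. The data here are synthetic stand-ins, not the measurements from this notebook. It also suggests one possible reason for a fit with only two knots: if the default smoothing target is loose compared with the scatter in the data, a single polynomial with no interior knots already satisfies it.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

# Synthetic stand-in data (assumption: not the notebook's measurements).
x = np.linspace(0.0, 10.0, 40)
y = np.sin(x) + 0.1 * np.cos(25.0 * x)  # smooth signal plus a small wiggle

# s=0 forces the spline through every point (pure interpolation).
interp = UnivariateSpline(x, y, k=3, s=0)

# Leaving s unset lets SciPy pick the smoothing factor from the weights;
# with no weights given, the documented default is s = number of points.
auto = UnivariateSpline(x, y, k=3)

# Residual = sum of squared differences between data and spline.
resid_interp = interp.get_residual()
resid_auto = auto.get_residual()

# Knot counts: the interpolating spline needs many knots, while the
# default fit may stop at the two boundary knots.
n_knots_interp = len(interp.get_knots())
n_knots_auto = len(auto.get_knots())

print("s=0     : residual", resid_interp, "knots", n_knots_interp)
print("default : residual", resid_auto, "knots", n_knots_auto)
```

Trying a few intermediate values of s between 0 and the default is a quick way to trade choppiness against closeness of fit.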

Changing Degree of the Smoothing Spline (k)

  • Setting k = 5 gave a much better fit right away, which is expected because higher-degree polynomial pieces have more freedom to follow the data points.
  • I also printed the coefficients of the spline function (not 100% sure how to read them yet) and the locations of the knots.
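
A short sketch of printing the knots and coefficients, again assuming `scipy.interpolate.UnivariateSpline` and synthetic data. As far as I understand it, the values from `get_coeffs()` are B-spline basis weights rather than ordinary polynomial coefficients, which is why they are hard to read directly; their count is tied to the number of knots and the degree k.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

# Synthetic stand-in data (assumption: not the notebook's measurements).
x = np.linspace(0.0, 10.0, 40)
y = np.sin(x)

k = 5  # SciPy documents the allowed range as 1 <= k <= 5
spline = UnivariateSpline(x, y, k=k, s=0.01)

knots = spline.get_knots()    # the two end points plus any interior knots
coeffs = spline.get_coeffs()  # B-spline coefficients, one per basis function

print("knot locations:", knots)
print("coefficients  :", coeffs)

# Each coefficient weights one B-spline basis function, so the counts obey:
#   number of coefficients = number of knots + k - 1
print(len(coeffs), "==", len(knots) + k - 1)

# Largest pointwise difference between the data and the fitted spline.
max_abs_error = np.max(np.abs(spline(x) - y))
print("max |data - spline|:", max_abs_error)
```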

Applying to Other Datasets

  • Applied the function to other sets of data; the fits were a bit too choppy.
  • I set the smoothing factor to 10 and kept k at 10. I spaced out the points by keeping only points separated by at least 1/3 of the greatest interval width in the data set, and later changed this to 1/2.
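
One possible reading of the point-spacing step, written as a sketch. The function name `thin_points` and the exact rule (keep a point only if it lies at least `fraction` times the largest gap away from the last kept point) are assumptions for illustration, not necessarily the code used here.

```python
import numpy as np

def thin_points(x, y, fraction=0.5):
    """Keep points whose distance from the previously kept point is at
    least `fraction` times the largest gap between neighboring x values.
    Hypothetical helper; the rule is an assumed reading of the notes."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    order = np.argsort(x)           # work on points sorted by x
    x, y = x[order], y[order]

    min_gap = fraction * np.max(np.diff(x))
    keep = [0]                      # always keep the first point
    for i in range(1, len(x)):
        if x[i] - x[keep[-1]] >= min_gap:
            keep.append(i)
    return x[keep], y[keep]

# Small made-up example: the largest gap is 1.9, so min_gap = 0.95.
x_demo = [0.0, 0.1, 0.2, 1.0, 1.1, 3.0]
y_demo = [5.0, 5.1, 5.2, 6.0, 6.1, 8.0]
x_thin, y_thin = thin_points(x_demo, y_demo, fraction=0.5)
print(x_thin, y_thin)
```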

From these sample fits, I think the spline fits are more accurate than the original fits, although some may still need to be improved.


 

Testing Independence of Molecules against Target / Date Recorded

  • Previously, I had written code that plotted equivalent widths / airmasses of particular air molecules (Hα, Hβ, H2O, O2) for all dates or all stars, to check whether those molecules are independent of the star and of the date of observation, as expected.
  • The code did not account for the different gratings through which the equivalent widths were measured, so I adjusted it to separate the files by grating and see whether the independence of the molecules from star/date became clearer.
  • The code still outputs the best linear fit through the points, the r^2 of the linear fit, and the covariance coefficients:

Independence of O2 against Star:

Independence of H2O against Star:

Independence of O2 against Date-Obs:

Independence of Hβ against Date-Obs:

Independence of Hα against Date-Obs:

  • There are some significant outliers that appear to be skewing the fits, which is interesting because I have already sigma-clipped the data through five iterations; more iterations may be needed.
  • Also, I think the linear fit program I use tries not to use zero as a potential slope, so I can look into a linear fit that allows a zero slope.

 

Fixing Sigma Clipping, Linear Fit (p-values)

  • I tried increasing the number of iterations to get rid of noticeable outliers (like the one in the chart below), but the outlier in the top right corner remained. I also tried the default settings, where the data keeps getting sigma-clipped until it converges.
  • Below is the result after sigma clipping for a maximum of 100 iterations; the outlier still stayed.

  • For this particular data set, I changed from 5-sigma clipping to 4-sigma clipping, which removed the outlier in the top right corner.
  • However, the linear fit for the data does not pass through the points, so something was wrong with my linear fit.
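
A sketch of why more iterations may not help while a lower sigma does. This is a plain NumPy clipper written for illustration (median as the center, standard deviation as the spread, both assumptions), not the clipping routine used in my code, and the data are made up. With few points, a single outlier inflates the standard deviation so much that it can never reach 5 sigma: for about 21 points its deviation tops out at roughly 4.7 sigma, however extreme it is.

```python
import numpy as np

def sigma_clip_simple(values, sigma=5.0, max_iters=None):
    """Iteratively mask points farther than `sigma` standard deviations
    from the median. Illustrative stand-in, not a library routine."""
    values = np.asarray(values, dtype=float)
    keep = np.ones(values.shape, dtype=bool)
    iteration = 0
    while max_iters is None or iteration < max_iters:
        center = np.median(values[keep])
        spread = np.std(values[keep])
        new_keep = keep & (np.abs(values - center) <= sigma * spread)
        if new_keep.sum() == keep.sum():
            break                   # converged: nothing new was clipped
        keep = new_keep
        iteration += 1
    return np.ma.masked_array(values, mask=~keep)

# Made-up data: 20 well-behaved points plus one large outlier.
data = np.append(np.linspace(0.9, 1.1, 20), 50.0)

clipped_5 = sigma_clip_simple(data, sigma=5.0, max_iters=100)
clipped_4 = sigma_clip_simple(data, sigma=4.0, max_iters=100)

n_masked_5 = int(np.ma.getmaskarray(clipped_5).sum())
n_masked_4 = int(np.ma.getmaskarray(clipped_4).sum())
print("masked at 5 sigma:", n_masked_5)
print("masked at 4 sigma:", n_masked_4)
```

If this is what happened here, the number of points per plot matters more than the iteration limit when choosing the sigma threshold.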

  • Found out that the linear fit was being computed through the clipped data points as well (since I was passing a masked array), so I copied the unmasked elements into a new array. This is inefficient in running time; I can ask Eske about better ways to plot and fit through masked arrays when he gets back. For the same plot, this gave a flatter, better fit.
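
A possible way to fit only the unmasked points without copying elements one at a time, sketched with made-up data. It uses NumPy boolean indexing on the mask plus `scipy.stats.linregress`; the mask here is a hand-made stand-in for the output of the sigma clipping.

```python
import numpy as np
from scipy.stats import linregress

# Made-up data: a line with slope 0.5 plus one large outlier at the end.
x = np.arange(10, dtype=float)
y = 1.0 + 0.5 * x + 0.01 * (-1.0) ** np.arange(10)
y[-1] = 100.0

# Stand-in for a sigma-clipped result: the outlier is masked.
y_clipped = np.ma.masked_array(y, mask=(y > 50.0))

# Boolean array that is True where the data survived the clipping.
good = ~np.ma.getmaskarray(y_clipped)

# Fit only the surviving points; compressed() drops the masked values.
fit_clipped = linregress(x[good], y_clipped.compressed())

# For comparison: fitting the underlying data while ignoring the mask.
fit_all = linregress(x, y_clipped.data)

print("slope using unmasked points:", fit_clipped.slope)
print("slope ignoring the mask    :", fit_all.slope)
```

The indexing is vectorized, so it avoids a Python loop over the array.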

  • While figuring out what was wrong with the linear fit, I came across the p-value returned by the scipy.stats.linregress function I'm using, which is the p-value for the null hypothesis that the slope is 0. I treated p-values greater than 0.05 as meaning the null hypothesis cannot be rejected, i.e. the data are consistent with a zero slope (this does not prove that the slope is zero).
  • Printed out the p-values for the linear fits and the conclusions that can be drawn from them:
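
A small sketch of the p-value check, using `scipy.stats.linregress` on made-up data. The helper name `slope_conclusion` and the 0.05 cutoff as a default argument are illustrative choices.

```python
import numpy as np
from scipy.stats import linregress

def slope_conclusion(x, y, alpha=0.05):
    """Return the linregress p-value and a short conclusion about the
    null hypothesis that the slope is zero. Hypothetical helper."""
    result = linregress(x, y)
    if result.pvalue > alpha:
        verdict = "cannot reject zero slope"
    else:
        verdict = "slope differs from zero"
    return result.pvalue, verdict

x = np.arange(10, dtype=float)
noise = 0.1 * (-1.0) ** np.arange(10)   # small alternating scatter

# Made-up flat data: scatter around a constant value.
p_flat, verdict_flat = slope_conclusion(x, 1.0 + noise)

# Made-up sloped data: a clear trend with the same scatter.
p_sloped, verdict_sloped = slope_conclusion(x, 2.0 * x + noise)

print("flat  :", p_flat, verdict_flat)
print("sloped:", p_sloped, verdict_sloped)
```

A large p-value only says the data do not rule out a zero slope; with few points or large scatter, a real trend can still go undetected.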


  • After rerunning the independence tests, the slopes look much more horizontal, and the p-values indicate that most slopes are not statistically distinguishable from 0.



Chi-2


09/09/2022

Chi-2 quality histogram

  • Looked at the chi2 values provided in the header of each data file, grouped by the type of grating used to collect the data.

  • If a value is far from 1, the data from that file should not be used.

  • I extracted the chi2 values from the headers of all the data files and made a histogram of them.

  • One of the files had a chi2 value of 400+, which I removed because it was skewing the histogram. This file, "spec_data_2022062800311.fits", was previously listed in faulty_files.csv for having no O2 data, and its spectrum also looks corrupted.

  • Honestly, the provided chi2 values look pretty high. Next time I can check what the spectrum plots and the individual molecule plots look like for the data files whose chi2 value exceeds a certain threshold.

  • Also printed out all the chi2 values.
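
A rough sketch of the filtering and histogram steps, using NumPy only. The file names, chi2 values, and both cutoffs below are made up for illustration; they are not the values from the headers, and reading the headers themselves is left out.

```python
import numpy as np

# Made-up chi2 values keyed by hypothetical file names.
chi2_by_file = {
    "file_a.fits": 1.2,
    "file_b.fits": 0.9,
    "file_c.fits": 3.5,
    "file_d.fits": 412.0,
}

outlier_cut = 100.0      # hypothetical: drop values that swamp the histogram
review_threshold = 2.0   # hypothetical: flag files for a closer look

# Remove extreme values before binning so they do not stretch the axis.
kept = {name: value for name, value in chi2_by_file.items()
        if value < outlier_cut}

# Histogram counts and bin edges for the remaining values.
counts, edges = np.histogram(list(kept.values()), bins=5)

# Files whose chi2 is high enough to warrant inspecting their spectra.
to_review = sorted(name for name, value in kept.items()
                   if value > review_threshold)

print("counts per bin:", counts)
print("bin edges     :", edges)
print("files to review:", to_review)
```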

Printed chi2 values


Parangle -> Parangle rainbow appear to the north;

Parangle vs. time for observation, histogram peak where data is good, and start guiding criteria


