Day 1-2 (07/11-12)

Day 1: Reviewed papers Chris sent, met with Chris and Eske on Zoom to discuss expectations, project outline on wiki page
Day 2: Set up FASRC account. Followed the "User Quick Start Guide" on FASRC page
- set up 2FA, FASRC VPN, terminal access, and watched a few data transfer videos
- Used rsync to copy Eske's target directory /n/holystore01/LABS/stubbs_lab/Lab/Auxtel_data/spectrum_data onto my computer
- Went into lab and met Eske, Ali, and Mark in person. Very cool people.
- Opened Jupyter notebook; looked up astropy documentation to open fits files
- Successfully opened fits file to view contents inside, retrieved and displayed table data, displayed file header information, and plotted data

Day 3 (07/13)

In Jupyter notebook, filtered all the fits files in '/spectrum_data' to print out star name and date/time of observation
Stored the names of all files associated with one star together in a record type; it looks like there are 4 observed stars in the dataset Eske gave me.

spec_data_2022062800336.fits HD205905 2022-06-29
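The grouping step could be sketched roughly as below. This assumes the star name and observation date have already been read out of each file into `(filename, star, date)` tuples; the function name `group_files_by_star` and that tuple layout are hypothetical, not the notebook's actual structure.

```python
from collections import defaultdict


def group_files_by_star(observations):
    """Group (filename, star, date) tuples into {star: [(date, filename), ...]}."""
    by_star = defaultdict(list)
    for filename, star, date in observations:
        by_star[star].append((date, filename))
    # Sort each star's entries by date, then by file name.
    return {star: sorted(entries) for star, entries in by_star.items()}
```

The length of the returned dictionary gives the number of distinct stars, and each value lists that star's files in date order.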

Plotted H2O and O2 equivalent widths against air masses for each star on each night (four stars on four nights based on data Eske gave me)
Noticed that some files do not have O2 data (may have variants like O2(Z) or O2(B), though)

The equivalent widths of H2O seem to have much more variability than those of O2
There may be some outliers in the data (particularly in the first two plots, which show negative equivalent widths); may need to check on those data points

Met with Chris over zoom, got a few tasks to do:
- ObsId failure mode notes
- Three-slide, 5-minute lightning intro for the other undergrads (Thurs)
- Plot H-alpha and H-beta: should have no dependence on airmass; for quality control
- Separate the O2 lines (B and Z)
- Perform five sigma clipping (probably in SciPy or astropy) to clean the data; data trimming
  - five sigma clipping: remove outliers outside of five STDs of data
  - If in future there are uncertainties in equivalent widths to be reported, compare reported with experimental STDs
- Add error bars to plots, error bars represent underlying Gaussian distribution
- create linear fit through data
- create a polynomial fit
  - a * (airmass) + b * \sqrt{airmass} + c
- Create list of questions for data reduction
  - Where are the uncertainties in equivalent widths?
  - Why are there different O2 lines for some plots and not others?
- Keep track of stars and airmass span
- process through more data, contact Chris when delta airmass is greater than 1
Finished adding the H-alpha, H-beta lines, abstracted code to make extraction easier
Separated O2 lines into the B, Z, Y types and plotted
Working on sigma_clipping
Working on masking the data and applying the mask to the x column as well

Successfully implemented five sigma clipping for all molecules and plots for each star and each night
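A rough sketch of the clipping-plus-masking idea, written in plain NumPy so the mask is explicit. This is not the notebook's code and not the astropy or SciPy routine (`astropy.stats.sigma_clip` and `scipy.stats.sigmaclip` both exist for this purpose); the function name `sigma_clip_xy` and the iteration cap are assumptions. The key point is that one boolean mask, computed from the equivalent widths, is applied to both the airmass (x) and equivalent width (y) arrays so the two stay aligned.

```python
import numpy as np


def sigma_clip_xy(x, y, n_sigma=5.0, max_iters=5):
    """Iteratively drop points whose y lies more than n_sigma std from the mean.

    Returns the clipped x, the clipped y, and the boolean keep-mask.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    keep = np.isfinite(y)  # start by dropping NaN/inf measurements
    for _ in range(max_iters):
        center = np.mean(y[keep])
        spread = np.std(y[keep])
        if spread == 0:
            break  # all remaining points are identical
        new_keep = keep & (np.abs(y - center) <= n_sigma * spread)
        if np.array_equal(new_keep, keep):
            break  # nothing more was clipped
        keep = new_keep
    # The same mask is applied to x and y so the pairs stay matched.
    return x[keep], y[keep], keep
```

One caveat worth keeping in mind: with the population standard deviation that `np.std` uses by default, a single point can sit at most sqrt(N - 1) standard deviations from the mean, so a five sigma cut cannot remove anything from a sample of 26 or fewer points. For a star with only a handful of spectra on a given night, the clip is effectively a no-op.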

Successfully fit linear models to the data; can grab the equations and the R^2 value
Managed to fit the data to a single half-order equation (a * x ^ (1/2) + b) but not (a * x ^ (1/2) + b * x + c)
Attempting to fit the polynomial; having some bugs fitting both the 1/2-order and first-order terms together
- bugs fixed, have (a * x ^ (1/2) + b * x + c) functions for each type of molecule's equivalent widths against airmass
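For reference, one way to fit a * x ^ (1/2) + b * x + c is ordinary linear least squares, since the model is linear in the coefficients a, b, c even though it is not linear in x. The sketch below uses `numpy.linalg.lstsq`; the function name `fit_sqrt_linear` is hypothetical, and this is not necessarily how the notebook does the fit.

```python
import numpy as np


def fit_sqrt_linear(airmass, eq_width):
    """Least-squares fit of eq_width = a*sqrt(airmass) + b*airmass + c.

    Returns (a, b, c, r_squared).
    """
    x = np.asarray(airmass, dtype=float)
    y = np.asarray(eq_width, dtype=float)
    # One column per term of the model: sqrt(x), x, and a constant.
    design = np.column_stack([np.sqrt(x), x, np.ones_like(x)])
    coeffs, *_ = np.linalg.lstsq(design, y, rcond=None)
    predicted = design @ coeffs
    ss_res = np.sum((y - predicted) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    r_squared = 1.0 - ss_res / ss_tot
    a, b, c = coeffs
    return a, b, c, r_squared
```

Over a narrow airmass range the sqrt(x) and x columns are strongly correlated, so the individual values of a and b can be poorly constrained even when the overall curve fits well.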

Met with Chris and Eske to discuss new tasks for the week (more details can be found in personal .md file "Meeting 0722")
Continue working on the "Data_Clipping.ipynb" notebook to try to create a table that has a column of files and their missing information (missing data or negative eq widths)
Created a dictionary of files that stores the file name and an array with error messages → converted into a 2D array of filename and errors
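A sketch of the dictionary-to-table step, under stated assumptions: each file's measurements are taken to be a dict of line name to equivalent width (with a missing key or `None` meaning no data), and the names `check_measurements` and `build_error_rows` are hypothetical. The actual notebook may structure this differently.

```python
def check_measurements(measurements, expected_lines):
    """Return a list of error messages for one file's measurements."""
    errors = []
    for line in expected_lines:
        value = measurements.get(line)
        if value is None:
            errors.append(f"missing {line}")
        elif value < 0:
            errors.append(f"negative {line} equivalent width")
    return errors


def build_error_rows(files, expected_lines):
    """Build {filename: [errors]} and a 2D list of [filename, errors] rows."""
    error_dict = {}
    for filename, measurements in files.items():
        errors = check_measurements(measurements, expected_lines)
        if errors:  # only keep files that have at least one problem
            error_dict[filename] = errors
    rows = [[name, "; ".join(errs)] for name, errs in sorted(error_dict.items())]
    return error_dict, rows
```

The `rows` list has one `[filename, errors]` pair per faulty file, which is the shape needed for a two-column table or a CSV export.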
Successfully created the table to output in Jupyter notebook, but having difficulties exporting an image of the data table, which would be nice to have. Tried the following:
from PIL import Image
import imgkit
import dataframe_image as dfi
which did not work even after I installed new packages

Noted that I only have 187 files; I think Eske mentioned I should have access to a number of files in the 700 range. Will need to check with him on that.
Exported the data table as a "faulty_files.csv" file, which can be found below. I might not pursue the conversion of the pandas data frame to an image any further:
- faulty_files.csv
Cleaned up code (removed blocks of commented print statements, etc.)
Can look more into pandas to make the data table more readable (Maybe group by errors, or something else)
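One possible shape for the "group by errors" idea, sketched with the standard library rather than pandas so the logic is visible. It assumes a dictionary mapping each file name to its list of error messages; the function name `group_files_by_error` is hypothetical.

```python
from collections import defaultdict


def group_files_by_error(error_dict):
    """Invert {filename: [errors]} into {error: [filenames]}."""
    by_error = defaultdict(list)
    for filename, errors in error_dict.items():
        for error in errors:
            by_error[error].append(filename)
    # Sort both the error messages and the file names for stable output.
    return {error: sorted(names) for error, names in sorted(by_error.items())}
```

Reading the table this way shows at a glance how many files share each failure, which may be more useful than one row per file.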