PS water vapor analysis

PS water vapor analysis, Nov 2014

First update, 2015 Jan 12, Tyler St. Germaine

I have written a code in Python that runs through a directory of fits files, isolates the spectrum of Polaris, and performs PWV analysis on it.

Sample of Images and Spectra

This is a set of what some of the typical spectra look like in the data set:

I consider this a "good" image. Spectrum is there in full, and in focus. Note that I have the absorption features highlighted as well (O2 A, O2 B, and four H2O features). This fits image is median-subtracted, meaning I just took the raw median of the whole image and subtracted it.

This is an image where the filter wheel has the spectrum "chopped up", making it not useful for analysis of absorption features.

I'm not sure what's happening in images like these. It seems the spectrum just gets cut off halfway, not due to running off the CCD.

Also a little unsure of what happens here. The background sky seems to be drowning out the starlight, perhaps it's getting too close to sunrise? It also looks like there may be a cloud?

When the code runs through all the images, it throws out data that looks like the bottom three cases above, and keeps only the "good" images.

Finding the absorption features

As seen in the spectrum in the case of the "good" image, after isolating the entire Polaris spectrum, the code singles out the six points on the spectrum corresponding to the following features:

Feature	Wavelength (nm)
O2 B	683
O2 A	762
H2O	723
H2O	822
H2O	909
H2O	946

To convert the spectrum from pixels to nm, it finds the O2A and O2B features, uses those two as known wavelengths, and calibrates the rest of the spectrum given those two points.

To-do, as of 1/12/15

Measure equivalent widths of the six features, analyze over time
My analysis so far has only been on a subset of all the data, due to the difficulty of not having access to the external drive on my mac. To do: get another external drive to reformat the original drive, allowing me to analyze all the data at once
Cosmic rays may throw off the way my code analyzes the images. Matt recommended I use cosmics.py to simply remove them from the image before analysis.
Temperature data is stored in the headers of the files. Investigate how the quality of the image changes versus time or temperature (e.g. focal length)

Update 25 Jan 2015, Tyler St. Germaine

Cosmics.py does not work well with slitless spectrograph images

Cosmics.py is a python module based on the LA Cosmic algorithm. I downloaded it from here. It just looks for hot pixels and throws them out, while making sure saturated stars do not get thrown out.

Perhaps predictably it does not work well with these images, where the objects are dispersed over many pixels. Example:

It's a little hard to see, but some pixels within the Polaris spectrum are removed entirely, which obviously throws off the analysis.

Fitting the continuum

I use the python class LSQUnivaraiteSpline to fit a spline to the continuum of the spectrum. The knots are controlled manually so that they do not lie on the absorption features (right now they only avoid the six features described above, but I should in the future add more features to avoid such as those near 625 and 650 nm). This continuum is then divided out to give extinction vs wavelength.

Left: blue is true spectrum, red is spline fit, green highlights absorption features. Right: spectrum divided by spline fit, minus 1.

There's an arbitrary factor in the spline fit, the degree of the smoothing spline, an integer from 1-5. I chose a fit of 3. This value is a compromise between basically having a gaussian fit (value of 1), and a spline that perfectly fits the spectrum noise (value of 5). I may look into how this number impacts the measurements of equivalent width.

Measurements of equivalent width

I used the textbook definition of equivalent width:

Eq Width = ∫ 1 - ( F / F_continuum) dλ

integrated over the entire feature, where F is the true spectrum and F_continuum is the spline fit. This equation comes down to integrating over (negative) the feature in the extinction curve above.

This is a plot of the equivalent widths of the 762nm OxA and 822nm water features measured over a sample of ~60 images over a three month period. The blue is OxA and the red is water.

The many zero values of water are due to my code not exactly finding the absorption feature correctly, something I will fix. But it's a start. I can include the widths of all features, but for this plot I decided only to put these two.

To do 1/25/15

Still need to get another external drive so that I may analyze all the data, instead of just this subset
I still would like to look at temperature data over time as well

Quick update 1/27/15

Somehow, some way, the PS1 PWV data disk can now be read and written on my Mac. I don't really understand why (I'm only using a different cord)....

I should be able to try to analyze much larger sets of the data in the next few days.

Update 2/17/15

Temperature Analysis

In the headers of these files there are some temperature data. There are two sets of data:

"T-TEMPS", telescope temperature sensors (5 numbers)
"CCD-TEMP", the CCD temperature

I am not sure the physical meaning of the numbers. The 5 numbers in T-TEMPS consist of one number in the 22-24 range and the other four in the 2 to 10 range. The CCD-TEMP number is exactly -69.5 for every single image.

I wrote a quick code that ran through most of the whole data set and extracted only the header data, as well as just the FWHM of the brightest column in the sky. The purpose is to study correlations between the FWHM of the object (how in-focus the telescope is) and the telescope temperature.

Unfortunately, it seems after a couple months, the telescope temperature data did not get record. I.e., the T-TEMPS data field does not exist in any files past July 2011.

I still made a plot for the two months of data I had available:

I plotted FWHM vs two of the T-TEMPS numbers (the other three are very similar to the bottom plot) for all images from 20110610 to 20110815.

It is very difficult for me to see a correlation. It is especially unfortunate that only the first two months of data had these numbers recorded since most of those images are in focus anyway.

Update 4/2/15

Beginning analysis of entire set

I let the code run continuously for a few days, to see how many of the files it could analyze and to see if any errors need to be fixed in the code.

For each day of data (~1000 images), it analyzes every image and looks for a complete spectrum of Polaris. If a full spectrum is found, it goes through all the steps previously described to extract the equivalent widths of the six features (Ox A, Ox B, and 4 water features). For each day, it writes a text file with each equivalent width. It then calculates the average equivalent width for each feature over the entire day, along with a purely statistical uncertainty, and keeps that info in a separate text file.

Below I plot the average equivalent widths of the six features over a ~three month period (took three days of computation on my Mac).

The two plots are over the same time period, I simply divided the six lines into two plots for clarity (Ox A is in both for reference).

I think a promising sign from these plots is the correlation in the water lines. While the oxygen lines stay constant over time, not only do the four H2O lines vary significantly over time, but they generally vary in the same way. This makes me think the code is doing a good job at extracting the equivalent widths without too much noise.

A few tweaks to do before applying the code to the entire set:

Only ~20% of the images have a complete, "good" spectrum with all features in the CCD. Is there a way to increase this number? I.e., can we make the "chopped" or cut-off spectra described in previous entries useful?
Decrease noise by improving how the algorithm calculates dλ/dpixel. It currently calculates dλ/dpixel by finding Ox A and Ox B in pixel space, and using their known wavelengths. I have confidence that the code finds Ox A very well, but about ~10% of the time it fails to find Ox B. Matt gave a few suggestions for improving how to find Ox B (e.g. cross correlation with typical "good" spectra) that I can quickly implement to improve efficiency.
Compare three different methods for finding equivalent width of a certain line:

1. My current method (using a spline as the continuum; probably the most accurate but computationally intensive)
2. fit a sloped line using the spectrum just before and just after the feature, and use that line as the continuum
3. fit a flat line, i.e. same as above except simply use median of neighboring points. Computationally quick but probably less accurate since some water features are at high slope.

Update 4/6/15

FWHM of all files

I collected all header info and FWHM of every file in the set (sans darks and flats). Below is a scatter plot of FWHM vs date for all files:

(The two plots are the same, just different y-scales). The bottom plot shows that there is still plenty of good data even towards the end, though the top plot shows that the number of out-of-focus images does increase with time. There may even be some periodicity in the FWHM going out of focus.

I was not sure how to best display such large datasets (~800k data points here), so if there are any suggestions feel free to let me know.

I will start analyzing the equivalent widths of all files tonight.