PS water vapor analysis, Nov 2014
First update, 2015 Jan 12, Tyler St. Germaine
I have written a Python script that runs through a directory of FITS files, isolates the spectrum of Polaris, and performs PWV analysis on it.
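For reference, the outer loop is roughly of the following form. This is only a minimal sketch, not the actual code: it assumes the dispersion axis runs along the image columns and extracts the spectrum by collapsing a fixed-width band around the brightest row, which is cruder than the real extraction.

```python
import glob

import numpy as np
from astropy.io import fits


def extract_spectrum(image, half_width=5):
    """Collapse a band around the brightest row into a 1-D spectrum (flux vs pixel).

    Crude stand-in for the real trace extraction; assumes the dispersion
    axis runs along columns.
    """
    row = np.argmax(image.sum(axis=1))
    lo, hi = max(row - half_width, 0), row + half_width + 1
    return image[lo:hi, :].sum(axis=0)


def analyze_directory(path):
    """Median-subtract every FITS image in a directory and extract its spectrum."""
    spectra = {}
    for fname in sorted(glob.glob(path + "/*.fits")):
        image = fits.getdata(fname).astype(float)
        image -= np.median(image)          # raw-median subtraction, as described below
        spectra[fname] = extract_spectrum(image)
    return spectra
```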
Sample of Images and Spectra
Here is a sample of the typical spectra in the data set:
- I consider this a "good" image: the spectrum is present in full and in focus. Note that I have highlighted the absorption features as well (O2 A, O2 B, and four H2O features). This FITS image is median-subtracted, meaning I just took the raw median of the whole image and subtracted it.
- This is an image where the filter wheel has the spectrum "chopped up", making it not useful for analysis of absorption features.
- I'm not sure what's happening in images like these. It seems the spectrum just gets cut off halfway, not due to running off the CCD.
- I'm also a little unsure of what happens here. The background sky seems to be drowning out the starlight; perhaps it's getting too close to sunrise? It also looks like there may be a cloud.
When the code runs through all the images, it throws out data that looks like the bottom three cases above, and keeps only the "good" images.
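In the code, these rejections come down to a few simple heuristics on the extracted spectrum. The sketch below is illustrative only: the contrast and gap thresholds are placeholder values, not the ones the real code uses.

```python
import numpy as np


def is_good_spectrum(spectrum, min_contrast=10.0, max_gap=20):
    """Reject chopped, truncated, or washed-out spectra (illustrative thresholds).

    Requires the trace to stand well above the background and to contain no
    long run of near-background pixels (which would indicate a chopped or
    cut-off spectrum).
    """
    background = np.median(spectrum)
    noise = np.std(spectrum)
    if noise == 0 or spectrum.max() < background + min_contrast * noise:
        return False                       # sky is drowning out the starlight
    bright = np.flatnonzero(spectrum > background + 3 * noise)
    if bright.size < 2:
        return False
    return np.max(np.diff(bright)) <= max_gap   # a long gap means a chopped spectrum
```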
Finding the absorption features
As seen in the spectrum in the case of the "good" image, after isolating the entire Polaris spectrum, the code singles out the six points on the spectrum corresponding to the following features:
| Feature | Wavelength (nm) |
| --- | --- |
| O2 B | 683 |
| O2 A | 762 |
| H2O | 723 |
| H2O | 822 |
| H2O | 909 |
| H2O | 946 |
To convert the spectrum from pixels to nm, the code finds the O2 A and O2 B features, uses those two as known wavelengths, and calibrates the rest of the spectrum from those two points.
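In other words, the calibration is a straight line through the two anchor points, something like the sketch below (the real code's variable names differ):

```python
import numpy as np

O2B_NM, O2A_NM = 683.0, 762.0   # known wavelengths of the two oxygen features


def calibrate_wavelengths(n_pixels, pix_o2b, pix_o2a):
    """Linear pixel-to-nm solution anchored on the O2 B and O2 A features."""
    dlam_dpix = (O2A_NM - O2B_NM) / (pix_o2a - pix_o2b)
    return O2B_NM + (np.arange(n_pixels) - pix_o2b) * dlam_dpix
```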
To-do, as of 1/12/15
- Measure equivalent widths of the six features, analyze over time
- My analysis so far has only been on a subset of all the data, because I cannot currently access the external drive on my Mac. To do: get another external drive so I can reformat the original drive, allowing me to analyze all the data at once
- Cosmic rays may throw off the way my code analyzes the images. Matt recommended I use cosmics.py to simply remove them from the image before analysis.
- Temperature data is stored in the headers of the files. Investigate how the quality of the image changes versus time or temperature (e.g. focal length)
Update 25 Jan 2015, Tyler St. Germaine
Cosmics.py does not work well with slitless spectrograph images
Cosmics.py is a Python module based on the LA Cosmic algorithm. I downloaded it from here. It just looks for hot pixels and throws them out, while making sure saturated stars do not get thrown out.
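For reference, it gets called roughly as follows, if I recall the module's demo-script interface correctly; the gain, read-noise, and clipping values here are placeholders, not the real instrument values.

```python
import cosmics  # LA Cosmic implementation (cosmics.py)

# Placeholder parameters -- not the actual PS1 gain/read-noise values.
array, header = cosmics.fromfits("input.fits")
c = cosmics.cosmicsimage(array, gain=2.0, readnoise=10.0,
                         sigclip=5.0, sigfrac=0.3, objlim=5.0)
c.run(maxiter=4)
cosmics.tofits("input_clean.fits", c.cleanarray, header)
cosmics.tofits("input_mask.fits", c.mask, header)
```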
Perhaps predictably, it does not work well with these images, where the objects are dispersed over many pixels. Example:
It's a little hard to see, but some pixels within the Polaris spectrum are removed entirely, which obviously throws off the analysis.
Fitting the continuum
I use the SciPy class LSQUnivariateSpline to fit a spline to the continuum of the spectrum. The knots are controlled manually so that they do not lie on the absorption features (right now they only avoid the six features described above, but in the future I should add more features to avoid, such as those near 625 and 650 nm). This continuum is then divided out to give extinction vs. wavelength.
Left: blue is true spectrum, red is spline fit, green highlights absorption features. Right: spectrum divided by spline fit, minus 1.
There's a free parameter in the spline fit: the degree of the smoothing spline, an integer from 1 to 5. I chose a degree of 3. This value is a compromise between an overly stiff, piecewise-linear fit (degree 1) and a spline that starts to follow the spectrum noise (degree 5). I may look into how this number impacts the measurements of equivalent width.
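A sketch of the continuum fit, under some assumptions: the knot spacing and the half-width of the exclusion window around each feature are illustrative values, and the wavelength array is assumed to be increasing.

```python
import numpy as np
from scipy.interpolate import LSQUnivariateSpline

FEATURES_NM = [683.0, 723.0, 762.0, 822.0, 909.0, 946.0]  # O2 B, H2O, O2 A, 3x H2O


def fit_continuum(wave, flux, knot_spacing=15.0, avoid_nm=10.0, degree=3):
    """Least-squares spline through the continuum, with knots kept off the features.

    `wave` must be strictly increasing; knot_spacing and avoid_nm are
    illustrative, not the values used in the real code.
    """
    knots = np.arange(wave.min() + knot_spacing, wave.max() - knot_spacing, knot_spacing)
    for feature in FEATURES_NM:
        knots = knots[np.abs(knots - feature) > avoid_nm]   # no knots on absorption features
    spline = LSQUnivariateSpline(wave, flux, t=knots, k=degree)
    return spline(wave)


# flux / fit_continuum(wave, flux) - 1 is the extinction curve in the right-hand plot above.
```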
Measurements of equivalent width
I used the textbook definition of equivalent width:
Eq. Width = ∫ [ 1 - F / F_continuum ] dλ
integrated over the entire feature, where F is the observed spectrum and F_continuum is the spline fit. This amounts to integrating (the negative of) the feature in the extinction curve above.
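Numerically this is just a trapezoidal sum of 1 - F/F_continuum across a window around the feature; the fixed half-width below is a placeholder for however the real code delimits each feature.

```python
import numpy as np


def equivalent_width(wave, flux, continuum, center_nm, half_width_nm=8.0):
    """Equivalent width (in nm) of the feature centered at center_nm."""
    window = np.abs(wave - center_nm) <= half_width_nm     # placeholder feature window
    depth = 1.0 - flux[window] / continuum[window]
    return np.trapz(depth, wave[window])
```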
This is a plot of the equivalent widths of the 762 nm Ox A feature and the 822 nm water feature, measured over a sample of ~60 images spanning a three-month period. Blue is Ox A and red is water.
The many zero values for water are due to my code failing to locate that absorption feature correctly, something I will fix. But it's a start. I could include the widths of all the features, but for this plot I decided to show only these two.
To do 1/25/15
- Still need to get another external drive so that I may analyze all the data, instead of just this subset
- I still would like to look at temperature data over time as well
Quick update 1/27/15
Somehow, some way, the PS1 PWV data disk can now be read and written on my Mac. I don't really understand why (I'm only using a different cord)...
I should be able to try to analyze much larger sets of the data in the next few days.
Update 2/17/15
Temperature Analysis
The headers of these files contain some temperature data. There are two sets:
- "T-TEMPS", telescope temperature sensors (5 numbers)
- "CCD-TEMP", the CCD temperature
I am not sure of the physical meaning of these numbers. The five numbers in T-TEMPS consist of one value in the 22-24 range and four values in the 2-10 range. The CCD-TEMP value is exactly -69.5 for every single image.
I wrote a quick code that ran through most of the data set and extracted only the header data, along with the FWHM of the brightest column in each image. The purpose is to study correlations between the FWHM of the object (how in-focus the telescope is) and the telescope temperature.
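A sketch of that pass is below. The header keywords are the ones named above; the directory path is hypothetical, and the FWHM estimate here is a simple half-maximum width of the brightest column's profile, which may be cruder than what my code actually measures.

```python
import glob

import numpy as np
from astropy.io import fits


def column_fwhm(image):
    """Rough FWHM (in pixels) of the profile of the brightest column."""
    profile = image[:, np.argmax(image.sum(axis=0))].astype(float)
    profile -= np.median(profile)
    above = np.flatnonzero(profile > 0.5 * profile.max())
    return (above[-1] - above[0] + 1) if above.size else np.nan


records = []
for fname in sorted(glob.glob("PS1_PWV/*.fits")):          # hypothetical path
    data, header = fits.getdata(fname, header=True)
    records.append((fname,
                    header.get("T-TEMPS"),                  # missing after July 2011
                    header.get("CCD-TEMP"),                 # always -69.5
                    column_fwhm(data)))
```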
Unfortunately, it seems that after a couple of months the telescope temperature data did not get recorded; i.e., the T-TEMPS field does not exist in any files past July 2011.
I still made a plot for the two months of data I had available:
I plotted FWHM vs two of the T-TEMPS numbers (the other three are very similar to the bottom plot) for all images from 20110610 to 20110815.
It is very difficult for me to see a correlation. It is especially unfortunate that only the first two months of data had these numbers recorded since most of those images are in focus anyway.
Update 4/2/15
Beginning analysis of entire set
I let the code run continuously for a few days, to see how many of the files it could analyze and to see if any errors need to be fixed in the code.
For each day of data (~1000 images), the code analyzes every image and looks for a complete spectrum of Polaris. If a full spectrum is found, it goes through all the steps previously described to extract the equivalent widths of the six features (Ox A, Ox B, and four water features). For each day, it writes a text file with the equivalent widths from every good image. It then calculates the average equivalent width for each feature over the entire day, along with a purely statistical uncertainty, and keeps that information in a separate text file.
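Schematically, the per-day bookkeeping looks something like the sketch below. The file names are placeholders, and I have written the "purely statistical uncertainty" as the standard error of the mean, which is an assumption about how the real code computes it.

```python
import numpy as np

FEATURES = ["OxB_683", "H2O_723", "OxA_762", "H2O_822", "H2O_909", "H2O_946"]


def summarize_day(date, widths_per_image):
    """Record per-image equivalent widths and a daily mean with its uncertainty.

    widths_per_image: list of 6-element sequences, one per good image that day.
    File names and the standard-error uncertainty are illustrative choices.
    """
    widths = np.asarray(widths_per_image)                   # shape (n_images, 6)
    np.savetxt(f"ew_{date}.txt", widths, header=" ".join(FEATURES))
    mean = widths.mean(axis=0)
    err = widths.std(axis=0) / np.sqrt(len(widths))         # assumed statistical uncertainty
    with open("ew_daily_averages.txt", "a") as out:
        out.write(date + " " +
                  " ".join(f"{m:.4f} {e:.4f}" for m, e in zip(mean, err)) + "\n")
```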
Below I plot the average equivalent widths of the six features over a ~three month period (took three days of computation on my Mac).
The two plots cover the same time period; I simply divided the six lines into two plots for clarity (Ox A is in both for reference).
I think a promising sign from these plots is the correlation in the water lines. While the oxygen lines stay constant over time, not only do the four H2O lines vary significantly over time, but they generally vary in the same way. This makes me think the code is doing a good job at extracting the equivalent widths without too much noise.
A few tweaks to do before applying the code to the entire set:
- Only ~20% of the images have a complete, "good" spectrum with all features in the CCD. Is there a way to increase this number? I.e., can we make the "chopped" or cut-off spectra described in previous entries useful?
- Decrease noise by improving how the algorithm calculates dλ/dpixel. It currently calculates dλ/dpixel by finding Ox A and Ox B in pixel space and using their known wavelengths. I am confident that the code finds Ox A very well, but ~10% of the time it fails to find Ox B. Matt gave a few suggestions for improving how to find Ox B (e.g. cross correlation with typical "good" spectra) that I can quickly implement to improve efficiency.
- Compare three different methods for finding the equivalent width of a given line (a rough sketch of the two simpler alternatives follows this list):
  - My current method (using a spline as the continuum; probably the most accurate but computationally intensive)
  - Fit a sloped line using the spectrum just before and just after the feature, and use that line as the continuum
  - Fit a flat line, i.e. the same as above except simply use the median of the neighboring points. Computationally quick but probably less accurate, since some water features sit where the continuum slope is steep.
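A rough sketch of the two simpler continuum alternatives; the padding window is illustrative, and both return the continuum over the feature window so they drop straight into the equivalent-width integral above.

```python
import numpy as np


def sloped_continuum(wave, flux, lo_nm, hi_nm, pad_nm=5.0):
    """Straight-line continuum fit to the spectrum just blueward and redward of the feature."""
    left = (wave >= lo_nm - pad_nm) & (wave < lo_nm)
    right = (wave > hi_nm) & (wave <= hi_nm + pad_nm)
    slope, intercept = np.polyfit(np.concatenate([wave[left], wave[right]]),
                                  np.concatenate([flux[left], flux[right]]), 1)
    inside = (wave >= lo_nm) & (wave <= hi_nm)
    return slope * wave[inside] + intercept


def flat_continuum(wave, flux, lo_nm, hi_nm, pad_nm=5.0):
    """Constant continuum: the median of the neighboring points on either side."""
    neighbors = ((wave >= lo_nm - pad_nm) & (wave < lo_nm)) | \
                ((wave > hi_nm) & (wave <= hi_nm + pad_nm))
    inside = (wave >= lo_nm) & (wave <= hi_nm)
    return np.full(inside.sum(), np.median(flux[neighbors]))
```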
Update 4/6/15
FWHM of all files
I collected all header info and FWHM of every file in the set (sans darks and flats). Below is a scatter plot of FWHM vs date for all files:
(The two plots are the same data, just different y-scales). The bottom plot shows that there is still plenty of good data even towards the end, though the top plot shows that the number of out-of-focus images does increase with time. There may even be some periodicity in the FWHM going out of focus.
I was not sure how to best display such large datasets (~800k data points here), so if there are any suggestions feel free to let me know.
I will start analyzing the equivalent widths of all files tonight.
Update 4/14/15
Filters
Each file is tagged with one of five filter numbers, stored both in the header and in the file name. Below are examples of the filters; all of these images were taken within the same few minutes.
- Filter 0: no filter
- Filter 1:
- Filter 2
- Filter 3
- Filter 4
Filter distribution over all images:
In the first few months and last few months, the images alternate among all 5 filters. There is no data for most of 2012, and all the data in 2013 uses only filter 4.
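The distribution itself is just a per-file tally of the filter number. The sketch below reads it from the header under a keyword I am guessing at ("FILTER"); in practice the number could equally be parsed from the file name.

```python
import glob
from collections import Counter

from astropy.io import fits


def filter_distribution(path, keyword="FILTER"):   # keyword name is an assumption
    """Count how many images were taken through each of the five filters."""
    counts = Counter()
    for fname in glob.glob(path + "/*.fits"):
        counts[fits.getheader(fname).get(keyword)] += 1
    return counts
```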
To-do:
- find out the wavelength cut-off for these filters, which will make the analysis much easier and computationally faster.
- Analyze all images using filter 4, performing a polynomial fit on the four passes in each image
Equivalent widths for all filter 0 data
As shown above, the only data with no filters are the first few months and last few months. First I plot all the filter 0 data, to illustrate the gap in data. Then I follow with the data from the first few months and last few months individually.
Note: These data are averages over entire days, i.e. each data point is the average equivalent width of that line over all images in that day. There are associated statistical uncertainties, but for clarity I have not plotted them below. I will plot them in a different section.
In total:
First few months: this is mostly similar to the plot under the section "Beginning analysis of entire set"
Last few months: this is all new data.
The first thing I noticed is the behavior of all the lines starting at the end of May 2014. The stable Ox A and Ox B lines sharply dip down and oscillate. The water lines also begin to oscillate but do not dip down. I plan to look into this by inspecting the actual images and checking whether my algorithm ran into an issue here.
Equivalent widths for filter 0 data, with statistical uncertainties
These are the same as the plots before, but now with error bars representing the standard deviation associated with each daily average.
First few months:
Last few months: