How to upload multiple observations
In order to upload multiple observations at once you need to prepare two files. The first is a table containing the metadata for each observation and the name of the spectrum file it is associated with; its format is described in the guide below (a simple example is available here). The second is a tarball containing the relevant spectral files. The spectra should be formatted according to the guide below. Additionally, all the files should either be located in the root directory of the tarball (a so-called tar-bomb), or the subdirectory should be specified in the filename column of the table file in the format 'subdirectory/name_of_spectrum_file.dat' or an equivalent Unix-readable path.
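The tarball described above can be built with Python's standard tarfile module. The sketch below uses hypothetical filenames and creates a placeholder spectrum file so it is self-contained:

```python
# Sketch: packing spectrum files into a tarball for bulk upload
# (filenames here are hypothetical placeholders).
import tarfile
from pathlib import Path

# Create a placeholder spectrum file so the example is self-contained
Path("spec1.dat").write_text("#flux_factor=1e-17\n1215.0\t1.0\t0.1\n")

with tarfile.open("spectra.tar.gz", "w:gz") as tar:
    # arcname places the file at the archive root (the "tar-bomb" layout);
    # use e.g. arcname="subdir/spec1.dat" if the table's filename column
    # lists it under a subdirectory
    tar.add("spec1.dat", arcname="spec1.dat")
```

If you keep spectra in subdirectories, just make sure the arcname matches the path given in the table file.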
How to prepare a spectrum file for upload
Currently we support only plain-text file uploads to the database, and we have the following requirements for the formatting of the file:
- In general, lines prefaced with '#' are ignored as comments
- The file is required to have 3 columns separated by tabs
- The three columns need to be Wavelength, Flux, and Error
- The units of the columns should be Ångström for the wavelength and erg/s/cm2/Å for the flux and error.
For bulk uploads, a prefactor for the flux such as 1e-17, as well as other units for flux_unit and wavelength_unit and a lensing magnification lens_mag, can be specified in the table file.
Additionally, it is possible to set flux_factor, flux_unit, wavelength_unit, and the lensing magnification lens_mag in the header of the spectral file by using the format:
#flux_factor=1e-17
and correspondingly for flux_unit, wavelength_unit, and lens_mag.
Also note that for the unit keywords any unit specification that is parseable by astropy.units will work. If these are not set, the LASD pipeline will assume flux_factor=1, flux_unit=erg / (s cm2 AA), wavelength_unit=AA, and lens_mag=1.
Note that spectra should be corrected for Milky Way foreground extinction by the user before being uploaded to the database!
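A compliant spectrum file can be written with numpy. The sketch below uses made-up data and a hypothetical filename; the header keywords follow the '#keyword=value' format described above:

```python
# Sketch of writing a LASD-compatible spectrum file (data and filename are
# placeholders, not real observations).
import numpy as np

wavelength = np.linspace(4000.0, 4100.0, 50)             # Angstrom
flux = np.exp(-0.5 * ((wavelength - 4050.0) / 5.0) ** 2)  # arbitrary line profile
error = np.full_like(flux, 0.05)

header = (
    "flux_factor=1e-17\n"
    "flux_unit=erg / (s cm2 AA)\n"
    "wavelength_unit=AA\n"
    "lens_mag=1"
)
# comments="#" prefixes each header line with '#', matching the required format;
# the data section is three tab-separated columns: Wavelength, Flux, Error
np.savetxt("my_spectrum.dat", np.column_stack([wavelength, flux, error]),
           delimiter="\t", header=header, comments="#")
```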
How to prepare a table file for upload
When doing a bulk upload all the information on each object needs to be specified in a machine-readable table file. You can find a simple example
here.
The file needs to be tab-separated and the first line should contain the following keywords:
- Filename - name of the associated spectrum file in the tarball
- Instrument - name of the instrument with which the observations were made
- Resolution - approximate spectral resolution
- Redshift - redshift of the source
- Redshift_error - estimated uncertainty on the redshift
- z_approx/system - whether the redshift is an estimate (e.g., based on the Lya peak) or a measurement of the systemic redshift (from another line). Boolean - needs to be either 'True' or 'False'.
- Ra - right ascension of the source. Preferred format: degrees
- Dec - declination of the source. Preferred format: degrees
- Allow_download - whether the raw spectrum is downloadable or not. Boolean - 'True' or 'False'
- Bibcode - bibcode or URL from the ADS, for example: 2015A&A...583A..55O or http://adsabs.harvard.edu/abs/2016ApJ...828...49H
The following keywords are also recognized but are optional:
- Flux_factor - a prefactor for the flux vector in the spectrum. Example: 1e-17, Default: '1'
- Flux_units - units of the flux vector. Example (and default): 'erg / (s cm2 AA)'
- Wavelength_units - units of the wavelength array. Default: 'AA'
- Lens_mag - magnification factor from gravitational lensing. Default: 1
- Name - name of the object.
Note that the header line is required but should not be prefaced with a # symbol. Also note that for the unit keywords any unit specification that is parseable by astropy.units will work.
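Such a table file can be generated with pandas. The sketch below fills in placeholder values (the filenames, coordinates, and bibcodes are illustrative, not real entries):

```python
# Sketch: building the bulk-upload table with pandas (all values are
# illustrative placeholders).
import pandas as pd

table = pd.DataFrame({
    "Filename": ["spec1.dat", "subdir/spec2.dat"],
    "Instrument": ["COS", "MUSE"],
    "Resolution": [15000, 3000],
    "Redshift": [0.18, 3.1],
    "Redshift_error": [0.0005, 0.001],
    "z_approx/system": ["False", "True"],
    "Ra": [150.123, 53.456],
    "Dec": [2.345, -27.891],
    "Allow_download": ["True", "True"],
    "Bibcode": ["2015A&A...583A..55O", "2015A&A...583A..55O"],
})
# Tab-separated, header line without a leading '#', no index column
table.to_csv("upload_table.txt", sep="\t", index=False)
```

Optional columns such as Flux_factor or Lens_mag can be added to the DataFrame in the same way.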
FAQ
What happens once I've uploaded the data?
The analysis for each spectrum consists of the following steps:
- continuum removal,
- redshift estimation, and
- computation of the spectral quantities.
We describe each of these steps in more detail in our paper.
In order to quantify the uncertainty of the computed spectral quantities, we repeat the calculation 100 times, 'shuffling' the spectrum between iterations. That is, we draw a new flux in each bin from a Gaussian whose mean and standard deviation are the reported flux and error, respectively. We then repeat the redshift estimation process, and if the systemic redshift (and its uncertainty) is given by the user, draw a new redshift from a Gaussian defined by these values.
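The 'shuffle' step can be sketched in a few lines of numpy (toy flux and error vectors; the real pipeline also re-draws the redshift as described above):

```python
# Minimal sketch of the Monte Carlo 'shuffle': redraw each flux bin from a
# Gaussian centred on the reported flux, with the reported error as sigma.
import numpy as np

rng = np.random.default_rng(42)
flux = np.array([1.0, 2.0, 3.0])    # toy flux vector
error = np.array([0.1, 0.1, 0.1])   # toy error vector

n_iter = 100
# Each row is one shuffled realisation of the spectrum
shuffled = rng.normal(loc=flux, scale=error, size=(n_iter, flux.size))
# The measurements are repeated on each row; the spread of the results
# per bin approximates the reported uncertainty
spread = shuffled.std(axis=0)
```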
Apart from the data analysis, LASD will also fetch the full bibtex entry from SAO/NASA ADS to make it easier for downloaders to cite the appropriate papers. We will also send out a tweet from our Twitter page to let the world know about the most recent upload :)
What if my Lya spectra are gravitationally lensed?
Just set the lens_mag parameter. This can be done in several ways:
- Directly in the spectral file, by adding
#lens_mag=1
to the file header (replacing 1 with the actual magnification, naturally).
- In the optional fields of the single upload form.
- In the bulk-upload table file, by adding the optional table column Lens_mag.
We then divide out this magnification factor from the spectra during our analysis, so make sure that the uploaded spectra are as observed; otherwise the magnification is corrected for twice.
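The correction itself is just a division of the observed flux by the magnification, as sketched here with toy numbers:

```python
# Sketch of the magnification correction applied during the analysis:
# the observed flux is divided by lens_mag, which is why uploaded spectra
# must NOT be pre-corrected for lensing. Values are toy numbers.
import numpy as np

observed_flux = np.array([4.0, 8.0, 12.0])
lens_mag = 4.0                          # hypothetical magnification factor
intrinsic_flux = observed_flux / lens_mag
```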
How to download data
Press "Download data" in the menu above. Then you have the choice between downloading the spectra (i.e., the original upload, if the uploader has allowed this) or the measurements created by LASD. Yes, it's that easy ;)
How are the downloaded datafiles formatted
When downloading spectra you will get a zip file containing all downloadable spectra from the database, in ASCII format as they were uploaded to us.
When downloading measurements you also receive a zip file. This file contains a bibtex file with the original publishers of the included spectra, and
two whitespace(tab)-delimited files called "zautodf.ascii" and "zsysdf.ascii". These files contain the LASD measurements based on LASD-estimated
redshifts and user-provided redshifts, respectively.
These files are formatted in a way that is easily readable with, for instance, Python using pandas (see examples below). The first line of the file
is a header row (not prefixed with #). The files contain all of the keywords defined in Measured Quantities.
For all quantities where we estimate errors (all except neg_peak_fraction and pos_peak_fraction) the quantity exists in 4 versions: one measured, and the 25th, 50th, and 75th percentiles.
In the latter 3 cases the keyword is appended with e.g. "_25percentile", for example "z_25percentile".
In addition to those keywords we also include:
- R - the user-specified spectral resolution of the observation
- SFR - the user-specified star formation rate of the system (if not given this value is nan)
- SFR_method - the user-specified measurement method of the star formation rate
- Spectral_file - the name of the spectral file from which the measurement was made (can be used to correlate measurements with downloaded spectral files)
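The percentile columns can be turned into asymmetric error bars directly in pandas. The sketch below uses a made-up one-row DataFrame in place of the downloaded file, with column names following the "_25percentile" convention described above:

```python
# Sketch: asymmetric error bars from the percentile columns
# (toy values standing in for a row of zsysdf.ascii).
import pandas as pd

df = pd.DataFrame({
    "EW": [35.0],
    "EW_25percentile": [30.0],
    "EW_50percentile": [34.0],
    "EW_75percentile": [39.0],
})
# Lower/upper error bars around the median of the Monte Carlo distribution
err_low = df["EW_50percentile"] - df["EW_25percentile"]
err_high = df["EW_75percentile"] - df["EW_50percentile"]
```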
How to attribute
A lot of work went into planning and executing the observations, as well as building this website.
When using data of LASD in your publication, we therefore kindly ask you to
- cite the original observational papers of the data you used. For this, we provide the full bibtex entries in the download.
- cite our paper Runnholm et al. (2020)
- and add a footnote to the LASD URL "http://lasd.lyman-alpha.com".
Thank you!
Working with the data in Python
Here we provide some simple examples of how to work with the downloaded data using python.
import pandas as pd
df = pd.read_csv('zsysdf.ascii', delim_whitespace=True)
This will return a pandas dataframe object with all galaxies for which there are systemic redshifts.
Simple filtering
import pandas as pd
df = pd.read_csv('zsysdf.ascii', delim_whitespace=True)
# Select subset based on Lya Equivalent width
lya_emitter_subset = df[df['EW'] >= 20]
Select double peak galaxies
import pandas as pd
df = pd.read_csv('zautodf.ascii', delim_whitespace=True)
# Select galaxies where significant flux found on blue side in 95% of MC iterations
dbpeak = (df.neg_peak_fraction > 0.95 ) & (df.pos_peak_fraction > 0.95)
# Select subset from the database
lya_emitter_subset = df[dbpeak]
I found a bug! / I have an idea for improvements!
Great! :) Please report it to us using email, slack, or (preferred method) our issue tracker.
Database Details
For a more in-depth description of the database structure, included algorithms, and measured quantities, see our presentation paper:
Runnholm et al. 2020 (in prep)
Measured quantities
Each of the quantities is computed 100 times from a "shuffled" spectrum, that is, a spectrum where the fluxes
are re-drawn taking the error, the spectral resolution, and the redshift uncertainty into account.
This allows us to compute the uncertainties of each measurement, which we display as the 16th, 50th, and 84th percentiles
of the computed distribution. Furthermore, we also add a "Measured value" which is the measurement of the original,
i.e., unmodified, spectrum.
We also perform this analysis once using the detected redshift (see ... on how the detection is performed), and
(if available) using the systemic redshift given at upload.
Below is a table of all our measured quantities.
Variable name | Description | Units
------------- | ----------- | -----
Dx_max | Peak separation between maximum luminosity densities. | km/s
Dx_mean | Peak separation between first moments of both sides. | km/s
EW | Equivalent width of line. Value of -1000000 is a flag for undetected continuum level. | Å
FWHM_max | Full-width at half maximum of highest peak. | km/s
FWHM_neg | Full-width at half maximum of blue side. | km/s
FWHM_pos | Full-width at half maximum of red side. | km/s
F_cont | Level of continuum. | erg/s/(km/s)
F_lc | Luminosity density at line center. | erg/s/(km/s)
F_max | Luminosity density of highest peak. | erg/s/(km/s)
F_neg_max | Luminosity density of highest peak on blue side. | erg/s/(km/s)
F_pos_max | Luminosity density of highest peak on red side. | erg/s/(km/s)
F_valley | Luminosity density of minimum between peaks. | erg/s/(km/s)
L_neg | Luminosity of blue side. | erg/s
L_pos | Luminosity of red side. | erg/s
L_tot | Total luminosity. | erg/s
R_F_cut_neg | Ratio of maximum luminosity density and peak detection threshold on blue side. |
R_F_cut_pos | Ratio of maximum luminosity density and peak detection threshold on red side. |
R_F_lc_max | Ratio of luminosity density at line center and maximum peak height. |
R_F_pos_neg | Ratio of luminosity density at red and blue peak. |
R_F_valley_max | Ratio of luminosity density in the `valley` between the peaks and the maximum peak. |
R_L_cut_neg | Ratio of blueward luminosity and peak detection threshold. |
R_L_cut_pos | Ratio of redward luminosity and peak detection threshold. |
R_L_pos_neg | Ratio of redward over blueward luminosity. |
W_std | Square-root of second moment of whole spectrum. | km/s
W_neg_std | Blue peak width as measured by square-root of second moment. | km/s
W_pos_std | Red peak width as measured by square-root of second moment. | km/s
neg_peak_fraction | Fraction of times a blue peak was detected. |
pos_peak_fraction | Fraction of times a red peak was detected. |
skew | Pearson's moment coefficient of skewness of whole spectrum. |
skew_neg | Pearson's moment coefficient of skewness of blue side. |
skew_pos | Pearson's moment coefficient of skewness of red side. |
x_max | Highest peak position determined by maximum luminosity density. | km/s
x_mean | First moment of spectrum. | km/s
x_neg_max | Peak position determined by maximum luminosity density on blue side. | km/s
x_neg_mean | Peak position determined by weighted mean on blue side. | km/s
x_pos_max | Peak position determined by maximum luminosity density on red side. | km/s
x_pos_mean | Peak position determined by weighted mean on red side. | km/s
x_valley | Position of `valley` between the peaks. | km/s
z | Systemic redshift of source. |
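As an illustration of combining these quantities, the peak-fraction columns can be used to select robust double peaks before computing, say, the red-to-blue luminosity ratio. The DataFrame below contains toy values standing in for the downloaded measurement file:

```python
# Sketch: selecting double-peaked profiles and computing the red/blue
# luminosity ratio (toy values in place of the downloaded measurements).
import pandas as pd

df = pd.DataFrame({
    "neg_peak_fraction": [0.99, 0.50],
    "pos_peak_fraction": [1.00, 1.00],
    "L_pos": [3.0e42, 1.0e42],
    "L_neg": [1.0e42, 0.0],
})
# Keep only rows where both peaks were detected in >95% of MC iterations
double = (df["neg_peak_fraction"] > 0.95) & (df["pos_peak_fraction"] > 0.95)
ratio = df.loc[double, "L_pos"] / df.loc[double, "L_neg"]
```

Filtering on the peak fractions first avoids dividing by the blueward luminosity in rows where no blue peak was reliably detected.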