Uploading data

How to upload a single observation

Uploading to LASD is easy! To upload a single spectrum, go to http://lasd.lyman-alpha.com/upload/single
You will need to prepare a file with the spectrum according to "How to prepare a spectrum file for upload" and fill out the input fields provided. The spectral file should be a plain text file with three (tab or space separated) columns of wavelength, flux, and error. The wavelength should be in units of Ångström, and the flux and error in erg/s/cm2/Å.

If you want to upload multiple spectra at once, see "How to upload multiple observations".

How to upload multiple observations

In order to upload multiple observations at once, you need to prepare two files. The first is a table containing the metadata about each observation and the name of the spectrum file with which it is associated; see "How to prepare a table file for upload" (see a simple example here). The second is a tarball containing the relevant spectral files.

The spectra should be formatted according to "How to prepare a spectrum file for upload". Additionally, all the files should either be located in the root directory of the tarball (a so-called tar-bomb), or the subdirectory should be specified in the Filename column of the table file in the format 'subdirectory/name_of_spectrum_file.dat' or an equivalent Unix-readable path.
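As an illustration, such a tarball can be built with Python's standard tarfile module. This is only a sketch: the function names make_flat_tarball and list_tarball are hypothetical helpers, not part of LASD, and gzip compression is an assumption.

```python
import tarfile
from pathlib import Path


def make_flat_tarball(spectrum_paths, tarball_path):
    """Pack spectrum files into the root directory of a gzipped tarball."""
    with tarfile.open(tarball_path, "w:gz") as tar:
        for path in map(Path, spectrum_paths):
            # arcname drops the local directories, so each file ends up
            # in the root of the tarball under its bare file name.
            tar.add(path, arcname=path.name)


def list_tarball(tarball_path):
    """Return the member names, e.g. to compare against the table file."""
    with tarfile.open(tarball_path, "r:gz") as tar:
        return tar.getnames()
```

The names returned by list_tarball should match the entries in the Filename column of the table file.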

How to prepare a spectrum file for upload

Currently we support only plain text file uploads to the database and we have the following requirements for the formatting of the file:
  1. In general, lines prefaced with '#' are ignored as comments.
  2. The file is required to have 3 columns separated by tabs.
  3. The three columns need to be Wavelength, Flux, and Error.
  4. The units of the columns should be Ångström for the wavelength and erg/s/cm2/Å for the flux and error. For bulk upload, a prefactor for the flux such as 1e-17, as well as other units (flux_unit, wavelength_unit) and the lensing magnification lens_mag, can be specified in the table file.
Additionally, it is possible to set flux_factor, flux_unit, wavelength_unit, and the lensing magnification lens_mag in the header of the spectral file by using the format #flux_factor=1e-17, and likewise for flux_unit, wavelength_unit, and lens_mag. Also note that for the unit keywords any unit specification that is parseable by astropy.units will work. If these are not set, the LASD pipeline will assume flux_factor=1, flux_unit=erg / (s cm2 AA), wavelength_unit=AA, and lens_mag=1.

Note that spectra should be corrected for Milky Way foreground extinction by the user before being uploaded to the database!
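
A minimal sketch of writing a spectrum file in this layout with NumPy is shown here. The function name write_spectrum and its arguments are hypothetical; the header keywords and the tab-separated three-column layout are the ones described above.

```python
import numpy as np


def write_spectrum(path, wavelength, flux, error, flux_factor=1.0, lens_mag=1.0):
    """Write a tab-separated, three-column spectrum file with header keywords."""
    header = "\n".join([
        f"flux_factor={flux_factor:g}",
        "flux_unit=erg / (s cm2 AA)",
        "wavelength_unit=AA",
        f"lens_mag={lens_mag:g}",
    ])
    # Columns: wavelength, flux, error (one row per spectral bin)
    data = np.column_stack([wavelength, flux, error])
    # comments="#" prefixes every header line with '#',
    # which yields lines such as '#flux_factor=1e-17'.
    np.savetxt(path, data, delimiter="\t", header=header, comments="#")
```

For example, write_spectrum('my_spectrum.dat', wavelength, flux, error, flux_factor=1e-17) would store fluxes that are tabulated in units of 1e-17 erg/s/cm2/Å.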

How to prepare a table file for upload

When doing a bulk upload all the information on each object needs to be specified in a machine-readable table file. You can find a simple example here. The file needs to be tab-separated and the first line should contain the following keywords:
  • Filename

          - name of the associated spectrum file in the tarball


          - name of the instrument with which the observations were made


          - Approximate spectral resolution


          - Redshift of the source


          - Estimated uncertainty on the redshift


          - Whether the redshift is an estimate (e.g., based on the Lya peak) or a measurement of the systemic redshift (from another line).
       Boolean - Needs to be either 'True' or 'False'.


          - Right ascension of the source. Preferred format: degrees


          - Declination of the source. Preferred format: degrees


          - Whether the raw spectrum is downloadable or not. Boolean - True or False


          - Bibcode or URL from the ADS, for example: 2015A&A...583A..55O or http://adsabs.harvard.edu/abs/2016ApJ...828...49H.
  • The following keywords are also recognized but are optional:


          - A prefactor for the flux vector in the spectrum.
           Example: 1e-17, Default: '1'


          - Units of the flux vector. Example (and default): 'erg / (s cm2 AA)'


          - Units of the wavelength array. Default: 'AA'


          - Magnification factor from gravitational lensing. Default: 1


          - Name of the object.
Note that the header line is required but should not be prefaced with a # symbol. Also note that for the unit keywords any unit specification that is parseable by astropy.units will work.


What happens once I've uploaded the data?

The analysis for each spectrum consists of the following steps:
  1. continuum removal,
  2. redshift estimation, and
  3. computation of the spectral quantities.

We describe each of these steps in more detail in our paper.

In order to quantify the uncertainty of the computed spectral quantities, we repeat the calculation 100 times, 'shuffling' the spectrum in between. That is, we draw a new flux in each bin from a Gaussian whose mean and standard deviation are the reported flux and error, respectively. We then repeat the redshift estimation process and, if the systemic redshift (and its uncertainty) is given by the user, draw a new redshift from a Gaussian defined by these values.
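
The flux 'shuffling' can be sketched as follows. This is an illustration only, not the LASD pipeline: the function name shuffle_percentiles and its arguments are hypothetical, the redshift re-estimation is left out, and the percentiles are a free parameter.

```python
import numpy as np


def shuffle_percentiles(flux, error, statistic, n_iter=100,
                        percentiles=(25, 50, 75), seed=None):
    """Measure `statistic` on the original and on n_iter shuffled spectra."""
    rng = np.random.default_rng(seed)
    flux = np.asarray(flux, dtype=float)
    error = np.asarray(error, dtype=float)
    # Each row is one shuffled spectrum: every bin is redrawn from a Gaussian
    # centred on the reported flux, with the reported error as its sigma.
    shuffled = rng.normal(loc=flux, scale=error, size=(n_iter, flux.size))
    values = np.array([statistic(row) for row in shuffled])
    # Return the measurement on the unmodified spectrum and the percentiles
    # of the distribution obtained from the shuffled spectra.
    return statistic(flux), np.percentile(values, percentiles)
```

Calling shuffle_percentiles(flux, error, np.sum) returns the summed flux of the original spectrum together with the requested percentiles of the shuffled distribution.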

Apart from the data analysis, LASD will also fetch the full bibtex entry from SAO/NASA ADS to make it easier for the downloader to cite the appropriate papers. We will also send out a tweet on our Twitter page to let the world know about the most recent upload :)

What if my Lya spectra are gravitationally lensed?

Just set the lens_mag parameter. This can be done in several ways.
  1. Directly in the spectral file, by adding #lens_mag=1 to the file header (replacing 1 with the actual magnification, naturally).
  2. In the single upload form, using the corresponding optional field.
  3. In the bulk upload table file, by adding the optional table column Lens_mag.
We then divide out this magnification factor from the spectra during our analysis. So make sure that the uploaded spectra are as observed, otherwise double corrections can happen.
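
To make the bookkeeping concrete, here is a sketch of reading the header keywords from a spectral file and dividing out the magnification. The helper names read_header_keywords and demagnify are hypothetical, and treating flux_factor as a multiplicative factor on the tabulated values is an assumption; this is not the LASD code.

```python
# Defaults as stated in "How to prepare a spectrum file for upload"
DEFAULTS = {
    "flux_factor": "1",
    "flux_unit": "erg / (s cm2 AA)",
    "wavelength_unit": "AA",
    "lens_mag": "1",
}


def read_header_keywords(lines):
    """Collect '#key=value' header keywords, falling back to the defaults."""
    keywords = dict(DEFAULTS)
    for line in lines:
        if not line.startswith("#"):
            continue  # data row, not a header line
        key, sep, value = line[1:].partition("=")
        if sep and key.strip() in keywords:
            keywords[key.strip()] = value.strip()
    return keywords


def demagnify(flux_values, keywords):
    """Scale tabulated fluxes by flux_factor (assumed multiplicative)
    and divide out the lensing magnification."""
    scale = float(keywords["flux_factor"]) / float(keywords["lens_mag"])
    return [value * scale for value in flux_values]
```

With #flux_factor=1e-17 and #lens_mag=4 in the header, a tabulated flux of 2.0 corresponds to an intrinsic flux of 5e-18 in the stated flux unit.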

How to download data

Click on "Download data" in the menu above. You then have the choice between downloading the spectra (i.e., the original upload, if the uploader has allowed this) or the measurements created by LASD. Yes, it's that easy ;)

How are the downloaded datafiles formatted

When downloading spectra, you will get a zip file containing all downloadable spectra from the database in ASCII format, as they were uploaded to us.

When downloading measurements, you also receive a zip file. This file contains a bibtex file with the original publishers of the included spectra, and two whitespace (tab) delimited files called "zautodf.ascii" and "zsysdf.ascii". These files contain the LASD measurements based on LASD estimated redshifts and user provided redshifts, respectively.

These files are formatted in a way that is easily readable with, for instance, Python using pandas (see examples below). The first line of the file is a header row (not prefixed with #). The files contain all of the keywords defined in Measured Quantities. For all quantities where we estimate errors (all except neg_peak_fraction and pos_peak_fraction), the quantity exists in 4 versions: one measured, and the 25th, 50th and 75th percentiles. In the latter 3 cases a suffix such as "_25percentile" is appended to the keyword, for example "z_25percentile".

In addition to those keywords we also include

  • R The user specified spectral resolution of the observation
  • SFR The user specified star formation rate of the system (if not given this value is nan)
  • SFR_method The user specified measurement method of the star formation rate
  • Spectral_file The name of the spectral file from which the measurement was made (can be used to correlate measurements with downloaded spectral files)

How to attribute

A lot of work went into planning and executing the observations as well as building this website. When using data from LASD in your publication, we therefore kindly ask you to

  1. cite the original observational papers of the data you used. For this, we provide the full bibtex entries in the download.
  2. cite our paper Runnholm et al. (2020)
  3. and add a footnote to the LASD URL "http://lasd.lyman-alpha.com".
Thank you!

How to contact us

If you have any questions or want to provide feedback, you can send us an email, join our Slack channel, or ping us on Twitter. We're looking forward to hearing from you! :)

Working with the data in Python

Here we provide some simple examples of how to work with the downloaded data using python.

        import pandas as pd
        # The file is whitespace (tab) delimited; the first line is the header row
        df = pd.read_csv('zsysdf.ascii', sep=r'\s+')

This will return a pandas DataFrame with all galaxies for which there are systemic redshifts.

Simple filtering

        import pandas as pd
        df = pd.read_csv('zsysdf.ascii', sep=r'\s+')
        # Select subset based on the Lya equivalent width (in Å)
        lya_emitter_subset = df[df['EW'] >= 20]

Select double peak galaxies

        import pandas as pd
        df = pd.read_csv('zautodf.ascii', sep=r'\s+')
        # Select galaxies where both a blue and a red peak were detected
        # in more than 95% of the MC iterations
        dbpeak = (df.neg_peak_fraction > 0.95) & (df.pos_peak_fraction > 0.95)
        # Select subset from the database
        double_peak_subset = df[dbpeak]
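
The percentile columns can be used to attach an uncertainty to a measurement. The following sketch assumes that the EW columns follow the same naming pattern as "z_25percentile" (that is, EW_25percentile, EW_50percentile, EW_75percentile); the function name ew_with_uncertainty is hypothetical. Adjust the suffixes if the columns in your download are named differently.

```python
import pandas as pd

# Flag value listed for EW under "Measured quantities" (undetected continuum)
EW_UNDETECTED_CONTINUUM = -1000000


def ew_with_uncertainty(df):
    """Return EW, its median, and interquartile range for detected continua."""
    # Drop rows where the continuum level was not detected
    detected = df[df["EW"] != EW_UNDETECTED_CONTINUUM]
    return pd.DataFrame({
        "Spectral_file": detected["Spectral_file"],
        "EW": detected["EW"],
        "EW_median": detected["EW_50percentile"],
        # Width of the MC distribution between the 25th and 75th percentiles
        "EW_iqr": detected["EW_75percentile"] - detected["EW_25percentile"],
    })
```

The function takes the DataFrame returned by pd.read_csv in the examples above; the Spectral_file column can then be used to match each row to a downloaded spectrum.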

I found a bug! / I have an idea for improvements!

Great! :) Please report it to us using email, Slack, or (preferred method) our issue tracker.

Database Details

For a more in-depth description of the database structure, the included algorithms, and the measured quantities, see our presentation paper: Runnholm et al. 2020 (in prep).

Measured quantities

Each of the quantities is computed 100 times from a "shuffled" spectrum, that is, a spectrum where the fluxes are re-drawn taking the error, the spectral resolution, and the redshift uncertainty into account. This allows us to compute the uncertainties of each measurement, which we display as the 16th, 50th, and 84th percentiles of the computed distribution. Furthermore, we also add a "Measured value", which is the measurement of the original, i.e., unmodified spectrum.
We also perform this analysis once using the detected redshift (see ... on how the detection is performed), and (if available) using the systemic redshift given at upload.

Below is a table of all our measured quantities.
Variable name | Description | Units
Dx_max | Peak separation between maximum luminosity densities. | km/s
Dx_mean | Peak separation between first moments of both sides. | km/s
EW | Equivalent width of line. Value of -1000000 is a flag for undetected continuum level. | Å
FWHM_max | Full-width at half maximum of highest peak. | km/s
FWHM_neg | Full-width at half maximum of blue side. | km/s
FWHM_pos | Full-width at half maximum of red side. | km/s
F_cont | Level of continuum. | erg/s/(km/s)
F_lc | Luminosity density at line center. | erg/s/(km/s)
F_max | Luminosity density of highest peak. | erg/s/(km/s)
F_neg_max | Luminosity density of highest peak on blue side. | erg/s/(km/s)
F_pos_max | Luminosity density of highest peak on red side. | erg/s/(km/s)
F_valley | Luminosity density of minimum between peaks. | erg/s/(km/s)
L_neg | Luminosity of blue side. | erg/s
L_pos | Luminosity of red side. | erg/s
L_tot | Total luminosity. | erg/s
R_F_cut_neg | Ratio of maximum luminosity density and peak detection threshold on blue side. | -
R_F_cut_pos | Ratio of maximum luminosity density and peak detection threshold on red side. | -
R_F_lc_max | Ratio of luminosity density at line center and maximum peak height. | -
R_F_pos_neg | Ratio of luminosity density at red and blue peak. | -
R_F_valley_max | Ratio of luminosity density in the 'valley' between the peaks and the maximum peak. | -
R_L_cut_neg | Ratio of blueward luminosity and peak detection threshold. | -
R_L_cut_pos | Ratio of redward luminosity and peak detection threshold. | -
R_L_pos_neg | Ratio of redward over blueward luminosity. | -
W_std | Square-root of second moment of whole spectrum. | km/s
W_neg_std | Blue peak width as measured by square-root of second moment. | km/s
W_pos_std | Red peak width as measured by square-root of second moment. | km/s
neg_peak_fraction | Fraction of times a blue peak was detected. | -
pos_peak_fraction | Fraction of times a red peak was detected. | -
skew | Pearson's moment coefficient of skewness of whole spectrum. | -
skew_neg | Pearson's moment coefficient of skewness of blue side. | -
skew_pos | Pearson's moment coefficient of skewness of red side. | -
x_max | Highest peak position determined by maximum luminosity density. | km/s
x_mean | First moment of spectrum. | km/s
x_neg_max | Peak position determined by maximum luminosity density on blue side. | km/s
x_neg_mean | Peak position determined by weighted mean on blue side. | km/s
x_pos_max | Peak position determined by maximum luminosity density on red side. | km/s
x_pos_mean | Peak position determined by weighted mean on red side. | km/s
x_valley | Position of 'valley' between the peaks. | km/s
z | Systemic redshift of source. | -