Difference between revisions of "PAUS+COSMOS photoz"
(One intermediate revision by the same user not shown) | |||
Line 314: | Line 314: | ||
[jcarrete@data photoz]$ hdfs dfs -copyFromLocal /cephfs/pic.es/astro/scratch/alarcon/paucosmos_photoz_catalogs/v0.4/PAU_COSMOS_photoz_catalog_PZ_PIC_v0.4.parquet /user/cosmohub/value_added_data/paudm/photoz/PAU_COSMOS_photoz_catalog_PZ_PIC_v0.4.parquet | [jcarrete@data photoz]$ hdfs dfs -copyFromLocal /cephfs/pic.es/astro/scratch/alarcon/paucosmos_photoz_catalogs/v0.4/PAU_COSMOS_photoz_catalog_PZ_PIC_v0.4.parquet /user/cosmohub/value_added_data/paudm/photoz/PAU_COSMOS_photoz_catalog_PZ_PIC_v0.4.parquet | ||
− | |||
Latest revision as of 12:20, 9 November 2020
Version 0.2
Ingesting Alex catalog used in his paper:
https://pausurvey.org/wiki/index.php/Main_Page/Papers/BayesEvZ
There is an email with subject: 'Catalogo photo-z PAUS+COSMOS para cosmohub' where the iteration with Alex is described.
The external table is created by Pau using the "standard" script for parquet files:
CREATE EXTERNAL TABLE `tallada.pau_cosmos_photoz_v0_2`( `ref_id` bigint, `i_auto` double, `photoz` double, `ra` double, `dec` double, `nbands` bigint, `id_laigle` bigint, `zspec_mean` double, `zspec_std` double, `best_model` double, `best_extlaw` bigint, `best_continuum` double, `best_ebv` double, `best_type` double, `muv` double, `mu` double, `mr` double, `mi` double, `flux_oii` double, `flux_hbeta` double, `flux_oiii` double, `flux_halpha` double, `flux_err_oii` double, `flux_err_hbeta` double, `flux_err_oiii` double, `flux_err_halpha` double, `line_coverage_oii` double, `line_coverage_hbeta` double, `line_coverage_oiii` double, `line_coverage_halpha` double) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' LOCATION 'hdfs://AntNest/user/tallada/data/pau/PAU_COSMOS_photoz_catalog_PZ_PIC_v0.2' ;
I create the internal table in CosmoHub:
CREATE TABLE `pau_cosmos_photoz_v0_2_c`( `ref_id` bigint COMMENT 'PAUdm reference ID', `i_auto` double COMMENT 'Auto i-band magnitude from Ilbert+09 cosmos photo-z catalog', `photoz` double COMMENT 'Photo-z, defined as the mode of the p(z) of each object obtained from the templates', `ra` double COMMENT 'Right ascention', `dec` double COMMENT 'Declination', `nbands` bigint COMMENT 'Number of bands used in the photo-z code', `id_laigle` bigint COMMENT 'COSMOS2015 (Laigle+15) reference ID', `zspec_mean` double COMMENT 'Average spectroscopic redshift from several public redshift surveys (when available)', `zspec_std` double COMMENT 'Standard deviation spectroscopic redshift from several public redshift surveys (when available)', `best_model` double COMMENT 'Best model in the photo-z code', `best_extlaw` bigint COMMENT 'Best extinction law in the photo-z code (0:No extinction; 1:Prevot; 2:Calzetti; 3:Calzetti+Bump1; 4:Calzetti+Bump2', `best_continuum` double COMMENT 'Best continuum template group (see Table 1 in Alarcon et.al)', `best_ebv` double COMMENT 'Best extinction E(B-V) value (from 0 to 0.5, in steps of 0.05)', `best_type` double COMMENT 'Best galaxy type (0: red, 1:green, 2:blue)', `muv` double COMMENT 'Absolute magnitude Galex NUV band. Corrected for internal galaxy extinction', `mu` double COMMENT 'Absolute magnitude CFHT U band. Corrected for internal galaxy extinction', `mr` double COMMENT 'Absolute magnitude Subaru R band. Corrected for internal galaxy extinction', `mi` double COMMENT 'Absolute magnitude Subaru I band. Corrected for internal galaxy extinction', `flux_oii` double COMMENT 'Flux for the OII line template in units of 1e-17 erg/s/cm2. Corrected for internal galaxy extinction', `flux_hbeta` double COMMENT 'Flux for the Hbeta line template in units of 1e-17 erg/s/cm2. Corrected for internal galaxy extinction', `flux_oiii` double COMMENT 'Flux for the OIII line template in units of 1e-17 erg/s/cm2. Corrected for internal galaxy extinction', `flux_halpha` double COMMENT 'Flux for the Halpha line template in units of 1e-17 erg/s/cm2. Corrected for internal galaxy extinction', `flux_err_oii` double COMMENT 'Flux error for the OII line template in units of 1e-17 erg/s/cm2. Corrected for internal galaxy extinction', `flux_err_hbeta` double COMMENT 'Flux error for the Hbeta line template in units of 1e-17 erg/s/cm2. Corrected for internal galaxy extinction', `flux_err_oiii` double COMMENT 'Flux error for the OIII line template in units of 1e-17 erg/s/cm2. Corrected for internal galaxy extinction', `flux_err_halpha` double COMMENT 'Flux error for the Halpha line template in units of 1e-17 erg/s/cm2. Corrected for internal galaxy extinction', `line_coverage_oii` double COMMENT 'Maximum coverage of the OII flux line. Range is [0,1], with 0 being poorly covered and 1 great coverage. Measurements with poor coverage are unreliable and should be flagged', `line_coverage_hbeta` double COMMENT 'Maximum coverage of the Hbeta flux line. Range is [0,1], with 0 being poorly covered and 1 great coverage. Measurements with poor coverage are unreliable and should be flagged', `line_coverage_oiii` double COMMENT 'Maximum coverage of the OIII flux line. Range is [0,1], with 0 being poorly covered and 1 great coverage. Measurements with poor coverage are unreliable and should be flagged', `line_coverage_halpha` double COMMENT 'Maximum coverage of the Halpha flux line. Range is [0,1], with 0 being poorly covered and 1 great coverage. Measurements with poor coverage are unreliable and should be flagged' ) CLUSTERED BY ( ref_id ) SORTED BY ( ref_id ASC ) INTO 4 BUCKETS ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat' ;
INSERT OVERWRITE TABLE cosmohub.pau_cosmos_photoz_v0_2_c SELECT * FROM tallada.pau_cosmos_photoz_v0_2;
ANALYZE TABLE cosmohub.pau_cosmos_photoz_v0_2_c COMPUTE STATISTICS FOR COLUMNS;
Copy the P(z) file into the value_added_data path:
[jcarrete@data photoz]$ hdfs dfs -copyFromLocal /cephfs/pic.es/astro/scratch/alarcon/paucosmos_photoz_catalogs/v0.2/PAU_COSMOS_photoz_catalog_PZ_PIC_v0.2.parquet /user/cosmohub/value_added_data/paudm/photoz/PAU_COSMOS_photoz_catalog_PZ_PIC_v0.2.parquet [jcarrete@data photoz]$ hdfs dfs -copyFromLocal /cephfs/pic.es/astro/scratch/alarcon/paucosmos_photoz_catalogs/v0.2/PAU_COSMOS_photoz_catalog_PIC_PZ_v0.2_column_description.csv /user/cosmohub/value_added_data/paudm/photoz/PAU_COSMOS_photoz_catalog_PIC_PZ_v0.2_column_description.csv
Version 0.3
Data are in:
/cephfs/pic.es/astro/scratch/alarcon/paucosmos_photoz_catalogs/v0.3
Copy to hdfs path:
hdfs dfs -copyFromLocal /cephfs/pic.es/astro/scratch/alarcon/paucosmos_photoz_catalogs/v0.3/PAU_COSMOS_photoz_catalog_PIC_v0.3.parquet /user/jcarrete/data/paudm/paus_cosmos_photoz_v0.3/
Create external table:
CREATE EXTERNAL TABLE jcarrete.pau_cosmos_photoz_v0_3_p ( `ref_id` BIGINT, `I_auto` DOUBLE, `photoz` DOUBLE, `ra` DOUBLE, `dec` DOUBLE, `nbands` BIGINT, `id_laigle` BIGINT, `zspec_mean` DOUBLE, `zspec_std` DOUBLE, `best_model` DOUBLE, `best_extlaw` BIGINT, `best_continuum` DOUBLE, `best_ebv` DOUBLE, `best_type` DOUBLE, `MUV` DOUBLE, `MU` DOUBLE, `MR` DOUBLE, `MI` DOUBLE, `flux_OII` DOUBLE, `flux_Hbeta` DOUBLE, `flux_OIII` DOUBLE, `flux_Halpha` DOUBLE, `flux_err_OII` DOUBLE, `flux_err_Hbeta` DOUBLE, `flux_err_OIII` DOUBLE, `flux_err_Halpha` DOUBLE, `line_coverage_OII` DOUBLE, `line_coverage_Hbeta` DOUBLE, `line_coverage_OIII` DOUBLE, `line_coverage_Halpha` DOUBLE ) STORED AS PARQUET LOCATION '/user/jcarrete/data/paudm/paus_cosmos_photoz_v0.3' ;
CREATE TABLE cosmohub.pau_cosmos_photoz_v0_3_c( `ref_id` bigint COMMENT 'PAUdm reference ID', `i_auto` double COMMENT 'Auto i-band magnitude from Ilbert+09 cosmos photo-z catalog', `photoz` double COMMENT 'Photo-z, defined as the mode of the p(z) of each object obtained from the templates', `ra` double COMMENT 'Right ascention', `dec` double COMMENT 'Declination', `nbands` bigint COMMENT 'Number of bands used in the photo-z code', `id_laigle` bigint COMMENT 'COSMOS2015 (Laigle+15) reference ID', `zspec_mean` double COMMENT 'Average spectroscopic redshift from several public redshift surveys (when available)', `zspec_std` double COMMENT 'Standard deviation spectroscopic redshift from several public redshift surveys (when available)', `best_model` double COMMENT 'Best model in the photo-z code', `best_extlaw` bigint COMMENT 'Best extinction law in the photo-z code (0:No extinction\; 1:Prevot\; 2:Calzetti\; 3:Calzetti+Bump1\; 4:Calzetti+Bump2', `best_continuum` double COMMENT 'Best continuum template group (see Table 1 in Alarcon et.al)', `best_ebv` double COMMENT 'Best extinction E(B-V) value (from 0 to 0.5, in steps of 0.05)', `best_type` double COMMENT 'Best galaxy type (0: red, 1:green, 2:blue)', `muv` double COMMENT 'Absolute magnitude Galex NUV band. Corrected for internal galaxy extinction', `mu` double COMMENT 'Absolute magnitude CFHT U band. Corrected for internal galaxy extinction', `mr` double COMMENT 'Absolute magnitude Subaru R band. Corrected for internal galaxy extinction', `mi` double COMMENT 'Absolute magnitude Subaru I band. Corrected for internal galaxy extinction', `flux_oii` double COMMENT 'Flux for the OII line template in units of 1e-17 erg/s/cm2. Corrected for internal galaxy extinction', `flux_hbeta` double COMMENT 'Flux for the Hbeta line template in units of 1e-17 erg/s/cm2. Corrected for internal galaxy extinction', `flux_oiii` double COMMENT 'Flux for the OIII line template in units of 1e-17 erg/s/cm2. Corrected for internal galaxy extinction', `flux_halpha` double COMMENT 'Flux for the Halpha line template in units of 1e-17 erg/s/cm2. Corrected for internal galaxy extinction', `flux_err_oii` double COMMENT 'Flux error for the OII line template in units of 1e-17 erg/s/cm2. Corrected for internal galaxy extinction', `flux_err_hbeta` double COMMENT 'Flux error for the Hbeta line template in units of 1e-17 erg/s/cm2. Corrected for internal galaxy extinction', `flux_err_oiii` double COMMENT 'Flux error for the OIII line template in units of 1e-17 erg/s/cm2. Corrected for internal galaxy extinction', `flux_err_halpha` double COMMENT 'Flux error for the Halpha line template in units of 1e-17 erg/s/cm2. Corrected for internal galaxy extinction', `line_coverage_oii` double COMMENT 'Maximum coverage of the OII flux line. Range is [0,1], with 0 being poorly covered and 1 great coverage. Measurements with poor coverage are unreliable and should be flagged', `line_coverage_hbeta` double COMMENT 'Maximum coverage of the Hbeta flux line. Range is [0,1], with 0 being poorly covered and 1 great coverage. Measurements with poor coverage are unreliable and should be flagged', `line_coverage_oiii` double COMMENT 'Maximum coverage of the OIII flux line. Range is [0,1], with 0 being poorly covered and 1 great coverage. Measurements with poor coverage are unreliable and should be flagged', `line_coverage_halpha` double COMMENT 'Maximum coverage of the Halpha flux line. Range is [0,1], with 0 being poorly covered and 1 great coverage. Measurements with poor coverage are unreliable and should be flagged') CLUSTERED BY ( ref_id) SORTED BY ( ref_id ASC) INTO 4 BUCKETS ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat' ;
INSERT OVERWRITE TABLE cosmohub.pau_cosmos_photoz_v0_3_c SELECT * FROM jcarrete.pau_cosmos_photoz_v0_3_p;
ANALYZE TABLE cosmohub.pau_cosmos_photoz_v0_3_c COMPUTE STATISTICS FOR COLUMNS;
[jcarrete@data photoz]$ hdfs dfs -copyFromLocal /cephfs/pic.es/astro/scratch/alarcon/paucosmos_photoz_catalogs/v0.3/PAU_COSMOS_photoz_catalog_PZ_PIC_v0.3.parquet /user/cosmohub/value_added_data/paudm/photoz/PAU_COSMOS_photoz_catalog_PZ_PIC_v0.3.parquet
Version 0.4
Data are in:
/cephfs/pic.es/astro/scratch/alarcon/paucosmos_photoz_catalogs/v0.4
Copy to hdfs path:
hdfs dfs -copyFromLocal /cephfs/pic.es/astro/scratch/alarcon/paucosmos_photoz_catalogs/v0.4/PAU_COSMOS_photoz_catalog_PIC_v0.4.parquet /user/jcarrete/data/paudm/paus_cosmos_photoz_v0.4/
[jcarrete@data ~]$ /software/astro/scripts/create_table_from_parquet.sh -p /hdfs/user/jcarrete/data/paudm/paus_cosmos_photoz_v0.4/PAU_COSMOS_photoz_catalog_PIC_v0.4.parquet -t jcarrete.pau_cosmos_photoz_v0_4_p -e
CREATE EXTERNAL TABLE jcarrete.pau_cosmos_photoz_v0_4_p ( `ref_id` BIGINT, `I_auto` DOUBLE, `photoz` DOUBLE, `ra` DOUBLE, `dec` DOUBLE, `nbands` BIGINT, `id_laigle` BIGINT, `zspec_mean` DOUBLE, `zspec_std` DOUBLE, `best_model` DOUBLE, `best_extlaw` BIGINT, `best_continuum` DOUBLE, `best_ebv` DOUBLE, `best_type` DOUBLE, `MUV` DOUBLE, `MU` DOUBLE, `MR` DOUBLE, `MI` DOUBLE, `flux_OII` DOUBLE, `flux_Hbeta` DOUBLE, `flux_OIII` DOUBLE, `flux_Halpha` DOUBLE, `flux_err_OII` DOUBLE, `flux_err_Hbeta` DOUBLE, `flux_err_OIII` DOUBLE, `flux_err_Halpha` DOUBLE, `line_coverage_OII` DOUBLE, `line_coverage_Hbeta` DOUBLE, `line_coverage_OIII` DOUBLE, `line_coverage_Halpha` DOUBLE ) STORED AS PARQUET LOCATION '/user/jcarrete/data/paudm/paus_cosmos_photoz_v0.4' ;
CREATE TABLE cosmohub.pau_cosmos_photoz_v0_4_c( `ref_id` bigint COMMENT 'PAUdm reference ID', `i_auto` double COMMENT 'Auto i-band magnitude from Ilbert+09 cosmos photo-z catalog', `photoz` double COMMENT 'Photo-z, defined as the mode of the p(z) of each object obtained from the templates', `ra` double COMMENT 'Right ascention', `dec` double COMMENT 'Declination', `nbands` bigint COMMENT 'Number of bands used in the photo-z code', `id_laigle` bigint COMMENT 'COSMOS2015 (Laigle+15) reference ID', `zspec_mean` double COMMENT 'Average spectroscopic redshift from several public redshift surveys (when available)', `zspec_std` double COMMENT 'Standard deviation spectroscopic redshift from several public redshift surveys (when available)', `best_model` double COMMENT 'Best model in the photo-z code', `best_extlaw` bigint COMMENT 'Best extinction law in the photo-z code (0:No extinction\; 1:Prevot\; 2:Calzetti\; 3:Calzetti+Bump1\; 4:Calzetti+Bump2', `best_continuum` double COMMENT 'Best continuum template group (see Table 1 in Alarcon et.al)', `best_ebv` double COMMENT 'Best extinction E(B-V) value (from 0 to 0.5, in steps of 0.05)', `best_type` double COMMENT 'Best galaxy type (0: red, 1:green, 2:blue)', `muv` double COMMENT 'Absolute magnitude Galex NUV band. Corrected for internal galaxy extinction', `mu` double COMMENT 'Absolute magnitude CFHT U band. Corrected for internal galaxy extinction', `mr` double COMMENT 'Absolute magnitude Subaru R band. Corrected for internal galaxy extinction', `mi` double COMMENT 'Absolute magnitude Subaru I band. Corrected for internal galaxy extinction', `flux_oii` double COMMENT 'Flux for the OII line template in units of 1e-17 erg/s/cm2. Corrected for internal galaxy extinction', `flux_hbeta` double COMMENT 'Flux for the Hbeta line template in units of 1e-17 erg/s/cm2. Corrected for internal galaxy extinction', `flux_oiii` double COMMENT 'Flux for the OIII line template in units of 1e-17 erg/s/cm2. Corrected for internal galaxy extinction', `flux_halpha` double COMMENT 'Flux for the Halpha line template in units of 1e-17 erg/s/cm2. Corrected for internal galaxy extinction', `flux_err_oii` double COMMENT 'Flux error for the OII line template in units of 1e-17 erg/s/cm2. Corrected for internal galaxy extinction', `flux_err_hbeta` double COMMENT 'Flux error for the Hbeta line template in units of 1e-17 erg/s/cm2. Corrected for internal galaxy extinction', `flux_err_oiii` double COMMENT 'Flux error for the OIII line template in units of 1e-17 erg/s/cm2. Corrected for internal galaxy extinction', `flux_err_halpha` double COMMENT 'Flux error for the Halpha line template in units of 1e-17 erg/s/cm2. Corrected for internal galaxy extinction', `line_coverage_oii` double COMMENT 'Maximum coverage of the OII flux line. Range is [0,1], with 0 being poorly covered and 1 great coverage. Measurements with poor coverage are unreliable and should be flagged', `line_coverage_hbeta` double COMMENT 'Maximum coverage of the Hbeta flux line. Range is [0,1], with 0 being poorly covered and 1 great coverage. Measurements with poor coverage are unreliable and should be flagged', `line_coverage_oiii` double COMMENT 'Maximum coverage of the OIII flux line. Range is [0,1], with 0 being poorly covered and 1 great coverage. Measurements with poor coverage are unreliable and should be flagged', `line_coverage_halpha` double COMMENT 'Maximum coverage of the Halpha flux line. Range is [0,1], with 0 being poorly covered and 1 great coverage. Measurements with poor coverage are unreliable and should be flagged') CLUSTERED BY ( ref_id) SORTED BY ( ref_id ASC) INTO 4 BUCKETS ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat' ;
INSERT OVERWRITE TABLE cosmohub.pau_cosmos_photoz_v0_4_c SELECT * FROM jcarrete.pau_cosmos_photoz_v0_4_p;
ANALYZE TABLE cosmohub.pau_cosmos_photoz_v0_4_c COMPUTE STATISTICS FOR COLUMNS;
[jcarrete@data photoz]$ hdfs dfs -copyFromLocal /cephfs/pic.es/astro/scratch/alarcon/paucosmos_photoz_catalogs/v0.4/PAU_COSMOS_photoz_catalog_PZ_PIC_v0.4.parquet /user/cosmohub/value_added_data/paudm/photoz/PAU_COSMOS_photoz_catalog_PZ_PIC_v0.4.parquet
Version 0.4 FLUX DATA
Complementary data for v0.4. Alex sent us a catalog with the calibrated flux used to estimate the photometric redshifts:
/cephfs/pic.es/astro/scratch/alarcon/paucosmos_photoz_catalogs/v0.4/PAU_COSMOS_fluxes_from_photoz_catalog_PIC_v0.4.parquet
I copy the files in the hdfs:
hdfs dfs -mkdir /user/jcarrete/data/paudm/paus_cosmos_photoz_v0.4_photometry hdfs dfs -copyFromLocal /cephfs/pic.es/astro/scratch/alarcon/paucosmos_photoz_catalogs/v0.4/PAU_COSMOS_fluxes_from_photoz_catalog_PIC_v0.4.parquet /user/jcarrete/data/paudm/paus_cosmos_photoz_v0.4_photometry/PAU_COSMOS_fluxes_from_photoz_catalog_PIC_v0.4.parquet
I create the external table using the standard script:
/software/astro/scripts/create_table_from_parquet.sh -p /hdfs/user/jcarrete/data/paudm/paus_cosmos_photoz_v0.4_photometry/PAU_COSMOS_fluxes_from_photoz_catalog_PIC_v0.4.parquet -t jcarrete.pau_cosmos_photoz_v0_4_photometry_pq -e
Creating the table from beeline:
CREATE EXTERNAL TABLE jcarrete.pau_cosmos_photoz_v0_4_photometry_pq ( `flux` DOUBLE, `ref_id` BIGINT, `flux_err` DOUBLE, `band` STRING ) STORED AS PARQUET LOCATION '/user/jcarrete/data/paudm/paus_cosmos_photoz_v0.4_photometry' ;
I create the table in the cosmohub schema:
CREATE TABLE cosmohub.pau_cosmos_photoz_v0_4_photometry_c ( `ref_id` bigint COMMENT 'unique identifier for reference catalogue', `band` string COMMENT 'filter', `flux` double COMMENT 'flux', `flux_err` double COMMENT 'error of the flux' ) CLUSTERED BY ( ref_id) SORTED BY ( ref_id ASC) INTO 4 BUCKETS ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat' ;
Insert overwrite:
INSERT OVERWRITE TABLE cosmohub.pau_cosmos_photoz_v0_4_photometry_c SELECT ref_id, band, flux, flux_err FROM jcarrete.pau_cosmos_photoz_v0_4_photometry_pq ;
Statistics:
ANALYZE TABLE cosmohub.pau_cosmos_photoz_v0_4_photometry_c COMPUTE STATISTICS FOR COLUMNS;