High redshift galaxies

From Public PIC Wiki
Jump to navigation Jump to search

Version 1.0

Catalog provided by Rebecca and Pascal: https://euclid.roe.ac.uk/projects/puswg/wiki/Galaxies

It is a ".txt" file.

I remove the header and copy it into hdfs:

   tail -n+2 /cephfs/pic.es/astro/scratch/jcarrete/sandbox/Euclid/High_redshift_galaxies/Euclid_SC8_simcat_LBGs_DPL_16sqdegs_v1.txt | hdfs dfs -copyFromLocal - /user/jcarrete/data/euclid/high_z_galaxies/v1.0/Euclid_SC8_simcat_LBGs_DPL_16sqdegs_v1.csv

I create the external table:

   CREATE EXTERNAL TABLE jcarrete.euclid_sc8_simcat_lbgs_dpl_16_sqdegs_v1_0(
   id bigint COMMENT 'unique galaxy identifier',
   `type` string COMMENT 'type (LBG for all objects)',
   z float COMMENT 'Redshift (in the range [6-12])',
   abs_muv float COMMENT 'absolute magnitude at 1500 angstroms',
   sedname string COMMENT 'Spectral Energy Distribution name (among 76, 38 without UV lines, 38 with UV lines, corresponding to a given UV slope)',
   rhalf_kpc float COMMENT 'half light radius (in physical kpc)',
   rhalf_arcsec float COMMENT 'half light radius (in arcsec)',
   muv float COMMENT 'for information only, observed magnitude at rest-frame 1500',
   beta float COMMENT 'for information only, used to assign an SED'
   )
   ROW FORMAT SERDE 
     'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' 
   WITH SERDEPROPERTIES ( 
     'field.delim'=' ', 
     'serialization.format'=' ') 
   STORED AS INPUTFORMAT 
     'org.apache.hadoop.mapred.TextInputFormat' 
   OUTPUTFORMAT 
     'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
   LOCATION
     'hdfs://AntNest/user/jcarrete/data/euclid/high_z_galaxies/v1.0' 
   ;

I create the ORC table:

   CREATE TABLE euclid_sc8_simcat_lbgs_dpl_16_sqdegs_v1_0_c(
   id bigint COMMENT 'unique galaxy identifier',
   `type` string COMMENT 'type (LBG for all objects)',
   z float COMMENT 'Redshift (in the range [6-12])',
   abs_muv float COMMENT 'absolute magnitude at 1500 angstroms',
   sedname string COMMENT 'Spectral Energy Distribution name (among 76, 38 without UV lines, 38 with UV lines, corresponding to a given UV slope)',
   rhalf_kpc float COMMENT 'half light radius (in physical kpc)',
   rhalf_arcsec float COMMENT 'half light radius (in arcsec)',
   muv float COMMENT 'for information only, observed magnitude at rest-frame 1500',
   beta float COMMENT 'for information only, used to assign an SED'
   )
   CLUSTERED BY ( 
     id) 
   SORTED BY ( 
     id ASC) 
   INTO 4 BUCKETS
   ROW FORMAT SERDE 
     'org.apache.hadoop.hive.ql.io.orc.OrcSerde' 
   STORED AS INPUTFORMAT 
     'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' 
   OUTPUTFORMAT 
     'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
   ;

Then I covert the data:

   INSERT OVERWRITE TABLE cosmohub. euclid_sc8_simcat_lbgs_dpl_16_sqdegs_v1_0_c
   SELECT * FROM jcarrete.euclid_sc8_simcat_lbgs_dpl_16_sqdegs_v1_0
   WHERE id IS NOT NULL;

Version 1.1

I decide to add a new column in order to have the sedname as a integer values rather than as a string value:

   CREATE TABLE euclid_sc8_simcat_lbgs_dpl_16_sqdegs_v1_1_c(
   id bigint COMMENT 'unique galaxy identifier',
   `type` string COMMENT 'type (LBG for all objects)',
   z float COMMENT 'Redshift (in the range [6-12])',
   abs_muv float COMMENT 'absolute magnitude at 1500 angstroms',
   sedname string COMMENT 'Spectral Energy Distribution name (among 76, 38 without UV lines, 38 with UV lines, corresponding to a given UV slope)',
   sedname_int int COMMENT 'sedname as integer (from 0 to 75, from 0 to 37 without UV lines, from 38 to 75 with UV lines)',
   rhalf_kpc float COMMENT 'half light radius (in physical kpc)',
   rhalf_arcsec float COMMENT 'half light radius (in arcsec)',
   muv float COMMENT 'for information only, observed magnitude at rest-frame 1500',
   beta float COMMENT 'for information only, used to assign an SED'
   )
   CLUSTERED BY ( 
     id) 
   SORTED BY ( 
     id ASC) 
   INTO 4 BUCKETS
   ROW FORMAT SERDE 
     'org.apache.hadoop.hive.ql.io.orc.OrcSerde' 
   STORED AS INPUTFORMAT 
     'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' 
   OUTPUTFORMAT 
     'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
   ;

This is the command to convert sedname into sedname_int:

   INSERT OVERWRITE TABLE cosmohub.euclid_sc8_simcat_lbgs_dpl_16_sqdegs_v1_1_c
   SELECT id, `type`,z, abs_muv, sedname, 
   CASE WHEN sedname = 'beta_1.25.dat' THEN 0
   WHEN sedname = 'beta_1.30.dat' THEN 1
   WHEN sedname = 'beta_1.35.dat' THEN 2
   WHEN sedname = 'beta_1.40.dat' THEN 3
   WHEN sedname = 'beta_1.45.dat' THEN 4
   WHEN sedname = 'beta_1.50.dat' THEN 5
   WHEN sedname = 'beta_1.55.dat' THEN 6
   WHEN sedname = 'beta_1.60.dat' THEN 7
   WHEN sedname = 'beta_1.65.dat' THEN 8
   WHEN sedname = 'beta_1.70.dat' THEN 9
   WHEN sedname = 'beta_1.75.dat' THEN 10
   WHEN sedname = 'beta_1.80.dat' THEN 11
   WHEN sedname = 'beta_1.85.dat' THEN 12
   WHEN sedname = 'beta_1.90.dat' THEN 13
   WHEN sedname = 'beta_1.95.dat' THEN 14
   WHEN sedname = 'beta_2.00.dat' THEN 15
   WHEN sedname = 'beta_2.05.dat' THEN 16
   WHEN sedname = 'beta_2.10.dat' THEN 17
   WHEN sedname = 'beta_2.15.dat' THEN 18
   WHEN sedname = 'beta_2.20.dat' THEN 19
   WHEN sedname = 'beta_2.25.dat' THEN 20
   WHEN sedname = 'beta_2.30.dat' THEN 21
   WHEN sedname = 'beta_2.35.dat' THEN 22
   WHEN sedname = 'beta_2.40.dat' THEN 23
   WHEN sedname = 'beta_2.45.dat' THEN 24
   WHEN sedname = 'beta_2.50.dat' THEN 25
   WHEN sedname = 'beta_2.55.dat' THEN 26
   WHEN sedname = 'beta_2.60.dat' THEN 27
   WHEN sedname = 'beta_2.65.dat' THEN 28
   WHEN sedname = 'beta_2.70.dat' THEN 29
   WHEN sedname = 'beta_2.75.dat' THEN 30
   WHEN sedname = 'beta_2.80.dat' THEN 31
   WHEN sedname = 'beta_2.85.dat' THEN 32
   WHEN sedname = 'beta_2.90.dat' THEN 33
   WHEN sedname = 'beta_2.95.dat' THEN 34
   WHEN sedname = 'beta_3.00.dat' THEN 35
   WHEN sedname = 'beta_3.05.dat' THEN 36
   WHEN sedname = 'beta_3.10.dat' THEN 37
   WHEN sedname = 'beta_1.25_wUVLines.dat' THEN 38
   WHEN sedname = 'beta_1.30_wUVLines.dat' THEN 39
   WHEN sedname = 'beta_1.35_wUVLines.dat' THEN 40
   WHEN sedname = 'beta_1.40_wUVLines.dat' THEN 41
   WHEN sedname = 'beta_1.45_wUVLines.dat' THEN 42
   WHEN sedname = 'beta_1.50_wUVLines.dat' THEN 43
   WHEN sedname = 'beta_1.55_wUVLines.dat' THEN 44
   WHEN sedname = 'beta_1.60_wUVLines.dat' THEN 45
   WHEN sedname = 'beta_1.65_wUVLines.dat' THEN 46
   WHEN sedname = 'beta_1.70_wUVLines.dat' THEN 47
   WHEN sedname = 'beta_1.75_wUVLines.dat' THEN 48
   WHEN sedname = 'beta_1.80_wUVLines.dat' THEN 49
   WHEN sedname = 'beta_1.85_wUVLines.dat' THEN 50
   WHEN sedname = 'beta_1.90_wUVLines.dat' THEN 51
   WHEN sedname = 'beta_1.95_wUVLines.dat' THEN 52
   WHEN sedname = 'beta_2.00_wUVLines.dat' THEN 53
   WHEN sedname = 'beta_2.05_wUVLines.dat' THEN 54
   WHEN sedname = 'beta_2.10_wUVLines.dat' THEN 55
   WHEN sedname = 'beta_2.15_wUVLines.dat' THEN 56
   WHEN sedname = 'beta_2.20_wUVLines.dat' THEN 57
   WHEN sedname = 'beta_2.25_wUVLines.dat' THEN 58
   WHEN sedname = 'beta_2.30_wUVLines.dat' THEN 59
   WHEN sedname = 'beta_2.35_wUVLines.dat' THEN 60
   WHEN sedname = 'beta_2.40_wUVLines.dat' THEN 61
   WHEN sedname = 'beta_2.45_wUVLines.dat' THEN 62
   WHEN sedname = 'beta_2.50_wUVLines.dat' THEN 63
   WHEN sedname = 'beta_2.55_wUVLines.dat' THEN 64
   WHEN sedname = 'beta_2.60_wUVLines.dat' THEN 65
   WHEN sedname = 'beta_2.65_wUVLines.dat' THEN 66
   WHEN sedname = 'beta_2.70_wUVLines.dat' THEN 67
   WHEN sedname = 'beta_2.75_wUVLines.dat' THEN 68
   WHEN sedname = 'beta_2.80_wUVLines.dat' THEN 69
   WHEN sedname = 'beta_2.85_wUVLines.dat' THEN 70
   WHEN sedname = 'beta_2.90_wUVLines.dat' THEN 71
   WHEN sedname = 'beta_2.95_wUVLines.dat' THEN 72
   WHEN sedname = 'beta_3.00_wUVLines.dat' THEN 73
   WHEN sedname = 'beta_3.05_wUVLines.dat' THEN 74
   WHEN sedname = 'beta_3.10_wUVLines.dat' THEN 75
   END as sedname_int,
   rhalf_kpc, rhalf_arcsec, muv, beta
   FROM jcarrete.euclid_sc8_simcat_lbgs_dpl_16_sqdegs_v1_0
   WHERE id IS NOT NULL;
   ANALYZE TABLE cosmohub.euclid_sc8_simcat_lbgs_dpl_16_sqdegs_v1_1_c COMPUTE STATISTICS FOR COLUMNS;

Then I create the catalog in CosmoHub.