HowTo: Managing Tape File Families in dCache

From Public PIC Wiki

File Families

  • file_family is a dCache tag which determines the file family associated with all files in this directory.
  • This is an Enstore-specific flag.
  • Files are grouped on Enstore tape volumes according to their storage group and file family attributes. A file family is a name that defines a category, or family, of data files.
  • Each experiment (i.e., each storage group) must carefully plan its set of file families.
  • There may be many file families configured; by design there is no pre-set upper limit on the number.
  • A given storage volume may only contain files belonging to one file family.

Read File Family tag

  • The file family tag of a directory can be read in dCache in three ways:
  • Using the chimera command:
chimera readtag /pnfs/pic.es/data/atlas/ file_family
  • Reading tags with dot commands:
cd /pnfs/pic.es/data/atlas/; grep "" $(cat ".(tags)()")
  • Using Enstore commands:
. /usr/local/etc/setups.sh; cd /pnfs/pic.es/data/atlas/; enstore pnfs --tags 
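
The dot-command approach above can be wrapped in a small helper. This is a minimal sketch, assuming that ".(tags)()" lists one tag file name per line in the form ".(tag)(NAME)", as the grep one-liner above implies. The function name read_tags is hypothetical and not part of dCache or Enstore.

```shell
# Print every tag visible in a directory as "name = value".
# Assumption: ".(tags)()" lists one tag file per line, e.g. ".(tag)(file_family)".
# The body runs in a subshell so its variables do not leak into the caller.
read_tags() (
    dir=${1:-.}
    while IFS= read -r tagfile; do
        [ -n "$tagfile" ] || continue
        name=${tagfile#".(tag)("}   # strip the leading ".(tag)("
        name=${name%")"}            # strip the trailing ")"
        printf '%s = %s\n' "$name" "$(cat "$dir/$tagfile")"
    done < "$dir/.(tags)()"
)
```

For example, read_tags /pnfs/pic.es/data/atlas/ would print one line per tag, including file_family, file_family_width and file_family_wrapper if they are set on that directory.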

Using a File Family in dCache

  • Before a new File Family (new to dCache, at least) can be used, it must be added to the PoolManager configuration.
  • The PoolManager accepts limited use of wildcards.
  • At PIC some VOs take advantage of wildcards, so if you are only adding a new file family to one of the following storage groups you do not need to change anything at the dCache level:
- vo-dteam
- vo-magic
- vo-lhcb
- vo-cms
- vo-pau
- vo-cta
  • For all other VOs / Storage Groups (e.g. vo-atlas) you need to log in to dccore.pic.es, edit /root/srm22/srm22-prod.xml, add/edit/remove the desired File Families and then propagate the changes using ./propagarCanvis.sh srm22-prod.xml -yes
  • In the following example we remove some File Families and add a wildcard for some VOs (the wildcard covers the removed File Families, so they will still be accepted by dCache):
[root@dccore01 ~]# cd /root/srm22

[root@dccore01 srm22]# vi srm22-prod.xml 

[root@dccore01 srm22]# ./propagarCanvis.sh srm22-prod.xml
diff /tmp/LinkGroupAuthorization.conf.temp /opt/d-cache/etc/LinkGroupAuthorization.conf

#############################################################################################

diff /tmp/PoolManager.conf.temp /opt/d-cache/config/PoolManager.conf
30,31c30,33
< psu create unit -store vo-pau\..+@enstore
< psu create unit -store vo-cta\..+@enstore
---
> psu create unit -store vo-pau.paus@enstore
> psu create unit -store vo-pau.mice@enstore
> psu create unit -store vo-pau.castor@enstore
> psu create unit -store vo-cta.cta@enstore
69c71,73
< psu addto ugroup ugroup-pau vo-pau\..+@enstore
---
> psu addto ugroup ugroup-pau vo-pau.paus@enstore
> psu addto ugroup ugroup-pau vo-pau.mice@enstore
> psu addto ugroup ugroup-pau vo-pau.castor@enstore
71c75
< psu addto ugroup ugroup-cta vo-cta\..+@enstore
---
> psu addto ugroup ugroup-cta vo-cta.cta@enstore

# Propagate changes with the '-yes' option

[root@dccore01 srm22]# ./propagarCanvis.sh srm22-prod.xml -yes
LinkGroupAuthorization.conf                                                                                                                                          100% 1011     1.0KB/s   00:00    
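Per fer efectius els canvis fer un 'reload -yes' al PoolManager@dccore.pic.es i un 'update link groups' a SrmSpaceManager@srm.pic.es

# Translation of the script's Catalan message: to make the changes effective, run 'reload -yes' on PoolManager@dccore.pic.es and 'update link groups' on SrmSpaceManager@srm.pic.es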
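
Before editing srm22-prod.xml it can help to check whether a new file family is already covered by an existing wildcard unit. The sketch below is only an illustration: it assumes that storage unit names follow the <storage group>.<file family>@enstore pattern seen in the diff above, and that a wildcard unit behaves like an extended regular expression matched against the whole name. The function names unit_name and unit_matches are hypothetical.

```shell
# Build the storage unit name for a storage group and a file family,
# following the naming seen in the PoolManager.conf diff (e.g. vo-pau.paus@enstore).
unit_name() {
    printf '%s.%s@enstore\n' "$1" "$2"
}

# Succeed (exit status 0) if the unit name is matched in full by the pattern.
# Assumption: the wildcard is an extended regular expression, as in vo-pau\..+@enstore.
unit_matches() {
    printf '%s\n' "$2" | grep -Eqx -- "$1"
}
```

For example, unit_matches 'vo-pau\..+@enstore' "$(unit_name vo-pau mice)" succeeds, so under these assumptions no dCache change would be needed for that family, while the same pattern does not cover vo-cta.cta@enstore.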

File Family Width

  • file_family_width is a dCache tag which determines the file family width associated with all files in this directory.
  • File family width is an integer value associated with a file family that is used to limit write-accessibility on data storage volumes; there is no width associated with reading.
  • For a given media type and for a given file family, Enstore limits the number of volumes available for writing in parallel to the value of the file family width (except when unfilled volumes are already mounted for previous reads). Correspondingly, the number of media drives on which the volumes are loaded is also limited to the width.
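
As a rough illustration of the width rule, the number of volumes of one file family written in parallel is bounded by the smaller of the width and the number of pending write requests. This is a simplified sketch that ignores the exception for unfilled volumes already mounted for reads; the function name max_write_volumes is hypothetical.

```shell
# Upper bound on volumes of one file family mounted for writing at once.
# Arguments: file family width, number of concurrent write requests.
# Simplification: ignores volumes that are already mounted for earlier reads.
max_write_volumes() (
    width=$1
    requests=$2
    if [ "$requests" -lt "$width" ]; then
        echo "$requests"
    else
        echo "$width"
    fi
)
```

With a width of 2, five simultaneous write requests would still use at most two volumes (and two drives), so max_write_volumes 2 5 prints 2.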

File Family Wrapper

  • file_family_wrapper is a dCache tag which determines the file family wrapper associated with all files in this directory.
  • A file family wrapper specifies the format of files on the storage volume. It defines information that gets added before and after data files as they’re written to media. In this way the data written to tape is self-contained and independent of metadata stored externally.
  • There are two wrapper types implemented:
  • cpio_odc - the default wrapper, set up by the Enstore admin when a new namespace area is created. All files with the cpio_odc wrapper can be dumped with cpio. This wrapper has a file length limit of (8G - 1) bytes, which is sufficient for the vast majority of data files, as most files are still under 2GB.
  • cern - accommodates data files up to (10^21 - 1) bytes, which in effect limits the file size to the tape size, since spanning and striping of files across multiple tapes are not supported. It matches an extension to the ANSI standard, as proposed by CERN, and allows data files written at Fermilab to be readable by CERN, and vice versa.
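
The size limits above suggest a simple rule for choosing a wrapper. The following is a minimal sketch, assuming that "8G" means 8 * 2^30 bytes. The function name suggest_wrapper and the variable CPIO_ODC_MAX_BYTES are hypothetical, and the upper limit of the cern wrapper is not checked because it exceeds 64-bit shell arithmetic.

```shell
# Largest file size the cpio_odc wrapper can hold, assuming 8G = 8 * 2^30 bytes.
CPIO_ODC_MAX_BYTES=$(( 8 * 1024 * 1024 * 1024 - 1 ))

# Print the wrapper that can hold a file of the given size in bytes.
suggest_wrapper() {
    if [ "$1" -le "$CPIO_ODC_MAX_BYTES" ]; then
        echo cpio_odc
    else
        echo cern
    fi
}
```

For a local file, suggest_wrapper "$(wc -c < bigfile.dat)" prints the wrapper that would fit it, where bigfile.dat is a placeholder name.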