HDFS Access via VOSpace

From Public PIC Wiki

Introduction

PIC provides access to its Hadoop Distributed File System (HDFS) through a VOSpace server that implements the IVOA VOSpace 2.1 standard.

This service is an alternative to WebDAV access, allowing users to manage their data programmatically and in a structured way using tools compatible with the Virtual Observatory (VO) ecosystem. It is especially aimed at users who require data management operations such as reading, writing, moving, and metadata querying within a standardized environment.

How to connect to the service

VOSpace Endpoint

The VOSpace server is available at the following URL:

   https://vospace.pic.es/vospace

Compatible clients

You can access the service using tools compatible with the VOSpace 2.1 standard:

  • CADC vos client (Python), version 3.6.3 or later. Install it with pip, quoting the requirement so the shell does not treat > as a redirection:
   pip install "vos>=3.6.3"
  • curl: can be used to perform HTTP operations following the examples defined in the VOSpace 2.1 standard, including node creation, file transfers, and property queries.
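Before running the Python examples, you may want to confirm that the installed vos package meets the 3.6.3 minimum. The sketch below uses the standard library's importlib.metadata; parse_release, is_supported and installed_vos_ok are hypothetical helper names, and the comparison is a simplification that only looks at the leading numeric parts of the version (pre-release suffixes such as rc1 are ignored).

```python
import re
from importlib.metadata import PackageNotFoundError, version

MIN_VOS = (3, 6, 3)  # minimum vos release quoted on this page


def parse_release(text: str) -> tuple:
    """Turn '3.6.3' (or '3.6.3rc1') into (3, 6, 3); suffixes are ignored."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def is_supported(text: str, minimum: tuple = MIN_VOS) -> bool:
    """True if the release number is at least the required minimum."""
    return parse_release(text) >= minimum


def installed_vos_ok() -> bool:
    """Check the vos package installed in the current environment, if any."""
    try:
        return is_supported(version("vos"))
    except PackageNotFoundError:
        return False


if __name__ == "__main__":
    print("vos >= 3.6.3 installed:", installed_vos_ok())
```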

Authentication

The VOSpace server allows both anonymous and authenticated access. To access restricted data or personal space, users must specify their PIC username and password.
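This page does not name the HTTP authentication scheme; the use of ~/.netrc later on suggests HTTP Basic authentication, and the sketch below assumes that. It builds the Authorization header value from a PIC username and password with the standard library only. basic_auth_header is a hypothetical helper and the credentials in the demo are placeholders.

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Build an HTTP Basic 'Authorization' header value (RFC 7617).

    Assumption: the PIC VOSpace server accepts HTTP Basic authentication.
    """
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


if __name__ == "__main__":
    # Placeholder values; read real credentials from a secure source instead
    print(basic_auth_header("USERNAME", "PASSWORD"))
```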

Usage Guidelines

Once connected to the service, users can perform a variety of operations defined by the VOSpace standard. Supported functionalities include:

  • getProtocols: Query supported transfer protocols.
  • getViews: View available data formats (views).
  • getProperties: Obtain node properties.
  • createNode: Create files or folders.
  • getNode: Retrieve node information.
  • deleteNode: Delete files or folders.
  • moveNode: Move nodes inside the VOSpace.
  • put / get: Upload or download files.
  • pushToVoSpace / pullFromVoSpace: Request transfer URLs from the server to upload data to, or download data from, a node.
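In VOSpace 2.1, pushToVoSpace and pullFromVoSpace transfers are negotiated by POSTing an XML transfer document (this server lists POST /vospace/synctrans for push/pull transfers). The sketch below builds such a document with ElementTree. transfer_document is a hypothetical helper; the vos:// target is a placeholder because this page does not give PIC's VOSpace authority, and the choice of the plain HTTP PUT/GET protocol URIs is an assumption (getProtocols shows what the server actually offers).

```python
import xml.etree.ElementTree as ET

VOS_NS = "http://www.ivoa.net/xml/VOSpace/v2.0"  # namespace used by VOSpace 2.x documents
# Standard VOSpace protocol URIs for plain HTTP uploads and downloads
PROTOCOLS = {
    "pushToVoSpace": "ivo://ivoa.net/vospace/core#httpput",
    "pullFromVoSpace": "ivo://ivoa.net/vospace/core#httpget",
}
ET.register_namespace("vos", VOS_NS)


def transfer_document(target_uri: str, direction: str) -> str:
    """Build a VOSpace 2.1 <vos:transfer> request body for a client-side transfer."""
    if direction not in PROTOCOLS:
        raise ValueError(f"unsupported direction: {direction!r}")
    root = ET.Element(f"{{{VOS_NS}}}transfer", {"version": "2.1"})
    ET.SubElement(root, f"{{{VOS_NS}}}target").text = target_uri
    ET.SubElement(root, f"{{{VOS_NS}}}direction").text = direction
    ET.SubElement(root, f"{{{VOS_NS}}}protocol", {"uri": PROTOCOLS[direction]})
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical target; replace with the vos:// URI of a real node
    print(transfer_document("vos://example.org!vospace/user/my_user/data.fits", "pushToVoSpace"))
```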

Best Practices

  • Avoid using the ! character in file or folder names, as it may cause compatibility issues with the CADC vos client.
  • Use tools that comply with the VOSpace standard to ensure compatibility and avoid transfer errors.
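A quick check before uploading can catch names that contain !. In the sketch below, names_with_bang inspects a VOSpace path and local_files_with_bang walks a local directory tree; both are hypothetical helpers, and they check only for !, the one character this page warns about.

```python
import os
from pathlib import PurePosixPath


def names_with_bang(path: str) -> list:
    """Return the components of a VOSpace path that contain '!'."""
    return [part for part in PurePosixPath(path).parts if "!" in part]


def local_files_with_bang(root: str) -> list:
    """Walk a local directory tree and list files or folders whose name contains '!'."""
    bad = []
    for dirpath, dirnames, filenames in os.walk(root):
        bad.extend(os.path.join(dirpath, name) for name in dirnames + filenames if "!" in name)
    return sorted(bad)


if __name__ == "__main__":
    # List problematic names under the current directory before uploading it
    for path in local_files_with_bang("."):
        print("rename before upload:", path)
```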

Troubleshooting

Cannot access the server:

Problems with properties:

  • The setProperties operation is not allowed due to permission restrictions on HDFS.

Security and Data Management

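  • Protect your PIC credentials: do not share them, and avoid storing them in plain text. If you do keep them in ~/.netrc, restrict the file to your own user (chmod 600).
  • For automation, consider a secure credential manager or short-lived credential storage instead of hard-coding usernames and passwords in scripts.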
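One way to keep credentials out of scripts is to read them from the environment at run time, where a scheduler or credential manager can inject them. The variable names PIC_USERNAME and PIC_PASSWORD, and the helper pic_credentials, are hypothetical; use whatever names your environment provides.

```python
import os


def pic_credentials(env=None):
    """Return (username, password) read from environment variables.

    PIC_USERNAME and PIC_PASSWORD are hypothetical names chosen for this sketch.
    """
    env = os.environ if env is None else env
    try:
        return env["PIC_USERNAME"], env["PIC_PASSWORD"]
    except KeyError as missing:
        raise RuntimeError(f"missing environment variable {missing}") from None


if __name__ == "__main__":
    user, _password = pic_credentials()
    print("authenticating as", user)  # never print the password
```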

Usage Examples

Using the CADC vos client (Python)

This server supports the CADC vos client and operations like listdir, mkdir, copy, move, and delete. Minimum version: vos >= 3.6.3.

Configuration

  • Create a config file (e.g. vos-config.ini):
   [vos]
   resourceID = https://vospace.pic.es/vospace http
  • Export the file path as an environment variable:
   export VOSPACE_CONFIG_FILE=/path/to/vos-config.ini
  • Create ~/.netrc for authentication:
   machine vospace.pic.es
     login USERNAME
     password PASSWORD
  • Set permissions:
   chmod 600 ~/.netrc
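The configuration steps above can also be scripted. The sketch below writes both files with the standard library and reads the credentials back with Python's netrc module; netrc entries are keyed by host name (vospace.pic.es), not by a full URL, which is also how tools such as curl look them up. write_vos_files and netrc_login are hypothetical helpers, and the demo writes to a temporary directory so an existing ~/.netrc is not overwritten.

```python
import netrc
import os
import tempfile
from pathlib import Path

HOST = "vospace.pic.es"  # netrc entries are keyed by host name, not by full URL


def write_vos_files(config_path: str, netrc_path: str, user: str, password: str) -> None:
    """Write a vos config file and a netrc entry for the PIC VOSpace host."""
    Path(config_path).write_text("[vos]\nresourceID = https://vospace.pic.es/vospace http\n")
    Path(netrc_path).write_text(f"machine {HOST}\n  login {user}\n  password {password}\n")
    os.chmod(netrc_path, 0o600)  # same effect as `chmod 600`


def netrc_login(netrc_path: str, host: str = HOST):
    """Return (login, password) stored for host, or None if there is no entry."""
    entry = netrc.netrc(netrc_path).authenticators(host)
    return None if entry is None else (entry[0], entry[2])


if __name__ == "__main__":
    # Demo in a temporary directory so an existing ~/.netrc is left untouched
    with tempfile.TemporaryDirectory() as tmp:
        cfg = os.path.join(tmp, "vos-config.ini")
        rc = os.path.join(tmp, "netrc")
        write_vos_files(cfg, rc, "USERNAME", "PASSWORD")
        print(netrc_login(rc))  # ('USERNAME', 'PASSWORD')
```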

Python usage examples

   import vos

   # Replace my_user with your PIC username
   BASE = "https://vospace.pic.es/vospace/nodes/user/my_user"

   client = vos.Client()
   # List contents
   files = client.listdir(f"{BASE}/")
   print("Contents:", files)
   # Create directory
   client.mkdir(f"{BASE}/mydir")
   # Upload file
   client.copy("localfile.txt", f"{BASE}/mydir/localfile.txt")
   # Download file
   client.copy(f"{BASE}/mydir/localfile.txt", "downloaded.txt")
   # Move (rename) file
   client.move(f"{BASE}/mydir/localfile.txt", f"{BASE}/mydir/renamedfile.txt")
   # Get node info
   node = client.get_node(f"{BASE}/mydir/renamedfile.txt")
   print("Properties:", node.props)
   # Delete directory
   client.delete(f"{BASE}/mydir")

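Note: These examples assume that the configuration file, the VOSPACE_CONFIG_FILE environment variable, and ~/.netrc are set up as described above.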

Using curl

You can also interact with the server using curl or other HTTP tools, provided that requests follow the VOSpace 2.1 specification.

Important:

  • Avoid using ! in URLs.
  • Use full HTTP paths like /vospace/nodes/....
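The requests curl would send can also be prepared from Python with the standard library, which makes the URL rules above explicit. In this sketch, node_url and build_request are hypothetical helpers: they prefix paths with /vospace/nodes/, percent-encode them, reject !, and return an unsent urllib.request.Request. The optional authorization argument assumes a header-based scheme such as HTTP Basic, which this page does not confirm.

```python
import urllib.request
from typing import Optional
from urllib.parse import quote

# Node URLs use the /vospace/nodes/ prefix recommended above
BASE_URL = "https://vospace.pic.es/vospace/nodes"


def node_url(path: str) -> str:
    """Build a percent-encoded node URL from a path such as 'user/my_user/mydir'."""
    if "!" in path:
        raise ValueError(f"avoid '!' in VOSpace paths: {path!r}")
    return f"{BASE_URL}/{quote(path.strip('/'), safe='/')}"


def build_request(method: str, path: str, data: Optional[bytes] = None,
                  authorization: Optional[str] = None) -> urllib.request.Request:
    """Prepare (but do not send) an HTTP request for a node."""
    headers = {"Authorization": authorization} if authorization else {}
    return urllib.request.Request(node_url(path), data=data, headers=headers, method=method)


if __name__ == "__main__":
    req = build_request("GET", "user/my_user/mydir")
    print(req.get_method(), req.full_url)
    # Sending it requires network access and valid credentials:
    # with urllib.request.urlopen(req) as resp:
    #     print(resp.status, resp.read()[:200])
```

For comparison, curl --netrc -X DELETE https://vospace.pic.es/vospace/nodes/user/my_user/mydir would issue the same DELETE as build_request("DELETE", "user/my_user/mydir"), reading credentials from ~/.netrc.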

VOSpace Server Endpoints

The server provides the following REST endpoints per the VOSpace 2.1 specification:

Method  Path                                       Description
GET     /vospace/protocols                         Retrieves supported transfer protocols
GET     /vospace/views                             Retrieves available data views
GET     /vospace/properties                        Retrieves node properties
GET     /vospace/capabilities                      Retrieves server capabilities
GET     /vospace/{job_id}                          Retrieves job information
GET     /vospace/{job_id}/phase                    Retrieves job phase
GET     /vospace/{job_id}/error                    Retrieves job error info
GET     /vospace/{job_id}/results/transferDetails  Retrieves transfer details
GET     /vospace/{path}                            Retrieves a file/folder node
GET     /vospace/{path}                            Downloads a file (streaming)
PUT     /vospace/{path}                            Creates a file or folder
PUT     /vospace/{path}                            Uploads a file
POST    /vospace/                                  Moves or copies a node (returns 303 redirect)
POST    /vospace/synctrans                         Push/pull transfer
POST    /vospace/{path}                            Set properties (not supported)
POST    /vospace/{job_id}/phase                    Change job phase
DELETE  /vospace/{path}                            Deletes a file or folder
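In the VOSpace 2.1 specification, createNode is a PUT whose body is an XML node description. The sketch below builds a minimal one for a folder (ContainerNode) or a file (DataNode) with ElementTree. node_document is a hypothetical helper; the vos:// URI is a placeholder because this page does not give PIC's VOSpace authority (the ! in it belongs to the standard vos:// identifier syntax, not to a node name), and whether this server requires exactly this body is an assumption to check against its responses.

```python
import xml.etree.ElementTree as ET

VOS_NS = "http://www.ivoa.net/xml/VOSpace/v2.0"
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"
ET.register_namespace("vos", VOS_NS)
ET.register_namespace("xsi", XSI_NS)


def node_document(node_uri: str, container: bool) -> str:
    """Build a minimal <vos:node> body: a ContainerNode (folder) or DataNode (file)."""
    node_type = "vos:ContainerNode" if container else "vos:DataNode"
    root = ET.Element(f"{{{VOS_NS}}}node", {"uri": node_uri, f"{{{XSI_NS}}}type": node_type})
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical authority; '!' here is part of the vos:// syntax, not a node name
    print(node_document("vos://example.org!vospace/user/my_user/newdir", container=True))
```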

Notes

  • {path} and {job_id} are placeholders for a node path and a job identifier, respectively.
  • The server uses standard HTTP error codes: 400, 403, 404, 409, 500.
  • PUT supports uploads and node creation.
  • POST /vospace/ moves or copies nodes (responds with 303 See Other).
  • setProperties is disabled due to HDFS permissions.
  • File downloads return streamed content with GET.
  • Avoid using ! in paths for full compatibility with the CADC client.
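Operations that run as jobs (presumably including moves and copies, whose POST is answered with 303 See Other) can be followed through GET /vospace/{job_id}/phase. VOSpace jobs follow the IVOA UWS pattern, so the sketch below assumes the server reports standard UWS phase names such as QUEUED, EXECUTING, COMPLETED, ERROR and ABORTED. wait_for_job is a hypothetical helper; fetch_phase is a caller-supplied function that performs the HTTP request, which keeps the polling logic independent of any particular HTTP client.

```python
import time
from typing import Callable

# UWS phases after which a job no longer changes
TERMINAL_PHASES = {"COMPLETED", "ERROR", "ABORTED", "ARCHIVED"}


def wait_for_job(fetch_phase: Callable[[], str], interval: float = 2.0,
                 timeout: float = 300.0) -> str:
    """Poll a job's phase until it reaches a terminal UWS phase.

    fetch_phase should perform GET /vospace/{job_id}/phase and return the body.
    """
    deadline = time.monotonic() + timeout
    while True:
        phase = fetch_phase().strip().upper()
        if phase in TERMINAL_PHASES:
            return phase
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {phase} after {timeout} s")
        time.sleep(interval)


if __name__ == "__main__":
    # Simulated replies; a real fetch_phase would query the phase endpoint
    replies = iter(["QUEUED", "EXECUTING", "COMPLETED"])
    print(wait_for_job(lambda: next(replies), interval=0))
```

If the final phase is ERROR, GET /vospace/{job_id}/error returns the details.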