Hadoop Distributed File System (HDFS)

From Public PIC Wiki
Revision as of 08:41, 19 May 2025 by Sbogaart

Introduction

PIC offers a WebDAV service over WebHDFS for file transfer and management. It is intended for users who need to upload, download, and manage large files on HDFS through a familiar file management interface. WebDAV lets a client present HDFS much like a local file system, so you can work with it from tools such as Finder, File Explorer, and rclone, among others.


Some WebDAV-compatible clients, such as rclone and Cyberduck, run transfers in multiple threads. This makes the WebDAV service well suited to large-scale data uploads and downloads.


The service also provides a simple and effective way to interface with HDFS, especially for users who prefer a file-system-like experience for managing their data, rather than relying on more technical methods like command-line tools.


How to connect to the service

To connect to the WebDAV service, follow the steps below:

  1. Use a WebDAV-compatible client:
    • Finder (macOS)
    • File Explorer (Windows)
    • A file manager with WebDAV support (Linux)
    • rclone (CLI)
    • Cyberduck
    • CrossFTP
    • curl
  2. Mount the WebDAV endpoint:
    The WebDAV server is accessible at the following URL: https://webdav-hdfs.pic.es/
  3. Authenticate with your PIC credentials:
    When prompted, enter your PIC user credentials to authenticate your session. This ensures that only authorized users have access to HDFS.
  4. Browse, upload, and download files:
    Once connected, you can manage your files as you would on a local file system. You can drag and drop files, create directories, and manage large datasets directly on HDFS.
    • Tip: Using a multithreaded client like rclone or Cyberduck will help improve upload and download speeds for large files.
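
For scripted access, the connection steps above can be sketched in Python with only the standard library. This is a minimal sketch, not an official PIC client: it assumes the server accepts HTTP Basic authentication with your PIC credentials, which this page does not state, and the function names are hypothetical. A directory listing in WebDAV is a PROPFIND request with a Depth header.

```python
import base64
import urllib.request

WEBDAV_URL = "https://webdav-hdfs.pic.es/"  # endpoint from step 2


def build_propfind_request(url, user, password, depth="1"):
    """Build (but do not send) a WebDAV PROPFIND request for a directory."""
    # Assumption: the server accepts HTTP Basic authentication.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url, method="PROPFIND")
    request.add_header("Authorization", f"Basic {token}")
    # Depth "1" asks for the directory itself and its direct children.
    request.add_header("Depth", depth)
    return request


def list_directory(url, user, password):
    """Send the PROPFIND request and return the raw XML response body."""
    request = build_propfind_request(url, user, password)
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8")
```

Calling list_directory(WEBDAV_URL, user, password) would return the server's XML listing, which still has to be parsed. For everyday use, a dedicated client such as rclone or Cyberduck is more convenient.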

Usage Guidelines

  • File Uploads:
    • The WebDAV service allows you to upload files of any size to HDFS.
    • For large datasets, use a multithreaded client to speed up the upload.
  • File Downloads:
    • You can also download files from HDFS to your local machine.
    • Multithreaded clients can also speed up downloads of large files.
  • File Management:
    • Files and directories can be created, deleted, or renamed directly within your WebDAV-compatible client.
    • You can also move files around within your HDFS storage, making file management easier.
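
The operations listed above correspond to standard WebDAV (RFC 4918) and HTTP methods, which is what a WebDAV client sends on your behalf. The sketch below only builds the requests, so the mapping is easy to see. The helper names and the example paths are hypothetical, and authentication headers are left out for brevity.

```python
import urllib.request
from urllib.parse import quote

BASE_URL = "https://webdav-hdfs.pic.es"  # endpoint given on this page


def remote_url(path):
    """Join the endpoint and a remote path, percent-encoding unsafe characters."""
    return BASE_URL + "/" + quote(path.lstrip("/"))


def upload_request(path, data):
    """PUT uploads a file (or overwrites an existing one)."""
    return urllib.request.Request(remote_url(path), data=data, method="PUT")


def download_request(path):
    """GET downloads a file."""
    return urllib.request.Request(remote_url(path), method="GET")


def mkdir_request(path):
    """MKCOL creates a directory (a "collection" in WebDAV terms)."""
    return urllib.request.Request(remote_url(path), method="MKCOL")


def delete_request(path):
    """DELETE removes a file or directory."""
    return urllib.request.Request(remote_url(path), method="DELETE")


def move_request(source, destination):
    """MOVE renames or relocates; the target goes in the Destination header."""
    request = urllib.request.Request(remote_url(source), method="MOVE")
    request.add_header("Destination", remote_url(destination))
    return request
```

Each request would be sent with urllib.request.urlopen after adding whatever authentication the server expects.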

Best Practices

Optimize large file transfers:

To improve performance, use tools like rclone or Cyberduck for multithreaded file transfers. This helps manage the upload and download of large files or large quantities of files more efficiently.
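
To illustrate what a multithreaded transfer means for many files, here is a minimal Python sketch that runs several transfers at once on a thread pool. It shows the general idea only and is not how rclone or Cyberduck are implemented. The transfer_one callable is a hypothetical placeholder for whatever uploads or downloads a single file.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def transfer_all(items, transfer_one, max_workers=4):
    """Run transfer_one(item) for every item, several at a time.

    Returns (succeeded, failed): a list of items that completed and a
    dict mapping each failed item to the exception it raised.
    """
    succeeded, failed = [], {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit everything up front; the pool runs max_workers at once.
        futures = {pool.submit(transfer_one, item): item for item in items}
        for future in as_completed(futures):
            item = futures[future]
            try:
                future.result()  # re-raises any exception from the worker
                succeeded.append(item)
            except Exception as error:
                failed[item] = error
    return succeeded, failed
```

Collecting failures instead of stopping at the first one lets a large batch finish, after which only the failed items need to be retried.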

Troubleshooting

Cannot Connect to the WebDAV Service:

If you're unable to connect to the WebDAV server, ensure that you're using the correct URL and that your PIC credentials are entered properly. Also, verify that your WebDAV-compatible client is configured correctly.
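
When a graphical client gives no useful error message, the HTTP status returned by the server can help narrow down the cause. The sketch below sends an OPTIONS request and translates the status into a hint. The interpretations follow general HTTP semantics and are assumptions about how this particular server behaves. The function names are hypothetical.

```python
import urllib.error
import urllib.request


def diagnose_status(status):
    """Map an HTTP status code to a likely cause (general HTTP semantics)."""
    if status in (200, 204, 207):
        return "ok: the server answered normally"
    if status == 401:
        return "credentials missing or rejected: check your PIC username and password"
    if status == 403:
        return "authenticated, but access to this path is not permitted"
    if status == 404:
        return "path not found: check the URL"
    if 500 <= status < 600:
        return "server-side error: retry later or contact support"
    return f"unexpected status {status}"


def probe(url, timeout=10):
    """Send an OPTIONS request and return a diagnostic string."""
    request = urllib.request.Request(url, method="OPTIONS")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return diagnose_status(response.status)
    except urllib.error.HTTPError as error:
        # The server answered, but with an error status.
        return diagnose_status(error.code)
    except urllib.error.URLError as error:
        # No HTTP answer at all: DNS, network, or TLS problem.
        return f"could not reach the server: {error.reason}"
```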

Slow Uploads or Downloads:

If file transfers are slow, check whether your client supports multithreaded transfers. If it does not, consider switching to a tool such as rclone or Cyberduck, which can use multiple threads for faster transfers.

File Uploads Fail:

If file uploads fail, try splitting large files into smaller chunks or using an alternative tool with better error handling and retry capabilities, such as rclone.
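
For scripted uploads, retry behaviour can be approximated with a small wrapper. The following is a minimal sketch of retrying with exponential backoff. The operation callable is a hypothetical placeholder for a single upload attempt, and the number of attempts and the delays are arbitrary example values.

```python
import time


def with_retries(operation, attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call operation() until it succeeds or the attempts run out.

    Waits base_delay, then 2x, 4x, ... between attempts. Only OSError is
    retried; urllib.error.URLError and socket errors are subclasses of it.
    """
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except OSError:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** (attempt - 1))
```

The sleep parameter exists only so the waiting can be replaced in tests; in normal use the default is fine.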

Security and Data Management

PIC Credentials:

Always ensure that your PIC credentials are secure. Do not share your credentials with others or store them in insecure locations.
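
In scripts, one way to follow this advice is to avoid writing the password into the script or a configuration file and to prompt for it at run time instead. The sketch below is one possible approach: the environment variable name PIC_USER is a hypothetical choice, not something defined by PIC.

```python
import getpass
import os


def load_credentials(env=None, ask_password=getpass.getpass):
    """Return (user, password) without storing the password anywhere.

    The username comes from the PIC_USER environment variable (a
    hypothetical name) or an interactive prompt; the password is always
    requested interactively and is not echoed to the terminal.
    """
    env = os.environ if env is None else env
    user = env.get("PIC_USER") or input("PIC username: ")
    password = ask_password("PIC password: ")
    return user, password
```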