JupyterHub

 
= Introduction =

PIC offers a service for running Jupyter notebooks on CPU or GPU resources. The service is primarily intended for code development and prototyping rather than large-scale data processing. The usage is similar to running notebooks on your personal computer, with the advantage that you develop and test your code on different hardware configurations, in the same environment in which it would later run at scale.
  
Since the service is strictly intended for development and small-scale testing, a shutdown policy for the sessions is in place:

# The maximum duration for a session is 48 hours.
# After an idle period of 2 hours, the session will be closed.

In practice, this means that the test data volume you work with during a session should be small enough to be processed in less than 48 hours.
  
== How to connect to the service ==

Go to [https://jupyter.pic.es jupyter.pic.es] to see the login screen.

[[File:login.png|700px|Login screen]]
  
 
Sign in with your PIC user credentials. You will then see the following screen.

[[File:ScreenshotJupyterSpawn.png|700px|current]]
  
Here you can choose the hardware configuration for your Jupyter session, as well as the experiment (project) you will be working on during the session. After choosing a configuration and pressing start, the next screen shows the progress of the initialisation process. Keep in mind that a job containing your Jupyter session is actually sent to the HTCondor queuing system and waits for available resources before being started. This usually takes less than a minute, but can take a few minutes depending on resource usage.

[[File:screen02.png|900px]]
  
 
In the next screen you can choose the tool that you want to use for your work: a Python notebook, a Python console or a plain bash terminal.

For the Python notebook and console you have two default options:

* the ipykernel version of Python 3
* the XPython version of Python 3.9, which allows you to use the integrated debugging module.

You will also see an icon with a "D" (desktop), which starts a VNC session that allows the use of programs with graphical user interfaces. Recently, an icon for Visual Studio, an integrated development environment, has been added as well.

[[File:ScreenshotJupyterlab20231103.png|700px]]
  
 
Your Python environments should appear under the Notebook and Console headers. In a later section we will show you how to create a new environment and remove an existing one.
  
== Terminate your session and logout ==

It is important that you terminate your session before you log out. To do so, go to the top page menu "'''File''' -> '''Hub Control Panel'''" and you will see the following screen.
  
 
[[File:screen04.png|600px]]
  
Here, click on the '''Stop My Server''' button. After that you can log out by clicking the '''Logout''' button in the upper right corner.

= Python virtual environments =

This section covers the use of Python virtual environments with Jupyter.

== Initialize conda (we highly recommend the use of miniforge/mambaforge) ==
  
Before using conda/mamba in your bash session, you have to initialize it.

* For access to an existing conda/mamba installation, please get in contact with your project liaison at PIC, who will give you the actual value for the '''/path/to/mambaforge''' placeholder.
* If you want to use your own conda/mamba installation, we recommend the minimal '''miniforge''' distribution (instructions [https://github.com/conda-forge/miniforge here]) or [https://github.com/mamba-org/mamba mamba/micromamba].

Log onto Jupyter and start a session. On the homepage of your Jupyter session, click on the terminal button on the session dashboard on the right to open a bash terminal. If no specific version is needed you can use the path shown in the example.
 
  
First, let's initialize conda for our bash sessions:

<pre>
[neissner@td110 ~]$ /data/astro/software/alma9/conda/miniforge-24.1.2/bin/mamba init
</pre>
  
This changes the .bashrc file in your home directory so that the base environment is activated on login. To avoid the base environment being activated every time you log on to a node, run:

<pre>
[neissner@td110 ~]$ conda config --set auto_activate_base false
</pre>
  
Then, in order to activate the base environment, run:

<pre>
[neissner@td110 ~]$ eval "$(/data/astro/software/alma9/conda/miniforge-24.1.2/bin/conda shell.bash hook)"
</pre>
  
For now you can exit the terminal.

<pre>
[neissner@td110 ~]$ exit
</pre>
  
== Link an existing environment to Jupyter ==

You can find instructions on how to create your own environments, e.g. [[#Create_virtual_environments_with_venv_or_conda | here]].
  
 
Log into Jupyter and start a session. From the session dashboard, choose the bash terminal.
  
Inside the terminal, activate your environment.

For '''conda/mamba''' environments:

* if you created the environment without a prefix:

<pre>
[neissner@td110 ~]$ mamba activate environment
(...) [neissner@td110 ~]$
</pre>

The parentheses (...) in front of your bash prompt show the name of your environment.

* if you created the environment with a prefix:

<pre>
[neissner@td110 ~]$ mamba activate /path/to/environment
(...) [neissner@td110 ~]$
</pre>

The parentheses (...) in front of your bash prompt show the absolute path of your environment.

For '''venv''' environments:

<pre>
[neissner@td110 ~]$ source /path/to/environment/bin/activate
(...) [neissner@td110 ~]$
</pre>
  
Link the environment to a Jupyter kernel. For both '''conda/mamba''' and '''venv''':

<pre>
(...) [neissner@td110 ~]$ python -m ipykernel install --user --name=whatever_kernel_name
Installed kernelspec whatever_kernel_name in /nfs/pic.es/user/n/neissner/.local/share/jupyter/kernels/whatever_kernel_name
</pre>

If you don't have the '''ipykernel''' module installed in your environment, you may receive an error message like the one below when trying to run the previous command.

<pre>No module named ipykernel</pre>

If this is the case, install it by running '''pip install ipykernel''' inside the activated environment.
  
Deactivate your environment.

For conda/mamba:

<pre>
(...) [neissner@td110 ~]$ mamba deactivate
</pre>

For venv:

<pre>
(...) [neissner@td110 ~]$ deactivate
</pre>
  
Now you can exit the terminal. After refreshing the Jupyter page, your whatever_kernel_name appears in the dashboard. In this example, '''test''' has been used as whatever_kernel_name.

[[File:screen05.png|700px]]
  
== Unlink an environment from Jupyter ==

Log onto Jupyter, start a session and from the session dashboard choose the bash terminal. To remove your environment/kernel from Jupyter run:
 
<pre>
[neissner@td110 ~]$ jupyter kernelspec uninstall whatever_kernel_name
Kernel specs to remove:
  whatever_kernel_name     /nfs/pic.es/user/n/neissner/.local/share/jupyter/kernels/whatever_kernel_name
Remove 1 kernel specs [y/N]: y
[RemoveKernelSpec] Removed /nfs/pic.es/user/n/neissner/.local/share/jupyter/kernels/whatever_kernel_name
</pre>
 
Keep in mind that, although not available in Jupyter anymore, the environment still exists. Whenever you need it, you can link it again.
== Create virtual environments with venv or conda ==

Before creating a new environment, please get in contact with your project liaison at PIC, as there may already be a suitable environment in place for your needs.

If none of the existing environments suits your needs, you can create a new one. First, create a directory in a suitable place to store the environment. For single-user environments, place them in your home directory under ~/env. For environments that will be shared with other project users, contact your project liaison and ask for a path in a shared storage volume that is visible to all of them.

Once you have the location (e.g. /path/to/env/folder), create the environment with the following commands.
For '''venv''' environments '''(recommended)''':

If your_env is to be installed at /path/to/env/your_env:

<pre>
[neissner@td110 ~]$ cd /path/to/env
[neissner@td110 ~]$ python3 -m venv your_env
</pre>

Now you can activate your environment and install additional modules:

<pre>
[neissner@td110 ~]$ cd /path/to/env
[neissner@td110 ~]$ source your_env/bin/activate
(...) [neissner@td110 ~]$ pip install additional_module1 additional_module2 ...
</pre>
For '''conda/mamba''' environments:

<pre>
[neissner@td110 ~]$ mamba create --prefix /path/to/env/your_env module1 module2 ...
</pre>

The list of modules (module1, module2, ...) is optional. For instance, for a python3 environment with scipy you would specify: ''python=3 scipy''

Now you can activate your environment and install additional modules:

<pre>
[neissner@td110 ~]$ mamba activate /path/to/env/your_env
(...) [neissner@td110 ~]$ mamba install additional_module1 additional_module2 ...
</pre>

You can use pip install inside a mamba environment; however, resolving dependencies might require installing additional packages manually.
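As an aside, the venv creation step can also be driven from Python's standard library, which is handy in setup scripts. A minimal sketch (it uses a throwaway temporary directory instead of a real /path/to/env, and disables pip bootstrapping to keep it fast):

```python
import pathlib
import tempfile
import venv

# Create a throwaway environment; in practice you would use e.g. /path/to/env/your_env.
target = pathlib.Path(tempfile.mkdtemp()) / "your_env"
venv.EnvBuilder(with_pip=False).create(target)  # with_pip=True also bootstraps pip

# The activation script that `source your_env/bin/activate` uses lives under bin/.
print((target / "bin" / "activate").exists())
```

This is equivalent to the `python3 -m venv your_env` command above, since the `venv` CLI is a thin wrapper around `EnvBuilder`.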
== conda / mamba configuration ==

The behaviour of conda/mamba can be configured through the "$HOME/.condarc" file, described [https://docs.conda.io/projects/conda/en/latest/configuration.html here]. Some interesting parameters:

* envs_dirs: the list of directories to search for named environments, e.g. the different locations where you created environments:

  envs_dirs:
    - /data/pic/scratch/torradeflot/envs
    - /data/astro/scratch/torradeflot/envs
    - /data/aai/scratch/torradeflot/envs

* pkgs_dirs: the folder where conda packages are stored
= Proper usage of X509 based proxies =

We recently found that using proxies within a Jupyter session can cause problems, because the environment changes certain standard locations such as '''/tmp'''.

For correct functioning, please create the proxy the following way (example for Virgo):

<pre>
[<user>@<hostname> ~]$ /bin/voms-proxy-init --voms virgo:/virgo/ligo --out ./x509up_u$(id -u)
[<user>@<hostname> ~]$ export X509_USER_PROXY=./x509up_u$(id -u)
[<user>@<hostname> ~]$ ls /cvmfs/ligo.osgstorage.org
ls: cannot access /cvmfs/ligo.osgstorage.org: Permission denied
</pre>

Here the proxy cannot be properly located. Therefore we have to put the complete path into the variable:

<pre>
[<user>@<hostname> ~]$ pwd
/nfs/pic.es/user/<letter>/<user>
[<user>@<hostname> ~]$ export X509_USER_PROXY=/nfs/pic.es/user/<letter>/<user>/x509up_u$(id -u)
[<user>@<hostname> ~]$ ls /cvmfs/ligo.osgstorage.org
frames  powerflux  pycbc  test_access
</pre>
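The underlying issue can be sketched in a few lines of Python (the uid in the file name is hypothetical): an absolute X509_USER_PROXY stays valid regardless of the current working directory, while a relative one stops resolving as soon as a tool runs somewhere else.

```python
import os

proxy = "./x509up_u1234"  # hypothetical relative proxy path, as written by voms-proxy-init

# A relative path only resolves from the directory where the proxy was created;
# exporting the absolute path makes it valid from any working directory.
os.environ["X509_USER_PROXY"] = os.path.abspath(proxy)

print(os.path.isabs(os.environ["X509_USER_PROXY"]))  # → True
```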
= Software of particular interest =

== SageMath ==

[https://www.sagemath.org/ SageMath] is particularly interesting for Cosmology because it allows symbolic calculations, e.g. deriving the equations of motion for the scale factor starting from a customised space-time metric.

=== Standard cosmology examples ===

* The Friedmann equations for the FLRW solution of the Einstein equations. You can find the corresponding notebook on any PIC terminal at '''/data/astro/software/notebooks/FLRW_cosmology.ipynb'''.

* The notebook at '''/data/astro/software/notebooks/FLRW_cosmology_solutions.ipynb''' uses known analytical solutions of the FLRW cosmology and produces this image for the evolution of the scale factor:

[[File:Screenshot_Sage05.png|300px]]

* The notebook at '''/data/astro/software/notebooks/Interior_Schwarzschild.ipynb''' shows the formalism for the interior Schwarzschild metric and displays the solutions for density and pressure of a static celestial object that is sufficiently larger than its Schwarzschild radius. The pressure for an object with constant density is shown in the image:

[[File:Screenshot_Sage06.png|300px]]
=== Enabling SageMath environment in Jupyter ===

If you have never initialized mamba, run:

<pre>
[<user>@<hostname> ~]$ /data/astro/software/centos7/conda/mambaforge_4.14.0/bin/mamba init
[<user>@<hostname> ~]$ conda config --set auto_activate_base false
</pre>

After that you can enable SageMath for use in a Jupyter notebook session:

<pre>
[<user>@<hostname> ~]$ mamba activate /data/astro/software/envs/sage
(/data/astro/software/envs/sage) [<user>@<hostname> ~]$ python -m ipykernel install --user --name=sage
....
(/data/astro/software/envs/sage) [<user>@<hostname> ~]$ mamba deactivate
</pre>
This creates a file in your home directory, '''~/.local/share/jupyter/kernels/sage/kernel.json''', which has to be modified to look like this:

<pre>
{
  "argv": [
    "/data/astro/software/envs/sage/bin/sage",
    "--python",
    "-m",
    "sage.repl.ipython_kernel",
    "-f",
    "{connection_file}"
  ],
  "display_name": "sage",
  "language": "sage",
  "metadata": {
    "debugger": true
  }
}
</pre>

Next time you go to your Jupyter dashboard you will find the sage environment listed there.
= Dask =

A notebook with instructions on how to run Dask at PIC can be found [https://gitlab.pic.es/services/code-samples/-/blob/main/computing/dask/dask_htcondor.ipynb here].
= Using a singularity image as a jupyter kernel =

Sometimes the software stack of a project is provided in the shape of a singularity image. It is then convenient to use this image as a kernel for the notebooks in jupyter.pic.es.

The singularity image to be used as a kernel needs to fulfil some requirements, which differ depending on the programming language.

== python jupyter kernel in a singularity image ==

The singularity image needs to have '''python''' and the '''ipykernel''' module installed.

* Create the folder that will host the kernel definition:

  mkdir -p $HOME/.local/share/jupyter/kernels/singularity

* Create the '''kernel.json''' file inside it with the following content:

<pre>
{
  "argv": [
    "singularity",
    "exec",
    "--cleanenv",
    "/path/to/the/singularity/image.sif",
    "python",
    "-m",
    "ipykernel",
    "-f",
    "{connection_file}"
  ],
  "language": "python",
  "display_name": "singularity-kernel"
}
</pre>

Refresh or start the jupyterlab interface and the singularity kernel should appear in the launcher tab.
= GPUs =

To identify the GPUs that are assigned to your job:

* Check the environment variable CUDA_VISIBLE_DEVICES. In a terminal, run "echo $CUDA_VISIBLE_DEVICES". The variable contains a list of comma-separated GPU ids, which also tells you how many GPUs are assigned to your job. If the variable does not exist, no GPUs are assigned to the job.

* List the GPUs with nvidia-smi: in a terminal, run "nvidia-smi -L" and look for the GPUs you have been assigned. Remember their indexes (integers from 0 to 7).

[[File:check_gpu_id_highlighted.png]]

* In the GPU dashboard the GPUs are identified by their index.

[[File:check_gpu_resources_highlighted.png]]
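From inside a notebook, the first check above can be done in Python directly. A small sketch (the example value "0,2" is hypothetical; in a real session the variable is set for you by the batch system):

```python
import os

def assigned_gpus(env=None):
    """Parse CUDA_VISIBLE_DEVICES into a list of GPU id strings ([] if unset)."""
    env = os.environ if env is None else env
    value = env.get("CUDA_VISIBLE_DEVICES", "")
    return [v for v in value.split(",") if v]

# Hypothetical job with two GPUs assigned:
print(assigned_gpus({"CUDA_VISIBLE_DEVICES": "0,2"}))  # → ['0', '2']
print(assigned_gpus({}))                               # → [] (no GPUs assigned)
```

Calling `assigned_gpus()` with no argument reads the real environment of your session.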
= Jupyterlab user guide =

You can find the official documentation of the currently installed version of jupyterlab (3.6) [https://jupyterlab.readthedocs.io/en/4.2.x/ here], where you will find instructions on how to:

* [https://jupyterlab.readthedocs.io/en/4.2.x/user/commands.html Access the command palette]
* [https://jupyterlab.readthedocs.io/en/4.2.x/user/toc.html Build a Table Of Contents]
* [https://jupyterlab.readthedocs.io/en/4.2.x/user/debugger.html Debug your code]
A set of non-official jupyterlab extensions is installed to provide additional functionality.

== jupytext ==

Pair your notebooks with text files to enhance version tracking.

https://jupytext.readthedocs.io
=== Example ===

Suppose you had a notebook (.ipynb file) containing only the cell below, tracked in a git repository:

<pre>
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
plt.imshow(np.random.random([10, 10]))
</pre>

Different executions of the cell produce different images, and the images are embedded in a pseudo-binary format inside the notebook file. Doing a '''git diff''' of the .ipynb file would therefore produce a huge output (because the image changed), even if there wasn't any change in the code. It is thus convenient to sync the notebook with a text file (e.g. a .py script) using the jupytext extension and track that file with git. The outputs, including images, as well as some additional metadata, won't be added to the synced text file, so for different executions of the same notebook the diff will always be empty.
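The effect can be sketched with plain Python (no jupytext required; the notebook JSON below is a simplified, hypothetical fragment with truncated image data): the paired text file keeps only the cell sources, so re-running cells does not change it.

```python
# A simplified notebook: one code cell whose output embeds (truncated) image data.
nb = {
    "cells": [{
        "cell_type": "code",
        "source": "plt.imshow(np.random.random([10, 10]))",
        "outputs": [{"data": {"image/png": "iVBORw0KGgoAAAANSUhEUg..."}}],
    }]
}

def sources_only(nb):
    """Return just the code of each cell, as a paired .py file would store it."""
    return [c["source"] for c in nb["cells"] if c["cell_type"] == "code"]

# The outputs (and thus the changing image bytes) never reach the paired file.
print(sources_only(nb))
```

However the `outputs` change between runs, `sources_only` (and hence the paired file) stays identical.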
== git ==

Sidebar GUI for git repo management.

https://github.com/jupyterlab/jupyterlab-git

== variable inspector ==

Variable inspection à la Matlab.

https://github.com/jupyterlab-contrib/jupyterlab-variableInspector

...
= Code samples =

A repository with sample code can be found here: https://gitlab.pic.es/services/code-samples/
= Troubleshooting =

== Logs ==

The log files for the jupyterlab server are stored in "~/.jupyter". They are created once the jupyterlab server job has finished.
== Clean workspaces ==

Jupyterlab stores the workspace status in the "~/.jupyter/lab/workspaces" folder. If you want to start with a fresh (empty) workspace, delete all the content of this folder before launching the notebook:

    cd ~/.jupyter/lab/workspaces
    rm *
== 504 Gateway timeout ==

The notebook job is running in HTCondor, but the user cannot access the notebook server and ultimately receives a 504 error.

This usually means there was an error starting the jupyterlab server. First, shut down the notebook server and check the logs to identify the problem. If you don't see the source of the error, try cleaning the workspaces and launching a notebook again.
Latest revision as of 11:09, 3 December 2024

Introduction

PIC offers a service for running Jupyter notebooks on CPU or GPU resources. This service is primarily thought for code developing or prototyping rather than data processing. The usage is similar to running notebooks on your personal computer but offers the advantage of developing and testing your code on different hardware configurations, as well as facilitating the scalability of the code since it is being tested in the same environment in which it would run on a mass scale.

Since the service is strictly thought for development and small scale testing tasks, a shutdown policy for the sessions has been put in place:

  1. The maximum duration for a session is 48h.
  2. After an idle period of 2 hours, the session will be closed.

In practice that means that you should estimate the test data volume that you work with during a session to be able to be processed in less than 48 hours.

How to connect to the service

Got to jupyter.pic.es to see your login screen.

Login screen

Sign in with your PIC user credentials. This will prompt you to the following screen.

current

Here you can choose the hardware configuration for your Jupyter session. Also, you have to choose the experiment (project) you are working on during the Jupyter session. After choosing a configuration and pressing start the next screen will show you the progress of the initialisation process. Keep in mind that a job containing your Jupyter session is actually sent to the HTCondor queuing system and waiting for available resources before being started. This usually takes less than a minute but can take up to a few depending on our resource usage.

Screen02.png

In the next screen you can choose the tool that you want to use for your work: a Python notebook, a Python console or a plain bash terminal. For the Python environment (either notebook or environment) you have two default options:

  • the ipykernel version of Python 3
  • the XPython version of Python 3.9, this one allows you to use the integrated debugging module.

Further you see an icon with a "D" - desktop, this one starts a VNC session that allows the use of programs with graphical user interfaces.

Also, recently you can find the icon of Visual Studio, an integrated development environment.

ScreenshotJupyterlab20231103.png

Your python environments should appear under Notebook and Console headers. In a later section we will show you how to create a new environment and to remove an existing one.

Terminate your session and logout

It is important that you terminate your session before you log out. In order to do so, go to the top page menu "File -> Hub Control Panel" and you will see the following screen.

Screen04.png

Here, click on the Stop My Server button. After that you can log out by clicking the Logout button in the right upper corner.

Python virtual environments

This section covers the use of Python virtual environments with Jupyter.

Initialize conda (we highly recommend the use of miniforge/mambaforge)

Before using conda/mamba in your bash session, you have to initialize it.

  • For access to an available conda/mamba installation, please get in contact with your project liaison at PIC. He/she will give you the actual value for the /path/to/mambaforge placeholder.
  • If you want to use your own conda/mamba installation, we recommend you to install the minimal miniforge distribution, instructions here or mamba/micromamba

Log onto Jupyter and start a session. On the homepage of your Jupyter session, click on the terminal button on the session dashboard on the right to open a bash terminal. If no specific version is needed you can use the link provided in the example.

First, let's initialize conda for our bash sessions:

[neissner@td110 ~]$ /data/astro/software/alma9/conda/miniforge-24.1.2/bin/mamba init

This actually changes the .bashrc file in your home directory in order to activate the base environment on login. To avoid that the base environment is activated every time you log on to a node, run:

[neissner@td110 ~]$ conda config --set auto_activate_base false

Then, in order to activate the base environment you will have to run this command:

[neissner@td110 ~]$ eval "$(/data/astro/software/alma9/conda/miniforge-24.1.2/bin/conda shell.bash hook)"

For now you can exit the terminal.

[neissner@td110 ~]$ exit

Link an existing environment to Jupyter

You can find instructions on how to create your own environments, e.g. here.

Log into Jupyter, start a session. From the session dashboard choose the bash terminal.

Inside the terminal, activate your environment.

For conda/mamba environments:

  • if you created the environment without a prefix:
[neissner@td110 ~]$ mamba activate environment
(...) [neissner@td110 ~]$ 

The parenthesis (...) in front of your bash prompt show the name of your environment.

  • if you created the environment with a prefix:
[neissner@td110 ~]$ mamba activate /path/to/environment
(...) [neissner@td110 ~]$ 

The parenthesis (...) in front of your bash prompt show the absolute path of your environment.

For venv environments:

[neissner@td110 ~]$ source /path/to/environment/bin/activate
(...) [neissner@td110 ~]$ 

Link the environment to a Jupyter kernel. For both, conda/mamba and venv:

(...) [neissner@td110 ~]$ python -m  ipykernel install --user --name=whatever_kernel_name
Installed kernelspec whatever_kernel_name in 
                         /nfs/pic.es/user/n/neissner/.local/share/jupyter/kernels/whatever_kernel_name

If you don't have the ipykernel module installed in your environment you may receive an error message like the one below when trying to run the previous command.

No module named ipykernel

If this is the case, you need to install it by running: pip install ipykernel

Deactivate your environment.

For conda:

(...) [neissner@td110 ~]$ mamba deactivate

For venv:

(...) [neissner@td110 ~]$ deactivate

Now you can exit the terminal. After refreshing the Jupyter page your whatever_kernel_name appears in the dashboard. In this example test has been used for whatever_kernel_name

Screen05.png

Unlink an environment from Jupyter

Log onto Jupyter, start a session and from the session dashboard choose the bash terminal. To remove your environment/kernel from Jupyter run:

[neissner@td110 ~]$ jupyter kernelspec uninstall whatever_kernel_name
Kernel specs to remove:
  whatever_kernel_name     /nfs/pic.es/user/n/neissner/.local/share/jupyter/kernels/whatever_kernel_name
Remove 1 kernel specs [y/N]: y
[RemoveKernelSpec] Removed /nfs/pic.es/user/n/neissner/.local/share/jupyter/kernels/whatever_kernel_name

Keep in mind that, although not available in Jupyter anymore, the environment still exists. Whenever you need it, you can link it again.

Create virtual environments with venv or conda

Before creating a new environment, please get in contact with your project liaison at PIC as there may be already a suitable environment for your needs in place.

If none of the existing environments suits your needs, you can create a new environment. First, create a directory in a suitable place to store the environment. For single-user environments, place them in your home under ~/env. For environments that will be shared with other project users, contact your project liaison and ask him/her for a path in a shared storage volume that is visible to all of them.

Once you have the location (i.e. /path/to/env/folder), create the environment with the following commands:

For venv environments (recommended)

If your_env is installed at /path/to/env/your_env

[neissner@td110 ~]$ cd /path/to/env
[neissner@td110 ~]$ python3 -m venv your_env

Now you should be able to activate your environment and install additional modules

[neissner@td110 ~]$ cd /path/to/env
[neissner@td110 ~]$ source your_env/bin/activate
(...)[neissner@td110 ~]$ pip install additional_module1 additional_module2 ...

For conda/mamba environments

[neissner@td110 ~]$ mamba create --prefix /path/to/env/your_env

The list of modules (module1, module2, ...) is optional. For instance, for a python3 environment with scipy you would specify: python=3 scipy

Now you should be able to activate your environment and install additional modules

[neissner@td110 ~]$ mamba activate /path/to/env/your_env
(...)[neissner@td110 ~]$ mamba install additional_module1 additional_module2 ...

You can use pip install inside a mamba environment; however, resolving dependencies might require installing additional packages manually.

conda / mamba configuration

The behaviour of conda/mamba can be configured through the "$HOME/.condarc" file (see the conda configuration documentation). Some interesting parameters:

  • envs_dirs: the list of directories to search for named environments, e.g. the different locations where you created environments:

 envs_dirs:
   - /data/pic/scratch/torradeflot/envs
   - /data/astro/scratch/torradeflot/envs
   - /data/aai/scratch/torradeflot/envs

  • pkgs_dirs: the folder where conda packages are stored.
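Putting both options together, a minimal ~/.condarc might look like this (the paths are examples; adapt them to your own storage locations):

```yaml
envs_dirs:
  - /data/astro/scratch/<user>/envs
pkgs_dirs:
  - /data/astro/scratch/<user>/conda_pkgs
```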

Proper usage of X509 based proxies

We found recently that the usage of proxies within a Jupyter session might cause problems, because the session environment changes certain standard locations such as /tmp.

To make the proxy work correctly, please create it the following way (example for Virgo):

[<user>@<hostname> ~]$ /bin/voms-proxy-init --voms virgo:/virgo/ligo --out ./x509up_u$(id -u)
[<user>@<hostname> ~]$ export X509_USER_PROXY=./x509up_u$(id -u)
[<user>@<hostname> ~]$ ls /cvmfs/ligo.osgstorage.org
ls: cannot access /cvmfs/ligo.osgstorage.org: Permission denied

Here the proxy cannot be located properly because it was created with a relative path. Therefore, we have to put the complete (absolute) path into the variable:

[<user>@<hostname> ~]$ pwd
/nfs/pic.es/user/<letter>/<user>
[<user>@<hostname> ~]$ export X509_USER_PROXY=/nfs/pic.es/user/<letter>/<user>/x509up_u$(id -u)
[<user>@<hostname> ~]$ ls /cvmfs/ligo.osgstorage.org
frames  powerflux  pycbc  test_access
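The rule of thumb is therefore: always store the absolute path of the proxy in X509_USER_PROXY. A small sketch in Python (the helper name is ours, not part of any VOMS tooling):

```python
import os

def export_proxy_path(proxy_file):
    """Set X509_USER_PROXY to the absolute path of the proxy file.

    Relative paths break as soon as the working directory changes
    (e.g. inside a Jupyter session), so we always resolve them first.
    """
    abs_path = os.path.abspath(os.path.expanduser(proxy_file))
    os.environ["X509_USER_PROXY"] = abs_path
    return abs_path

# Example: resolve the proxy created by voms-proxy-init in the current directory
print(export_proxy_path(f"./x509up_u{os.getuid()}"))
```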

Software of particular interest

SageMath

SageMath is particularly interesting for Cosmology because it allows symbolic calculations, e.g. deriving the equations of motion for the scale factor starting from a customised space-time metric.
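As a flavour of what symbolic calculation looks like, here is a minimal sketch in plain SymPy (SageMath offers far more, e.g. tensor calculus on manifolds); it merely verifies that the matter-dominated scale factor a(t) ∝ t^(2/3) solves the first Friedmann equation H² ∝ a⁻³:

```python
import sympy as sp

t = sp.symbols("t", positive=True)
a = t**sp.Rational(2, 3)        # matter-dominated FLRW scale factor
H = sp.diff(a, t) / a           # Hubble rate H = a'/a

# First Friedmann equation for matter domination: H^2 * a^3 = const
const = sp.simplify(H**2 * a**3)
print(const)                    # -> 4/9, a constant: the ansatz solves the equation
```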

Standard cosmology examples

  • The Friedmann equations for the FLRW solution of the Einstein equations.

You can find the corresponding Notebook in any PIC terminal at /data/astro/software/notebooks/FLRW_cosmology.ipynb

  • The notebook you can find at /data/astro/software/notebooks/FLRW_cosmology_solutions.ipynb uses known analytical solutions of the FLRW cosmology and produces this image for the evolution of the scale factor:

Screenshot Sage05.png

  • The notebook you can find at /data/astro/software/notebooks/Interior_Schwarzschild.ipynb shows the formalism for the interior Schwarzschild metric and displays the solutions for density and pressure of a static celestial object that is sufficiently large compared to its Schwarzschild radius. The pressure for an object with constant density is shown in the image:

Screenshot Sage06.png

Enabling SageMath environment in Jupyter

If you have never initialized mamba, run:

[<user>@<hostname> ~]$ /data/astro/software/centos7/conda/mambaforge_4.14.0/bin/mamba init
[<user>@<hostname> ~]$ conda config --set auto_activate_base false

After that you can enable SageMath for its use in a Jupyter notebook session:

[<user>@<hostname> ~]$ mamba activate /data/astro/software/envs/sage
(/data/astro/software/envs/sage) [<user>@<hostname> ~]$ python -m ipykernel install --user --name=sage
....
(/data/astro/software/envs/sage) [<user>@<hostname> ~]$ mamba deactivate

This creates a file in your home, ~/.local/share/jupyter/kernels/sage/kernel.json, which has to be modified to look like this:

{
 "argv": [
  "/data/astro/software/envs/sage/bin/sage",
  "--python",
  "-m",
  "sage.repl.ipython_kernel",
  "-f",
  "{connection_file}"
 ],
 "display_name": "sage",
 "language": "sage",
 "metadata": {
  "debugger": true
 }
}
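Instead of editing the file by hand, the same change can be applied with a short Python snippet (a sketch using only the standard library; the sage path matches the example above):

```python
import json
from pathlib import Path

def patch_sage_kernel(kernel_json, sage_bin="/data/astro/software/envs/sage/bin/sage"):
    """Rewrite a kernel.json so the kernel is launched through the sage binary."""
    path = Path(kernel_json)
    spec = json.loads(path.read_text())
    spec["argv"] = [sage_bin, "--python", "-m", "sage.repl.ipython_kernel",
                    "-f", "{connection_file}"]
    spec["display_name"] = "sage"
    spec["language"] = "sage"
    path.write_text(json.dumps(spec, indent=1))
    return spec

# patch_sage_kernel(Path.home() / ".local/share/jupyter/kernels/sage/kernel.json")
```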

Next time you go to your Jupyter dashboard you will find the sage environment listed there.


Dask

A notebook with instructions on how to run Dask at PIC can be found here

Using a singularity image as a jupyter kernel

Sometimes a project's software stack is provided in the form of a singularity image; it is then convenient to use this image as a kernel for the notebooks in jupyter.pic.es.

The singularity image to be used as a kernel needs to fulfill some requirements, which differ depending on the programming language.

python jupyter kernel in a singularity image

The singularity image needs to have python and the ipykernel module installed.

  • Create the folder that will host the kernel definition
 mkdir -p $HOME/.local/share/jupyter/kernels/singularity
  • Create the kernel.json file inside it with the following content:


{
  "argv": [
    "singularity",
    "exec",
    "--cleanenv",
    "/path/to/the/singularity/image.sif",
    "python",
    "-m",
   "ipykernel",
    "-f",
    "{connection_file}"
  ],
  "language": "python",
  "display_name": "singularity-kernel"
}
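The two steps above can also be scripted; a standard-library sketch (the image path and kernel name are placeholders):

```python
import json
from pathlib import Path

def install_singularity_kernel(image_path, name="singularity", kernels_root=None):
    """Create a Jupyter kernel spec that runs ipykernel inside a Singularity image."""
    root = Path(kernels_root or Path.home() / ".local/share/jupyter/kernels")
    kernel_dir = root / name
    kernel_dir.mkdir(parents=True, exist_ok=True)      # equivalent of mkdir -p
    spec = {
        "argv": ["singularity", "exec", "--cleanenv", str(image_path),
                 "python", "-m", "ipykernel", "-f", "{connection_file}"],
        "language": "python",
        "display_name": f"{name}-kernel",
    }
    (kernel_dir / "kernel.json").write_text(json.dumps(spec, indent=2))
    return kernel_dir / "kernel.json"

# install_singularity_kernel("/path/to/the/singularity/image.sif")
```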


Refresh or start the jupyterlab interface and the singularity kernel should appear in the launcher tab.

GPUs

To identify the GPUs that are assigned to your job:

  • Check the environment variable CUDA_VISIBLE_DEVICES: in a terminal, run "echo $CUDA_VISIBLE_DEVICES". The variable contains a comma-separated list of GPU ids, which also tells you how many GPUs are assigned to your job. If the variable does not exist, no GPUs are assigned to the job.
  • List the GPUs with nvidia-smi: in a terminal, run "nvidia-smi -L" and look for the GPUs you have been assigned. Remember their indexes (integers from 0 to 7).

Check gpu id highlighted.png

  • In the GPU dashboard, the GPUs are identified by their index.

Check gpu resources highlighted.png
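The CUDA_VISIBLE_DEVICES check can also be done from inside a notebook; a minimal sketch (assuming integer GPU ids, as on this service):

```python
import os

def assigned_gpus():
    """Return the list of GPU ids assigned to the job, empty if none."""
    value = os.environ.get("CUDA_VISIBLE_DEVICES", "").strip()
    if not value:
        return []          # variable unset or empty: no GPUs assigned
    return [int(gpu_id) for gpu_id in value.split(",")]

print(assigned_gpus())
```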

Jupyterlab user guide

You can find the official documentation of the currently installed version of jupyterlab (3.6) at https://jupyterlab.readthedocs.io; there you will find instructions on how to use its main features.

A set of non-official jupyterlab extensions is installed to provide additional functionality:

jupytext

Pair your notebooks with text files to enhance version tracking. https://jupytext.readthedocs.io

Example

If you had a notebook (.ipynb file) containing only the cell below, tracked in a git repository

%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
plt.imshow(np.random.random([10, 10]))

Different executions of the cell would produce different images, and the images are embedded in a pseudo-binary format inside the notebook file. In this case, doing a git diff of the .ipynb file would produce a huge output (because the image changed), even if there wasn't any change in the code. It is thus convenient to sync the notebook with a text file (e.g. a .py script) using the jupytext extension and track this one with git. The outputs, including images, as well as some additional metadata, won't be added to the synced text file. So in the case of different executions of the same notebook, the diff will always be empty.
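To see why the diff blows up, note that an .ipynb file is JSON with the output image embedded as a base64 string; each run embeds a different payload even when the code is unchanged. A toy sketch of that cell structure (fake bytes stand in for the real PNG data):

```python
import base64
import json

def cell_with_image(code, png_bytes):
    """Build a minimal notebook-style code cell with a base64-embedded PNG output."""
    return {
        "cell_type": "code",
        "source": [code],
        "outputs": [{
            "output_type": "display_data",
            "data": {"image/png": base64.b64encode(png_bytes).decode("ascii")},
        }],
    }

code = "plt.imshow(np.random.random([10, 10]))"
run1 = json.dumps(cell_with_image(code, b"png-from-run-1"))
run2 = json.dumps(cell_with_image(code, b"png-from-run-2"))
print(run1 != run2)   # True: same code, different file content -> noisy git diff
```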

git

Sidebar GUI to git repo management https://github.com/jupyterlab/jupyterlab-git

variable inspector

Variable inspection à la Matlab https://github.com/jupyterlab-contrib/jupyterlab-variableInspector

...

Code samples

A repository with sample code can be found here: https://gitlab.pic.es/services/code-samples/

Troubleshooting

Logs

The log files for the jupyterlab server are stored in "~/.jupyter". Note that the log files are only created once the jupyterlab server job has finished.


Clean workspaces

Jupyterlab stores the workspace status in the "~/.jupyter/lab/workspaces" folder. If you want to start with a fresh (empty) workspace, delete all the content of this folder before launching the notebook.

   cd ~/.jupyter/lab/workspaces
   rm *


504 Gateway timeout

The notebook job is running in HTCondor but the user cannot access the notebook server; ultimately a 504 error is received.

This usually means there was an error when starting the jupyterlab server. First of all, shut down the notebook server and check the logs to identify the problem. If you don't see the source of the error, try cleaning the workspaces and launching a notebook again.