Dask wiki

From Public PIC Wiki
Jump to navigation Jump to search

Dask + HTCondor

Creating a cluster

using dask-labextension

To create a Dask cluster from the extension. Go to the dask-labextension tab and click "+ NEW"

![kk](static/dask_labextension_1.png)

You can see the cluster information and have some shortcuts

  • "<>" insert a cell with the necessary lines to create a Dask client. Some environment variables have been set in order to be able to use a "regular" Dask Client
  • "SCALE" increase/decrease the number of workers
  • "SHUTDOWN" shutdown the cluster

![kk](static/dask_add_cluster_cell.png)


   from dask.distributed import Client
   client = Client("tls://192.168.100.56:45722")
   client
   /data/astro/scratch2/torradeflot/envs/dask/lib/python3.11/site-packages/distributed/client.py:1388: 
   VersionMismatchWarning: Mismatched versions found
   
   +---------+----------------+----------------+----------------+
   | Package | Client         | Scheduler      | Workers        |
   +---------+----------------+----------------+----------------+
   | lz4     | 4.3.3          | 4.3.2          | 4.3.2          |
   | msgpack | 1.0.7          | 1.0.5          | 1.0.5          |
   | numpy   | 1.26.3         | 1.24.3         | 1.24.3         |
   | pandas  | 2.2.0          | 2.0.2          | 2.0.2          |
   | python  | 3.11.7.final.0 | 3.11.5.final.0 | 3.11.5.final.0 |
   | toolz   | 0.12.1         | 0.12.0         | 0.12.0         |
   +---------+----------------+----------------+----------------+
     warnings.warn(version_module.VersionMismatchWarning(msg[0]["warning"]))




```python client.cluster ```

The encryption of the traffic between the different Dask components is enforced through environment variables


```python [f'{k}={v}' for k,v in os.environ.items() if 'DASK' in k] ```



   ['DASK_DISTRIBUTED__COMM__TLS__CLIENT__KEY=/nfs/pic.es/user/t/torradeflot/.config/dask/security/key.pem',
    'DASK_DISTRIBUTED__COMM__TLS__SCHEDULER__CERT=/nfs/pic.es/user/t/torradeflot/.config/dask/security/cert.pem',
    'DASK_DISTRIBUTED__COMM__TLS__WORKER__CERT=/nfs/pic.es/user/t/torradeflot/.config/dask/security/cert.pem',
    'DASK_DISTRIBUTED__COMM__TLS__CA_FILE=/nfs/pic.es/user/t/torradeflot/.config/dask/security/ca_file.pem',
    'DASK_DISTRIBUTED__COMM__TLS__WORKER__KEY=/nfs/pic.es/user/t/torradeflot/.config/dask/security/key.pem',
    'DASK_DISTRIBUTED__COMM__TLS__SCHEDULER__KEY=/nfs/pic.es/user/t/torradeflot/.config/dask/security/key.pem',
    'DASK_DISTRIBUTED__COMM__REQUIRE_ENCRYPTION=true',
    'DASK_DISTRIBUTED__COMM__TLS__CLIENT__CERT=/nfs/pic.es/user/t/torradeflot/.config/dask/security/cert.pem']


![kk](static/dask_labextension_2.png)

    1. From python
      1. Using the default environment

This could be a notebook or a script submitted to HTCondor


```python from pic_jupyterhub.dask_condor import SecureHTCondor from dask.distributed import Client ```


```python cluster = SecureHTCondor() #scheduler_options={'dashboard_address':":8788"}) ```

   Certificate files already exist


The cluster is initially created without workers. We need to scale it.


```python cluster.scale(2) ```


```python client = Client(cluster) ```


```python cluster.close() ```

      1. Using a custom environment

The environment needs to have:

  • `dask-jobqueue` to be able to start Dask clusters with HTCondor
  • `ipykernel` to use it as a notebook kernel
  • `numpy` and `pandas` to be able to create Dask arrays and dataframes
  • `bokeh` for the Dask dashboard

`mamba create (-n {env name} | -p {env Path}) dask-jobqueue ipykernel numpy pandas bokeh`


```python from dask_jobqueue import HTCondorCluster from dask.distributed import Client ```


```python cluster = HTCondorCluster(cores=1, memory='2GB', disk='10 GB',

                        job_extra_directives={'getenv': 'True'}) # needed to propagate the security

```


```python cluster ```



```python cluster.scale(2) ```


```python c = Client(cluster) ```

Security is inherited from environment variables


```python cluster.security ```



Security

require_encryption Local (/nfs/pic.es/user/t/torradeflot/jupyterhub_conf/examples/true)
tls_ca_file Local (/nfs/pic.es/user/t/torradeflot/.config/dask/security/ca_file.pem)
tls_client_cert Local (/nfs/pic.es/user/t/torradeflot/.config/dask/security/cert.pem)
tls_client_key Local (/nfs/pic.es/user/t/torradeflot/.config/dask/security/key.pem)
tls_min_version 771
tls_scheduler_cert Local (/nfs/pic.es/user/t/torradeflot/.config/dask/security/cert.pem)
tls_scheduler_key Local (/nfs/pic.es/user/t/torradeflot/.config/dask/security/key.pem)
tls_worker_cert Local (/nfs/pic.es/user/t/torradeflot/.config/dask/security/cert.pem)
tls_worker_key Local (/nfs/pic.es/user/t/torradeflot/.config/dask/security/key.pem)


    1. Connect to existing cluster

Meaning a cluster that was launched from outside you jupyterlab instance, e.g. an independent HTCondor job or somebody else's cluster.

Check IP address of running job ``` nslookup $(condor_q $JOB_ID -af RemoteHost | cut -d "@" -f 2) ```


```python from dask.distributed import Client

client = Client("tls://192.168.100.5:42166") client ```

   /data/astro/scratch2/torradeflot/envs/dask/lib/python3.11/site-packages/distributed/client.py:1388: VersionMismatchWarning: Mismatched versions found
   
   +---------+----------------+----------------+----------------+
   | Package | Client         | Scheduler      | Workers        |
   +---------+----------------+----------------+----------------+
   | lz4     | 4.3.3          | 4.3.2          | 4.3.2          |
   | msgpack | 1.0.7          | 1.0.5          | 1.0.5          |
   | numpy   | 1.26.3         | 1.24.3         | 1.24.3         |
   | pandas  | 2.2.0          | 2.0.2          | 2.0.2          |
   | python  | 3.11.7.final.0 | 3.11.5.final.0 | 3.11.5.final.0 |
   | toolz   | 0.12.1         | 0.12.0         | 0.12.0         |
   +---------+----------------+----------------+----------------+
     warnings.warn(version_module.VersionMismatchWarning(msg[0]["warning"]))



Client

Client-589eb004-c406-11ee-8584-001e67f120a8

Connection method: Direct
                   Dashboard:  <a href="http://192.168.100.5:41069/status" target="_blank">http://192.168.100.5:41069/status</a>



           <details>
<summary style="margin-bottom: 20px;">

Scheduler Info

</summary>

Scheduler

Scheduler-24c09727-b6fc-4703-b25c-6d3bfbf750ee

                       Comm: tls://192.168.100.5:42166
                       Workers: 5
                       Dashboard: <a href="http://192.168.100.5:41069/status" target="_blank">http://192.168.100.5:41069/status</a>
                       Total threads: 5
                       Started: 42 minutes ago
                       Total memory: 9.30 GiB
   <details style="margin-left: 48px;">
       <summary style="margin-bottom: 20px;">

Workers

       </summary>


           <details>
               <summary>

Worker: SecureHTCondor-0

               </summary>
                           Comm:  tls://192.168.101.127:38421
                           Total threads:  1
                           Dashboard:  <a href="http://192.168.101.127:34923/status" target="_blank">http://192.168.101.127:34923/status</a>
                           Memory:  1.86 GiB
                           Nanny:  tls://192.168.101.127:38180
                           Local directory:  /tmp/dask-scratch-space/worker-vzw0w_hh
                           Tasks executing:  
                           Tasks in memory:  
                           Tasks ready:  
                           Tasks in flight: 
                           CPU usage: 4.0%
                           Last seen:  Just now
                           Memory usage:  125.28 MiB
                           Spilled bytes:  0 B
                           Read bytes:  604.32 kiB
                           Write bytes:  0.95 MiB
           </details>
           <details>
               <summary>

Worker: SecureHTCondor-1

               </summary>
                           Comm:  tls://192.168.102.27:46440
                           Total threads:  1
                           Dashboard:  <a href="http://192.168.102.27:35027/status" target="_blank">http://192.168.102.27:35027/status</a>
                           Memory:  1.86 GiB
                           Nanny:  tls://192.168.102.27:44026
                           Local directory:  /tmp/dask-scratch-space/worker-nj1hcr7h
                           Tasks executing:  
                           Tasks in memory:  
                           Tasks ready:  
                           Tasks in flight: 
                           CPU usage: 2.0%
                           Last seen:  Just now
                           Memory usage:  122.96 MiB
                           Spilled bytes:  0 B
                           Read bytes:  17.31 MiB
                           Write bytes:  182.89 kiB
           </details>
           <details>
               <summary>

Worker: SecureHTCondor-2

               </summary>
                           Comm:  tls://192.168.102.37:36278
                           Total threads:  1
                           Dashboard:  <a href="http://192.168.102.37:41285/status" target="_blank">http://192.168.102.37:41285/status</a>
                           Memory:  1.86 GiB
                           Nanny:  tls://192.168.102.37:36316
                           Local directory:  /tmp/dask-scratch-space/worker-l2021n4v
                           Tasks executing:  
                           Tasks in memory:  
                           Tasks ready:  
                           Tasks in flight: 
                           CPU usage: 2.0%
                           Last seen:  Just now
                           Memory usage:  121.28 MiB
                           Spilled bytes:  0 B
                           Read bytes:  79.03 kiB
                           Write bytes:  102.23 kiB
           </details>
           <details>
               <summary>

Worker: SecureHTCondor-3

               </summary>
                           Comm:  tls://192.168.100.12:45248
                           Total threads:  1
                           Dashboard:  <a href="http://192.168.100.12:43200/status" target="_blank">http://192.168.100.12:43200/status</a>
                           Memory:  1.86 GiB
                           Nanny:  tls://192.168.100.12:36383
                           Local directory:  /tmp/dask-scratch-space/worker-0v5jilfv
                           Tasks executing:  
                           Tasks in memory:  
                           Tasks ready:  
                           Tasks in flight: 
                           CPU usage: 2.0%
                           Last seen:  Just now
                           Memory usage:  125.12 MiB
                           Spilled bytes:  0 B
                           Read bytes:  3.26 MiB
                           Write bytes:  3.21 MiB
           </details>
           <details>
               <summary>

Worker: SecureHTCondor-4

               </summary>
                           Comm:  tls://192.168.100.14:42847
                           Total threads:  1
                           Dashboard:  <a href="http://192.168.100.14:36788/status" target="_blank">http://192.168.100.14:36788/status</a>
                           Memory:  1.86 GiB
                           Nanny:  tls://192.168.100.14:46489
                           Local directory:  /tmp/dask-scratch-space/worker-f3u75i_8
                           Tasks executing:  
                           Tasks in memory:  
                           Tasks ready:  
                           Tasks in flight: 
                           CPU usage: 2.0%
                           Last seen:  Just now
                           Memory usage:  121.30 MiB
                           Spilled bytes:  0 B
                           Read bytes:  20.58 MiB
                           Write bytes:  4.19 MiB
           </details>


   </details>
           </details>



   2024-02-05 11:21:41,076 - distributed.client - ERROR - Failed to reconnect to scheduler after 30.00 seconds, closing client


  1. Troubleshooting
    1. Compatibility issues

If you try to connect a notebook to a Dask cluster, and the notebook's environment is different from the one used to launch de cluster, you may encounter compatibility issues.

You will tipycally receive a "Mismatched versions found" warning like this:

``` /data/astro/scratch2/torradeflot/envs/dask/lib/python3.11/site-packages/distributed/client.py:1388: VersionMismatchWarning: Mismatched versions found

+---------+----------------+----------------+----------------+ | Package | Client | Scheduler | Workers | +---------+----------------+----------------+----------------+ | lz4 | 4.3.3 | 4.3.2 | 4.3.2 | | msgpack | 1.0.7 | 1.0.5 | 1.0.5 | | numpy | 1.26.3 | 1.24.3 | 1.24.3 | | pandas | 2.2.0 | 2.0.2 | 2.0.2 | | python | 3.11.7.final.0 | 3.11.5.final.0 | 3.11.5.final.0 | | toolz | 0.12.1 | 0.12.0 | 0.12.0 | +---------+----------------+----------------+----------------+

 warnings.warn(version_module.VersionMismatchWarning(msg[0]["warning"]))

```

Some mismatches might be blocking, it is recommended to match the major and minor versions. A mismatch in the patch version shouldn't be a problem.

    1. Problems with securitization

If you launch a Dask cluster from a notebook but you have never launched a cluster from the dask-labextension or using the `pic_jupyterhub` module, you may encounter a problem because encryption is enforced but the certificates do not exist.

If this is the case, you can launch a cluster using one of these options as shown above. This will generate the certificate files and the subsequent creation of a Dask cluster from a notebook should succeed.

  1. Examples
    1. Dask example 1

picked from https://docs.dask.org/en/stable/10-minutes-to-dask.html


```python import numpy as np import pandas as pd

import dask.dataframe as dd import dask.array as da import dask.bag as db ```

      1. DataFrame


```python index = pd.date_range("2021-09-01", periods=2400, freq="1H") df = pd.DataFrame({"a": np.arange(2400), "b": list("abcaddbe" * 300)}, index=index) ddf = dd.from_pandas(df, npartitions=10) ddf ```

   /tmp/ipykernel_291/1125038151.py:1: FutureWarning: 'H' is deprecated and will be removed in a future version, please use 'h' instead.
     index = pd.date_range("2021-09-01", periods=2400, freq="1H")



Dask DataFrame Structure:

<style scoped>

   .dataframe tbody tr th:only-of-type {
       vertical-align: middle;
   }
   .dataframe tbody tr th {
       vertical-align: top;
   }
   .dataframe thead th {
       text-align: right;
   }

</style>

<thead> </thead> <tbody> </tbody>
a b
npartitions=10
2021-09-01 00:00:00 int64 object
2021-09-11 00:00:00 ... ...
... ... ...
2021-11-30 00:00:00 ... ...
2021-12-09 23:00:00 ... ...
Dask Name: from_pandas, 1 graph layer



```python ddf.divisions ```



   (Timestamp('2021-09-01 00:00:00'),
    Timestamp('2021-09-11 00:00:00'),
    Timestamp('2021-09-21 00:00:00'),
    Timestamp('2021-10-01 00:00:00'),
    Timestamp('2021-10-11 00:00:00'),
    Timestamp('2021-10-21 00:00:00'),
    Timestamp('2021-10-31 00:00:00'),
    Timestamp('2021-11-10 00:00:00'),
    Timestamp('2021-11-20 00:00:00'),
    Timestamp('2021-11-30 00:00:00'),
    Timestamp('2021-12-09 23:00:00'))



```python ddf.partitions[1] ```



Dask DataFrame Structure:

<style scoped>

   .dataframe tbody tr th:only-of-type {
       vertical-align: middle;
   }
   .dataframe tbody tr th {
       vertical-align: top;
   }
   .dataframe thead th {
       text-align: right;
   }

</style>

<thead> </thead> <tbody> </tbody>
a b
npartitions=1
2021-09-11 int64 object
2021-09-21 ... ...
Dask Name: blocks, 2 graph layers



```python ddf["2000-10-01": "2021-10-09 5:00"].compute() ```



<style scoped>

   .dataframe tbody tr th:only-of-type {
       vertical-align: middle;
   }
   .dataframe tbody tr th {
       vertical-align: top;
   }
   .dataframe thead th {
       text-align: right;
   }

</style>

<thead> </thead> <tbody> </tbody>
a b
2021-09-01 00:00:00 0 a
2021-09-01 01:00:00 1 b
2021-09-01 02:00:00 2 c
2021-09-01 03:00:00 3 a
2021-09-01 04:00:00 4 d
... ... ...
2021-10-09 01:00:00 913 b
2021-10-09 02:00:00 914 c
2021-10-09 03:00:00 915 a
2021-10-09 04:00:00 916 d
2021-10-09 05:00:00 917 d

918 rows × 2 columns


      1. Array


```python import numpy as np import dask.array as da

data = np.arange(100_000).reshape(200, 500) a = da.from_array(data, chunks=(100, 100)) a ```



<thead> </thead> <tbody> </tbody>
Array Chunk
Bytes 781.25 kiB 78.12 kiB
Shape (200, 500) (100, 100)
Dask graph 10 chunks in 1 graph layer
Data type int64 numpy.ndarray
       <svg width="170" height="98" style="stroke:rgb(0,0,0);stroke-width:1" >
 <line x1="0" y1="0" x2="120" y2="0" style="stroke-width:2" />
 <line x1="0" y1="24" x2="120" y2="24" />
 <line x1="0" y1="48" x2="120" y2="48" style="stroke-width:2" />
 <line x1="0" y1="0" x2="0" y2="48" style="stroke-width:2" />
 <line x1="24" y1="0" x2="24" y2="48" />
 <line x1="48" y1="0" x2="48" y2="48" />
 <line x1="72" y1="0" x2="72" y2="48" />
 <line x1="96" y1="0" x2="96" y2="48" />
 <line x1="120" y1="0" x2="120" y2="48" style="stroke-width:2" />
 <polygon points="0.0,0.0 120.0,0.0 120.0,48.0 0.0,48.0" style="fill:#ECB172A0;stroke-width:0"/>
 <text x="60.000000" y="68.000000" font-size="1.0rem" font-weight="100" text-anchor="middle" >500</text>
 <text x="140.000000" y="24.000000" font-size="1.0rem" font-weight="100" text-anchor="middle" transform="rotate(-90,140.000000,24.000000)">200</text>

</svg>



```python a.chunks ```



   ((100, 100), (100, 100, 100, 100, 100))



```python a.blocks[1, 3] ```



<thead> </thead> <tbody> </tbody>
Array Chunk
Bytes 78.12 kiB 78.12 kiB
Shape (100, 100) (100, 100)
Dask graph 1 chunks in 2 graph layers
Data type int64 numpy.ndarray
       <svg width="170" height="170" style="stroke:rgb(0,0,0);stroke-width:1" >
 <line x1="0" y1="0" x2="120" y2="0" style="stroke-width:2" />
 <line x1="0" y1="120" x2="120" y2="120" style="stroke-width:2" />
 <line x1="0" y1="0" x2="0" y2="120" style="stroke-width:2" />
 <line x1="120" y1="0" x2="120" y2="120" style="stroke-width:2" />
 <polygon points="0.0,0.0 120.0,0.0 120.0,120.0 0.0,120.0" style="fill:#ECB172A0;stroke-width:0"/>
 <text x="60.000000" y="140.000000" font-size="1.0rem" font-weight="100" text-anchor="middle" >100</text>
 <text x="140.000000" y="60.000000" font-size="1.0rem" font-weight="100" text-anchor="middle" transform="rotate(-90,140.000000,60.000000)">100</text>

</svg>



```python a[:50, 200] ```



<thead> </thead> <tbody> </tbody>
Array Chunk
Bytes 400 B 400 B
Shape (50,) (50,)
Dask graph 1 chunks in 2 graph layers
Data type int64 numpy.ndarray
       <svg width="170" height="79" style="stroke:rgb(0,0,0);stroke-width:1" >
 <line x1="0" y1="0" x2="120" y2="0" style="stroke-width:2" />
 <line x1="0" y1="29" x2="120" y2="29" style="stroke-width:2" />
 <line x1="0" y1="0" x2="0" y2="29" style="stroke-width:2" />
 <line x1="120" y1="0" x2="120" y2="29" style="stroke-width:2" />
 <polygon points="0.0,0.0 120.0,0.0 120.0,29.030629010473877 0.0,29.030629010473877" style="fill:#ECB172A0;stroke-width:0"/>
 <text x="60.000000" y="49.030629" font-size="1.0rem" font-weight="100" text-anchor="middle" >50</text>
 <text x="140.000000" y="14.515315" font-size="1.0rem" font-weight="100" text-anchor="middle" transform="rotate(0,140.000000,14.515315)">1</text>

</svg>



```python a[:50, 200].compute() ```



   array([  200,   700,  1200,  1700,  2200,  2700,  3200,  3700,  4200,
           4700,  5200,  5700,  6200,  6700,  7200,  7700,  8200,  8700,
           9200,  9700, 10200, 10700, 11200, 11700, 12200, 12700, 13200,
          13700, 14200, 14700, 15200, 15700, 16200, 16700, 17200, 17700,
          18200, 18700, 19200, 19700, 20200, 20700, 21200, 21700, 22200,
          22700, 23200, 23700, 24200, 24700])



```python a.mean() a.mean().compute() np.sin(a) np.sin(a).compute() a.T a.T.compute() ```



   array([[    0,   500,  1000, ..., 98500, 99000, 99500],
          [    1,   501,  1001, ..., 98501, 99001, 99501],
          [    2,   502,  1002, ..., 98502, 99002, 99502],
          ...,
          [  497,   997,  1497, ..., 98997, 99497, 99997],
          [  498,   998,  1498, ..., 98998, 99498, 99998],
          [  499,   999,  1499, ..., 98999, 99499, 99999]])



```python b = a.max(axis=1)[::-1] + 10 ```


```python b[:10].compute() ```



   array([100009,  99509,  99009,  98509,  98009,  97509,  97009,  96509,
           96009,  95509])



```python b.dask ```



           <svg width="76" height="71" viewBox="0 0 76 71" fill="none" xmlns="http://www.w3.org/2000/svg">
               <circle cx="61.5" cy="36.5" r="13.5" style="stroke: var(--jp-ui-font-color2, #1D1D1D); fill: var(--jp-layout-color1, #F2F2F2);" stroke-width="2"/>
               <circle cx="14.5" cy="14.5" r="13.5" style="stroke: var(--jp-ui-font-color2, #1D1D1D); fill: var(--jp-layout-color1, #F2F2F2);" stroke-width="2"/>
               <circle cx="14.5" cy="56.5" r="13.5" style="stroke: var(--jp-ui-font-color2, #1D1D1D); fill: var(--jp-layout-color1, #F2F2F2);" stroke-width="2"/>
               <path d="M28 16L30.5 16C33.2614 16 35.5 18.2386 35.5 21L35.5 32.0001C35.5 34.7615 37.7386 37.0001 40.5 37.0001L43 37.0001" style="stroke: var(--jp-ui-font-color2, #1D1D1D);" stroke-width="1.5"/>
               <path d="M40.5 37L40.5 37.75L40.5 37.75L40.5 37ZM35.5 42L36.25 42L35.5 42ZM35.5 52L34.75 52L35.5 52ZM30.5 57L30.5 57.75L30.5 57ZM41.5001 36.25L40.5 36.25L40.5 37.75L41.5001 37.75L41.5001 36.25ZM34.75 42L34.75 52L36.25 52L36.25 42L34.75 42ZM30.5 56.25L28.0001 56.25L28.0001 57.75L30.5 57.75L30.5 56.25ZM34.75 52C34.75 54.3472 32.8472 56.25 30.5 56.25L30.5 57.75C33.6756 57.75 36.25 55.1756 36.25 52L34.75 52ZM40.5 36.25C37.3244 36.25 34.75 38.8243 34.75 42L36.25 42C36.25 39.6528 38.1528 37.75 40.5 37.75L40.5 36.25Z" style="fill: var(--jp-ui-font-color2, #1D1D1D);"/>
               <circle cx="28" cy="16" r="2.25" fill="#E5E5E5" style="stroke: var(--jp-ui-font-color2, #1D1D1D);" stroke-width="1.5"/>
               <circle cx="28" cy="57" r="2.25" fill="#E5E5E5" style="stroke: var(--jp-ui-font-color2, #1D1D1D);" stroke-width="1.5"/>
               <path d="M45.25 36.567C45.5833 36.7594 45.5833 37.2406 45.25 37.433L42.25 39.1651C41.9167 39.3575 41.5 39.117 41.5 38.7321V35.2679C41.5 34.883 41.9167 34.6425 42.25 34.8349L45.25 36.567Z" style="fill: var(--jp-ui-font-color2, #1D1D1D);"/>
           </svg>

HighLevelGraph

HighLevelGraph with 6 layers and 30 keys from all layers.

   <svg width="24" height="24" viewBox="0 0 32 32" fill="none" xmlns="http://www.w3.org/2000/svg" style="position: absolute;">
       <circle cx="16" cy="16" r="14" fill="#8F8F8F" style="stroke: var(--jp-ui-font-color2, #1D1D1D);" stroke-width="2"/>
   </svg>
   <details style="margin-left: 32px;">
       <summary style="margin-bottom: 10px; margin-top: 10px;">

Layer1: array

       </summary>

array-5f1b3c0ca172b03296ed6d431d3c9df7

layer_type MaterializedLayer
is_materialized True
number of outputs 10
shape (200, 500)
dtype int64
chunksize (100, 100)
type dask.array.core.Array
chunk_type numpy.ndarray
               <svg width="250" height="130" style="stroke:rgb(0,0,0);stroke-width:1" >
 <line x1="0" y1="0" x2="200" y2="0" style="stroke-width:2" />
 <line x1="0" y1="40" x2="200" y2="40" />
 <line x1="0" y1="80" x2="200" y2="80" style="stroke-width:2" />
 <line x1="0" y1="0" x2="0" y2="80" style="stroke-width:2" />
 <line x1="40" y1="0" x2="40" y2="80" />
 <line x1="80" y1="0" x2="80" y2="80" />
 <line x1="120" y1="0" x2="120" y2="80" />
 <line x1="160" y1="0" x2="160" y2="80" />
 <line x1="200" y1="0" x2="200" y2="80" style="stroke-width:2" />
 <polygon points="0.0,0.0 200.0,0.0 200.0,80.0 0.0,80.0" style="fill:#ECB172A0;stroke-width:0"/>
 <text x="100.000000" y="100.000000" font-size="1.0rem" font-weight="100" text-anchor="middle" >500</text>
 <text x="220.000000" y="40.000000" font-size="1.0rem" font-weight="100" text-anchor="middle" transform="rotate(-90,220.000000,40.000000)">200</text>

</svg>

   </details>
   <svg width="24" height="24" viewBox="0 0 32 32" fill="none" xmlns="http://www.w3.org/2000/svg" style="position: absolute;">
       <circle cx="16" cy="16" r="14" style="stroke: var(--jp-ui-font-color2, #1D1D1D); fill: var(--jp-layout-color1, #F2F2F2);" stroke-width="2" />
   </svg>
   <details style="margin-left: 32px;">
       <summary style="margin-bottom: 10px; margin-top: 10px;">

Layer2: chunk_max

       </summary>

chunk_max-22359f6e7c214a78df6cd2673d3d603f

layer_type Blockwise
is_materialized False
number of outputs 10
shape (200, 500)
dtype int64
chunksize (100, 100)
type dask.array.core.Array
chunk_type numpy.ndarray
depends on array-5f1b3c0ca172b03296ed6d431d3c9df7
               <svg width="250" height="130" style="stroke:rgb(0,0,0);stroke-width:1" >
 <line x1="0" y1="0" x2="200" y2="0" style="stroke-width:2" />
 <line x1="0" y1="40" x2="200" y2="40" />
 <line x1="0" y1="80" x2="200" y2="80" style="stroke-width:2" />
 <line x1="0" y1="0" x2="0" y2="80" style="stroke-width:2" />
 <line x1="40" y1="0" x2="40" y2="80" />
 <line x1="80" y1="0" x2="80" y2="80" />
 <line x1="120" y1="0" x2="120" y2="80" />
 <line x1="160" y1="0" x2="160" y2="80" />
 <line x1="200" y1="0" x2="200" y2="80" style="stroke-width:2" />
 <polygon points="0.0,0.0 200.0,0.0 200.0,80.0 0.0,80.0" style="fill:#ECB172A0;stroke-width:0"/>
 <text x="100.000000" y="100.000000" font-size="1.0rem" font-weight="100" text-anchor="middle" >500</text>
 <text x="220.000000" y="40.000000" font-size="1.0rem" font-weight="100" text-anchor="middle" transform="rotate(-90,220.000000,40.000000)">200</text>

</svg>

   </details>
   <svg width="24" height="24" viewBox="0 0 32 32" fill="none" xmlns="http://www.w3.org/2000/svg" style="position: absolute;">
       <circle cx="16" cy="16" r="14" fill="#8F8F8F" style="stroke: var(--jp-ui-font-color2, #1D1D1D);" stroke-width="2"/>
   </svg>
   <details style="margin-left: 32px;">
       <summary style="margin-bottom: 10px; margin-top: 10px;">

Layer3: chunk_max-partial

       </summary>

chunk_max-partial-1dbed0d768d7fbafb3cb682ddb58870a

layer_type MaterializedLayer
is_materialized True
number of outputs 4
shape (200, 2)
dtype int64
chunksize (100, 1)
type dask.array.core.Array
chunk_type numpy.ndarray
depends on chunk_max-22359f6e7c214a78df6cd2673d3d603f
               <svg width="92" height="250" style="stroke:rgb(0,0,0);stroke-width:1" >
 <line x1="0" y1="0" x2="42" y2="0" style="stroke-width:2" />
 <line x1="0" y1="100" x2="42" y2="100" />
 <line x1="0" y1="200" x2="42" y2="200" style="stroke-width:2" />
 <line x1="0" y1="0" x2="0" y2="200" style="stroke-width:2" />
 <line x1="21" y1="0" x2="21" y2="200" />
 <line x1="42" y1="0" x2="42" y2="200" style="stroke-width:2" />
 <polygon points="0.0,0.0 42.354360857637474,0.0 42.354360857637474,200.0 0.0,200.0" style="fill:#ECB172A0;stroke-width:0"/>
 <text x="21.177180" y="220.000000" font-size="1.0rem" font-weight="100" text-anchor="middle" >2</text>
 <text x="62.354361" y="100.000000" font-size="1.0rem" font-weight="100" text-anchor="middle" transform="rotate(-90,62.354361,100.000000)">200</text>

</svg>

   </details>
   <svg width="24" height="24" viewBox="0 0 32 32" fill="none" xmlns="http://www.w3.org/2000/svg" style="position: absolute;">
       <circle cx="16" cy="16" r="14" fill="#8F8F8F" style="stroke: var(--jp-ui-font-color2, #1D1D1D);" stroke-width="2"/>
   </svg>
   <details style="margin-left: 32px;">
       <summary style="margin-bottom: 10px; margin-top: 10px;">

Layer4: max-aggregate

       </summary>

max-aggregate-f1f4671fcd497326d08ac769e2eeff52

layer_type MaterializedLayer
is_materialized True
number of outputs 2
shape (200,)
dtype int64
chunksize (100,)
type dask.array.core.Array
chunk_type numpy.ndarray
depends on chunk_max-partial-1dbed0d768d7fbafb3cb682ddb58870a
               <svg width="250" height="92" style="stroke:rgb(0,0,0);stroke-width:1" >
 <line x1="0" y1="0" x2="200" y2="0" style="stroke-width:2" />
 <line x1="0" y1="42" x2="200" y2="42" style="stroke-width:2" />
 <line x1="0" y1="0" x2="0" y2="42" style="stroke-width:2" />
 <line x1="100" y1="0" x2="100" y2="42" />
 <line x1="200" y1="0" x2="200" y2="42" style="stroke-width:2" />
 <polygon points="0.0,0.0 200.0,0.0 200.0,42.354360857637474 0.0,42.354360857637474" style="fill:#ECB172A0;stroke-width:0"/>
 <text x="100.000000" y="62.354361" font-size="1.0rem" font-weight="100" text-anchor="middle" >200</text>
 <text x="220.000000" y="21.177180" font-size="1.0rem" font-weight="100" text-anchor="middle" transform="rotate(0,220.000000,21.177180)">1</text>

</svg>

   </details>
   <svg width="24" height="24" viewBox="0 0 32 32" fill="none" xmlns="http://www.w3.org/2000/svg" style="position: absolute;">
       <circle cx="16" cy="16" r="14" fill="#8F8F8F" style="stroke: var(--jp-ui-font-color2, #1D1D1D);" stroke-width="2"/>
   </svg>
   <details style="margin-left: 32px;">
       <summary style="margin-bottom: 10px; margin-top: 10px;">

Layer5: getitem

       </summary>

getitem-65be51186a8fbeccfd51c0cf1245f77e

layer_type MaterializedLayer
is_materialized True
number of outputs 2
shape (200,)
dtype int64
chunksize (100,)
type dask.array.core.Array
chunk_type numpy.ndarray
depends on max-aggregate-f1f4671fcd497326d08ac769e2eeff52
               <svg width="250" height="92" style="stroke:rgb(0,0,0);stroke-width:1" >
 <line x1="0" y1="0" x2="200" y2="0" style="stroke-width:2" />
 <line x1="0" y1="42" x2="200" y2="42" style="stroke-width:2" />
 <line x1="0" y1="0" x2="0" y2="42" style="stroke-width:2" />
 <line x1="100" y1="0" x2="100" y2="42" />
 <line x1="200" y1="0" x2="200" y2="42" style="stroke-width:2" />
 <polygon points="0.0,0.0 200.0,0.0 200.0,42.354360857637474 0.0,42.354360857637474" style="fill:#ECB172A0;stroke-width:0"/>
 <text x="100.000000" y="62.354361" font-size="1.0rem" font-weight="100" text-anchor="middle" >200</text>
 <text x="220.000000" y="21.177180" font-size="1.0rem" font-weight="100" text-anchor="middle" transform="rotate(0,220.000000,21.177180)">1</text>

</svg>

   </details>
   <svg width="24" height="24" viewBox="0 0 32 32" fill="none" xmlns="http://www.w3.org/2000/svg" style="position: absolute;">
       <circle cx="16" cy="16" r="14" style="stroke: var(--jp-ui-font-color2, #1D1D1D); fill: var(--jp-layout-color1, #F2F2F2);" stroke-width="2" />
   </svg>
   <details style="margin-left: 32px;">
       <summary style="margin-bottom: 10px; margin-top: 10px;">

Layer6: add

       </summary>

add-c42a5ea4600e83a48a31b107d8933476

layer_type Blockwise
is_materialized False
number of outputs 2
shape (200,)
dtype int64
chunksize (100,)
type dask.array.core.Array
chunk_type numpy.ndarray
depends on getitem-65be51186a8fbeccfd51c0cf1245f77e
               <svg width="250" height="92" style="stroke:rgb(0,0,0);stroke-width:1" >
 <line x1="0" y1="0" x2="200" y2="0" style="stroke-width:2" />
 <line x1="0" y1="42" x2="200" y2="42" style="stroke-width:2" />
 <line x1="0" y1="0" x2="0" y2="42" style="stroke-width:2" />
 <line x1="100" y1="0" x2="100" y2="42" />
 <line x1="200" y1="0" x2="200" y2="42" style="stroke-width:2" />
 <polygon points="0.0,0.0 200.0,0.0 200.0,42.354360857637474 0.0,42.354360857637474" style="fill:#ECB172A0;stroke-width:0"/>
 <text x="100.000000" y="62.354361" font-size="1.0rem" font-weight="100" text-anchor="middle" >200</text>
 <text x="220.000000" y="21.177180" font-size="1.0rem" font-weight="100" text-anchor="middle" transform="rotate(0,220.000000,21.177180)">1</text>

</svg>

   </details>


    1. Dask example 2


```python import dask.array as da x = da.random.random((30_000, 30_000), chunks=(1000, 1000)) x ```



<thead> </thead> <tbody> </tbody>
Array Chunk
Bytes 6.71 GiB 7.63 MiB
Shape (30000, 30000) (1000, 1000)
Dask graph 900 chunks in 1 graph layer
Data type float64 numpy.ndarray
       <svg width="170" height="170" style="stroke:rgb(0,0,0);stroke-width:1" >
 <line x1="0" y1="0" x2="120" y2="0" style="stroke-width:2" />
 <line x1="0" y1="4" x2="120" y2="4" />
 <line x1="0" y1="12" x2="120" y2="12" />
 <line x1="0" y1="16" x2="120" y2="16" />
 <line x1="0" y1="24" x2="120" y2="24" />
 <line x1="0" y1="28" x2="120" y2="28" />
 <line x1="0" y1="36" x2="120" y2="36" />
 <line x1="0" y1="44" x2="120" y2="44" />
 <line x1="0" y1="48" x2="120" y2="48" />
 <line x1="0" y1="56" x2="120" y2="56" />
 <line x1="0" y1="60" x2="120" y2="60" />
 <line x1="0" y1="68" x2="120" y2="68" />
 <line x1="0" y1="72" x2="120" y2="72" />
 <line x1="0" y1="80" x2="120" y2="80" />
 <line x1="0" y1="88" x2="120" y2="88" />
 <line x1="0" y1="92" x2="120" y2="92" />
 <line x1="0" y1="100" x2="120" y2="100" />
 <line x1="0" y1="104" x2="120" y2="104" />
 <line x1="0" y1="112" x2="120" y2="112" />
 <line x1="0" y1="120" x2="120" y2="120" style="stroke-width:2" />
 <line x1="0" y1="0" x2="0" y2="120" style="stroke-width:2" />
 <line x1="4" y1="0" x2="4" y2="120" />
 <line x1="12" y1="0" x2="12" y2="120" />
 <line x1="16" y1="0" x2="16" y2="120" />
 <line x1="24" y1="0" x2="24" y2="120" />
 <line x1="28" y1="0" x2="28" y2="120" />
 <line x1="36" y1="0" x2="36" y2="120" />
 <line x1="44" y1="0" x2="44" y2="120" />
 <line x1="48" y1="0" x2="48" y2="120" />
 <line x1="56" y1="0" x2="56" y2="120" />
 <line x1="60" y1="0" x2="60" y2="120" />
 <line x1="68" y1="0" x2="68" y2="120" />
 <line x1="72" y1="0" x2="72" y2="120" />
 <line x1="80" y1="0" x2="80" y2="120" />
 <line x1="88" y1="0" x2="88" y2="120" />
 <line x1="92" y1="0" x2="92" y2="120" />
 <line x1="100" y1="0" x2="100" y2="120" />
 <line x1="104" y1="0" x2="104" y2="120" />
 <line x1="112" y1="0" x2="112" y2="120" />
 <line x1="120" y1="0" x2="120" y2="120" style="stroke-width:2" />
 <polygon points="0.0,0.0 120.0,0.0 120.0,120.0 0.0,120.0" style="fill:#8B4903A0;stroke-width:0"/>
 <text x="60.000000" y="140.000000" font-size="1.0rem" font-weight="100" text-anchor="middle" >30000</text>
 <text x="140.000000" y="60.000000" font-size="1.0rem" font-weight="100" text-anchor="middle" transform="rotate(-90,140.000000,60.000000)">30000</text>

</svg>



```python y = x + x.T ```


```python y.sum().compute() ```



   899986697.6624435



```python y[:, :10].compute() ```



   array([[1.06732263, 1.44918471, 0.97747399, ..., 0.33531255, 0.81407412,
           1.18334048],
          [1.44918471, 0.75954053, 0.94024149, ..., 0.92522134, 0.47216556,
           1.0685656 ],
          [0.97747399, 0.94024149, 0.43425583, ..., 1.31146439, 1.13960695,
           1.17543105],
          ...,
          [0.94341018, 0.91293267, 0.4737179 , ..., 0.95118153, 0.68525142,
           0.9561541 ],
          [0.68742121, 0.76071136, 0.78407903, ..., 0.96844288, 0.57027703,
           0.44666467],
          [1.05163969, 1.174239  , 0.61947849, ..., 0.22588196, 0.7914612 ,
           0.8904959 ]])



```python

```