Metamorphoses Liber I - I

Latina

At ego tibi sermone isto Milesio varias fabulas conseram auresque tuas benivolas lepido susurro permulceam — modo si papyrum Aegyptiam argutia¹ Nilotici calami inscriptam non spreveris inspicere — , figuras fortunasque hominum in alias imagines conversas et in se rursus mutuo nexu refectas ut mireris.² Exordior. “Quis ille?” Paucis accipe. Hymettos Attica et Isthmos Ephyrea et Taenaros Spartiatica, glebae felices aeternum libris felicioribus conditae, mea vetus prosapia est; ibi linguam Atthidem primis pueritiae stipendiis merui. Mox in urbe Latia advena studiorum Quiritium indigenam sermonem aerumnabili labore nullo magistro praeeunte aggressus excolui. En ecce praefamur veniam, siquid exotici ac forensis sermonis rudis locutor offendero. Iam haec equidem ipsa vocis immutatio desultoriae scientiae stilo quem accessimus respondet.³ Fabulam Graecanicam incipimus. Lector intende: laetaberis.

English

I will provide you with various tales of the Milesian manner and charm your benevolent ears with sweet whispers, if only you do not disdain to look at this cleverly written Egyptian manuscript, written with the reed of the Nile. You will be amazed at the figures and fates of man, turned into other forms, and turned back with a mutual connection as well. I begin. You will ask: “Who are you?” I shall explain briefly. The happy lands of the Hymettos of Attica, the Isthmus of Corinth and of cape Taenaros of Sparta, forever recorded in even happier books, form my ancient lineage. It is here that I learned the Greek language during my youth. Thereafter, in the Latin city, I learned with great effort and without any teacher showing me the way the native language of the Romans. That’s why I ask for forgiveness in advance, in case I offend anyone with my exotic and public way of talking, being an unpolished speaker. Actually, it is this very change of my tongue that corresponds so well to the changeable and precarious style that I have adopted. We begin a Greek tale. Reader, pay attention: you shall enjoy it.
  • 1: Argutia is sing. abl. of quality.
  • 2: Not sure what mutuo nexu means here.
  • 3: Vocis is sing. gen. of quality of immutatio. Desultoriae scientiae is sing. gen. of quality of stilo.
    Desultoriae scientiae means knowledge of haphazardness, disconnectedness.

Chunking: the key to scaling with Dask and Xarray

In this article, I’ll share my experience with Xarray’s stack method when working with large datasets and how I discovered a more efficient way of stacking per chunk (blockwise) to improve performance. Please note that this article is intended for readers who are already familiar with Dask and Xarray.

Chunk management is critical to performance when working with large datasets. With Xarray’s standard stack method, however, I ran into a significant problem: the resulting Dask graph contained a large number of dependencies between chunks, which hurt performance. Specifically, each output chunk depended on all input chunks along the dimension being stacked, which made the computation slow and hard to parallelize. To address this, I developed a custom blockwise stack function that avoids these interdependencies, so each chunk can be processed in parallel.
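To make the idea of stacking per chunk concrete, here is a minimal sketch on a plain Dask array. It is not the blockwise_stack function discussed in this article; the helper name stack_per_block is hypothetical, and it assumes a 3-D (y, x, band) array whose band axis is a single chunk. Each input block is flattened on its own and the results are concatenated, so every output chunk is built from exactly one input chunk.

```python
import itertools

import dask.array as da
import numpy as np

# Deterministic stand-in for the data: shape (y, x, band), chunked in 2x2 tiles.
values = np.arange(4 * 6 * 3).reshape(4, 6, 3)
arr = da.from_array(values, chunks=(2, 2, -1))


def stack_per_block(arr):
    """Hypothetical helper: flatten (y, x) inside each block, then concatenate.

    Assumes the last axis (band) is not split across chunks.
    """
    n_y, n_x, _ = arr.numblocks
    pieces = [
        # arr.blocks[i, j] selects a single block; reshaping it touches no other block.
        arr.blocks[i, j].reshape(-1, arr.shape[-1])
        for i, j in itertools.product(range(n_y), range(n_x))
    ]
    return da.concatenate(pieces, axis=0)


stacked = stack_per_block(arr)
print(stacked.shape)   # one row per (y, x) pixel, one column per band
print(stacked.chunks)  # one output chunk per input block
```

Note that the rows come out in block order rather than in the row-major order a global reshape would give, which is the trade-off this approach accepts.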

It’s important to note that this is not a bug in Xarray. Its implementation is needed so that the stacked result stays the same regardless of the chunking scheme. However, since I was focused on performance, I decided to develop my own solution.

To illustrate the problem and its solution, I’ll:

  • Create a dummy dataset.
  • Stack it with Xarray’s stack method.
  • Stack it with my own “blockwise_stack” function.
  • Try to describe what’s happening under the hood during each of the above steps.

I hope that by the end of this article, readers will have a better understanding of the issues surrounding Xarray’s stack method and how to improve performance when working with large datasets.

The dummy dataset.

The following is a small dummy dataset that includes a coordinate named “chunk_idx.” This coordinate specifies the chunk index for each xy value, allowing us to easily observe how the chunks are affected by stacking later on.

import xarray as xr
import dask.array as da
import numpy as np
from blockwise_stack import blockwise_stack


one_to_six = np.array([[1, 2, 3], [4, 5, 6]])


ds = xr.Dataset(
    {
        "data": (
            ["y", "x", "band"],
            da.random.random((4, 6, 3), chunks=(2, 2, -1)),
        )
    },
    coords={
        "chunk_idx": (
            ("y", "x"),
            np.repeat(np.repeat(one_to_six, 2, axis=0), 2, axis=1),
        )
    },
)
ds.data
<xarray.DataArray 'data' (y: 4, x: 6, band: 3)>
dask.array<random_sample, shape=(4, 6, 3), dtype=float64, chunksize=(2, 2, 3), chunktype=numpy.ndarray>
Coordinates:
    chunk_idx  (y, x) int64 1 1 2 2 3 3 1 1 2 2 3 3 4 4 5 5 6 6 4 4 5 5 6 6
Dimensions without coordinates: y, x, band
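As a rough illustration of why the chunk_idx coordinate is useful, the NumPy-only sketch below compares two ways of flattening it: a global row-major reshape, and a per-chunk flatten followed by concatenation. The variable names global_order and blockwise_order are mine, and the per-chunk variant is only an assumption about what a blockwise stack could look like, not the implementation used in this article.

```python
import numpy as np

one_to_six = np.array([[1, 2, 3], [4, 5, 6]])
chunk_idx = np.repeat(np.repeat(one_to_six, 2, axis=0), 2, axis=1)  # shape (4, 6)

# Global row-major stacking of (y, x): each row of the grid crosses three chunks,
# so consecutive stacked values come from different chunks.
global_order = chunk_idx.reshape(-1)

# Per-chunk stacking: flatten each 2x2 tile separately, then concatenate the tiles.
blockwise_order = np.concatenate(
    [
        chunk_idx[y:y + 2, x:x + 2].reshape(-1)
        for y in range(0, 4, 2)
        for x in range(0, 6, 2)
    ]
)

print(global_order)     # [1 1 2 2 3 3 1 1 2 2 3 3 4 4 5 5 6 6 4 4 5 5 6 6]
print(blockwise_order)  # [1 1 1 1 2 2 2 2 3 3 3 3 4 4 4 4 5 5 5 5 6 6 6 6]
```

In the global order, any run of consecutive stacked values mixes several chunks, which is why an output chunk needs data from every input chunk along x. In the per-chunk order, each run of four values comes from a single chunk.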

My very first post!

I’ve decided to start my own blog to share all kinds of thoughts, impressions and ideas as they occur to me. I also intend to publish summaries of the projects I’m involved in, whether work-related or not. Hopefully what I write here will be useful, or at least entertaining, to some people.