python-tinybrain.spec


%global _empty_manifest_terminate_build 0
Name:		python-tinybrain
Version:	1.3.3
Release:	1
Summary:	Image pyramid generation specialized for connectomics data types and procedures.
License:	GNU General Public License v3 or later (GPLv3+)
URL:		https://github.com/seung-lab/tinybrain/
Source0:	https://mirrors.aliyun.com/pypi/web/packages/1b/aa/61077d4faf22ee795c228fd04da4a6060c382849ff410880976f3f1909b5/tinybrain-1.3.3.tar.gz

Requires:	python3-numpy

%description
[![Build Status](https://travis-ci.org/seung-lab/tinybrain.svg?branch=master)](https://travis-ci.org/seung-lab/tinybrain) [![PyPI version](https://badge.fury.io/py/tinybrain.svg)](https://badge.fury.io/py/tinybrain)  

# tinybrain

Image pyramid generation specialized for connectomics data types and procedures. If your brain wasn't tiny before, it will be now.  

```python 
import tinybrain 

img = load_3d_em_stack()

# 2x2 and 2x2x2 downsamples are on a fast path.
# e.g. (2,2), (2,2,1), (2,2,1,1), (2,2,2), (2,2,2,1)
img_pyramid = tinybrain.downsample_with_averaging(img, factor=(2,2,1), num_mips=5, sparse=False)

labels = load_3d_labels()
label_pyramid = tinybrain.downsample_segmentation(labels, factor=(2,2,1), num_mips=5, sparse=False)
```

## Installation 

```bash
pip install numpy
pip install tinybrain
```

## Motivation

Image hierarchy generation in connectomics uses a few different techniques for
visualizing data, but predominantly we create image pyramids of uint8 grayscale images using 2x2 average pooling and of uint8 to uint64 segmentation labels using 2x2 mode pooling. When images become very large and people wish to visualize upper mip levels using three axes at once, it becomes desirable to perform 2x2x2 downsamples to maintain isotropy.

It's possible to compute both of these using numpy. However, as multiple packages found it useful to copy the downsample functions, it makes sense to formalize these functions into a separate library located on PyPI.

Given the disparate circumstances in which they will be used, these functions should work
as fast as possible with low memory usage and avoid numerical issues such as integer truncation
while generating multiple mip levels.

## Considerations: downsample_with_averaging 

It's advisable to generate multiple mip levels at once rather than recursively computing
new images because, for integer type images, recursion leads to integer truncation issues. In the common
case of 2x2x1 downsampling, a recursively computed image can lose up to 0.75 brightness units per
mip level. Therefore, take advantage of the `num_mips` argument, which strikes a balance
that limits integer truncation loss to once every 4 mip levels. This compromise allows
for the use of integer arithmetic and no more memory usage than 2x the input image including
the output downsamples. If you seek to eliminate the loss beyond 4 mip levels, try promoting
the type before downsampling. 2x2x2x1 downsamples truncate every 8 mip levels.
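
The truncation effect described above can be illustrated with a small sketch in plain numpy. This is not tinybrain's implementation; `avg_pool` is a hypothetical helper written only for this example, and it assumes a 2D integer image whose sides are divisible by the pooling factor.

```python
import numpy as np

def avg_pool(img, f):
    """Average f x f blocks of a 2D integer image, truncating the result."""
    h, w = img.shape
    # Widen before summing so the block sums cannot overflow uint8.
    blocks = img.reshape(h // f, f, w // f, f).astype(np.uint32)
    return (blocks.sum(axis=(1, 3)) // (f * f)).astype(img.dtype)

img = np.array([
    [1, 2, 1, 2],
    [2, 2, 2, 2],
    [1, 2, 2, 3],
    [2, 2, 3, 3],
], dtype=np.uint8)

# Recursive: truncate after every 2x2 step (two truncations in total).
recursive = avg_pool(avg_pool(img, 2), 2)
# Accumulated: sum the whole 4x4 window and truncate only once.
accumulated = avg_pool(img, 4)

print(img.mean())         # exact mean of the 16 pixels
print(recursive[0, 0])    # truncated twice
print(accumulated[0, 0])  # truncated once
```

Here the exact mean is 2.0; the recursive result is 1, while truncating once gives 2. Deferring the division is the general idea behind producing several mip levels in one call.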

A C++ high performance path is triggered for 2x2x1x1 and 2x2x2x1 downsample factors on uint8, uint16, float32, 
and float64 data types in Fortran order. Other factors, data types, and orderings are computed using a numpy pathway that is much slower and more memory intensive.
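
A minimal sketch of preparing an array for the fast path, using only the conditions stated above (supported data type, Fortran order). `prepare_for_fast_path` and `FAST_PATH_DTYPES` are hypothetical names for this example; tinybrain may already perform an equivalent conversion internally, so treat this as an illustration rather than a requirement.

```python
import numpy as np

# Data types the text above lists for the C++ averaging path.
FAST_PATH_DTYPES = (np.uint8, np.uint16, np.float32, np.float64)

def prepare_for_fast_path(img):
    """Return (Fortran-ordered array, whether the dtype is on the list above)."""
    eligible = img.dtype.type in FAST_PATH_DTYPES
    # asfortranarray returns the input unchanged if it is already F-contiguous.
    return np.asfortranarray(img), eligible

img = np.zeros((4, 4, 2), dtype=np.uint8)  # numpy defaults to C order
img_f, eligible = prepare_for_fast_path(img)
print(img.flags.f_contiguous, img_f.flags.f_contiguous, eligible)

# The result could then be passed on, for example:
# tinybrain.downsample_with_averaging(img_f, factor=(2,2,1), num_mips=2)
```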

We also include a sparse mode for downsampling 2x2x2 patches, which prevents "ghosting" where one z-slice overlaps a black region on the next slice and becomes semi-transparent after downsampling. We deal with this by neglecting the background pixels from the averaging operation. 
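
The idea behind sparse averaging can be sketched in plain numpy as follows. This is an illustration, not the library's C++ code; `sparse_avg_pool_2x2x2` is a hypothetical helper, and it assumes that background voxels are exactly zero and that every dimension is even.

```python
import numpy as np

def sparse_avg_pool_2x2x2(img):
    """Average each 2x2x2 patch over its nonzero voxels only."""
    x, y, z = img.shape
    blocks = img.reshape(x // 2, 2, y // 2, 2, z // 2, 2).astype(np.float64)
    total = blocks.sum(axis=(1, 3, 5))
    count = (blocks != 0).sum(axis=(1, 3, 5))
    # An all-background patch has count 0; dividing by 1 keeps it at 0.
    return (total / np.maximum(count, 1)).astype(img.dtype)

# One bright z-slice sitting on top of a black z-slice.
vol = np.zeros((2, 2, 2), dtype=np.uint8)
vol[:, :, 0] = 200

plain = vol.mean()                   # ordinary averaging halves the brightness
sparse = sparse_avg_pool_2x2x2(vol)  # background is excluded from the average
print(plain, sparse[0, 0, 0])
```

Ordinary averaging yields 100 for this patch (the "ghosted" value), whereas the sparse average keeps it at 200.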

### Example Benchmark 

On a 1024x1024x100 uint8 image I ran the following code. PIL and OpenCV are actually much faster than this benchmark shows because most of the time is spent writing to the numpy array. tinybrain has a large advantage working on 3D and 4D arrays. Of course, this is a very simple benchmark and it may be possible to tune each of these approaches. On single slices, Pillow was faster than tinybrain.

```python
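import time

import cv2
import numpy as np
import tinybrain
from PIL import Image

img = np.load("image.npy")

# "Original" is a separate downsample_with_averaging function (presumably the
# earlier numpy implementation); its definition is not shown here.
s = time.time()
downsample_with_averaging(img, (2,2,1))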
print("Original ", time.time() - s)

s = time.time()
out = tinybrain.downsample_with_averaging(img, (2,2,1))
print("tinybrain ", time.time() - s)

s = time.time()
out = np.zeros(shape=(512,512,100))
for z in range(img.shape[2]):
  out[:,:,z] = cv2.resize(img[:,:,z], dsize=(512, 512) )
print("OpenCV ", time.time() - s)

s = time.time()
out = np.zeros(shape=(512,512,100))
for z in range(img.shape[2]):
  pilimg = Image.fromarray(img[:,:,z])
  out[:,:,z] = pilimg.resize( (512, 512) )
print("Pillow ", time.time() - s)

# Method     Run Time             Rel. Perf.
# Original   1820 ms +/- 3.73 ms    1.0x
# tinybrain    67 ms +/- 0.40 ms   27.2x 
# OpenCV      469 ms +/- 1.12 ms    3.9x
# Pillow      937 ms +/- 7.63 ms    1.9x
```

Here's the output from `perf.py` on a 2021 Apple Silicon MacBook Pro (M1).
Note that the image used was a random 2048x2048x64 array that was a uint8
for average pooling and a uint64 for mode pooling to represent real use cases more fairly. Read each table row as a 2D or 3D downsample, generating one or two mip levels, with sparse mode enabled or disabled. The speed values are in megavoxels per second and are the mean of ten runs.


| dwnsmpl  |   mips  |   sparse  |   AVG (MVx/sec)  |   MODE (MVx/sec)  |
|----------|---------|-----------|------------------|-------------------|
|   2x2    |   1     |   N       |   3856.07        |   1057.87         |
|   2x2    |   2     |   N       |   2685.80        |   1062.69         |
|   2x2    |   1     |   Y       |   N/A            |   129.64          |
|   2x2    |   2     |   Y       |   N/A            |   81.62           |
|   2x2x2  |   1     |   N       |   4468.55        |   336.85          |
|   2x2x2  |   2     |   N       |   2867.80        |   298.45          |
|   2x2x2  |   1     |   Y       |   1389.47        |   337.87          |
|   2x2x2  |   2     |   Y       |   1259.58        |   293.84          |

As the downsampling code's performance is data dependent due to branching, I also used [`connectomics.npy`](https://github.com/seung-lab/connected-components-3d/blob/master/benchmarks/connectomics.npy.gz) (512<sup>3</sup> uint32 extended to uint64) to see how that affected performance. This data comes from mouse visual cortex and has many equal adjacent voxels. In this volume, the 2x2x2 non-sparse mode is much faster as the "instant" majority detection can skip examining half the voxels in many cases.

| dwnsmpl  |   mips  |   sparse  |   MODE (MVx/sec)  |
|----------|---------|-----------|-------------------|
|   2x2    |   1     |   N       |   1078.09         |
|   2x2    |   2     |   N       |   1030.90         |
|   2x2    |   1     |   Y       |   146.15          |
|   2x2    |   2     |   Y       |   69.25           |
|   2x2x2  |   1     |   N       |   1966.74         |
|   2x2x2  |   2     |   N       |   1790.60         |
|   2x2x2  |   1     |   Y       |   2041.96         |
|   2x2x2  |   2     |   Y       |   1758.42         |


## Considerations: downsample_segmentation 

The `downsample_segmentation` function performs mode pooling operations provided the downsample factor is a power of two, including in three dimensions. If the factor is a non-power of two, striding is used. The mode pooling, which is usually what you want, is computed recursively. Mode pooling is superior to striding, but the recursive calculation can introduce defects at mip levels higher than 1. This may be improved in the future.  
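
To make the difference between striding and mode pooling concrete, here is a small reference sketch. It uses a slow counting loop rather than any of the algorithms tinybrain actually selects; `stride_2x2` and `mode_pool_2x2` are hypothetical helpers, and ties are broken by whichever label is seen first.

```python
from collections import Counter

import numpy as np

def stride_2x2(labels):
    """Keep only the top-left voxel of every 2x2 block."""
    return labels[::2, ::2]

def mode_pool_2x2(labels):
    """Keep the most frequent label of every 2x2 block (slow reference loop)."""
    h, w = labels.shape
    out = np.empty((h // 2, w // 2), dtype=labels.dtype)
    for i in range(h // 2):
        for j in range(w // 2):
            block = labels[2 * i:2 * i + 2, 2 * j:2 * j + 2].ravel().tolist()
            out[i, j] = Counter(block).most_common(1)[0][0]
    return out

labels = np.array([
    [5, 7],
    [7, 7],
], dtype=np.uint64)

strided = stride_2x2(labels)
pooled = mode_pool_2x2(labels)
print(strided[0, 0], pooled[0, 0])
```

Striding keeps label 5 even though it covers only one of the four voxels; mode pooling keeps label 7.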

The calculation uses an ensemble of several different methods. For (2,2,1,1) and (2,2,2,1) downsamples, a fast, low-memory Cython path is selected. (2,2,1,1) implements [*countless if*](https://towardsdatascience.com/countless-high-performance-2x-downsampling-of-labeled-images-using-python-and-numpy-e70ad3275589). (2,2,2,1) uses a combination of counting and "instant" majority detection. For (4,4,1) or other 2D powers of two, the [*countless 2d*](https://towardsdatascience.com/countless-high-performance-2x-downsampling-of-labeled-images-using-python-and-numpy-e70ad3275589) algorithm is used. For (4,4,4), (8,8,8) etc, the [*dynamic countless 3d*](https://towardsdatascience.com/countless-3d-vectorized-2x-downsampling-of-labeled-volume-images-using-python-and-numpy-59d686c2f75) algorithm is used. For 2D powers of two, [*stippled countless 2d*](https://medium.com/@willsilversmith/countless-2d-inflated-2x-downsampling-of-labeled-images-holding-zero-values-as-background-4d13a7675f2d) is used if the sparse flag is enabled. For all other configurations, striding is used.
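
As a rough illustration of the vectorized idea behind the countless 2d family, the sketch below finds the mode of each 2x2 block using only equality tests. It is a simplified variant, not tinybrain's Cython code: `simple_countless_2d` is a name chosen for this example, and the function assumes that no label is zero, since zero doubles as the "no match" marker here.

```python
import numpy as np

def simple_countless_2d(labels):
    """2x2 mode pooling via pairwise comparisons; labels must be nonzero."""
    a = labels[0::2, 0::2]
    b = labels[0::2, 1::2]
    c = labels[1::2, 0::2]
    d = labels[1::2, 1::2]

    # Each product is the shared label where the pair matches, else 0.
    ab = a * (a == b)
    ac = a * (a == c)
    bc = b * (b == c)

    # If any two of a, b, c agree, that label is a mode of the block.
    match = ab | ac | bc
    # Otherwise d is a valid choice: it either matches one of a, b, c
    # or all four labels differ and any of them is acceptable.
    return match + (match == 0) * d

labels = np.array([
    [1, 2, 1, 2],
    [2, 3, 3, 4],
], dtype=np.uint64)

result = simple_countless_2d(labels)
print(result)
```

The first block (1, 2, 2, 3) resolves to 2, and the second block (1, 2, 3, 4), which has no repeated label, falls back to its last element, 4.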

Countless 2d paths are also fast, but use slightly more memory and time. Countless 3D is okay for (2,2,2) and (4,4,4) but will use time and memory exponential in the product of dimensions. This state of affairs could be improved by implementing a counting based algorithm in Cython/C++ for arbitrary factors that doesn't compute recursively. The countless algorithms were developed before I knew how to write Cython and package libraries. However, C++ implementations of countless are much faster than counting for computing the first 2x2x1 mip level. In particular, an AVX2 SIMD implementation can saturate memory bandwidth.    

Documentation for the countless algorithm family is located here: https://github.com/william-silversmith/countless


%package -n python3-tinybrain
Summary:	Image pyramid generation specialized for connectomics data types and procedures.
Provides:	python-tinybrain
BuildRequires:	python3-devel
BuildRequires:	python3-setuptools
BuildRequires:	python3-pip
BuildRequires:	python3-cffi
BuildRequires:	gcc
BuildRequires:	gdb
%description -n python3-tinybrain
[![Build Status](https://travis-ci.org/seung-lab/tinybrain.svg?branch=master)](https://travis-ci.org/seung-lab/tinybrain) [![PyPI version](https://badge.fury.io/py/tinybrain.svg)](https://badge.fury.io/py/tinybrain)  

# tinybrain

Image pyramid generation specialized for connectomics data types and procedures. If your brain wasn't tiny before, it will be now.  

```python 
import tinybrain 

img = load_3d_em_stack()

# 2x2 and 2x2x2 downsamples are on a fast path.
# e.g. (2,2), (2,2,1), (2,2,1,1), (2,2,2), (2,2,2,1)
img_pyramid = tinybrain.downsample_with_averaging(img, factor=(2,2,1), num_mips=5, sparse=False)

labels = load_3d_labels()
label_pyramid = tinybrain.downsample_segmentation(labels, factor=(2,2,1), num_mips=5, sparse=False)
```

## Installation 

```bash
pip install numpy
pip install tinybrain
```

## Motivation

Image hierarchy generation in connectomics uses a few different techniques for
visualizing data, but predominantly we create image pyramids of uint8 grayscale images using 2x2 average pooling and of uint8 to uint64 segmentation labels using 2x2 mode pooling. When images become very large and people wish to visualize upper mip levels using three axes at once, it becomes desirable to perform 2x2x2 downsamples to maintain isotropy.

It's possible to compute both of these using numpy. However, as multiple packages found it useful to copy the downsample functions, it makes sense to formalize these functions into a separate library located on PyPI.

Given the disparate circumstances in which they will be used, these functions should work
as fast as possible with low memory usage and avoid numerical issues such as integer truncation
while generating multiple mip levels.

## Considerations: downsample_with_averaging 

It's advisable to generate multiple mip levels at once rather than recursively computing
new images because, for integer type images, recursion leads to integer truncation issues. In the common
case of 2x2x1 downsampling, a recursively computed image can lose up to 0.75 brightness units per
mip level. Therefore, take advantage of the `num_mips` argument, which strikes a balance
that limits integer truncation loss to once every 4 mip levels. This compromise allows
for the use of integer arithmetic and no more memory usage than 2x the input image including
the output downsamples. If you seek to eliminate the loss beyond 4 mip levels, try promoting
the type before downsampling. 2x2x2x1 downsamples truncate every 8 mip levels.

A C++ high performance path is triggered for 2x2x1x1 and 2x2x2x1 downsample factors on uint8, uint16, float32, 
and float64 data types in Fortran order. Other factors, data types, and orderings are computed using a numpy pathway that is much slower and more memory intensive.

We also include a sparse mode for downsampling 2x2x2 patches, which prevents "ghosting" where one z-slice overlaps a black region on the next slice and becomes semi-transparent after downsampling. We deal with this by neglecting the background pixels from the averaging operation. 

### Example Benchmark 

On a 1024x1024x100 uint8 image I ran the following code. PIL and OpenCV are actually much faster than this benchmark shows because most of the time is spent writing to the numpy array. tinybrain has a large advantage working on 3D and 4D arrays. Of course, this is a very simple benchmark and it may be possible to tune each of these approaches. On single slices, Pillow was faster than tinybrain.

```python
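import time

import cv2
import numpy as np
import tinybrain
from PIL import Image

img = np.load("image.npy")

# "Original" is a separate downsample_with_averaging function (presumably the
# earlier numpy implementation); its definition is not shown here.
s = time.time()
downsample_with_averaging(img, (2,2,1))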
print("Original ", time.time() - s)

s = time.time()
out = tinybrain.downsample_with_averaging(img, (2,2,1))
print("tinybrain ", time.time() - s)

s = time.time()
out = np.zeros(shape=(512,512,100))
for z in range(img.shape[2]):
  out[:,:,z] = cv2.resize(img[:,:,z], dsize=(512, 512) )
print("OpenCV ", time.time() - s)

s = time.time()
out = np.zeros(shape=(512,512,100))
for z in range(img.shape[2]):
  pilimg = Image.fromarray(img[:,:,z])
  out[:,:,z] = pilimg.resize( (512, 512) )
print("Pillow ", time.time() - s)

# Method     Run Time             Rel. Perf.
# Original   1820 ms +/- 3.73 ms    1.0x
# tinybrain    67 ms +/- 0.40 ms   27.2x 
# OpenCV      469 ms +/- 1.12 ms    3.9x
# Pillow      937 ms +/- 7.63 ms    1.9x
```

Here's the output from `perf.py` on a 2021 Apple Silicon MacBook Pro (M1).
Note that the image used was a random 2048x2048x64 array that was a uint8
for average pooling and a uint64 for mode pooling to represent real use cases more fairly. Read each table row as a 2D or 3D downsample, generating one or two mip levels, with sparse mode enabled or disabled. The speed values are in megavoxels per second and are the mean of ten runs.


| dwnsmpl  |   mips  |   sparse  |   AVG (MVx/sec)  |   MODE (MVx/sec)  |
|----------|---------|-----------|------------------|-------------------|
|   2x2    |   1     |   N       |   3856.07        |   1057.87         |
|   2x2    |   2     |   N       |   2685.80        |   1062.69         |
|   2x2    |   1     |   Y       |   N/A            |   129.64          |
|   2x2    |   2     |   Y       |   N/A            |   81.62           |
|   2x2x2  |   1     |   N       |   4468.55        |   336.85          |
|   2x2x2  |   2     |   N       |   2867.80        |   298.45          |
|   2x2x2  |   1     |   Y       |   1389.47        |   337.87          |
|   2x2x2  |   2     |   Y       |   1259.58        |   293.84          |

As the downsampling code's performance is data dependent due to branching, I also used [`connectomics.npy`](https://github.com/seung-lab/connected-components-3d/blob/master/benchmarks/connectomics.npy.gz) (512<sup>3</sup> uint32 extended to uint64) to see how that affected performance. This data comes from mouse visual cortex and has many equal adjacent voxels. In this volume, the 2x2x2 non-sparse mode is much faster as the "instant" majority detection can skip examining half the voxels in many cases.

| dwnsmpl  |   mips  |   sparse  |   MODE (MVx/sec)  |
|----------|---------|-----------|-------------------|
|   2x2    |   1     |   N       |   1078.09         |
|   2x2    |   2     |   N       |   1030.90         |
|   2x2    |   1     |   Y       |   146.15          |
|   2x2    |   2     |   Y       |   69.25           |
|   2x2x2  |   1     |   N       |   1966.74         |
|   2x2x2  |   2     |   N       |   1790.60         |
|   2x2x2  |   1     |   Y       |   2041.96         |
|   2x2x2  |   2     |   Y       |   1758.42         |


## Considerations: downsample_segmentation 

The `downsample_segmentation` function performs mode pooling operations provided the downsample factor is a power of two, including in three dimensions. If the factor is a non-power of two, striding is used. The mode pooling, which is usually what you want, is computed recursively. Mode pooling is superior to striding, but the recursive calculation can introduce defects at mip levels higher than 1. This may be improved in the future.  

The calculation uses an ensemble of several different methods. For (2,2,1,1) and (2,2,2,1) downsamples, a fast, low-memory Cython path is selected. (2,2,1,1) implements [*countless if*](https://towardsdatascience.com/countless-high-performance-2x-downsampling-of-labeled-images-using-python-and-numpy-e70ad3275589). (2,2,2,1) uses a combination of counting and "instant" majority detection. For (4,4,1) or other 2D powers of two, the [*countless 2d*](https://towardsdatascience.com/countless-high-performance-2x-downsampling-of-labeled-images-using-python-and-numpy-e70ad3275589) algorithm is used. For (4,4,4), (8,8,8) etc, the [*dynamic countless 3d*](https://towardsdatascience.com/countless-3d-vectorized-2x-downsampling-of-labeled-volume-images-using-python-and-numpy-59d686c2f75) algorithm is used. For 2D powers of two, [*stippled countless 2d*](https://medium.com/@willsilversmith/countless-2d-inflated-2x-downsampling-of-labeled-images-holding-zero-values-as-background-4d13a7675f2d) is used if the sparse flag is enabled. For all other configurations, striding is used.

Countless 2d paths are also fast, but use slightly more memory and time. Countless 3D is okay for (2,2,2) and (4,4,4) but will use time and memory exponential in the product of dimensions. This state of affairs could be improved by implementing a counting based algorithm in Cython/C++ for arbitrary factors that doesn't compute recursively. The countless algorithms were developed before I knew how to write Cython and package libraries. However, C++ implementations of countless are much faster than counting for computing the first 2x2x1 mip level. In particular, an AVX2 SIMD implementation can saturate memory bandwidth.    

Documentation for the countless algorithm family is located here: https://github.com/william-silversmith/countless


%package help
Summary:	Development documents and examples for tinybrain
Provides:	python3-tinybrain-doc
%description help
[![Build Status](https://travis-ci.org/seung-lab/tinybrain.svg?branch=master)](https://travis-ci.org/seung-lab/tinybrain) [![PyPI version](https://badge.fury.io/py/tinybrain.svg)](https://badge.fury.io/py/tinybrain)  

# tinybrain

Image pyramid generation specialized for connectomics data types and procedures. If your brain wasn't tiny before, it will be now.  

```python 
import tinybrain 

img = load_3d_em_stack()

# 2x2 and 2x2x2 downsamples are on a fast path.
# e.g. (2,2), (2,2,1), (2,2,1,1), (2,2,2), (2,2,2,1)
img_pyramid = tinybrain.downsample_with_averaging(img, factor=(2,2,1), num_mips=5, sparse=False)

labels = load_3d_labels()
label_pyramid = tinybrain.downsample_segmentation(labels, factor=(2,2,1), num_mips=5, sparse=False)
```

## Installation 

```bash
pip install numpy
pip install tinybrain
```

## Motivation

Image hierarchy generation in connectomics uses a few different techniques for
visualizing data, but predominantly we create image pyramids of uint8 grayscale images using 2x2 average pooling and of uint8 to uint64 segmentation labels using 2x2 mode pooling. When images become very large and people wish to visualize upper mip levels using three axes at once, it becomes desirable to perform 2x2x2 downsamples to maintain isotropy.

It's possible to compute both of these using numpy. However, as multiple packages found it useful to copy the downsample functions, it makes sense to formalize these functions into a separate library located on PyPI.

Given the disparate circumstances in which they will be used, these functions should work
as fast as possible with low memory usage and avoid numerical issues such as integer truncation
while generating multiple mip levels.

## Considerations: downsample_with_averaging 

It's advisable to generate multiple mip levels at once rather than recursively computing
new images because, for integer type images, recursion leads to integer truncation issues. In the common
case of 2x2x1 downsampling, a recursively computed image can lose up to 0.75 brightness units per
mip level. Therefore, take advantage of the `num_mips` argument, which strikes a balance
that limits integer truncation loss to once every 4 mip levels. This compromise allows
for the use of integer arithmetic and no more memory usage than 2x the input image including
the output downsamples. If you seek to eliminate the loss beyond 4 mip levels, try promoting
the type before downsampling. 2x2x2x1 downsamples truncate every 8 mip levels.

A C++ high performance path is triggered for 2x2x1x1 and 2x2x2x1 downsample factors on uint8, uint16, float32, 
and float64 data types in Fortran order. Other factors, data types, and orderings are computed using a numpy pathway that is much slower and more memory intensive.

We also include a sparse mode for downsampling 2x2x2 patches, which prevents "ghosting" where one z-slice overlaps a black region on the next slice and becomes semi-transparent after downsampling. We deal with this by neglecting the background pixels from the averaging operation. 

### Example Benchmark 

On a 1024x1024x100 uint8 image I ran the following code. PIL and OpenCV are actually much faster than this benchmark shows because most of the time is spent writing to the numpy array. tinybrain has a large advantage working on 3D and 4D arrays. Of course, this is a very simple benchmark and it may be possible to tune each of these approaches. On single slices, Pillow was faster than tinybrain.

```python
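import time

import cv2
import numpy as np
import tinybrain
from PIL import Image

img = np.load("image.npy")

# "Original" is a separate downsample_with_averaging function (presumably the
# earlier numpy implementation); its definition is not shown here.
s = time.time()
downsample_with_averaging(img, (2,2,1))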
print("Original ", time.time() - s)

s = time.time()
out = tinybrain.downsample_with_averaging(img, (2,2,1))
print("tinybrain ", time.time() - s)

s = time.time()
out = np.zeros(shape=(512,512,100))
for z in range(img.shape[2]):
  out[:,:,z] = cv2.resize(img[:,:,z], dsize=(512, 512) )
print("OpenCV ", time.time() - s)

s = time.time()
out = np.zeros(shape=(512,512,100))
for z in range(img.shape[2]):
  pilimg = Image.fromarray(img[:,:,z])
  out[:,:,z] = pilimg.resize( (512, 512) )
print("Pillow ", time.time() - s)

# Method     Run Time             Rel. Perf.
# Original   1820 ms +/- 3.73 ms    1.0x
# tinybrain    67 ms +/- 0.40 ms   27.2x 
# OpenCV      469 ms +/- 1.12 ms    3.9x
# Pillow      937 ms +/- 7.63 ms    1.9x
```

Here's the output from `perf.py` on a 2021 Apple Silicon MacBook Pro (M1).
Note that the image used was a random 2048x2048x64 array that was a uint8
for average pooling and a uint64 for mode pooling to represent real use cases more fairly. Read each table row as a 2D or 3D downsample, generating one or two mip levels, with sparse mode enabled or disabled. The speed values are in megavoxels per second and are the mean of ten runs.


| dwnsmpl  |   mips  |   sparse  |   AVG (MVx/sec)  |   MODE (MVx/sec)  |
|----------|---------|-----------|------------------|-------------------|
|   2x2    |   1     |   N       |   3856.07        |   1057.87         |
|   2x2    |   2     |   N       |   2685.80        |   1062.69         |
|   2x2    |   1     |   Y       |   N/A            |   129.64          |
|   2x2    |   2     |   Y       |   N/A            |   81.62           |
|   2x2x2  |   1     |   N       |   4468.55        |   336.85          |
|   2x2x2  |   2     |   N       |   2867.80        |   298.45          |
|   2x2x2  |   1     |   Y       |   1389.47        |   337.87          |
|   2x2x2  |   2     |   Y       |   1259.58        |   293.84          |

As the downsampling code's performance is data dependent due to branching, I also used [`connectomics.npy`](https://github.com/seung-lab/connected-components-3d/blob/master/benchmarks/connectomics.npy.gz) (512<sup>3</sup> uint32 extended to uint64) to see how that affected performance. This data comes from mouse visual cortex and has many equal adjacent voxels. In this volume, the 2x2x2 non-sparse mode is much faster as the "instant" majority detection can skip examining half the voxels in many cases.

| dwnsmpl  |   mips  |   sparse  |   MODE (MVx/sec)  |
|----------|---------|-----------|-------------------|
|   2x2    |   1     |   N       |   1078.09         |
|   2x2    |   2     |   N       |   1030.90         |
|   2x2    |   1     |   Y       |   146.15          |
|   2x2    |   2     |   Y       |   69.25           |
|   2x2x2  |   1     |   N       |   1966.74         |
|   2x2x2  |   2     |   N       |   1790.60         |
|   2x2x2  |   1     |   Y       |   2041.96         |
|   2x2x2  |   2     |   Y       |   1758.42         |


## Considerations: downsample_segmentation 

The `downsample_segmentation` function performs mode pooling operations provided the downsample factor is a power of two, including in three dimensions. If the factor is a non-power of two, striding is used. The mode pooling, which is usually what you want, is computed recursively. Mode pooling is superior to striding, but the recursive calculation can introduce defects at mip levels higher than 1. This may be improved in the future.  

The calculation uses an ensemble of several different methods. For (2,2,1,1) and (2,2,2,1) downsamples, a fast, low-memory Cython path is selected. (2,2,1,1) implements [*countless if*](https://towardsdatascience.com/countless-high-performance-2x-downsampling-of-labeled-images-using-python-and-numpy-e70ad3275589). (2,2,2,1) uses a combination of counting and "instant" majority detection. For (4,4,1) or other 2D powers of two, the [*countless 2d*](https://towardsdatascience.com/countless-high-performance-2x-downsampling-of-labeled-images-using-python-and-numpy-e70ad3275589) algorithm is used. For (4,4,4), (8,8,8) etc, the [*dynamic countless 3d*](https://towardsdatascience.com/countless-3d-vectorized-2x-downsampling-of-labeled-volume-images-using-python-and-numpy-59d686c2f75) algorithm is used. For 2D powers of two, [*stippled countless 2d*](https://medium.com/@willsilversmith/countless-2d-inflated-2x-downsampling-of-labeled-images-holding-zero-values-as-background-4d13a7675f2d) is used if the sparse flag is enabled. For all other configurations, striding is used.

Countless 2d paths are also fast, but use slightly more memory and time. Countless 3D is okay for (2,2,2) and (4,4,4) but will use time and memory exponential in the product of dimensions. This state of affairs could be improved by implementing a counting based algorithm in Cython/C++ for arbitrary factors that doesn't compute recursively. The countless algorithms were developed before I knew how to write Cython and package libraries. However, C++ implementations of countless are much faster than counting for computing the first 2x2x1 mip level. In particular, an AVX2 SIMD implementation can saturate memory bandwidth.    

Documentation for the countless algorithm family is located here: https://github.com/william-silversmith/countless


%prep
%autosetup -n tinybrain-1.3.3

%build
%py3_build

%install
%py3_install
install -d -m755 %{buildroot}/%{_pkgdocdir}
if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
pushd %{buildroot}
if [ -d usr/lib ]; then
	find usr/lib -type f -printf "\"/%h/%f\"\n" >> filelist.lst
fi
if [ -d usr/lib64 ]; then
	find usr/lib64 -type f -printf "\"/%h/%f\"\n" >> filelist.lst
fi
if [ -d usr/bin ]; then
	find usr/bin -type f -printf "\"/%h/%f\"\n" >> filelist.lst
fi
if [ -d usr/sbin ]; then
	find usr/sbin -type f -printf "\"/%h/%f\"\n" >> filelist.lst
fi
touch doclist.lst
if [ -d usr/share/man ]; then
	find usr/share/man -type f -printf "\"/%h/%f.gz\"\n" >> doclist.lst
fi
popd
mv %{buildroot}/filelist.lst .
mv %{buildroot}/doclist.lst .

%files -n python3-tinybrain -f filelist.lst
%dir %{python3_sitearch}/*

%files help -f doclist.lst
%{_docdir}/*

%changelog
* Thu Jun 08 2023 Python_Bot <Python_Bot@openeuler.org> - 1.3.3-1
- Package Spec generated