%global _empty_manifest_terminate_build 0
Name: python-clustergram
Version: 0.7.0
Release: 1
Summary: Clustergram - visualization and diagnostics for cluster analysis
License: MIT
URL: https://pypi.org/project/clustergram/
Source0: https://mirrors.aliyun.com/pypi/web/packages/3b/03/2bf3032fd8ae1f0201579d8d020099e62c30e62519a8c5f7ae73a1166b8e/clustergram-0.7.0.tar.gz
BuildArch: noarch

Requires: python3-pandas
Requires: python3-numpy
Requires: python3-matplotlib

%description
# Clustergram

![logo clustergram](https://raw.githubusercontent.com/martinfleis/clustergram/main/doc/_static/logo.svg)

## Visualization and diagnostics for cluster analysis

[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.4750483.svg)](https://doi.org/10.5281/zenodo.4750483)

Clustergram is a diagram proposed by Matthias Schonlau in his paper *[The clustergram: A graph for visualizing hierarchical and nonhierarchical cluster analyses](https://journals.sagepub.com/doi/10.1177/1536867X0200200405)*:

> In hierarchical cluster analysis, dendrograms are used to visualize how clusters are
> formed. I propose an alternative graph called a “clustergram” to examine how cluster
> members are assigned to clusters as the number of clusters increases. This graph is
> useful in exploratory analysis for nonhierarchical clustering algorithms such as
> k-means and for hierarchical cluster algorithms when the number of observations is
> large enough to make dendrograms impractical.

The clustergram was later implemented in R by [Tal Galili](https://www.r-statistics.com/2010/06/clustergram-visualization-and-diagnostics-for-cluster-analysis-r-code/), who also gives a thorough explanation of the concept.

This is a Python implementation, originally based on Tal's script, written for the `scikit-learn` and RAPIDS `cuML` implementations of K-Means, Mini Batch K-Means and Gaussian Mixture Model (scikit-learn only) clustering, plus hierarchical/agglomerative clustering using `SciPy`. Alternatively, you can create a clustergram using `from_*` constructors based on alternative clustering algorithms.

[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/martinfleis/clustergram/main?urlpath=tree/doc/notebooks/)

## Getting started

You can install clustergram from `conda` or `pip`:

```shell
conda install clustergram -c conda-forge
```

```shell
pip install clustergram
```

In any case, you still need to install your selected backend (`scikit-learn` and `scipy`, or `cuML`).

An example of a clustergram on the Palmer penguins dataset:

```python
import seaborn

df = seaborn.load_dataset('penguins')
```

First, we have to select the numerical data and scale them.

```python
from sklearn.preprocessing import scale

data = scale(df.drop(columns=['species', 'island', 'sex']).dropna())
```

Then we can simply pass the data to `Clustergram`.

```python
from clustergram import Clustergram

cgram = Clustergram(range(1, 8))
cgram.fit(data)
cgram.plot()
```

![Default clustergram](https://raw.githubusercontent.com/martinfleis/clustergram/master/doc/_static/default.png)

## Styling

`Clustergram.plot()` returns a matplotlib axis and can be fully customised like any other matplotlib plot.
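The styling call below draws into an existing matplotlib axis passed as `ax`. A minimal sketch of creating one beforehand (the figure size here is an arbitrary illustrative choice):

```python
import matplotlib.pyplot as plt

# create a figure and an axis for the clustergram to draw into;
# this `ax` is what the styling call below receives
fig, ax = plt.subplots(figsize=(12, 8))
```

Any pre-existing axis, for example one panel of a larger figure, works the same way.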
```python
seaborn.set(style='whitegrid')

cgram.plot(
    ax=ax,  # an existing matplotlib axis to draw into
    size=0.5,
    linewidth=0.5,
    cluster_style={"color": "lightblue", "edgecolor": "black"},
    line_style={"color": "red", "linestyle": "-."},
    figsize=(12, 8),
)
```

![Colored clustergram](https://raw.githubusercontent.com/martinfleis/clustergram/master/doc/_static/colors.png)

## Mean options

On the `y` axis, a clustergram can use mean values as in the original paper by Matthias Schonlau or PCA-weighted mean values as in the implementation by Tal Galili.

```python
cgram = Clustergram(range(1, 8))
cgram.fit(data)
cgram.plot(figsize=(12, 8), pca_weighted=True)
```

![Default clustergram](https://raw.githubusercontent.com/martinfleis/clustergram/master/doc/_static/pca_true.png)

```python
cgram = Clustergram(range(1, 8))
cgram.fit(data)
cgram.plot(figsize=(12, 8), pca_weighted=False)
```

![Default clustergram](https://raw.githubusercontent.com/martinfleis/clustergram/master/doc/_static/pca_false.png)

## Scikit-learn, SciPy and RAPIDS cuML backends

Clustergram offers three backends for the computation: `scikit-learn` and `scipy`, which run on the CPU, and RAPIDS.AI `cuML`, which runs on the GPU. All of them are optional dependencies, but you will need at least one of them to generate a clustergram.

Using `scikit-learn` (default):

```python
cgram = Clustergram(range(1, 8), backend='sklearn')
cgram.fit(data)
cgram.plot()
```

Using `cuML`:

```python
cgram = Clustergram(range(1, 8), backend='cuML')
cgram.fit(data)
cgram.plot()
```

`data` can be any data type supported by the selected backend (including `cudf.DataFrame` with the `cuML` backend).

## Supported methods

Clustergram currently supports K-Means, Mini Batch K-Means, Gaussian Mixture Model and SciPy's hierarchical clustering methods. Note that GMM and Mini Batch K-Means are supported only with the `scikit-learn` backend, and hierarchical methods only with the `scipy` backend.

Using K-Means (default):

```python
cgram = Clustergram(range(1, 8), method='kmeans')
cgram.fit(data)
cgram.plot()
```

Using Mini Batch K-Means, which can provide a significant speedup over K-Means:

```python
cgram = Clustergram(range(1, 8), method='minibatchkmeans', batch_size=100)
cgram.fit(data)
cgram.plot()
```

Using Gaussian Mixture Model:

```python
cgram = Clustergram(range(1, 8), method='gmm')
cgram.fit(data)
cgram.plot()
```

Using Ward's hierarchical clustering:

```python
cgram = Clustergram(range(1, 8), method='hierarchical', linkage='ward')
cgram.fit(data)
cgram.plot()
```

## Manual input

Alternatively, you can create a clustergram using the `from_data` or `from_centers` methods based on alternative clustering algorithms.

Using `Clustergram.from_data`, which creates cluster centers as mean or median values:

```python
import numpy
import pandas

data = numpy.array([[-1, -1, 0, 10], [1, 1, 10, 2], [0, 0, 20, 4]])
labels = pandas.DataFrame({1: [0, 0, 0], 2: [0, 0, 1], 3: [0, 2, 1]})

cgram = Clustergram.from_data(data, labels)
cgram.plot()
```

Using `Clustergram.from_centers`, based on explicit cluster centers:

```python
import numpy as np

labels = pandas.DataFrame({1: [0, 0, 0], 2: [0, 0, 1], 3: [0, 2, 1]})
centers = {
    1: np.array([[0, 0]]),
    2: np.array([[-1, -1], [1, 1]]),
    3: np.array([[-1, -1], [1, 1], [0, 0]]),
}

cgram = Clustergram.from_centers(centers, labels)
cgram.plot(pca_weighted=False)
```

To support PCA-weighted plots you also need to pass data:

```python
cgram = Clustergram.from_centers(centers, labels, data=data)
cgram.plot()
```
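The `labels` passed to `from_data` can come from any algorithm that assigns one integer label per observation for each tested number of clusters. A minimal sketch using scikit-learn's `AgglomerativeClustering` on synthetic `make_blobs` data (both the estimator and the synthetic input are illustrative assumptions, not part of clustergram itself):

```python
import pandas as pd
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs

from clustergram import Clustergram

# illustrative data; any (n_samples, n_features) array works
X, _ = make_blobs(n_samples=100, centers=3, n_features=4, random_state=0)

# one column of integer labels per tested number of clusters
labels = pd.DataFrame(
    {k: AgglomerativeClustering(n_clusters=k).fit_predict(X) for k in range(1, 8)}
)

cgram = Clustergram.from_data(X, labels)
cgram.plot()
```

The resulting object can then be plotted and inspected like any other `Clustergram`.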
## Partial plot

`Clustergram.plot()` can also plot only a part of the diagram, if you want to focus on a limited range of `k`.

```python
cgram = Clustergram(range(1, 20))
cgram.fit(data)
cgram.plot(figsize=(12, 8))
```

![Long clustergram](https://raw.githubusercontent.com/martinfleis/clustergram/master/doc/_static/20_clusters.png)

```python
cgram.plot(k_range=range(3, 10), figsize=(12, 8))
```

![Limited clustergram](https://raw.githubusercontent.com/martinfleis/clustergram/master/doc/_static/limited_plot.png)

## Additional clustering performance evaluation

Clustergram includes handy wrappers around a selection of clustering performance metrics offered by `scikit-learn`. Data originally computed on the GPU are converted to numpy on the fly.

### Silhouette score

Compute the mean Silhouette Coefficient of all samples. See the [`scikit-learn` documentation](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.silhouette_score.html#sklearn.metrics.silhouette_score) for details.

```python
>>> cgram.silhouette_score()
2    0.531540
3    0.447219
4    0.400154
5    0.377720
6    0.372128
7    0.331575
Name: silhouette_score, dtype: float64
```

Once computed, the resulting Series is available as `cgram.silhouette`. Calling the original method will recompute the score.

### Calinski and Harabasz score

Compute the Calinski and Harabasz score, also known as the Variance Ratio Criterion. See the [`scikit-learn` documentation](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.calinski_harabasz_score.html#sklearn.metrics.calinski_harabasz_score) for details.

```python
>>> cgram.calinski_harabasz_score()
2    482.191469
3    441.677075
4    400.392131
5    411.175066
6    382.731416
7    352.447569
Name: calinski_harabasz_score, dtype: float64
```

Once computed, the resulting Series is available as `cgram.calinski_harabasz`. Calling the original method will recompute the score.

### Davies-Bouldin score

Compute the Davies-Bouldin score. See the [`scikit-learn` documentation](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.davies_bouldin_score.html#sklearn.metrics.davies_bouldin_score) for details.

```python
>>> cgram.davies_bouldin_score()
2    0.714064
3    0.943553
4    0.943320
5    0.973248
6    0.950910
7    1.074937
Name: davies_bouldin_score, dtype: float64
```

Once computed, the resulting Series is available as `cgram.davies_bouldin`. Calling the original method will recompute the score.

## Accessing labels

`Clustergram` stores the resulting labels for each of the tested options, which can be accessed as:

```python
>>> cgram.labels
     1  2  3  4  5  6  7
0    0  0  2  2  3  2  1
1    0  0  2  2  3  2  1
2    0  0  2  2  3  2  1
3    0  0  2  2  3  2  1
4    0  0  2  2  0  0  3
..  .. .. .. .. .. .. ..
337  0  1  1  3  2  5  0
338  0  1  1  3  2  5  0
339  0  1  1  1  1  1  4
340  0  1  1  3  2  5  5
341  0  1  1  1  1  1  5
```

## Saving clustergram

You can save both the plot and the `clustergram.Clustergram` object to disk.

### Saving plot

`Clustergram.plot()` returns a matplotlib axis object and as such can be saved like any other plot:

```python
import matplotlib.pyplot as plt

cgram.plot()
plt.savefig('clustergram.svg')
```

### Saving object

If you want to save your computed `clustergram.Clustergram` object to disk, you can use the `pickle` library:

```python
import pickle

with open('clustergram.pickle', 'wb') as f:
    pickle.dump(cgram, f)
```

Then loading is equally simple:

```python
with open('clustergram.pickle', 'rb') as f:
    loaded = pickle.load(f)
```

## References

Schonlau M. The clustergram: a graph for visualizing hierarchical and non-hierarchical cluster analyses. The Stata Journal. 2002; 2(4):391-402.

Schonlau M. Visualizing Hierarchical and Non-Hierarchical Cluster Analyses with Clustergrams. Computational Statistics. 2004; 19(1):95-111.
[https://www.r-statistics.com/2010/06/clustergram-visualization-and-diagnostics-for-cluster-analysis-r-code/](https://www.r-statistics.com/2010/06/clustergram-visualization-and-diagnostics-for-cluster-analysis-r-code/)

%package -n python3-clustergram
Summary: Clustergram - visualization and diagnostics for cluster analysis
Provides: python-clustergram
BuildRequires: python3-devel
BuildRequires: python3-setuptools
BuildRequires: python3-pip
%description -n python3-clustergram
# Clustergram

![logo clustergram](https://raw.githubusercontent.com/martinfleis/clustergram/main/doc/_static/logo.svg)

## Visualization and diagnostics for cluster analysis

[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.4750483.svg)](https://doi.org/10.5281/zenodo.4750483)

Clustergram is a diagram proposed by Matthias Schonlau in his paper *[The clustergram: A graph for visualizing hierarchical and nonhierarchical cluster analyses](https://journals.sagepub.com/doi/10.1177/1536867X0200200405)*:

> In hierarchical cluster analysis, dendrograms are used to visualize how clusters are
> formed. I propose an alternative graph called a “clustergram” to examine how cluster
> members are assigned to clusters as the number of clusters increases. This graph is
> useful in exploratory analysis for nonhierarchical clustering algorithms such as
> k-means and for hierarchical cluster algorithms when the number of observations is
> large enough to make dendrograms impractical.

The clustergram was later implemented in R by [Tal Galili](https://www.r-statistics.com/2010/06/clustergram-visualization-and-diagnostics-for-cluster-analysis-r-code/), who also gives a thorough explanation of the concept.

This is a Python implementation, originally based on Tal's script, written for the `scikit-learn` and RAPIDS `cuML` implementations of K-Means, Mini Batch K-Means and Gaussian Mixture Model (scikit-learn only) clustering, plus hierarchical/agglomerative clustering using `SciPy`. Alternatively, you can create a clustergram using `from_*` constructors based on alternative clustering algorithms.

[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/martinfleis/clustergram/main?urlpath=tree/doc/notebooks/)

## Getting started

You can install clustergram from `conda` or `pip`:

```shell
conda install clustergram -c conda-forge
```

```shell
pip install clustergram
```

In any case, you still need to install your selected backend (`scikit-learn` and `scipy`, or `cuML`).

An example of a clustergram on the Palmer penguins dataset:

```python
import seaborn

df = seaborn.load_dataset('penguins')
```

First, we have to select the numerical data and scale them.

```python
from sklearn.preprocessing import scale

data = scale(df.drop(columns=['species', 'island', 'sex']).dropna())
```

Then we can simply pass the data to `Clustergram`.

```python
from clustergram import Clustergram

cgram = Clustergram(range(1, 8))
cgram.fit(data)
cgram.plot()
```

![Default clustergram](https://raw.githubusercontent.com/martinfleis/clustergram/master/doc/_static/default.png)

## Styling

`Clustergram.plot()` returns a matplotlib axis and can be fully customised like any other matplotlib plot.
```python
seaborn.set(style='whitegrid')

cgram.plot(
    ax=ax,  # an existing matplotlib axis to draw into
    size=0.5,
    linewidth=0.5,
    cluster_style={"color": "lightblue", "edgecolor": "black"},
    line_style={"color": "red", "linestyle": "-."},
    figsize=(12, 8),
)
```

![Colored clustergram](https://raw.githubusercontent.com/martinfleis/clustergram/master/doc/_static/colors.png)

## Mean options

On the `y` axis, a clustergram can use mean values as in the original paper by Matthias Schonlau or PCA-weighted mean values as in the implementation by Tal Galili.

```python
cgram = Clustergram(range(1, 8))
cgram.fit(data)
cgram.plot(figsize=(12, 8), pca_weighted=True)
```

![Default clustergram](https://raw.githubusercontent.com/martinfleis/clustergram/master/doc/_static/pca_true.png)

```python
cgram = Clustergram(range(1, 8))
cgram.fit(data)
cgram.plot(figsize=(12, 8), pca_weighted=False)
```

![Default clustergram](https://raw.githubusercontent.com/martinfleis/clustergram/master/doc/_static/pca_false.png)

## Scikit-learn, SciPy and RAPIDS cuML backends

Clustergram offers three backends for the computation: `scikit-learn` and `scipy`, which run on the CPU, and RAPIDS.AI `cuML`, which runs on the GPU. All of them are optional dependencies, but you will need at least one of them to generate a clustergram.

Using `scikit-learn` (default):

```python
cgram = Clustergram(range(1, 8), backend='sklearn')
cgram.fit(data)
cgram.plot()
```

Using `cuML`:

```python
cgram = Clustergram(range(1, 8), backend='cuML')
cgram.fit(data)
cgram.plot()
```

`data` can be any data type supported by the selected backend (including `cudf.DataFrame` with the `cuML` backend).

## Supported methods

Clustergram currently supports K-Means, Mini Batch K-Means, Gaussian Mixture Model and SciPy's hierarchical clustering methods. Note that GMM and Mini Batch K-Means are supported only with the `scikit-learn` backend, and hierarchical methods only with the `scipy` backend.

Using K-Means (default):

```python
cgram = Clustergram(range(1, 8), method='kmeans')
cgram.fit(data)
cgram.plot()
```

Using Mini Batch K-Means, which can provide a significant speedup over K-Means:

```python
cgram = Clustergram(range(1, 8), method='minibatchkmeans', batch_size=100)
cgram.fit(data)
cgram.plot()
```

Using Gaussian Mixture Model:

```python
cgram = Clustergram(range(1, 8), method='gmm')
cgram.fit(data)
cgram.plot()
```

Using Ward's hierarchical clustering:

```python
cgram = Clustergram(range(1, 8), method='hierarchical', linkage='ward')
cgram.fit(data)
cgram.plot()
```

## Manual input

Alternatively, you can create a clustergram using the `from_data` or `from_centers` methods based on alternative clustering algorithms.

Using `Clustergram.from_data`, which creates cluster centers as mean or median values:

```python
import numpy
import pandas

data = numpy.array([[-1, -1, 0, 10], [1, 1, 10, 2], [0, 0, 20, 4]])
labels = pandas.DataFrame({1: [0, 0, 0], 2: [0, 0, 1], 3: [0, 2, 1]})

cgram = Clustergram.from_data(data, labels)
cgram.plot()
```

Using `Clustergram.from_centers`, based on explicit cluster centers:

```python
import numpy as np

labels = pandas.DataFrame({1: [0, 0, 0], 2: [0, 0, 1], 3: [0, 2, 1]})
centers = {
    1: np.array([[0, 0]]),
    2: np.array([[-1, -1], [1, 1]]),
    3: np.array([[-1, -1], [1, 1], [0, 0]]),
}

cgram = Clustergram.from_centers(centers, labels)
cgram.plot(pca_weighted=False)
```

To support PCA-weighted plots you also need to pass data:

```python
cgram = Clustergram.from_centers(centers, labels, data=data)
cgram.plot()
```

## Partial plot

`Clustergram.plot()` can also plot only a part of the diagram, if you want to focus on a limited range of `k`.
```python
cgram = Clustergram(range(1, 20))
cgram.fit(data)
cgram.plot(figsize=(12, 8))
```

![Long clustergram](https://raw.githubusercontent.com/martinfleis/clustergram/master/doc/_static/20_clusters.png)

```python
cgram.plot(k_range=range(3, 10), figsize=(12, 8))
```

![Limited clustergram](https://raw.githubusercontent.com/martinfleis/clustergram/master/doc/_static/limited_plot.png)

## Additional clustering performance evaluation

Clustergram includes handy wrappers around a selection of clustering performance metrics offered by `scikit-learn`. Data originally computed on the GPU are converted to numpy on the fly.

### Silhouette score

Compute the mean Silhouette Coefficient of all samples. See the [`scikit-learn` documentation](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.silhouette_score.html#sklearn.metrics.silhouette_score) for details.

```python
>>> cgram.silhouette_score()
2    0.531540
3    0.447219
4    0.400154
5    0.377720
6    0.372128
7    0.331575
Name: silhouette_score, dtype: float64
```

Once computed, the resulting Series is available as `cgram.silhouette`. Calling the original method will recompute the score.

### Calinski and Harabasz score

Compute the Calinski and Harabasz score, also known as the Variance Ratio Criterion. See the [`scikit-learn` documentation](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.calinski_harabasz_score.html#sklearn.metrics.calinski_harabasz_score) for details.

```python
>>> cgram.calinski_harabasz_score()
2    482.191469
3    441.677075
4    400.392131
5    411.175066
6    382.731416
7    352.447569
Name: calinski_harabasz_score, dtype: float64
```

Once computed, the resulting Series is available as `cgram.calinski_harabasz`. Calling the original method will recompute the score.

### Davies-Bouldin score

Compute the Davies-Bouldin score. See the [`scikit-learn` documentation](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.davies_bouldin_score.html#sklearn.metrics.davies_bouldin_score) for details.

```python
>>> cgram.davies_bouldin_score()
2    0.714064
3    0.943553
4    0.943320
5    0.973248
6    0.950910
7    1.074937
Name: davies_bouldin_score, dtype: float64
```

Once computed, the resulting Series is available as `cgram.davies_bouldin`. Calling the original method will recompute the score.

## Accessing labels

`Clustergram` stores the resulting labels for each of the tested options, which can be accessed as:

```python
>>> cgram.labels
     1  2  3  4  5  6  7
0    0  0  2  2  3  2  1
1    0  0  2  2  3  2  1
2    0  0  2  2  3  2  1
3    0  0  2  2  3  2  1
4    0  0  2  2  0  0  3
..  .. .. .. .. .. .. ..
337  0  1  1  3  2  5  0
338  0  1  1  3  2  5  0
339  0  1  1  1  1  1  4
340  0  1  1  3  2  5  5
341  0  1  1  1  1  1  5
```

## Saving clustergram

You can save both the plot and the `clustergram.Clustergram` object to disk.

### Saving plot

`Clustergram.plot()` returns a matplotlib axis object and as such can be saved like any other plot:

```python
import matplotlib.pyplot as plt

cgram.plot()
plt.savefig('clustergram.svg')
```

### Saving object

If you want to save your computed `clustergram.Clustergram` object to disk, you can use the `pickle` library:

```python
import pickle

with open('clustergram.pickle', 'wb') as f:
    pickle.dump(cgram, f)
```

Then loading is equally simple:

```python
with open('clustergram.pickle', 'rb') as f:
    loaded = pickle.load(f)
```

## References

Schonlau M. The clustergram: a graph for visualizing hierarchical and non-hierarchical cluster analyses. The Stata Journal. 2002; 2(4):391-402.

Schonlau M. Visualizing Hierarchical and Non-Hierarchical Cluster Analyses with Clustergrams. Computational Statistics. 2004; 19(1):95-111.
[https://www.r-statistics.com/2010/06/clustergram-visualization-and-diagnostics-for-cluster-analysis-r-code/](https://www.r-statistics.com/2010/06/clustergram-visualization-and-diagnostics-for-cluster-analysis-r-code/)

%package help
Summary: Development documents and examples for clustergram
Provides: python3-clustergram-doc
%description help
# Clustergram

![logo clustergram](https://raw.githubusercontent.com/martinfleis/clustergram/main/doc/_static/logo.svg)

## Visualization and diagnostics for cluster analysis

[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.4750483.svg)](https://doi.org/10.5281/zenodo.4750483)

Clustergram is a diagram proposed by Matthias Schonlau in his paper *[The clustergram: A graph for visualizing hierarchical and nonhierarchical cluster analyses](https://journals.sagepub.com/doi/10.1177/1536867X0200200405)*:

> In hierarchical cluster analysis, dendrograms are used to visualize how clusters are
> formed. I propose an alternative graph called a “clustergram” to examine how cluster
> members are assigned to clusters as the number of clusters increases. This graph is
> useful in exploratory analysis for nonhierarchical clustering algorithms such as
> k-means and for hierarchical cluster algorithms when the number of observations is
> large enough to make dendrograms impractical.

The clustergram was later implemented in R by [Tal Galili](https://www.r-statistics.com/2010/06/clustergram-visualization-and-diagnostics-for-cluster-analysis-r-code/), who also gives a thorough explanation of the concept.

This is a Python implementation, originally based on Tal's script, written for the `scikit-learn` and RAPIDS `cuML` implementations of K-Means, Mini Batch K-Means and Gaussian Mixture Model (scikit-learn only) clustering, plus hierarchical/agglomerative clustering using `SciPy`. Alternatively, you can create a clustergram using `from_*` constructors based on alternative clustering algorithms.

[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/martinfleis/clustergram/main?urlpath=tree/doc/notebooks/)

## Getting started

You can install clustergram from `conda` or `pip`:

```shell
conda install clustergram -c conda-forge
```

```shell
pip install clustergram
```

In any case, you still need to install your selected backend (`scikit-learn` and `scipy`, or `cuML`).

An example of a clustergram on the Palmer penguins dataset:

```python
import seaborn

df = seaborn.load_dataset('penguins')
```

First, we have to select the numerical data and scale them.

```python
from sklearn.preprocessing import scale

data = scale(df.drop(columns=['species', 'island', 'sex']).dropna())
```

Then we can simply pass the data to `Clustergram`.

```python
from clustergram import Clustergram

cgram = Clustergram(range(1, 8))
cgram.fit(data)
cgram.plot()
```

![Default clustergram](https://raw.githubusercontent.com/martinfleis/clustergram/master/doc/_static/default.png)

## Styling

`Clustergram.plot()` returns a matplotlib axis and can be fully customised like any other matplotlib plot.

```python
seaborn.set(style='whitegrid')

cgram.plot(
    ax=ax,  # an existing matplotlib axis to draw into
    size=0.5,
    linewidth=0.5,
    cluster_style={"color": "lightblue", "edgecolor": "black"},
    line_style={"color": "red", "linestyle": "-."},
    figsize=(12, 8),
)
```

![Colored clustergram](https://raw.githubusercontent.com/martinfleis/clustergram/master/doc/_static/colors.png)

## Mean options

On the `y` axis, a clustergram can use mean values as in the original paper by Matthias Schonlau or PCA-weighted mean values as in the implementation by Tal Galili.
```python
cgram = Clustergram(range(1, 8))
cgram.fit(data)
cgram.plot(figsize=(12, 8), pca_weighted=True)
```

![Default clustergram](https://raw.githubusercontent.com/martinfleis/clustergram/master/doc/_static/pca_true.png)

```python
cgram = Clustergram(range(1, 8))
cgram.fit(data)
cgram.plot(figsize=(12, 8), pca_weighted=False)
```

![Default clustergram](https://raw.githubusercontent.com/martinfleis/clustergram/master/doc/_static/pca_false.png)

## Scikit-learn, SciPy and RAPIDS cuML backends

Clustergram offers three backends for the computation: `scikit-learn` and `scipy`, which run on the CPU, and RAPIDS.AI `cuML`, which runs on the GPU. All of them are optional dependencies, but you will need at least one of them to generate a clustergram.

Using `scikit-learn` (default):

```python
cgram = Clustergram(range(1, 8), backend='sklearn')
cgram.fit(data)
cgram.plot()
```

Using `cuML`:

```python
cgram = Clustergram(range(1, 8), backend='cuML')
cgram.fit(data)
cgram.plot()
```

`data` can be any data type supported by the selected backend (including `cudf.DataFrame` with the `cuML` backend).

## Supported methods

Clustergram currently supports K-Means, Mini Batch K-Means, Gaussian Mixture Model and SciPy's hierarchical clustering methods. Note that GMM and Mini Batch K-Means are supported only with the `scikit-learn` backend, and hierarchical methods only with the `scipy` backend.

Using K-Means (default):

```python
cgram = Clustergram(range(1, 8), method='kmeans')
cgram.fit(data)
cgram.plot()
```

Using Mini Batch K-Means, which can provide a significant speedup over K-Means:

```python
cgram = Clustergram(range(1, 8), method='minibatchkmeans', batch_size=100)
cgram.fit(data)
cgram.plot()
```

Using Gaussian Mixture Model:

```python
cgram = Clustergram(range(1, 8), method='gmm')
cgram.fit(data)
cgram.plot()
```

Using Ward's hierarchical clustering:

```python
cgram = Clustergram(range(1, 8), method='hierarchical', linkage='ward')
cgram.fit(data)
cgram.plot()
```

## Manual input

Alternatively, you can create a clustergram using the `from_data` or `from_centers` methods based on alternative clustering algorithms.

Using `Clustergram.from_data`, which creates cluster centers as mean or median values:

```python
import numpy
import pandas

data = numpy.array([[-1, -1, 0, 10], [1, 1, 10, 2], [0, 0, 20, 4]])
labels = pandas.DataFrame({1: [0, 0, 0], 2: [0, 0, 1], 3: [0, 2, 1]})

cgram = Clustergram.from_data(data, labels)
cgram.plot()
```

Using `Clustergram.from_centers`, based on explicit cluster centers:

```python
import numpy as np

labels = pandas.DataFrame({1: [0, 0, 0], 2: [0, 0, 1], 3: [0, 2, 1]})
centers = {
    1: np.array([[0, 0]]),
    2: np.array([[-1, -1], [1, 1]]),
    3: np.array([[-1, -1], [1, 1], [0, 0]]),
}

cgram = Clustergram.from_centers(centers, labels)
cgram.plot(pca_weighted=False)
```

To support PCA-weighted plots you also need to pass data:

```python
cgram = Clustergram.from_centers(centers, labels, data=data)
cgram.plot()
```

## Partial plot

`Clustergram.plot()` can also plot only a part of the diagram, if you want to focus on a limited range of `k`.
```python
cgram = Clustergram(range(1, 20))
cgram.fit(data)
cgram.plot(figsize=(12, 8))
```

![Long clustergram](https://raw.githubusercontent.com/martinfleis/clustergram/master/doc/_static/20_clusters.png)

```python
cgram.plot(k_range=range(3, 10), figsize=(12, 8))
```

![Limited clustergram](https://raw.githubusercontent.com/martinfleis/clustergram/master/doc/_static/limited_plot.png)

## Additional clustering performance evaluation

Clustergram includes handy wrappers around a selection of clustering performance metrics offered by `scikit-learn`. Data originally computed on the GPU are converted to numpy on the fly.

### Silhouette score

Compute the mean Silhouette Coefficient of all samples. See the [`scikit-learn` documentation](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.silhouette_score.html#sklearn.metrics.silhouette_score) for details.

```python
>>> cgram.silhouette_score()
2    0.531540
3    0.447219
4    0.400154
5    0.377720
6    0.372128
7    0.331575
Name: silhouette_score, dtype: float64
```

Once computed, the resulting Series is available as `cgram.silhouette`. Calling the original method will recompute the score.

### Calinski and Harabasz score

Compute the Calinski and Harabasz score, also known as the Variance Ratio Criterion. See the [`scikit-learn` documentation](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.calinski_harabasz_score.html#sklearn.metrics.calinski_harabasz_score) for details.

```python
>>> cgram.calinski_harabasz_score()
2    482.191469
3    441.677075
4    400.392131
5    411.175066
6    382.731416
7    352.447569
Name: calinski_harabasz_score, dtype: float64
```

Once computed, the resulting Series is available as `cgram.calinski_harabasz`. Calling the original method will recompute the score.

### Davies-Bouldin score

Compute the Davies-Bouldin score. See the [`scikit-learn` documentation](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.davies_bouldin_score.html#sklearn.metrics.davies_bouldin_score) for details.

```python
>>> cgram.davies_bouldin_score()
2    0.714064
3    0.943553
4    0.943320
5    0.973248
6    0.950910
7    1.074937
Name: davies_bouldin_score, dtype: float64
```

Once computed, the resulting Series is available as `cgram.davies_bouldin`. Calling the original method will recompute the score.

## Accessing labels

`Clustergram` stores the resulting labels for each of the tested options, which can be accessed as:

```python
>>> cgram.labels
     1  2  3  4  5  6  7
0    0  0  2  2  3  2  1
1    0  0  2  2  3  2  1
2    0  0  2  2  3  2  1
3    0  0  2  2  3  2  1
4    0  0  2  2  0  0  3
..  .. .. .. .. .. .. ..
337  0  1  1  3  2  5  0
338  0  1  1  3  2  5  0
339  0  1  1  1  1  1  4
340  0  1  1  3  2  5  5
341  0  1  1  1  1  1  5
```

## Saving clustergram

You can save both the plot and the `clustergram.Clustergram` object to disk.

### Saving plot

`Clustergram.plot()` returns a matplotlib axis object and as such can be saved like any other plot:

```python
import matplotlib.pyplot as plt

cgram.plot()
plt.savefig('clustergram.svg')
```

### Saving object

If you want to save your computed `clustergram.Clustergram` object to disk, you can use the `pickle` library:

```python
import pickle

with open('clustergram.pickle', 'wb') as f:
    pickle.dump(cgram, f)
```

Then loading is equally simple:

```python
with open('clustergram.pickle', 'rb') as f:
    loaded = pickle.load(f)
```

## References

Schonlau M. The clustergram: a graph for visualizing hierarchical and non-hierarchical cluster analyses. The Stata Journal. 2002; 2(4):391-402.

Schonlau M. Visualizing Hierarchical and Non-Hierarchical Cluster Analyses with Clustergrams. Computational Statistics. 2004; 19(1):95-111.
[https://www.r-statistics.com/2010/06/clustergram-visualization-and-diagnostics-for-cluster-analysis-r-code/](https://www.r-statistics.com/2010/06/clustergram-visualization-and-diagnostics-for-cluster-analysis-r-code/)

%prep
%autosetup -n clustergram-0.7.0

%build
%py3_build

%install
%py3_install
install -d -m755 %{buildroot}/%{_pkgdocdir}
if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
pushd %{buildroot}
if [ -d usr/lib ]; then
    find usr/lib -type f -printf "\"/%h/%f\"\n" >> filelist.lst
fi
if [ -d usr/lib64 ]; then
    find usr/lib64 -type f -printf "\"/%h/%f\"\n" >> filelist.lst
fi
if [ -d usr/bin ]; then
    find usr/bin -type f -printf "\"/%h/%f\"\n" >> filelist.lst
fi
if [ -d usr/sbin ]; then
    find usr/sbin -type f -printf "\"/%h/%f\"\n" >> filelist.lst
fi
touch doclist.lst
if [ -d usr/share/man ]; then
    find usr/share/man -type f -printf "\"/%h/%f.gz\"\n" >> doclist.lst
fi
popd
mv %{buildroot}/filelist.lst .
mv %{buildroot}/doclist.lst .

%files -n python3-clustergram -f filelist.lst
%dir %{python3_sitelib}/*

%files help -f doclist.lst
%{_docdir}/*

%changelog
* Fri Jun 09 2023 Python_Bot - 0.7.0-1
- Package Spec generated