automatic import of python-digitalcellsorteropeneuler20.03

author: CoprDistGit <infra@openeuler.org> 2023-05-05 11:57:11 +0000
committer: CoprDistGit <infra@openeuler.org> 2023-05-05 11:57:11 +0000
commit: 03da39f05058d58edf9129f6c67831ad8c1c5417 (patch)
tree: 8a0a467189ba35cf93b0007b1444c12d954763eb
parent: 03a130a68836c5381177520d0ddb26ef35df1860 (diff)
3 files changed, 1640 insertions, 0 deletions
diff --git a/.gitignore b/.gitignore
index e69de29..78bb768 100644
--- a/.gitignore
+++ b/.gitignore
@@ -0,0 +1 @@
+/DigitalCellSorter-1.3.7.6.tar.gz
diff --git a/python-digitalcellsorter.spec b/python-digitalcellsorter.spec
new file mode 100644
index 0000000..53de811
--- /dev/null
+++ b/python-digitalcellsorter.spec
@@ -0,0 +1,1638 @@
+%global _empty_manifest_terminate_build 0
+Name:		python-DigitalCellSorter
+Version:	1.3.7.6
+Release:	1
+Summary:	Toolkit for analysis and identification of cell types from heterogeneous single cell RNA-seq data
+License:	MIT License
+URL:		https://github.com/sdomanskyi/DigitalCellSorter
+Source0:	https://mirrors.nju.edu.cn/pypi/web/packages/aa/ff/ba4a40eb753c899973e638e701194d198051f9f21076f68d76147ee347f5/DigitalCellSorter-1.3.7.6.tar.gz
+BuildArch:	noarch
+
+Requires:	python3-numpy
+Requires:	python3-pandas
+Requires:	python3-patsy
+Requires:	python3-xlrd
+Requires:	python3-openpyxl
+Requires:	python3-tables
+Requires:	python3-scipy
+Requires:	python3-matplotlib
+Requires:	python3-scikit-learn
+Requires:	python3-mygene
+Requires:	python3-plotly
+Requires:	python3-adjustText
+
+%description
+# Digital Cell Sorter
+
+[![DOI](https://badge.fury.io/gh/sdomanskyi%2FDigitalCellSorter.svg)](https://badge.fury.io/gh/sdomanskyi%2FDigitalCellSorter)
+[![DOI](https://badge.fury.io/py/DigitalCellSorter.svg)](https://pypi.org/project/DigitalCellSorter)
+[![DOI](https://readthedocs.org/projects/digital-cell-sorter/badge/?version=latest)](https://digital-cell-sorter.readthedocs.io/en/latest/?badge=latest)
+[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.2533377.svg)](https://doi.org/10.5281/zenodo.2533377) 
+
+Digital Cell Sorter (DCS): a single cell RNA-seq analysis toolkit for clustering, cell type identification, and anomaly detection.
+
+> **Note:** We are currently preparing a manuscript describing the toolkit located in this repository.
+> To access the package described in our latest publication on Polled Digital Cell Sorter,
+> go to https://zenodo.org/record/2603265 and download the package (v1.1).
+
+
+> **The latest publication describing the methodology of cell types identification:**
+> [Polled Digital Cell Sorter (p-DCS): Automatic identification of hematological cell types from single cell RNA-sequencing clusters](
+> https://doi.org/10.1186/s12859-019-2951-x 
+> "Polled Digital Cell Sorter (p-DCS): Automatic identification of hematological cell types from single cell RNA-sequencing clusters")
+> Sergii Domanskyi, Anthony Szedlak, Nathaniel T Hawkins, Jiayin Wang, Giovanni Paternostro & Carlo Piermarocchi, 
+> *BMC Bioinformatics* volume 20, Article number: 369 (**2019**)
+
+
+The documentation is available at https://digital-cell-sorter.readthedocs.io/.
+
+- [Getting Started](#getting-started)
+  * [Prerequisites](#prerequisites)
+  * [Loading the package](#loading-the-package)
+  * [Gene Expression Data Format](#gene-expression-data-format)
+  * [Other Data](#other-data)
+- [Functionality](#functionality)
+  * [Overall](#overall)
+  * [Visualization](#visualization)
+- [Demo](#demo)
+  * [Usage](#usage)
+    + [Main cell types](#main-cell-types)
+    + [Cell sub-types](#cell-sub-types)
+  * [Output](#output)
+
+## Getting Started
+
+These instructions will get you a copy of the project up and running on your machine for data analysis, development or testing purposes.
+
+### Prerequisites
+
+#### Environment setup
+The software runs in Python >= 3.7
+
+It is highly recommended to install Anaconda.
+Installers are available at https://www.anaconda.com/distribution/
+Whether Anaconda was already installed or you have just installed it, we recommend
+updating it by running:
+
+	conda update conda
+
+With conda, you can create, export, list, remove, and update environments that 
+have different versions of Python and/or packages installed in them. 
+Switching or moving between environments is called activating the environment.
+
+	conda create --name DCS
+	conda activate DCS
+
+Now, in your new environment, the packages can be installed or updated without affecting
+your other environments. Note that using environments is not required; the 
+default ```(base)``` environment is used if you don't set up any other. For more information see
+https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html
+
+
+> Note: using a conda environment (for instance DCS, as above)
+> in a batch job on a high performance computer
+> such as MSU HPCC, i.e. not on a development node but 
+> submitted to a SLURM queue, requires the following steps. In the SLURM 
+> script, before the line calling your Python script, 
+> add ```conda deactivate``` to deactivate the base environment
+> and ```conda activate DCS```. After calling the script 
+> run ```conda deactivate```. An example test script is shown below.
+
+
+<details closed><summary>SLURM script example:</summary><p>
+
+	#!/bin/bash --login
+	########## Define Resources Needed with SBATCH Lines ##########
+	#SBATCH --time=00:01:00             # limit of wall clock time - how long the job will run (same as -t)
+	#SBATCH --ntasks=1                  # number of tasks - how many tasks (nodes) that you require (same as -n)
+	#SBATCH --cpus-per-task=1           # number of CPUs (or cores) per task (same as -c)
+	#SBATCH --mem=1G                    # memory required per node (unit set by suffix K, M, G or T; megabytes if omitted)
+	##SBATCH --job-name Name_of_Job     # you can give your job a name for easier identification (same as -J)
+
+	########## Command Lines to Run ##########
+	conda deactivate
+	conda activate DCS 
+	cd ./                               ### change to the directory where your code is located 
+	python test.py        		    ### call your executable
+	scontrol show job $SLURM_JOB_ID     ### write job information to output file
+	conda deactivate
+
+where ```test.py``` is the python script where you import and use 
+```DigitalCellSorter```.
+
+</p></details>
+
+
+#### Installation of the DigitalCellSorter package
+
+Install ```DigitalCellSorter``` with ```pip```. Most of the dependencies 
+are installed automatically together with the latest release 
+of ```DigitalCellSorter```:
+
+	pip install DigitalCellSorter
+
+Alternatively, you can clone and install this module directly from GitHub using:
+
+	pip install git+https://github.com/sdomanskyi/DigitalCellSorter
+
+Similarly, one can create a local copy of this project for development purposes, and 
+install the package from the cloned directory:
+
+	git clone https://github.com/sdomanskyi/DigitalCellSorter
+	cd DigitalCellSorter
+	python setup.py install
+
+Our software uses packages ```numpy```, ```pandas```, ```matplotlib```, 
+```scikit-learn```, ```scipy```, ```mygene```, ```fftw```, 
+```fitsne```, ```adjustText``` and a few other standard Python packages. 
+Some of the packages used in ```DigitalCellSorter``` are not installed by default 
+and should be installed separately if you use certain functionality of 
+Digital Cell Sorter. For example, for network-based clustering
+install packages ```pynndescent```, ```networkx```, ```python-louvain```. 
+Other packages that have to be installed separately are ```fitsne```, ```umap```, 
+```phate``` and ```orca```. The detailed instructions are below.
+
+#### t-SNE
+With datasets containing fewer than 2000 cells, ```sklearn.manifold.TSNE``` is used.
+For larger datasets, Fast Fourier Transform-accelerated Interpolation-based t-SNE (FIt-SNE)
+implemented by **KlugerLab** is used (https://github.com/KlugerLab/FIt-SNE).
+To use FIt-SNE, install the following. First update ```cython```:
+
+	pip install --upgrade cython
+
+Then add ```conda-forge``` to your channels 
+and install ```fftw``` from it:
+
+	conda config --add channels conda-forge
+	conda install fftw
+
+The next installation step is platform specific. To install FIt-SNE on Linux:
+
+	pip install fitsne
+
+On macOS Mojave the C++ compiler has to be specified explicitly:
+
+	env CC=clang CXX=clang++ pip install fitsne
+
+On Windows the FIt-SNE wrapper and executable are already 
+included with ```DigitalCellSorter```. 
+
+#### Other layouts
+
+To use the UMAP layout:
+
+	pip install umap-learn
+
+To use the PHATE layout:
+
+	pip install phate
+
+> Note: if none of ```fitsne```, ```umap``` or ```phate``` is installed, 
+> ```DigitalCellSorter``` defaults to the two largest PCA principal components for the 
+> visualization layout.
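As a rough illustration of the note above, the following sketch checks which of the optional layout packages can be imported in the current environment. It is not how ```DigitalCellSorter``` itself decides; the names `OPTIONAL_LAYOUTS` and `installed_layouts` are hypothetical and only the standard library is used.

```python
from importlib.util import find_spec

# Importable module name -> layout label (mapping assumed for illustration).
OPTIONAL_LAYOUTS = {'fitsne': 'FIt-SNE', 'umap': 'UMAP', 'phate': 'PHATE'}


def installed_layouts(modules=OPTIONAL_LAYOUTS,
                      is_installed=lambda name: find_spec(name) is not None):
    """Return labels of the optional layout packages that can be imported."""
    return [label for name, label in modules.items() if is_installed(name)]


available = installed_layouts()
if available:
    print('Optional layouts available:', ', '.join(available))
else:
    print('No optional layout package found; expect the PCA fallback.')
```

An empty list corresponds to the case described in the note, where the PCA projection is used.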
+
+#### Interactive HTML figures
+To use Sankey diagrams that are part of Digital Cell Sorter 
+install ```plotly``` and ```orca```:
+
+    conda install -c plotly plotly-orca
+    conda install -c anaconda psutil
+
+See 
+[interactive Hopfield landscape figure](http://htmlpreview.github.io/?https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/dataName_energy_landscape_PC1_vs_PC0.html "Hopfield attractors figure")
+and 
+[interactive Sankey diagram figure](http://htmlpreview.github.io/?https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/PanglaoDB_Sankey_SRS3296611.html "Sankey diagram of cell annotation") 
+in a browser.
+
+```orca``` is necessary to convert Sankey diagrams to static images.
+If for any reason ```orca``` is unavailable, the Sankey diagrams will be saved as 
+interactive HTML figures that can be opened in a browser (Chrome, Firefox etc.) and 
+saved as static images. The visualizations of ```DigitalCellSorter``` are implemented 
+with ```matplotlib```, allowing all the figures to be saved in either raster or 
+vector format. Since ```plotly``` can convert simple ```matplotlib``` figures 
+(scatter, line, bar plots, but not heatmaps, splines or other complex patch objects) to
+interactive HTML format, ```DigitalCellSorter``` can attempt to save any of its figures
+as HTML. This is particularly useful with ```Projection``` plots, even though the color
+bars are not rendered in HTML figures.
+
+### Loading the package
+
+Use the latest release of the PyPI package. 
+
+For a quick-start demo with a dataset of ~5k PBMCs, run the following 
+in a terminal and follow the prompts:
+
+	python -m DigitalCellSorter
+
+The second, more detailed demonstration analysis with step-by-step 
+explanation is discussed here
+and in the demo section at the end of this README.
+In your script, import the package:
+
+	import DigitalCellSorter
+
+Create an instance of the class ```DigitalCellSorter```. Here, for simplicity, we use default parameter values:
+
+	DCS = DigitalCellSorter.DigitalCellSorter()
+
+During initialization a number of parameters can be specified. For a detailed list see the documentation.
+Many of these parameters are transferred to DCS attributes and thus can be modified after initialization, e.g.:
+
+	DCS.toggleMakeStackedBarplot = False
+
+
+
+### Gene Expression Data Format
+
+The input gene expression data is expected in one of the following formats:
+
+1. Spreadsheet of comma-separated values ```csv``` containing a condensed matrix in the form ```('cell', 'gene', 'expr')```. 
+If there are batches in the data, the matrix has to be of the form ```('batch', 'cell', 'gene', 'expr')```. Column order can be arbitrary.
+
+<details closed><summary>Examples:</summary><p>
+
+| cell | gene | expr |
+|------|------|------|
+| C1   | G1   | 3    |
+| C1   | G2   | 2    |
+| C1   | G3   | 1    |
+| C2   | G1   | 1    |
+| C2   | G4   | 5    |
+| ...  | ...  | ...  |
+
+or:
+
+| batch  | cell | gene | expr |
+|--------|------|------|------|
+| batch0 | C1   | G1   | 3    |
+| batch0 | C1   | G2   | 2    |
+| batch0 | C1   | G3   | 1    |
+| batch1 | C2   | G1   | 1    |
+| batch1 | C2   | G4   | 5    |
+| ...    | ...  | ...  | ...  |
+
+</p></details>
+
+
+2. Spreadsheet of comma-separated values ```csv``` where rows are genes and columns are cells, with gene expression counts as values.
+If there are batches in the data, the first row of the spreadsheet should be ```'batch'``` and the second ```'cell'```.
+
+<details closed><summary>Examples:</summary><p>
+
+| cell  | C1     | C2     | C3     | C4     |
+|-------|--------|--------|--------|--------|
+| G1    |        | 3      | 1      | 7      |
+| G2    | 2      | 2      |        | 2      |
+| G3    | 3      | 1      |        | 5      |
+| G4    | 10     |        | 5      | 4      |
+| ...   | ...    | ...    | ...    | ...    |
+
+or:
+
+| batch | batch0 | batch0 | batch1 | batch1 |
+|-------|--------|--------|--------|--------|
+| cell  | C1     | C2     | C3     | C4     |
+| G1    |        | 3      | 1      | 7      |
+| G2    | 2      | 2      |        | 2      |
+| G3    | 3      | 1      |        | 5      |
+| G4    | 10     |        | 5      | 4      |
+| ...   | ...    | ...    | ...    | ...    |
+
+</p></details>
+
+3. ```Pandas DataFrame``` where ```axis 0``` is genes and ```axis 1``` is cells.
+If there are batches in the data, then the index of ```axis 1``` should have two levels, e.g. ```('batch', 'cell')```, 
+with the first level indicating the patient, batch or experiment where that cell was sequenced, and the
+second level containing cell barcodes for identification.
+
+<details closed><summary>Examples:</summary><p>
+
+    df = pd.DataFrame(data=[[2,np.nan],[3,8],[3,5],[np.nan,1]], 
+                      index=['G1','G2','G3','G4'], 
+                      columns=pd.MultiIndex.from_arrays([['batch0','batch1'],['C1','C2']], names=['batch', 'cell']))    
+
+
+</p></details>
+
+4. ```Pandas Series``` where the index should have two levels, e.g. ```('cell', 'gene')```. If there are batches in the data,
+the first level should indicate the patient, batch or experiment where that cell was sequenced, the second level cell barcodes for 
+identification and the third level gene names.
+
+<details closed><summary>Examples:</summary><p>
+
+    se = pd.Series(data=[1,8,3,5,5], 
+                   index=pd.MultiIndex.from_arrays([['batch0','batch0','batch1','batch1','batch1'],
+                                                    ['C1','C1','C1','C2','C2'],
+                                                    ['G1','G2','G3','G1','G4']], names=['batch', 'cell', 'gene']))
+
+
+</p></details>
+
+Data in any of the formats outlined above needs to be prepared/validated with the function ```prepare()```. 
+Let us demonstrate this on an input of type 1:
+
+	df_expr = DCS.prepare('data/testData/dataFileCondensedWithBatches.tsv')
+
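To make the relationship between the condensed format (1) and the ```Pandas DataFrame``` format (3) concrete, here is a minimal sketch that uses only ```pandas```. It does not call or reproduce ```prepare()```; the helper name `condensed_to_matrix` is hypothetical, and the toy values are those of the condensed example table with batches shown earlier.

```python
import pandas as pd

# Condensed table with batches: one row per (batch, cell, gene) measurement.
condensed = pd.DataFrame({
    'batch': ['batch0', 'batch0', 'batch0', 'batch1', 'batch1'],
    'cell':  ['C1', 'C1', 'C1', 'C2', 'C2'],
    'gene':  ['G1', 'G2', 'G3', 'G1', 'G4'],
    'expr':  [3, 2, 1, 1, 5],
})


def condensed_to_matrix(table):
    """Pivot a condensed table into genes (rows) x (batch, cell) columns."""
    series = table.set_index(['batch', 'cell', 'gene'])['expr']
    # Moving 'batch' and 'cell' to the columns leaves genes on axis 0;
    # gene/cell pairs that were not measured become NaN.
    return series.unstack(['batch', 'cell'])


df = condensed_to_matrix(condensed)
print(df)
```

The intermediate `series` has the three-level index described for format 4, so the same snippet also shows how formats 3 and 4 relate.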
+### Other Data
+
+```markersDCS.xlsx```: An Excel workbook with marker data. Rows are markers and columns are cell types. 
+'1' means that the gene is a marker for that cell type, '-1' means that the gene is not expressed in that cell type, and '0' otherwise.
+This gene marker file, included in the package, is used by default. 
+If you use your own file, it has to be prepared in the same format (including the two-line header). Note that only the first worksheet will be read,
+and its name can be arbitrary. The first column should contain gene names. The second row should contain cell types, and the first row how 
+those cell types are grouped. To skip a cell type, put "NA" in the first-row cell of that cell type's column.
+
+<details closed><summary>Example:</summary><p>
+
+|A       |B            |C             |D           |E          |F                |G                         |H                           |I                        |J                         |K                  |L               |M                 |...      |
+|--------|-------------|--------------|------------|-----------|-----------------|--------------------------|----------------------------|-------------------------|--------------------------|-------------------|----------------|------------------|---------|
+|        |B cells      |B cells       |B cells     |T cells    |T cells          |T cells                   |T cells                     |T cells                  |T cells                   |T cells            |NK cells        |NK cells          |...      |
+|Marker  |B cells naive|B cells memory|Plasma cells|T cells CD8|T cells CD4 naive|T cells CD4 memory resting|T cells CD4 memory activated|T cells follicular helper|T cells regulatory (Tregs)|T cells gamma delta|NK cells resting|NK cells activated|...      |
+|ABCB4   |1            |0             |0           |0          |0                |0                         |0                           |0                        |0                         |0                  |0               |0                 |...      |
+|ABCB9   |0            |0             |1           |0          |0                |0                         |0                           |0                        |0                         |0                  |0               |0                 |...      |
+|ACAP1   |0            |0             |0           |0          |1                |0                         |0                           |0                        |0                         |0                  |0               |0                 |...      |
+|ACHE    |0            |0             |0           |0          |0                |0                         |0                           |0                        |0                         |0                  |0               |0                 |...      |
+|ACP5    |0            |0             |0           |0          |0                |0                         |0                           |0                        |0                         |0                  |0               |0                 |...      |
+|ADAM28  |1            |1             |0           |0          |0                |0                         |0                           |0                        |0                         |0                  |0               |0                 |...      |
+|ADAMDEC1|0            |0             |0           |0          |0                |0                         |0                           |0                        |0                         |0                  |0               |0                 |...      |
+|ADAMTS3 |0            |0             |0           |0          |0                |0                         |0                           |0                        |0                         |0                  |0               |0                 |...      |
+|ADRB2   |0            |0             |0           |0          |0                |0                         |0                           |0                        |0                         |0                  |0               |0                 |...      |
+|AIF1    |0            |0             |0           |0          |0                |0                         |0                           |0                        |0                         |0                  |0               |0                 |...      |
+|AIM2    |0            |1             |0           |0          |0                |0                         |0                           |0                        |0                         |0                  |0               |0                 |...      |
+|ALOX15  |0            |0             |0           |0          |0                |0                         |0                           |0                        |0                         |0                  |0               |0                 |...      |
+|ALOX5   |0            |1             |0           |0          |0                |0                         |0                           |0                        |0                         |0                  |0               |0                 |...      |
+|AMPD1   |0            |0             |1           |0          |0                |0                         |0                           |0                        |0                         |0                  |0               |0                 |...      |
+|ANGPT4  |0            |0             |1           |0          |0                |0                         |0                           |0                        |0                         |0                  |0               |0                 |...      |
+|...     |...          |...           |...         |...        |...              |...                       |...                         |...                      |...                       |...                |...             |...               |...      |
+
+</p></details>
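The two-line header described above maps naturally onto a two-level column index. The sketch below builds a tiny marker table in memory and lists the positive markers per cell type while skipping groups labelled "NA". It is an illustration only, not the package's own reader: the function name `positive_markers` is hypothetical, the three genes and their values are taken from the example table, and marking "Plasma cells" as "NA" is an assumption made purely to show the skipping rule.

```python
import pandas as pd

# First header row: grouping; second header row: cell type.
columns = pd.MultiIndex.from_tuples(
    [('B cells', 'B cells naive'),
     ('B cells', 'B cells memory'),
     ('NA', 'Plasma cells'),          # "NA" here is hypothetical, to show skipping
     ('T cells', 'T cells CD8')],
    names=['group', 'cell type'])

markers = pd.DataFrame(
    [[1, 0, 0, 0],    # ABCB4
     [0, 0, 1, 0],    # ABCB9
     [1, 1, 0, 0]],   # ADAM28
    index=['ABCB4', 'ABCB9', 'ADAM28'],
    columns=columns)


def positive_markers(table):
    """Map each kept cell type to the genes marked '1' for it."""
    keep = table.columns.get_level_values('group') != 'NA'
    kept = table.loc[:, keep]
    return {cell_type: kept.index[kept[(group, cell_type)] == 1].tolist()
            for group, cell_type in kept.columns}


print(positive_markers(markers))
```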
+
+```Human.MitoCarta2.0.csv```: A ```csv``` spreadsheet with human mitochondrial genes, created as part of the work 
+[MitoCarta2.0: an updated inventory of mammalian mitochondrial proteins](https://doi.org/10.1093/nar/gkv1003 "MitoCarta2.0")
+Sarah E. Calvo, Karl R. Clauser, Vamsi K. Mootha, *Nucleic Acids Research*, Volume 44, Issue D1, 4 January 2016.
+
+
+## Functionality
+
+### Overall
+
+The main class, DigitalCellSorter, includes tools for:
+
+  1. **Pre-processing**
+  2. **Quality control**
+  3. **Batch effects correction**
+  4. **Cell anomaly score evaluation**
+  5. **Dimensionality reduction**
+  6. **Clustering**
+  7. **Annotating cell types**
+  8. **Visualization**  
+  9. **Post-processing**
+
+
+### Visualization
+
+Function ```visualize()``` will produce most of the necessary files for post-analysis of the data. 
+
+See examples of the visualization tools below.
+
+
+<details closed><summary>The visualization tools include:</summary><p>
+
+- ```makeMarkerExpressionPlot()```: a heatmap that shows all markers and their expression levels in the clusters; 
+in addition, this figure shows relative (%) and absolute (cell counts) cluster sizes
+
+<p align="middle">
+	<img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/BM1_marker_expression.png?raw=true" width="1000"/>
+</p>
+
+- ```getIndividualGeneExpressionPlot()```:  2D layout colored by individual gene's expression
+
+<p align="middle">
+	<img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/marker_subplots/BM1_CD19_(B4_CVID3_CD19).png?raw=true" width="400"/>
+	<img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/marker_subplots/BM1_CD33_(SIGLEC-3_CD33_p67_SIGLEC3).png?raw=true" width="400"/>
+</p>
+
+- ```makeVotingResultsMatrixPlot()```: z-scores of the voting results for each input cell type and each cluster; 
+in addition, this figure shows relative (%) and absolute (cell counts) cluster sizes
+
+<p align="middle">
+ <img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/BM1_scores_matrix.png?raw=true" height="700"/>
+</p>
+
+- ```makeHistogramNullDistributionPlot()```: null distribution for each cluster and each cell type illustrating 
+the "machinery" of the Digital Cell Sorter
+
+<p align="middle">
+	<img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/BM1_null_distributions.png?raw=true" width="800"/>
+</p>
+
+- ```makeQualityControlHistogramPlot()```: quality control histogram plots
+
+<p align="middle">
+	<img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/QC_plots/BM1_number_of_genes_histogram.png?raw=true" width="250"/>
+	<img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/QC_plots/BM1_count_depth_histogram.png?raw=true" width="250"/>
+	<img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/QC_plots/BM1_fraction_of_mitochondrialGenes_histogram.png?raw=true" width="250"/>
+</p>
+
+- ```makeProjectionPlot()```: 2D layout colored by number of unique genes expressed, 
+number of counts measured, and the fraction of mitochondrial genes
+
+<p align="middle">
+	<img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/BM1_clusters_by_number_of_genes.png?raw=true" width="250"/>
+	<img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/BM1_clusters_by_count_depth.png?raw=true" width="250"/>
+	<img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/BM1_clusters_by_fraction_of_mitochondrialGenes.png?raw=true" width="250"/>
+</p>
+
+<p align="middle">
+	<img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/BM1_clusters_by_is_quality_cell.png?raw=true" width="500"/>
+</p>
+
+<p align="middle">
+	<img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/BM1_clusters_by_clusters.png?raw=true" width="375"/>
+	<img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/BM1_clusters_by_patients.png?raw=true" width="375"/>
+</p>
+
+Effect of batch correction, demonstrated by combining BM1, BM2 and BM3 and processing the data jointly without (left) and with (right) the batch correction option:
+
+<p align="middle">
+	<img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/BM123_no_corr_clusters_by_patients.png?raw=true" width="375"/>
+	<img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/BM123_with_corr_clusters_by_patients.png?raw=true" width="375"/>
+</p>
+
+- ```makeStackedBarplot()```: plot with fractions of various cell types
+
+<p align="middle">
+	<img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/BM1_clusters_by_clusters_annotated.png?raw=true" width="500"/>
+	<img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/BM1_subclustering_stacked_barplot_BM1.png?raw=true" height="500"/>
+</p>
+
+
+- ```makeSankeyDiagram()```: river plot to compare various results 
+
+<p align="middle">
+	<img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/Sankey_example.png?raw=true" width="800"/>
+</p>
+
+- ```getAnomalyScoresPlot()```: plot with anomaly scores per cell
+
+<p align="middle">
+	<img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/BM1_clusters_by_anomaly_score_All.png?raw=true" width="750"/>
+</p>
+
+Calculate and plot anomaly scores for an arbitrary cell type or cluster:
+
+<p align="middle">
+	<img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/BM1_clusters_by_anomaly_score_B_cells.png?raw=true" width="250"/>
+	<img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/BM1_clusters_by_anomaly_score_T_cells.png?raw=true" width="250"/>
+	<img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/BM1_clusters_by_anomaly_score_cluster_7.0.0.png?raw=true" width="250"/>
+</p>
+
+
+- ```getIndividualGeneTtestPlot()```: heatmap of t-test p-values calculated gene-pair-wise
+        from the annotated clusters
+
+<p align="middle">
+	<img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/BM1_ttest_CD4_(CD4_CD4mut).png?raw=true" width="500"/>
+</p>
+
+
+- ```makePlotOfNewMarkers()```: genes significantly expressed in the annotated cell types
+
+<p align="middle">
+	<img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/BM1_new_markers.png?raw=true" width="1000"/>
+</p>
+
+</p></details>
+
+
+## Demo
+
+### Usage
+
+We have made an example execution file ```demo.py``` that shows how to use ```DigitalCellSorter```.
+
+In the demo, folder ```data``` is intentionally left empty. 
+The data file (cc95ff89-2e68-4a08-a234-480eca21ce79.homo_sapiens.mtx.zip) is about 2.4 GB in size and
+will be downloaded with the ```demo.py``` script.
+
+> Previously the HCA preview data was consolidated in file ```ica_bone_marrow_h5.h5``` and downloadable  
+> from https://preview.data.humancellatlas.org/ (Raw Counts Matrix - Bone Marrow). 
+> That file was ~485 MB and contained 378000 cells from 8 bone marrow donors (BM1-BM8). 
+
+See details of the script ```demo.py``` at:
+
+> [Example walkthrough of demo.py script](https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/ "Examples")
+
+
+To execute the complete script ```demo.py``` run:
+
+	python demo.py
+
+Note that the HCA BM1 data contains ~50000 sequenced cells, requiring more than 60 GB of RAM (we recommend using a high performance computer).
+If you want to run our example on a regular PC or a laptop, you can use a random subset of cells:
+
+    df_expr = df_expr.sample(n=5000, axis=1)
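A small, self-contained sketch of the same subsampling idea, assuming a genes x cells ```pandas``` DataFrame. The toy matrix and the helper name `subsample_cells` are hypothetical; the fixed `random_state` is an addition that makes the selection reproducible between runs.

```python
import numpy as np
import pandas as pd

# Toy genes x cells matrix standing in for df_expr (values are arbitrary).
df_expr = pd.DataFrame(np.arange(40).reshape(4, 10),
                       index=['G1', 'G2', 'G3', 'G4'],
                       columns=[f'C{i}' for i in range(1, 11)])


def subsample_cells(df, n_cells, seed=0):
    """Return at most n_cells randomly chosen columns (cells) of df."""
    n = min(n_cells, df.shape[1])  # avoid asking for more cells than exist
    return df.sample(n=n, axis=1, random_state=seed)


df_small = subsample_cells(df_expr, n_cells=5)
print(df_small.shape)
```

Note that ```sample()``` returns a new object, so the result has to be assigned in order to be used downstream.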
+
+
+### Output
+
+All the output files are saved in the ```output``` directory inside the directory where the ```demo.py``` script is located. 
+If you specify any other directory, the results will be generated in it.
+If you do not provide any directory, the results will appear in the directory where the script was executed.
+
+
+
+
+%package -n python3-DigitalCellSorter
+Summary:	Toolkit for analysis and identification of cell types from heterogeneous single cell RNA-seq data
+Provides:	python-DigitalCellSorter
+BuildRequires:	python3-devel
+BuildRequires:	python3-setuptools
+BuildRequires:	python3-pip
+%description -n python3-DigitalCellSorter
+# Digital Cell Sorter
+
+[![DOI](https://badge.fury.io/gh/sdomanskyi%2FDigitalCellSorter.svg)](https://badge.fury.io/gh/sdomanskyi%2FDigitalCellSorter)
+[![DOI](https://badge.fury.io/py/DigitalCellSorter.svg)](https://pypi.org/project/DigitalCellSorter)
+[![DOI](https://readthedocs.org/projects/digital-cell-sorter/badge/?version=latest)](https://digital-cell-sorter.readthedocs.io/en/latest/?badge=latest)
+[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.2533377.svg)](https://doi.org/10.5281/zenodo.2533377) 
+
+Digital Cell Sorter (DCS): a single cell RNA-seq analysis toolkit for clustering, cell type identification, and anomaly detection.
+
+> **Note:** We are currently preparing a manuscript describing the toolkit located in this repository.
+> To access the package described in our latest publication on Polled Digital Cell Sorter,
+> go to https://zenodo.org/record/2603265 and download the package (v1.1).
+
+
+> **The latest publication describing the methodology of cell types identification:**
+> [Polled Digital Cell Sorter (p-DCS): Automatic identification of hematological cell types from single cell RNA-sequencing clusters](
+> https://doi.org/10.1186/s12859-019-2951-x 
+> "Polled Digital Cell Sorter (p-DCS): Automatic identification of hematological cell types from single cell RNA-sequencing clusters")
+> Sergii Domanskyi, Anthony Szedlak, Nathaniel T Hawkins, Jiayin Wang, Giovanni Paternostro & Carlo Piermarocchi, 
+> *BMC Bioinformatics* volume 20, Article number: 369 (**2019**)
+
+
+The documentation is available at https://digital-cell-sorter.readthedocs.io/.
+
+- [Getting Started](#getting-started)
+  * [Prerequisites](#prerequisites)
+  * [Loading the package](#loading-the-package)
+  * [Gene Expression Data Format](#gene-expression-data-format)
+  * [Other Data](#other-data)
+- [Functionality](#functionality)
+  * [Overall](#overall)
+  * [Visualization](#visualization)
+- [Demo](#demo)
+  * [Usage](#usage)
+    + [Main cell types](#main-cell-types)
+    + [Cell sub-types](#cell-sub-types)
+  * [Output](#output)
+
+## Getting Started
+
+These instructions will get you a copy of the project up and running on your machine for data analysis, development or testing purposes.
+
+### Prerequisites
+
+#### Environment setup
+The software runs in Python >= 3.7
+
+It is highly recommended to install Anaconda.
+Installers are available at https://www.anaconda.com/distribution/
+Whether Anaconda was already installed or you have just installed it, we recommend
+updating it by running:
+
+	conda update conda
+
+With conda, you can create, export, list, remove, and update environments that 
+have different versions of Python and/or packages installed in them. 
+Switching or moving between environments is called activating the environment.
+
+	conda create --name DCS
+	conda activate DCS
+
+Now, in your new environment, the packages can be installed or updated without affecting
+your other environments. Note that using environments is optional; the 
+default ```(base)``` environment is used if you don't set up any other. For more information see
+https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html
+
+
+> Note: using a conda environment (for instance, the DCS environment created above)
+> on a high performance computer such as MSU HPCC in a batch job, i.e. not on a
+> development node but submitted to a SLURM queue, requires the following steps.
+> In the SLURM script, before the line that calls your Python script,
+> add ```conda deactivate``` to deactivate the base environment,
+> followed by ```conda activate DCS```. After the script has run,
+> call ```conda deactivate``` again. An example test script is shown below.
+
+
+<details closed><summary>SLURM script example:</summary><p>
+
+	#!/bin/bash --login
+	########## Define Resources Needed with SBATCH Lines ##########
+	#SBATCH --time=00:01:00             # limit of wall clock time - how long the job will run (same as -t)
+	#SBATCH --ntasks=1                  # number of tasks that you require (same as -n)
+	#SBATCH --cpus-per-task=1           # number of CPUs (or cores) per task (same as -c)
+	#SBATCH --mem=1G                    # memory required per node (default unit is megabytes; the G suffix means gigabytes)
+	##SBATCH --job-name Name_of_Job     # you can give your job a name for easier identification (same as -J)
+
+	########## Command Lines to Run ##########
+	conda deactivate
+	conda activate DCS 
+	cd ./                               ### change to the directory where your code is located 
+	python test.py        		    ### call your executable
+	scontrol show job $SLURM_JOB_ID     ### write job information to output file
+	conda deactivate
+
+where ```test.py``` is the Python script in which you import and use 
+```DigitalCellSorter```.
+
+</p></details>
+
+
+#### Installation of the DigitalCellSorter package
+
+Install ```DigitalCellSorter``` with ```pip```. Most of the dependencies 
+are installed automatically along with the latest release 
+of ```DigitalCellSorter```:
+
+	pip install DigitalCellSorter
+
+Alternatively, you can install this module directly from GitHub using:
+
+	pip install git+https://github.com/sdomanskyi/DigitalCellSorter
+
+Similarly, one can create a local copy of this project for development purposes, and 
+install the package from the cloned directory:
+
+	git clone https://github.com/sdomanskyi/DigitalCellSorter
+	cd DigitalCellSorter
+	python setup.py install
+
+Our software uses the packages ```numpy```, ```pandas```, ```matplotlib```, 
+```scikit-learn```, ```scipy```, ```mygene```, ```fftw```, 
+```fitsne```, ```adjustText``` and a few other standard Python packages. 
+Some of the packages used in ```DigitalCellSorter``` are not installed by default 
+and should be installed separately if you use the corresponding functionality of 
+Digital Cell Sorter. For example, for network-based clustering
+install the packages ```pynndescent```, ```networkx``` and ```python-louvain```. 
+Other packages that have to be installed separately are ```fitsne```, ```umap```, 
+```phate``` and ```orca```. Detailed instructions are below.
+
+#### t-SNE
+With datasets containing fewer than 2000 cells ```sklearn.manifold.TSNE``` is used.
+For larger datasets Fast Fourier Transform-accelerated Interpolation-based t-SNE (FIt-SNE)
+implemented by **KlugerLab** is used (https://github.com/KlugerLab/FIt-SNE).
+To use FIt-SNE the following packages need to be installed. First update ```cython```:
+
+	pip install --upgrade cython
+
+Then add ```conda-forge``` to your channels and install ```fftw``` from it:
+
+	conda config --add channels conda-forge
+	conda install fftw
+
+The next installation step is platform specific. To install FIt-SNE on Linux:
+
+	pip install fitsne
+
+On macOS Mojave the C and C++ compilers have to be specified explicitly:
+
+	env CC=clang CXX=clang++ pip install fitsne
+
+On Windows the FIt-SNE wrapper and executable are already 
+included with ```DigitalCellSorter```. 
+
+#### Other layouts
+
+To use the UMAP layout:
+
+	pip install umap-learn
+
+To use PHATE:
+
+	pip install phate
+
+> Note: if none of ```fitsne```, ```umap``` and ```phate``` is installed, 
+> ```DigitalCellSorter``` defaults to the two largest PCA principal components for 
+> the visualization layout.
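+
+Which of the optional packages mentioned above are importable in the current environment can be checked with a few lines of standard-library Python. The sketch below is illustrative only and is not part of ```DigitalCellSorter```: the helper name ```available_optional_packages``` is hypothetical, and the import names (for example ```umap``` for ```umap-learn``` and ```community``` for ```python-louvain```) are assumptions about how those distributions are imported.
+
```python
from importlib.util import find_spec

# Mapping of pip/conda package name -> assumed top-level import name.
# The import names are assumptions, not taken from DigitalCellSorter itself.
OPTIONAL_MODULES = {
    "fitsne": "fitsne",
    "umap-learn": "umap",
    "phate": "phate",
    "pynndescent": "pynndescent",
    "networkx": "networkx",
    "python-louvain": "community",
}


def available_optional_packages(modules=OPTIONAL_MODULES):
    """Return {package name: True/False} depending on whether the module can be found."""
    # find_spec returns None when a top-level module is not installed
    return {
        package: find_spec(import_name) is not None
        for package, import_name in modules.items()
    }


if __name__ == "__main__":
    for package, found in available_optional_packages().items():
        print(f"{package:15s} {'installed' if found else 'missing'}")
```
+
+Running the snippet before a long analysis shows in advance whether a non-PCA layout package is present.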
+
+#### Interactive HTML figures
+To use Sankey diagrams that are part of Digital Cell Sorter 
+install ```plotly``` and ```orca```:
+
+    conda install -c plotly plotly-orca
+    conda install -c anaconda psutil
+
+See 
+[interactive Hopfield landscape figure](http://htmlpreview.github.io/?https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/dataName_energy_landscape_PC1_vs_PC0.html "Hopfield attractors figure")
+and 
+[interactive Sankey diagram figure](http://htmlpreview.github.io/?https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/PanglaoDB_Sankey_SRS3296611.html "Sankey diagram of cell annotation") 
+in a browser.
+
+```orca``` is necessary to convert Sankey diagrams to static images.
+If for any reason ```orca``` is unavailable, the Sankey diagrams will be saved as 
+interactive HTML figures that can be opened in a browser (Chrome, Firefox, etc.) and 
+saved as static images. The visualizations of ```DigitalCellSorter``` are implemented 
+with ```matplotlib```, allowing all the figures to be saved in either raster or 
+vector format. Since ```plotly``` can convert simple ```matplotlib``` figures 
+(scatter, line, bar plots, but not heatmaps, splines or other complex patch objects) to
+interactive HTML format, ```DigitalCellSorter``` can attempt to save any of its figures
+as HTML. This is particularly useful with ```Projection``` plots, even though the color
+bars are not rendered in HTML figures.
+
+### Loading the package
+
+Use the latest release of the PyPI package. 
+
+For a quick-start demo with a dataset of ~5k PBMCs, run the following 
+in the terminal and follow the prompts:
+
+	python -m DigitalCellSorter
+
+The second, more detailed demonstration analysis with step-by-step 
+explanation is discussed here
+and in the demo section at the end of this README.
+In your script import the package:
+
+	import DigitalCellSorter
+
+Create an instance of class ```DigitalCellSorter```. Here, for simplicity, we use default parameter values:
+
+	DCS = DigitalCellSorter.DigitalCellSorter()
+
+During initialization a number of parameters can be specified. For a detailed list see the documentation.
+Many of these parameters are transferred to DCS attributes and thus can be modified after initialization using, e.g.:
+
+	DCS.toggleMakeStackedBarplot = False
+
+
+
+### Gene Expression Data Format
+
+The input gene expression data is expected in one of the following formats:
+
+1. Spreadsheet of comma-separated values ```csv``` containing a condensed matrix in the form ```('cell', 'gene', 'expr')```. 
+If there are batches in the data, the matrix has to be of the form ```('batch', 'cell', 'gene', 'expr')```. The column order can be arbitrary.
+
+<details closed><summary>Examples:</summary><p>
+
+| cell | gene | expr |
+|------|------|------|
+| C1   | G1   | 3    |
+| C1   | G2   | 2    |
+| C1   | G3   | 1    |
+| C2   | G1   | 1    |
+| C2   | G4   | 5    |
+| ...  | ...  | ...  |
+
+or:
+
+| batch  | cell | gene | expr |
+|--------|------|------|------|
+| batch0 | C1   | G1   | 3    |
+| batch0 | C1   | G2   | 2    |
+| batch0 | C1   | G3   | 1    |
+| batch1 | C2   | G1   | 1    |
+| batch1 | C2   | G4   | 5    |
+| ...    | ...  | ...  | ...  |
+
+</p></details>
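+
+To make the relation between the condensed form above and a genes-by-cells matrix concrete, the following standard-library sketch pivots condensed records into a nested dictionary. It is only an illustration of the data layout, not how ```DigitalCellSorter``` reads its input; the helper name ```condensed_to_matrix``` is hypothetical.
+
```python
import csv
import io
from collections import defaultdict


def condensed_to_matrix(csv_text):
    """Pivot condensed (cell, gene, expr) records into {gene: {cell: expr}}."""
    matrix = defaultdict(dict)
    # DictReader looks columns up by header name, so the column order is arbitrary
    for row in csv.DictReader(io.StringIO(csv_text)):
        matrix[row["gene"]][row["cell"]] = float(row["expr"])
    return dict(matrix)


# The same records as in the first example table
example = """cell,gene,expr
C1,G1,3
C1,G2,2
C1,G3,1
C2,G1,1
C2,G4,5
"""

matrix = condensed_to_matrix(example)
cells = sorted({cell for per_gene in matrix.values() for cell in per_gene})

for gene in sorted(matrix):
    # (gene, cell) pairs absent from the condensed table stay empty
    print(gene, [matrix[gene].get(cell, "") for cell in cells])
```
+
+Pairs that do not occur in the condensed table simply have no entry, which corresponds to the blank fields in a genes-by-cells spreadsheet.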
+
+
+2. Spreadsheet of comma-separated values ```csv``` where rows are genes and columns are cells, with gene expression counts as values.
+If there are batches in the data, the first row of the spreadsheet should be ```'batch'``` and the second ```'cell'```.
+
+<details closed><summary>Examples:</summary><p>
+
+| cell  | C1     | C2     | C3     | C4     |
+|-------|--------|--------|--------|--------|
+| G1    |        | 3      | 1      | 7      |
+| G2    | 2      | 2      |        | 2      |
+| G3    | 3      | 1      |        | 5      |
+| G4    | 10     |        | 5      | 4      |
+| ...   | ...    | ...    | ...    | ...    |
+
+or:
+
+| batch | batch0 | batch0 | batch1 | batch1 |
+|-------|--------|--------|--------|--------|
+| cell  | C1     | C2     | C3     | C4     |
+| G1    |        | 3      | 1      | 7      |
+| G2    | 2      | 2      |        | 2      |
+| G3    | 3      | 1      |        | 5      |
+| G4    | 10     |        | 5      | 4      |
+| ...   | ...    | ...    | ...    | ...    |
+
+</p></details>
+
+3. ```Pandas DataFrame``` where ```axis 0``` is genes and ```axis 1``` is cells.
+If there are batches in the data, then the index of ```axis 1``` should have two levels, e.g. ```('batch', 'cell')```, 
+with the first level indicating the patient, batch or experiment in which that cell was sequenced, and the
+second level containing cell barcodes for identification.
+
+<details closed><summary>Examples:</summary><p>
+
+    import numpy as np
+    import pandas as pd
+    df = pd.DataFrame(data=[[2,np.nan],[3,8],[3,5],[np.nan,1]], 
+                      index=['G1','G2','G3','G4'], 
+                      columns=pd.MultiIndex.from_arrays([['batch0','batch1'],['C1','C2']], names=['batch', 'cell']))    
+
+
+</p></details>
+
+4. ```Pandas Series``` where the index should have two levels, e.g. ```('cell', 'gene')```. If there are batches in the data,
+the first level should indicate the patient, batch or experiment in which that cell was sequenced, the second level the cell barcodes for 
+identification, and the third level the gene names.
+
+<details closed><summary>Examples:</summary><p>
+
+    import pandas as pd
+    se = pd.Series(data=[1,8,3,5,5], 
+                   index=pd.MultiIndex.from_arrays([['batch0','batch0','batch1','batch1','batch1'],
+                                                    ['C1','C1','C1','C2','C2'],
+                                                    ['G1','G2','G3','G1','G4']], names=['batch', 'cell', 'gene']))
+
+
+</p></details>
+
+Any of the data types outlined above needs to be prepared/validated with the function ```prepare()```. 
+Let us demonstrate this on an input of type 1:
+
+	df_expr = DCS.prepare('data/testData/dataFileCondensedWithBatches.tsv')
+
+### Other Data
+
+```markersDCS.xlsx```: An Excel workbook with marker data. Rows are markers and columns are cell types. 
+'1' means that the gene is a marker for that cell type, '-1' means that the gene is not expressed in that cell type, and '0' otherwise.
+This gene marker file is included in the package and is used by default. 
+If you use your own file, it has to be prepared in the same format (including the two-line header). Note that only the first worksheet will be read,
+and its name can be arbitrary. The first column should contain gene names. The second row should contain cell types, and the first row how 
+those cell types are grouped. If a cell type needs to be skipped, put "NA" in the first-row cell of that cell type's column.
+
+<details closed><summary>Example:</summary><p>
+
+|A       |B            |C             |D           |E          |F                |G                         |H                           |I                        |J                         |K                  |L               |M                 |...      |
+|--------|-------------|--------------|------------|-----------|-----------------|--------------------------|----------------------------|-------------------------|--------------------------|-------------------|----------------|------------------|---------|
+|        |B cells      |B cells       |B cells     |T cells    |T cells          |T cells                   |T cells                     |T cells                  |T cells                   |T cells            |NK cells        |NK cells          |...      |
+|Marker  |B cells naive|B cells memory|Plasma cells|T cells CD8|T cells CD4 naive|T cells CD4 memory resting|T cells CD4 memory activated|T cells follicular helper|T cells regulatory (Tregs)|T cells gamma delta|NK cells resting|NK cells activated|...      |
+|ABCB4   |1            |0             |0           |0          |0                |0                         |0                           |0                        |0                         |0                  |0               |0                 |...      |
+|ABCB9   |0            |0             |1           |0          |0                |0                         |0                           |0                        |0                         |0                  |0               |0                 |...      |
+|ACAP1   |0            |0             |0           |0          |1                |0                         |0                           |0                        |0                         |0                  |0               |0                 |...      |
+|ACHE    |0            |0             |0           |0          |0                |0                         |0                           |0                        |0                         |0                  |0               |0                 |...      |
+|ACP5    |0            |0             |0           |0          |0                |0                         |0                           |0                        |0                         |0                  |0               |0                 |...      |
+|ADAM28  |1            |1             |0           |0          |0                |0                         |0                           |0                        |0                         |0                  |0               |0                 |...      |
+|ADAMDEC1|0            |0             |0           |0          |0                |0                         |0                           |0                        |0                         |0                  |0               |0                 |...      |
+|ADAMTS3 |0            |0             |0           |0          |0                |0                         |0                           |0                        |0                         |0                  |0               |0                 |...      |
+|ADRB2   |0            |0             |0           |0          |0                |0                         |0                           |0                        |0                         |0                  |0               |0                 |...      |
+|AIF1    |0            |0             |0           |0          |0                |0                         |0                           |0                        |0                         |0                  |0               |0                 |...      |
+|AIM2    |0            |1             |0           |0          |0                |0                         |0                           |0                        |0                         |0                  |0               |0                 |...      |
+|ALOX15  |0            |0             |0           |0          |0                |0                         |0                           |0                        |0                         |0                  |0               |0                 |...      |
+|ALOX5   |0            |1             |0           |0          |0                |0                         |0                           |0                        |0                         |0                  |0               |0                 |...      |
+|AMPD1   |0            |0             |1           |0          |0                |0                         |0                           |0                        |0                         |0                  |0               |0                 |...      |
+|ANGPT4  |0            |0             |1           |0          |0                |0                         |0                           |0                        |0                         |0                  |0               |0                 |...      |
+|...     |...          |...           |...         |...        |...              |...                       |...                         |...                      |...                       |...                |...             |...               |...      |
+
+</p></details>
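+
+The two-line header and the 1 / -1 / 0 encoding described above can be illustrated with a small standard-library sketch. It is not the parser used by ```DigitalCellSorter```; the helper name ```parse_marker_table```, the column "Skipped type" and the gene "GENE_X" are hypothetical and serve only to show the "NA" and '-1' cases.
+
```python
def parse_marker_table(rows):
    """Parse rows laid out like the marker worksheet.

    rows[0]  : group of each cell type ("NA" marks a column to skip)
    rows[1]  : cell type names
    rows[2:] : gene name followed by 1 / -1 / 0 per cell type
    """
    groups, cell_types = rows[0][1:], rows[1][1:]
    # Keep only the columns whose first-row entry is not "NA"
    keep = [i for i, group in enumerate(groups) if group != "NA"]

    markers = {}
    for row in rows[2:]:
        gene, values = row[0], row[1:]
        for i in keep:
            value = int(values[i])
            if value != 0:  # 1 = marker, -1 = not expressed in this cell type
                markers.setdefault(cell_types[i], {})[gene] = value

    group_of = {cell_types[i]: groups[i] for i in keep}
    return markers, group_of


# Hypothetical miniature table; "Skipped type" and "GENE_X" are made up
rows = [
    ["", "B cells", "B cells", "NA"],
    ["Marker", "B cells naive", "B cells memory", "Skipped type"],
    ["ABCB4", "1", "0", "1"],
    ["ADAM28", "1", "1", "0"],
    ["GENE_X", "-1", "0", "0"],
]

markers, group_of = parse_marker_table(rows)
print(markers)
print(group_of)
```
+
+Columns marked "NA" in the first row never reach the result, so a cell type can be excluded without deleting its column from the worksheet.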
+
+```Human.MitoCarta2.0.csv```: A ```csv``` spreadsheet with human mitochondrial genes, created as part of the work 
+[MitoCarta2.0: an updated inventory of mammalian mitochondrial proteins](https://doi.org/10.1093/nar/gkv1003 "MitoCarta2.0")
+Sarah E. Calvo, Karl R. Clauser, Vamsi K. Mootha, *Nucleic Acids Research*, Volume 44, Issue D1, 4 January 2016.
+
+
+## Functionality
+
+### Overall
+
+The main class, DigitalCellSorter, includes tools for:
+
+  1. **Pre-processing**
+  2. **Quality control**
+  3. **Batch effects correction**
+  4. **Cells anomaly score evaluation**
+  5. **Dimensionality reduction**
+  6. **Clustering**
+  7. **Annotating cell types**
+  8. **Visualization**
+  9. **Post-processing**
+
+
+### Visualization
+
+Function ```visualize()``` will produce most of the necessary files for post-analysis of the data. 
+
+See examples of the visualization tools below.
+
+
+<details closed><summary>The visualization tools include:</summary><p>
+
+- ```makeMarkerExpressionPlot()```: a heatmap that shows all markers and their expression levels in the clusters; 
+in addition, this figure contains relative (%) and absolute (cell counts) cluster sizes
+
+<p align="middle">
+	<img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/BM1_marker_expression.png?raw=true" width="1000"/>
+</p>
+
+- ```getIndividualGeneExpressionPlot()```:  2D layout colored by individual gene's expression
+
+<p align="middle">
+	<img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/marker_subplots/BM1_CD19_(B4_CVID3_CD19).png?raw=true" width="400"/>
+	<img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/marker_subplots/BM1_CD33_(SIGLEC-3_CD33_p67_SIGLEC3).png?raw=true" width="400"/>
+</p>
+
+- ```makeVotingResultsMatrixPlot()```: z-scores of the voting results for each input cell type and each cluster; 
+in addition, this figure contains relative (%) and absolute (cell counts) cluster sizes
+
+<p align="middle">
+ <img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/BM1_scores_matrix.png?raw=true" height="700"/>
+</p>
+
+- ```makeHistogramNullDistributionPlot()```: null distribution for each cluster and each cell type illustrating 
+the "machinery" of the Digital Cell Sorter
+
+<p align="middle">
+	<img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/BM1_null_distributions.png?raw=true" width="800"/>
+</p>
+
+- ```makeQualityControlHistogramPlot()```: Quality control histogram plots
+
+<p align="middle">
+	<img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/QC_plots/BM1_number_of_genes_histogram.png?raw=true" width="250"/>
+	<img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/QC_plots/BM1_count_depth_histogram.png?raw=true" width="250"/>
+	<img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/QC_plots/BM1_fraction_of_mitochondrialGenes_histogram.png?raw=true" width="250"/>
+</p>
+
+- ```makeProjectionPlot()```: 2D layout colored by the number of unique genes expressed, 
+the number of counts measured, and the fraction of mitochondrial genes.
+
+<p align="middle">
+	<img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/BM1_clusters_by_number_of_genes.png?raw=true" width="250"/>
+	<img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/BM1_clusters_by_count_depth.png?raw=true" width="250"/>
+	<img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/BM1_clusters_by_fraction_of_mitochondrialGenes.png?raw=true" width="250"/>
+</p>
+
+<p align="middle">
+	<img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/BM1_clusters_by_is_quality_cell.png?raw=true" width="500"/>
+</p>
+
+<p align="middle">
+	<img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/BM1_clusters_by_clusters.png?raw=true" width="375"/>
+	<img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/BM1_clusters_by_patients.png?raw=true" width="375"/>
+</p>
+
+Effect of batch correction demonstrated by combining BM1, BM2 and BM3 and processing the data jointly without (left) and with (right) the batch correction option:
+
+<p align="middle">
+	<img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/BM123_no_corr_clusters_by_patients.png?raw=true" width="375"/>
+	<img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/BM123_with_corr_clusters_by_patients.png?raw=true" width="375"/>
+</p>
+
+- ```makeStackedBarplot()```: plot with fractions of various cell types
+
+<p align="middle">
+	<img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/BM1_clusters_by_clusters_annotated.png?raw=true" width="500"/>
+	<img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/BM1_subclustering_stacked_barplot_BM1.png?raw=true" height="500"/>
+</p>
+
+
+- ```makeSankeyDiagram()```: river plot to compare various results 
+
+<p align="middle">
+	<img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/Sankey_example.png?raw=true" width="800"/>
+</p>
+
+- ```getAnomalyScoresPlot()```: plot with anomaly scores per cell
+
+<p align="middle">
+	<img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/BM1_clusters_by_anomaly_score_All.png?raw=true" width="750"/>
+</p>
+
+Calculate and plot anomaly scores for an arbitrary cell type or cluster:
+
+<p align="middle">
+	<img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/BM1_clusters_by_anomaly_score_B_cells.png?raw=true" width="250"/>
+	<img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/BM1_clusters_by_anomaly_score_T_cells.png?raw=true" width="250"/>
+	<img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/BM1_clusters_by_anomaly_score_cluster_7.0.0.png?raw=true" width="250"/>
+</p>
+
+
+- ```getIndividualGeneTtestPlot()```: heatmap of t-test p-values calculated gene-pair-wise
+        from the annotated clusters
+
+<p align="middle">
+	<img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/BM1_ttest_CD4_(CD4_CD4mut).png?raw=true" width="500"/>
+</p>
+
+
+- ```makePlotOfNewMarkers()```: genes significantly expressed in the annotated cell types
+
+<p align="middle">
+	<img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/BM1_new_markers.png?raw=true" width="1000"/>
+</p>
+
+</p></details>
+
+
+## Demo
+
+### Usage
+
+We have made an example execution file ```demo.py``` that shows how to use ```DigitalCellSorter```.
+
+In the demo, folder ```data``` is intentionally left empty. 
+The data file (cc95ff89-2e68-4a08-a234-480eca21ce79.homo_sapiens.mtx.zip) is about 2.4Gb in size and
+will be downloaded with the ```demo.py``` script.
+
+> Previously the HCA preview data was consolidated in file ```ica_bone_marrow_h5.h5``` and downloadable  
+> from https://preview.data.humancellatlas.org/ (Raw Counts Matrix - Bone Marrow). 
+> That file was ~485Mb and contained 378000 cells from 8 bone marrow donors (BM1-BM8). 
+
+See details of the script ```demo.py``` at:
+
+> [Example walkthrough of demo.py script](https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/ "Examples")
+
+
+To execute the complete script ```demo.py``` run:
+
+	python demo.py
+
+Note that the HCA BM1 data contains ~50000 sequenced cells, requiring more than 60Gb of RAM (we recommend using a high performance computer).
+If you want to run our example on a regular PC or a laptop, you can use a randomly chosen subset of cells:
+
+    df_expr = df_expr.sample(n=5000, axis=1)
+
+
+### Output
+
+All the output files are saved in the ```output``` directory inside the directory where the ```demo.py``` script is located. 
+If you specify another directory, the results will be generated in it.
+If you do not provide any directory, the results will appear in the root directory where the script was executed.
+
+
+
+
+%package help
+Summary:	Development documents and examples for DigitalCellSorter
+Provides:	python3-DigitalCellSorter-doc
+%description help
+# Digital Cell Sorter
+
+[![DOI](https://badge.fury.io/gh/sdomanskyi%2FDigitalCellSorter.svg)](https://badge.fury.io/gh/sdomanskyi%2FDigitalCellSorter)
+[![DOI](https://badge.fury.io/py/DigitalCellSorter.svg)](https://pypi.org/project/DigitalCellSorter)
+[![DOI](https://readthedocs.org/projects/digital-cell-sorter/badge/?version=latest)](https://digital-cell-sorter.readthedocs.io/en/latest/?badge=latest)
+[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.2533377.svg)](https://doi.org/10.5281/zenodo.2533377) 
+
+Digital Cell Sorter (DCS): a single cell RNA-seq analysis toolkit for clustering, cell type identification, and anomaly detection.
+
+> **Note:** We are currently preparing a manuscript describing the toolkit located in this repository.
+> To access the package described in our latest publication on Polled Digital Cell Sorter,
+> go to https://zenodo.org/record/2603265 and download the package (v1.1).
+
+
+> **The latest publication describing the methodology of cell type identification:**
+> [Polled Digital Cell Sorter (p-DCS): Automatic identification of hematological cell types from single cell RNA-sequencing clusters](
+> https://doi.org/10.1186/s12859-019-2951-x 
+> "Polled Digital Cell Sorter (p-DCS): Automatic identification of hematological cell types from single cell RNA-sequencing clusters")
+> Sergii Domanskyi, Anthony Szedlak, Nathaniel T Hawkins, Jiayin Wang, Giovanni Paternostro & Carlo Piermarocchi, 
+> *BMC Bioinformatics* volume 20, Article number: 369 (**2019**)
+
+
+The documentation is available at https://digital-cell-sorter.readthedocs.io/.
+
+- [Getting Started](#getting-started)
+  * [Prerequisites](#prerequisites)
+  * [Loading the package](#loading-the-package)
+  * [Gene Expression Data Format](#gene-expression-data-format)
+  * [Other Data](#other-data)
+- [Functionality](#functionality)
+  * [Overall](#overall)
+  * [Visualization](#visualization)
+- [Demo](#demo)
+  * [Usage](#usage)
+    + [Main cell types](#main-cell-types)
+    + [Cell sub-types](#cell-sub-types)
+  * [Output](#output)
+
+## Getting Started
+
+These instructions will get you a copy of the project up and running on your machine for data analysis, development or testing purposes.
+
+### Prerequisites
+
+#### Environment setup
+The software runs in Python >= 3.7.
+
+It is highly recommended to install Anaconda.
+Installers are available at https://www.anaconda.com/distribution/
+Whether Anaconda was already installed or you have just installed it, we recommend
+updating conda by running:
+
+	conda update conda
+
+With conda, you can create, export, list, remove, and update environments that 
+have different versions of Python and/or packages installed in them. 
+Switching or moving between environments is called activating the environment.
+
+	conda create --name DCS
+	conda activate DCS
+
+Now, in your new environment, the packages can be installed or updated without affecting
+your other environments. Note that using environments is optional; the 
+default ```(base)``` environment is used if you don't set up any other. For more information see
+https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html
+
+
+> Note: using a conda environment (for instance, the DCS environment created above)
+> on a high performance computer such as MSU HPCC in a batch job, i.e. not on a
+> development node but submitted to a SLURM queue, requires the following steps.
+> In the SLURM script, before the line that calls your Python script,
+> add ```conda deactivate``` to deactivate the base environment,
+> followed by ```conda activate DCS```. After the script has run,
+> call ```conda deactivate``` again. An example test script is shown below.
+
+
+<details closed><summary>SLURM script example:</summary><p>
+
+	#!/bin/bash --login
+	########## Define Resources Needed with SBATCH Lines ##########
+	#SBATCH --time=00:01:00             # limit of wall clock time - how long the job will run (same as -t)
+	#SBATCH --ntasks=1                  # number of tasks that you require (same as -n)
+	#SBATCH --cpus-per-task=1           # number of CPUs (or cores) per task (same as -c)
+	#SBATCH --mem=1G                    # memory required per node (default unit is megabytes; the G suffix means gigabytes)
+	##SBATCH --job-name Name_of_Job     # you can give your job a name for easier identification (same as -J)
+
+	########## Command Lines to Run ##########
+	conda deactivate
+	conda activate DCS 
+	cd ./                               ### change to the directory where your code is located 
+	python test.py        		    ### call your executable
+	scontrol show job $SLURM_JOB_ID     ### write job information to output file
+	conda deactivate
+
+where ```test.py``` is the Python script in which you import and use 
+```DigitalCellSorter```.
+
+</p></details>
+
+
+#### Installation of the DigitalCellSorter package
+
+Install ```DigitalCellSorter``` with ```pip```. Most of the dependencies 
+are installed automatically along with the latest release 
+of ```DigitalCellSorter```:
+
+	pip install DigitalCellSorter
+
+Alternatively, you can install this module directly from GitHub using:
+
+	pip install git+https://github.com/sdomanskyi/DigitalCellSorter
+
+Similarly, one can create a local copy of this project for development purposes, and 
+install the package from the cloned directory:
+
+	git clone https://github.com/sdomanskyi/DigitalCellSorter
+	cd DigitalCellSorter
+	python setup.py install
+
+Our software uses the packages ```numpy```, ```pandas```, ```matplotlib```, 
+```scikit-learn```, ```scipy```, ```mygene```, ```fftw```, 
+```fitsne```, ```adjustText``` and a few other standard Python packages. 
+Some of the packages used in ```DigitalCellSorter``` are not installed by default 
+and should be installed separately if you use the corresponding functionality of 
+Digital Cell Sorter. For example, for network-based clustering
+install the packages ```pynndescent```, ```networkx``` and ```python-louvain```. 
+Other packages that have to be installed separately are ```fitsne```, ```umap```, 
+```phate``` and ```orca```. Detailed instructions are below.
+
+#### t-SNE
+With datasets containing fewer than 2000 cells ```sklearn.manifold.TSNE``` is used.
+For larger datasets Fast Fourier Transform-accelerated Interpolation-based t-SNE (FIt-SNE)
+implemented by **KlugerLab** is used (https://github.com/KlugerLab/FIt-SNE).
+To use FIt-SNE the following packages need to be installed. First update ```cython```:
+
+	pip install --upgrade cython
+
+Then add ```conda-forge``` to your channels and install ```fftw``` from it:
+
+	conda config --add channels conda-forge
+	conda install fftw
+
+The next installation step is platform specific. To install FIt-SNE on Linux:
+
+	pip install fitsne
+
+On macOS Mojave the C++ compiler has to be specified explicitly:
+
+	env CC=clang CXX=clang++ pip install fitsne
+
+On Windows the FIt-SNE wrapper and executable are already 
+included with ```DigitalCellSorter```. 
+
+#### Other layouts
+
+To use the UMAP layout:
+
+	pip install umap-learn
+
+To use the PHATE layout:
+
+	pip install phate
+
+> Note: if none of ```fitsne```, ```umap``` or ```phate``` is installed, 
+> ```DigitalCellSorter``` defaults to the two largest PCA principal components for 
+> the visualization layout.
+
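Whether the optional layout and clustering packages are importable can be checked before an analysis is started. The sketch below is not part of ```DigitalCellSorter```; it uses only the Python standard library. The import names (```fitsne```, ```umap```, ```phate```, ```pynndescent```, ```networkx```, and ```community``` for ```python-louvain```) are assumptions about how each distribution is exposed after installation, and the helper names ```is_installed``` and ```report_optional``` are hypothetical.

```python
from importlib.util import find_spec

# Assumed import names of the optional packages mentioned above; adjust
# them if a distribution exposes a different top-level module.
OPTIONAL_MODULES = {
    "FIt-SNE layout": "fitsne",
    "UMAP layout": "umap",
    "PHATE layout": "phate",
    "network clustering (pynndescent)": "pynndescent",
    "network clustering (networkx)": "networkx",
    "network clustering (python-louvain)": "community",
}


def is_installed(module_name):
    """Return True if a top-level module can be located without importing it."""
    return find_spec(module_name) is not None


def report_optional(modules=OPTIONAL_MODULES):
    """Map each feature label to True/False depending on availability."""
    return {label: is_installed(name) for label, name in modules.items()}


if __name__ == "__main__":
    for label, available in report_optional().items():
        print(f"{label}: {'available' if available else 'missing'}")
```

If all three layout entries are reported as missing, the note above applies and the PCA layout is what to expect.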
+#### Interactive HTML figures
+To use Sankey diagrams that are part of Digital Cell Sorter 
+install ```plotly``` and ```orca```:
+
+    conda install -c plotly plotly-orca
+    conda install -c anaconda psutil
+
+See 
+[interactive Hopfield landscape figure](http://htmlpreview.github.io/?https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/dataName_energy_landscape_PC1_vs_PC0.html "Hopfield attractors figure")
+and 
+[interactive Sankey diagram figure](http://htmlpreview.github.io/?https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/PanglaoDB_Sankey_SRS3296611.html "Sankey diagram of cell annotation") 
+in a browser.
+
+```orca``` is necessary to convert Sankey diagrams to static images.
+If for any reason ```orca``` is unavailable, the Sankey diagrams will be saved as 
+interactive HTML figures that can be opened in a browser (Chrome, Firefox etc.) and 
+saved as static images. The visualizations of ```DigitalCellSorter``` are implemented 
+with ```matplotlib```, allowing all the figures to be saved in either raster or 
+vector format. Since ```plotly``` can convert simple ```matplotlib``` figures 
+(scatter, line, bar plots, but not heatmaps, splines or other complex patch objects) to
+interactive HTML format, ```DigitalCellSorter``` can attempt to save any of its figures
+as HTML. This is particularly useful with ```Projection``` plots, even though the color
+bars are not rendered in HTML figures.
+
+### Loading the package
+
+Use the latest release of the PyPI package. 
+
+For a quick-start demo with a dataset of ~5k PBMCs, execute the following 
+in the terminal and follow the prompts:
+
+	python -m DigitalCellSorter
+
+The second, more detailed demonstration analysis with step-by-step 
+explanation is discussed here
+and in the demo section at the end of this README.
+In your script import the package:
+
+	import DigitalCellSorter
+
+Create an instance of the class ```DigitalCellSorter```. Here, for simplicity, we use the default parameter values:
+
+	DCS = DigitalCellSorter.DigitalCellSorter()
+
+During initialization a number of parameters can be specified. For a detailed list see the documentation.
+Many of these parameters are transferred to DCS attributes and thus can be modified after initialization, e.g.:
+
+	DCS.toggleMakeStackedBarplot = False
+
+
+
+### Gene Expression Data Format
+
+The input gene expression data is expected in one of the following formats:
+
+1. Spreadsheet of comma-separated values ```csv``` containing a condensed matrix in the form ```('cell', 'gene', 'expr')```. 
+If there are batches in the data, the matrix has to be of the form ```('batch', 'cell', 'gene', 'expr')```. The column order can be arbitrary.
+
+<details closed><summary>Examples:</summary><p>
+
+| cell | gene | expr |
+|------|------|------|
+| C1   | G1   | 3    |
+| C1   | G2   | 2    |
+| C1   | G3   | 1    |
+| C2   | G1   | 1    |
+| C2   | G4   | 5    |
+| ...  | ...  | ...  |
+
+or:
+
+| batch  | cell | gene | expr |
+|--------|------|------|------|
+| batch0 | C1   | G1   | 3    |
+| batch0 | C1   | G2   | 2    |
+| batch0 | C1   | G3   | 1    |
+| batch1 | C2   | G1   | 1    |
+| batch1 | C2   | G4   | 5    |
+| ...    | ...  | ...  | ...  |
+
+</p></details>
+
+
+2. Spreadsheet of comma-separated values ```csv``` where rows are genes and columns are cells with gene expression counts.
+If there are batches in the data, the first row of the spreadsheet should be ```'batch'``` and the second ```'cell'```.
+
+<details closed><summary>Examples:</summary><p>
+
+| cell  | C1     | C2     | C3     | C4     |
+|-------|--------|--------|--------|--------|
+| G1    |        | 3      | 1      | 7      |
+| G2    | 2      | 2      |        | 2      |
+| G3    | 3      | 1      |        | 5      |
+| G4    | 10     |        | 5      | 4      |
+| ...   | ...    | ...    | ...    | ...    |
+
+or:
+
+| batch | batch0 | batch0 | batch1 | batch1 |
+|-------|--------|--------|--------|--------|
+| cell  | C1     | C2     | C3     | C4     |
+| G1    |        | 3      | 1      | 7      |
+| G2    | 2      | 2      |        | 2      |
+| G3    | 3      | 1      |        | 5      |
+| G4    | 10     |        | 5      | 4      |
+| ...   | ...    | ...    | ...    | ...    |
+
+</p></details>
+
+3. ```Pandas DataFrame``` where ```axis 0``` is genes and ```axis 1``` is cells.
+If there are batches in the data, then the index of ```axis 1``` should have two levels, e.g. ```('batch', 'cell')```, 
+with the first level indicating the patient, batch or experiment where that cell was sequenced, and the
+second level containing cell barcodes for identification.
+
+<details closed><summary>Examples:</summary><p>
+
+    df = pd.DataFrame(data=[[2,np.nan],[3,8],[3,5],[np.nan,1]], 
+                      index=['G1','G2','G3','G4'], 
+                      columns=pd.MultiIndex.from_arrays([['batch0','batch1'],['C1','C2']], names=['batch', 'cell']))    
+
+
+</p></details>
+
+4. ```Pandas Series``` where the index should have two levels, e.g. ```('cell', 'gene')```. If there are batches in the data,
+the index should have three levels: the first level indicating the patient, batch or experiment where that cell was sequenced, the second level containing cell barcodes for 
+identification, and the third level containing gene names.
+
+<details closed><summary>Examples:</summary><p>
+
+    se = pd.Series(data=[1,8,3,5,5], 
+                   index=pd.MultiIndex.from_arrays([['batch0','batch0','batch1','batch1','batch1'],
+                                                    ['C1','C1','C1','C2','C2'],
+                                                    ['G1','G2','G3','G1','G4']], names=['batch', 'cell', 'gene']))
+
+
+</p></details>
+
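The condensed layout (format 1) and the gene-by-cell layout (format 2) hold the same information. The following sketch is independent of ```DigitalCellSorter``` and uses only the standard library: it reads a condensed table with arbitrary column order and rewrites it as a wide table with ```batch``` and ```cell``` header rows. The function names ```read_condensed``` and ```to_wide_csv``` are hypothetical, and missing gene/cell pairs are written as empty fields, as in the examples above.

```python
import csv
import io


def read_condensed(text):
    """Parse condensed CSV text into {(batch, cell): {gene: expr}}.

    Columns are looked up by name, so their order does not matter.
    A missing 'batch' column is represented by batch=None.
    """
    cells = {}
    for row in csv.DictReader(io.StringIO(text)):
        key = (row.get("batch"), row["cell"])
        cells.setdefault(key, {})[row["gene"]] = row["expr"]
    return cells


def to_wide_csv(cells):
    """Write genes as rows and cells as columns; absent values stay empty."""
    keys = sorted(cells, key=lambda k: (k[0] or "", k[1]))
    genes = sorted({gene for values in cells.values() for gene in values})
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    # The 'batch' header row is only written when batch labels are present.
    if any(batch is not None for batch, _ in keys):
        writer.writerow(["batch"] + [batch for batch, _ in keys])
    writer.writerow(["cell"] + [cell for _, cell in keys])
    for gene in genes:
        writer.writerow([gene] + [cells[key].get(gene, "") for key in keys])
    return out.getvalue()


# Same records as the batch example of format 1, with shuffled columns.
condensed = """gene,expr,cell,batch
G1,3,C1,batch0
G2,2,C1,batch0
G3,1,C1,batch0
G1,1,C2,batch1
G4,5,C2,batch1
"""

wide = to_wide_csv(read_condensed(condensed))
print(wide)
```

Looking columns up by name rather than by position is what makes the arbitrary column order of format 1 harmless here.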
+Any of the data types outlined above needs to be prepared/validated with the function ```prepare()```. 
+Let us demonstrate this on an input of type 1:
+
+	df_expr = DCS.prepare('data/testData/dataFileCondensedWithBatches.tsv')
+
+### Other Data
+
+```markersDCS.xlsx```: An Excel workbook with marker data. Rows are markers and columns are cell types. 
+'1' means that the gene is a marker for that cell type, '-1' means that the gene is not expressed in that cell type, and '0' otherwise.
+This gene marker file is included in the package and is used by default. 
+If you use your own file, it has to be prepared in the same format (including the two-line header). Note that only the first worksheet will be read,
+and its name can be arbitrary. The first column should contain gene names. The second row should contain cell types, and the first row how 
+those cell types are grouped. To skip a cell type, put "NA" in the first-row cell of that cell type's column.
+
+<details closed><summary>Example:</summary><p>
+
+|A       |B            |C             |D           |E          |F                |G                         |H                           |I                        |J                         |K                  |L               |M                 |...      |
+|--------|-------------|--------------|------------|-----------|-----------------|--------------------------|----------------------------|-------------------------|--------------------------|-------------------|----------------|------------------|---------|
+|        |B cells      |B cells       |B cells     |T cells    |T cells          |T cells                   |T cells                     |T cells                  |T cells                   |T cells            |NK cells        |NK cells          |...      |
+|Marker  |B cells naive|B cells memory|Plasma cells|T cells CD8|T cells CD4 naive|T cells CD4 memory resting|T cells CD4 memory activated|T cells follicular helper|T cells regulatory (Tregs)|T cells gamma delta|NK cells resting|NK cells activated|...      |
+|ABCB4   |1            |0             |0           |0          |0                |0                         |0                           |0                        |0                         |0                  |0               |0                 |...      |
+|ABCB9   |0            |0             |1           |0          |0                |0                         |0                           |0                        |0                         |0                  |0               |0                 |...      |
+|ACAP1   |0            |0             |0           |0          |1                |0                         |0                           |0                        |0                         |0                  |0               |0                 |...      |
+|ACHE    |0            |0             |0           |0          |0                |0                         |0                           |0                        |0                         |0                  |0               |0                 |...      |
+|ACP5    |0            |0             |0           |0          |0                |0                         |0                           |0                        |0                         |0                  |0               |0                 |...      |
+|ADAM28  |1            |1             |0           |0          |0                |0                         |0                           |0                        |0                         |0                  |0               |0                 |...      |
+|ADAMDEC1|0            |0             |0           |0          |0                |0                         |0                           |0                        |0                         |0                  |0               |0                 |...      |
+|ADAMTS3 |0            |0             |0           |0          |0                |0                         |0                           |0                        |0                         |0                  |0               |0                 |...      |
+|ADRB2   |0            |0             |0           |0          |0                |0                         |0                           |0                        |0                         |0                  |0               |0                 |...      |
+|AIF1    |0            |0             |0           |0          |0                |0                         |0                           |0                        |0                         |0                  |0               |0                 |...      |
+|AIM2    |0            |1             |0           |0          |0                |0                         |0                           |0                        |0                         |0                  |0               |0                 |...      |
+|ALOX15  |0            |0             |0           |0          |0                |0                         |0                           |0                        |0                         |0                  |0               |0                 |...      |
+|ALOX5   |0            |1             |0           |0          |0                |0                         |0                           |0                        |0                         |0                  |0               |0                 |...      |
+|AMPD1   |0            |0             |1           |0          |0                |0                         |0                           |0                        |0                         |0                  |0               |0                 |...      |
+|ANGPT4  |0            |0             |1           |0          |0                |0                         |0                           |0                        |0                         |0                  |0               |0                 |...      |
+|...     |...          |...           |...         |...        |...              |...                       |...                         |...                      |...                       |...                |...             |...               |...      |
+
+</p></details>
+
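The two-line header and the 1 / -1 / 0 encoding can be illustrated without Excel. The sketch below is not the package's reader: it parses a small comma-separated stand-in for the first worksheet with the standard library and collects, for each cell type, the genes marked '1' and '-1', skipping columns whose first-row group is "NA". The function name ```parse_marker_table``` is hypothetical, and the "Monocytes" column, its "NA" group and the '-1' entries are made up for illustration; they are not taken from ```markersDCS.xlsx```.

```python
import csv
import io


def parse_marker_table(text):
    """Return {cell_type: {'group', 'positive', 'negative'}} from a marker sheet.

    Row 1 holds the groups, row 2 the cell types, column 1 the gene names.
    Columns whose group is 'NA' are skipped.
    """
    rows = list(csv.reader(io.StringIO(text)))
    groups, cell_types, body = rows[0], rows[1], rows[2:]
    kept = [c for c in range(1, len(cell_types)) if groups[c].strip() != "NA"]
    markers = {
        cell_types[c]: {"group": groups[c], "positive": [], "negative": []}
        for c in kept
    }
    for row in body:
        gene = row[0]
        for c in kept:
            value = int(row[c])
            if value == 1:
                markers[cell_types[c]]["positive"].append(gene)
            elif value == -1:
                markers[cell_types[c]]["negative"].append(gene)
    return markers


# Illustrative data only: the last column and the -1 values are invented.
sheet = """,B cells,B cells,B cells,NA
Marker,B cells naive,B cells memory,Plasma cells,Monocytes
ABCB4,1,0,0,0
ABCB9,0,0,1,0
ADAM28,1,1,0,-1
AIM2,0,1,-1,0
"""

markers = parse_marker_table(sheet)
for cell_type, entry in markers.items():
    print(cell_type, entry)
```

A custom marker file can be checked the same way after exporting its first worksheet to ```csv```, which makes a misplaced header row or a stray non-numeric value easy to spot.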
+```Human.MitoCarta2.0.csv```: A ```csv``` spreadsheet with human mitochondrial genes, created as part of the work 
+[MitoCarta2.0: an updated inventory of mammalian mitochondrial proteins](https://doi.org/10.1093/nar/gkv1003 "MitoCarta2.0")
+Sarah E. Calvo, Karl R. Clauser, Vamsi K. Mootha, *Nucleic Acids Research*, Volume 44, Issue D1, 4 January 2016.
+
+
+## Functionality
+
+### Overall
+
+The main class, DigitalCellSorter, includes tools for:
+
+  1. **Pre-processing**
+  2. **Quality control**
+  3. **Batch effects correction**
+  4. **Cells anomaly score evaluation**
+  5. **Dimensionality reduction**
+  6. **Clustering**
+  7. **Annotating cell types**
+  8. **Visualization**  
+  9. **Post-processing**.
+
+
+### Visualization
+
+Function ```visualize()``` will produce most of the necessary files for post-analysis of the data. 
+
+See examples of the visualization tools below.
+
+
+<details closed><summary>The visualization tools include:</summary><p>
+
+- ```makeMarkerExpressionPlot()```: a heatmap that shows all markers and their expression levels in the clusters, 
+in addition this figure contains relative (%) and absolute (cell counts) cluster sizes
+
+<p align="middle">
+	<img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/BM1_marker_expression.png?raw=true" width="1000"/>
+</p>
+
+- ```getIndividualGeneExpressionPlot()```:  2D layout colored by individual gene's expression
+
+<p align="middle">
+	<img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/marker_subplots/BM1_CD19_(B4_CVID3_CD19).png?raw=true" width="400"/>
+	<img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/marker_subplots/BM1_CD33_(SIGLEC-3_CD33_p67_SIGLEC3).png?raw=true" width="400"/>
+</p>
+
+- ```makeVotingResultsMatrixPlot()```: z-scores of the voting results for each input cell type and each cluster, 
+in addition this figure contains relative (%) and absolute (cell counts) cluster sizes
+
+<p align="middle">
+ <img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/BM1_scores_matrix.png?raw=true" height="700"/>
+</p>
+
+- ```makeHistogramNullDistributionPlot()```: null distribution for each cluster and each cell type illustrating 
+the "machinery" of the Digital Cell Sorter
+
+<p align="middle">
+	<img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/BM1_null_distributions.png?raw=true" width="800"/>
+</p>
+
+- ```makeQualityControlHistogramPlot()```: Quality control histogram plots
+
+<p align="middle">
+	<img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/QC_plots/BM1_number_of_genes_histogram.png?raw=true" width="250"/>
+	<img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/QC_plots/BM1_count_depth_histogram.png?raw=true" width="250"/>
+	<img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/QC_plots/BM1_fraction_of_mitochondrialGenes_histogram.png?raw=true" width="250"/>
+</p>
+
+- ```makeProjectionPlot()```: 2D layout colored by number of unique genes expressed, 
+number of counts measured, and the fraction of mitochondrial genes.
+
+<p align="middle">
+	<img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/BM1_clusters_by_number_of_genes.png?raw=true" width="250"/>
+	<img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/BM1_clusters_by_count_depth.png?raw=true" width="250"/>
+	<img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/BM1_clusters_by_fraction_of_mitochondrialGenes.png?raw=true" width="250"/>
+</p>
+
+<p align="middle">
+	<img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/BM1_clusters_by_is_quality_cell.png?raw=true" width="500"/>
+</p>
+
+<p align="middle">
+	<img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/BM1_clusters_by_clusters.png?raw=true" width="375"/>
+	<img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/BM1_clusters_by_patients.png?raw=true" width="375"/>
+</p>
+
+Effect of batch correction demonstrated by combining BM1, BM2 and BM3 and processing the data jointly without (left) and with (right) the batch correction option:
+
+<p align="middle">
+	<img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/BM123_no_corr_clusters_by_patients.png?raw=true" width="375"/>
+	<img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/BM123_with_corr_clusters_by_patients.png?raw=true" width="375"/>
+</p>
+
+- ```makeStackedBarplot()```: plot with fractions of various cell types
+
+<p align="middle">
+	<img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/BM1_clusters_by_clusters_annotated.png?raw=true" width="500"/>
+	<img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/BM1_subclustering_stacked_barplot_BM1.png?raw=true" height="500"/>
+</p>
+
+
+- ```makeSankeyDiagram()```: river plot to compare various results 
+
+<p align="middle">
+	<img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/Sankey_example.png?raw=true" width="800"/>
+</p>
+
+- ```getAnomalyScoresPlot()```: plot with anomaly scores per cell
+
+<p align="middle">
+	<img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/BM1_clusters_by_anomaly_score_All.png?raw=true" width="750"/>
+</p>
+
+Calculate and plot anomaly scores for an arbitrary cell type or cluster:
+
+<p align="middle">
+	<img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/BM1_clusters_by_anomaly_score_B_cells.png?raw=true" width="250"/>
+	<img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/BM1_clusters_by_anomaly_score_T_cells.png?raw=true" width="250"/>
+	<img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/BM1_clusters_by_anomaly_score_cluster_7.0.0.png?raw=true" width="250"/>
+</p>
+
+
+- ```getIndividualGeneTtestPlot()```: Produce heatmap plot of t-test p-Values calculated gene-pair-wise
+        from the annotated clusters
+
+<p align="middle">
+	<img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/BM1_ttest_CD4_(CD4_CD4mut).png?raw=true" width="500"/>
+</p>
+
+
+- ```makePlotOfNewMarkers()```: genes significantly expressed in the annotated cell types
+
+<p align="middle">
+	<img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/BM1_new_markers.png?raw=true" width="1000"/>
+</p>
+
+</p></details>
+
+
+## Demo
+
+### Usage
+
+We have made an example execution file ```demo.py``` that shows how to use ```DigitalCellSorter```.
+
+In the demo, folder ```data``` is intentionally left empty. 
+The data file (cc95ff89-2e68-4a08-a234-480eca21ce79.homo_sapiens.mtx.zip) is about 2.4Gb in size and
+will be downloaded with the ```demo.py``` script.
+
+> Previously the HCA preview data was consolidated in file ```ica_bone_marrow_h5.h5``` and downloadable  
+> from https://preview.data.humancellatlas.org/ (Raw Counts Matrix - Bone Marrow). 
+> That file was ~485Mb and contained 378000 cells from 8 bone marrow donors (BM1-BM8). 
+
+See details of the script ```demo.py``` at:
+
+> [Example walkthrough of demo.py script](https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/ "Examples")
+
+
+To execute the complete script ```demo.py``` run:
+
+	python demo.py
+
+Note that the HCA BM1 data contains ~50000 sequenced cells, requiring more than 60Gb of RAM (we recommend using a high-performance computing system).
+If you want to run our example on a regular PC or a laptop, you can use a randomly chosen subset of cells:
+
+    df_expr = df_expr.sample(n=5000, axis=1)
+
+
+### Output
+
+All the output files are saved in the ```output``` directory inside the directory where the ```demo.py``` script is located. 
+If you specify any other directory, the results will be generated in it.
+If you do not provide any directory, the results will appear in the directory from which the script was executed.
+
+
+
+
+%prep
+%autosetup -n DigitalCellSorter-1.3.7.6
+
+%build
+%py3_build
+
+%install
+%py3_install
+install -d -m755 %{buildroot}/%{_pkgdocdir}
+if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
+if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
+if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
+if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
+pushd %{buildroot}
+if [ -d usr/lib ]; then
+	find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/lib64 ]; then
+	find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/bin ]; then
+	find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/sbin ]; then
+	find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+touch doclist.lst
+if [ -d usr/share/man ]; then
+	find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
+fi
+popd
+mv %{buildroot}/filelist.lst .
+mv %{buildroot}/doclist.lst .
+
+%files -n python3-DigitalCellSorter -f filelist.lst
+%dir %{python3_sitelib}/*
+
+%files help -f doclist.lst
+%{_docdir}/*
+
+%changelog
+* Fri May 05 2023 Python_Bot <Python_Bot@openeuler.org> - 1.3.7.6-1
+- Package Spec generated
diff --git a/sources b/sources
new file mode 100644
index 0000000..2ec9777
--- /dev/null
+++ b/sources
@@ -0,0 +1 @@
+5603baa4e95acbbd191303f9b914693b  DigitalCellSorter-1.3.7.6.tar.gz