author CoprDistGit <infra@openeuler.org> 2023-05-05 11:57:11 +0000
committer CoprDistGit <infra@openeuler.org> 2023-05-05 11:57:11 +0000
commit 03da39f05058d58edf9129f6c67831ad8c1c5417 (patch)
tree 8a0a467189ba35cf93b0007b1444c12d954763eb
parent 03a130a68836c5381177520d0ddb26ef35df1860 (diff)
automatic import of python-digitalcellsorter openeuler20.03
-rw-r--r-- .gitignore 1
-rw-r--r-- python-digitalcellsorter.spec 1638
-rw-r--r-- sources 1
3 files changed, 1640 insertions, 0 deletions
diff --git a/.gitignore b/.gitignore
index e69de29..78bb768 100644
--- a/.gitignore
+++ b/.gitignore
@@ -0,0 +1 @@
+/DigitalCellSorter-1.3.7.6.tar.gz
diff --git a/python-digitalcellsorter.spec b/python-digitalcellsorter.spec
new file mode 100644
index 0000000..53de811
--- /dev/null
+++ b/python-digitalcellsorter.spec
@@ -0,0 +1,1638 @@
+%global _empty_manifest_terminate_build 0
+Name: python-DigitalCellSorter
+Version: 1.3.7.6
+Release: 1
+Summary: Toolkit for analysis and identification of cell types from heterogeneous single cell RNA-seq data
+License: MIT License
+URL: https://github.com/sdomanskyi/DigitalCellSorter
+Source0: https://mirrors.nju.edu.cn/pypi/web/packages/aa/ff/ba4a40eb753c899973e638e701194d198051f9f21076f68d76147ee347f5/DigitalCellSorter-1.3.7.6.tar.gz
+BuildArch: noarch
+
+Requires: python3-numpy
+Requires: python3-pandas
+Requires: python3-patsy
+Requires: python3-xlrd
+Requires: python3-openpyxl
+Requires: python3-tables
+Requires: python3-scipy
+Requires: python3-matplotlib
+Requires: python3-scikit-learn
+Requires: python3-mygene
+Requires: python3-plotly
+Requires: python3-adjustText
+
+%description
+# Digital Cell Sorter
+
+[![DOI](https://badge.fury.io/gh/sdomanskyi%2FDigitalCellSorter.svg)](https://badge.fury.io/gh/sdomanskyi%2FDigitalCellSorter)
+[![DOI](https://badge.fury.io/py/DigitalCellSorter.svg)](https://pypi.org/project/DigitalCellSorter)
+[![DOI](https://readthedocs.org/projects/digital-cell-sorter/badge/?version=latest)](https://digital-cell-sorter.readthedocs.io/en/latest/?badge=latest)
+[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.2533377.svg)](https://doi.org/10.5281/zenodo.2533377)
+
+Digital Cell Sorter (DCS): a single cell RNA-seq analysis toolkit for clustering, cell type identification, and anomaly detection.
+
+> **Note:** We are currently preparing a manuscript describing the toolkit located in this repository.
+> If you want to access the package detailed in our latest publication of Polled Digital Cell Sorter
+> go to https://zenodo.org/record/2603265 and download the package (v1.1).
+
+
+> **The latest publication describing the methodology of cell types identification:**
+> [Polled Digital Cell Sorter (p-DCS): Automatic identification of hematological cell types from single cell RNA-sequencing clusters](
+> https://doi.org/10.1186/s12859-019-2951-x
+> "Polled Digital Cell Sorter (p-DCS): Automatic identification of hematological cell types from single cell RNA-sequencing clusters")
+> Sergii Domanskyi, Anthony Szedlak, Nathaniel T Hawkins, Jiayin Wang, Giovanni Paternostro & Carlo Piermarocchi,
+> *BMC Bioinformatics* volume 20, Article number: 369 (**2019**)
+
+
+The documentation is available at https://digital-cell-sorter.readthedocs.io/.
+
+- [Getting Started](#getting-started)
+ * [Prerequisites](#prerequisites)
+ * [Loading the package](#loading-the-package)
+ * [Gene Expression Data Format](#gene-expression-data-format)
+ * [Other Data](#other-data)
+- [Functionality](#functionality)
+ * [Overall](#overall)
+ * [Visualization](#visualization)
+- [Demo](#demo)
+ * [Usage](#usage)
+ + [Main cell types](#main-cell-types)
+ + [Cell sub-types](#cell-sub-types)
+ * [Output](#output)
+
+## Getting Started
+
+These instructions will get you a copy of the project up and running on your machine for data analysis, development or testing purposes.
+
+### Prerequisites
+
+#### Environment setup
+The software runs on Python >= 3.7.
+
+It is highly recommended to install Anaconda.
+Installers are available at https://www.anaconda.com/distribution/
+Whether you already had Anaconda installed or have just installed it, we recommend
+updating conda by running:
+
+ conda update conda
+
+With conda, you can create, export, list, remove, and update environments that
+have different versions of Python and/or packages installed in them.
+Switching or moving between environments is called activating the environment.
+
+ conda create --name DCS
+ conda activate DCS
+
+Now, in your new environment, packages can be installed or updated without affecting
+your other environments. Note that using environments is not required; the
+default ```(base)``` environment is used if you don't set up any other. For more information see
+https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html
+
+
+> Note: using conda environments (for instance DCS, exemplified above)
+> on a high performance computer
+> such as MSU HPCC in a batch job, i.e. not on a development node but
+> submitted to a SLURM queue, requires the following steps. In the SLURM
+> script, before the line calling your Python script,
+> add ```conda deactivate``` to deactivate the base environment
+> and ```conda activate DCS```. After calling the script
+> run ```conda deactivate```. An example testing script is shown below.
+
+
+<details closed><summary>SLURM script example:</summary><p>
+
+ #!/bin/bash --login
+ ########## Define Resources Needed with SBATCH Lines ##########
+ #SBATCH --time=00:01:00 # limit of wall clock time - how long the job will run (same as -t)
+ #SBATCH --ntasks=1 # number of tasks - how many tasks (nodes) that you require (same as -n)
+ #SBATCH --cpus-per-task=1 # number of CPUs (or cores) per task (same as -c)
+ #SBATCH --mem=1G # memory required per node - amount of memory (in bytes)
+ ##SBATCH --job-name Name_of_Job # you can give your job a name for easier identification (same as -J)
+
+ ########## Command Lines to Run ##########
+ conda deactivate
+ conda activate DCS
+ cd ./ ### change to the directory where your code is located
+ python test.py ### call your executable
+ scontrol show job $SLURM_JOB_ID ### write job information to output file
+ conda deactivate
+
+where ```test.py``` is the python script where you import and use
+```DigitalCellSorter```.
+
+</p></details>
+
+
+#### Installation of the DigitalCellSorter package
+
+Install ```DigitalCellSorter``` with ```pip```. Most of the dependencies
+are installed automatically together with the latest release
+of ```DigitalCellSorter```:
+
+ pip install DigitalCellSorter
+
+Alternatively, you can clone and install this module directly from GitHub using:
+
+ pip install git+https://github.com/sdomanskyi/DigitalCellSorter
+
+Similarly, one can create a local copy of this project for development purposes and
+install the package from the cloned directory:
+
+ git clone https://github.com/sdomanskyi/DigitalCellSorter
+ cd DigitalCellSorter
+ python setup.py install
+
+Our software uses packages ```numpy```, ```pandas```, ```matplotlib```,
+```scikit-learn```, ```scipy```, ```mygene```, ```fftw```,
+```fitsne```, ```adjustText``` and a few other standard Python packages.
+Some of the packages used in ```DigitalCellSorter``` are not installed by default
+and should be installed separately when the corresponding functionality of
+Digital Cell Sorter is used. For example, for network-based clustering
+install packages ```pynndescent```, ```networkx```, ```python-louvain```.
+Other packages that have to be installed separately are ```fitsne```, ```umap```,
+```phate``` and ```orca```. The detailed instructions are below.
+
+#### t-SNE
+With datasets containing fewer than 2000 cells ```sklearn.manifold.TSNE``` is used.
+For larger datasets Fast Fourier Transform-accelerated Interpolation-based t-SNE (FIt-SNE)
+implemented by **KlugerLab** is used (https://github.com/KlugerLab/FIt-SNE).
+To use FIt-SNE the following packages need to be installed. First update ```cython```:
+
+ pip install --upgrade cython
+
+Then add the ```conda-forge``` channel and install ```fftw``` from it:
+
+ conda config --add channels conda-forge
+ conda install fftw
+
+The next installation step is platform specific. To install FIt-SNE on Linux:
+
+ pip install fitsne
+
+On macOS Mojave the C++ compiler has to be specified explicitly:
+
+ env CC=clang CXX=clang++ pip install fitsne
+
+On Windows the FIt-SNE wrapper and executable are already
+included with ```DigitalCellSorter```.
+
+#### Other layouts
+
+To use the UMAP layout:
+
+ pip install umap-learn
+
+To use PHATE:
+
+ pip install phate
+
+> Note: if none of ```fitsne```, ```umap``` and ```phate``` is installed,
+> ```DigitalCellSorter``` defaults to the two largest PCA principal components for
+> the visualization layout.
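
The fallback described in the note above can be illustrated with a minimal sketch that checks which optional layout packages are importable in the current environment. The helper names `available_layouts` and `pick_layout` and the order of preference are assumptions made for illustration; they are not part of the DigitalCellSorter API, and the package's own selection logic may differ.

```python
import importlib.util

# Import names of the optional layout packages, in an assumed order of preference.
OPTIONAL_LAYOUTS = ("fitsne", "umap", "phate")

def available_layouts(candidates=OPTIONAL_LAYOUTS):
    """Return the candidate packages that can be imported in this environment."""
    return [name for name in candidates
            if importlib.util.find_spec(name) is not None]

def pick_layout(candidates=OPTIONAL_LAYOUTS):
    """Pick the first importable layout package, or fall back to 'PCA'."""
    found = available_layouts(candidates)
    return found[0] if found else "PCA"

print("Layout that would be used:", pick_layout())
```

Running this inside the environment created for DCS is a quick way to confirm that `pip install umap-learn` or `pip install phate` actually made the package importable.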
+
+#### Interactive HTML figures
+To use Sankey diagrams that are part of Digital Cell Sorter
+install ```plotly``` and ```orca```:
+
+ conda install -c plotly plotly-orca
+ conda install -c anaconda psutil
+
+See
+[interactive Hopfield landscape figure](http://htmlpreview.github.io/?https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/dataName_energy_landscape_PC1_vs_PC0.html "Hopfield attractors figure")
+and
+[interactive Sankey diagram figure](http://htmlpreview.github.io/?https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/PanglaoDB_Sankey_SRS3296611.html "Sankey diagram of cell annotation")
+in a browser.
+
+```orca``` is necessary to convert Sankey diagrams to static images.
+If for any reason ```orca``` is unavailable, the Sankey diagrams will be saved as
+interactive HTML figures that can be opened in a browser (Chrome, Firefox etc.) and
+saved as static images. The visualizations of ```DigitalCellSorter``` are implemented
+with ```matplotlib```, allowing all the figures to be saved in either raster or
+vector format. Since ```plotly``` can convert simple ```matplotlib``` figures
+(scatter, line, bar plots, but not heatmaps, splines or other complex patch objects) to
+interactive HTML format, ```DigitalCellSorter``` can attempt to save any of its figures
+as HTML. This is particularly useful with ```Projection``` plots, even though the color
+bars are not rendered in HTML figures.
+
+### Loading the package
+
+Use the latest release of the PyPI package.
+
+For a quick-start demo with a dataset of ~5k PBMCs, execute the following
+in the terminal and follow the prompts:
+
+ python -m DigitalCellSorter
+
+A second, more detailed demonstration analysis with step-by-step
+explanation is discussed
+in the demo section at the end of this README.
+In your script import the package:
+
+ import DigitalCellSorter
+
+Create an instance of class ```DigitalCellSorter```. Here, for simplicity, we use default parameter values:
+
+ DCS = DigitalCellSorter.DigitalCellSorter()
+
+During the initialization a number of parameters can be specified. For a detailed list see the documentation.
+Many of these parameters are transferred to DCS attributes and thus can be modified after initialization using, e.g.:
+
+ DCS.toggleMakeStackedBarplot = False
+
+
+
+### Gene Expression Data Format
+
+The input gene expression data is expected in one of the following formats:
+
+1. Spreadsheet of comma-separated values ```csv``` containing a condensed matrix in the form ```('cell', 'gene', 'expr')```.
+If there are batches in the data the matrix has to be of the form ```('batch', 'cell', 'gene', 'expr')```. Column order can be arbitrary.
+
+<details closed><summary>Examples:</summary><p>
+
+| cell | gene | expr |
+|------|------|------|
+| C1 | G1 | 3 |
+| C1 | G2 | 2 |
+| C1 | G3 | 1 |
+| C2 | G1 | 1 |
+| C2 | G4 | 5 |
+| ... | ... | ... |
+
+or:
+
+| batch | cell | gene | expr |
+|--------|------|------|------|
+| batch0 | C1 | G1 | 3 |
+| batch0 | C1 | G2 | 2 |
+| batch0 | C1 | G3 | 1 |
+| batch1 | C2 | G1 | 1 |
+| batch1 | C2 | G4 | 5 |
+| ... | ... | ... | ... |
+
+</p></details>
+
+
+2. Spreadsheet of comma-separated values ```csv``` where rows are genes, columns are cells with gene expression counts.
+If there are batches in the data, the first row of the spreadsheet should be ```'batch'``` and the second ```'cell'```.
+
+<details closed><summary>Examples:</summary><p>
+
+| cell | C1 | C2 | C3 | C4 |
+|-------|--------|--------|--------|--------|
+| G1 | | 3 | 1 | 7 |
+| G2 | 2 | 2 | | 2 |
+| G3 | 3 | 1 | | 5 |
+| G4 | 10 | | 5 | 4 |
+| ... | ... | ... | ... | ... |
+
+or:
+
+| batch | batch0 | batch0 | batch1 | batch1 |
+|-------|--------|--------|--------|--------|
+| cell | C1 | C2 | C3 | C4 |
+| G1 | | 3 | 1 | 7 |
+| G2 | 2 | 2 | | 2 |
+| G3 | 3 | 1 | | 5 |
+| G4 | 10 | | 5 | 4 |
+| ... | ... | ... | ... | ... |
+
+</p></details>
+
+3. ```Pandas DataFrame``` where ```axis 0``` is genes and ```axis 1``` is cells.
+If there are batches in the data then the index of ```axis 1``` should have two levels, e.g. ```('batch', 'cell')```,
+with the first level indicating the patient, batch or experiment where that cell was sequenced, and the
+second level containing cell barcodes for identification.
+
+<details closed><summary>Examples:</summary><p>
+
+ import numpy as np
+ import pandas as pd
+ df = pd.DataFrame(data=[[2,np.nan],[3,8],[3,5],[np.nan,1]],
+ index=['G1','G2','G3','G4'],
+ columns=pd.MultiIndex.from_arrays([['batch0','batch1'],['C1','C2']], names=['batch', 'cell']))
+
+
+</p></details>
+
+4. ```Pandas Series``` where the index should have two levels, e.g. ```('cell', 'gene')```. If there are batches in the data
+the first level should indicate the patient, batch or experiment where that cell was sequenced, the second level the cell barcodes for
+identification and the third level the gene names.
+
+<details closed><summary>Examples:</summary><p>
+
+ import pandas as pd
+ se = pd.Series(data=[1,8,3,5,5],
+ index=pd.MultiIndex.from_arrays([['batch0','batch0','batch1','batch1','batch1'],
+ ['C1','C1','C1','C2','C2'],
+ ['G1','G2','G3','G1','G4']], names=['batch', 'cell', 'gene']))
+
+
+</p></details>
+
+Any of the data types outlined above needs to be prepared/validated with the function ```prepare()```.
+Let us demonstrate this on the input of type 1:
+
+ df_expr = DCS.prepare('data/testData/dataFileCondensedWithBatches.tsv')
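
To make the relation between the condensed form (format 1) and the genes-by-cells layout (format 2) concrete, here is a minimal sketch using only the Python standard library. It is not how ```prepare()``` works internally; the function name `condensed_to_matrix` and the toy data are assumptions made for illustration. Columns are looked up by name, so their order in the file does not matter.

```python
import csv
import io
from collections import defaultdict

# Toy condensed table in the ('cell', 'gene', 'expr') form of format 1.
condensed_csv = """cell,gene,expr
C1,G1,3
C1,G2,2
C1,G3,1
C2,G1,1
C2,G4,5
"""

def condensed_to_matrix(text):
    """Pivot condensed rows into {gene: {cell: expr}} plus the sorted cell names."""
    matrix = defaultdict(dict)
    cells = set()
    for row in csv.DictReader(io.StringIO(text)):
        matrix[row["gene"]][row["cell"]] = float(row["expr"])
        cells.add(row["cell"])
    return dict(matrix), sorted(cells)

matrix, cells = condensed_to_matrix(condensed_csv)

# Print a genes-by-cells table; missing entries stay empty, as in format 2.
print(",".join(["cell"] + cells))
for gene in sorted(matrix):
    values = [str(matrix[gene].get(cell, "")) for cell in cells]
    print(",".join([gene] + values))
```

Entries absent from the condensed table (for example gene G4 in cell C1) simply remain empty in the pivoted layout, mirroring the blank cells in the format 2 examples.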
+
+### Other Data
+
+```markersDCS.xlsx```: An Excel workbook with marker data. Rows are markers and columns are cell types.
+'1' means that the gene is a marker for that cell type, '-1' means that this gene is not expressed in this cell type, and '0' otherwise.
+This gene marker file, included in the package, is used by default.
+If you use your own file it has to be prepared in the same format (including the two-line header). Note that only the first worksheet will be read,
+and its name can be arbitrary. The first column should contain gene names. The second row should contain cell types, and the first row how
+those cell types are grouped. If any of the cell types needs to be skipped, put "NA" in the first-row cell of that cell type's column.
+
+<details closed><summary>Example:</summary><p>
+
+|A |B |C |D |E |F |G |H |I |J |K |L |M |... |
+|--------|-------------|--------------|------------|-----------|-----------------|--------------------------|----------------------------|-------------------------|--------------------------|-------------------|----------------|------------------|---------|
+| |B cells |B cells |B cells |T cells |T cells |T cells |T cells |T cells |T cells |T cells |NK cells |NK cells |... |
+|Marker |B cells naive|B cells memory|Plasma cells|T cells CD8|T cells CD4 naive|T cells CD4 memory resting|T cells CD4 memory activated|T cells follicular helper|T cells regulatory (Tregs)|T cells gamma delta|NK cells resting|NK cells activated|... |
+|ABCB4 |1 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |... |
+|ABCB9 |0 |0 |1 |0 |0 |0 |0 |0 |0 |0 |0 |0 |... |
+|ACAP1 |0 |0 |0 |0 |1 |0 |0 |0 |0 |0 |0 |0 |... |
+|ACHE |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |... |
+|ACP5 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |... |
+|ADAM28 |1 |1 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |... |
+|ADAMDEC1|0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |... |
+|ADAMTS3 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |... |
+|ADRB2 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |... |
+|AIF1 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |... |
+|AIM2 |0 |1 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |... |
+|ALOX15 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |... |
+|ALOX5 |0 |1 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |... |
+|AMPD1 |0 |0 |1 |0 |0 |0 |0 |0 |0 |0 |0 |0 |... |
+|ANGPT4 |0 |0 |1 |0 |0 |0 |0 |0 |0 |0 |0 |0 |... |
+|... |... |... |... |... |... |... |... |... |... |... |... |... |... |
+
+</p></details>
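
The two-line header layout can be illustrated with a small standard-library sketch that reads rows shaped like the worksheet above. The function name `parse_markers` is hypothetical and not part of DigitalCellSorter. The rows are toy data: the "NA" group on "Plasma cells" and the '-1' entry are invented here only to exercise the skipping rule and the negative-marker case.

```python
# Toy rows mimicking the first worksheet: row 1 = groups, row 2 = cell types,
# remaining rows = gene name followed by 1 / 0 / -1 entries (illustrative values).
rows = [
    ["",       "B cells",       "B cells",        "NA",           "T cells"],
    ["Marker", "B cells naive", "B cells memory", "Plasma cells", "T cells CD8"],
    ["ABCB4",  1, 0, 0, 0],
    ["ADAM28", 1, 1, 0, 0],
    ["AIM2",   0, 1, 0, -1],
]

def parse_markers(rows):
    """Return {cell type: {'group', 'positive', 'negative'}}, skipping 'NA' groups."""
    groups, cell_types = rows[0][1:], rows[1][1:]
    table = {}
    for col, (group, cell_type) in enumerate(zip(groups, cell_types), start=1):
        if group == "NA":
            continue  # cell types marked "NA" in the first row are skipped
        table[cell_type] = {
            "group": group,
            "positive": [r[0] for r in rows[2:] if r[col] == 1],
            "negative": [r[0] for r in rows[2:] if r[col] == -1],
        }
    return table

markers = parse_markers(rows)
for cell_type, info in markers.items():
    print(cell_type, info)
```

Keeping the group label next to each cell type makes it easy to see which sub-types would be merged under the same main cell type.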
+
+```Human.MitoCarta2.0.csv```: A ```csv``` spreadsheet with human mitochondrial genes, created as part of the work
+[MitoCarta2.0: an updated inventory of mammalian mitochondrial proteins](https://doi.org/10.1093/nar/gkv1003 "MitoCarta2.0")
+Sarah E. Calvo, Karl R. Clauser, Vamsi K. Mootha, *Nucleic Acids Research*, Volume 44, Issue D1, 4 January 2016.
+
+
+## Functionality
+
+### Overall
+
+The main class, DigitalCellSorter, includes tools for:
+
+ 1. **Pre-processing**
+ 2. **Quality control**
+ 3. **Batch effects correction**
+ 4. **Cell anomaly score evaluation**
+ 5. **Dimensionality reduction**
+ 6. **Clustering**
+ 7. **Annotating cell types**
+ 8. **Visualization**
+ 9. **Post-processing**
+
+
+### Visualization
+
+Function ```visualize()``` will produce most of the necessary files for post-analysis of the data.
+
+See examples of the visualization tools below.
+
+
+<details closed><summary>The visualization tools include:</summary><p>
+
+- ```makeMarkerExpressionPlot()```: a heatmap that shows all markers and their expression levels in the clusters,
+in addition this figure contains relative (%) and absolute (cell counts) cluster sizes
+
+<p align="middle">
+ <img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/BM1_marker_expression.png?raw=true" width="1000"/>
+</p>
+
+- ```getIndividualGeneExpressionPlot()```: 2D layout colored by individual gene's expression
+
+<p align="middle">
+ <img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/marker_subplots/BM1_CD19_(B4_CVID3_CD19).png?raw=true" width="400"/>
+ <img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/marker_subplots/BM1_CD33_(SIGLEC-3_CD33_p67_SIGLEC3).png?raw=true" width="400"/>
+</p>
+
+- ```makeVotingResultsMatrixPlot()```: z-scores of the voting results for each input cell type and each cluster,
+in addition this figure contains relative (%) and absolute (cell counts) cluster sizes
+
+<p align="middle">
+ <img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/BM1_scores_matrix.png?raw=true" height="700"/>
+</p>
+
+- ```makeHistogramNullDistributionPlot()```: null distribution for each cluster and each cell type illustrating
+the "machinery" of the Digital Cell Sorter
+
+<p align="middle">
+ <img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/BM1_null_distributions.png?raw=true" width="800"/>
+</p>
+
+- ```makeQualityControlHistogramPlot()```: Quality control histogram plots
+
+<p align="middle">
+ <img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/QC_plots/BM1_number_of_genes_histogram.png?raw=true" width="250"/>
+ <img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/QC_plots/BM1_count_depth_histogram.png?raw=true" width="250"/>
+ <img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/QC_plots/BM1_fraction_of_mitochondrialGenes_histogram.png?raw=true" width="250"/>
+</p>
+
+- ```makeProjectionPlot()```: 2D layout colored by the number of unique genes expressed,
+the number of counts measured, and the fraction of mitochondrial genes.
+
+<p align="middle">
+ <img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/BM1_clusters_by_number_of_genes.png?raw=true" width="250"/>
+ <img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/BM1_clusters_by_count_depth.png?raw=true" width="250"/>
+ <img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/BM1_clusters_by_fraction_of_mitochondrialGenes.png?raw=true" width="250"/>
+</p>
+
+<p align="middle">
+ <img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/BM1_clusters_by_is_quality_cell.png?raw=true" width="500"/>
+</p>
+
+<p align="middle">
+ <img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/BM1_clusters_by_clusters.png?raw=true" width="375"/>
+ <img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/BM1_clusters_by_patients.png?raw=true" width="375"/>
+</p>
+
+Effect of batch correction demonstrated by combining BM1, BM2 and BM3 and processing the data jointly without (left) and with (right) the batch correction option:
+
+<p align="middle">
+ <img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/BM123_no_corr_clusters_by_patients.png?raw=true" width="375"/>
+ <img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/BM123_with_corr_clusters_by_patients.png?raw=true" width="375"/>
+</p>
+
+- ```makeStackedBarplot()```: plot with fractions of various cell types
+
+<p align="middle">
+ <img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/BM1_clusters_by_clusters_annotated.png?raw=true" width="500"/>
+ <img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/BM1_subclustering_stacked_barplot_BM1.png?raw=true" height="500"/>
+</p>
+
+
+- ```makeSankeyDiagram()```: river plot to compare various results
+
+<p align="middle">
+ <img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/Sankey_example.png?raw=true" width="800"/>
+</p>
+
+- ```getAnomalyScoresPlot()```: plot with anomaly scores per cell
+
+<p align="middle">
+ <img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/BM1_clusters_by_anomaly_score_All.png?raw=true" width="750"/>
+</p>
+
+Calculate and plot anomaly scores for an arbitrary cell type or cluster:
+
+<p align="middle">
+ <img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/BM1_clusters_by_anomaly_score_B_cells.png?raw=true" width="250"/>
+ <img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/BM1_clusters_by_anomaly_score_T_cells.png?raw=true" width="250"/>
+ <img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/BM1_clusters_by_anomaly_score_cluster_7.0.0.png?raw=true" width="250"/>
+</p>
+
+
+- ```getIndividualGeneTtestPlot()```: heatmap of t-test p-values calculated gene-pair-wise
+ from the annotated clusters
+
+<p align="middle">
+ <img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/BM1_ttest_CD4_(CD4_CD4mut).png?raw=true" width="500"/>
+</p>
+
+
+- ```makePlotOfNewMarkers()```: genes significantly expressed in the annotated cell types
+
+<p align="middle">
+ <img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/BM1_new_markers.png?raw=true" width="1000"/>
+</p>
+
+</p></details>
+
+
+## Demo
+
+### Usage
+
+We have made an example execution file ```demo.py``` that shows how to use ```DigitalCellSorter```.
+
+In the demo, folder ```data``` is intentionally left empty.
+The data file (cc95ff89-2e68-4a08-a234-480eca21ce79.homo_sapiens.mtx.zip) is about 2.4 GB in size and
+will be downloaded by the ```demo.py``` script.
+
+> Previously the HCA preview data was consolidated in the file ```ica_bone_marrow_h5.h5``` and downloadable
+> from https://preview.data.humancellatlas.org/ (Raw Counts Matrix - Bone Marrow).
+> That file was ~485 MB and contained 378000 cells from 8 bone marrow donors (BM1-BM8).
+
+See details of the script ```demo.py``` at:
+
+> [Example walkthrough of demo.py script](https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/ "Examples")
+
+
+To execute the complete script ```demo.py``` run:
+
+ python demo.py
+
+Note that the HCA BM1 data contains ~50000 sequenced cells, requiring more than 60 GB of RAM (we recommend using a high-performance computer).
+If you want to run our example on a regular PC or a laptop, you can use a randomly chosen subset of cells:
+
+ df_expr = df_expr.sample(n=5000, axis=1)
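
For readers who want to see what random subsampling of cells amounts to, the following standard-library sketch draws a reproducible subset of cell identifiers. The function `sample_cells` and the generated barcodes are hypothetical stand-ins for the columns of ```df_expr```; with pandas, the one-line ```sample``` call does the equivalent on the DataFrame itself.

```python
import random

def sample_cells(cell_ids, n, seed=0):
    """Return up to n distinct cell identifiers chosen uniformly at random."""
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    cell_ids = list(cell_ids)
    return rng.sample(cell_ids, min(n, len(cell_ids)))

# Hypothetical cell barcodes standing in for the columns of df_expr.
all_cells = [f"C{i}" for i in range(50000)]
subset = sample_cells(all_cells, 5000)
print(len(subset), "cells kept out of", len(all_cells))
```

Fixing the seed keeps the chosen subset identical between runs, which helps when comparing results produced on a laptop with those from the full data set.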
+
+
+### Output
+
+All the output files are saved in the ```output``` directory inside the directory where the ```demo.py``` script is located.
+If you specify any other directory, the results will be generated in it.
+If you do not provide any directory, the results will appear in the directory where the script was executed.
+
+
+
+
+%package -n python3-DigitalCellSorter
+Summary: Toolkit for analysis and identification of cell types from heterogeneous single cell RNA-seq data
+Provides: python-DigitalCellSorter
+BuildRequires: python3-devel
+BuildRequires: python3-setuptools
+BuildRequires: python3-pip
+%description -n python3-DigitalCellSorter
+# Digital Cell Sorter
+
+[![DOI](https://badge.fury.io/gh/sdomanskyi%2FDigitalCellSorter.svg)](https://badge.fury.io/gh/sdomanskyi%2FDigitalCellSorter)
+[![DOI](https://badge.fury.io/py/DigitalCellSorter.svg)](https://pypi.org/project/DigitalCellSorter)
+[![DOI](https://readthedocs.org/projects/digital-cell-sorter/badge/?version=latest)](https://digital-cell-sorter.readthedocs.io/en/latest/?badge=latest)
+[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.2533377.svg)](https://doi.org/10.5281/zenodo.2533377)
+
+Digital Cell Sorter (DCS): a single cell RNA-seq analysis toolkit for clustering, cell type identification, and anomaly detection.
+
+> **Note:** We are currently preparing a manuscript describing the toolkit located in this repository.
+> If you want to access the package detailed in our latest publication of Polled Digital Cell Sorter
+> go to https://zenodo.org/record/2603265 and download the package (v1.1).
+
+
+> **The latest publication describing the methodology of cell types identification:**
+> [Polled Digital Cell Sorter (p-DCS): Automatic identification of hematological cell types from single cell RNA-sequencing clusters](
+> https://doi.org/10.1186/s12859-019-2951-x
+> "Polled Digital Cell Sorter (p-DCS): Automatic identification of hematological cell types from single cell RNA-sequencing clusters")
+> Sergii Domanskyi, Anthony Szedlak, Nathaniel T Hawkins, Jiayin Wang, Giovanni Paternostro & Carlo Piermarocchi,
+> *BMC Bioinformatics* volume 20, Article number: 369 (**2019**)
+
+
+The documentation is available at https://digital-cell-sorter.readthedocs.io/.
+
+- [Getting Started](#getting-started)
+ * [Prerequisites](#prerequisites)
+ * [Loading the package](#loading-the-package)
+ * [Gene Expression Data Format](#gene-expression-data-format)
+ * [Other Data](#other-data)
+- [Functionality](#functionality)
+ * [Overall](#overall)
+ * [Visualization](#visualization)
+- [Demo](#demo)
+ * [Usage](#usage)
+ + [Main cell types](#main-cell-types)
+ + [Cell sub-types](#cell-sub-types)
+ * [Output](#output)
+
+## Getting Started
+
+These instructions will get you a copy of the project up and running on your machine for data analysis, development or testing purposes.
+
+### Prerequisites
+
+#### Environment setup
+The software runs on Python >= 3.7.
+
+It is highly recommended to install Anaconda.
+Installers are available at https://www.anaconda.com/distribution/
+Whether you already had Anaconda installed or have just installed it, we recommend
+updating conda by running:
+
+ conda update conda
+
+With conda, you can create, export, list, remove, and update environments that
+have different versions of Python and/or packages installed in them.
+Switching or moving between environments is called activating the environment.
+
+ conda create --name DCS
+ conda activate DCS
+
+Now, in your new environment, packages can be installed or updated without affecting
+your other environments. Note that using environments is not required; the
+default ```(base)``` environment is used if you don't set up any other. For more information see
+https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html
+
+
+> Note: using conda environments (for instance DCS, exemplified above)
+> on a high performance computer
+> such as MSU HPCC in a batch job, i.e. not on a development node but
+> submitted to a SLURM queue, requires the following steps. In the SLURM
+> script, before the line calling your Python script,
+> add ```conda deactivate``` to deactivate the base environment
+> and ```conda activate DCS```. After calling the script
+> run ```conda deactivate```. An example testing script is shown below.
+
+
+<details closed><summary>SLURM script example:</summary><p>
+
+ #!/bin/bash --login
+ ########## Define Resources Needed with SBATCH Lines ##########
+ #SBATCH --time=00:01:00 # limit of wall clock time - how long the job will run (same as -t)
+ #SBATCH --ntasks=1 # number of tasks - how many tasks (nodes) that you require (same as -n)
+ #SBATCH --cpus-per-task=1 # number of CPUs (or cores) per task (same as -c)
+ #SBATCH --mem=1G # memory required per node - amount of memory (in bytes)
+ ##SBATCH --job-name Name_of_Job # you can give your job a name for easier identification (same as -J)
+
+ ########## Command Lines to Run ##########
+ conda deactivate
+ conda activate DCS
+ cd ./ ### change to the directory where your code is located
+ python test.py ### call your executable
+ scontrol show job $SLURM_JOB_ID ### write job information to output file
+ conda deactivate
+
+where ```test.py``` is the python script where you import and use
+```DigitalCellSorter```.
+
+</p></details>
+
+
+#### Installation of the DigitalCellSorter package
+
+Install ```DigitalCellSorter``` with ```pip```. Most of the dependencies
+are installed automatically together with the latest release
+of ```DigitalCellSorter```:
+
+ pip install DigitalCellSorter
+
+Alternatively, you can clone and install this module directly from GitHub using:
+
+ pip install git+https://github.com/sdomanskyi/DigitalCellSorter
+
+Similarly, one can create a local copy of this project for development purposes and
+install the package from the cloned directory:
+
+ git clone https://github.com/sdomanskyi/DigitalCellSorter
+ cd DigitalCellSorter
+ python setup.py install
+
+Our software uses packages ```numpy```, ```pandas```, ```matplotlib```,
+```scikit-learn```, ```scipy```, ```mygene```, ```fftw```,
+```fitsne```, ```adjustText``` and a few other standard Python packages.
+Some of the packages used in ```DigitalCellSorter``` are not installed by default
+and should be installed separately when the corresponding functionality of
+Digital Cell Sorter is used. For example, for network-based clustering
+install packages ```pynndescent```, ```networkx```, ```python-louvain```.
+Other packages that have to be installed separately are ```fitsne```, ```umap```,
+```phate``` and ```orca```. The detailed instructions are below.
+
+#### t-SNE
+With datasets containing fewer than 2000 cells ```sklearn.manifold.TSNE``` is used.
+For larger datasets Fast Fourier Transform-accelerated Interpolation-based t-SNE (FIt-SNE)
+implemented by **KlugerLab** is used (https://github.com/KlugerLab/FIt-SNE).
+To use FIt-SNE the following packages need to be installed. First update ```cython```:
+
+ pip install --upgrade cython
+
+Then add the ```conda-forge``` channel and install ```fftw``` from it:
+
+ conda config --add channels conda-forge
+ conda install fftw
+
+The next installation step is platform specific. To install FIt-SNE on Linux:
+
+ pip install fitsne
+
+On macOS Mojave the C++ compiler has to be specified explicitly:
+
+ env CC=clang CXX=clang++ pip install fitsne
+
+On Windows the FIt-SNE wrapper and executable are already
+included with ```DigitalCellSorter```.
+
+#### Other layouts
+
+To use the UMAP layout, install ```umap-learn```:
+
+ pip install umap-learn
+
+To use PHATE, install ```phate```:
+
+ pip install phate
+
+> Note: if none of ```fitsne```, ```umap``` and ```phate``` is installed,
+> ```DigitalCellSorter``` defaults to the two largest principal components from PCA
+> for the visualization layout.
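+
+Before running an analysis it can help to check which of the optional layout packages
+are importable. The sketch below is not part of ```DigitalCellSorter```; the helper
+name ```available_layouts``` is hypothetical, and the import names (```fitsne```,
+```umap```, ```phate```) are assumed to match the packages installed above.

```python
from importlib.util import find_spec

# Assumed import names: the pip package "umap-learn" is imported as "umap".
LAYOUT_MODULES = {"FIt-SNE": "fitsne", "UMAP": "umap", "PHATE": "phate"}


def available_layouts(modules=LAYOUT_MODULES):
    """Return the layout names whose top-level module can be located."""
    return [name for name, module in modules.items() if find_spec(module) is not None]


found = available_layouts()
print("Available layouts:", found if found else "none (PCA fallback expected)")
```
+
+An empty list suggests that only the PCA fallback described in the note above will be available.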
+
+#### Interactive HTML figures
+To use Sankey diagrams that are part of Digital Cell Sorter
+install ```plotly``` and ```orca```:
+
+ conda install -c plotly plotly-orca
+ conda install -c anaconda psutil
+
+See
+[interactive Hopfield landscape figure](http://htmlpreview.github.io/?https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/dataName_energy_landscape_PC1_vs_PC0.html "Hopfield attractors figure")
+and
+[interactive Sankey diagram figure](http://htmlpreview.github.io/?https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/PanglaoDB_Sankey_SRS3296611.html "Sankey diagram of cell annotation")
+in a browser.
+
+```orca``` is necessary to convert Sankey diagrams to static images.
+If for any reason ```orca``` is unavailable, the Sankey diagrams will be saved as
+interactive HTML figures that can be opened in a browser (Chrome, Firefox etc.) and
+saved as static images. The visualizations of ```DigitalCellSorter``` are implemented
+with ```matplotlib```, allowing all the figures to be saved in either raster or
+vector format. Since ```plotly``` can convert simple ```matplotlib``` figures
+(scatter, line, bar plots, but not heatmaps, splines or other complex patch objects) to
+interactive HTML format, ```DigitalCellSorter``` can attempt to save any of its figures
+as HTML. This is particularly useful with ```Projection``` plots, even though the color
+bars are not rendered in HTML figures.
+
+### Loading the package
+
+Use the latest release of the PyPI package.
+
+For a quick-start demo with a dataset of ~5k PBMCs execute
+in the terminal and follow prompts:
+
+ python -m DigitalCellSorter
+
+The second, more detailed demonstration analysis with step-by-step
+explanation is discussed here
+and in the demo section at the end of this README.
+In your script import the package:
+
+ import DigitalCellSorter
+
+Create an instance of class ```DigitalCellSorter```. Here, for simplicity, we use default parameter values:
+
+ DCS = DigitalCellSorter.DigitalCellSorter()
+
+During the initialization a number of parameters can be specified. For a detailed list see the documentation.
+Many of these parameters are transferred to DCS attributes and thus can be modified after initialization using, e.g.:
+
+ DCS.toggleMakeStackedBarplot = False
+
+
+
+### Gene Expression Data Format
+
+The input gene expression data is expected in one of the following formats:
+
+1. Spreadsheet of comma-separated values ```csv``` containing a condensed matrix in the form ```('cell', 'gene', 'expr')```.
+If there are batches in the data the matrix has to be of the form ```('batch', 'cell', 'gene', 'expr')```. Column order can be arbitrary.
+
+<details closed><summary>Examples:</summary><p>
+
+| cell | gene | expr |
+|------|------|------|
+| C1 | G1 | 3 |
+| C1 | G2 | 2 |
+| C1 | G3 | 1 |
+| C2 | G1 | 1 |
+| C2 | G4 | 5 |
+| ... | ... | ... |
+
+or:
+
+| batch | cell | gene | expr |
+|--------|------|------|------|
+| batch0 | C1 | G1 | 3 |
+| batch0 | C1 | G2 | 2 |
+| batch0 | C1 | G3 | 1 |
+| batch1 | C2 | G1 | 1 |
+| batch1 | C2 | G4 | 5 |
+| ... | ... | ... | ... |
+
+</p></details>
+
+
+2. Spreadsheet of comma-separated values ```csv``` where rows are genes and columns are cells, with gene expression counts as values.
+If there are batches in the data, the first row of the spreadsheet should be ```'batch'``` and the second ```'cell'```.
+
+<details closed><summary>Examples:</summary><p>
+
+| cell | C1 | C2 | C3 | C4 |
+|-------|--------|--------|--------|--------|
+| G1 | | 3 | 1 | 7 |
+| G2 | 2 | 2 | | 2 |
+| G3 | 3 | 1 | | 5 |
+| G4 | 10 | | 5 | 4 |
+| ... | ... | ... | ... | ... |
+
+or:
+
+| batch | batch0 | batch0 | batch1 | batch1 |
+|-------|--------|--------|--------|--------|
+| cell | C1 | C2 | C3 | C4 |
+| G1 | | 3 | 1 | 7 |
+| G2 | 2 | 2 | | 2 |
+| G3 | 3 | 1 | | 5 |
+| G4 | 10 | | 5 | 4 |
+| ... | ... | ... | ... | ... |
+
+</p></details>
+
+3. ```Pandas DataFrame``` where ```axis 0``` is genes and ```axis 1``` is cells.
+If there are batches in the data then the index of ```axis 1``` should have two levels, e.g. ```('batch', 'cell')```,
+with the first level indicating the patient, batch or experiment where that cell was sequenced, and the
+second level containing cell barcodes for identification.
+
+<details closed><summary>Examples:</summary><p>
+
+ import numpy as np
+ import pandas as pd
+ df = pd.DataFrame(data=[[2,np.nan],[3,8],[3,5],[np.nan,1]],
+ index=['G1','G2','G3','G4'],
+ columns=pd.MultiIndex.from_arrays([['batch0','batch1'],['C1','C2']], names=['batch', 'cell']))
+
+
+</p></details>
+
+4. ```Pandas Series``` where the index should have two levels, e.g. ```('cell', 'gene')```. If there are batches in the data,
+the index should have three levels: the first level indicating the patient, batch or experiment where that cell was sequenced, the second level containing cell barcodes for
+identification, and the third level containing gene names.
+
+<details closed><summary>Examples:</summary><p>
+
+ import pandas as pd
+ se = pd.Series(data=[1,8,3,5,5],
+ index=pd.MultiIndex.from_arrays([['batch0','batch0','batch1','batch1','batch1'],
+ ['C1','C1','C1','C2','C2'],
+ ['G1','G2','G3','G1','G4']], names=['batch', 'cell', 'gene']))
+
+
+</p></details>
+
+Data in any of the formats outlined above needs to be prepared and validated with the function ```prepare()```.
+Let us demonstrate this on an input of type 1:
+
+ df_expr = DCS.prepare('data/testData/dataFileCondensedWithBatches.tsv')
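+
+To illustrate how the condensed form (type 1) relates to the genes-by-cells form (type 3),
+the sketch below reshapes a toy condensed table with plain ```pandas```. It is not the
+implementation of ```prepare()```; the values and the names ```condensed``` and ```wide```
+are made up for this example.

```python
import pandas as pd

# Toy condensed table in the ('batch', 'cell', 'gene', 'expr') form.
condensed = pd.DataFrame({
    "batch": ["batch0", "batch0", "batch0", "batch1", "batch1"],
    "cell": ["C1", "C1", "C1", "C2", "C2"],
    "gene": ["G1", "G2", "G3", "G1", "G4"],
    "expr": [3, 2, 1, 1, 5],
})

# Move 'batch' and 'cell' to the columns: genes become rows, and
# (batch, cell) pairs without a measurement for a gene become NaN.
wide = condensed.set_index(["batch", "cell", "gene"])["expr"].unstack(["batch", "cell"])

print(wide)
```
+
+The reshaping assumes that every ```('batch', 'cell', 'gene')``` combination occurs at most once in the condensed table.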
+
+### Other Data
+
+```markersDCS.xlsx```: An Excel workbook with marker data. Rows are markers and columns are cell types.
+'1' means that the gene is a marker for that cell type, '-1' means that this gene is not expressed in this cell type, and '0' otherwise.
+This gene marker file is included in the package and used by default.
+If you use your own file it has to be prepared in the same format (including the two-line header). Note that only the first worksheet will be read,
+and its name can be arbitrary. The first column should contain gene names. The second row should contain cell types, and the first row how
+those cell types are grouped. If a cell type needs to be skipped, put "NA" in the first-row cell of that cell type's column.
+
+<details closed><summary>Example:</summary><p>
+
+|A |B |C |D |E |F |G |H |I |J |K |L |M |... |
+|--------|-------------|--------------|------------|-----------|-----------------|--------------------------|----------------------------|-------------------------|--------------------------|-------------------|----------------|------------------|---------|
+| |B cells |B cells |B cells |T cells |T cells |T cells |T cells |T cells |T cells |T cells |NK cells |NK cells |... |
+|Marker |B cells naive|B cells memory|Plasma cells|T cells CD8|T cells CD4 naive|T cells CD4 memory resting|T cells CD4 memory activated|T cells follicular helper|T cells regulatory (Tregs)|T cells gamma delta|NK cells resting|NK cells activated|... |
+|ABCB4 |1 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |... |
+|ABCB9 |0 |0 |1 |0 |0 |0 |0 |0 |0 |0 |0 |0 |... |
+|ACAP1 |0 |0 |0 |0 |1 |0 |0 |0 |0 |0 |0 |0 |... |
+|ACHE |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |... |
+|ACP5 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |... |
+|ADAM28 |1 |1 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |... |
+|ADAMDEC1|0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |... |
+|ADAMTS3 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |... |
+|ADRB2 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |... |
+|AIF1 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |... |
+|AIM2 |0 |1 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |... |
+|ALOX15 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |... |
+|ALOX5 |0 |1 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |... |
+|AMPD1 |0 |0 |1 |0 |0 |0 |0 |0 |0 |0 |0 |0 |... |
+|ANGPT4 |0 |0 |1 |0 |0 |0 |0 |0 |0 |0 |0 |0 |... |
+|... |... |... |... |... |... |... |... |... |... |... |... |... |... |
+
+</p></details>
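+
+The two-line header can be pictured as a two-level column index. The sketch below builds a
+small table of that shape directly in ```pandas``` and drops the cell types whose group is
+"NA". It only illustrates the layout described above, not how ```DigitalCellSorter``` parses
+the file; the "Skipped type" column and the marker values are invented for this example.

```python
import pandas as pd

# Toy marker table: each column is labelled (group, cell type), like the two header rows.
columns = pd.MultiIndex.from_tuples([
    ("B cells", "B cells naive"),
    ("B cells", "Plasma cells"),
    ("NA", "Skipped type"),  # "NA" in the first header row marks a cell type to skip
])
markers = pd.DataFrame(
    [[1, 0, 1], [0, 1, 0], [1, -1, 0]],
    index=["ABCB4", "ABCB9", "ADAM28"],
    columns=columns,
)

# Keep only the cell types whose group label is not "NA".
kept = markers.loc[:, markers.columns.get_level_values(0) != "NA"]

# Collect the positive markers (value 1) of every remaining cell type.
positive = {
    cell_type: kept.loc[kept[(group, cell_type)] == 1].index.tolist()
    for group, cell_type in kept.columns
}
print(positive)
```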
+
+```Human.MitoCarta2.0.csv```: A ```csv``` spreadsheet with human mitochondrial genes, created as part of the work
+[MitoCarta2.0: an updated inventory of mammalian mitochondrial proteins](https://doi.org/10.1093/nar/gkv1003 "MitoCarta2.0")
+Sarah E. Calvo, Karl R. Clauser, Vamsi K. Mootha, *Nucleic Acids Research*, Volume 44, Issue D1, 4 January 2016.
+
+
+## Functionality
+
+### Overall
+
+The main class, DigitalCellSorter, includes tools for:
+
+ 1. **Pre-processing**
+ 2. **Quality control**
+ 3. **Batch effects correction**
+ 4. **Cells anomaly score evaluation**
+ 5. **Dimensionality reduction**
+ 6. **Clustering**
+ 7. **Annotating cell types**
+ 8. **Visualization**
+ 9. **Post-processing**.
+
+
+### Visualization
+
+Function ```visualize()``` will produce most of the necessary files for post-analysis of the data.
+
+See examples of the visualization tools below.
+
+
+<details closed><summary>The visualization tools include:</summary><p>
+
+- ```makeMarkerExpressionPlot()```: a heatmap that shows all markers and their expression levels in the clusters;
+in addition, this figure contains relative (%) and absolute (cell counts) cluster sizes
+
+<p align="middle">
+ <img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/BM1_marker_expression.png?raw=true" width="1000"/>
+</p>
+
+- ```getIndividualGeneExpressionPlot()```: 2D layout colored by individual gene's expression
+
+<p align="middle">
+ <img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/marker_subplots/BM1_CD19_(B4_CVID3_CD19).png?raw=true" width="400"/>
+ <img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/marker_subplots/BM1_CD33_(SIGLEC-3_CD33_p67_SIGLEC3).png?raw=true" width="400"/>
+</p>
+
+- ```makeVotingResultsMatrixPlot()```: z-scores of the voting results for each input cell type and each cluster;
+in addition, this figure contains relative (%) and absolute (cell counts) cluster sizes
+
+<p align="middle">
+ <img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/BM1_scores_matrix.png?raw=true" height="700"/>
+</p>
+
+- ```makeHistogramNullDistributionPlot()```: null distribution for each cluster and each cell type illustrating
+the "machinery" of the Digital Cell Sorter
+
+<p align="middle">
+ <img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/BM1_null_distributions.png?raw=true" width="800"/>
+</p>
+
+- ```makeQualityControlHistogramPlot()```: Quality control histogram plots
+
+<p align="middle">
+ <img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/QC_plots/BM1_number_of_genes_histogram.png?raw=true" width="250"/>
+ <img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/QC_plots/BM1_count_depth_histogram.png?raw=true" width="250"/>
+ <img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/QC_plots/BM1_fraction_of_mitochondrialGenes_histogram.png?raw=true" width="250"/>
+</p>
+
+- ```makeProjectionPlot()```: 2D layout colored by number of unique genes expressed,
+number of counts measured, and fraction of mitochondrial genes.
+
+<p align="middle">
+ <img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/BM1_clusters_by_number_of_genes.png?raw=true" width="250"/>
+ <img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/BM1_clusters_by_count_depth.png?raw=true" width="250"/>
+ <img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/BM1_clusters_by_fraction_of_mitochondrialGenes.png?raw=true" width="250"/>
+</p>
+
+<p align="middle">
+ <img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/BM1_clusters_by_is_quality_cell.png?raw=true" width="500"/>
+</p>
+
+<p align="middle">
+ <img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/BM1_clusters_by_clusters.png?raw=true" width="375"/>
+ <img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/BM1_clusters_by_patients.png?raw=true" width="375"/>
+</p>
+
+Effect of batch correction, demonstrated by combining BM1, BM2 and BM3 and processing the data jointly without (left) and with (right) the batch correction option:
+
+<p align="middle">
+ <img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/BM123_no_corr_clusters_by_patients.png?raw=true" width="375"/>
+ <img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/BM123_with_corr_clusters_by_patients.png?raw=true" width="375"/>
+</p>
+
+- ```makeStackedBarplot()```: plot with fractions of various cell types
+
+<p align="middle">
+ <img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/BM1_clusters_by_clusters_annotated.png?raw=true" width="500"/>
+ <img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/BM1_subclustering_stacked_barplot_BM1.png?raw=true" height="500"/>
+</p>
+
+
+- ```makeSankeyDiagram()```: river plot to compare various results
+
+<p align="middle">
+ <img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/Sankey_example.png?raw=true" width="800"/>
+</p>
+
+- ```getAnomalyScoresPlot()```: plot with anomaly scores per cell
+
+<p align="middle">
+ <img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/BM1_clusters_by_anomaly_score_All.png?raw=true" width="750"/>
+</p>
+
+Calculate and plot anomaly scores for an arbitrary cell type or cluster:
+
+<p align="middle">
+ <img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/BM1_clusters_by_anomaly_score_B_cells.png?raw=true" width="250"/>
+ <img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/BM1_clusters_by_anomaly_score_T_cells.png?raw=true" width="250"/>
+ <img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/BM1_clusters_by_anomaly_score_cluster_7.0.0.png?raw=true" width="250"/>
+</p>
+
+
+- ```getIndividualGeneTtestPlot()```: heatmap of t-test p-values calculated gene-pair-wise
+ from the annotated clusters
+
+<p align="middle">
+ <img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/BM1_ttest_CD4_(CD4_CD4mut).png?raw=true" width="500"/>
+</p>
+
+
+- ```makePlotOfNewMarkers()```: genes significantly expressed in the annotated cell types
+
+<p align="middle">
+ <img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/BM1_new_markers.png?raw=true" width="1000"/>
+</p>
+
+</p></details>
+
+
+## Demo
+
+### Usage
+
+We have made an example execution file ```demo.py``` that shows how to use ```DigitalCellSorter```.
+
+In the demo, folder ```data``` is intentionally left empty.
+The data file (cc95ff89-2e68-4a08-a234-480eca21ce79.homo_sapiens.mtx.zip) is about 2.4 GB in size and
+will be downloaded by the ```demo.py``` script.
+
+> Previously the HCA preview data was consolidated in file ```ica_bone_marrow_h5.h5``` and downloadable
+> from https://preview.data.humancellatlas.org/ (Raw Counts Matrix - Bone Marrow).
+> That file was ~485 MB and contained 378000 cells from 8 bone marrow donors (BM1-BM8).
+
+See details of the script ```demo.py``` at:
+
+> [Example walkthrough of demo.py script](https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/ "Examples")
+
+
+To execute the complete script ```demo.py``` run:
+
+ python demo.py
+
+Note that the HCA BM1 data contains ~50000 sequenced cells, requiring more than 60 GB of RAM (we recommend using a high-performance computer).
+If you want to run our example on a regular PC or a laptop, you can use a randomly chosen subset of cells:
+
+ df_expr = df_expr.sample(n=5000, axis=1)
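+
+A minimal sketch of this subsampling step on a toy matrix follows. The toy ```df_expr```
+stands in for the real expression data, and both ```random_state``` and the guard on the
+number of cells are additions made here for reproducibility and safety; they are not taken
+from ```demo.py```.

```python
import numpy as np
import pandas as pd

# Toy stand-in for df_expr: 4 genes (rows) by 10 cells (columns).
rng = np.random.default_rng(0)
df_expr = pd.DataFrame(
    rng.integers(0, 5, size=(4, 10)),
    index=[f"G{i}" for i in range(1, 5)],
    columns=[f"C{j}" for j in range(1, 11)],
)

# Never request more cells than the matrix contains.
n_cells = min(5, df_expr.shape[1])

# sample() returns a new DataFrame, so the result has to be assigned.
df_small = df_expr.sample(n=n_cells, axis=1, random_state=42)

print(df_small.shape)
```
+
+With ```axis=1``` whole cells are drawn without replacement, so each gene's row keeps its values for the selected cells.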
+
+
+### Output
+
+All the output files are saved in the ```output``` directory inside the directory where the ```demo.py``` script is located.
+If you specify any other directory, the results will be generated in it.
+If you do not provide any directory the results will appear in the root where the script was executed.
+
+
+
+
+%package help
+Summary: Development documents and examples for DigitalCellSorter
+Provides: python3-DigitalCellSorter-doc
+%description help
+# Digital Cell Sorter
+
+[![DOI](https://badge.fury.io/gh/sdomanskyi%2FDigitalCellSorter.svg)](https://badge.fury.io/gh/sdomanskyi%2FDigitalCellSorter)
+[![DOI](https://badge.fury.io/py/DigitalCellSorter.svg)](https://pypi.org/project/DigitalCellSorter)
+[![DOI](https://readthedocs.org/projects/digital-cell-sorter/badge/?version=latest)](https://digital-cell-sorter.readthedocs.io/en/latest/?badge=latest)
+[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.2533377.svg)](https://doi.org/10.5281/zenodo.2533377)
+
+Digital Cell Sorter (DCS): a single cell RNA-seq analysis toolkit for clustering, cell type identification, and anomaly detection.
+
+> **Note:** We are currently preparing a manuscript describing the toolkit located in this repository.
+> If you want to access the package detailed in our latest publication on Polled Digital Cell Sorter,
+> go to https://zenodo.org/record/2603265 and download the package (v1.1).
+
+
+> **The latest publication describing the methodology of cell types identification:**
+> [Polled Digital Cell Sorter (p-DCS): Automatic identification of hematological cell types from single cell RNA-sequencing clusters](
+> https://doi.org/10.1186/s12859-019-2951-x
+> "Polled Digital Cell Sorter (p-DCS): Automatic identification of hematological cell types from single cell RNA-sequencing clusters")
+> Sergii Domanskyi, Anthony Szedlak, Nathaniel T Hawkins, Jiayin Wang, Giovanni Paternostro & Carlo Piermarocchi,
+> *BMC Bioinformatics* volume 20, Article number: 369 (**2019**)
+
+
+The documentation is available at https://digital-cell-sorter.readthedocs.io/.
+
+- [Getting Started](#getting-started)
+ * [Prerequisites](#prerequisites)
+ * [Loading the package](#loading-the-package)
+ * [Gene Expression Data Format](#gene-expression-data-format)
+ * [Other Data](#other-data)
+- [Functionality](#functionality)
+ * [Overall](#overall)
+ * [Visualization](#visualization)
+- [Demo](#demo)
+ * [Usage](#usage)
+ + [Main cell types](#main-cell-types)
+ + [Cell sub-types](#cell-sub-types)
+ * [Output](#output)
+
+## Getting Started
+
+These instructions will get you a copy of the project up and running on your machine for data analysis, development or testing purposes.
+
+### Prerequisites
+
+#### Environment setup
+The software runs in Python >= 3.7.
+
+It is highly recommended to install Anaconda.
+Installers are available at https://www.anaconda.com/distribution/.
+Whether you already had Anaconda installed or have just installed it, we recommend
+updating conda by running:
+
+ conda update conda
+
+With conda, you can create, export, list, remove, and update environments that
+have different versions of Python and/or packages installed in them.
+Switching or moving between environments is called activating the environment.
+
+ conda create --name DCS
+ conda activate DCS
+
+Now, in your new environment, the packages can be installed or updated without affecting
+your other environments. Note that using environments is not necessary; the
+default ```(base)``` environment is used if you don't set up any other. For more information see
+https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html
+
+
+> Note: using conda environments (for instance DCS exemplified above)
+> on a high-performance computer
+> such as MSU HPCC in a batch job, i.e. not on a development node but
+> submitted to a SLURM queue, requires the following steps. In the SLURM
+> script, before the line calling your Python script,
+> add ```conda deactivate``` to deactivate the base environment
+> and ```conda activate DCS```. After calling the script
+> do ```conda deactivate```. The example testing script is shown below.
+
+
+<details closed><summary>SLURM script example:</summary><p>
+
+ #!/bin/bash --login
+ ########## Define Resources Needed with SBATCH Lines ##########
+ #SBATCH --time=00:01:00 # limit of wall clock time - how long the job will run (same as -t)
+ #SBATCH --ntasks=1 # number of tasks - how many tasks (nodes) that you require (same as -n)
+ #SBATCH --cpus-per-task=1 # number of CPUs (or cores) per task (same as -c)
+ #SBATCH --mem=1G # memory required per node - amount of memory (in bytes)
+ ##SBATCH --job-name Name_of_Job # you can give your job a name for easier identification (same as -J)
+
+ ########## Command Lines to Run ##########
+ conda deactivate
+ conda activate DCS
+ cd ./ ### change to the directory where your code is located
+ python test.py ### call your executable
+ scontrol show job $SLURM_JOB_ID ### write job information to output file
+ conda deactivate
+
+where ```test.py``` is the python script where you import and use
+```DigitalCellSorter```.
+
+</p></details>
+
+
+#### Installation of the DigitalCellSorter package
+
+Install ```DigitalCellSorter``` with ```pip```. Most of the dependencies
+are installed automatically together with the latest release
+of ```DigitalCellSorter```:
+
+ pip install DigitalCellSorter
+
+Alternatively, you can clone and install this module directly from GitHub using:
+
+ pip install git+https://github.com/sdomanskyi/DigitalCellSorter
+
+Similarly, one can create a local copy of this project for development purposes, and
+install the package from the cloned directory:
+
+ git clone https://github.com/sdomanskyi/DigitalCellSorter
+ cd DigitalCellSorter
+ python setup.py install
+
+Our software uses packages ```numpy```, ```pandas```, ```matplotlib```,
+```scikit-learn```, ```scipy```, ```mygene```, ```fftw```,
+```fitsne```, ```adjustText``` and a few other standard Python packages.
+Some of the packages used in ```DigitalCellSorter``` are not installed by default
+and should be installed separately if you use the corresponding functionality of
+Digital Cell Sorter. For example, for network-based clustering
+install packages ```pynndescent```, ```networkx```, ```python-louvain```.
+Other packages that have to be installed separately are ```fitsne```, ```umap```,
+```phate``` and ```orca```. The detailed instructions are below.
+
+#### t-SNE
+With datasets containing fewer than 2000 cells ```sklearn.manifold.TSNE``` is used.
+For larger datasets Fast Fourier Transform-accelerated Interpolation-based t-SNE (FIt-SNE)
+implemented by **KlugerLab** is used (https://github.com/KlugerLab/FIt-SNE).
+To use FIt-SNE the following need to be installed. First update ```cython``` with
+
+ pip install --upgrade cython
+
+Then add ```conda-forge``` to your channels and install ```fftw``` from it:
+
+ conda config --add channels conda-forge
+ conda install fftw
+
+The next installation step is platform specific. To install FIt-SNE on Linux:
+
+ pip install fitsne
+
+On macOS Mojave the C and C++ compilers have to be specified explicitly:
+
+ env CC=clang CXX=clang++ pip install fitsne
+
+On Windows the FIt-SNE wrapper and executable are already
+included with ```DigitalCellSorter```.
+
+#### Other layouts
+
+To use the UMAP layout, install ```umap-learn```:
+
+ pip install umap-learn
+
+To use PHATE, install ```phate```:
+
+ pip install phate
+
+> Note: if none of ```fitsne```, ```umap``` and ```phate``` is installed,
+> ```DigitalCellSorter``` defaults to the two largest principal components from PCA
+> for the visualization layout.
+
+#### Interactive HTML figures
+To use Sankey diagrams that are part of Digital Cell Sorter
+install ```plotly``` and ```orca```:
+
+ conda install -c plotly plotly-orca
+ conda install -c anaconda psutil
+
+See
+[interactive Hopfield landscape figure](http://htmlpreview.github.io/?https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/dataName_energy_landscape_PC1_vs_PC0.html "Hopfield attractors figure")
+and
+[interactive Sankey diagram figure](http://htmlpreview.github.io/?https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/PanglaoDB_Sankey_SRS3296611.html "Sankey diagram of cell annotation")
+in a browser.
+
+```orca``` is necessary to convert Sankey diagrams to static images.
+If for any reason ```orca``` is unavailable, the Sankey diagrams will be saved as
+interactive HTML figures that can be opened in a browser (Chrome, Firefox etc.) and
+saved as static images. The visualizations of ```DigitalCellSorter``` are implemented
+with ```matplotlib```, allowing all the figures to be saved in either raster or
+vector format. Since ```plotly``` can convert simple ```matplotlib``` figures
+(scatter, line, bar plots, but not heatmaps, splines or other complex patch objects) to
+interactive HTML format, ```DigitalCellSorter``` can attempt to save any of its figures
+as HTML. This is particularly useful with ```Projection``` plots, even though the color
+bars are not rendered in HTML figures.
+
+### Loading the package
+
+Use the latest release of the PyPI package.
+
+For a quick-start demo with a dataset of ~5k PBMCs execute
+in the terminal and follow prompts:
+
+ python -m DigitalCellSorter
+
+The second, more detailed demonstration analysis with step-by-step
+explanation is discussed here
+and in the demo section at the end of this README.
+In your script import the package:
+
+ import DigitalCellSorter
+
+Create an instance of class ```DigitalCellSorter```. Here, for simplicity, we use default parameter values:
+
+ DCS = DigitalCellSorter.DigitalCellSorter()
+
+During the initialization a number of parameters can be specified. For a detailed list see the documentation.
+Many of these parameters are transferred to DCS attributes and thus can be modified after initialization using, e.g.:
+
+ DCS.toggleMakeStackedBarplot = False
+
+
+
+### Gene Expression Data Format
+
+The input gene expression data is expected in one of the following formats:
+
+1. Spreadsheet of comma-separated values ```csv``` containing a condensed matrix in the form ```('cell', 'gene', 'expr')```.
+If there are batches in the data the matrix has to be of the form ```('batch', 'cell', 'gene', 'expr')```. Column order can be arbitrary.
+
+<details closed><summary>Examples:</summary><p>
+
+| cell | gene | expr |
+|------|------|------|
+| C1 | G1 | 3 |
+| C1 | G2 | 2 |
+| C1 | G3 | 1 |
+| C2 | G1 | 1 |
+| C2 | G4 | 5 |
+| ... | ... | ... |
+
+or:
+
+| batch | cell | gene | expr |
+|--------|------|------|------|
+| batch0 | C1 | G1 | 3 |
+| batch0 | C1 | G2 | 2 |
+| batch0 | C1 | G3 | 1 |
+| batch1 | C2 | G1 | 1 |
+| batch1 | C2 | G4 | 5 |
+| ... | ... | ... | ... |
+
+</p></details>
+
+
+2. Spreadsheet of comma-separated values ```csv``` where rows are genes and columns are cells, with gene expression counts as values.
+If there are batches in the data, the first row of the spreadsheet should be ```'batch'``` and the second ```'cell'```.
+
+<details closed><summary>Examples:</summary><p>
+
+| cell | C1 | C2 | C3 | C4 |
+|-------|--------|--------|--------|--------|
+| G1 | | 3 | 1 | 7 |
+| G2 | 2 | 2 | | 2 |
+| G3 | 3 | 1 | | 5 |
+| G4 | 10 | | 5 | 4 |
+| ... | ... | ... | ... | ... |
+
+or:
+
+| batch | batch0 | batch0 | batch1 | batch1 |
+|-------|--------|--------|--------|--------|
+| cell | C1 | C2 | C3 | C4 |
+| G1 | | 3 | 1 | 7 |
+| G2 | 2 | 2 | | 2 |
+| G3 | 3 | 1 | | 5 |
+| G4 | 10 | | 5 | 4 |
+| ... | ... | ... | ... | ... |
+
+</p></details>
+
+3. ```Pandas DataFrame``` where ```axis 0``` is genes and ```axis 1``` is cells.
+If there are batches in the data then the index of ```axis 1``` should have two levels, e.g. ```('batch', 'cell')```,
+with the first level indicating the patient, batch or experiment where that cell was sequenced, and the
+second level containing cell barcodes for identification.
+
+<details closed><summary>Examples:</summary><p>
+
+ import numpy as np
+ import pandas as pd
+ df = pd.DataFrame(data=[[2,np.nan],[3,8],[3,5],[np.nan,1]],
+ index=['G1','G2','G3','G4'],
+ columns=pd.MultiIndex.from_arrays([['batch0','batch1'],['C1','C2']], names=['batch', 'cell']))
+
+
+</p></details>
+
+4. ```Pandas Series``` where the index should have two levels, e.g. ```('cell', 'gene')```. If there are batches in the data,
+the index should have three levels: the first level indicating the patient, batch or experiment where that cell was sequenced, the second level containing cell barcodes for
+identification, and the third level containing gene names.
+
+<details closed><summary>Examples:</summary><p>
+
+ import pandas as pd
+ se = pd.Series(data=[1,8,3,5,5],
+ index=pd.MultiIndex.from_arrays([['batch0','batch0','batch1','batch1','batch1'],
+ ['C1','C1','C1','C2','C2'],
+ ['G1','G2','G3','G1','G4']], names=['batch', 'cell', 'gene']))
+
+
+</p></details>
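+
+Formats 3 and 4 carry the same information, which the following sketch shows by converting
+a toy ```DataFrame``` into a three-level ```Series``` and back using only ```pandas```.
+This is an illustration under the assumption that the gene axis is named ```'gene'```; it
+does not show what ```DigitalCellSorter``` does internally.

```python
import numpy as np
import pandas as pd

# Format 3: genes on axis 0, ('batch', 'cell') columns on axis 1.
df = pd.DataFrame(
    data=[[2, np.nan], [3, 8], [3, 5], [np.nan, 1]],
    index=pd.Index(["G1", "G2", "G3", "G4"], name="gene"),
    columns=pd.MultiIndex.from_arrays(
        [["batch0", "batch1"], ["C1", "C2"]], names=["batch", "cell"]
    ),
)

# Format 4: one value per ('batch', 'cell', 'gene'); missing measurements are dropped.
se = df.unstack().dropna()

# Back to format 3: genes return to the rows, (batch, cell) pairs to the columns.
df_back = se.unstack(["batch", "cell"])

print(se)
```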
+
+Data in any of the formats outlined above needs to be prepared and validated with the function ```prepare()```.
+Let us demonstrate this on an input of type 1:
+
+ df_expr = DCS.prepare('data/testData/dataFileCondensedWithBatches.tsv')
+
+### Other Data
+
+```markersDCS.xlsx```: An Excel workbook with marker data. Rows are markers and columns are cell types.
+'1' means that the gene is a marker for that cell type, '-1' means that this gene is not expressed in this cell type, and '0' otherwise.
+This gene marker file is included in the package and used by default.
+If you use your own file it has to be prepared in the same format (including the two-line header). Note that only the first worksheet will be read,
+and its name can be arbitrary. The first column should contain gene names. The second row should contain cell types, and the first row how
+those cell types are grouped. If a cell type needs to be skipped, put "NA" in the first-row cell of that cell type's column.
+
+<details closed><summary>Example:</summary><p>
+
+|A |B |C |D |E |F |G |H |I |J |K |L |M |... |
+|--------|-------------|--------------|------------|-----------|-----------------|--------------------------|----------------------------|-------------------------|--------------------------|-------------------|----------------|------------------|---------|
+| |B cells |B cells |B cells |T cells |T cells |T cells |T cells |T cells |T cells |T cells |NK cells |NK cells |... |
+|Marker |B cells naive|B cells memory|Plasma cells|T cells CD8|T cells CD4 naive|T cells CD4 memory resting|T cells CD4 memory activated|T cells follicular helper|T cells regulatory (Tregs)|T cells gamma delta|NK cells resting|NK cells activated|... |
+|ABCB4 |1 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |... |
+|ABCB9 |0 |0 |1 |0 |0 |0 |0 |0 |0 |0 |0 |0 |... |
+|ACAP1 |0 |0 |0 |0 |1 |0 |0 |0 |0 |0 |0 |0 |... |
+|ACHE |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |... |
+|ACP5 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |... |
+|ADAM28 |1 |1 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |... |
+|ADAMDEC1|0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |... |
+|ADAMTS3 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |... |
+|ADRB2 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |... |
+|AIF1 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |... |
+|AIM2 |0 |1 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |... |
+|ALOX15 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |... |
+|ALOX5 |0 |1 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |... |
+|AMPD1 |0 |0 |1 |0 |0 |0 |0 |0 |0 |0 |0 |0 |... |
+|ANGPT4 |0 |0 |1 |0 |0 |0 |0 |0 |0 |0 |0 |0 |... |
+|... |... |... |... |... |... |... |... |... |... |... |... |... |... |
+
+</p></details>
+
+```Human.MitoCarta2.0.csv```: A ```csv``` spreadsheet with human mitochondrial genes, created as part of the work
+[MitoCarta2.0: an updated inventory of mammalian mitochondrial proteins](https://doi.org/10.1093/nar/gkv1003 "MitoCarta2.0")
+Sarah E. Calvo, Karl R. Clauser, Vamsi K. Mootha, *Nucleic Acids Research*, Volume 44, Issue D1, 4 January 2016.
+
+
+## Functionality
+
+### Overall
+
+The main class, DigitalCellSorter, includes tools for:
+
+ 1. **Pre-processing**
+ 2. **Quality control**
+ 3. **Batch effects correction**
+ 4. **Cells anomaly score evaluation**
+ 5. **Dimensionality reduction**
+ 6. **Clustering**
+ 7. **Annotating cell types**
+ 8. **Visualization**
+ 9. **Post-processing**.
+
+
+### Visualization
+
+The function ```visualize()``` produces most of the files needed for post-analysis of the data.
+
+See examples of the visualization tools below.
+
+
+<details closed><summary>The visualization tools include:</summary><p>
+
+- ```makeMarkerExpressionPlot()```: a heatmap that shows all markers and their expression levels in the clusters;
+the figure also shows relative (%) and absolute (cell counts) cluster sizes
+
+<p align="middle">
+ <img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/BM1_marker_expression.png?raw=true" width="1000"/>
+</p>
+
+- ```getIndividualGeneExpressionPlot()```: 2D layout colored by the expression of an individual gene
+
+<p align="middle">
+ <img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/marker_subplots/BM1_CD19_(B4_CVID3_CD19).png?raw=true" width="400"/>
+ <img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/marker_subplots/BM1_CD33_(SIGLEC-3_CD33_p67_SIGLEC3).png?raw=true" width="400"/>
+</p>
+
+- ```makeVotingResultsMatrixPlot()```: z-scores of the voting results for each input cell type and each cluster;
+the figure also shows relative (%) and absolute (cell counts) cluster sizes
+
+<p align="middle">
+ <img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/BM1_scores_matrix.png?raw=true" height="700"/>
+</p>
+
+- ```makeHistogramNullDistributionPlot()```: null distribution for each cluster and each cell type illustrating
+the "machinery" of the Digital Cell Sorter
+
+<p align="middle">
+ <img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/BM1_null_distributions.png?raw=true" width="800"/>
+</p>
+
+- ```makeQualityControlHistogramPlot()```: Quality control histogram plots
+
+<p align="middle">
+ <img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/QC_plots/BM1_number_of_genes_histogram.png?raw=true" width="250"/>
+ <img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/QC_plots/BM1_count_depth_histogram.png?raw=true" width="250"/>
+ <img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/QC_plots/BM1_fraction_of_mitochondrialGenes_histogram.png?raw=true" width="250"/>
+</p>
+
+- ```makeProjectionPlot()```: 2D layout colored by the number of unique genes expressed,
+the number of counts measured, and the fraction of mitochondrial genes
+
+<p align="middle">
+ <img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/BM1_clusters_by_number_of_genes.png?raw=true" width="250"/>
+ <img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/BM1_clusters_by_count_depth.png?raw=true" width="250"/>
+ <img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/BM1_clusters_by_fraction_of_mitochondrialGenes.png?raw=true" width="250"/>
+</p>
+
+<p align="middle">
+ <img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/BM1_clusters_by_is_quality_cell.png?raw=true" width="500"/>
+</p>
+
+<p align="middle">
+ <img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/BM1_clusters_by_clusters.png?raw=true" width="375"/>
+ <img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/BM1_clusters_by_patients.png?raw=true" width="375"/>
+</p>
+
+Effect of batch correction, demonstrated by combining BM1, BM2, and BM3 and processing the data jointly without (left) and with (right) the batch correction option:
+
+<p align="middle">
+ <img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/BM123_no_corr_clusters_by_patients.png?raw=true" width="375"/>
+ <img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/BM123_with_corr_clusters_by_patients.png?raw=true" width="375"/>
+</p>
+
+- ```makeStackedBarplot()```: plot with fractions of various cell types
+
+<p align="middle">
+ <img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/BM1_clusters_by_clusters_annotated.png?raw=true" width="500"/>
+ <img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/BM1_subclustering_stacked_barplot_BM1.png?raw=true" height="500"/>
+</p>
+
+
+- ```makeSankeyDiagram()```: river plot to compare various results
+
+<p align="middle">
+ <img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/Sankey_example.png?raw=true" width="800"/>
+</p>
+
+- ```getAnomalyScoresPlot()```: plot with anomaly scores per cell
+
+<p align="middle">
+ <img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/BM1_clusters_by_anomaly_score_All.png?raw=true" width="750"/>
+</p>
+
+Calculate and plot anomaly scores for an arbitrary cell type or cluster:
+
+<p align="middle">
+ <img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/BM1_clusters_by_anomaly_score_B_cells.png?raw=true" width="250"/>
+ <img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/BM1_clusters_by_anomaly_score_T_cells.png?raw=true" width="250"/>
+ <img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/BM1_clusters_by_anomaly_score_cluster_7.0.0.png?raw=true" width="250"/>
+</p>
+
+
+- ```getIndividualGeneTtestPlot()```: heatmap of t-test p-values calculated gene-pair-wise
+ from the annotated clusters
+
+<p align="middle">
+ <img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/BM1_ttest_CD4_(CD4_CD4mut).png?raw=true" width="500"/>
+</p>
+
+
+- ```makePlotOfNewMarkers()```: genes significantly expressed in the annotated cell types
+
+<p align="middle">
+ <img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/BM1_new_markers.png?raw=true" width="1000"/>
+</p>
+
+</p></details>
+
+
+## Demo
+
+### Usage
+
+We provide an example script, ```demo.py```, that shows how to use ```DigitalCellSorter```.
+
+In the demo, the folder ```data``` is intentionally left empty.
+The data file (cc95ff89-2e68-4a08-a234-480eca21ce79.homo_sapiens.mtx.zip) is about 2.4 GB in size and
+is downloaded by the ```demo.py``` script.
+
+> Previously, the HCA preview data was consolidated in the file ```ica_bone_marrow_h5.h5``` and was downloadable
+> from https://preview.data.humancellatlas.org/ (Raw Counts Matrix - Bone Marrow).
+> That file was ~485 MB and contained 378000 cells from 8 bone marrow donors (BM1-BM8).
+
+See details of the script ```demo.py``` at:
+
+> [Example walkthrough of demo.py script](https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/ "Examples")
+
+
+To execute the complete script ```demo.py``` run:
+
+ python demo.py
+
+Note that the HCA BM1 data contains ~50000 sequenced cells and requires more than 60 GB of RAM (we recommend using a high-performance computing system).
+If you want to run our example on a regular PC or a laptop, you can use a randomly chosen subset of cells:
+
+ df_expr.sample(n=5000, axis=1)
+
+
+### Output
+
+By default, all output files are saved in the ```output``` directory inside the directory that contains the ```demo.py``` script.
+If you specify a different directory, the results are generated there.
+If you do not provide any directory, the results appear in the directory from which the script was executed.
+
+
+
+
+%prep
+%autosetup -n DigitalCellSorter-1.3.7.6
+
+%build
+%py3_build
+
+%install
+%py3_install
+install -d -m755 %{buildroot}/%{_pkgdocdir}
+if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
+if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
+if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
+if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
+pushd %{buildroot}
+if [ -d usr/lib ]; then
+ find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/lib64 ]; then
+ find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/bin ]; then
+ find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/sbin ]; then
+ find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+touch doclist.lst
+if [ -d usr/share/man ]; then
+ find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
+fi
+popd
+mv %{buildroot}/filelist.lst .
+mv %{buildroot}/doclist.lst .
+
+%files -n python3-DigitalCellSorter -f filelist.lst
+%dir %{python3_sitelib}/*
+
+%files help -f doclist.lst
+%{_docdir}/*
+
+%changelog
+* Fri May 05 2023 Python_Bot <Python_Bot@openeuler.org> - 1.3.7.6-1
+- Package Spec generated
diff --git a/sources b/sources
new file mode 100644
index 0000000..2ec9777
--- /dev/null
+++ b/sources
@@ -0,0 +1 @@
+5603baa4e95acbbd191303f9b914693b DigitalCellSorter-1.3.7.6.tar.gz