summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
authorCoprDistGit <infra@openeuler.org>2023-05-18 04:28:55 +0000
committerCoprDistGit <infra@openeuler.org>2023-05-18 04:28:55 +0000
commit6020730c9dacc1557dc86bc8c13f462ec27ca581 (patch)
tree5ad5b649cd1cd0b23bd453071764b3e84314824a
parenta75dfaa3b165e70982801db1354bf2637fc17932 (diff)
automatic import of python-cctyper
-rw-r--r--.gitignore1
-rw-r--r--python-cctyper.spec1143
-rw-r--r--sources1
3 files changed, 1145 insertions, 0 deletions
diff --git a/.gitignore b/.gitignore
index e69de29..c678f22 100644
--- a/.gitignore
+++ b/.gitignore
@@ -0,0 +1 @@
+/cctyper-1.8.0.tar.gz
diff --git a/python-cctyper.spec b/python-cctyper.spec
new file mode 100644
index 0000000..cfc0ed8
--- /dev/null
+++ b/python-cctyper.spec
@@ -0,0 +1,1143 @@
+%global _empty_manifest_terminate_build 0
+Name: python-cctyper
+Version: 1.8.0
+Release: 1
+Summary: CRISPRCasTyper: Automatic detection and subtyping of CRISPR-Cas operons
+License: MIT License
+URL: https://github.com/Russel88/CRISPRCasTyper
+Source0: https://mirrors.nju.edu.cn/pypi/web/packages/f5/be/0fcbfaab346c40da7987239ad432c3df0ee124a0494f48182ca44c3737dd/cctyper-1.8.0.tar.gz
+BuildArch: noarch
+
+
+%description
+[![Project Status: Active - The project has reached a stable, usable state and is being actively developed.](http://www.repostatus.org/badges/latest/active.svg)](http://www.repostatus.org/#active)
+[![Conda](https://anaconda.org/russel88/cctyper/badges/installer/conda.svg)](https://anaconda.org/russel88/cctyper)
+
+# CRISPRCasTyper
+
+Detect CRISPR-Cas genes and arrays, and predict the subtype based on both Cas genes and CRISPR repeat sequence.
+
+[CRISPRCasTyper and RepeatType are also available through a webserver](https://crisprcastyper.crispr.dk)
+
+This software finds Cas genes with a large suite of HMMs, then groups these HMMs into operons, and predicts the subtype of the operons based on a scoring scheme.
+Furthermore, it finds CRISPR arrays with [minced](https://github.com/ctSkennerton/minced) and by BLASTing a large suite of known repeats, and using a kmer-based machine learning approach (extreme gradient boosting trees) it predicts the subtype of the CRISPR arrays based on the consensus repeat.
+It then connects the Cas operons and CRISPR arrays, producing as output:
+* CRISPR-Cas loci, with consensus subtype prediction based on both Cas genes (mostly) and CRISPR consensus repeats
+* Orphan Cas operons, and their predicted subtype
+* Orphan CRISPR arrays, and their predicted associated subtype
+
+#### It includes the following 46 subtypes/variants [(find typing scheme here)](https://typer.crispr.dk/#/typing):
+* I-A, I-B, I-C, I-D, I-E, I-F, I-F (transposon), I-G, II-A, II-B, II-C, III-A, III-B, III-C, III-D, III-E, III-F, IV-A1, IV-A2, IV-A3, IV-B, IV-C, IV-D, IV-E, V-A, V-B1, V-B2, V-C, V-D, V-E, V-F1, V-F2, V-F3, V-F (the rest), V-G, V-H, V-I, V-J, V-K, V-L, VI-A, VI-B1, VI-B2, VI-C, VI-D, VI-X, VI-Y.
+
+* All subtypes from the most recent Nature Reviews Microbiology (Makarova et al. 2020): [Evolutionary classification of CRISPR–Cas systems: a burst of class 2 and derived variants](https://doi.org/10.1038/s41579-019-0299-x)
+* Updated type IV subtypes and variants based on: [Type IV CRISPR–Cas systems are highly diverse and involved in competition between plasmids](https://doi.org/10.1093/nar/gkz1197)
+* Type V-K: [RNA-guided DNA insertion with CRISPR-associated transposases](https://doi.org/10.1126/science.aax9181)
+* Transposon associated type I-F: [Transposon-encoded CRISPR–Cas systems direct RNA-guided DNA integration](https://doi.org/10.1038/s41586-019-1323-z)
+* New V-A variants: [Novel Type V-A CRISPR Effectors Are Active Nucleases with Expanded Targeting Capabilities](https://doi.org/10.1089/crispr.2020.0043)
+* New Cas13s: [Programmable RNA editing with compact CRISPR–Cas13 systems from uncultivated microbes](https://doi.org/10.1038/s41592-021-01124-4)
+* V-L (cas12l): [A new family of CRISPR-type V nucleases with C-rich PAM recognition](https://doi.org/10.15252/embr.202255481)
+
+#### It can automatically draw gene maps of CRISPR-Cas systems and orphan Cas operons and CRISPR arrays
+##### in vector graphics format for direct use in scientific manuscripts
+<img src='img/plot.svg' align="left" height="200" />
+
+#### Citation
+[Jakob Russel, Rafael Pinilla-Redondo, David Mayo-Muñoz, Shiraz A. Shah, Søren J. Sørensen - CRISPRCasTyper: Automated Identification, Annotation and Classification of CRISPR-Cas loci. The CRISPR Journal Dec 2020](https://doi.org/10.1089/crispr.2020.0059)
+
+Find a free to read version on [BioRxiv](https://doi.org/10.1101/2020.05.15.097824)
+
+# Table of contents
+1. [Quick start](#quick)
+2. [Installation](#install)
+3. [CRISPRCasTyper - How to](#cctyperhow)
+ * [Plotting](#plot)
+4. [RepeatType - How to](#repeattype)
+ * [Updated models](#repeatnew)
+5. [RepeatType - Train](#repeattrain)
+6. [Troubleshoot](#trouble)
+
+## Quick start <a name="quick"></a>
+
+```sh
+conda create -n cctyper -c conda-forge -c bioconda -c russel88 cctyper
+conda activate cctyper
+cctyper my.fasta my_output
+```
+
+## Installation <a name="install"></a>
+CRISPRCasTyper can be installed either through conda or pip.
+
+It is advised to use conda, since this installs CRISPRCasTyper and all dependencies, and downloads the database in one go.
+
+### Conda
+Use [miniconda](https://docs.conda.io/en/latest/miniconda.html) or [anaconda](https://www.anaconda.com/) to install.
+
+Create the environment with CRISPRCasTyper and all dependencies and database
+```sh
+conda create -n cctyper -c conda-forge -c bioconda -c russel88 cctyper
+```
+
+### pip
+If you have the dependencies (Python >= 3.8, HMMER >= 3.2, Prodigal >= 2.6, minced, grep, sed) in your PATH you can install with pip
+
+Install cctyper python module
+```sh
+python -m pip install cctyper
+```
+
+Upgrade cctyper python module to the latest version
+```sh
+python -m pip install cctyper --upgrade
+```
+
+
+#### When installing with pip, you need to download the database manually:
+```sh
+# Download and unpack
+svn checkout https://github.com/Russel88/CRISPRCasTyper/trunk/data
+tar -xvzf data/Profiles.tar.gz
+mv Profiles/ data/
+rm data/Profiles.tar.gz
+
+# Tell CRISPRCasTyper where the data is:
+# either by setting an environment variable (has to be done for each terminal session, or added to .bashrc):
+export CCTYPER_DB="/path/to/data/"
+# or by using the --db argument each time you run CRISPRCasTyper:
+cctyper input.fa output --db /path/to/data/
+```
+
+## CRISPRCasTyper - How to <a name="cctyperhow"></a>
+CRISPRCasTyper takes as input a nucleotide fasta, and produces outputs with CRISPR-Cas predictions
+
+#### Activate environment
+```sh
+conda activate cctyper
+```
+
+#### Run with a nucleotide fasta as input
+```sh
+cctyper genome.fa my_output
+```
+
+#### If you have a complete circular genome (each entry in the fasta will be treated as having circular topology)
+```sh
+cctyper genome.fa my_output --circular
+```
+
+#### For metagenome assemblies and short contigs/plasmids/phages, change the prodigal mode
+The default prodigal mode expects the input to be a single draft or complete genome
+```sh
+cctyper assembly.fa my_output --prodigal meta
+```
+
+#### Check the different options
+```sh
+cctyper -h
+```
+
+#### Output <a name="cctyperout"></a>
+* **CRISPR_Cas.tab:** CRISPR_Cas loci, with consensus subtype prediction
+ * Contig: Sequence accession
+ * Operon: Operon ID (Sequence accession @ NUMBER)
+ * Operon_Pos: [Start, End] of operon
+ * Prediction: Consenus prediction based on both Cas operon and CRISPR arrays
+ * CRISPRs: CRISPRs adjacent to Cas operon
+ * Distances: Distances to CRISPRs from Cas operon
+ * Prediction_Cas: Subtype prediction based on Cas operon
+ * Prediction_CRISPRs: Subtype prediction of CRISPRs based on CRISPR repeat sequences
+* **cas_operons.tab:** All certain Cas operons
+ * Contig: Sequence accession
+ * Operon: Operon ID (Sequence accession @ NUMBER)
+ * Start: Start of operon
+ * End: End of operon
+ * Prediction: Subtype prediction
+ * Complete_Interference: Percent completion of the interference module(s). Can be a list if best_type is a list (Hybrid and Ambiguous)
+ * Complete_Adaptation: Percent completion of the adaptation module(s). Can be a list if best_type is a list (Hybrid and Ambiguous)
+ * Best_type: Subtype with the highest score. If the score is high then Prediction = Best_type
+ * Best_score: Score of the highest scoring subtype
+ * Genes: List of Cas genes
+ * Positions: List of Gene IDs for the genes
+ * E-values: List of E-values for the genes
+ * CoverageSeq: List of sequence coverages for the genes
+ * CoverageHMM: List of HMM coverages for the genes
+ * Strand_Interference: Strand of interference module. 1 is positive strand, -1 is negative strand, 0 is mixed, NA if no interference gene found
+ * Strand_Adaptation: Strand of adaptation module. 1 is positive strand, -1 is negative strand, 0 is mixed, NA if no adaptation gene found
+* **crisprs_all.tab:** All CRISPR arrays, also false positives
+ * Contig: Sequence accession
+ * CRISPR: CRISPR ID (minced: Sequence accession _ NUMBER; repeatBLAST: Sequence accession - NUMBER _ NUMBER)
+ * Start: Start of CRISPR
+ * End: End of CRISPR
+ * Consensus_repeat: Consensus repeat sequence
+ * N_repeats: Number of repeats
+ * Repeat_len: Length of repeat sequences
+ * Spacer_len_avg: Average spacer length
+ * Repeat_identity: Average identity of repeat sequences
+ * Spacer_identity: Average identity of spacer sequences
+ * Spacer_len_sem: Standard error of the mean of spacer lenghts
+ * Trusted: TRUE/FALSE, is the array trusted. Based on repeat/spacer identity, spacer sem, prediction probability and adjacency to a cas operon
+ * Prediction: Prediction of the associated subtype based on the repeat sequence
+ * Subtype: Subtype with highest prediction probability. Prediction = Subtype if Subtype_probability is high
+ * Subtype_probability: Probability of subtype prediction
+* **crisprs_near_cas.tab:** CRISPRs part of CRISPR-Cas loci
+ * Same columns as crisprs_all.tab
+* **crisprs_orphan.tab:** Orphan CRISPRs (those not in CRISPR_Cas.tab)
+ * Same columns as crisprs_all.tab
+* **crisprs_putative.tab:** Low quality CRISPRs. Most likely false positives
+ * Same columns as crisprs_all.tab
+* **cas_operons_orphan.tab:** Orphan Cas operons (those not in CRISPR_Cas.tab)
+ * Same columns as cas_operons.tab
+* **CRISPR_Cas_putative.tab:** Putative CRISPR_Cas loci, often lonely Cas genes next to a CRISPR array
+ * Same columns as CRISPR_Cas.tab
+* **cas_operons_putative.tab:** Putative Cas operons, mostly false positives, but also some ambiguous and partial systems
+ * Same columns as cas_operons.tab
+* **spacers/*.fa:** Fasta files with all spacer sequences
+* **hmmer.tab:** All HMM vs. ORF matches, unfiltered results
+ * Hmm: HMM name
+ * ORF: ORF name (Sequence accession _ Gene ID)
+ * tlen: ORF length
+ * qlen: HMM length
+ * Eval: E-value of alignment
+ * score: Alignment score
+ * start: ORF start
+ * end: ORF end
+ * Acc: Sequence accession
+ * Pos: Gene ID
+ * Cov_seq: Sequence coverage
+ * Cov_hmm: HMM coverage
+ * strand: Coding strand is like input (1) or reverse complement (-1)
+* **genes.tab** All genes and their positions
+ * Contig: Sequence accession
+ * Start: Start of ORF
+ * End: End of ORF
+ * Strand: Coding strand is like input (1) or reverse complement (-1)
+ * Pos: Gene ID
+* **arguments.tab:** File with arguments given to CRISPRCasTyper
+* **hmmer.log** Error messages from HMMER (only produced if any errors were encountered)
+
+##### If run with `--keep_tmp` the following is also produced
+* **prodigal.log** Log from prodigal
+* **proteins.faa** Protein sequences
+* **hmmer/*.tab** Alignment output from HMMER for each Cas HMM
+* **minced.out:** CRISPR array output from minced
+* **blast.tab:** BLAST output from repeat alignment against flanking regions of cas operons
+* **Flank....:** Fasta of flanking regions near cas operons and BLAST database of this
+
+#### Notes on output
+Files are only created if there is any data. For example, the CRISPR_Cas.tab file is only created if there are any CRISPR-Cas loci.
+
+### Plotting <a name="plot"></a>
+CRISPRCasTyper will automatically plot a map of the CRISPR-Cas loci, orphan Cas operons, and orphan CRISPR arrays.
+
+These maps can be expanded (`--expand N`) by adding unknown genes and genes with alignment scores below the thresholds. This can help in identify potentially un-annotated genes in operons. You can generate new plots without having to re-run the entire pipeline by adding `--redo_typing` to the command. This will re-use the mappings and re-type the operons and re-make the plot, based on new thresholds and plot parameters.
+
+The plot below is run with `--expand 5000`
+
+* Arrays are in alternating black/white displaying the actual number of repeats/spacers, and with their predicted subtype association based on the consensus repeat sequence.
+* The interference module is in yellow.
+* The adaptation module is in blue.
+* Cas6 is in red.
+* Accessory genes are in purple
+* Genes with alignment scores below the thresholds are lighter and with parentheses around names.
+* Unknown genes are in gray (the number matches the genes.tab file)
+
+<img src='img/plot2.svg' align="left" height="350" />
+
+## RepeatTyper - How to <a name="repeattype"></a>
+With an input of CRISPR repeats (one per line, in a simple textfile) RepeatTyper will predict the subtype, based on the kmer composition of the repeat
+
+#### Activate environment
+```sh
+conda activate cctyper
+```
+
+#### Run with a simple textfile, containing only CRISPR repeats (in capital letters), one repeat per line.
+```sh
+repeatType repeats.txt
+```
+
+#### Output <a name="repeattypeout"></a>
+The script prints:
+* Repeat sequence
+* Predicted subtype
+* Probability of prediction
+
+#### Notes on output
+* Predictions with probabilities below 0.75 are uncertain, and should be taken with a grain of salt.
+* Prior to version 1.4.0 the curated repeatTyper model was included in CCTyper
+* From version 1.4.0 and onwards updated repeatTyper models are included in CCTyper (see more information in the section below)
+* The followinig subtypes are included in the updated model as per December 2022:
+ * I-A, I-B, I-C, I-D, I-E, I-F, I-F (Transposon), I-G
+ * II-A, II-B, II-C
+ * III-A, III-B, III-C, III-D, III-E, III-F
+ * IV-A1, IV-A2, IV-A3, IV-D, IV-E
+ * V-A, V-B1, V-E, V-F1, V-F2, V-F3, V-F (the rest), V-G, V-I, V-J, V-K
+ * VI-A, VI-B1, VI-B2, VI-C, VI-D
+* This is the accuracy per subtype (on an unseen test dataset):
+* I-A 0.76
+* I-B 0.81
+* I-C 0.97
+* I-D 0.86
+* I-E 0.95
+* I-F 0.96
+* I-F_T 0.99
+* I-G 0.89
+* II-A 0.92
+* II-B 0.97
+* II-C 0.90
+* III-A 0.82
+* III-B 0.68
+* III-C 0.60
+* III-D 0.59
+* III-E 1.00
+* III-F 0.25
+* IV-A1 0.85
+* IV-A2 0.68
+* IV-A3 0.96
+* IV-D 0.85
+* IV-E 0.92
+* V-A 1.00
+* V-B1 0.90
+* V-E 0.30
+* V-F 0.87
+* V-F1 0.87
+* V-F2 0.90
+* V-F3 0.90
+* V-G 0.67
+* V-I 0.80
+* V-J 0.63
+* V-K 0.99
+* VI-A 0.96
+* VI-B1 0.96
+* VI-B2 1.00
+* VI-C 0.67
+* VI-D 0.97
+
+### Updated RepeatTyper models <a name="repeatnew"></a>
+The [CCTyper webserver](https://typer.crispr.dk) is crowdsourcing subtyped repeats and includes an updated RepeatTyper model based on a much larger set of repeats and contains additional subtypes compared to the curated RepeatTyper model.
+This updated model is automatically retrained each month and the models can be downloaded [here](http://mibi.galaxy.bio.ku.dk/russel/repeattyper/).
+
+From version 1.4.0 and onwards of CCTyper the newest repeatTyper model is included upon release of the version.
+
+Each model contains a training report (xgb_report), where you can find the training log, and in the bottom the accuracy, both overall and per subtype.
+
+#### Use new model in CRISPRCasTyper
+Save the original database files:
+```sh
+mv ${CCTYPER_DB}/type_dict.tab ${CCTYPER_DB}/type_dict_orig.tab
+mv ${CCTYPER_DB}/xgb_repeats.model ${CCTYPER_DB}/xgb_repeats_orig.model
+```
+
+Move the new model into the database folder
+```sh
+mv repeat_model/* ${CCTYPER_DB}/
+```
+
+##### CRISPRCasTyper and RepeatTyper will now use the new model for repeat prediction!
+
+## RepeatTyper - Train <a name="repeattrain"></a>
+You can train the repeat classifier with your own set of subtyped repeats. With a tab-delimeted input where 1. column contains the subtypes and 2. column contains the CRISPR repeat sequences, RepeatTrain will train a CRISPR repeat classifier that is directly usable for both RepeatTyper and CRISPRCasTyper.
+
+#### Train
+```sh
+repeatTrain typed_repeats.tab my_classifier
+```
+
+#### Use new model in RepeatTyper
+```sh
+repeatType repeats.txt --db my_classifier
+```
+
+#### Use new model in CRISPRCasTyper
+Save the original database files:
+```sh
+mv ${CCTYPER_DB}/type_dict.tab ${CCTYPER_DB}/type_dict_orig.tab
+mv ${CCTYPER_DB}/xgb_repeats.model ${CCTYPER_DB}/xgb_repeats_orig.model
+```
+
+Move the new model into the database folder
+```sh
+mv my_classifier/* ${CCTYPER_DB}/
+```
+
+##### CRISPRCasTyper and RepeatTyper will now use the new model for repeat prediction!
+
+## Troubleshoot <a name="trouble"></a>
+
+### Running out of memory
+Large metagenomic assemblies with many small contigs can exhaust the RAM on your laptop. Fortunately, as metagenomic contigs are analysed separately (when run with `--prodigal meta`) a simple solution is to split the input into smaller chunks (e.g. with [pyfasta](https://pypi.org/project/pyfasta/#command-line-interface))
+
+
+
+
+%package -n python3-cctyper
+Summary: CRISPRCasTyper: Automatic detection and subtyping of CRISPR-Cas operons
+Provides: python-cctyper
+BuildRequires: python3-devel
+BuildRequires: python3-setuptools
+BuildRequires: python3-pip
+%description -n python3-cctyper
+[![Project Status: Active - The project has reached a stable, usable state and is being actively developed.](http://www.repostatus.org/badges/latest/active.svg)](http://www.repostatus.org/#active)
+[![Conda](https://anaconda.org/russel88/cctyper/badges/installer/conda.svg)](https://anaconda.org/russel88/cctyper)
+
+# CRISPRCasTyper
+
+Detect CRISPR-Cas genes and arrays, and predict the subtype based on both Cas genes and CRISPR repeat sequence.
+
+[CRISPRCasTyper and RepeatType are also available through a webserver](https://crisprcastyper.crispr.dk)
+
+This software finds Cas genes with a large suite of HMMs, then groups these HMMs into operons, and predicts the subtype of the operons based on a scoring scheme.
+Furthermore, it finds CRISPR arrays with [minced](https://github.com/ctSkennerton/minced) and by BLASTing a large suite of known repeats, and using a kmer-based machine learning approach (extreme gradient boosting trees) it predicts the subtype of the CRISPR arrays based on the consensus repeat.
+It then connects the Cas operons and CRISPR arrays, producing as output:
+* CRISPR-Cas loci, with consensus subtype prediction based on both Cas genes (mostly) and CRISPR consensus repeats
+* Orphan Cas operons, and their predicted subtype
+* Orphan CRISPR arrays, and their predicted associated subtype
+
+#### It includes the following 46 subtypes/variants [(find typing scheme here)](https://typer.crispr.dk/#/typing):
+* I-A, I-B, I-C, I-D, I-E, I-F, I-F (transposon), I-G, II-A, II-B, II-C, III-A, III-B, III-C, III-D, III-E, III-F, IV-A1, IV-A2, IV-A3, IV-B, IV-C, IV-D, IV-E, V-A, V-B1, V-B2, V-C, V-D, V-E, V-F1, V-F2, V-F3, V-F (the rest), V-G, V-H, V-I, V-J, V-K, V-L, VI-A, VI-B1, VI-B2, VI-C, VI-D, VI-X, VI-Y.
+
+* All subtypes from the most recent Nature Reviews Microbiology (Makarova et al. 2020): [Evolutionary classification of CRISPR–Cas systems: a burst of class 2 and derived variants](https://doi.org/10.1038/s41579-019-0299-x)
+* Updated type IV subtypes and variants based on: [Type IV CRISPR–Cas systems are highly diverse and involved in competition between plasmids](https://doi.org/10.1093/nar/gkz1197)
+* Type V-K: [RNA-guided DNA insertion with CRISPR-associated transposases](https://doi.org/10.1126/science.aax9181)
+* Transposon associated type I-F: [Transposon-encoded CRISPR–Cas systems direct RNA-guided DNA integration](https://doi.org/10.1038/s41586-019-1323-z)
+* New V-A variants: [Novel Type V-A CRISPR Effectors Are Active Nucleases with Expanded Targeting Capabilities](https://doi.org/10.1089/crispr.2020.0043)
+* New Cas13s: [Programmable RNA editing with compact CRISPR–Cas13 systems from uncultivated microbes](https://doi.org/10.1038/s41592-021-01124-4)
+* V-L (cas12l): [A new family of CRISPR-type V nucleases with C-rich PAM recognition](https://doi.org/10.15252/embr.202255481)
+
+#### It can automatically draw gene maps of CRISPR-Cas systems and orphan Cas operons and CRISPR arrays
+##### in vector graphics format for direct use in scientific manuscripts
+<img src='img/plot.svg' align="left" height="200" />
+
+#### Citation
+[Jakob Russel, Rafael Pinilla-Redondo, David Mayo-Muñoz, Shiraz A. Shah, Søren J. Sørensen - CRISPRCasTyper: Automated Identification, Annotation and Classification of CRISPR-Cas loci. The CRISPR Journal Dec 2020](https://doi.org/10.1089/crispr.2020.0059)
+
+Find a free to read version on [BioRxiv](https://doi.org/10.1101/2020.05.15.097824)
+
+# Table of contents
+1. [Quick start](#quick)
+2. [Installation](#install)
+3. [CRISPRCasTyper - How to](#cctyperhow)
+ * [Plotting](#plot)
+4. [RepeatType - How to](#repeattype)
+ * [Updated models](#repeatnew)
+5. [RepeatType - Train](#repeattrain)
+6. [Troubleshoot](#trouble)
+
+## Quick start <a name="quick"></a>
+
+```sh
+conda create -n cctyper -c conda-forge -c bioconda -c russel88 cctyper
+conda activate cctyper
+cctyper my.fasta my_output
+```
+
+## Installation <a name="install"></a>
+CRISPRCasTyper can be installed either through conda or pip.
+
+It is advised to use conda, since this installs CRISPRCasTyper and all dependencies, and downloads the database in one go.
+
+### Conda
+Use [miniconda](https://docs.conda.io/en/latest/miniconda.html) or [anaconda](https://www.anaconda.com/) to install.
+
+Create the environment with CRISPRCasTyper and all dependencies and database
+```sh
+conda create -n cctyper -c conda-forge -c bioconda -c russel88 cctyper
+```
+
+### pip
+If you have the dependencies (Python >= 3.8, HMMER >= 3.2, Prodigal >= 2.6, minced, grep, sed) in your PATH you can install with pip
+
+Install cctyper python module
+```sh
+python -m pip install cctyper
+```
+
+Upgrade cctyper python module to the latest version
+```sh
+python -m pip install cctyper --upgrade
+```
+
+
+#### When installing with pip, you need to download the database manually:
+```sh
+# Download and unpack
+svn checkout https://github.com/Russel88/CRISPRCasTyper/trunk/data
+tar -xvzf data/Profiles.tar.gz
+mv Profiles/ data/
+rm data/Profiles.tar.gz
+
+# Tell CRISPRCasTyper where the data is:
+# either by setting an environment variable (has to be done for each terminal session, or added to .bashrc):
+export CCTYPER_DB="/path/to/data/"
+# or by using the --db argument each time you run CRISPRCasTyper:
+cctyper input.fa output --db /path/to/data/
+```
+
+## CRISPRCasTyper - How to <a name="cctyperhow"></a>
+CRISPRCasTyper takes as input a nucleotide fasta, and produces outputs with CRISPR-Cas predictions
+
+#### Activate environment
+```sh
+conda activate cctyper
+```
+
+#### Run with a nucleotide fasta as input
+```sh
+cctyper genome.fa my_output
+```
+
+#### If you have a complete circular genome (each entry in the fasta will be treated as having circular topology)
+```sh
+cctyper genome.fa my_output --circular
+```
+
+#### For metagenome assemblies and short contigs/plasmids/phages, change the prodigal mode
+The default prodigal mode expects the input to be a single draft or complete genome
+```sh
+cctyper assembly.fa my_output --prodigal meta
+```
+
+#### Check the different options
+```sh
+cctyper -h
+```
+
+#### Output <a name="cctyperout"></a>
+* **CRISPR_Cas.tab:** CRISPR_Cas loci, with consensus subtype prediction
+ * Contig: Sequence accession
+ * Operon: Operon ID (Sequence accession @ NUMBER)
+ * Operon_Pos: [Start, End] of operon
+ * Prediction: Consenus prediction based on both Cas operon and CRISPR arrays
+ * CRISPRs: CRISPRs adjacent to Cas operon
+ * Distances: Distances to CRISPRs from Cas operon
+ * Prediction_Cas: Subtype prediction based on Cas operon
+ * Prediction_CRISPRs: Subtype prediction of CRISPRs based on CRISPR repeat sequences
+* **cas_operons.tab:** All certain Cas operons
+ * Contig: Sequence accession
+ * Operon: Operon ID (Sequence accession @ NUMBER)
+ * Start: Start of operon
+ * End: End of operon
+ * Prediction: Subtype prediction
+ * Complete_Interference: Percent completion of the interference module(s). Can be a list if best_type is a list (Hybrid and Ambiguous)
+ * Complete_Adaptation: Percent completion of the adaptation module(s). Can be a list if best_type is a list (Hybrid and Ambiguous)
+ * Best_type: Subtype with the highest score. If the score is high then Prediction = Best_type
+ * Best_score: Score of the highest scoring subtype
+ * Genes: List of Cas genes
+ * Positions: List of Gene IDs for the genes
+ * E-values: List of E-values for the genes
+ * CoverageSeq: List of sequence coverages for the genes
+ * CoverageHMM: List of HMM coverages for the genes
+ * Strand_Interference: Strand of interference module. 1 is positive strand, -1 is negative strand, 0 is mixed, NA if no interference gene found
+ * Strand_Adaptation: Strand of adaptation module. 1 is positive strand, -1 is negative strand, 0 is mixed, NA if no adaptation gene found
+* **crisprs_all.tab:** All CRISPR arrays, also false positives
+ * Contig: Sequence accession
+ * CRISPR: CRISPR ID (minced: Sequence accession _ NUMBER; repeatBLAST: Sequence accession - NUMBER _ NUMBER)
+ * Start: Start of CRISPR
+ * End: End of CRISPR
+ * Consensus_repeat: Consensus repeat sequence
+ * N_repeats: Number of repeats
+ * Repeat_len: Length of repeat sequences
+ * Spacer_len_avg: Average spacer length
+ * Repeat_identity: Average identity of repeat sequences
+ * Spacer_identity: Average identity of spacer sequences
+ * Spacer_len_sem: Standard error of the mean of spacer lenghts
+ * Trusted: TRUE/FALSE, is the array trusted. Based on repeat/spacer identity, spacer sem, prediction probability and adjacency to a cas operon
+ * Prediction: Prediction of the associated subtype based on the repeat sequence
+ * Subtype: Subtype with highest prediction probability. Prediction = Subtype if Subtype_probability is high
+ * Subtype_probability: Probability of subtype prediction
+* **crisprs_near_cas.tab:** CRISPRs part of CRISPR-Cas loci
+ * Same columns as crisprs_all.tab
+* **crisprs_orphan.tab:** Orphan CRISPRs (those not in CRISPR_Cas.tab)
+ * Same columns as crisprs_all.tab
+* **crisprs_putative.tab:** Low quality CRISPRs. Most likely false positives
+ * Same columns as crisprs_all.tab
+* **cas_operons_orphan.tab:** Orphan Cas operons (those not in CRISPR_Cas.tab)
+ * Same columns as cas_operons.tab
+* **CRISPR_Cas_putative.tab:** Putative CRISPR_Cas loci, often lonely Cas genes next to a CRISPR array
+ * Same columns as CRISPR_Cas.tab
+* **cas_operons_putative.tab:** Putative Cas operons, mostly false positives, but also some ambiguous and partial systems
+ * Same columns as cas_operons.tab
+* **spacers/*.fa:** Fasta files with all spacer sequences
+* **hmmer.tab:** All HMM vs. ORF matches, unfiltered results
+ * Hmm: HMM name
+ * ORF: ORF name (Sequence accession _ Gene ID)
+ * tlen: ORF length
+ * qlen: HMM length
+ * Eval: E-value of alignment
+ * score: Alignment score
+ * start: ORF start
+ * end: ORF end
+ * Acc: Sequence accession
+ * Pos: Gene ID
+ * Cov_seq: Sequence coverage
+ * Cov_hmm: HMM coverage
+ * strand: Coding strand is like input (1) or reverse complement (-1)
+* **genes.tab** All genes and their positions
+ * Contig: Sequence accession
+ * Start: Start of ORF
+ * End: End of ORF
+ * Strand: Coding strand is like input (1) or reverse complement (-1)
+ * Pos: Gene ID
+* **arguments.tab:** File with arguments given to CRISPRCasTyper
+* **hmmer.log** Error messages from HMMER (only produced if any errors were encountered)
+
+##### If run with `--keep_tmp` the following is also produced
+* **prodigal.log** Log from prodigal
+* **proteins.faa** Protein sequences
+* **hmmer/*.tab** Alignment output from HMMER for each Cas HMM
+* **minced.out:** CRISPR array output from minced
+* **blast.tab:** BLAST output from repeat alignment against flanking regions of cas operons
+* **Flank....:** Fasta of flanking regions near cas operons and BLAST database of this
+
+#### Notes on output
+Files are only created if there is any data. For example, the CRISPR_Cas.tab file is only created if there are any CRISPR-Cas loci.
+
+### Plotting <a name="plot"></a>
+CRISPRCasTyper will automatically plot a map of the CRISPR-Cas loci, orphan Cas operons, and orphan CRISPR arrays.
+
+These maps can be expanded (`--expand N`) by adding unknown genes and genes with alignment scores below the thresholds. This can help in identify potentially un-annotated genes in operons. You can generate new plots without having to re-run the entire pipeline by adding `--redo_typing` to the command. This will re-use the mappings and re-type the operons and re-make the plot, based on new thresholds and plot parameters.
+
+The plot below is run with `--expand 5000`
+
+* Arrays are in alternating black/white displaying the actual number of repeats/spacers, and with their predicted subtype association based on the consensus repeat sequence.
+* The interference module is in yellow.
+* The adaptation module is in blue.
+* Cas6 is in red.
+* Accessory genes are in purple
+* Genes with alignment scores below the thresholds are lighter and with parentheses around names.
+* Unknown genes are in gray (the number matches the genes.tab file)
+
+<img src='img/plot2.svg' align="left" height="350" />
+
+## RepeatTyper - How to <a name="repeattype"></a>
+With an input of CRISPR repeats (one per line, in a simple textfile) RepeatTyper will predict the subtype, based on the kmer composition of the repeat
+
+#### Activate environment
+```sh
+conda activate cctyper
+```
+
+#### Run with a simple textfile, containing only CRISPR repeats (in capital letters), one repeat per line.
+```sh
+repeatType repeats.txt
+```
+
+#### Output <a name="repeattypeout"></a>
+The script prints:
+* Repeat sequence
+* Predicted subtype
+* Probability of prediction
+
+#### Notes on output
+* Predictions with probabilities below 0.75 are uncertain, and should be taken with a grain of salt.
+* Prior to version 1.4.0 the curated repeatTyper model was included in CCTyper
+* From version 1.4.0 and onwards updated repeatTyper models are included in CCTyper (see more information in the section below)
+* The followinig subtypes are included in the updated model as per December 2022:
+ * I-A, I-B, I-C, I-D, I-E, I-F, I-F (Transposon), I-G
+ * II-A, II-B, II-C
+ * III-A, III-B, III-C, III-D, III-E, III-F
+ * IV-A1, IV-A2, IV-A3, IV-D, IV-E
+ * V-A, V-B1, V-E, V-F1, V-F2, V-F3, V-F (the rest), V-G, V-I, V-J, V-K
+ * VI-A, VI-B1, VI-B2, VI-C, VI-D
+* This is the accuracy per subtype (on an unseen test dataset):
+* I-A 0.76
+* I-B 0.81
+* I-C 0.97
+* I-D 0.86
+* I-E 0.95
+* I-F 0.96
+* I-F_T 0.99
+* I-G 0.89
+* II-A 0.92
+* II-B 0.97
+* II-C 0.90
+* III-A 0.82
+* III-B 0.68
+* III-C 0.60
+* III-D 0.59
+* III-E 1.00
+* III-F 0.25
+* IV-A1 0.85
+* IV-A2 0.68
+* IV-A3 0.96
+* IV-D 0.85
+* IV-E 0.92
+* V-A 1.00
+* V-B1 0.90
+* V-E 0.30
+* V-F 0.87
+* V-F1 0.87
+* V-F2 0.90
+* V-F3 0.90
+* V-G 0.67
+* V-I 0.80
+* V-J 0.63
+* V-K 0.99
+* VI-A 0.96
+* VI-B1 0.96
+* VI-B2 1.00
+* VI-C 0.67
+* VI-D 0.97
+
+### Updated RepeatTyper models <a name="repeatnew"></a>
+The [CCTyper webserver](https://typer.crispr.dk) is crowdsourcing subtyped repeats and includes an updated RepeatTyper model based on a much larger set of repeats and contains additional subtypes compared to the curated RepeatTyper model.
+This updated model is automatically retrained each month and the models can be downloaded [here](http://mibi.galaxy.bio.ku.dk/russel/repeattyper/).
+
+From version 1.4.0 and onwards of CCTyper the newest repeatTyper model is included upon release of the version.
+
+Each model contains a training report (xgb_report), where you can find the training log, and in the bottom the accuracy, both overall and per subtype.
+
+#### Use new model in CRISPRCasTyper
+Save the original database files:
+```sh
+mv ${CCTYPER_DB}/type_dict.tab ${CCTYPER_DB}/type_dict_orig.tab
+mv ${CCTYPER_DB}/xgb_repeats.model ${CCTYPER_DB}/xgb_repeats_orig.model
+```
+
+Move the new model into the database folder
+```sh
+mv repeat_model/* ${CCTYPER_DB}/
+```
+
+##### CRISPRCasTyper and RepeatTyper will now use the new model for repeat prediction!
+
+## RepeatTyper - Train <a name="repeattrain"></a>
+You can train the repeat classifier with your own set of subtyped repeats. With a tab-delimeted input where 1. column contains the subtypes and 2. column contains the CRISPR repeat sequences, RepeatTrain will train a CRISPR repeat classifier that is directly usable for both RepeatTyper and CRISPRCasTyper.
+
+#### Train
+```sh
+repeatTrain typed_repeats.tab my_classifier
+```
+
+#### Use new model in RepeatTyper
+```sh
+repeatType repeats.txt --db my_classifier
+```
+
+#### Use new model in CRISPRCasTyper
+Save the original database files:
+```sh
+mv ${CCTYPER_DB}/type_dict.tab ${CCTYPER_DB}/type_dict_orig.tab
+mv ${CCTYPER_DB}/xgb_repeats.model ${CCTYPER_DB}/xgb_repeats_orig.model
+```
+
+Move the new model into the database folder
+```sh
+mv my_classifier/* ${CCTYPER_DB}/
+```
+
+##### CRISPRCasTyper and RepeatTyper will now use the new model for repeat prediction!
+
+## Troubleshoot <a name="trouble"></a>
+
+### Running out of memory
+Large metagenomic assemblies with many small contigs can exhaust the RAM on your laptop. Fortunately, as metagenomic contigs are analysed separately (when run with `--prodigal meta`) a simple solution is to split the input into smaller chunks (e.g. with [pyfasta](https://pypi.org/project/pyfasta/#command-line-interface))
+
+
+
+
+%package help
+Summary: Development documents and examples for cctyper
+Provides: python3-cctyper-doc
+%description help
+[![Project Status: Active - The project has reached a stable, usable state and is being actively developed.](http://www.repostatus.org/badges/latest/active.svg)](http://www.repostatus.org/#active)
+[![Conda](https://anaconda.org/russel88/cctyper/badges/installer/conda.svg)](https://anaconda.org/russel88/cctyper)
+
+# CRISPRCasTyper
+
+Detect CRISPR-Cas genes and arrays, and predict the subtype based on both Cas genes and CRISPR repeat sequence.
+
+[CRISPRCasTyper and RepeatType are also available through a webserver](https://crisprcastyper.crispr.dk)
+
+This software finds Cas genes with a large suite of HMMs, then groups these HMMs into operons, and predicts the subtype of the operons based on a scoring scheme.
+Furthermore, it finds CRISPR arrays with [minced](https://github.com/ctSkennerton/minced) and by BLASTing a large suite of known repeats, and using a kmer-based machine learning approach (extreme gradient boosting trees) it predicts the subtype of the CRISPR arrays based on the consensus repeat.
+It then connects the Cas operons and CRISPR arrays, producing as output:
+* CRISPR-Cas loci, with consensus subtype prediction based on both Cas genes (mostly) and CRISPR consensus repeats
+* Orphan Cas operons, and their predicted subtype
+* Orphan CRISPR arrays, and their predicted associated subtype
+
+#### It includes the following 46 subtypes/variants [(find typing scheme here)](https://typer.crispr.dk/#/typing):
+* I-A, I-B, I-C, I-D, I-E, I-F, I-F (transposon), I-G, II-A, II-B, II-C, III-A, III-B, III-C, III-D, III-E, III-F, IV-A1, IV-A2, IV-A3, IV-B, IV-C, IV-D, IV-E, V-A, V-B1, V-B2, V-C, V-D, V-E, V-F1, V-F2, V-F3, V-F (the rest), V-G, V-H, V-I, V-J, V-K, V-L, VI-A, VI-B1, VI-B2, VI-C, VI-D, VI-X, VI-Y.
+
+* All subtypes from the most recent Nature Reviews Microbiology (Makarova et al. 2020): [Evolutionary classification of CRISPR–Cas systems: a burst of class 2 and derived variants](https://doi.org/10.1038/s41579-019-0299-x)
+* Updated type IV subtypes and variants based on: [Type IV CRISPR–Cas systems are highly diverse and involved in competition between plasmids](https://doi.org/10.1093/nar/gkz1197)
+* Type V-K: [RNA-guided DNA insertion with CRISPR-associated transposases](https://doi.org/10.1126/science.aax9181)
+* Transposon associated type I-F: [Transposon-encoded CRISPR–Cas systems direct RNA-guided DNA integration](https://doi.org/10.1038/s41586-019-1323-z)
+* New V-A variants: [Novel Type V-A CRISPR Effectors Are Active Nucleases with Expanded Targeting Capabilities](https://doi.org/10.1089/crispr.2020.0043)
+* New Cas13s: [Programmable RNA editing with compact CRISPR–Cas13 systems from uncultivated microbes](https://doi.org/10.1038/s41592-021-01124-4)
+* V-L (cas12l): [A new family of CRISPR-type V nucleases with C-rich PAM recognition](https://doi.org/10.15252/embr.202255481)
+
+#### It can automatically draw gene maps of CRISPR-Cas systems and orphan Cas operons and CRISPR arrays
+##### in vector graphics format for direct use in scientific manuscripts
+<img src='img/plot.svg' align="left" height="200" />
+
+#### Citation
+[Jakob Russel, Rafael Pinilla-Redondo, David Mayo-Muñoz, Shiraz A. Shah, Søren J. Sørensen - CRISPRCasTyper: Automated Identification, Annotation and Classification of CRISPR-Cas loci. The CRISPR Journal Dec 2020](https://doi.org/10.1089/crispr.2020.0059)
+
+Find a free to read version on [BioRxiv](https://doi.org/10.1101/2020.05.15.097824)
+
+# Table of contents
+1. [Quick start](#quick)
+2. [Installation](#install)
+3. [CRISPRCasTyper - How to](#cctyperhow)
+ * [Plotting](#plot)
+4. [RepeatType - How to](#repeattype)
+ * [Updated models](#repeatnew)
+5. [RepeatType - Train](#repeattrain)
+6. [Troubleshoot](#trouble)
+
+## Quick start <a name="quick"></a>
+
+```sh
+conda create -n cctyper -c conda-forge -c bioconda -c russel88 cctyper
+conda activate cctyper
+cctyper my.fasta my_output
+```
+
+## Installation <a name="install"></a>
+CRISPRCasTyper can be installed either through conda or pip.
+
+It is advised to use conda, since this installs CRISPRCasTyper and all dependencies, and downloads the database in one go.
+
+### Conda
+Use [miniconda](https://docs.conda.io/en/latest/miniconda.html) or [anaconda](https://www.anaconda.com/) to install.
+
+Create the environment with CRISPRCasTyper and all dependencies and database
+```sh
+conda create -n cctyper -c conda-forge -c bioconda -c russel88 cctyper
+```
+
+### pip
+If you have the dependencies (Python >= 3.8, HMMER >= 3.2, Prodigal >= 2.6, minced, grep, sed) in your PATH you can install with pip
+
+Install cctyper python module
+```sh
+python -m pip install cctyper
+```
+
+Upgrade cctyper python module to the latest version
+```sh
+python -m pip install cctyper --upgrade
+```
+
+
+#### When installing with pip, you need to download the database manually:
+```sh
+# Download and unpack
+svn checkout https://github.com/Russel88/CRISPRCasTyper/trunk/data
+tar -xvzf data/Profiles.tar.gz
+mv Profiles/ data/
+rm data/Profiles.tar.gz
+
+# Tell CRISPRCasTyper where the data is:
+# either by setting an environment variable (has to be done for each terminal session, or added to .bashrc):
+export CCTYPER_DB="/path/to/data/"
+# or by using the --db argument each time you run CRISPRCasTyper:
+cctyper input.fa output --db /path/to/data/
+```
+
+## CRISPRCasTyper - How to <a name="cctyperhow"></a>
+CRISPRCasTyper takes as input a nucleotide fasta, and produces outputs with CRISPR-Cas predictions
+
+#### Activate environment
+```sh
+conda activate cctyper
+```
+
+#### Run with a nucleotide fasta as input
+```sh
+cctyper genome.fa my_output
+```
+
+#### If you have a complete circular genome (each entry in the fasta will be treated as having circular topology)
+```sh
+cctyper genome.fa my_output --circular
+```
+
+#### For metagenome assemblies and short contigs/plasmids/phages, change the prodigal mode
+The default prodigal mode expects the input to be a single draft or complete genome
+```sh
+cctyper assembly.fa my_output --prodigal meta
+```
+
+#### Check the different options
+```sh
+cctyper -h
+```
+
+#### Output <a name="cctyperout"></a>
+* **CRISPR_Cas.tab:** CRISPR_Cas loci, with consensus subtype prediction
+ * Contig: Sequence accession
+ * Operon: Operon ID (Sequence accession @ NUMBER)
+ * Operon_Pos: [Start, End] of operon
+ * Prediction: Consenus prediction based on both Cas operon and CRISPR arrays
+ * CRISPRs: CRISPRs adjacent to Cas operon
+ * Distances: Distances to CRISPRs from Cas operon
+ * Prediction_Cas: Subtype prediction based on Cas operon
+ * Prediction_CRISPRs: Subtype prediction of CRISPRs based on CRISPR repeat sequences
+* **cas_operons.tab:** All certain Cas operons
+ * Contig: Sequence accession
+ * Operon: Operon ID (Sequence accession @ NUMBER)
+ * Start: Start of operon
+ * End: End of operon
+ * Prediction: Subtype prediction
+ * Complete_Interference: Percent completion of the interference module(s). Can be a list if best_type is a list (Hybrid and Ambiguous)
+ * Complete_Adaptation: Percent completion of the adaptation module(s). Can be a list if best_type is a list (Hybrid and Ambiguous)
+ * Best_type: Subtype with the highest score. If the score is high then Prediction = Best_type
+ * Best_score: Score of the highest scoring subtype
+ * Genes: List of Cas genes
+ * Positions: List of Gene IDs for the genes
+ * E-values: List of E-values for the genes
+ * CoverageSeq: List of sequence coverages for the genes
+ * CoverageHMM: List of HMM coverages for the genes
+ * Strand_Interference: Strand of interference module. 1 is positive strand, -1 is negative strand, 0 is mixed, NA if no interference gene found
+ * Strand_Adaptation: Strand of adaptation module. 1 is positive strand, -1 is negative strand, 0 is mixed, NA if no adaptation gene found
+* **crisprs_all.tab:** All CRISPR arrays, also false positives
+ * Contig: Sequence accession
+ * CRISPR: CRISPR ID (minced: Sequence accession _ NUMBER; repeatBLAST: Sequence accession - NUMBER _ NUMBER)
+ * Start: Start of CRISPR
+ * End: End of CRISPR
+ * Consensus_repeat: Consensus repeat sequence
+ * N_repeats: Number of repeats
+ * Repeat_len: Length of repeat sequences
+ * Spacer_len_avg: Average spacer length
+ * Repeat_identity: Average identity of repeat sequences
+ * Spacer_identity: Average identity of spacer sequences
+ * Spacer_len_sem: Standard error of the mean of spacer lenghts
+ * Trusted: TRUE/FALSE, is the array trusted. Based on repeat/spacer identity, spacer sem, prediction probability and adjacency to a cas operon
+ * Prediction: Prediction of the associated subtype based on the repeat sequence
+ * Subtype: Subtype with highest prediction probability. Prediction = Subtype if Subtype_probability is high
+ * Subtype_probability: Probability of subtype prediction
+* **crisprs_near_cas.tab:** CRISPRs part of CRISPR-Cas loci
+ * Same columns as crisprs_all.tab
+* **crisprs_orphan.tab:** Orphan CRISPRs (those not in CRISPR_Cas.tab)
+ * Same columns as crisprs_all.tab
+* **crisprs_putative.tab:** Low quality CRISPRs. Most likely false positives
+ * Same columns as crisprs_all.tab
+* **cas_operons_orphan.tab:** Orphan Cas operons (those not in CRISPR_Cas.tab)
+ * Same columns as cas_operons.tab
+* **CRISPR_Cas_putative.tab:** Putative CRISPR_Cas loci, often lonely Cas genes next to a CRISPR array
+ * Same columns as CRISPR_Cas.tab
+* **cas_operons_putative.tab:** Putative Cas operons, mostly false positives, but also some ambiguous and partial systems
+ * Same columns as cas_operons.tab
+* **spacers/*.fa:** Fasta files with all spacer sequences
+* **hmmer.tab:** All HMM vs. ORF matches, unfiltered results
+ * Hmm: HMM name
+ * ORF: ORF name (Sequence accession _ Gene ID)
+ * tlen: ORF length
+ * qlen: HMM length
+ * Eval: E-value of alignment
+ * score: Alignment score
+ * start: ORF start
+ * end: ORF end
+ * Acc: Sequence accession
+ * Pos: Gene ID
+ * Cov_seq: Sequence coverage
+ * Cov_hmm: HMM coverage
+ * strand: Coding strand is like input (1) or reverse complement (-1)
+* **genes.tab** All genes and their positions
+ * Contig: Sequence accession
+ * Start: Start of ORF
+ * End: End of ORF
+ * Strand: Coding strand is like input (1) or reverse complement (-1)
+ * Pos: Gene ID
+* **arguments.tab:** File with arguments given to CRISPRCasTyper
+* **hmmer.log** Error messages from HMMER (only produced if any errors were encountered)
+
+##### If run with `--keep_tmp` the following is also produced
+* **prodigal.log** Log from prodigal
+* **proteins.faa** Protein sequences
+* **hmmer/*.tab** Alignment output from HMMER for each Cas HMM
+* **minced.out:** CRISPR array output from minced
+* **blast.tab:** BLAST output from repeat alignment against flanking regions of cas operons
+* **Flank....:** Fasta of flanking regions near cas operons and BLAST database of this
+
+#### Notes on output
+Files are only created if there is any data. For example, the CRISPR_Cas.tab file is only created if there are any CRISPR-Cas loci.
+
+### Plotting <a name="plot"></a>
+CRISPRCasTyper will automatically plot a map of the CRISPR-Cas loci, orphan Cas operons, and orphan CRISPR arrays.
+
+These maps can be expanded (`--expand N`) by adding unknown genes and genes with alignment scores below the thresholds. This can help in identify potentially un-annotated genes in operons. You can generate new plots without having to re-run the entire pipeline by adding `--redo_typing` to the command. This will re-use the mappings and re-type the operons and re-make the plot, based on new thresholds and plot parameters.
+
+The plot below is run with `--expand 5000`
+
+* Arrays are in alternating black/white displaying the actual number of repeats/spacers, and with their predicted subtype association based on the consensus repeat sequence.
+* The interference module is in yellow.
+* The adaptation module is in blue.
+* Cas6 is in red.
+* Accessory genes are in purple
+* Genes with alignment scores below the thresholds are lighter and with parentheses around names.
+* Unknown genes are in gray (the number matches the genes.tab file)
+
+<img src='img/plot2.svg' align="left" height="350" />
+
+## RepeatTyper - How to <a name="repeattype"></a>
+With an input of CRISPR repeats (one per line, in a simple textfile) RepeatTyper will predict the subtype, based on the kmer composition of the repeat
+
+#### Activate environment
+```sh
+conda activate cctyper
+```
+
+#### Run with a simple textfile, containing only CRISPR repeats (in capital letters), one repeat per line.
+```sh
+repeatType repeats.txt
+```
+
+#### Output <a name="repeattypeout"></a>
+The script prints:
+* Repeat sequence
+* Predicted subtype
+* Probability of prediction
+
+#### Notes on output
+* Predictions with probabilities below 0.75 are uncertain, and should be taken with a grain of salt.
+* Prior to version 1.4.0 the curated repeatTyper model was included in CCTyper
+* From version 1.4.0 and onwards updated repeatTyper models are included in CCTyper (see more information in the section below)
+* The followinig subtypes are included in the updated model as per December 2022:
+ * I-A, I-B, I-C, I-D, I-E, I-F, I-F (Transposon), I-G
+ * II-A, II-B, II-C
+ * III-A, III-B, III-C, III-D, III-E, III-F
+ * IV-A1, IV-A2, IV-A3, IV-D, IV-E
+ * V-A, V-B1, V-E, V-F1, V-F2, V-F3, V-F (the rest), V-G, V-I, V-J, V-K
+ * VI-A, VI-B1, VI-B2, VI-C, VI-D
+* This is the accuracy per subtype (on an unseen test dataset):
+* I-A 0.76
+* I-B 0.81
+* I-C 0.97
+* I-D 0.86
+* I-E 0.95
+* I-F 0.96
+* I-F_T 0.99
+* I-G 0.89
+* II-A 0.92
+* II-B 0.97
+* II-C 0.90
+* III-A 0.82
+* III-B 0.68
+* III-C 0.60
+* III-D 0.59
+* III-E 1.00
+* III-F 0.25
+* IV-A1 0.85
+* IV-A2 0.68
+* IV-A3 0.96
+* IV-D 0.85
+* IV-E 0.92
+* V-A 1.00
+* V-B1 0.90
+* V-E 0.30
+* V-F 0.87
+* V-F1 0.87
+* V-F2 0.90
+* V-F3 0.90
+* V-G 0.67
+* V-I 0.80
+* V-J 0.63
+* V-K 0.99
+* VI-A 0.96
+* VI-B1 0.96
+* VI-B2 1.00
+* VI-C 0.67
+* VI-D 0.97
+
+### Updated RepeatTyper models <a name="repeatnew"></a>
+The [CCTyper webserver](https://typer.crispr.dk) is crowdsourcing subtyped repeats and includes an updated RepeatTyper model based on a much larger set of repeats and contains additional subtypes compared to the curated RepeatTyper model.
+This updated model is automatically retrained each month and the models can be downloaded [here](http://mibi.galaxy.bio.ku.dk/russel/repeattyper/).
+
+From version 1.4.0 and onwards of CCTyper the newest repeatTyper model is included upon release of the version.
+
+Each model contains a training report (xgb_report), where you can find the training log, and in the bottom the accuracy, both overall and per subtype.
+
+#### Use new model in CRISPRCasTyper
+Save the original database files:
+```sh
+mv ${CCTYPER_DB}/type_dict.tab ${CCTYPER_DB}/type_dict_orig.tab
+mv ${CCTYPER_DB}/xgb_repeats.model ${CCTYPER_DB}/xgb_repeats_orig.model
+```
+
+Move the new model into the database folder
+```sh
+mv repeat_model/* ${CCTYPER_DB}/
+```
+
+##### CRISPRCasTyper and RepeatTyper will now use the new model for repeat prediction!
+
+## RepeatTyper - Train <a name="repeattrain"></a>
+You can train the repeat classifier with your own set of subtyped repeats. With a tab-delimeted input where 1. column contains the subtypes and 2. column contains the CRISPR repeat sequences, RepeatTrain will train a CRISPR repeat classifier that is directly usable for both RepeatTyper and CRISPRCasTyper.
+
+#### Train
+```sh
+repeatTrain typed_repeats.tab my_classifier
+```
+
+#### Use new model in RepeatTyper
+```sh
+repeatType repeats.txt --db my_classifier
+```
+
+#### Use new model in CRISPRCasTyper
+Save the original database files:
+```sh
+mv ${CCTYPER_DB}/type_dict.tab ${CCTYPER_DB}/type_dict_orig.tab
+mv ${CCTYPER_DB}/xgb_repeats.model ${CCTYPER_DB}/xgb_repeats_orig.model
+```
+
+Move the new model into the database folder
+```sh
+mv my_classifier/* ${CCTYPER_DB}/
+```
+
+##### CRISPRCasTyper and RepeatTyper will now use the new model for repeat prediction!
+
+## Troubleshoot <a name="trouble"></a>
+
+### Running out of memory
+Large metagenomic assemblies with many small contigs can exhaust the RAM on your laptop. Fortunately, as metagenomic contigs are analysed separately (when run with `--prodigal meta`) a simple solution is to split the input into smaller chunks (e.g. with [pyfasta](https://pypi.org/project/pyfasta/#command-line-interface))
+
+
+
+
+%prep
+%autosetup -n cctyper-1.8.0
+
+%build
+%py3_build
+
+%install
+%py3_install
+install -d -m755 %{buildroot}/%{_pkgdocdir}
+if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
+if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
+if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
+if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
+pushd %{buildroot}
+if [ -d usr/lib ]; then
+ find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/lib64 ]; then
+ find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/bin ]; then
+ find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/sbin ]; then
+ find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+touch doclist.lst
+if [ -d usr/share/man ]; then
+ find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
+fi
+popd
+mv %{buildroot}/filelist.lst .
+mv %{buildroot}/doclist.lst .
+
+%files -n python3-cctyper -f filelist.lst
+%dir %{python3_sitelib}/*
+
+%files help -f doclist.lst
+%{_docdir}/*
+
+%changelog
+* Thu May 18 2023 Python_Bot <Python_Bot@openeuler.org> - 1.8.0-1
+- Package Spec generated
diff --git a/sources b/sources
new file mode 100644
index 0000000..fea6786
--- /dev/null
+++ b/sources
@@ -0,0 +1 @@
+70062be4fe7ed5b05ea68b2cbcfbc288 cctyper-1.8.0.tar.gz