%global _empty_manifest_terminate_build 0 Name: python-bgen Version: 1.5.4 Release: 1 Summary: Package for loading data from bgen files License: MIT URL: https://github.com/jeremymcrae/bgen Source0: https://mirrors.nju.edu.cn/pypi/web/packages/af/8c/03f4e9e6372e87eaad219c190acd240f0051135b1a3263d04339eba1b2b4/bgen-1.5.4.tar.gz Requires: python3-numpy %description ### Another bgen reader ![bgen](https://github.com/jeremymcrae/bgen/workflows/bgen/badge.svg) This is a package for reading [bgen files](https://www.well.ox.ac.uk/~gav/bgen_format). This package uses cython to wrap c++ code for parsing bgen files. It's fairly quick, it can parse genotypes from 500,000 individuals at ~300 variants per second within a single python process (~450 million probabilities per second with a 3GHz CPU). Decompressing the genotype probabilities is the slow step, zlib decompression takes 80% of the total time, using zstd compressed genotypes would be much faster, maybe 2-3X faster? This has been optimized for UKBiobank bgen files (i.e. bgen version 1.2 with zlib compressed 8-bit genotype probabilities, but the other bgen versions and zstd compression have also been tested using example bgen files). #### Install `pip install bgen` #### Usage ```python from bgen import BgenReader bfile = BgenReader(BGEN_PATH) rsids = bfile.rsids() # select a variant by indexing var = bfile[1000] # pull out genotype probabilities probs = var.probabilities # returns 2D numpy array dosage = var.minor_allele_dosage # returns 1D numpy array for biallelic variant # iterate through every variant in the file with BgenReader(BGEN_PATH, delay_parsing=True) as bfile: for var in bfile: dosage = var.minor_allele_dosage # get all variants in a genomic region variants = bfile.fetch('21', 10000, 5000000) # or for writing bgen files import numpy as np from bgen import BgenWriter geno = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]]).astype(np.float64) with BgenWriter(BGEN_PATH, n_samples=3) as bfile: bfile.add_variant(varid='var1', rsid='rs1', chrom='chr1', pos=1, alleles=['A', 'G'], genotypes=geno) ``` #### API documentation ``` py class BgenReader(path, sample_path='', delay_parsing=False) # opens a bgen file. If a bgenix index exists for the file, the index file # will be opened automatically for quicker access of specific variants. Arguments: path: path to bgen file sample_path: optional path to sample file. Samples will be given integer IDs if sample file is not given and sample IDs not found in the bgen file delay_parsing: True/False option to allow for not loading all variants into memory when the BgenFile is opened. This can save time when iterating across variants in the file Attributes: samples: list of sample IDs header: BgenHeader with info about the bgen version and compression. Methods: slicing: BgenVars can be accessed by slicing the BgenFile e.g. bfile[1000] iteration: variants in a BgenFile can be looped over e.g. for x in bfile: print(x) fetch(chrom, start=None, stop=None): get all variants within a genomic region drop_variants(list[int]): drops variants by index from being used in analyses with_rsid(rsid): returns BgenVar with given position at_position(pos): returns BgenVar with given rsid varids(): returns list of varids for variants in the bgen file. rsids(): returns list of rsids for variants in the bgen file. chroms(): returns list of chromosomes for variants in the bgen file. positions(): returns list of positions for variants in the bgen file. class BgenVar(handle, offset, layout, compression, n_samples): # Note: this isn't called directly, but instead returned from BgenFile methods Attributes: varid: ID for variant rsid: reference SNP ID for variant chrom: chromosome variant is on pos: nucleotide position variant is at alleles: list of alleles for variant is_phased: True/False for whether variant has phased genotype data ploidy: list of ploidy for each sample. Samples are ordered as per BgenFile.samples minor_allele: the least common allele (for biallelic variants) minor_allele_dosage: 1D numpy array of minor allele dosages for each sample alt_dosage: 1D numpy array of alt allele dosages for each sample probabilitiies: 2D numpy array of genotype probabilities, one sample per row BgenVars can be pickled e.g. pickle.dumps(var) ``` %package -n python3-bgen Summary: Package for loading data from bgen files Provides: python-bgen BuildRequires: python3-devel BuildRequires: python3-setuptools BuildRequires: python3-pip BuildRequires: python3-cffi BuildRequires: gcc BuildRequires: gdb %description -n python3-bgen ### Another bgen reader ![bgen](https://github.com/jeremymcrae/bgen/workflows/bgen/badge.svg) This is a package for reading [bgen files](https://www.well.ox.ac.uk/~gav/bgen_format). This package uses cython to wrap c++ code for parsing bgen files. It's fairly quick, it can parse genotypes from 500,000 individuals at ~300 variants per second within a single python process (~450 million probabilities per second with a 3GHz CPU). Decompressing the genotype probabilities is the slow step, zlib decompression takes 80% of the total time, using zstd compressed genotypes would be much faster, maybe 2-3X faster? This has been optimized for UKBiobank bgen files (i.e. bgen version 1.2 with zlib compressed 8-bit genotype probabilities, but the other bgen versions and zstd compression have also been tested using example bgen files). #### Install `pip install bgen` #### Usage ```python from bgen import BgenReader bfile = BgenReader(BGEN_PATH) rsids = bfile.rsids() # select a variant by indexing var = bfile[1000] # pull out genotype probabilities probs = var.probabilities # returns 2D numpy array dosage = var.minor_allele_dosage # returns 1D numpy array for biallelic variant # iterate through every variant in the file with BgenReader(BGEN_PATH, delay_parsing=True) as bfile: for var in bfile: dosage = var.minor_allele_dosage # get all variants in a genomic region variants = bfile.fetch('21', 10000, 5000000) # or for writing bgen files import numpy as np from bgen import BgenWriter geno = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]]).astype(np.float64) with BgenWriter(BGEN_PATH, n_samples=3) as bfile: bfile.add_variant(varid='var1', rsid='rs1', chrom='chr1', pos=1, alleles=['A', 'G'], genotypes=geno) ``` #### API documentation ``` py class BgenReader(path, sample_path='', delay_parsing=False) # opens a bgen file. If a bgenix index exists for the file, the index file # will be opened automatically for quicker access of specific variants. Arguments: path: path to bgen file sample_path: optional path to sample file. Samples will be given integer IDs if sample file is not given and sample IDs not found in the bgen file delay_parsing: True/False option to allow for not loading all variants into memory when the BgenFile is opened. This can save time when iterating across variants in the file Attributes: samples: list of sample IDs header: BgenHeader with info about the bgen version and compression. Methods: slicing: BgenVars can be accessed by slicing the BgenFile e.g. bfile[1000] iteration: variants in a BgenFile can be looped over e.g. for x in bfile: print(x) fetch(chrom, start=None, stop=None): get all variants within a genomic region drop_variants(list[int]): drops variants by index from being used in analyses with_rsid(rsid): returns BgenVar with given position at_position(pos): returns BgenVar with given rsid varids(): returns list of varids for variants in the bgen file. rsids(): returns list of rsids for variants in the bgen file. chroms(): returns list of chromosomes for variants in the bgen file. positions(): returns list of positions for variants in the bgen file. class BgenVar(handle, offset, layout, compression, n_samples): # Note: this isn't called directly, but instead returned from BgenFile methods Attributes: varid: ID for variant rsid: reference SNP ID for variant chrom: chromosome variant is on pos: nucleotide position variant is at alleles: list of alleles for variant is_phased: True/False for whether variant has phased genotype data ploidy: list of ploidy for each sample. Samples are ordered as per BgenFile.samples minor_allele: the least common allele (for biallelic variants) minor_allele_dosage: 1D numpy array of minor allele dosages for each sample alt_dosage: 1D numpy array of alt allele dosages for each sample probabilitiies: 2D numpy array of genotype probabilities, one sample per row BgenVars can be pickled e.g. pickle.dumps(var) ``` %package help Summary: Development documents and examples for bgen Provides: python3-bgen-doc %description help ### Another bgen reader ![bgen](https://github.com/jeremymcrae/bgen/workflows/bgen/badge.svg) This is a package for reading [bgen files](https://www.well.ox.ac.uk/~gav/bgen_format). This package uses cython to wrap c++ code for parsing bgen files. It's fairly quick, it can parse genotypes from 500,000 individuals at ~300 variants per second within a single python process (~450 million probabilities per second with a 3GHz CPU). Decompressing the genotype probabilities is the slow step, zlib decompression takes 80% of the total time, using zstd compressed genotypes would be much faster, maybe 2-3X faster? This has been optimized for UKBiobank bgen files (i.e. bgen version 1.2 with zlib compressed 8-bit genotype probabilities, but the other bgen versions and zstd compression have also been tested using example bgen files). #### Install `pip install bgen` #### Usage ```python from bgen import BgenReader bfile = BgenReader(BGEN_PATH) rsids = bfile.rsids() # select a variant by indexing var = bfile[1000] # pull out genotype probabilities probs = var.probabilities # returns 2D numpy array dosage = var.minor_allele_dosage # returns 1D numpy array for biallelic variant # iterate through every variant in the file with BgenReader(BGEN_PATH, delay_parsing=True) as bfile: for var in bfile: dosage = var.minor_allele_dosage # get all variants in a genomic region variants = bfile.fetch('21', 10000, 5000000) # or for writing bgen files import numpy as np from bgen import BgenWriter geno = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]]).astype(np.float64) with BgenWriter(BGEN_PATH, n_samples=3) as bfile: bfile.add_variant(varid='var1', rsid='rs1', chrom='chr1', pos=1, alleles=['A', 'G'], genotypes=geno) ``` #### API documentation ``` py class BgenReader(path, sample_path='', delay_parsing=False) # opens a bgen file. If a bgenix index exists for the file, the index file # will be opened automatically for quicker access of specific variants. Arguments: path: path to bgen file sample_path: optional path to sample file. Samples will be given integer IDs if sample file is not given and sample IDs not found in the bgen file delay_parsing: True/False option to allow for not loading all variants into memory when the BgenFile is opened. This can save time when iterating across variants in the file Attributes: samples: list of sample IDs header: BgenHeader with info about the bgen version and compression. Methods: slicing: BgenVars can be accessed by slicing the BgenFile e.g. bfile[1000] iteration: variants in a BgenFile can be looped over e.g. for x in bfile: print(x) fetch(chrom, start=None, stop=None): get all variants within a genomic region drop_variants(list[int]): drops variants by index from being used in analyses with_rsid(rsid): returns BgenVar with given position at_position(pos): returns BgenVar with given rsid varids(): returns list of varids for variants in the bgen file. rsids(): returns list of rsids for variants in the bgen file. chroms(): returns list of chromosomes for variants in the bgen file. positions(): returns list of positions for variants in the bgen file. class BgenVar(handle, offset, layout, compression, n_samples): # Note: this isn't called directly, but instead returned from BgenFile methods Attributes: varid: ID for variant rsid: reference SNP ID for variant chrom: chromosome variant is on pos: nucleotide position variant is at alleles: list of alleles for variant is_phased: True/False for whether variant has phased genotype data ploidy: list of ploidy for each sample. Samples are ordered as per BgenFile.samples minor_allele: the least common allele (for biallelic variants) minor_allele_dosage: 1D numpy array of minor allele dosages for each sample alt_dosage: 1D numpy array of alt allele dosages for each sample probabilitiies: 2D numpy array of genotype probabilities, one sample per row BgenVars can be pickled e.g. pickle.dumps(var) ``` %prep %autosetup -n bgen-1.5.4 %build %py3_build %install %py3_install install -d -m755 %{buildroot}/%{_pkgdocdir} if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi pushd %{buildroot} if [ -d usr/lib ]; then find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst fi if [ -d usr/lib64 ]; then find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst fi if [ -d usr/bin ]; then find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst fi if [ -d usr/sbin ]; then find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst fi touch doclist.lst if [ -d usr/share/man ]; then find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst fi popd mv %{buildroot}/filelist.lst . mv %{buildroot}/doclist.lst . %files -n python3-bgen -f filelist.lst %dir %{python3_sitearch}/* %files help -f doclist.lst %{_docdir}/* %changelog * Wed May 10 2023 Python_Bot - 1.5.4-1 - Package Spec generated