Diffstat (limited to 'python-identify.spec')
-rw-r--r-- | python-identify.spec | 367 |
1 files changed, 367 insertions, 0 deletions
diff --git a/python-identify.spec b/python-identify.spec
new file mode 100644
index 0000000..babfd1f
--- /dev/null
+++ b/python-identify.spec
@@ -0,0 +1,367 @@
+%global _empty_manifest_terminate_build 0
+Name: python-identify
+Version: 2.5.19
+Release: 1
+Summary: File identification library for Python
+License: MIT
+URL: https://github.com/pre-commit/identify
+Source0: https://mirrors.nju.edu.cn/pypi/web/packages/2d/39/743c442eecc9405d98f90a3b4308740b5c09765052e526392c307e6054a3/identify-2.5.19.tar.gz
+BuildArch: noarch
+
+Requires: python3-ukkonen
+
+%description
+File identification library for Python.
+Given a file (or some information about a file), return a set of standardized
+tags identifying what the file is.
+## Installation
+```bash
+pip install identify
+```
+## Usage
+### With a file on disk
+If you have an actual file on disk, you can get the most information possible
+(a superset of all other methods):
+```python
+>>> from identify import identify
+>>> identify.tags_from_path('/path/to/file.py')
+{'file', 'text', 'python', 'non-executable'}
+>>> identify.tags_from_path('/path/to/file-with-shebang')
+{'file', 'text', 'shell', 'bash', 'executable'}
+>>> identify.tags_from_path('/bin/bash')
+{'file', 'binary', 'executable'}
+>>> identify.tags_from_path('/path/to/directory')
+{'directory'}
+>>> identify.tags_from_path('/path/to/symlink')
+{'symlink'}
+```
+When using a file on disk, the checks performed are:
+* File type (file, symlink, directory, socket)
+* Mode (is it executable?)
+* File name (mostly based on extension)
+* If executable, the shebang is read and the interpreter interpreted
+### If you only have the filename
+```python
+>>> identify.tags_from_filename('file.py')
+{'text', 'python'}
+```
+### If you only have the interpreter
+```python
+>>> identify.tags_from_interpreter('python3.5')
+{'python', 'python3'}
+>>> identify.tags_from_interpreter('bash')
+{'shell', 'bash'}
+>>> identify.tags_from_interpreter('some-unrecognized-thing')
+set()
+```
+### As a cli
+```
+$ identify-cli --help
+usage: identify-cli [-h] [--filename-only] path
+positional arguments:
+  path
+optional arguments:
+  -h, --help       show this help message and exit
+  --filename-only
+```
+```console
+$ identify-cli setup.py; echo $?
+["file", "non-executable", "python", "text"]
+0
+$ identify-cli setup.py --filename-only; echo $?
+["python", "text"]
+0
+$ identify-cli wat.wat; echo $?
+wat.wat does not exist.
+1
+$ identify-cli wat.wat --filename-only; echo $?
+1
+```
+### Identifying LICENSE files
+`identify` also has an api for determining what type of license is contained
+in a file. This routine is roughly based on the approaches used by
+[licensee] (the ruby gem that github uses to figure out the license for a
+repo).
+The approach that `identify` uses is as follows:
+1. Strip the copyright line
+2. Normalize all whitespace
+3. Return any exact matches
+4. Return the closest by edit distance (where edit distance < 5%)
+To use the api, install via `pip install identify[license]`
+```pycon
+>>> from identify import identify
+>>> identify.license_id('LICENSE')
+'MIT'
+```
+The return value of the `license_id` function is an [SPDX] id. Currently
+licenses are sourced from [choosealicense.com].
+[licensee]: https://github.com/benbalter/licensee
+[SPDX]: https://spdx.org/licenses/
+[choosealicense.com]: https://github.com/github/choosealicense.com
+## How it works
+A call to `tags_from_path` does this:
+1. What is the type: file, symlink, directory? If it's not file, stop here.
+2. Is it executable? Add the appropriate tag.
+3. Do we recognize the file extension? If so, add the appropriate tags, stop
+   here. These tags would include binary/text.
+4. Peek at the first X bytes of the file. Use these to determine whether it is
+   binary or text, add the appropriate tag.
+5. If identified as text above, try to read and interpret the shebang, and add
+   appropriate tags.
+By design, this means we don't need to partially read files where we recognize
+the file extension.
+
+%package -n python3-identify
+Summary: File identification library for Python
+Provides: python-identify
+BuildRequires: python3-devel
+BuildRequires: python3-setuptools
+BuildRequires: python3-pip
+%description -n python3-identify
+File identification library for Python.
+Given a file (or some information about a file), return a set of standardized
+tags identifying what the file is.
+## Installation
+```bash
+pip install identify
+```
+## Usage
+### With a file on disk
+If you have an actual file on disk, you can get the most information possible
+(a superset of all other methods):
+```python
+>>> from identify import identify
+>>> identify.tags_from_path('/path/to/file.py')
+{'file', 'text', 'python', 'non-executable'}
+>>> identify.tags_from_path('/path/to/file-with-shebang')
+{'file', 'text', 'shell', 'bash', 'executable'}
+>>> identify.tags_from_path('/bin/bash')
+{'file', 'binary', 'executable'}
+>>> identify.tags_from_path('/path/to/directory')
+{'directory'}
+>>> identify.tags_from_path('/path/to/symlink')
+{'symlink'}
+```
+When using a file on disk, the checks performed are:
+* File type (file, symlink, directory, socket)
+* Mode (is it executable?)
+* File name (mostly based on extension)
+* If executable, the shebang is read and the interpreter interpreted
+### If you only have the filename
+```python
+>>> identify.tags_from_filename('file.py')
+{'text', 'python'}
+```
+### If you only have the interpreter
+```python
+>>> identify.tags_from_interpreter('python3.5')
+{'python', 'python3'}
+>>> identify.tags_from_interpreter('bash')
+{'shell', 'bash'}
+>>> identify.tags_from_interpreter('some-unrecognized-thing')
+set()
+```
+### As a cli
+```
+$ identify-cli --help
+usage: identify-cli [-h] [--filename-only] path
+positional arguments:
+  path
+optional arguments:
+  -h, --help       show this help message and exit
+  --filename-only
+```
+```console
+$ identify-cli setup.py; echo $?
+["file", "non-executable", "python", "text"]
+0
+$ identify-cli setup.py --filename-only; echo $?
+["python", "text"]
+0
+$ identify-cli wat.wat; echo $?
+wat.wat does not exist.
+1
+$ identify-cli wat.wat --filename-only; echo $?
+1
+```
+### Identifying LICENSE files
+`identify` also has an api for determining what type of license is contained
+in a file. This routine is roughly based on the approaches used by
+[licensee] (the ruby gem that github uses to figure out the license for a
+repo).
+The approach that `identify` uses is as follows:
+1. Strip the copyright line
+2. Normalize all whitespace
+3. Return any exact matches
+4. Return the closest by edit distance (where edit distance < 5%)
+To use the api, install via `pip install identify[license]`
+```pycon
+>>> from identify import identify
+>>> identify.license_id('LICENSE')
+'MIT'
+```
+The return value of the `license_id` function is an [SPDX] id. Currently
+licenses are sourced from [choosealicense.com].
+[licensee]: https://github.com/benbalter/licensee
+[SPDX]: https://spdx.org/licenses/
+[choosealicense.com]: https://github.com/github/choosealicense.com
+## How it works
+A call to `tags_from_path` does this:
+1. What is the type: file, symlink, directory? If it's not file, stop here.
+2. Is it executable? Add the appropriate tag.
+3. Do we recognize the file extension? If so, add the appropriate tags, stop
+   here. These tags would include binary/text.
+4. Peek at the first X bytes of the file. Use these to determine whether it is
+   binary or text, add the appropriate tag.
+5. If identified as text above, try to read and interpret the shebang, and add
+   appropriate tags.
+By design, this means we don't need to partially read files where we recognize
+the file extension.
+
+%package help
+Summary: Development documents and examples for identify
+Provides: python3-identify-doc
+%description help
+File identification library for Python.
+Given a file (or some information about a file), return a set of standardized
+tags identifying what the file is.
+## Installation
+```bash
+pip install identify
+```
+## Usage
+### With a file on disk
+If you have an actual file on disk, you can get the most information possible
+(a superset of all other methods):
+```python
+>>> from identify import identify
+>>> identify.tags_from_path('/path/to/file.py')
+{'file', 'text', 'python', 'non-executable'}
+>>> identify.tags_from_path('/path/to/file-with-shebang')
+{'file', 'text', 'shell', 'bash', 'executable'}
+>>> identify.tags_from_path('/bin/bash')
+{'file', 'binary', 'executable'}
+>>> identify.tags_from_path('/path/to/directory')
+{'directory'}
+>>> identify.tags_from_path('/path/to/symlink')
+{'symlink'}
+```
+When using a file on disk, the checks performed are:
+* File type (file, symlink, directory, socket)
+* Mode (is it executable?)
+* File name (mostly based on extension)
+* If executable, the shebang is read and the interpreter interpreted
+### If you only have the filename
+```python
+>>> identify.tags_from_filename('file.py')
+{'text', 'python'}
+```
+### If you only have the interpreter
+```python
+>>> identify.tags_from_interpreter('python3.5')
+{'python', 'python3'}
+>>> identify.tags_from_interpreter('bash')
+{'shell', 'bash'}
+>>> identify.tags_from_interpreter('some-unrecognized-thing')
+set()
+```
+### As a cli
+```
+$ identify-cli --help
+usage: identify-cli [-h] [--filename-only] path
+positional arguments:
+  path
+optional arguments:
+  -h, --help       show this help message and exit
+  --filename-only
+```
+```console
+$ identify-cli setup.py; echo $?
+["file", "non-executable", "python", "text"]
+0
+$ identify-cli setup.py --filename-only; echo $?
+["python", "text"]
+0
+$ identify-cli wat.wat; echo $?
+wat.wat does not exist.
+1
+$ identify-cli wat.wat --filename-only; echo $?
+1
+```
+### Identifying LICENSE files
+`identify` also has an api for determining what type of license is contained
+in a file. This routine is roughly based on the approaches used by
+[licensee] (the ruby gem that github uses to figure out the license for a
+repo).
+The approach that `identify` uses is as follows:
+1. Strip the copyright line
+2. Normalize all whitespace
+3. Return any exact matches
+4. Return the closest by edit distance (where edit distance < 5%)
+To use the api, install via `pip install identify[license]`
+```pycon
+>>> from identify import identify
+>>> identify.license_id('LICENSE')
+'MIT'
+```
+The return value of the `license_id` function is an [SPDX] id. Currently
+licenses are sourced from [choosealicense.com].
+[licensee]: https://github.com/benbalter/licensee
+[SPDX]: https://spdx.org/licenses/
+[choosealicense.com]: https://github.com/github/choosealicense.com
+## How it works
+A call to `tags_from_path` does this:
+1. What is the type: file, symlink, directory? If it's not file, stop here.
+2. Is it executable? Add the appropriate tag.
+3. Do we recognize the file extension? If so, add the appropriate tags, stop
+   here. These tags would include binary/text.
+4. Peek at the first X bytes of the file. Use these to determine whether it is
+   binary or text, add the appropriate tag.
+5. If identified as text above, try to read and interpret the shebang, and add
+   appropriate tags.
+By design, this means we don't need to partially read files where we recognize
+the file extension.
+
+%prep
+%autosetup -n identify-2.5.19
+
+%build
+%py3_build
+
+%install
+%py3_install
+install -d -m755 %{buildroot}/%{_pkgdocdir}
+if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
+if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
+if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
+if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
+pushd %{buildroot}
+if [ -d usr/lib ]; then
+	find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/lib64 ]; then
+	find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/bin ]; then
+	find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/sbin ]; then
+	find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+touch doclist.lst
+if [ -d usr/share/man ]; then
+	find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
+fi
+popd
+mv %{buildroot}/filelist.lst .
+mv %{buildroot}/doclist.lst .
+
+%files -n python3-identify -f filelist.lst
+%dir %{python3_sitelib}/*
+
+%files help -f doclist.lst
+%{_docdir}/*
+
+%changelog
+* Thu Mar 09 2023 Python_Bot <Python_Bot@openeuler.org> - 2.5.19-1
+- Package Spec generated