-rw-r--r-- | .gitignore | 1 |
-rw-r--r-- | python-act-scio.spec | 519 |
-rw-r--r-- | sources | 1 |
3 files changed, 521 insertions, 0 deletions
@@ -0,0 +1 @@
+/act-scio-0.0.56.tar.gz
diff --git a/python-act-scio.spec b/python-act-scio.spec
new file mode 100644
index 0000000..6a23a6c
--- /dev/null
+++ b/python-act-scio.spec
@@ -0,0 +1,519 @@
+%global _empty_manifest_terminate_build 0
+Name: python-act-scio
+Version: 0.0.56
+Release: 1
+Summary: ACT SCIO
+License: ISC
+URL: https://github.com/mnemonic-no/act-scio2
+Source0: https://mirrors.nju.edu.cn/pypi/web/packages/3d/62/03ccaad5893ff99f89880644adfb67713a11cb41642c103bf5babdb33f3d/act-scio-0.0.56.tar.gz
+BuildArch: noarch
+
+
+%description
+# act-scio2
+Scio v2 is a reimplementation of [Scio](https://github.com/mnemonic-no/act-scio) in Python3.
+
+Scio uses [tika](https://tika.apache.org) to extract text from documents (PDF, HTML, DOC, etc).
+
+The result is sent to the Scio Analyzer that extracts information using a combination of NLP
+(Natural Language Processing) and pattern matching.
+
+## Changelog
+
+### 0.0.42
+
+SCIO now supports setting TLP on data upload, to annotate documents with `tlp` tag. Documents downloaded by feeds will have a default TLP white, but this can be changed in the config for feeds.
+
+## Source code
+
+The source code for the workers is available on [GitHub](https://github.com/mnemonic-no/act-scio2).
+
+## Setup
+
+To set up, first install from PyPI:
+
+```bash
+sudo pip3 install act-scio
+```
+
+You will also need to install [beanstalkd](https://beanstalkd.github.io/). On Debian/Ubuntu you can run:
+
+```bash
+sudo apt install beanstalkd
+```
+
+Configure beanstalk to accept larger payloads with the `-z` option. For Red Hat-derived setups this can be configured in `/etc/sysconfig/beanstalkd`:
+
+```bash
+MAX_JOB_SIZE=-z 524288
+```
+
+You then need to install NLTK data files. A helper utility to do this is included:
+
+```bash
+scio-nltk-download
+```
+
+You will also need to create a default configuration:
+
+```bash
+scio-config user
+```
+
+## API
+
+To run the API, execute:
+
+
+```bash
+scio-api
+```
+
+This will set up the API on 127.0.0.1:3000. Use `--port <PORT>` and `--host <IP>` to listen on another port and/or another interface.
+
+For documentation of the API endpoint see [API.md](API.md).
+
+## Configuration
+
+You can create a default configuration using this command (should be run as the user running scio):
+
+```bash
+scio-config user
+```
+
+Common configuration can be found under ~/.config/scio/etc/scio.ini
+
+## Running Manually
+
+### Scio Tika Server
+
+The Scio Tika server reads jobs from the beanstalk tube `scio_doc` and the extracted text will be sent to the tube `scio_analyze`.
+
+The first time the server runs, it will download tika using maven. It will use a proxy if `$https_proxy` is set.
+
+```bash
+scio-tika-server
+```
+
+`scio-tika-server` uses [tika-python](https://github.com/chrismattmann/tika-python) which depends on tika-server.jar. If your server has internet access, this will be downloaded automatically. If not, or if you need a proxy to connect to the internet, follow the instructions on "Airgap Environment Setup" here: [https://github.com/chrismattmann/tika-python](https://github.com/chrismattmann/tika-python). Currently only tested with tika-server version 1.24.1.
+
+### Scio Analyze Server
+
+Scio Analyze Server reads (by default) jobs from the beanstalk tube `scio_analyze`.
+
+```bash
+scio-analyze
+```
+
+You can also read directly from stdin like this:
+
+```bash
+echo "The companies in the Bus; Finanical, Aviation and Automobile industry are large." | scio-analyze --beanstalk=
+```
+
+### Scio Submit
+
+Submit a document (from file or URI) to `scio_api`.
+
+Example:
+
+```bash
+scio-submit \
+    --uri https://www2.fireeye.com/rs/848-DID-242/images/rpt-apt29-hammertoss.pdf \
+    --scio-baseuri http://localhost:3000/submit \
+    --tlp white
+```
+
+## Running as a service
+
+Systemd compatible service scripts can be found under examples/systemd.
+
+To install:
+
+```bash
+sudo cp examples/systemd/*.service /usr/lib/systemd/system
+sudo systemctl enable scio-tika-server
+sudo systemctl enable scio-analyze
+sudo systemctl start scio-tika-server
+sudo systemctl start scio-analyze
+```
+
+## scio-feed cron job
+
+To continuously fetch new content from feeds, you can add scio-feed to cron like this (make sure the directory $HOME/logs exists):
+
+```
+# Fetch scio feeds every hour
+0 * * * * /usr/local/bin/scio-feeds >> $HOME/logs/scio-feed.log.$(date +\%s) 2>&1
+
+# Delete logs from scio-feeds older than 7 days
+0 * * * * find $HOME/logs/ -name 'scio-feed.log.*' -mmin +10080 -exec rm {} \;
+```
+
+## Local development
+
+Use pip to install in [local development mode](https://pip.pypa.io/en/stable/reference/pip_install/#editable-installs). act-scio uses namespacing, so it is not compatible with using `setup.py install` or `setup.py develop`.
+
+In repository, run:
+
+```bash
+pip3 install --user -e .
+```
+
+
+%package -n python3-act-scio
+Summary: ACT SCIO
+Provides: python-act-scio
+BuildRequires: python3-devel
+BuildRequires: python3-setuptools
+BuildRequires: python3-pip
+%description -n python3-act-scio
+# act-scio2
+Scio v2 is a reimplementation of [Scio](https://github.com/mnemonic-no/act-scio) in Python3.
+
+Scio uses [tika](https://tika.apache.org) to extract text from documents (PDF, HTML, DOC, etc).
+
+The result is sent to the Scio Analyzer that extracts information using a combination of NLP
+(Natural Language Processing) and pattern matching.
+
+## Changelog
+
+### 0.0.42
+
+SCIO now supports setting TLP on data upload, to annotate documents with `tlp` tag. Documents downloaded by feeds will have a default TLP white, but this can be changed in the config for feeds.
+
+## Source code
+
+The source code for the workers is available on [GitHub](https://github.com/mnemonic-no/act-scio2).
+
+## Setup
+
+To set up, first install from PyPI:
+
+```bash
+sudo pip3 install act-scio
+```
+
+You will also need to install [beanstalkd](https://beanstalkd.github.io/). On Debian/Ubuntu you can run:
+
+```bash
+sudo apt install beanstalkd
+```
+
+Configure beanstalk to accept larger payloads with the `-z` option. For Red Hat-derived setups this can be configured in `/etc/sysconfig/beanstalkd`:
+
+```bash
+MAX_JOB_SIZE=-z 524288
+```
+
+You then need to install NLTK data files. A helper utility to do this is included:
+
+```bash
+scio-nltk-download
+```
+
+You will also need to create a default configuration:
+
+```bash
+scio-config user
+```
+
+## API
+
+To run the API, execute:
+
+
+```bash
+scio-api
+```
+
+This will set up the API on 127.0.0.1:3000. Use `--port <PORT>` and `--host <IP>` to listen on another port and/or another interface.
+
+For documentation of the API endpoint see [API.md](API.md).
+
+## Configuration
+
+You can create a default configuration using this command (should be run as the user running scio):
+
+```bash
+scio-config user
+```
+
+Common configuration can be found under ~/.config/scio/etc/scio.ini
+
+## Running Manually
+
+### Scio Tika Server
+
+The Scio Tika server reads jobs from the beanstalk tube `scio_doc` and the extracted text will be sent to the tube `scio_analyze`.
+
+The first time the server runs, it will download tika using maven. It will use a proxy if `$https_proxy` is set.
+
+```bash
+scio-tika-server
+```
+
+`scio-tika-server` uses [tika-python](https://github.com/chrismattmann/tika-python) which depends on tika-server.jar. If your server has internet access, this will be downloaded automatically. If not, or if you need a proxy to connect to the internet, follow the instructions on "Airgap Environment Setup" here: [https://github.com/chrismattmann/tika-python](https://github.com/chrismattmann/tika-python). Currently only tested with tika-server version 1.24.1.
+
+### Scio Analyze Server
+
+Scio Analyze Server reads (by default) jobs from the beanstalk tube `scio_analyze`.
+
+```bash
+scio-analyze
+```
+
+You can also read directly from stdin like this:
+
+```bash
+echo "The companies in the Bus; Finanical, Aviation and Automobile industry are large." | scio-analyze --beanstalk=
+```
+
+### Scio Submit
+
+Submit a document (from file or URI) to `scio_api`.
+
+Example:
+
+```bash
+scio-submit \
+    --uri https://www2.fireeye.com/rs/848-DID-242/images/rpt-apt29-hammertoss.pdf \
+    --scio-baseuri http://localhost:3000/submit \
+    --tlp white
+```
+
+## Running as a service
+
+Systemd compatible service scripts can be found under examples/systemd.
+
+To install:
+
+```bash
+sudo cp examples/systemd/*.service /usr/lib/systemd/system
+sudo systemctl enable scio-tika-server
+sudo systemctl enable scio-analyze
+sudo systemctl start scio-tika-server
+sudo systemctl start scio-analyze
+```
+
+## scio-feed cron job
+
+To continuously fetch new content from feeds, you can add scio-feed to cron like this (make sure the directory $HOME/logs exists):
+
+```
+# Fetch scio feeds every hour
+0 * * * * /usr/local/bin/scio-feeds >> $HOME/logs/scio-feed.log.$(date +\%s) 2>&1
+
+# Delete logs from scio-feeds older than 7 days
+0 * * * * find $HOME/logs/ -name 'scio-feed.log.*' -mmin +10080 -exec rm {} \;
+```
+
+## Local development
+
+Use pip to install in [local development mode](https://pip.pypa.io/en/stable/reference/pip_install/#editable-installs). act-scio uses namespacing, so it is not compatible with using `setup.py install` or `setup.py develop`.
+
+In repository, run:
+
+```bash
+pip3 install --user -e .
+```
+
+
+%package help
+Summary: Development documents and examples for act-scio
+Provides: python3-act-scio-doc
+%description help
+# act-scio2
+Scio v2 is a reimplementation of [Scio](https://github.com/mnemonic-no/act-scio) in Python3.
+
+Scio uses [tika](https://tika.apache.org) to extract text from documents (PDF, HTML, DOC, etc).
+
+The result is sent to the Scio Analyzer that extracts information using a combination of NLP
+(Natural Language Processing) and pattern matching.
+
+## Changelog
+
+### 0.0.42
+
+SCIO now supports setting TLP on data upload, to annotate documents with `tlp` tag. Documents downloaded by feeds will have a default TLP white, but this can be changed in the config for feeds.
+
+## Source code
+
+The source code for the workers is available on [GitHub](https://github.com/mnemonic-no/act-scio2).
+
+## Setup
+
+To set up, first install from PyPI:
+
+```bash
+sudo pip3 install act-scio
+```
+
+You will also need to install [beanstalkd](https://beanstalkd.github.io/). On Debian/Ubuntu you can run:
+
+```bash
+sudo apt install beanstalkd
+```
+
+Configure beanstalk to accept larger payloads with the `-z` option. For Red Hat-derived setups this can be configured in `/etc/sysconfig/beanstalkd`:
+
+```bash
+MAX_JOB_SIZE=-z 524288
+```
+
+You then need to install NLTK data files. A helper utility to do this is included:
+
+```bash
+scio-nltk-download
+```
+
+You will also need to create a default configuration:
+
+```bash
+scio-config user
+```
+
+## API
+
+To run the API, execute:
+
+
+```bash
+scio-api
+```
+
+This will set up the API on 127.0.0.1:3000. Use `--port <PORT>` and `--host <IP>` to listen on another port and/or another interface.
+
+For documentation of the API endpoint see [API.md](API.md).
+
+## Configuration
+
+You can create a default configuration using this command (should be run as the user running scio):
+
+```bash
+scio-config user
+```
+
+Common configuration can be found under ~/.config/scio/etc/scio.ini
+
+## Running Manually
+
+### Scio Tika Server
+
+The Scio Tika server reads jobs from the beanstalk tube `scio_doc` and the extracted text will be sent to the tube `scio_analyze`.
+
+The first time the server runs, it will download tika using maven. It will use a proxy if `$https_proxy` is set.
+
+```bash
+scio-tika-server
+```
+
+`scio-tika-server` uses [tika-python](https://github.com/chrismattmann/tika-python) which depends on tika-server.jar. If your server has internet access, this will be downloaded automatically. If not, or if you need a proxy to connect to the internet, follow the instructions on "Airgap Environment Setup" here: [https://github.com/chrismattmann/tika-python](https://github.com/chrismattmann/tika-python). Currently only tested with tika-server version 1.24.1.
+
+### Scio Analyze Server
+
+Scio Analyze Server reads (by default) jobs from the beanstalk tube `scio_analyze`.
+
+```bash
+scio-analyze
+```
+
+You can also read directly from stdin like this:
+
+```bash
+echo "The companies in the Bus; Finanical, Aviation and Automobile industry are large." | scio-analyze --beanstalk=
+```
+
+### Scio Submit
+
+Submit a document (from file or URI) to `scio_api`.
+
+Example:
+
+```bash
+scio-submit \
+    --uri https://www2.fireeye.com/rs/848-DID-242/images/rpt-apt29-hammertoss.pdf \
+    --scio-baseuri http://localhost:3000/submit \
+    --tlp white
+```
+
+## Running as a service
+
+Systemd compatible service scripts can be found under examples/systemd.
+
+To install:
+
+```bash
+sudo cp examples/systemd/*.service /usr/lib/systemd/system
+sudo systemctl enable scio-tika-server
+sudo systemctl enable scio-analyze
+sudo systemctl start scio-tika-server
+sudo systemctl start scio-analyze
+```
+
+## scio-feed cron job
+
+To continuously fetch new content from feeds, you can add scio-feed to cron like this (make sure the directory $HOME/logs exists):
+
+```
+# Fetch scio feeds every hour
+0 * * * * /usr/local/bin/scio-feeds >> $HOME/logs/scio-feed.log.$(date +\%s) 2>&1
+
+# Delete logs from scio-feeds older than 7 days
+0 * * * * find $HOME/logs/ -name 'scio-feed.log.*' -mmin +10080 -exec rm {} \;
+```
+
+## Local development
+
+Use pip to install in [local development mode](https://pip.pypa.io/en/stable/reference/pip_install/#editable-installs). act-scio uses namespacing, so it is not compatible with using `setup.py install` or `setup.py develop`.
+
+In repository, run:
+
+```bash
+pip3 install --user -e .
+```
+
+
+%prep
+%autosetup -n act-scio-0.0.56
+
+%build
+%py3_build
+
+%install
+%py3_install
+install -d -m755 %{buildroot}/%{_pkgdocdir}
+if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
+if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
+if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
+if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
+pushd %{buildroot}
+if [ -d usr/lib ]; then
+	find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/lib64 ]; then
+	find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/bin ]; then
+	find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/sbin ]; then
+	find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+touch doclist.lst
+if [ -d usr/share/man ]; then
+	find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
+fi
+popd
+mv %{buildroot}/filelist.lst .
+mv %{buildroot}/doclist.lst .
+
+%files -n python3-act-scio -f filelist.lst
+%dir %{python3_sitelib}/*
+
+%files help -f doclist.lst
+%{_docdir}/*
+
+%changelog
+* Wed May 17 2023 Python_Bot <Python_Bot@openeuler.org> - 0.0.56-1
+- Package Spec generated
@@ -0,0 +1 @@
+ddc477224902b84d3dff9a29b9cf662d act-scio-0.0.56.tar.gz
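
The `sources` entry pairs a hex digest with the tarball name. A minimal Python sketch for checking a downloaded tarball against such an entry follows. It assumes the 32-character value is an MD5 digest and that the entry has the form `<digest> <filename>`; the function names are hypothetical and not part of this repository or of any packaging tool.

```python
import hashlib
from pathlib import Path


def parse_sources_line(line: str) -> tuple[str, str]:
    """Split a '<digest> <filename>' entry into its two fields (assumed layout)."""
    digest, filename = line.split(maxsplit=1)
    return digest.lower(), filename.strip()


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so a large tarball is not read into memory at once."""
    hasher = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def matches_sources(line: str, directory: Path) -> bool:
    """Return True when the file named in the entry has the digest the entry states."""
    digest, filename = parse_sources_line(line)
    return md5_of(directory / filename) == digest
```

With the tarball in the current directory, `matches_sources("ddc477224902b84d3dff9a29b9cf662d act-scio-0.0.56.tar.gz", Path("."))` would report whether the download matches the recorded digest.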