-rw-r--r-- .gitignore 1
-rw-r--r-- python-act-scio.spec 519
-rw-r--r-- sources 1
3 files changed, 521 insertions, 0 deletions
diff --git a/.gitignore b/.gitignore
index e69de29..31c4626 100644
--- a/.gitignore
+++ b/.gitignore
@@ -0,0 +1 @@
+/act-scio-0.0.56.tar.gz
diff --git a/python-act-scio.spec b/python-act-scio.spec
new file mode 100644
index 0000000..6a23a6c
--- /dev/null
+++ b/python-act-scio.spec
@@ -0,0 +1,519 @@
+%global _empty_manifest_terminate_build 0
+Name: python-act-scio
+Version: 0.0.56
+Release: 1
+Summary: ACT SCIO
+License: ISC
+URL: https://github.com/mnemonic-no/act-scio2
+Source0: https://mirrors.nju.edu.cn/pypi/web/packages/3d/62/03ccaad5893ff99f89880644adfb67713a11cb41642c103bf5babdb33f3d/act-scio-0.0.56.tar.gz
+BuildArch: noarch
+
+
+%description
+# act-scio2
+Scio v2 is a reimplementation of [Scio](https://github.com/mnemonic-no/act-scio) in Python3.
+
+Scio uses [tika](https://tika.apache.org) to extract text from documents (PDF, HTML, DOC, etc).
+
+The result is sent to the Scio Analyzer that extracts information using a combination of NLP
+(Natural Language Processing) and pattern matching.
+
+## Changelog
+
+### 0.0.42
+
+SCIO now supports setting TLP on data upload, to annotate documents with `tlp` tag. Documents downloaded by feeds will have a default TLP white, but this can be changed in the config for feeds.
+
+## Source code
+
+The source code for the workers is available on [GitHub](https://github.com/mnemonic-no/act-scio2).
+
+## Setup
+
+To set up, first install from PyPI:
+
+```bash
+sudo pip3 install act-scio
+```
+
+You will also need to install [beanstalkd](https://beanstalkd.github.io/). On Debian/Ubuntu you can run:
+
+```bash
+sudo apt install beanstalkd
+```
+
+Configure beanstalkd to accept larger payloads with the `-z` option. On Red Hat-derived systems this can be configured in `/etc/sysconfig/beanstalkd`:
+
+```bash
+MAX_JOB_SIZE=-z 524288
+```
+
+You then need to install NLTK data files. A helper utility to do this is included:
+
+```bash
+scio-nltk-download
+```
+
+You will also need to create a default configuration:
+
+```bash
+scio-config user
+```
+
+## API
+
+To run the api, execute:
+
+
+```bash
+scio-api
+```
+
+This will set up the API on 127.0.0.1:3000. Use `--port <PORT>` and `--host <IP>` to listen on another port and/or another interface.
+
+For documentation of the API endpoint see [API.md](API.md).
+
+## Configuration
+
+You can create a default configuration using this command (should be run as the user running scio):
+
+```bash
+scio-config user
+```
+
+Common configuration can be found under `~/.config/scio/etc/scio.ini`.
+
+## Running Manually
+
+### Scio Tika Server
+
+The Scio Tika server reads jobs from the beanstalk tube `scio_doc` and the extracted text will be sent to the tube `scio_analyze`.
+
+The first time the server runs, it will download tika using maven. It will use a proxy if `$https_proxy` is set.
+
+```bash
+scio-tika-server
+```
+
+`scio-tika-server` uses [tika-python](https://github.com/chrismattmann/tika-python), which depends on tika-server.jar. If your server has internet access, this will be downloaded automatically. If not, or if you need a proxy to connect to the internet, follow the instructions under "Airgap Environment Setup" here: [https://github.com/chrismattmann/tika-python](https://github.com/chrismattmann/tika-python). Currently only tested with tika-server version 1.24.1.
+
+### Scio Analyze Server
+
+Scio Analyze Server reads (by default) jobs from the beanstalk tube `scio_analyze`.
+
+```bash
+scio-analyze
+```
+
+You can also read directly from stdin like this:
+
+```bash
+echo "The companies in the Bus; Finanical, Aviation and Automobile industry are large." | scio-analyze --beanstalk=
+```
+
+### Scio Submit
+
+Submit document (from file or URI) to `scio_api`.
+
+Example:
+
+```bash
+scio-submit \
+ --uri https://www2.fireeye.com/rs/848-DID-242/images/rpt-apt29-hammertoss.pdf \
+ --scio-baseuri http://localhost:3000/submit \
+ --tlp white
+```
+
+## Running as a service
+
+Systemd compatible service scripts can be found under examples/systemd.
+
+To install:
+
+```bash
+sudo cp examples/systemd/*.service /usr/lib/systemd/system
+sudo systemctl enable scio-tika-server
+sudo systemctl enable scio-analyze
+sudo systemctl start scio-tika-server
+sudo systemctl start scio-analyze
+```
+
+## scio-feed cron job
+
+To continuously fetch new content from feeds, you can add scio-feed to cron like this (make sure the directory $HOME/logs exists):
+
+```
+# Fetch scio feeds every hour
+0 * * * * /usr/local/bin/scio-feeds >> $HOME/logs/scio-feed.log.$(date +\%s) 2>&1
+
+# Delete logs from scio-feeds older than 7 days
+0 * * * * find $HOME/logs/ -name 'scio-feed.log.*' -mmin +10080 -exec rm {} \;
+```
+
+## Local development
+
+Use pip to install in [local development mode](https://pip.pypa.io/en/stable/reference/pip_install/#editable-installs). act-scio uses namespacing, so it is not compatible with using `setup.py install` or `setup.py develop`.
+
+In the repository, run:
+
+```bash
+pip3 install --user -e .
+```
+
+
+%package -n python3-act-scio
+Summary: ACT SCIO
+Provides: python-act-scio
+BuildRequires: python3-devel
+BuildRequires: python3-setuptools
+BuildRequires: python3-pip
+%description -n python3-act-scio
+# act-scio2
+Scio v2 is a reimplementation of [Scio](https://github.com/mnemonic-no/act-scio) in Python3.
+
+Scio uses [tika](https://tika.apache.org) to extract text from documents (PDF, HTML, DOC, etc).
+
+The result is sent to the Scio Analyzer that extracts information using a combination of NLP
+(Natural Language Processing) and pattern matching.
+
+## Changelog
+
+### 0.0.42
+
+SCIO now supports setting TLP on data upload, to annotate documents with `tlp` tag. Documents downloaded by feeds will have a default TLP white, but this can be changed in the config for feeds.
+
+## Source code
+
+The source code for the workers is available on [GitHub](https://github.com/mnemonic-no/act-scio2).
+
+## Setup
+
+To set up, first install from PyPI:
+
+```bash
+sudo pip3 install act-scio
+```
+
+You will also need to install [beanstalkd](https://beanstalkd.github.io/). On Debian/Ubuntu you can run:
+
+```bash
+sudo apt install beanstalkd
+```
+
+Configure beanstalkd to accept larger payloads with the `-z` option. On Red Hat-derived systems this can be configured in `/etc/sysconfig/beanstalkd`:
+
+```bash
+MAX_JOB_SIZE=-z 524288
+```
+
+You then need to install NLTK data files. A helper utility to do this is included:
+
+```bash
+scio-nltk-download
+```
+
+You will also need to create a default configuration:
+
+```bash
+scio-config user
+```
+
+## API
+
+To run the api, execute:
+
+
+```bash
+scio-api
+```
+
+This will set up the API on 127.0.0.1:3000. Use `--port <PORT>` and `--host <IP>` to listen on another port and/or another interface.
+
+For documentation of the API endpoint see [API.md](API.md).
+
+## Configuration
+
+You can create a default configuration using this command (should be run as the user running scio):
+
+```bash
+scio-config user
+```
+
+Common configuration can be found under `~/.config/scio/etc/scio.ini`.
+
+## Running Manually
+
+### Scio Tika Server
+
+The Scio Tika server reads jobs from the beanstalk tube `scio_doc` and the extracted text will be sent to the tube `scio_analyze`.
+
+The first time the server runs, it will download tika using maven. It will use a proxy if `$https_proxy` is set.
+
+```bash
+scio-tika-server
+```
+
+`scio-tika-server` uses [tika-python](https://github.com/chrismattmann/tika-python), which depends on tika-server.jar. If your server has internet access, this will be downloaded automatically. If not, or if you need a proxy to connect to the internet, follow the instructions under "Airgap Environment Setup" here: [https://github.com/chrismattmann/tika-python](https://github.com/chrismattmann/tika-python). Currently only tested with tika-server version 1.24.1.
+
+### Scio Analyze Server
+
+Scio Analyze Server reads (by default) jobs from the beanstalk tube `scio_analyze`.
+
+```bash
+scio-analyze
+```
+
+You can also read directly from stdin like this:
+
+```bash
+echo "The companies in the Bus; Finanical, Aviation and Automobile industry are large." | scio-analyze --beanstalk=
+```
+
+### Scio Submit
+
+Submit document (from file or URI) to `scio_api`.
+
+Example:
+
+```bash
+scio-submit \
+ --uri https://www2.fireeye.com/rs/848-DID-242/images/rpt-apt29-hammertoss.pdf \
+ --scio-baseuri http://localhost:3000/submit \
+ --tlp white
+```
+
+## Running as a service
+
+Systemd compatible service scripts can be found under examples/systemd.
+
+To install:
+
+```bash
+sudo cp examples/systemd/*.service /usr/lib/systemd/system
+sudo systemctl enable scio-tika-server
+sudo systemctl enable scio-analyze
+sudo systemctl start scio-tika-server
+sudo systemctl start scio-analyze
+```
+
+## scio-feed cron job
+
+To continuously fetch new content from feeds, you can add scio-feed to cron like this (make sure the directory $HOME/logs exists):
+
+```
+# Fetch scio feeds every hour
+0 * * * * /usr/local/bin/scio-feeds >> $HOME/logs/scio-feed.log.$(date +\%s) 2>&1
+
+# Delete logs from scio-feeds older than 7 days
+0 * * * * find $HOME/logs/ -name 'scio-feed.log.*' -mmin +10080 -exec rm {} \;
+```
+
+## Local development
+
+Use pip to install in [local development mode](https://pip.pypa.io/en/stable/reference/pip_install/#editable-installs). act-scio uses namespacing, so it is not compatible with using `setup.py install` or `setup.py develop`.
+
+In the repository, run:
+
+```bash
+pip3 install --user -e .
+```
+
+
+%package help
+Summary: Development documents and examples for act-scio
+Provides: python3-act-scio-doc
+%description help
+# act-scio2
+Scio v2 is a reimplementation of [Scio](https://github.com/mnemonic-no/act-scio) in Python3.
+
+Scio uses [tika](https://tika.apache.org) to extract text from documents (PDF, HTML, DOC, etc).
+
+The result is sent to the Scio Analyzer that extracts information using a combination of NLP
+(Natural Language Processing) and pattern matching.
+
+## Changelog
+
+### 0.0.42
+
+SCIO now supports setting TLP on data upload, to annotate documents with `tlp` tag. Documents downloaded by feeds will have a default TLP white, but this can be changed in the config for feeds.
+
+## Source code
+
+The source code for the workers is available on [GitHub](https://github.com/mnemonic-no/act-scio2).
+
+## Setup
+
+To set up, first install from PyPI:
+
+```bash
+sudo pip3 install act-scio
+```
+
+You will also need to install [beanstalkd](https://beanstalkd.github.io/). On Debian/Ubuntu you can run:
+
+```bash
+sudo apt install beanstalkd
+```
+
+Configure beanstalkd to accept larger payloads with the `-z` option. On Red Hat-derived systems this can be configured in `/etc/sysconfig/beanstalkd`:
+
+```bash
+MAX_JOB_SIZE=-z 524288
+```
+
+You then need to install NLTK data files. A helper utility to do this is included:
+
+```bash
+scio-nltk-download
+```
+
+You will also need to create a default configuration:
+
+```bash
+scio-config user
+```
+
+## API
+
+To run the api, execute:
+
+
+```bash
+scio-api
+```
+
+This will set up the API on 127.0.0.1:3000. Use `--port <PORT>` and `--host <IP>` to listen on another port and/or another interface.
+
+For documentation of the API endpoint see [API.md](API.md).
+
+## Configuration
+
+You can create a default configuration using this command (should be run as the user running scio):
+
+```bash
+scio-config user
+```
+
+Common configuration can be found under `~/.config/scio/etc/scio.ini`.
+
+## Running Manually
+
+### Scio Tika Server
+
+The Scio Tika server reads jobs from the beanstalk tube `scio_doc` and the extracted text will be sent to the tube `scio_analyze`.
+
+The first time the server runs, it will download tika using maven. It will use a proxy if `$https_proxy` is set.
+
+```bash
+scio-tika-server
+```
+
+`scio-tika-server` uses [tika-python](https://github.com/chrismattmann/tika-python), which depends on tika-server.jar. If your server has internet access, this will be downloaded automatically. If not, or if you need a proxy to connect to the internet, follow the instructions under "Airgap Environment Setup" here: [https://github.com/chrismattmann/tika-python](https://github.com/chrismattmann/tika-python). Currently only tested with tika-server version 1.24.1.
+
+### Scio Analyze Server
+
+Scio Analyze Server reads (by default) jobs from the beanstalk tube `scio_analyze`.
+
+```bash
+scio-analyze
+```
+
+You can also read directly from stdin like this:
+
+```bash
+echo "The companies in the Bus; Finanical, Aviation and Automobile industry are large." | scio-analyze --beanstalk=
+```
+
+### Scio Submit
+
+Submit document (from file or URI) to `scio_api`.
+
+Example:
+
+```bash
+scio-submit \
+ --uri https://www2.fireeye.com/rs/848-DID-242/images/rpt-apt29-hammertoss.pdf \
+ --scio-baseuri http://localhost:3000/submit \
+ --tlp white
+```
+
+## Running as a service
+
+Systemd compatible service scripts can be found under examples/systemd.
+
+To install:
+
+```bash
+sudo cp examples/systemd/*.service /usr/lib/systemd/system
+sudo systemctl enable scio-tika-server
+sudo systemctl enable scio-analyze
+sudo systemctl start scio-tika-server
+sudo systemctl start scio-analyze
+```
+
+## scio-feed cron job
+
+To continuously fetch new content from feeds, you can add scio-feed to cron like this (make sure the directory $HOME/logs exists):
+
+```
+# Fetch scio feeds every hour
+0 * * * * /usr/local/bin/scio-feeds >> $HOME/logs/scio-feed.log.$(date +\%s) 2>&1
+
+# Delete logs from scio-feeds older than 7 days
+0 * * * * find $HOME/logs/ -name 'scio-feed.log.*' -mmin +10080 -exec rm {} \;
+```
+
+## Local development
+
+Use pip to install in [local development mode](https://pip.pypa.io/en/stable/reference/pip_install/#editable-installs). act-scio uses namespacing, so it is not compatible with using `setup.py install` or `setup.py develop`.
+
+In the repository, run:
+
+```bash
+pip3 install --user -e .
+```
+
+
+%prep
+%autosetup -n act-scio-0.0.56
+
+%build
+%py3_build
+
+%install
+%py3_install
+install -d -m755 %{buildroot}/%{_pkgdocdir}
+if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
+if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
+if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
+if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
+pushd %{buildroot}
+if [ -d usr/lib ]; then
+ find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/lib64 ]; then
+ find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/bin ]; then
+ find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/sbin ]; then
+ find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+touch doclist.lst
+if [ -d usr/share/man ]; then
+ find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
+fi
+popd
+mv %{buildroot}/filelist.lst .
+mv %{buildroot}/doclist.lst .
+
+%files -n python3-act-scio -f filelist.lst
+%dir %{python3_sitelib}/*
+
+%files help -f doclist.lst
+%{_docdir}/*
+
+%changelog
+* Wed May 17 2023 Python_Bot <Python_Bot@openeuler.org> - 0.0.56-1
+- Package Spec generated
diff --git a/sources b/sources
new file mode 100644
index 0000000..6097cb1
--- /dev/null
+++ b/sources
@@ -0,0 +1 @@
+ddc477224902b84d3dff9a29b9cf662d act-scio-0.0.56.tar.gz