author     CoprDistGit <infra@openeuler.org>  2023-06-20 04:07:05 +0000
committer  CoprDistGit <infra@openeuler.org>  2023-06-20 04:07:05 +0000
commit     8d07cbeceabbc3b2da1dc1250e9dcceef01e42d5
tree       2d9cfe9771bc98429965e50dd11fde6eeb89f4bc
parent     3fce96434c9e24c06c693cc261cc465a1ddfbc10
automatic import of python-pyClickModels [openeuler20.03]
-rw-r--r-- .gitignore                 1
-rw-r--r-- python-pyclickmodels.spec  384
-rw-r--r-- sources                    1
3 files changed, 386 insertions, 0 deletions
diff --git a/.gitignore b/.gitignore
index e69de29..df0980f 100644
--- a/.gitignore
+++ b/.gitignore
@@ -0,0 +1 @@
+/pyClickModels-0.0.2.tar.gz
diff --git a/python-pyclickmodels.spec b/python-pyclickmodels.spec
new file mode 100644
index 0000000..4a70926
--- /dev/null
+++ b/python-pyclickmodels.spec
@@ -0,0 +1,384 @@
+%global _empty_manifest_terminate_build 0
+Name: python-pyClickModels
+Version: 0.0.2
+Release: 1
+Summary: ClickModels for Search Engines Implemented on top of Cython.
+License: MIT
+URL: https://pypi.org/project/pyClickModels/
+Source0: https://mirrors.aliyun.com/pypi/web/packages/3f/5f/5229d10f6eec879ad957594e179cc1e320353e4870f77e20987e2cc34117/pyClickModels-0.0.2.tar.gz
+BuildArch: noarch
+
+Requires: python3-cython
+Requires: python3-numpy
+Requires: python3-ujson
+
+%description
+# pyClickModels [![Build Status](https://travis-ci.org/WillianFuks/pyClickModels.svg?branch=master)](https://travis-ci.org/WillianFuks/pyClickModels) [![Coverage Status](https://coveralls.io/repos/github/WillianFuks/pyClickModels/badge.svg?branch=master)](https://coveralls.io/github/WillianFuks/pyClickModels?branch=master) [![PyPI version](https://badge.fury.io/py/pyClickModels.svg)](https://badge.fury.io/py/pyClickModels) [![Pyversions](https://img.shields.io/pypi/pyversions/pyClickModels.svg)](https://pypi.python.org/pypi/pyClickModels) [![GitHub license](https://img.shields.io/github/license/WillianFuks/pyClickModels.svg)](https://github.com/WillianFuks/pyClickModels/blob/master/LICENSE)
+
+A Cython implementation of [ClickModels](https://github.com/varepsilon/clickmodels) that uses Probabilistic Graphical Models to infer user behavior when interacting with Search Page Results (Ranking).
+
+## How It Works
+
+ClickModels uses the concept of [Probabilistic Graphical Models](https://en.wikipedia.org/wiki/Graphical_model) to model components that describe the interactions between users and a list of items ranked by a set of retrieval rules.
+
+These models are useful for understanding whether a given document is a good match for a given search query, a measure known in the literature as *Judgment* grades. This is done by evaluating past observed clicks and the positions at which each document appeared on the results pages for each query.
+
+There are several [proposed approaches](https://clickmodels.weebly.com/uploads/5/2/2/5/52257029/mc2015-clickmodels.pdf) to handle this problem. This repository implements a Dynamic Bayesian Network, similar to [previous works](https://github.com/varepsilon/clickmodels) also done in Python:
+
+![dbn](notebooks/dbn.png)
+
+The main differences are:
+
+1. **Implemented on top of Cython**: publicly available solutions are written in pure Python and rely on PyPy, rather than CPython, for additional speedups. Unfortunately, this may still not be fast enough. To address that, this implementation relies entirely on compiled C/C++ code. There is no official benchmark yet, but the expected speedup is roughly **15x** to **18x** over CPython (the same data yields a ~3x speedup with PyPy).
+2. **Memory Friendly**: input data is expected in a JSON format in which each row already contains all clickstream sessions for its search context. This saves memory and allows the library to process larger amounts of data.
+3. **Purchase variable**: since businesses such as e-commerce sites can greatly benefit from better understanding their search engine, this repository adds a Purchase variable to further describe customer behavior.
+
+The file [notebooks/DBN.ipynb](notebooks/DBN.ipynb) has a complete description of how the model has been implemented along with all the mathematics involved.
+
+
+
+## Installation
+
+As this project relies on binaries compiled by Cython, only the Linux (manylinux) platform is currently supported. It can be installed with:
+
+    pip install pyClickModels
+
+## Getting Started
+
+### Input Data
+
+pyClickModels expects input data to be stored in a set of compressed `gz` files located in the same folder. All file names should start with the string "judgments", for instance `judgments0.gz`.
+Each file should contain newline-delimited JSON, one object per line. The following example shows a single line, pretty-printed for readability:
+
+```json
+{
+  "search_keys": {
+    "search_term": "blue shoes",
+    "region": "south",
+    "favorite_brand": "super brand",
+    "user_size": "L",
+    "avg_ticket": 10
+  },
+  "judgment_keys": [
+    {
+      "session": [
+        {"click": 0, "purchase": 0, "doc": "doc0"},
+        {"click": 1, "purchase": 0, "doc": "doc1"},
+        {"click": 1, "purchase": 1, "doc": "doc2"}
+      ]
+    },
+    {
+      "session": [
+        {"click": 1, "purchase": 0, "doc": "doc0"},
+        {"click": 0, "purchase": 0, "doc": "doc1"},
+        {"click": 0, "purchase": 0, "doc": "doc2"}
+      ]
+    }
+  ]
+}
+```
+
+The key `search_keys` sets the context for the search. In the above example, a given customer (or cluster of customers with the same context) searched for `blue shoes`. Their region is `south` (it could be any chosen value), favorite brand is `super brand` and so on.
+
+These keys set the context in which the search happened. When pyClickModels runs its optimization, it considers the whole context at once, so the resulting Judgments also apply to that whole context.
+
+If no context is desired, just use `{"search_keys": {"search_term": "user search"}}`.
+
+There is no required schema: the library loops through all keys available in `search_keys` and treats the whole context as a single query during optimization.
+
+`judgment_keys` is a list of sessions, and each entry must contain the key `session`. Each session holds a user's clickstream; if the purchase variable is not needed, set it to 0.
+
+Here is a simple example of running the DBN model from pyClickModels:
+
+```python
+from pyClickModels.DBN import DBN
+
+model = DBN()
+model.fit(input_folder="/tmp/clicks_data/", iters=10)
+model.export_judgments("/tmp/output.gz")
+```
+
+The output file contains newline-delimited JSON with the judgments for each query and each document observed for that query, for example:
+
+```json
+{"search_term:blue shoes|region:south|brand:super brand": {"doc0": 0.2, "doc1": 0.3, "doc2": 0.4}}
+{"search_term:query|region:north|brand:other_brand": {"doc0": 0.0, "doc1": 0.0, "doc2": 0.1}}
+```
+
+Judgments here range from 0 to 1. Some libraries require integer grades from 0 to 4; in that case, choose a transformation that suits your data.
+
+## Warnings
+
+**This library is still alpha!** Use it with caution. It has been fully unit tested, but parts of it use pure C, whose exceptions may not have been fully handled yet. Before using this library in production environments, test it thoroughly with datasets of different kinds and sizes to evaluate how it performs.
+
+## Contributing
+
+Contributions are very welcome! Also, if you find bugs, please report them :).
+
+
+
+
+%package -n python3-pyClickModels
+Summary: ClickModels for Search Engines Implemented on top of Cython.
+Provides: python-pyClickModels
+BuildRequires: python3-devel
+BuildRequires: python3-setuptools
+BuildRequires: python3-pip
+%description -n python3-pyClickModels
+# pyClickModels [![Build Status](https://travis-ci.org/WillianFuks/pyClickModels.svg?branch=master)](https://travis-ci.org/WillianFuks/pyClickModels) [![Coverage Status](https://coveralls.io/repos/github/WillianFuks/pyClickModels/badge.svg?branch=master)](https://coveralls.io/github/WillianFuks/pyClickModels?branch=master) [![PyPI version](https://badge.fury.io/py/pyClickModels.svg)](https://badge.fury.io/py/pyClickModels) [![Pyversions](https://img.shields.io/pypi/pyversions/pyClickModels.svg)](https://pypi.python.org/pypi/pyClickModels) [![GitHub license](https://img.shields.io/github/license/WillianFuks/pyClickModels.svg)](https://github.com/WillianFuks/pyClickModels/blob/master/LICENSE)
+
+A Cython implementation of [ClickModels](https://github.com/varepsilon/clickmodels) that uses Probabilistic Graphical Models to infer user behavior when interacting with Search Page Results (Ranking).
+
+## How It Works
+
+ClickModels uses the concept of [Probabilistic Graphical Models](https://en.wikipedia.org/wiki/Graphical_model) to model components that describe the interactions between users and a list of items ranked by a set of retrieval rules.
+
+These models are useful for understanding whether a given document is a good match for a given search query, a measure known in the literature as *Judgment* grades. This is done by evaluating past observed clicks and the positions at which each document appeared on the results pages for each query.
+
+There are several [proposed approaches](https://clickmodels.weebly.com/uploads/5/2/2/5/52257029/mc2015-clickmodels.pdf) to handle this problem. This repository implements a Dynamic Bayesian Network, similar to [previous works](https://github.com/varepsilon/clickmodels) also done in Python:
+
+![dbn](notebooks/dbn.png)
+
+The main differences are:
+
+1. **Implemented on top of Cython**: publicly available solutions are written in pure Python and rely on PyPy, rather than CPython, for additional speedups. Unfortunately, this may still not be fast enough. To address that, this implementation relies entirely on compiled C/C++ code. There is no official benchmark yet, but the expected speedup is roughly **15x** to **18x** over CPython (the same data yields a ~3x speedup with PyPy).
+2. **Memory Friendly**: input data is expected in a JSON format in which each row already contains all clickstream sessions for its search context. This saves memory and allows the library to process larger amounts of data.
+3. **Purchase variable**: since businesses such as e-commerce sites can greatly benefit from better understanding their search engine, this repository adds a Purchase variable to further describe customer behavior.
+
+The file [notebooks/DBN.ipynb](notebooks/DBN.ipynb) has a complete description of how the model has been implemented along with all the mathematics involved.
+
+
+
+## Installation
+
+As this project relies on binaries compiled by Cython, only the Linux (manylinux) platform is currently supported. It can be installed with:
+
+    pip install pyClickModels
+
+## Getting Started
+
+### Input Data
+
+pyClickModels expects input data to be stored in a set of compressed `gz` files located in the same folder. All file names should start with the string "judgments", for instance `judgments0.gz`.
+Each file should contain newline-delimited JSON, one object per line. The following example shows a single line, pretty-printed for readability:
+
+```json
+{
+  "search_keys": {
+    "search_term": "blue shoes",
+    "region": "south",
+    "favorite_brand": "super brand",
+    "user_size": "L",
+    "avg_ticket": 10
+  },
+  "judgment_keys": [
+    {
+      "session": [
+        {"click": 0, "purchase": 0, "doc": "doc0"},
+        {"click": 1, "purchase": 0, "doc": "doc1"},
+        {"click": 1, "purchase": 1, "doc": "doc2"}
+      ]
+    },
+    {
+      "session": [
+        {"click": 1, "purchase": 0, "doc": "doc0"},
+        {"click": 0, "purchase": 0, "doc": "doc1"},
+        {"click": 0, "purchase": 0, "doc": "doc2"}
+      ]
+    }
+  ]
+}
+```
+
+The key `search_keys` sets the context for the search. In the above example, a given customer (or cluster of customers with the same context) searched for `blue shoes`. Their region is `south` (it could be any chosen value), favorite brand is `super brand` and so on.
+
+These keys set the context in which the search happened. When pyClickModels runs its optimization, it considers the whole context at once, so the resulting Judgments also apply to that whole context.
+
+If no context is desired, just use `{"search_keys": {"search_term": "user search"}}`.
+
+There is no required schema: the library loops through all keys available in `search_keys` and treats the whole context as a single query during optimization.
+
+`judgment_keys` is a list of sessions, and each entry must contain the key `session`. Each session holds a user's clickstream; if the purchase variable is not needed, set it to 0.
+
+Here is a simple example of running the DBN model from pyClickModels:
+
+```python
+from pyClickModels.DBN import DBN
+
+model = DBN()
+model.fit(input_folder="/tmp/clicks_data/", iters=10)
+model.export_judgments("/tmp/output.gz")
+```
+
+The output file contains newline-delimited JSON with the judgments for each query and each document observed for that query, for example:
+
+```json
+{"search_term:blue shoes|region:south|brand:super brand": {"doc0": 0.2, "doc1": 0.3, "doc2": 0.4}}
+{"search_term:query|region:north|brand:other_brand": {"doc0": 0.0, "doc1": 0.0, "doc2": 0.1}}
+```
+
+Judgments here range from 0 to 1. Some libraries require integer grades from 0 to 4; in that case, choose a transformation that suits your data.
+
+## Warnings
+
+**This library is still alpha!** Use it with caution. It has been fully unit tested, but parts of it use pure C, whose exceptions may not have been fully handled yet. Before using this library in production environments, test it thoroughly with datasets of different kinds and sizes to evaluate how it performs.
+
+## Contributing
+
+Contributions are very welcome! Also, if you find bugs, please report them :).
+
+
+
+
+%package help
+Summary: Development documents and examples for pyClickModels
+Provides: python3-pyClickModels-doc
+%description help
+# pyClickModels [![Build Status](https://travis-ci.org/WillianFuks/pyClickModels.svg?branch=master)](https://travis-ci.org/WillianFuks/pyClickModels) [![Coverage Status](https://coveralls.io/repos/github/WillianFuks/pyClickModels/badge.svg?branch=master)](https://coveralls.io/github/WillianFuks/pyClickModels?branch=master) [![PyPI version](https://badge.fury.io/py/pyClickModels.svg)](https://badge.fury.io/py/pyClickModels) [![Pyversions](https://img.shields.io/pypi/pyversions/pyClickModels.svg)](https://pypi.python.org/pypi/pyClickModels) [![GitHub license](https://img.shields.io/github/license/WillianFuks/pyClickModels.svg)](https://github.com/WillianFuks/pyClickModels/blob/master/LICENSE)
+
+A Cython implementation of [ClickModels](https://github.com/varepsilon/clickmodels) that uses Probabilistic Graphical Models to infer user behavior when interacting with Search Page Results (Ranking).
+
+## How It Works
+
+ClickModels uses the concept of [Probabilistic Graphical Models](https://en.wikipedia.org/wiki/Graphical_model) to model components that describe the interactions between users and a list of items ranked by a set of retrieval rules.
+
+These models are useful for understanding whether a given document is a good match for a given search query, a measure known in the literature as *Judgment* grades. This is done by evaluating past observed clicks and the positions at which each document appeared on the results pages for each query.
+
+There are several [proposed approaches](https://clickmodels.weebly.com/uploads/5/2/2/5/52257029/mc2015-clickmodels.pdf) to handle this problem. This repository implements a Dynamic Bayesian Network, similar to [previous works](https://github.com/varepsilon/clickmodels) also done in Python:
+
+![dbn](notebooks/dbn.png)
+
+The main differences are:
+
+1. **Implemented on top of Cython**: publicly available solutions are written in pure Python and rely on PyPy, rather than CPython, for additional speedups. Unfortunately, this may still not be fast enough. To address that, this implementation relies entirely on compiled C/C++ code. There is no official benchmark yet, but the expected speedup is roughly **15x** to **18x** over CPython (the same data yields a ~3x speedup with PyPy).
+2. **Memory Friendly**: input data is expected in a JSON format in which each row already contains all clickstream sessions for its search context. This saves memory and allows the library to process larger amounts of data.
+3. **Purchase variable**: since businesses such as e-commerce sites can greatly benefit from better understanding their search engine, this repository adds a Purchase variable to further describe customer behavior.
+
+The file [notebooks/DBN.ipynb](notebooks/DBN.ipynb) has a complete description of how the model has been implemented along with all the mathematics involved.
+
+
+
+## Installation
+
+As this project relies on binaries compiled by Cython, only the Linux (manylinux) platform is currently supported. It can be installed with:
+
+    pip install pyClickModels
+
+## Getting Started
+
+### Input Data
+
+pyClickModels expects input data to be stored in a set of compressed `gz` files located in the same folder. All file names should start with the string "judgments", for instance `judgments0.gz`.
+Each file should contain newline-delimited JSON, one object per line. The following example shows a single line, pretty-printed for readability:
+
+```json
+{
+  "search_keys": {
+    "search_term": "blue shoes",
+    "region": "south",
+    "favorite_brand": "super brand",
+    "user_size": "L",
+    "avg_ticket": 10
+  },
+  "judgment_keys": [
+    {
+      "session": [
+        {"click": 0, "purchase": 0, "doc": "doc0"},
+        {"click": 1, "purchase": 0, "doc": "doc1"},
+        {"click": 1, "purchase": 1, "doc": "doc2"}
+      ]
+    },
+    {
+      "session": [
+        {"click": 1, "purchase": 0, "doc": "doc0"},
+        {"click": 0, "purchase": 0, "doc": "doc1"},
+        {"click": 0, "purchase": 0, "doc": "doc2"}
+      ]
+    }
+  ]
+}
+```
+
+The key `search_keys` sets the context for the search. In the above example, a given customer (or cluster of customers with the same context) searched for `blue shoes`. Their region is `south` (it could be any chosen value), favorite brand is `super brand` and so on.
+
+These keys set the context in which the search happened. When pyClickModels runs its optimization, it considers the whole context at once, so the resulting Judgments also apply to that whole context.
+
+If no context is desired, just use `{"search_keys": {"search_term": "user search"}}`.
+
+There is no required schema: the library loops through all keys available in `search_keys` and treats the whole context as a single query during optimization.
+
+`judgment_keys` is a list of sessions, and each entry must contain the key `session`. Each session holds a user's clickstream; if the purchase variable is not needed, set it to 0.
+
+Here is a simple example of running the DBN model from pyClickModels:
+
+```python
+from pyClickModels.DBN import DBN
+
+model = DBN()
+model.fit(input_folder="/tmp/clicks_data/", iters=10)
+model.export_judgments("/tmp/output.gz")
+```
+
+The output file contains newline-delimited JSON with the judgments for each query and each document observed for that query, for example:
+
+```json
+{"search_term:blue shoes|region:south|brand:super brand": {"doc0": 0.2, "doc1": 0.3, "doc2": 0.4}}
+{"search_term:query|region:north|brand:other_brand": {"doc0": 0.0, "doc1": 0.0, "doc2": 0.1}}
+```
+
+Judgments here range from 0 to 1. Some libraries require integer grades from 0 to 4; in that case, choose a transformation that suits your data.
+
+## Warnings
+
+**This library is still alpha!** Use it with caution. It has been fully unit tested, but parts of it use pure C, whose exceptions may not have been fully handled yet. Before using this library in production environments, test it thoroughly with datasets of different kinds and sizes to evaluate how it performs.
+
+## Contributing
+
+Contributions are very welcome! Also, if you find bugs, please report them :).
+
+
+
+
+%prep
+%autosetup -n pyClickModels-0.0.2
+
+%build
+%py3_build
+
+%install
+%py3_install
+install -d -m755 %{buildroot}/%{_pkgdocdir}
+if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
+if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
+if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
+if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
+pushd %{buildroot}
+if [ -d usr/lib ]; then
+ find usr/lib -type f -printf "\"/%h/%f\"\n" >> filelist.lst
+fi
+if [ -d usr/lib64 ]; then
+ find usr/lib64 -type f -printf "\"/%h/%f\"\n" >> filelist.lst
+fi
+if [ -d usr/bin ]; then
+ find usr/bin -type f -printf "\"/%h/%f\"\n" >> filelist.lst
+fi
+if [ -d usr/sbin ]; then
+ find usr/sbin -type f -printf "\"/%h/%f\"\n" >> filelist.lst
+fi
+touch doclist.lst
+if [ -d usr/share/man ]; then
+ find usr/share/man -type f -printf "\"/%h/%f.gz\"\n" >> doclist.lst
+fi
+popd
+mv %{buildroot}/filelist.lst .
+mv %{buildroot}/doclist.lst .
+
+%files -n python3-pyClickModels -f filelist.lst
+%dir %{python3_sitelib}/*
+
+%files help -f doclist.lst
+%{_docdir}/*
+
+%changelog
+* Tue Jun 20 2023 Python_Bot <Python_Bot@openeuler.org> - 0.0.2-1
+- Package Spec generated
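The `%description` in the spec above defines the pyClickModels input layout only in prose. The following is a minimal Python sketch of producing one compliant input file. It assumes only what that description states:

- files are gzip-compressed and their names start with "judgments";
- each file holds one JSON object per line;
- each object has `search_keys` and a `judgment_keys` list of `session` entries.

The helpers `make_row` and `write_judgments_file` are hypothetical and are not part of pyClickModels.

```python
import gzip
import json
import os
import tempfile


def make_row(search_keys, sessions):
    """Build one input row (hypothetical helper, not a pyClickModels API).

    `sessions` is a list of sessions; each session is a list of
    (doc, click, purchase) tuples in the order the documents were shown.
    """
    return {
        "search_keys": search_keys,
        "judgment_keys": [
            {
                "session": [
                    {"click": click, "purchase": purchase, "doc": doc}
                    for doc, click, purchase in session
                ]
            }
            for session in sessions
        ],
    }


def write_judgments_file(folder, index, rows):
    """Write rows to <folder>/judgments<index>.gz, one JSON object per line."""
    path = os.path.join(folder, f"judgments{index}.gz")
    with gzip.open(path, "wt", encoding="utf-8") as fh:
        for row in rows:
            # json.dumps without indent keeps each row on a single line.
            fh.write(json.dumps(row) + "\n")
    return path


if __name__ == "__main__":
    row = make_row(
        {"search_term": "blue shoes", "region": "south"},
        [
            [("doc0", 0, 0), ("doc1", 1, 0), ("doc2", 1, 1)],
            [("doc0", 1, 0), ("doc1", 0, 0), ("doc2", 0, 0)],
        ],
    )
    folder = tempfile.mkdtemp()
    path = write_judgments_file(folder, 0, [row])
    # Read the file back to check that each line parses as one JSON object.
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        lines = fh.read().splitlines()
    print(len(lines), json.loads(lines[0])["judgment_keys"][0]["session"][2])
```

The folder created this way could then be passed as `input_folder` to `DBN().fit(...)`, as in the description's example. Writing compact JSON matters here: the pretty-printed example in the description spans many lines, but each row in an actual input file must occupy exactly one.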
diff --git a/sources b/sources
new file mode 100644
index 0000000..639a7b5
--- /dev/null
+++ b/sources
@@ -0,0 +1 @@
+7ec05f733a1d10ca6310cee442c1e4b2 pyClickModels-0.0.2.tar.gz
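The package description notes that judgments range from 0 to 1, while some libraries expect integer grades from 0 to 4. The sketch below shows one possible transformation, linear scaling with rounding, applied to an exported judgments file. It is an assumed choice, not the library's. The helpers `to_grade` and `load_grades` are hypothetical. The description also does not say whether `export_judgments` compresses its output, so the reader checks for the gzip magic number and otherwise falls back to plain text.

```python
import gzip
import json


def to_grade(judgment, max_grade=4):
    """Map a judgment in [0, 1] to an integer grade in [0, max_grade].

    Hypothetical helper: linear scaling followed by Python's round(),
    which rounds exact halves to the nearest even integer.
    """
    if not 0.0 <= judgment <= 1.0:
        raise ValueError(f"judgment out of range: {judgment}")
    return round(judgment * max_grade)


def load_grades(path, max_grade=4):
    """Read newline-delimited JSON judgments into {query: {doc: grade}}.

    Hypothetical helper; accepts gzip-compressed or plain-text files.
    """
    with open(path, "rb") as fh:
        is_gzip = fh.read(2) == b"\x1f\x8b"  # gzip magic number
    opener = gzip.open if is_gzip else open
    grades = {}
    with opener(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            for query, docs in json.loads(line).items():
                grades[query] = {
                    doc: to_grade(value, max_grade) for doc, value in docs.items()
                }
    return grades
```

Consider the first output line in the description. Its judgments for `doc0` (0.2), `doc1` (0.3) and `doc2` (0.4) would map to grades 1, 1 and 2. Because `round()` rounds halves to even, 0.625 maps to 2 rather than 3. Custom thresholds may suit skewed data better than a linear scale.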