author | CoprDistGit <infra@openeuler.org> | 2023-05-05 10:49:27 +0000
---|---|---
committer | CoprDistGit <infra@openeuler.org> | 2023-05-05 10:49:27 +0000
commit | 190a691c88e39f560bc99671b1ef7be181183d18 (patch) |
tree | 9f91f52507d4cbdf29f96976c2e7c33e98b3695b |
parent | 85a12b4a85d8a454379fb37fdfaa475350058875 (diff) |
automatic import of python-detext (openeuler20.03)
-rw-r--r-- | .gitignore | 1
-rw-r--r-- | python-detext.spec | 537
-rw-r--r-- | sources | 1

3 files changed, 539 insertions, 0 deletions
diff --git a/.gitignore b/.gitignore
new file mode 100644
@@ -0,0 +1 @@
+/detext-3.2.0.tar.gz
diff --git a/python-detext.spec b/python-detext.spec
new file mode 100644
index 0000000..5eaec48
--- /dev/null
+++ b/python-detext.spec
@@ -0,0 +1,537 @@
+%global _empty_manifest_terminate_build 0
+Name: python-detext
+Version: 3.2.0
+Release: 1
+Summary: please add a summary manually as the author left a blank one
+License: BSD-2-Clause
+URL: https://pypi.org/project/detext/
+Source0: https://mirrors.nju.edu.cn/pypi/web/packages/be/b5/7830e3f839fe0de22356c5e59982dc03706588ccd63e32162a903252ff1b/detext-3.2.0.tar.gz
+BuildArch: noarch
+
+
+%description
+**DeText** is a <b>_De_</b>ep **_Text_** understanding framework for NLP-related ranking, classification,
+and language generation tasks. It leverages semantic matching using deep neural networks to
+understand member intents in search and recommender systems.
+As a general NLP framework, DeText can be applied to many tasks, including search & recommendation ranking,
+multi-class classification, and query understanding tasks.
+More details can be found in the [LinkedIn Engineering blog post](https://engineering.linkedin.com/blog/2020/open-sourcing-detext).
+## Highlight
+* Natural language understanding powered by state-of-the-art deep neural networks
+  * automatic feature extraction with deep models
+  * end-to-end training
+  * interaction modeling between ranking sources and targets
+* A general framework with great flexibility
+  * customizable model architectures
+  * multiple text encoder support
+  * multiple data input types support
+  * various optimization choices
+  * standard training flow control
+* Easy to use
+  * configuration-based modeling (e.g., all configurations through the command line)
+## General Model Architecture
+DeText supports a general model architecture that contains the following components:
+* **Word embedding layer**. It converts the sequence of words into a d-by-n matrix.
+* **CNN/BERT/LSTM text encoding layer**. It takes the word embedding matrix as input and maps the text data into a fixed-length embedding.
+* **Interaction layer**. It generates deep features based on the text embeddings. Options include concatenation, cosine similarity, etc.
+* **Wide & Deep feature processing**. We combine the traditional features with the interaction features (deep features) in a wide & deep fashion.
+* **MLP layer**. The MLP layer combines the wide features and the deep features.
+All parameters are jointly updated to optimize the training objective.
+
+### Model Configurables
+DeText offers great flexibility for clients to build customized networks for their own use cases:
+* **LTR/classification layer**: in-house LTR loss implementation, tf-ranking LTR losses, and multi-class classification support.
+* **MLP layer**: customizable number of layers and number of dimensions.
+* **Interaction layer**: supports Cosine Similarity, Hadamard Product, and Concatenation.
+* **Text embedding layer**: supports CNN, BERT, and LSTM with customizable filters, layers, dimensions, etc.
+* **Continuous feature normalization**: element-wise rescaling, value normalization.
+* **Categorical feature processing**: modeled as entity embedding.
+All of these can be customized via hyper-parameters in the DeText template. Note that tf-ranking is
+supported in the DeText framework, i.e., users can choose the LTR losses and metrics defined in tf-ranking.
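+To make the layer flow above concrete, here is a minimal NumPy sketch of the
+wide & deep scoring path for one (query, document) pair. All names are
+illustrative -- this is not DeText's actual API -- and a mean-pooling stand-in
+replaces the CNN/BERT/LSTM encoder.
+```python
+import numpy as np
+
+def encode(word_emb: np.ndarray) -> np.ndarray:
+    """Stand-in text encoder: mean-pool the d-by-n word embedding matrix
+    into a fixed-length text embedding (a real model uses CNN/BERT/LSTM)."""
+    return word_emb.mean(axis=1)
+
+def interact(query_vec: np.ndarray, doc_vec: np.ndarray) -> np.ndarray:
+    """Interaction layer: cosine similarity between the two text embeddings."""
+    cos = query_vec @ doc_vec / (np.linalg.norm(query_vec) * np.linalg.norm(doc_vec) + 1e-8)
+    return np.array([cos])
+
+rng = np.random.default_rng(0)
+query_emb = rng.normal(size=(4, 3))   # word embedding layer output (d=4, n=3)
+doc_emb = rng.normal(size=(4, 3))
+
+deep_features = interact(encode(query_emb), encode(doc_emb))
+wide_features = np.array([0.7, 1.2])  # traditional handcrafted features
+
+# Wide & deep processing + MLP: concatenate both groups, score with one dense layer.
+x = np.concatenate([wide_features, deep_features])
+w = rng.normal(size=x.shape[0])
+score = float(np.tanh(x @ w))         # final ranking/classification score
+print(f"score = {score:.4f}")
+```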
+## User Guide
+### Dev environment setup
+1. Create your virtualenv (Python version >= 3.7)
+   ```shell script
+   VENV_DIR=<your venv dir>
+   python3 -m venv $VENV_DIR       # Make sure your python version >= 3.7
+   source $VENV_DIR/bin/activate   # Enter the virtual environment
+   ```
+1. Upgrade pip and setuptools
+   ```shell script
+   pip3 install -U pip
+   pip3 install -U setuptools
+   ```
+1. Run setup for DeText:
+   ```shell script
+   pip install -e .
+   ```
+1. Verify the environment setup through pytest. If all tests pass, the environment is correctly set up.
+   ```shell script
+   pytest
+   ```
+1. Refer to the training manual ([TRAINING.md](user_guide/TRAINING.md)) for information about customizing the model:
+   * training data format and preparation
+   * key parameters to customize and train DeText models
+   * detailed information about all DeText training parameters for full customization
+1. Train a model using DeText (e.g., [run_detext.sh](test/resources/run_detext.sh))
+### Tutorial
+If you would like to try out the library, refer to the following tutorial notebooks:
+* [text_classification_demo.ipynb](user_guide/notebooks/text_classification_demo.ipynb)
+  This notebook shows how to use DeText to train a multi-class text classification model on a public query intent
+  classification dataset. Detailed instructions on data preparation, model training, and model inference are included.
+* [autocompletion.ipynb](user_guide/notebooks/autocompletion.ipynb)
+  This notebook shows how to use DeText to train a text ranking model on a public query auto-completion dataset.
+  Detailed steps for data preparation, model training, and model inference are included.
+## **Citation**
+Please cite DeText in your publications if it helps your research:
+```
+@manual{guo-liu20,
+  author    = {Weiwei Guo and Xiaowei Liu and Sida Wang and Huiji Gao and Bo Long},
+  title     = {DeText: A Deep NLP Framework for Intelligent Text Understanding},
+  url       = {https://engineering.linkedin.com/blog/2020/open-sourcing-detext},
+  year      = {2020}
+}
+@inproceedings{guo-gao19,
+  author    = {Weiwei Guo and Huiji Gao and Jun Shi and Bo Long},
+  title     = {Deep Natural Language Processing for Search Systems},
+  booktitle = {ACM SIGIR 2019},
+  year      = {2019}
+}
+@inproceedings{guo-gao19-kdd,
+  author    = {Weiwei Guo and Huiji Gao and Jun Shi and Bo Long and Liang Zhang and Bee-Chung Chen and Deepak Agarwal},
+  title     = {Deep Natural Language Processing for Search and Recommender Systems},
+  booktitle = {ACM SIGKDD 2019},
+  year      = {2019}
+}
+@inproceedings{guo-liu20-cikm,
+  author    = {Weiwei Guo and Xiaowei Liu and Sida Wang and Huiji Gao and Ananth Sankar and Zimeng Yang and Qi Guo and Liang Zhang and Bo Long and Bee-Chung Chen and Deepak Agarwal},
+  title     = {DeText: A Deep Text Ranking Framework with BERT},
+  booktitle = {ACM CIKM 2020},
+  year      = {2020}
+}
+@inproceedings{jia-long20,
+  author    = {Jun Jia and Bo Long and Huiji Gao and Weiwei Guo and Jun Shi and Xiaowei Liu and Mingzhou Zhou and Zhoutong Fu and Sida Wang and Sandeep Kumar Jha},
+  title     = {Deep Learning for Search and Recommender Systems in Practice},
+  booktitle = {ACM SIGKDD 2020},
+  year      = {2020}
+}
+@inproceedings{wang-guo20,
+  author    = {Sida Wang and Weiwei Guo and Huiji Gao and Bo Long},
+  title     = {Efficient Neural Query Auto Completion},
+  booktitle = {ACM CIKM 2020},
+  year      = {2020}
+}
+@inproceedings{liu-guo20,
+  author    = {Xiaowei Liu and Weiwei Guo and Huiji Gao and Bo Long},
+  title     = {Deep Search Query Intent Understanding},
+  booktitle = {arXiv:2008.06759},
+  year      = {2020}
+}
+```
+
+%package -n python3-detext
+Summary: please add a summary manually as the author left a blank one
+Provides: python-detext
+BuildRequires: python3-devel
+BuildRequires: python3-setuptools
+BuildRequires: python3-pip
+%description -n python3-detext
+**DeText** is a <b>_De_</b>ep **_Text_** understanding framework for NLP-related ranking, classification,
+and language generation tasks. It leverages semantic matching using deep neural networks to
+understand member intents in search and recommender systems.
+As a general NLP framework, DeText can be applied to many tasks, including search & recommendation ranking,
+multi-class classification, and query understanding tasks.
+More details can be found in the [LinkedIn Engineering blog post](https://engineering.linkedin.com/blog/2020/open-sourcing-detext).
+## Highlight
+* Natural language understanding powered by state-of-the-art deep neural networks
+  * automatic feature extraction with deep models
+  * end-to-end training
+  * interaction modeling between ranking sources and targets
+* A general framework with great flexibility
+  * customizable model architectures
+  * multiple text encoder support
+  * multiple data input types support
+  * various optimization choices
+  * standard training flow control
+* Easy to use
+  * configuration-based modeling (e.g., all configurations through the command line)
+## General Model Architecture
+DeText supports a general model architecture that contains the following components:
+* **Word embedding layer**. It converts the sequence of words into a d-by-n matrix.
+* **CNN/BERT/LSTM text encoding layer**. It takes the word embedding matrix as input and maps the text data into a fixed-length embedding.
+* **Interaction layer**. It generates deep features based on the text embeddings. Options include concatenation, cosine similarity, etc.
+* **Wide & Deep feature processing**. We combine the traditional features with the interaction features (deep features) in a wide & deep fashion.
+* **MLP layer**. The MLP layer combines the wide features and the deep features.
+All parameters are jointly updated to optimize the training objective.
+
+### Model Configurables
+DeText offers great flexibility for clients to build customized networks for their own use cases:
+* **LTR/classification layer**: in-house LTR loss implementation, tf-ranking LTR losses, and multi-class classification support.
+* **MLP layer**: customizable number of layers and number of dimensions.
+* **Interaction layer**: supports Cosine Similarity, Hadamard Product, and Concatenation.
+* **Text embedding layer**: supports CNN, BERT, and LSTM with customizable filters, layers, dimensions, etc.
+* **Continuous feature normalization**: element-wise rescaling, value normalization.
+* **Categorical feature processing**: modeled as entity embedding.
+All of these can be customized via hyper-parameters in the DeText template. Note that tf-ranking is
+supported in the DeText framework, i.e., users can choose the LTR losses and metrics defined in tf-ranking.
+## User Guide
+### Dev environment setup
+1. Create your virtualenv (Python version >= 3.7)
+   ```shell script
+   VENV_DIR=<your venv dir>
+   python3 -m venv $VENV_DIR       # Make sure your python version >= 3.7
+   source $VENV_DIR/bin/activate   # Enter the virtual environment
+   ```
+1. Upgrade pip and setuptools
+   ```shell script
+   pip3 install -U pip
+   pip3 install -U setuptools
+   ```
+1. Run setup for DeText:
+   ```shell script
+   pip install -e .
+   ```
+1. Verify the environment setup through pytest. If all tests pass, the environment is correctly set up.
+   ```shell script
+   pytest
+   ```
+1. Refer to the training manual ([TRAINING.md](user_guide/TRAINING.md)) for information about customizing the model:
+   * training data format and preparation
+   * key parameters to customize and train DeText models
+   * detailed information about all DeText training parameters for full customization
+1. Train a model using DeText (e.g., [run_detext.sh](test/resources/run_detext.sh))
+### Tutorial
+If you would like to try out the library, refer to the following tutorial notebooks:
+* [text_classification_demo.ipynb](user_guide/notebooks/text_classification_demo.ipynb)
+  This notebook shows how to use DeText to train a multi-class text classification model on a public query intent
+  classification dataset. Detailed instructions on data preparation, model training, and model inference are included.
+* [autocompletion.ipynb](user_guide/notebooks/autocompletion.ipynb)
+  This notebook shows how to use DeText to train a text ranking model on a public query auto-completion dataset.
+  Detailed steps for data preparation, model training, and model inference are included.
+## **Citation**
+Please cite DeText in your publications if it helps your research:
+```
+@manual{guo-liu20,
+  author    = {Weiwei Guo and Xiaowei Liu and Sida Wang and Huiji Gao and Bo Long},
+  title     = {DeText: A Deep NLP Framework for Intelligent Text Understanding},
+  url       = {https://engineering.linkedin.com/blog/2020/open-sourcing-detext},
+  year      = {2020}
+}
+@inproceedings{guo-gao19,
+  author    = {Weiwei Guo and Huiji Gao and Jun Shi and Bo Long},
+  title     = {Deep Natural Language Processing for Search Systems},
+  booktitle = {ACM SIGIR 2019},
+  year      = {2019}
+}
+@inproceedings{guo-gao19-kdd,
+  author    = {Weiwei Guo and Huiji Gao and Jun Shi and Bo Long and Liang Zhang and Bee-Chung Chen and Deepak Agarwal},
+  title     = {Deep Natural Language Processing for Search and Recommender Systems},
+  booktitle = {ACM SIGKDD 2019},
+  year      = {2019}
+}
+@inproceedings{guo-liu20-cikm,
+  author    = {Weiwei Guo and Xiaowei Liu and Sida Wang and Huiji Gao and Ananth Sankar and Zimeng Yang and Qi Guo and Liang Zhang and Bo Long and Bee-Chung Chen and Deepak Agarwal},
+  title     = {DeText: A Deep Text Ranking Framework with BERT},
+  booktitle = {ACM CIKM 2020},
+  year      = {2020}
+}
+@inproceedings{jia-long20,
+  author    = {Jun Jia and Bo Long and Huiji Gao and Weiwei Guo and Jun Shi and Xiaowei Liu and Mingzhou Zhou and Zhoutong Fu and Sida Wang and Sandeep Kumar Jha},
+  title     = {Deep Learning for Search and Recommender Systems in Practice},
+  booktitle = {ACM SIGKDD 2020},
+  year      = {2020}
+}
+@inproceedings{wang-guo20,
+  author    = {Sida Wang and Weiwei Guo and Huiji Gao and Bo Long},
+  title     = {Efficient Neural Query Auto Completion},
+  booktitle = {ACM CIKM 2020},
+  year      = {2020}
+}
+@inproceedings{liu-guo20,
+  author    = {Xiaowei Liu and Weiwei Guo and Huiji Gao and Bo Long},
+  title     = {Deep Search Query Intent Understanding},
+  booktitle = {arXiv:2008.06759},
+  year      = {2020}
+}
+```
+
+%package help
+Summary: Development documents and examples for detext
+Provides: python3-detext-doc
+%description help
+**DeText** is a <b>_De_</b>ep **_Text_** understanding framework for NLP-related ranking, classification,
+and language generation tasks. It leverages semantic matching using deep neural networks to
+understand member intents in search and recommender systems.
+As a general NLP framework, DeText can be applied to many tasks, including search & recommendation ranking,
+multi-class classification, and query understanding tasks.
+More details can be found in the [LinkedIn Engineering blog post](https://engineering.linkedin.com/blog/2020/open-sourcing-detext).
+## Highlight
+* Natural language understanding powered by state-of-the-art deep neural networks
+  * automatic feature extraction with deep models
+  * end-to-end training
+  * interaction modeling between ranking sources and targets
+* A general framework with great flexibility
+  * customizable model architectures
+  * multiple text encoder support
+  * multiple data input types support
+  * various optimization choices
+  * standard training flow control
+* Easy to use
+  * configuration-based modeling (e.g., all configurations through the command line)
+## General Model Architecture
+DeText supports a general model architecture that contains the following components:
+* **Word embedding layer**. It converts the sequence of words into a d-by-n matrix.
+* **CNN/BERT/LSTM text encoding layer**. It takes the word embedding matrix as input and maps the text data into a fixed-length embedding.
+* **Interaction layer**. It generates deep features based on the text embeddings. Options include concatenation, cosine similarity, etc.
+* **Wide & Deep feature processing**. We combine the traditional features with the interaction features (deep features) in a wide & deep fashion.
+* **MLP layer**. The MLP layer combines the wide features and the deep features.
+All parameters are jointly updated to optimize the training objective.
+
+### Model Configurables
+DeText offers great flexibility for clients to build customized networks for their own use cases:
+* **LTR/classification layer**: in-house LTR loss implementation, tf-ranking LTR losses, and multi-class classification support.
+* **MLP layer**: customizable number of layers and number of dimensions.
+* **Interaction layer**: supports Cosine Similarity, Hadamard Product, and Concatenation.
+* **Text embedding layer**: supports CNN, BERT, and LSTM with customizable filters, layers, dimensions, etc.
+* **Continuous feature normalization**: element-wise rescaling, value normalization.
+* **Categorical feature processing**: modeled as entity embedding.
+All of these can be customized via hyper-parameters in the DeText template. Note that tf-ranking is
+supported in the DeText framework, i.e., users can choose the LTR losses and metrics defined in tf-ranking.
+## User Guide
+### Dev environment setup
+1. Create your virtualenv (Python version >= 3.7)
+   ```shell script
+   VENV_DIR=<your venv dir>
+   python3 -m venv $VENV_DIR       # Make sure your python version >= 3.7
+   source $VENV_DIR/bin/activate   # Enter the virtual environment
+   ```
+1. Upgrade pip and setuptools
+   ```shell script
+   pip3 install -U pip
+   pip3 install -U setuptools
+   ```
+1. Run setup for DeText:
+   ```shell script
+   pip install -e .
+   ```
+1. Verify the environment setup through pytest. If all tests pass, the environment is correctly set up.
+   ```shell script
+   pytest
+   ```
+1. Refer to the training manual ([TRAINING.md](user_guide/TRAINING.md)) for information about customizing the model:
+   * training data format and preparation
+   * key parameters to customize and train DeText models
+   * detailed information about all DeText training parameters for full customization
+1. Train a model using DeText (e.g., [run_detext.sh](test/resources/run_detext.sh))
+### Tutorial
+If you would like to try out the library, refer to the following tutorial notebooks:
+* [text_classification_demo.ipynb](user_guide/notebooks/text_classification_demo.ipynb)
+  This notebook shows how to use DeText to train a multi-class text classification model on a public query intent
+  classification dataset. Detailed instructions on data preparation, model training, and model inference are included.
+* [autocompletion.ipynb](user_guide/notebooks/autocompletion.ipynb)
+  This notebook shows how to use DeText to train a text ranking model on a public query auto-completion dataset.
+  Detailed steps for data preparation, model training, and model inference are included.
+## **Citation**
+Please cite DeText in your publications if it helps your research:
+```
+@manual{guo-liu20,
+  author    = {Weiwei Guo and Xiaowei Liu and Sida Wang and Huiji Gao and Bo Long},
+  title     = {DeText: A Deep NLP Framework for Intelligent Text Understanding},
+  url       = {https://engineering.linkedin.com/blog/2020/open-sourcing-detext},
+  year      = {2020}
+}
+@inproceedings{guo-gao19,
+  author    = {Weiwei Guo and Huiji Gao and Jun Shi and Bo Long},
+  title     = {Deep Natural Language Processing for Search Systems},
+  booktitle = {ACM SIGIR 2019},
+  year      = {2019}
+}
+@inproceedings{guo-gao19-kdd,
+  author    = {Weiwei Guo and Huiji Gao and Jun Shi and Bo Long and Liang Zhang and Bee-Chung Chen and Deepak Agarwal},
+  title     = {Deep Natural Language Processing for Search and Recommender Systems},
+  booktitle = {ACM SIGKDD 2019},
+  year      = {2019}
+}
+@inproceedings{guo-liu20-cikm,
+  author    = {Weiwei Guo and Xiaowei Liu and Sida Wang and Huiji Gao and Ananth Sankar and Zimeng Yang and Qi Guo and Liang Zhang and Bo Long and Bee-Chung Chen and Deepak Agarwal},
+  title     = {DeText: A Deep Text Ranking Framework with BERT},
+  booktitle = {ACM CIKM 2020},
+  year      = {2020}
+}
+@inproceedings{jia-long20,
+  author    = {Jun Jia and Bo Long and Huiji Gao and Weiwei Guo and Jun Shi and Xiaowei Liu and Mingzhou Zhou and Zhoutong Fu and Sida Wang and Sandeep Kumar Jha},
+  title     = {Deep Learning for Search and Recommender Systems in Practice},
+  booktitle = {ACM SIGKDD 2020},
+  year      = {2020}
+}
+@inproceedings{wang-guo20,
+  author    = {Sida Wang and Weiwei Guo and Huiji Gao and Bo Long},
+  title     = {Efficient Neural Query Auto Completion},
+  booktitle = {ACM CIKM 2020},
+  year      = {2020}
+}
+@inproceedings{liu-guo20,
+  author    = {Xiaowei Liu and Weiwei Guo and Huiji Gao and Bo Long},
+  title     = {Deep Search Query Intent Understanding},
+  booktitle = {arXiv:2008.06759},
+  year      = {2020}
+}
+```
+
+%prep
+%autosetup -n detext-3.2.0
+
+%build
+%py3_build
+
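+# For reference: with the standard python-rpm-macros these expand roughly to
+#   %%py3_build   -> python3 setup.py build
+#   %%py3_install -> python3 setup.py install --skip-build --root $RPM_BUILD_ROOT
+# (exact expansions vary by distro release).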
+%install
+%py3_install
+install -d -m755 %{buildroot}/%{_pkgdocdir}
+if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
+if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
+if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
+if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
+pushd %{buildroot}
+if [ -d usr/lib ]; then
+    find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/lib64 ]; then
+    find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/bin ]; then
+    find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/sbin ]; then
+    find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+touch doclist.lst
+if [ -d usr/share/man ]; then
+    find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
+fi
+popd
+mv %{buildroot}/filelist.lst .
+mv %{buildroot}/doclist.lst .
+
+%files -n python3-detext -f filelist.lst
+%dir %{python3_sitelib}/*
+
+%files help -f doclist.lst
+%{_docdir}/*
+
+%changelog
+* Fri May 05 2023 Python_Bot <Python_Bot@openeuler.org> - 3.2.0-1
+- Package Spec generated
diff --git a/sources b/sources
new file mode 100644
@@ -0,0 +1 @@
+0ca62a18e6d8ea366ed9e3dd8e02869e detext-3.2.0.tar.gz