| | | |
|---|---|---|
| author | CoprDistGit <infra@openeuler.org> | 2023-05-05 10:49:27 +0000 |
| committer | CoprDistGit <infra@openeuler.org> | 2023-05-05 10:49:27 +0000 |
| commit | 190a691c88e39f560bc99671b1ef7be181183d18 (patch) | |
| tree | 9f91f52507d4cbdf29f96976c2e7c33e98b3695b | |
| parent | 85a12b4a85d8a454379fb37fdfaa475350058875 (diff) | |
automatic import of python-detext (openeuler20.03)
| -rw-r--r-- | .gitignore | 1 |
|---|---|---|
| -rw-r--r-- | python-detext.spec | 537 |
| -rw-r--r-- | sources | 1 |

3 files changed, 539 insertions, 0 deletions
@@ -0,0 +1 @@
+/detext-3.2.0.tar.gz
diff --git a/python-detext.spec b/python-detext.spec
new file mode 100644
index 0000000..5eaec48
--- /dev/null
+++ b/python-detext.spec
@@ -0,0 +1,537 @@
+%global _empty_manifest_terminate_build 0
+Name:		python-detext
+Version:	3.2.0
+Release:	1
+Summary:	Deep text understanding framework for NLP ranking, classification, and language generation
+License:	BSD-2-Clause
+URL:		https://pypi.org/project/detext/
+Source0:	https://mirrors.nju.edu.cn/pypi/web/packages/be/b5/7830e3f839fe0de22356c5e59982dc03706588ccd63e32162a903252ff1b/detext-3.2.0.tar.gz
+BuildArch:	noarch
+
+
+%description
+**DeText** is a <b>_De_</b>ep **_Text_** understanding framework for NLP-related ranking, classification,
+and language generation tasks. It leverages semantic matching using deep neural networks to
+understand member intents in search and recommender systems.
+As a general NLP framework, DeText can be applied to many tasks, including search & recommendation ranking,
+multi-class classification and query understanding tasks.
+More details can be found in the [LinkedIn Engineering blog post](https://engineering.linkedin.com/blog/2020/open-sourcing-detext).
+## Highlight
+* Natural language understanding powered by state-of-the-art deep neural networks
+  * automatic feature extraction with deep models
+  * end-to-end training
+  * interaction modeling between ranking sources and targets
+* A general framework with great flexibility
+  * customizable model architectures
+  * multiple text encoder support
+  * multiple data input types support
+  * various optimization choices
+  * standard training flow control
+* Easy to use
+  * configuration-based modeling (e.g., all configurations through the command line)
+## General Model Architecture
+DeText supports a general model architecture that contains the following components:
+* **Word embedding layer**. It converts the sequence of words into a d by n matrix.
+* **CNN/BERT/LSTM text encoding layer**. It takes the word embedding matrix as input and maps the text data into a fixed-length embedding.
+* **Interaction layer**. It generates deep features based on the text embeddings. Options include concatenation, cosine similarity, etc.
+* **Wide & Deep feature processing**. We combine the traditional features with the interaction features (deep features) in a wide & deep fashion.
+* **MLP layer**. The MLP layer combines the wide features and the deep features.
+All parameters are jointly updated to optimize the training objective.
+### Model Configurables
+DeText offers great flexibility for clients to build customized networks for their own use cases:
+* **LTR/classification layer**: in-house LTR loss implementation, or tf-ranking LTR loss; multi-class classification support.
+* **MLP layer**: customizable number of layers and number of dimensions.
+* **Interaction layer**: supports Cosine Similarity, Hadamard Product, and Concatenation.
+* **Text embedding layer**: supports CNN, BERT, LSTM with customized parameters on filters, layers, dimensions, etc.
+* **Continuous feature normalization**: element-wise rescaling, value normalization.
+* **Categorical feature processing**: modeled as entity embedding.
+All of these can be customized via hyper-parameters in the DeText template. Note that tf-ranking is
+supported in the DeText framework, i.e., users can choose the LTR loss and metrics defined in tf-ranking.
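To make the architecture and configurables described above concrete, here is a minimal sketch of the same layer stack in tf.keras — word embedding, a shared CNN text encoder, cosine-similarity and Hadamard interactions, and a wide & deep MLP head. All dimensions, layer choices, and names are illustrative assumptions, not DeText's actual implementation.

```python
# Minimal wide & deep text-matching sketch (illustrative; not DeText code).
import tensorflow as tf

VOCAB_SIZE, EMBED_DIM, SEQ_LEN, NUM_WIDE = 30000, 64, 16, 10

query_ids = tf.keras.Input(shape=(SEQ_LEN,), dtype=tf.int32, name="query")
doc_ids = tf.keras.Input(shape=(SEQ_LEN,), dtype=tf.int32, name="doc")
wide_ftrs = tf.keras.Input(shape=(NUM_WIDE,), name="wide_features")

# Word embedding layer: token ids -> (SEQ_LEN, EMBED_DIM) matrix.
embed = tf.keras.layers.Embedding(VOCAB_SIZE, EMBED_DIM)

# Text encoding layer (the CNN option): maps text to a fixed-length embedding.
encoder = tf.keras.Sequential([
    tf.keras.layers.Conv1D(128, 3, activation="relu"),
    tf.keras.layers.GlobalMaxPooling1D(),
])
query_emb = encoder(embed(query_ids))  # encoder weights are shared
doc_emb = encoder(embed(doc_ids))

# Interaction layer: two of the configurable options.
cosine = tf.keras.layers.Dot(axes=-1, normalize=True)([query_emb, doc_emb])
hadamard = tf.keras.layers.Multiply()([query_emb, doc_emb])

# Wide & deep: concatenate traditional (wide) features with deep features.
combined = tf.keras.layers.Concatenate()([wide_ftrs, cosine, hadamard])

# MLP layer producing a ranking score; all parameters train jointly.
hidden = tf.keras.layers.Dense(64, activation="relu")(combined)
score = tf.keras.layers.Dense(1)(hidden)

model = tf.keras.Model([query_ids, doc_ids, wide_ftrs], score)
```

Swapping the Conv1D encoder for a BERT or LSTM encoder, or the interactions for plain concatenation, mirrors the configurables listed above.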
+## User Guide
+### Dev environment set up
+1. Create your virtualenv (Python version >= 3.7)
+    ```shell script
+    VENV_DIR=<your venv dir>
+    python3 -m venv $VENV_DIR  # Make sure your python version >= 3.7
+    source $VENV_DIR/bin/activate  # Enter the virtual environment
+    ```
+1. Upgrade the pip and setuptools versions
+    ```shell script
+    pip3 install -U pip
+    pip3 install -U setuptools
+    ```
+1. Run setup for DeText:
+    ```shell script
+    pip install -e .
+    ```
+1. Verify the environment setup through pytest. If all tests pass, the environment is correctly set up
+    ```shell script
+    pytest
+    ```
+1. Refer to the training manual ([TRAINING.md](user_guide/TRAINING.md)) for information about customizing the model:
+    * Training data format and preparation
+    * Key parameters to customize and train DeText models
+    * Detailed information about all DeText training parameters for full customization
+1. Train a model using DeText (e.g., [run_detext.sh](test/resources/run_detext.sh))
+### Tutorial
+If you would like to try out the library, refer to the following tutorial notebooks:
+* [text_classification_demo.ipynb](user_guide/notebooks/text_classification_demo.ipynb)
+    This notebook shows how to use DeText to train a multi-class text classification model on a public query intent
+    classification dataset. Detailed instructions on data preparation, model training, and model inference are included.
+* [autocompletion.ipynb](user_guide/notebooks/autocompletion.ipynb)
+    This notebook shows how to use DeText to train a text ranking model on a public query auto-completion dataset.
+    Detailed steps on data preparation, model training, and model inference examples are included.
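As a taste of the data-preparation step those notebooks walk through, here is a small illustrative sketch (assuming TensorFlow 2.6+ for tf.keras.layers.TextVectorization) that turns raw query strings into the fixed-length token-id sequences a word embedding layer consumes. This is hypothetical preprocessing, not the notebooks' actual code.

```python
# Illustrative data-preparation sketch (not the notebooks' actual code).
import tensorflow as tf

vectorizer = tf.keras.layers.TextVectorization(
    max_tokens=30000, output_sequence_length=16)

queries = tf.constant(["how do i reset my password",
                       "deep text ranking with bert"])
vectorizer.adapt(queries)   # builds the vocabulary from the corpus
ids = vectorizer(queries)   # -> int32 tensor of shape (2, 16), zero-padded
```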
+## **Citation**
+Please cite DeText in your publications if it helps your research:
+```
+@manual{guo-liu20-blog,
+  author    = {Weiwei Guo and
+               Xiaowei Liu and
+               Sida Wang and
+               Huiji Gao and
+               Bo Long},
+  title     = {DeText: A Deep NLP Framework for Intelligent Text Understanding},
+  url       = {https://engineering.linkedin.com/blog/2020/open-sourcing-detext},
+  year      = {2020}
+}
+@inproceedings{guo-gao19-sigir,
+  author    = {Weiwei Guo and
+               Huiji Gao and
+               Jun Shi and
+               Bo Long},
+  title     = {Deep Natural Language Processing for Search Systems},
+  booktitle = {ACM SIGIR 2019},
+  year      = {2019}
+}
+@inproceedings{guo-gao19-kdd,
+  author    = {Weiwei Guo and
+               Huiji Gao and
+               Jun Shi and
+               Bo Long and
+               Liang Zhang and
+               Bee-Chung Chen and
+               Deepak Agarwal},
+  title     = {Deep Natural Language Processing for Search and Recommender Systems},
+  booktitle = {ACM SIGKDD 2019},
+  year      = {2019}
+}
+@inproceedings{guo-liu20-cikm,
+  author    = {Weiwei Guo and
+               Xiaowei Liu and
+               Sida Wang and
+               Huiji Gao and
+               Ananth Sankar and
+               Zimeng Yang and
+               Qi Guo and
+               Liang Zhang and
+               Bo Long and
+               Bee-Chung Chen and
+               Deepak Agarwal},
+  title     = {DeText: A Deep Text Ranking Framework with BERT},
+  booktitle = {ACM CIKM 2020},
+  year      = {2020}
+}
+@inproceedings{jia-long20,
+  author    = {Jun Jia and
+               Bo Long and
+               Huiji Gao and
+               Weiwei Guo and
+               Jun Shi and
+               Xiaowei Liu and
+               Mingzhou Zhou and
+               Zhoutong Fu and
+               Sida Wang and
+               Sandeep Kumar Jha},
+  title     = {Deep Learning for Search and Recommender Systems in Practice},
+  booktitle = {ACM SIGKDD 2020},
+  year      = {2020}
+}
+@inproceedings{wang-guo20,
+  author    = {Sida Wang and
+               Weiwei Guo and
+               Huiji Gao and
+               Bo Long},
+  title     = {Efficient Neural Query Auto Completion},
+  booktitle = {ACM CIKM 2020},
+  year      = {2020}
+}
+@article{liu-guo20,
+  author    = {Xiaowei Liu and
+               Weiwei Guo and
+               Huiji Gao and
+               Bo Long},
+  title     = {Deep Search Query Intent Understanding},
+  journal   = {arXiv preprint arXiv:2008.06759},
+  year      = {2020}
+}
+```
+
+%package -n python3-detext
+Summary:	Deep text understanding framework for NLP ranking, classification, and language generation
+Provides:	python-detext
+BuildRequires:	python3-devel
+BuildRequires:	python3-setuptools
+BuildRequires:	python3-pip
+%description -n python3-detext
+**DeText** is a <b>_De_</b>ep **_Text_** understanding framework for NLP-related ranking, classification,
+and language generation tasks. It leverages semantic matching using deep neural networks to
+understand member intents in search and recommender systems.
+As a general NLP framework, DeText can be applied to many tasks, including search & recommendation ranking,
+multi-class classification and query understanding tasks.
+More details can be found in the [LinkedIn Engineering blog post](https://engineering.linkedin.com/blog/2020/open-sourcing-detext).
+## Highlight
+* Natural language understanding powered by state-of-the-art deep neural networks
+  * automatic feature extraction with deep models
+  * end-to-end training
+  * interaction modeling between ranking sources and targets
+* A general framework with great flexibility
+  * customizable model architectures
+  * multiple text encoder support
+  * multiple data input types support
+  * various optimization choices
+  * standard training flow control
+* Easy to use
+  * configuration-based modeling (e.g., all configurations through the command line)
+## General Model Architecture
+DeText supports a general model architecture that contains the following components:
+* **Word embedding layer**. It converts the sequence of words into a d by n matrix.
+* **CNN/BERT/LSTM text encoding layer**. It takes the word embedding matrix as input and maps the text data into a fixed-length embedding.
+* **Interaction layer**. It generates deep features based on the text embeddings. Options include concatenation, cosine similarity, etc.
+* **Wide & Deep feature processing**. We combine the traditional features with the interaction features (deep features) in a wide & deep fashion.
+* **MLP layer**. The MLP layer combines the wide features and the deep features.
+All parameters are jointly updated to optimize the training objective.
+### Model Configurables
+DeText offers great flexibility for clients to build customized networks for their own use cases:
+* **LTR/classification layer**: in-house LTR loss implementation, or tf-ranking LTR loss; multi-class classification support.
+* **MLP layer**: customizable number of layers and number of dimensions.
+* **Interaction layer**: supports Cosine Similarity, Hadamard Product, and Concatenation.
+* **Text embedding layer**: supports CNN, BERT, LSTM with customized parameters on filters, layers, dimensions, etc.
+* **Continuous feature normalization**: element-wise rescaling, value normalization.
+* **Categorical feature processing**: modeled as entity embedding.
+All of these can be customized via hyper-parameters in the DeText template. Note that tf-ranking is
+supported in the DeText framework, i.e., users can choose the LTR loss and metrics defined in tf-ranking.
+## User Guide
+### Dev environment set up
+1. Create your virtualenv (Python version >= 3.7)
+    ```shell script
+    VENV_DIR=<your venv dir>
+    python3 -m venv $VENV_DIR  # Make sure your python version >= 3.7
+    source $VENV_DIR/bin/activate  # Enter the virtual environment
+    ```
+1. Upgrade the pip and setuptools versions
+    ```shell script
+    pip3 install -U pip
+    pip3 install -U setuptools
+    ```
+1. Run setup for DeText:
+    ```shell script
+    pip install -e .
+    ```
+1. Verify the environment setup through pytest. If all tests pass, the environment is correctly set up
+    ```shell script
+    pytest
+    ```
+1. Refer to the training manual ([TRAINING.md](user_guide/TRAINING.md)) for information about customizing the model:
+    * Training data format and preparation
+    * Key parameters to customize and train DeText models
+    * Detailed information about all DeText training parameters for full customization
+1. Train a model using DeText (e.g., [run_detext.sh](test/resources/run_detext.sh))
+### Tutorial
+If you would like to try out the library, refer to the following tutorial notebooks:
+* [text_classification_demo.ipynb](user_guide/notebooks/text_classification_demo.ipynb)
+    This notebook shows how to use DeText to train a multi-class text classification model on a public query intent
+    classification dataset. Detailed instructions on data preparation, model training, and model inference are included.
+* [autocompletion.ipynb](user_guide/notebooks/autocompletion.ipynb)
+    This notebook shows how to use DeText to train a text ranking model on a public query auto-completion dataset.
+    Detailed steps on data preparation, model training, and model inference examples are included.
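One more illustration before the references: the LTR option listed under Model Configurables typically optimizes a listwise loss. A minimal ListNet-style softmax loss — a sketch of the kind of loss tf-ranking provides, not DeText's in-house implementation — could look like this:

```python
# Listwise softmax LTR loss sketch (illustrative only).
import tensorflow as tf

def listwise_softmax_loss(scores, labels):
    """scores, labels: float tensors of shape (batch, list_size)."""
    # Turn graded relevance labels into a target distribution per query.
    label_dist = labels / tf.reduce_sum(labels, axis=-1, keepdims=True)
    log_probs = tf.nn.log_softmax(scores, axis=-1)
    # Cross-entropy between the label distribution and the score softmax.
    return -tf.reduce_sum(label_dist * log_probs, axis=-1)

# One query, three candidate documents; only the first is relevant.
print(listwise_softmax_loss(tf.constant([[2.0, 0.5, -1.0]]),
                            tf.constant([[1.0, 0.0, 0.0]])).numpy())
```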
+## **Citation**
+Please cite DeText in your publications if it helps your research:
+```
+@manual{guo-liu20-blog,
+  author    = {Weiwei Guo and
+               Xiaowei Liu and
+               Sida Wang and
+               Huiji Gao and
+               Bo Long},
+  title     = {DeText: A Deep NLP Framework for Intelligent Text Understanding},
+  url       = {https://engineering.linkedin.com/blog/2020/open-sourcing-detext},
+  year      = {2020}
+}
+@inproceedings{guo-gao19-sigir,
+  author    = {Weiwei Guo and
+               Huiji Gao and
+               Jun Shi and
+               Bo Long},
+  title     = {Deep Natural Language Processing for Search Systems},
+  booktitle = {ACM SIGIR 2019},
+  year      = {2019}
+}
+@inproceedings{guo-gao19-kdd,
+  author    = {Weiwei Guo and
+               Huiji Gao and
+               Jun Shi and
+               Bo Long and
+               Liang Zhang and
+               Bee-Chung Chen and
+               Deepak Agarwal},
+  title     = {Deep Natural Language Processing for Search and Recommender Systems},
+  booktitle = {ACM SIGKDD 2019},
+  year      = {2019}
+}
+@inproceedings{guo-liu20-cikm,
+  author    = {Weiwei Guo and
+               Xiaowei Liu and
+               Sida Wang and
+               Huiji Gao and
+               Ananth Sankar and
+               Zimeng Yang and
+               Qi Guo and
+               Liang Zhang and
+               Bo Long and
+               Bee-Chung Chen and
+               Deepak Agarwal},
+  title     = {DeText: A Deep Text Ranking Framework with BERT},
+  booktitle = {ACM CIKM 2020},
+  year      = {2020}
+}
+@inproceedings{jia-long20,
+  author    = {Jun Jia and
+               Bo Long and
+               Huiji Gao and
+               Weiwei Guo and
+               Jun Shi and
+               Xiaowei Liu and
+               Mingzhou Zhou and
+               Zhoutong Fu and
+               Sida Wang and
+               Sandeep Kumar Jha},
+  title     = {Deep Learning for Search and Recommender Systems in Practice},
+  booktitle = {ACM SIGKDD 2020},
+  year      = {2020}
+}
+@inproceedings{wang-guo20,
+  author    = {Sida Wang and
+               Weiwei Guo and
+               Huiji Gao and
+               Bo Long},
+  title     = {Efficient Neural Query Auto Completion},
+  booktitle = {ACM CIKM 2020},
+  year      = {2020}
+}
+@article{liu-guo20,
+  author    = {Xiaowei Liu and
+               Weiwei Guo and
+               Huiji Gao and
+               Bo Long},
+  title     = {Deep Search Query Intent Understanding},
+  journal   = {arXiv preprint arXiv:2008.06759},
+  year      = {2020}
+}
+```
+
+%package help
+Summary:	Development documents and examples for detext
+Provides:	python3-detext-doc
+%description help
+**DeText** is a <b>_De_</b>ep **_Text_** understanding framework for NLP-related ranking, classification,
+and language generation tasks. It leverages semantic matching using deep neural networks to
+understand member intents in search and recommender systems.
+As a general NLP framework, DeText can be applied to many tasks, including search & recommendation ranking,
+multi-class classification and query understanding tasks.
+More details can be found in the [LinkedIn Engineering blog post](https://engineering.linkedin.com/blog/2020/open-sourcing-detext).
+## Highlight
+* Natural language understanding powered by state-of-the-art deep neural networks
+  * automatic feature extraction with deep models
+  * end-to-end training
+  * interaction modeling between ranking sources and targets
+* A general framework with great flexibility
+  * customizable model architectures
+  * multiple text encoder support
+  * multiple data input types support
+  * various optimization choices
+  * standard training flow control
+* Easy to use
+  * configuration-based modeling (e.g., all configurations through the command line)
+## General Model Architecture
+DeText supports a general model architecture that contains the following components:
+* **Word embedding layer**. It converts the sequence of words into a d by n matrix.
+* **CNN/BERT/LSTM text encoding layer**. It takes the word embedding matrix as input and maps the text data into a fixed-length embedding.
+* **Interaction layer**. It generates deep features based on the text embeddings. Options include concatenation, cosine similarity, etc.
+* **Wide & Deep feature processing**. We combine the traditional features with the interaction features (deep features) in a wide & deep fashion.
+* **MLP layer**. The MLP layer combines the wide features and the deep features.
+All parameters are jointly updated to optimize the training objective.
+### Model Configurables
+DeText offers great flexibility for clients to build customized networks for their own use cases:
+* **LTR/classification layer**: in-house LTR loss implementation, or tf-ranking LTR loss; multi-class classification support.
+* **MLP layer**: customizable number of layers and number of dimensions.
+* **Interaction layer**: supports Cosine Similarity, Hadamard Product, and Concatenation.
+* **Text embedding layer**: supports CNN, BERT, LSTM with customized parameters on filters, layers, dimensions, etc.
+* **Continuous feature normalization**: element-wise rescaling, value normalization.
+* **Categorical feature processing**: modeled as entity embedding.
+All of these can be customized via hyper-parameters in the DeText template. Note that tf-ranking is
+supported in the DeText framework, i.e., users can choose the LTR loss and metrics defined in tf-ranking.
+## User Guide
+### Dev environment set up
+1. Create your virtualenv (Python version >= 3.7)
+    ```shell script
+    VENV_DIR=<your venv dir>
+    python3 -m venv $VENV_DIR  # Make sure your python version >= 3.7
+    source $VENV_DIR/bin/activate  # Enter the virtual environment
+    ```
+1. Upgrade the pip and setuptools versions
+    ```shell script
+    pip3 install -U pip
+    pip3 install -U setuptools
+    ```
+1. Run setup for DeText:
+    ```shell script
+    pip install -e .
+    ```
+1. Verify the environment setup through pytest. If all tests pass, the environment is correctly set up
+    ```shell script
+    pytest
+    ```
+1. Refer to the training manual ([TRAINING.md](user_guide/TRAINING.md)) for information about customizing the model:
+    * Training data format and preparation
+    * Key parameters to customize and train DeText models
+    * Detailed information about all DeText training parameters for full customization
+1. Train a model using DeText (e.g., [run_detext.sh](test/resources/run_detext.sh))
+### Tutorial
+If you would like to try out the library, refer to the following tutorial notebooks:
+* [text_classification_demo.ipynb](user_guide/notebooks/text_classification_demo.ipynb)
+    This notebook shows how to use DeText to train a multi-class text classification model on a public query intent
+    classification dataset. Detailed instructions on data preparation, model training, and model inference are included.
+* [autocompletion.ipynb](user_guide/notebooks/autocompletion.ipynb)
+    This notebook shows how to use DeText to train a text ranking model on a public query auto-completion dataset.
+    Detailed steps on data preparation, model training, and model inference examples are included.
+## **Citation**
+Please cite DeText in your publications if it helps your research:
+```
+@manual{guo-liu20-blog,
+  author    = {Weiwei Guo and
+               Xiaowei Liu and
+               Sida Wang and
+               Huiji Gao and
+               Bo Long},
+  title     = {DeText: A Deep NLP Framework for Intelligent Text Understanding},
+  url       = {https://engineering.linkedin.com/blog/2020/open-sourcing-detext},
+  year      = {2020}
+}
+@inproceedings{guo-gao19-sigir,
+  author    = {Weiwei Guo and
+               Huiji Gao and
+               Jun Shi and
+               Bo Long},
+  title     = {Deep Natural Language Processing for Search Systems},
+  booktitle = {ACM SIGIR 2019},
+  year      = {2019}
+}
+@inproceedings{guo-gao19-kdd,
+  author    = {Weiwei Guo and
+               Huiji Gao and
+               Jun Shi and
+               Bo Long and
+               Liang Zhang and
+               Bee-Chung Chen and
+               Deepak Agarwal},
+  title     = {Deep Natural Language Processing for Search and Recommender Systems},
+  booktitle = {ACM SIGKDD 2019},
+  year      = {2019}
+}
+@inproceedings{guo-liu20-cikm,
+  author    = {Weiwei Guo and
+               Xiaowei Liu and
+               Sida Wang and
+               Huiji Gao and
+               Ananth Sankar and
+               Zimeng Yang and
+               Qi Guo and
+               Liang Zhang and
+               Bo Long and
+               Bee-Chung Chen and
+               Deepak Agarwal},
+  title     = {DeText: A Deep Text Ranking Framework with BERT},
+  booktitle = {ACM CIKM 2020},
+  year      = {2020}
+}
+@inproceedings{jia-long20,
+  author    = {Jun Jia and
+               Bo Long and
+               Huiji Gao and
+               Weiwei Guo and
+               Jun Shi and
+               Xiaowei Liu and
+               Mingzhou Zhou and
+               Zhoutong Fu and
+               Sida Wang and
+               Sandeep Kumar Jha},
+  title     = {Deep Learning for Search and Recommender Systems in Practice},
+  booktitle = {ACM SIGKDD 2020},
+  year      = {2020}
+}
+@inproceedings{wang-guo20,
+  author    = {Sida Wang and
+               Weiwei Guo and
+               Huiji Gao and
+               Bo Long},
+  title     = {Efficient Neural Query Auto Completion},
+  booktitle = {ACM CIKM 2020},
+  year      = {2020}
+}
+@article{liu-guo20,
+  author    = {Xiaowei Liu and
+               Weiwei Guo and
+               Huiji Gao and
+               Bo Long},
+  title     = {Deep Search Query Intent Understanding},
+  journal   = {arXiv preprint arXiv:2008.06759},
+  year      = {2020}
+}
+```
+
+%prep
+%autosetup -n detext-3.2.0
+
+%build
+%py3_build
+
+%install
+%py3_install
+install -d -m755 %{buildroot}/%{_pkgdocdir}
+if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
+if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
+if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
+if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
+pushd %{buildroot}
+if [ -d usr/lib ]; then
+	find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/lib64 ]; then
+	find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/bin ]; then
+	find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/sbin ]; then
+	find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+touch doclist.lst
+if [ -d usr/share/man ]; then
+	find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
+fi
+popd
+mv %{buildroot}/filelist.lst .
+mv %{buildroot}/doclist.lst .
+
+%files -n python3-detext -f filelist.lst
+%dir %{python3_sitelib}/*
+
+%files help -f doclist.lst
+%{_docdir}/*
+
+%changelog
+* Fri May 05 2023 Python_Bot <Python_Bot@openeuler.org> - 3.2.0-1
+- Package Spec generated

@@ -0,0 +1 @@
+0ca62a18e6d8ea366ed9e3dd8e02869e  detext-3.2.0.tar.gz
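Once the resulting python3-detext RPM is installed, a quick smoke test is importing the library. This assumes the importable top-level module is named detext (inferred from the source tarball name, not verified against the built package contents):

```python
# Hypothetical post-install smoke test for the built RPM.
import importlib

detext = importlib.import_module("detext")  # assumed import name
print("detext imported from", detext.__file__)
```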
