author | CoprDistGit <infra@openeuler.org> | 2023-04-11 10:14:59 +0000 |
---|---|---|
committer | CoprDistGit <infra@openeuler.org> | 2023-04-11 10:14:59 +0000 |
commit | d85074742d391668965c79534fc0be27ec7c5579 (patch) | |
tree | 5dc1401385f22d982f718b5ee999d54c7a23bc5a | |
parent | 2d210207b68a1c474c94b9f2b2001b2625538fd5 (diff) |
automatic import of python-paddlenlp
-rw-r--r-- | .gitignore | 1 | ||||
-rw-r--r-- | python-paddlenlp.spec | 839 | ||||
-rw-r--r-- | sources | 1 |
3 files changed, 841 insertions, 0 deletions
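The diffstat totals above can be cross-checked with a few lines of Python. This is only an illustrative sketch and not part of the commit: the rows are pasted from the summary, and `parse_diffstat` is a hypothetical helper name.

```python
# Illustrative only: cross-check the diffstat summary of this commit.
# DIFFSTAT is copied from the page; parse_diffstat is a hypothetical helper.
DIFFSTAT = """\
-rw-r--r-- | .gitignore | 1 | ||||
-rw-r--r-- | python-paddlenlp.spec | 839 | ||||
-rw-r--r-- | sources | 1 |
"""


def parse_diffstat(text):
    """Return {filename: changed_line_count} from 'mode | name | count | ...' rows."""
    counts = {}
    for row in text.splitlines():
        fields = [field.strip() for field in row.split("|")]
        # Expect at least: mode, filename, numeric count.
        if len(fields) >= 3 and fields[2].isdigit():
            counts[fields[1]] = int(fields[2])
    return counts


counts = parse_diffstat(DIFFSTAT)
total = sum(counts.values())
print(f"{len(counts)} files changed, {total} insertions")
```

All three files are new, so every changed line is an insertion and the sum should agree with the summary line.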
@@ -0,0 +1 @@
+/paddlenlp-2.5.2.tar.gz
diff --git a/python-paddlenlp.spec b/python-paddlenlp.spec
new file mode 100644
index 0000000..556e5fc
--- /dev/null
+++ b/python-paddlenlp.spec
@@ -0,0 +1,839 @@
+%global _empty_manifest_terminate_build 0
+Name: python-paddlenlp
+Version: 2.5.2
+Release: 1
+Summary: Easy-to-use and powerful NLP library with Awesome model zoo, supporting wide-range of NLP tasks from research to industrial applications, including Neural Search, Question Answering, Information Extraction and Sentiment Analysis end-to-end system.
+License: Apache 2.0
+URL: https://github.com/PaddlePaddle/PaddleNLP
+Source0: https://mirrors.nju.edu.cn/pypi/web/packages/ac/30/204ab2e0e01222060db5684041d5c4a73dec0adb622e3397f88d15b38f94/paddlenlp-2.5.2.tar.gz
+BuildArch: noarch
+
+Requires: python3-jieba
+Requires: python3-colorlog
+Requires: python3-colorama
+Requires: python3-seqeval
+Requires: python3-dill
+Requires: python3-multiprocess
+Requires: python3-datasets
+Requires: python3-tqdm
+Requires: python3-paddlefsl
+Requires: python3-sentencepiece
+Requires: python3-huggingface-hub
+Requires: python3-paddle2onnx
+Requires: python3-Flask-Babel
+Requires: python3-visualdl
+Requires: python3-fastapi
+Requires: python3-uvicorn
+Requires: python3-typer
+Requires: python3-rich
+Requires: python3-ray[tune]
+Requires: python3-hyperopt
+Requires: python3-parameterized
+Requires: python3-sentencepiece
+Requires: python3-regex
+Requires: python3-torch
+Requires: python3-transformers
+Requires: python3-fast-tokenizer-python
+Requires: python3-jinja2
+Requires: python3-sphinx
+Requires: python3-sphinx-rtd-theme
+Requires: python3-readthedocs-sphinx-search
+Requires: python3-Markdown
+Requires: python3-sphinx-copybutton
+Requires: python3-sphinx-markdown-tables
+Requires: python3-paddlepaddle
+Requires: python3-ray[tune]
+Requires: python3-hyperopt
+Requires: python3-jinja2
+Requires: python3-sphinx
+Requires: python3-sphinx-rtd-theme
+Requires: python3-readthedocs-sphinx-search
+Requires: python3-Markdown
+Requires: python3-sphinx-copybutton
+Requires: python3-sphinx-markdown-tables
+Requires: python3-paddlepaddle
+Requires: python3-parameterized
+Requires: python3-sentencepiece
+Requires: python3-regex
+Requires: python3-torch
+Requires: python3-transformers
+Requires: python3-fast-tokenizer-python
+
+%description
+<p align="center">
+  <a href="./LICENSE"><img src="https://img.shields.io/badge/license-Apache%202-dfd.svg"></a>
+  <a href="https://github.com/PaddlePaddle/PaddleNLP/releases"><img src="https://img.shields.io/github/v/release/PaddlePaddle/PaddleNLP?color=ffa"></a>
+  <a href=""><img src="https://img.shields.io/badge/python-3.6.2+-aff.svg"></a>
+  <a href=""><img src="https://img.shields.io/badge/os-linux%2C%20win%2C%20mac-pink.svg"></a>
+  <a href="https://github.com/PaddlePaddle/PaddleNLP/graphs/contributors"><img src="https://img.shields.io/github/contributors/PaddlePaddle/PaddleNLP?color=9ea"></a>
+  <a href="https://github.com/PaddlePaddle/PaddleNLP/commits"><img src="https://img.shields.io/github/commit-activity/m/PaddlePaddle/PaddleNLP?color=3af"></a>
+  <a href="https://pypi.org/project/paddlenlp/"><img src="https://img.shields.io/pypi/dm/paddlenlp?color=9cf"></a>
+  <a href="https://github.com/PaddlePaddle/PaddleNLP/issues"><img src="https://img.shields.io/github/issues/PaddlePaddle/PaddleNLP?color=9cc"></a>
+  <a href="https://github.com/PaddlePaddle/PaddleNLP/stargazers"><img src="https://img.shields.io/github/stars/PaddlePaddle/PaddleNLP?color=ccf"></a>
+</p>
+<h4 align="center">
+  <a href=#features> Features </a> |
+  <a href=#installation> Installation </a> |
+  <a href=#quick-start> Quick Start </a> |
+  <a href=#api-reference> API Reference </a> |
+  <a href=#community> Community </a>
+</h4>
+**PaddleNLP** is an *easy-to-use* and *powerful* NLP library with **Awesome** pre-trained model zoo, supporting wide-range of NLP tasks from research to industrial applications.
+## News 📢
+* 🔥 **Latest Features**
+  * Release **[UIE-X](./applications/information_extraction)**, a universal information extraction model that supports both document and text inputs.
+  * ❣️ Release **[Opinion Mining and Sentiment Analysis Models](./applications/sentiment_analysis/unified_sentiment_extraction)** based on UIE, covering sentence-level and aspect-based sentiment classification, attribute extraction, opinion extraction, attribute aggregation and implicit opinion extraction.
+* **2022.9.6 [PaddleNLPv2.4](https://github.com/PaddlePaddle/PaddleNLP/releases/tag/v2.4.0) Released!**
+  * NLP Tools: Released **[Pipelines](./pipelines)**, which supports turn-key construction of search engine and question answering systems. Its flexible design applies to all kinds of NLP systems, so you can build end-to-end NLP pipelines like Legos!
+  * 🔨 Industrial application: Release **[Complete Solution of Text Classification](./applications/text_classification)**, covering multi-class, multi-label and hierarchical text classification; it also supports **few-shot learning** and training and optimization with **TrustAI**. Upgrade [**UIE**](./model_zoo/uie) and release **UIE-M**, which supports both Chinese and English information extraction in a single model; release the data distillation solution for UIE to break the bottleneck of time-consuming inference.
+  * 🍭 AIGC: Release the SOTA code generation model [**CodeGen**](./examples/code_generation/codegen), which supports code generation in multiple programming languages. Integrate [**Text to Image Models**](https://github.com/PaddlePaddle/PaddleNLP/blob/develop/docs/model_zoo/taskflow.md#%E6%96%87%E5%9B%BE%E7%94%9F%E6%88%90): DALL·E Mini, Disco Diffusion and Stable Diffusion. Let's play and have some fun!
+  * 💪 Framework upgrade: Release the [**Auto Model Compression API**](./docs/compression.md), which supports automatic pruning and quantization and lowers the barrier to model compression; release [**Few-shot Prompt**](./applications/text_classification/multi_class/few-shot), which includes algorithms such as PET, P-Tuning and RGL.
+## Features
+#### <a href=#out-of-box-nlp-toolset> 📦 Out-of-Box NLP Toolset </a>
+#### <a href=#awesome-chinese-model-zoo> 🤗 Awesome Chinese Model Zoo </a>
+#### <a href=#industrial-end-to-end-system> Industrial End-to-end System </a>
+#### <a href=#high-performance-distributed-training-and-inference> High Performance Distributed Training and Inference </a>
+### Out-of-Box NLP Toolset
+Taskflow aims to provide off-the-shelf, pre-built NLP tasks covering NLU and NLG techniques, with extremely fast inference that satisfies industrial scenarios.
+
+For more usage please refer to [Taskflow Docs](./docs/model_zoo/taskflow.md).
+### Awesome Chinese Model Zoo
+#### Comprehensive Chinese Transformer Models
+We provide **45+** network architectures and **500+** pretrained models. The zoo not only includes SOTA models released by Baidu, such as ERNIE, PLATO and SKEP, but also integrates most of the high-quality Chinese pretrained models developed by other organizations. Use the `AutoModel` API to download pretrained models of different architectures **⚡SUPER FAST⚡**. We welcome all developers to contribute their Transformer models to PaddleNLP!
+```python
+from paddlenlp.transformers import *
+# Load pretrained Chinese models of different architectures by name
+ernie = AutoModel.from_pretrained('ernie-3.0-medium-zh')
+bert = AutoModel.from_pretrained('bert-wwm-chinese')
+albert = AutoModel.from_pretrained('albert-chinese-tiny')
+roberta = AutoModel.from_pretrained('roberta-wwm-ext')
+electra = AutoModel.from_pretrained('chinese-electra-small')
+gpt = AutoModelForPretraining.from_pretrained('gpt-cpm-large-cn')
+```
+When computation is limited, you can use the lightweight ERNIE-Tiny models to accelerate the deployment of pretrained models.
+```python
+# 6L768H
+ernie = AutoModel.from_pretrained('ernie-3.0-medium-zh')
+# 6L384H
+ernie = AutoModel.from_pretrained('ernie-3.0-mini-zh')
+# 4L384H
+ernie = AutoModel.from_pretrained('ernie-3.0-micro-zh')
+# 4L312H
+ernie = AutoModel.from_pretrained('ernie-3.0-nano-zh')
+```
+A unified API covers NLP tasks such as semantic representation, text classification, sentence matching, sequence labeling and question answering.
+```python
+import paddle
+from paddlenlp.transformers import *
+tokenizer = AutoTokenizer.from_pretrained('ernie-3.0-medium-zh')
+text = tokenizer('natural language processing')
+# Semantic Representation
+model = AutoModel.from_pretrained('ernie-3.0-medium-zh')
+sequence_output, pooled_output = model(input_ids=paddle.to_tensor([text['input_ids']]))
+# Text Classification and Matching
+model = AutoModelForSequenceClassification.from_pretrained('ernie-3.0-medium-zh')
+# Sequence Labeling
+model = AutoModelForTokenClassification.from_pretrained('ernie-3.0-medium-zh')
+# Question Answering
+model = AutoModelForQuestionAnswering.from_pretrained('ernie-3.0-medium-zh')
+```
+#### Wide-range NLP Task Support
+PaddleNLP provides rich examples covering mainstream NLP tasks to help developers solve problems faster. You can find our powerful transformer [Model Zoo](./model_zoo) and a wide range of NLP application [examples](./examples) with detailed instructions.
+You can also run our interactive [Notebook tutorial](https://aistudio.baidu.com/aistudio/personalcenter/thirdview/574995) on AI Studio, a powerful platform with **FREE** computing resources.
+<details><summary> PaddleNLP Transformer model summary (<b>click to show details</b>) </summary><div>
+| Model | Sequence Classification | Token Classification | Question Answering | Text Generation | Multiple Choice |
+| :----------------- | ----------------------- | -------------------- | ------------------ | --------------- | --------------- |
+| ALBERT | ✅ | ✅ | ✅ | ❌ | ✅ |
+| BART | ✅ | ✅ | ✅ | ✅ | ❌ |
+| BERT | ✅ | ✅ | ✅ | ❌ | ✅ |
+| BigBird | ✅ | ✅ | ✅ | ❌ | ✅ |
+| BlenderBot | ❌ | ❌ | ❌ | ✅ | ❌ |
+| ChineseBERT | ✅ | ✅ | ✅ | ❌ | ❌ |
+| ConvBERT | ✅ | ✅ | ✅ | ❌ | ✅ |
+| CTRL | ✅ | ❌ | ❌ | ❌ | ❌ |
+| DistilBERT | ✅ | ✅ | ✅ | ❌ | ❌ |
+| ELECTRA | ✅ | ✅ | ✅ | ❌ | ✅ |
+| ERNIE | ✅ | ✅ | ✅ | ❌ | ✅ |
+| ERNIE-CTM | ❌ | ✅ | ❌ | ❌ | ❌ |
+| ERNIE-Doc | ✅ | ✅ | ✅ | ❌ | ❌ |
+| ERNIE-GEN | ❌ | ❌ | ❌ | ✅ | ❌ |
+| ERNIE-Gram | ✅ | ✅ | ✅ | ❌ | ❌ |
+| ERNIE-M | ✅ | ✅ | ✅ | ❌ | ❌ |
+| FNet | ✅ | ✅ | ✅ | ❌ | ✅ |
+| Funnel-Transformer | ✅ | ✅ | ✅ | ❌ | ❌ |
+| GPT | ✅ | ✅ | ❌ | ✅ | ❌ |
+| LayoutLM | ✅ | ✅ | ❌ | ❌ | ❌ |
+| LayoutLMv2 | ❌ | ✅ | ❌ | ❌ | ❌ |
+| LayoutXLM | ❌ | ✅ | ❌ | ❌ | ❌ |
+| LUKE | ❌ | ✅ | ✅ | ❌ | ❌ |
+| mBART | ✅ | ❌ | ✅ | ❌ | ✅ |
+| MegatronBERT | ✅ | ✅ | ✅ | ❌ | ✅ |
+| MobileBERT | ✅ | ❌ | ✅ | ❌ | ❌ |
+| MPNet | ✅ | ✅ | ✅ | ❌ | ✅ |
+| NEZHA | ✅ | ✅ | ✅ | ❌ | ✅ |
+| PP-MiniLM | ✅ | ❌ | ❌ | ❌ | ❌ |
+| ProphetNet | ❌ | ❌ | ❌ | ✅ | ❌ |
+| Reformer | ✅ | ❌ | ✅ | ❌ | ❌ |
+| RemBERT | ✅ | ✅ | ✅ | ❌ | ✅ |
+| RoBERTa | ✅ | ✅ | ✅ | ❌ | ✅ |
+| RoFormer | ✅ | ✅ | ✅ | ❌ | ❌ |
+| SKEP | ✅ | ✅ | ❌ | ❌ | ❌ |
+| SqueezeBERT | ✅ | ✅ | ✅ | ❌ | ❌ |
+| T5 | ❌ | ❌ | ❌ | ✅ | ❌ |
+| TinyBERT | ✅ | ❌ | ❌ | ❌ | ❌ |
+| UnifiedTransformer | ❌ | ❌ | ❌ | ✅ | ❌ |
+| XLNet | ✅ | ✅ | ✅ | ❌ | ✅ |
+</div></details>
+For more pretrained model usage, please refer to [Transformer API Docs](./docs/model_zoo/index.rst).
+### Industrial End-to-end System
+We provide high-value scenarios including information extraction, semantic retrieval and question answering.
+For more details on industrial cases, please refer to [Applications](./applications).
+#### Neural Search System
+<div align="center">
+  <img src="https://user-images.githubusercontent.com/11793384/168514909-8817d79a-72c4-4be1-8080-93d1f682bb46.gif" width="400">
+</div>
+For more details please refer to [Neural Search](./applications/neural_search).
+#### Question Answering System
+We provide a question answering pipeline that supports FAQ systems and document-level visual question answering systems, based on [RocketQA](https://github.com/PaddlePaddle/RocketQA).
+<div align="center">
+  <img src="https://user-images.githubusercontent.com/11793384/168514868-1babe981-c675-4f89-9168-dd0a3eede315.gif" width="400">
+</div>
+For more details please refer to [Question Answering](./applications/question_answering) and [Document VQA](./applications/document_intelligence/doc_vqa).
+#### Opinion Extraction and Sentiment Analysis
+We build an opinion extraction system for product reviews and fine-grained sentiment analysis based on the [SKEP](https://arxiv.org/abs/2005.05635) model.
+<div align="center">
+  <img src="https://user-images.githubusercontent.com/11793384/168407260-b7f92800-861c-4207-98f3-2291e0102bbe.png" width="300">
+</div>
+For more details please refer to [Sentiment Analysis](./applications/sentiment_analysis).
+#### Speech Command Analysis
+Integrating an ASR model and information extraction, we provide a speech command analysis pipeline that shows how to use PaddleNLP and [PaddleSpeech](https://github.com/PaddlePaddle/PaddleSpeech) to solve real Speech + NLP scenarios.
+<div align="center">
+  <img src="https://user-images.githubusercontent.com/11793384/168412618-04897a47-79c9-4fe7-a054-5dc1f6a1f75c.png" width="500">
+</div>
+For more details please refer to [Speech Command Analysis](./applications/speech_cmd_analysis).
+### High Performance Distributed Training and Inference
+#### ⚡ FastTokenizer: High Performance Text Preprocessing Library
+<div align="center">
+  <img src="https://user-images.githubusercontent.com/11793384/168407921-b4395b1d-44bd-41a0-8c58-923ba2b703ef.png" width="400">
+</div>
+```python
+from paddlenlp.transformers import AutoTokenizer
+AutoTokenizer.from_pretrained("ernie-3.0-medium-zh", use_fast=True)
+```
+Set `use_fast=True` to use the C++ tokenizer kernel and achieve 100x faster text pre-processing. For more usage please refer to [FastTokenizer](./fast_tokenizer).
+#### ⚡ FastGeneration: High Performance Generation Library
+<div align="center">
+  <img src="https://user-images.githubusercontent.com/11793384/168407831-914dced0-3a5a-40b8-8a65-ec82bf13e53c.gif" width="400">
+</div>
+```python
+from paddlenlp.transformers import GPTLMHeadModel
+model = GPTLMHeadModel.from_pretrained('gpt-cpm-large-cn')
+# inputs_ids: a tensor of token IDs prepared beforehand (not shown here)
+outputs, _ = model.generate(
+    input_ids=inputs_ids, max_length=10, decode_strategy='greedy_search',
+    use_fast=True)
+```
+Set `use_fast=True` to achieve a 5x speedup for Transformer, GPT, BART, PLATO and UniLM text generation. For more usage please refer to [FastGeneration](./fast_generation).
+#### Fleet: 4D Hybrid Distributed Training
+<div align="center">
+  <img src="https://user-images.githubusercontent.com/11793384/168515134-513f13e0-9902-40ef-98fa-528271dcccda.png" width="300">
+</div>
+For more details on super large-scale model pre-training, please refer to [GPT-3](./examples/language_model/gpt-3).
+## Installation
+### Prerequisites
+* python >= 3.7
+* paddlepaddle >= 2.3
+For more information about PaddlePaddle installation, please refer to [PaddlePaddle's Website](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/install/conda/linux-conda.html).
+### Python pip Installation
+```
+pip install --upgrade paddlenlp
+```
+Alternatively, install the latest develop branch code with the following command:
+```shell
+pip install --pre --upgrade paddlenlp -f https://www.paddlepaddle.org.cn/whl/paddlenlp.html
+```
+## Quick Start
+**Taskflow** aims to provide off-the-shelf, pre-built NLP tasks covering NLU and NLG scenarios, with extremely fast inference that satisfies industrial applications.
+```python
+from paddlenlp import Taskflow
+# Chinese Word Segmentation
+# Input: "The 14th National Games are held in Xi'an"
+seg = Taskflow("word_segmentation")
+seg("第十四届全运会在西安举办")
+>>> ['第十四届', '全运会', '在', '西安', '举办']
+# POS Tagging
+tag = Taskflow("pos_tagging")
+tag("第十四届全运会在西安举办")
+>>> [('第十四届', 'm'), ('全运会', 'nz'), ('在', 'p'), ('西安', 'LOC'), ('举办', 'v')]
+# Named Entity Recognition
+# Input: "'Gu Nü' is a novel published by Jiuzhou Press in 2010; the author is Yu Jianyu"
+ner = Taskflow("ner")
+ner("《孤女》是2010年九州出版社出版的小说，作者是余兼羽")
+>>> [('《', 'w'), ('孤女', '作品类_实体'), ('》', 'w'), ('是', '肯定词'), ('2010年', '时间类'), ('九州出版社', '组织机构类'), ('出版', '场景事件'), ('的', '助词'), ('小说', '作品类_概念'), ('，', 'w'), ('作者', '人物类_概念'), ('是', '肯定词'), ('余兼羽', '人物类_实体')]
+# Dependency Parsing
+# Input: "On the morning of September 9, Nadal defeated Russian player Medvedev at Arthur Ashe Stadium"
+ddp = Taskflow("dependency_parsing")
+ddp("9月9日上午纳达尔在亚瑟·阿什球场击败俄罗斯球员梅德韦杰夫")
+>>> [{'word': ['9月9日', '上午', '纳达尔', '在', '亚瑟·阿什球场', '击败', '俄罗斯', '球员', '梅德韦杰夫'], 'head': [2, 6, 6, 5, 6, 0, 8, 9, 6], 'deprel': ['ATT', 'ADV', 'SBV', 'MT', 'ADV', 'HED', 'ATT', 'ATT', 'VOB']}]
+# Sentiment Analysis
+# Input: "This product is really smooth to use, I like it very much"
+senta = Taskflow("sentiment_analysis")
+senta("这个产品用起来真的很流畅，我非常喜欢")
+>>> [{'text': '这个产品用起来真的很流畅，我非常喜欢', 'label': 'positive', 'score': 0.9938690066337585}]
+```
+## API Reference
+- Supports loading [LUGE](https://www.luge.ai/) datasets and is compatible with Hugging Face [Datasets](https://huggingface.co/datasets). For more details please refer to [Dataset API](https://paddlenlp.readthedocs.io/zh/latest/data_prepare/dataset_list.html).
+- Use a Hugging Face style API to load 500+ selected transformer models with fast downloads. For more information please refer to [Transformers API](https://paddlenlp.readthedocs.io/zh/latest/model_zoo/index.html).
+- Load pre-trained word embeddings with one line of code. For more usage please refer to [Embedding API](https://paddlenlp.readthedocs.io/zh/latest/model_zoo/embeddings.html).
+The full PaddleNLP API reference is available on [readthedocs](https://paddlenlp.readthedocs.io/).
+## Community
+### Slack
+To connect with other users and contributors, join our [Slack channel](https://paddlenlp.slack.com/).
+### WeChat
+Scan the QR code below with WeChat ⬇️ to join the official technical exchange group. We look forward to your participation.
+<div align="center">
+<img src="https://user-images.githubusercontent.com/11793384/212060369-4642d16e-f0ad-4359-aa57-b8303042f9c1.jpg" width="150" height="150" />
+</div>
+## Citation
+If you find PaddleNLP useful in your research, please consider citing it:
+```
+@misc{paddlenlp,
+    title={PaddleNLP: An Easy-to-use and High Performance NLP Library},
+    author={PaddleNLP Contributors},
+    howpublished = {\url{https://github.com/PaddlePaddle/PaddleNLP}},
+    year={2021}
+}
+```
+## Acknowledge
+We have borrowed the excellent design for pretrained model usage from Hugging Face's [Transformers](https://github.com/huggingface/transformers) 🤗, and we would like to express our gratitude to the authors of Hugging Face and its open source community.
+## License
+PaddleNLP is provided under the [Apache-2.0 License](./LICENSE).
+
+%package -n python3-paddlenlp
+Summary: Easy-to-use and powerful NLP library with Awesome model zoo, supporting wide-range of NLP tasks from research to industrial applications, including Neural Search, Question Answering, Information Extraction and Sentiment Analysis end-to-end system.
+Provides: python-paddlenlp
+BuildRequires: python3-devel
+BuildRequires: python3-setuptools
+BuildRequires: python3-pip
+%description -n python3-paddlenlp
+<p align="center">
+  <a href="./LICENSE"><img src="https://img.shields.io/badge/license-Apache%202-dfd.svg"></a>
+  <a href="https://github.com/PaddlePaddle/PaddleNLP/releases"><img src="https://img.shields.io/github/v/release/PaddlePaddle/PaddleNLP?color=ffa"></a>
+  <a href=""><img src="https://img.shields.io/badge/python-3.6.2+-aff.svg"></a>
+  <a href=""><img src="https://img.shields.io/badge/os-linux%2C%20win%2C%20mac-pink.svg"></a>
+  <a href="https://github.com/PaddlePaddle/PaddleNLP/graphs/contributors"><img src="https://img.shields.io/github/contributors/PaddlePaddle/PaddleNLP?color=9ea"></a>
+  <a href="https://github.com/PaddlePaddle/PaddleNLP/commits"><img src="https://img.shields.io/github/commit-activity/m/PaddlePaddle/PaddleNLP?color=3af"></a>
+  <a href="https://pypi.org/project/paddlenlp/"><img src="https://img.shields.io/pypi/dm/paddlenlp?color=9cf"></a>
+  <a href="https://github.com/PaddlePaddle/PaddleNLP/issues"><img src="https://img.shields.io/github/issues/PaddlePaddle/PaddleNLP?color=9cc"></a>
+  <a href="https://github.com/PaddlePaddle/PaddleNLP/stargazers"><img src="https://img.shields.io/github/stars/PaddlePaddle/PaddleNLP?color=ccf"></a>
+</p>
+<h4 align="center">
+  <a href=#features> Features </a> |
+  <a href=#installation> Installation </a> |
+  <a href=#quick-start> Quick Start </a> |
+  <a href=#api-reference> API Reference </a> |
+  <a href=#community> Community </a>
+</h4>
+**PaddleNLP** is an *easy-to-use* and *powerful* NLP library with **Awesome** pre-trained model zoo, supporting wide-range of NLP tasks from research to industrial applications.
+## News 📢
+* 🔥 **Latest Features**
+  * Release **[UIE-X](./applications/information_extraction)**, a universal information extraction model that supports both document and text inputs.
+  * ❣️ Release **[Opinion Mining and Sentiment Analysis Models](./applications/sentiment_analysis/unified_sentiment_extraction)** based on UIE, covering sentence-level and aspect-based sentiment classification, attribute extraction, opinion extraction, attribute aggregation and implicit opinion extraction.
+* **2022.9.6 [PaddleNLPv2.4](https://github.com/PaddlePaddle/PaddleNLP/releases/tag/v2.4.0) Released!**
+  * NLP Tools: Released **[Pipelines](./pipelines)**, which supports turn-key construction of search engine and question answering systems. Its flexible design applies to all kinds of NLP systems, so you can build end-to-end NLP pipelines like Legos!
+  * 🔨 Industrial application: Release **[Complete Solution of Text Classification](./applications/text_classification)**, covering multi-class, multi-label and hierarchical text classification; it also supports **few-shot learning** and training and optimization with **TrustAI**. Upgrade [**UIE**](./model_zoo/uie) and release **UIE-M**, which supports both Chinese and English information extraction in a single model; release the data distillation solution for UIE to break the bottleneck of time-consuming inference.
+  * 🍭 AIGC: Release the SOTA code generation model [**CodeGen**](./examples/code_generation/codegen), which supports code generation in multiple programming languages. Integrate [**Text to Image Models**](https://github.com/PaddlePaddle/PaddleNLP/blob/develop/docs/model_zoo/taskflow.md#%E6%96%87%E5%9B%BE%E7%94%9F%E6%88%90): DALL·E Mini, Disco Diffusion and Stable Diffusion. Let's play and have some fun!
+  * 💪 Framework upgrade: Release the [**Auto Model Compression API**](./docs/compression.md), which supports automatic pruning and quantization and lowers the barrier to model compression; release [**Few-shot Prompt**](./applications/text_classification/multi_class/few-shot), which includes algorithms such as PET, P-Tuning and RGL.
+## Features
+#### <a href=#out-of-box-nlp-toolset> 📦 Out-of-Box NLP Toolset </a>
+#### <a href=#awesome-chinese-model-zoo> 🤗 Awesome Chinese Model Zoo </a>
+#### <a href=#industrial-end-to-end-system> Industrial End-to-end System </a>
+#### <a href=#high-performance-distributed-training-and-inference> High Performance Distributed Training and Inference </a>
+### Out-of-Box NLP Toolset
+Taskflow aims to provide off-the-shelf, pre-built NLP tasks covering NLU and NLG techniques, with extremely fast inference that satisfies industrial scenarios.
+
+For more usage please refer to [Taskflow Docs](./docs/model_zoo/taskflow.md).
+### Awesome Chinese Model Zoo
+#### Comprehensive Chinese Transformer Models
+We provide **45+** network architectures and **500+** pretrained models. The zoo not only includes SOTA models released by Baidu, such as ERNIE, PLATO and SKEP, but also integrates most of the high-quality Chinese pretrained models developed by other organizations. Use the `AutoModel` API to download pretrained models of different architectures **⚡SUPER FAST⚡**. We welcome all developers to contribute their Transformer models to PaddleNLP!
+```python
+from paddlenlp.transformers import *
+# Load pretrained Chinese models of different architectures by name
+ernie = AutoModel.from_pretrained('ernie-3.0-medium-zh')
+bert = AutoModel.from_pretrained('bert-wwm-chinese')
+albert = AutoModel.from_pretrained('albert-chinese-tiny')
+roberta = AutoModel.from_pretrained('roberta-wwm-ext')
+electra = AutoModel.from_pretrained('chinese-electra-small')
+gpt = AutoModelForPretraining.from_pretrained('gpt-cpm-large-cn')
+```
+When computation is limited, you can use the lightweight ERNIE-Tiny models to accelerate the deployment of pretrained models.
+```python
+# 6L768H
+ernie = AutoModel.from_pretrained('ernie-3.0-medium-zh')
+# 6L384H
+ernie = AutoModel.from_pretrained('ernie-3.0-mini-zh')
+# 4L384H
+ernie = AutoModel.from_pretrained('ernie-3.0-micro-zh')
+# 4L312H
+ernie = AutoModel.from_pretrained('ernie-3.0-nano-zh')
+```
+A unified API covers NLP tasks such as semantic representation, text classification, sentence matching, sequence labeling and question answering.
+```python
+import paddle
+from paddlenlp.transformers import *
+tokenizer = AutoTokenizer.from_pretrained('ernie-3.0-medium-zh')
+text = tokenizer('natural language processing')
+# Semantic Representation
+model = AutoModel.from_pretrained('ernie-3.0-medium-zh')
+sequence_output, pooled_output = model(input_ids=paddle.to_tensor([text['input_ids']]))
+# Text Classification and Matching
+model = AutoModelForSequenceClassification.from_pretrained('ernie-3.0-medium-zh')
+# Sequence Labeling
+model = AutoModelForTokenClassification.from_pretrained('ernie-3.0-medium-zh')
+# Question Answering
+model = AutoModelForQuestionAnswering.from_pretrained('ernie-3.0-medium-zh')
+```
+#### Wide-range NLP Task Support
+PaddleNLP provides rich examples covering mainstream NLP tasks to help developers solve problems faster. You can find our powerful transformer [Model Zoo](./model_zoo) and a wide range of NLP application [examples](./examples) with detailed instructions.
+You can also run our interactive [Notebook tutorial](https://aistudio.baidu.com/aistudio/personalcenter/thirdview/574995) on AI Studio, a powerful platform with **FREE** computing resources.
+<details><summary> PaddleNLP Transformer model summary (<b>click to show details</b>) </summary><div>
+| Model | Sequence Classification | Token Classification | Question Answering | Text Generation | Multiple Choice |
+| :----------------- | ----------------------- | -------------------- | ------------------ | --------------- | --------------- |
+| ALBERT | ✅ | ✅ | ✅ | ❌ | ✅ |
+| BART | ✅ | ✅ | ✅ | ✅ | ❌ |
+| BERT | ✅ | ✅ | ✅ | ❌ | ✅ |
+| BigBird | ✅ | ✅ | ✅ | ❌ | ✅ |
+| BlenderBot | ❌ | ❌ | ❌ | ✅ | ❌ |
+| ChineseBERT | ✅ | ✅ | ✅ | ❌ | ❌ |
+| ConvBERT | ✅ | ✅ | ✅ | ❌ | ✅ |
+| CTRL | ✅ | ❌ | ❌ | ❌ | ❌ |
+| DistilBERT | ✅ | ✅ | ✅ | ❌ | ❌ |
+| ELECTRA | ✅ | ✅ | ✅ | ❌ | ✅ |
+| ERNIE | ✅ | ✅ | ✅ | ❌ | ✅ |
+| ERNIE-CTM | ❌ | ✅ | ❌ | ❌ | ❌ |
+| ERNIE-Doc | ✅ | ✅ | ✅ | ❌ | ❌ |
+| ERNIE-GEN | ❌ | ❌ | ❌ | ✅ | ❌ |
+| ERNIE-Gram | ✅ | ✅ | ✅ | ❌ | ❌ |
+| ERNIE-M | ✅ | ✅ | ✅ | ❌ | ❌ |
+| FNet | ✅ | ✅ | ✅ | ❌ | ✅ |
+| Funnel-Transformer | ✅ | ✅ | ✅ | ❌ | ❌ |
+| GPT | ✅ | ✅ | ❌ | ✅ | ❌ |
+| LayoutLM | ✅ | ✅ | ❌ | ❌ | ❌ |
+| LayoutLMv2 | ❌ | ✅ | ❌ | ❌ | ❌ |
+| LayoutXLM | ❌ | ✅ | ❌ | ❌ | ❌ |
+| LUKE | ❌ | ✅ | ✅ | ❌ | ❌ |
+| mBART | ✅ | ❌ | ✅ | ❌ | ✅ |
+| MegatronBERT | ✅ | ✅ | ✅ | ❌ | ✅ |
+| MobileBERT | ✅ | ❌ | ✅ | ❌ | ❌ |
+| MPNet | ✅ | ✅ | ✅ | ❌ | ✅ |
+| NEZHA | ✅ | ✅ | ✅ | ❌ | ✅ |
+| PP-MiniLM | ✅ | ❌ | ❌ | ❌ | ❌ |
+| ProphetNet | ❌ | ❌ | ❌ | ✅ | ❌ |
+| Reformer | ✅ | ❌ | ✅ | ❌ | ❌ |
+| RemBERT | ✅ | ✅ | ✅ | ❌ | ✅ |
+| RoBERTa | ✅ | ✅ | ✅ | ❌ | ✅ |
+| RoFormer | ✅ | ✅ | ✅ | ❌ | ❌ |
+| SKEP | ✅ | ✅ | ❌ | ❌ | ❌ |
+| SqueezeBERT | ✅ | ✅ | ✅ | ❌ | ❌ |
+| T5 | ❌ | ❌ | ❌ | ✅ | ❌ |
+| TinyBERT | ✅ | ❌ | ❌ | ❌ | ❌ |
+| UnifiedTransformer | ❌ | ❌ | ❌ | ✅ | ❌ |
+| XLNet | ✅ | ✅ | ✅ | ❌ | ✅ |
+</div></details>
+For more pretrained model usage, please refer to [Transformer API Docs](./docs/model_zoo/index.rst).
+### Industrial End-to-end System
+We provide high-value scenarios including information extraction, semantic retrieval and question answering.
+For more details on industrial cases, please refer to [Applications](./applications).
+#### Neural Search System
+<div align="center">
+  <img src="https://user-images.githubusercontent.com/11793384/168514909-8817d79a-72c4-4be1-8080-93d1f682bb46.gif" width="400">
+</div>
+For more details please refer to [Neural Search](./applications/neural_search).
+#### Question Answering System
+We provide a question answering pipeline that supports FAQ systems and document-level visual question answering systems, based on [RocketQA](https://github.com/PaddlePaddle/RocketQA).
+<div align="center">
+  <img src="https://user-images.githubusercontent.com/11793384/168514868-1babe981-c675-4f89-9168-dd0a3eede315.gif" width="400">
+</div>
+For more details please refer to [Question Answering](./applications/question_answering) and [Document VQA](./applications/document_intelligence/doc_vqa).
+#### Opinion Extraction and Sentiment Analysis
+We build an opinion extraction system for product reviews and fine-grained sentiment analysis based on the [SKEP](https://arxiv.org/abs/2005.05635) model.
+<div align="center">
+  <img src="https://user-images.githubusercontent.com/11793384/168407260-b7f92800-861c-4207-98f3-2291e0102bbe.png" width="300">
+</div>
+For more details please refer to [Sentiment Analysis](./applications/sentiment_analysis).
+#### Speech Command Analysis
+Integrating an ASR model and information extraction, we provide a speech command analysis pipeline that shows how to use PaddleNLP and [PaddleSpeech](https://github.com/PaddlePaddle/PaddleSpeech) to solve real Speech + NLP scenarios.
+<div align="center">
+  <img src="https://user-images.githubusercontent.com/11793384/168412618-04897a47-79c9-4fe7-a054-5dc1f6a1f75c.png" width="500">
+</div>
+For more details please refer to [Speech Command Analysis](./applications/speech_cmd_analysis).
+### High Performance Distributed Training and Inference
+#### ⚡ FastTokenizer: High Performance Text Preprocessing Library
+<div align="center">
+  <img src="https://user-images.githubusercontent.com/11793384/168407921-b4395b1d-44bd-41a0-8c58-923ba2b703ef.png" width="400">
+</div>
+```python
+from paddlenlp.transformers import AutoTokenizer
+AutoTokenizer.from_pretrained("ernie-3.0-medium-zh", use_fast=True)
+```
+Set `use_fast=True` to use the C++ tokenizer kernel and achieve 100x faster text pre-processing. For more usage please refer to [FastTokenizer](./fast_tokenizer).
+#### ⚡ FastGeneration: High Performance Generation Library
+<div align="center">
+  <img src="https://user-images.githubusercontent.com/11793384/168407831-914dced0-3a5a-40b8-8a65-ec82bf13e53c.gif" width="400">
+</div>
+```python
+from paddlenlp.transformers import GPTLMHeadModel
+model = GPTLMHeadModel.from_pretrained('gpt-cpm-large-cn')
+# inputs_ids: a tensor of token IDs prepared beforehand (not shown here)
+outputs, _ = model.generate(
+    input_ids=inputs_ids, max_length=10, decode_strategy='greedy_search',
+    use_fast=True)
+```
+Set `use_fast=True` to achieve a 5x speedup for Transformer, GPT, BART, PLATO and UniLM text generation. For more usage please refer to [FastGeneration](./fast_generation).
+#### Fleet: 4D Hybrid Distributed Training
+<div align="center">
+  <img src="https://user-images.githubusercontent.com/11793384/168515134-513f13e0-9902-40ef-98fa-528271dcccda.png" width="300">
+</div>
+For more details on super large-scale model pre-training, please refer to [GPT-3](./examples/language_model/gpt-3).
+## Installation
+### Prerequisites
+* python >= 3.7
+* paddlepaddle >= 2.3
+For more information about PaddlePaddle installation, please refer to [PaddlePaddle's Website](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/install/conda/linux-conda.html).
+### Python pip Installation
+```shell
+pip install --upgrade paddlenlp
+```
+or you can install the latest develop branch code with the following command:
+```shell
+pip install --pre --upgrade paddlenlp -f https://www.paddlepaddle.org.cn/whl/paddlenlp.html
+```
+## Quick Start
+**Taskflow** provides off-the-shelf, pre-built NLP tasks covering NLU and NLG scenarios, with extremely fast inference suited to industrial applications.
+```python
+from paddlenlp import Taskflow
+# Chinese Word Segmentation
+# Input: "The 14th National Games are held in Xi'an"
+seg = Taskflow("word_segmentation")
+seg("第十四届全运会在西安举办")
+# >>> ['第十四届', '全运会', '在', '西安', '举办']
+# POS Tagging
+tag = Taskflow("pos_tagging")
+tag("第十四届全运会在西安举办")
+# >>> [('第十四届', 'm'), ('全运会', 'nz'), ('在', 'p'), ('西安', 'LOC'), ('举办', 'v')]
+# Named Entity Recognition
+# Input: "'Orphan Girl' is a novel published in 2010 by Jiuzhou Press; the author is Yu Jianyu"
+ner = Taskflow("ner")
+ner("《孤女》是2010年九州出版社出版的小说，作者是余兼羽")
+# >>> [('《', 'w'), ('孤女', '作品类_实体'), ('》', 'w'), ('是', '肯定词'), ('2010年', '时间类'), ('九州出版社', '组织机构类'), ('出版', '场景事件'), ('的', '助词'), ('小说', '作品类_概念'), ('，', 'w'), ('作者', '人物类_概念'), ('是', '肯定词'), ('余兼羽', '人物类_实体')]
+# Dependency Parsing
+# Input: "On the morning of September 9, Nadal beat Russian player Medvedev at Arthur Ashe Stadium"
+ddp = Taskflow("dependency_parsing")
+ddp("9月9日上午纳达尔在亚瑟·阿什球场击败俄罗斯球员梅德韦杰夫")
+# >>> [{'word': ['9月9日', '上午', '纳达尔', '在', '亚瑟·阿什球场', '击败', '俄罗斯', '球员', '梅德韦杰夫'], 'head': [2, 6, 6, 5, 6, 0, 8, 9, 6], 'deprel': ['ATT', 'ADV', 'SBV', 'MT', 'ADV', 'HED', 'ATT', 'ATT', 'VOB']}]
+# Sentiment Analysis
+# Input: "This product runs really smoothly; I like it a lot"
+senta = Taskflow("sentiment_analysis")
+senta("这个产品用起来真的很流畅，我非常喜欢")
+# >>> [{'text': '这个产品用起来真的很流畅，我非常喜欢', 'label': 'positive', 'score': 0.9938690066337585}]
+```
+## API Reference
+- Supports loading [LUGE](https://www.luge.ai/) datasets and is compatible with Hugging Face [Datasets](https://huggingface.co/datasets). For more details please refer to [Dataset API](https://paddlenlp.readthedocs.io/zh/latest/data_prepare/dataset_list.html).
+- Uses a Hugging Face style API to load 500+ selected Transformer models, with fast downloads. For more information please refer to [Transformers API](https://paddlenlp.readthedocs.io/zh/latest/model_zoo/index.html).
+- Loads pre-trained word embeddings with one line of code. For more usage please refer to [Embedding API](https://paddlenlp.readthedocs.io/zh/latest/model_zoo/embeddings.html).
+The full PaddleNLP API reference is available on [readthedocs](https://paddlenlp.readthedocs.io/).
+## Community
+### Slack
+To connect with other users and contributors, join our [Slack channel](https://paddlenlp.slack.com/).
+### WeChat
+Scan the QR code below with WeChat to join the official technical exchange group. We look forward to your participation.
+<div align="center">
+<img src="https://user-images.githubusercontent.com/11793384/212060369-4642d16e-f0ad-4359-aa57-b8303042f9c1.jpg" width="150" height="150" />
+</div>
+## Citation
+If you find PaddleNLP useful in your research, please consider citing it:
+```
+@misc{paddlenlp,
+  title={PaddleNLP: An Easy-to-use and High Performance NLP Library},
+  author={PaddleNLP Contributors},
+  howpublished = {\url{https://github.com/PaddlePaddle/PaddleNLP}},
+  year={2021}
+}
+```
+## Acknowledgements
+We have borrowed from the excellent design of Hugging Face's [Transformers](https://github.com/huggingface/transformers) for pretrained model usage, and we would like to express our gratitude to the authors of Hugging Face and its open source community.
+## License
+PaddleNLP is provided under the [Apache-2.0 License](./LICENSE).
+
+%package help
+Summary: Development documents and examples for paddlenlp
+Provides: python3-paddlenlp-doc
+%description help
+<p align="center">
+  <a href="./LICENSE"><img src="https://img.shields.io/badge/license-Apache%202-dfd.svg"></a>
+  <a href="https://github.com/PaddlePaddle/PaddleNLP/releases"><img src="https://img.shields.io/github/v/release/PaddlePaddle/PaddleNLP?color=ffa"></a>
+  <a href=""><img src="https://img.shields.io/badge/python-3.6.2+-aff.svg"></a>
+  <a href=""><img src="https://img.shields.io/badge/os-linux%2C%20win%2C%20mac-pink.svg"></a>
+  <a href="https://github.com/PaddlePaddle/PaddleNLP/graphs/contributors"><img src="https://img.shields.io/github/contributors/PaddlePaddle/PaddleNLP?color=9ea"></a>
+  <a href="https://github.com/PaddlePaddle/PaddleNLP/commits"><img src="https://img.shields.io/github/commit-activity/m/PaddlePaddle/PaddleNLP?color=3af"></a>
+  <a href="https://pypi.org/project/paddlenlp/"><img src="https://img.shields.io/pypi/dm/paddlenlp?color=9cf"></a>
+  <a href="https://github.com/PaddlePaddle/PaddleNLP/issues"><img src="https://img.shields.io/github/issues/PaddlePaddle/PaddleNLP?color=9cc"></a>
+  <a href="https://github.com/PaddlePaddle/PaddleNLP/stargazers"><img src="https://img.shields.io/github/stars/PaddlePaddle/PaddleNLP?color=ccf"></a>
+</p>
+<h4 align="center">
+  <a href=#features> Features </a> |
+  <a href=#installation> Installation </a> |
+  <a href=#quick-start> Quick Start </a> |
+  <a href=#api-reference> API Reference </a> |
+  <a href=#community> Community </a>
+</h4>
+**PaddleNLP** is an *easy-to-use* and *powerful* NLP library with an **Awesome** pre-trained model zoo, supporting a wide range of NLP tasks from research to industrial applications.
+## News
+* **Latest Features**
+  * Release **[UIE-X](./applications/information_extraction)**, a universal information extraction model that supports both document and text inputs.
+  * Release **[Opinion Mining and Sentiment Analysis Models](./applications/sentiment_analysis/unified_sentiment_extraction)** based on UIE, covering sentence-level and aspect-based sentiment classification, attribute extraction, opinion extraction, attribute aggregation and implicit opinion extraction.
+* **2022.9.6 [PaddleNLPv2.4](https://github.com/PaddlePaddle/PaddleNLP/releases/tag/v2.4.0) Released!**
+  * NLP Tools: Released **[Pipelines](./pipelines)**, which supports turn-key construction of search engines and question answering systems. Its flexible design applies to all kinds of NLP systems, so you can build end-to-end NLP pipelines like Legos!
+  * Industrial application: Released the **[Complete Solution of Text Classification](./applications/text_classification)**, covering multi-class, multi-label and hierarchical text classification; it also supports **few-shot learning** and training and optimization with **TrustAI**. Upgraded [**UIE**](./model_zoo/uie) and released **UIE-M**, which supports both Chinese and English information extraction in a single model; released a data distillation solution for UIE to overcome the bottleneck of slow inference.
+  * AIGC: Released the SOTA code generation model [**CodeGen**](./examples/code_generation/codegen), which generates code in multiple programming languages. Integrated the [**Text to Image Models**](https://github.com/PaddlePaddle/PaddleNLP/blob/develop/docs/model_zoo/taskflow.md#%E6%96%87%E5%9B%BE%E7%94%9F%E6%88%90) DALL·E Mini, Disco Diffusion and Stable Diffusion. Let's play and have some fun!
+  * Framework upgrade: Released the [**Auto Model Compression API**](./docs/compression.md), which supports automatic pruning and quantization and lowers the barrier to model compression; released [**Few-shot Prompt**](./applications/text_classification/multi_class/few-shot), which includes algorithms such as PET, P-Tuning and RGL.
+## Features
+#### <a href=#out-of-box-nlp-toolset> Out-of-Box NLP Toolset </a>
+#### <a href=#awesome-chinese-model-zoo> Awesome Chinese Model Zoo </a>
+#### <a href=#industrial-end-to-end-system> Industrial End-to-end System </a>
+#### <a href=#high-performance-distributed-training-and-inference> High Performance Distributed Training and Inference </a>
+### Out-of-Box NLP Toolset
+Taskflow provides off-the-shelf, pre-built NLP tasks covering NLU and NLG techniques, with extremely fast inference suited to industrial scenarios.
+
+For more usage please refer to [Taskflow Docs](./docs/model_zoo/taskflow.md).
+### Awesome Chinese Model Zoo
+#### Comprehensive Chinese Transformer Models
+We provide **45+** network architectures and **500+** pretrained models. The zoo includes all the SOTA models released by Baidu, such as ERNIE, PLATO and SKEP, and also integrates most of the high-quality Chinese pretrained models developed by other organizations. Use the `AutoModel` API for **⚡SUPER FAST⚡** downloads of pretrained models of different architectures. We welcome all developers to contribute their Transformer models to PaddleNLP!
+```python
+from paddlenlp.transformers import AutoModel, AutoModelForPretraining
+ernie = AutoModel.from_pretrained('ernie-3.0-medium-zh')
+bert = AutoModel.from_pretrained('bert-wwm-chinese')
+albert = AutoModel.from_pretrained('albert-chinese-tiny')
+roberta = AutoModel.from_pretrained('roberta-wwm-ext')
+electra = AutoModel.from_pretrained('chinese-electra-small')
+gpt = AutoModelForPretraining.from_pretrained('gpt-cpm-large-cn')
+```
+When computation is limited, you can use the lightweight ERNIE-Tiny models to speed up the deployment of pretrained models.
+```python
+from paddlenlp.transformers import AutoModel
+# 6L768H
+ernie = AutoModel.from_pretrained('ernie-3.0-medium-zh')
+# 6L384H
+ernie = AutoModel.from_pretrained('ernie-3.0-mini-zh')
+# 4L384H
+ernie = AutoModel.from_pretrained('ernie-3.0-micro-zh')
+# 4L312H
+ernie = AutoModel.from_pretrained('ernie-3.0-nano-zh')
+```
+A unified API experience for NLP tasks such as semantic representation, text classification, sentence matching, sequence labeling and question answering.
+```python
+import paddle
+from paddlenlp.transformers import (
+    AutoModel, AutoModelForQuestionAnswering, AutoModelForSequenceClassification,
+    AutoModelForTokenClassification, AutoTokenizer)
+tokenizer = AutoTokenizer.from_pretrained('ernie-3.0-medium-zh')
+text = tokenizer('natural language processing')
+# Semantic Representation
+model = AutoModel.from_pretrained('ernie-3.0-medium-zh')
+sequence_output, pooled_output = model(input_ids=paddle.to_tensor([text['input_ids']]))
+# Text Classification and Matching
+model = AutoModelForSequenceClassification.from_pretrained('ernie-3.0-medium-zh')
+# Sequence Labeling
+model = AutoModelForTokenClassification.from_pretrained('ernie-3.0-medium-zh')
+# Question Answering
+model = AutoModelForQuestionAnswering.from_pretrained('ernie-3.0-medium-zh')
+```
+#### Wide-range NLP Task Support
+PaddleNLP provides rich examples covering mainstream NLP tasks to help developers solve problems faster. You can find our powerful transformer [Model Zoo](./model_zoo) and wide-ranging NLP application [examples](./examples) with detailed instructions.
+You can also run our interactive [Notebook tutorial](https://aistudio.baidu.com/aistudio/personalcenter/thirdview/574995) on AI Studio, a powerful platform with **FREE** computing resources.
+<details><summary> PaddleNLP Transformer model summary (<b>click to show details</b>) </summary><div>
+| Model              | Sequence Classification | Token Classification | Question Answering | Text Generation | Multiple Choice |
+| :----------------- | ----------------------- | -------------------- | ------------------ | --------------- | --------------- |
+| ALBERT             | ✅                      | ✅                   | ✅                 | ❌              | ✅              |
+| BART               | ✅                      | ✅                   | ✅                 | ✅              | ❌              |
+| BERT               | ✅                      | ✅                   | ✅                 | ❌              | ✅              |
+| BigBird            | ✅                      | ✅                   | ✅                 | ❌              | ✅              |
+| BlenderBot         | ❌                      | ❌                   | ❌                 | ✅              | ❌              |
+| ChineseBERT        | ✅                      | ✅                   | ✅                 | ❌              | ❌              |
+| ConvBERT           | ✅                      | ✅                   | ✅                 | ❌              | ✅              |
+| CTRL               | ✅                      | ❌                   | ❌                 | ❌              | ❌              |
+| DistilBERT         | ✅                      | ✅                   | ✅                 | ❌              | ❌              |
+| ELECTRA            | ✅                      | ✅                   | ✅                 | ❌              | ✅              |
+| ERNIE              | ✅                      | ✅                   | ✅                 | ❌              | ✅              |
+| ERNIE-CTM          | ❌                      | ✅                   | ❌                 | ❌              | ❌              |
+| ERNIE-Doc          | ✅                      | ✅                   | ✅                 | ❌              | ❌              |
+| ERNIE-GEN          | ❌                      | ❌                   | ❌                 | ✅              | ❌              |
+| ERNIE-Gram         | ✅                      | ✅                   | ✅                 | ❌              | ❌              |
+| ERNIE-M            | ✅                      | ✅                   | ✅                 | ❌              | ❌              |
+| FNet               | ✅                      | ✅                   | ✅                 | ❌              | ✅              |
+| Funnel-Transformer | ✅                      | ✅                   | ✅                 | ❌              | ❌              |
+| GPT                | ✅                      | ✅                   | ❌                 | ✅              | ❌              |
+| LayoutLM           | ✅                      | ✅                   | ❌                 | ❌              | ❌              |
+| LayoutLMv2         | ❌                      | ✅                   | ❌                 | ❌              | ❌              |
+| LayoutXLM          | ❌                      | ✅                   | ❌                 | ❌              | ❌              |
+| LUKE               | ❌                      | ✅                   | ✅                 | ❌              | ❌              |
+| mBART              | ✅                      | ❌                   | ✅                 | ❌              | ✅              |
+| MegatronBERT       | ✅                      | ✅                   | ✅                 | ❌              | ✅              |
+| MobileBERT         | ✅                      | ❌                   | ✅                 | ❌              | ❌              |
+| MPNet              | ✅                      | ✅                   | ✅                 | ❌              | ✅              |
+| NEZHA              | ✅                      | ✅                   | ✅                 | ❌              | ✅              |
+| PP-MiniLM          | ✅                      | ❌                   | ❌                 | ❌              | ❌              |
+| ProphetNet         | ❌                      | ❌                   | ❌                 | ✅              | ❌              |
+| Reformer           | ✅                      | ❌                   | ✅                 | ❌              | ❌              |
+| RemBERT            | ✅                      | ✅                   | ✅                 | ❌              | ✅              |
+| RoBERTa            | ✅                      | ✅                   | ✅                 | ❌              | ✅              |
+| RoFormer           | ✅                      | ✅                   | ✅                 | ❌              | ❌              |
+| SKEP               | ✅                      | ✅                   | ❌                 | ❌              | ❌              |
+| SqueezeBERT        | ✅                      | ✅                   | ✅                 | ❌              | ❌              |
+| T5                 | ❌                      | ❌                   | ❌                 | ✅              | ❌              |
+| TinyBERT           | ✅                      | ❌                   | ❌                 | ❌              | ❌              |
+| UnifiedTransformer | ❌                      | ❌                   | ❌                 | ✅              | ❌              |
+| XLNet              | ✅                      | ✅                   | ✅                 | ❌              | ✅              |
+</div></details>
+For more pretrained model usage, please refer to [Transformer API Docs](./docs/model_zoo/index.rst).
+### Industrial End-to-end System
+We provide high-value scenarios including information extraction, semantic retrieval and question answering.
+For more details on industrial cases, please refer to [Applications](./applications).
+#### Neural Search System
+<div align="center">
+  <img src="https://user-images.githubusercontent.com/11793384/168514909-8817d79a-72c4-4be1-8080-93d1f682bb46.gif" width="400">
+</div>
+For more details please refer to [Neural Search](./applications/neural_search).
+#### Question Answering System
+We provide a question answering pipeline that supports FAQ systems and document-level visual question answering, based on [RocketQA](https://github.com/PaddlePaddle/RocketQA).
+<div align="center">
+  <img src="https://user-images.githubusercontent.com/11793384/168514868-1babe981-c675-4f89-9168-dd0a3eede315.gif" width="400">
+</div>
+For more details please refer to [Question Answering](./applications/question_answering) and [Document VQA](./applications/document_intelligence/doc_vqa).
+#### Opinion Extraction and Sentiment Analysis
+We build an opinion extraction system for product reviews and fine-grained sentiment analysis based on the [SKEP](https://arxiv.org/abs/2005.05635) model.
+<div align="center">
+  <img src="https://user-images.githubusercontent.com/11793384/168407260-b7f92800-861c-4207-98f3-2291e0102bbe.png" width="300">
+</div>
+For more details please refer to [Sentiment Analysis](./applications/sentiment_analysis).
+#### Speech Command Analysis
+Integrating an ASR model with information extraction, we provide a speech command analysis pipeline that shows how to use PaddleNLP and [PaddleSpeech](https://github.com/PaddlePaddle/PaddleSpeech) to solve real Speech + NLP scenarios.
+<div align="center">
+  <img src="https://user-images.githubusercontent.com/11793384/168412618-04897a47-79c9-4fe7-a054-5dc1f6a1f75c.png" width="500">
+</div>
+For more details please refer to [Speech Command Analysis](./applications/speech_cmd_analysis).
+### High Performance Distributed Training and Inference
+#### ⚡ FastTokenizer: High Performance Text Preprocessing Library
+<div align="center">
+  <img src="https://user-images.githubusercontent.com/11793384/168407921-b4395b1d-44bd-41a0-8c58-923ba2b703ef.png" width="400">
+</div>
+```python
+from paddlenlp.transformers import AutoTokenizer
+# use_fast=True selects the C++ tokenizer kernel
+AutoTokenizer.from_pretrained("ernie-3.0-medium-zh", use_fast=True)
+```
+Set `use_fast=True` to use the C++ tokenizer kernel, for a reported 100x speedup in text preprocessing. For more usage please refer to [FastTokenizer](./fast_tokenizer).
+#### ⚡ FastGeneration: High Performance Generation Library
+<div align="center">
+  <img src="https://user-images.githubusercontent.com/11793384/168407831-914dced0-3a5a-40b8-8a65-ec82bf13e53c.gif" width="400">
+</div>
+```python
+from paddlenlp.transformers import GPTLMHeadModel
+model = GPTLMHeadModel.from_pretrained('gpt-cpm-large-cn')
+# inputs_ids: a tensor of token ids prepared beforehand (not shown in this snippet)
+outputs, _ = model.generate(
+    input_ids=inputs_ids, max_length=10, decode_strategy='greedy_search',
+    use_fast=True)
+```
+Set `use_fast=True` for a reported 5x speedup in text generation with Transformer, GPT, BART, PLATO and UniLM. For more usage please refer to [FastGeneration](./fast_generation).
+#### Fleet: 4D Hybrid Distributed Training
+<div align="center">
+  <img src="https://user-images.githubusercontent.com/11793384/168515134-513f13e0-9902-40ef-98fa-528271dcccda.png" width="300">
+</div>
+For more details on pre-training super large-scale models, please refer to [GPT-3](./examples/language_model/gpt-3).
+## Installation
+### Prerequisites
+* python >= 3.7
+* paddlepaddle >= 2.3
+For more information about installing PaddlePaddle, please refer to [PaddlePaddle's Website](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/install/conda/linux-conda.html).
+### Python pip Installation
+```shell
+pip install --upgrade paddlenlp
+```
+or you can install the latest develop branch code with the following command:
+```shell
+pip install --pre --upgrade paddlenlp -f https://www.paddlepaddle.org.cn/whl/paddlenlp.html
+```
+## Quick Start
+**Taskflow** provides off-the-shelf, pre-built NLP tasks covering NLU and NLG scenarios, with extremely fast inference suited to industrial applications.
+```python
+from paddlenlp import Taskflow
+# Chinese Word Segmentation
+# Input: "The 14th National Games are held in Xi'an"
+seg = Taskflow("word_segmentation")
+seg("第十四届全运会在西安举办")
+# >>> ['第十四届', '全运会', '在', '西安', '举办']
+# POS Tagging
+tag = Taskflow("pos_tagging")
+tag("第十四届全运会在西安举办")
+# >>> [('第十四届', 'm'), ('全运会', 'nz'), ('在', 'p'), ('西安', 'LOC'), ('举办', 'v')]
+# Named Entity Recognition
+# Input: "'Orphan Girl' is a novel published in 2010 by Jiuzhou Press; the author is Yu Jianyu"
+ner = Taskflow("ner")
+ner("《孤女》是2010年九州出版社出版的小说，作者是余兼羽")
+# >>> [('《', 'w'), ('孤女', '作品类_实体'), ('》', 'w'), ('是', '肯定词'), ('2010年', '时间类'), ('九州出版社', '组织机构类'), ('出版', '场景事件'), ('的', '助词'), ('小说', '作品类_概念'), ('，', 'w'), ('作者', '人物类_概念'), ('是', '肯定词'), ('余兼羽', '人物类_实体')]
+# Dependency Parsing
+# Input: "On the morning of September 9, Nadal beat Russian player Medvedev at Arthur Ashe Stadium"
+ddp = Taskflow("dependency_parsing")
+ddp("9月9日上午纳达尔在亚瑟·阿什球场击败俄罗斯球员梅德韦杰夫")
+# >>> [{'word': ['9月9日', '上午', '纳达尔', '在', '亚瑟·阿什球场', '击败', '俄罗斯', '球员', '梅德韦杰夫'], 'head': [2, 6, 6, 5, 6, 0, 8, 9, 6], 'deprel': ['ATT', 'ADV', 'SBV', 'MT', 'ADV', 'HED', 'ATT', 'ATT', 'VOB']}]
+# Sentiment Analysis
+# Input: "This product runs really smoothly; I like it a lot"
+senta = Taskflow("sentiment_analysis")
+senta("这个产品用起来真的很流畅，我非常喜欢")
+# >>> [{'text': '这个产品用起来真的很流畅，我非常喜欢', 'label': 'positive', 'score': 0.9938690066337585}]
+```
+## API Reference
+- Supports loading [LUGE](https://www.luge.ai/) datasets and is compatible with Hugging Face [Datasets](https://huggingface.co/datasets). For more details please refer to [Dataset API](https://paddlenlp.readthedocs.io/zh/latest/data_prepare/dataset_list.html).
+- Uses a Hugging Face style API to load 500+ selected Transformer models, with fast downloads. For more information please refer to [Transformers API](https://paddlenlp.readthedocs.io/zh/latest/model_zoo/index.html).
+- Loads pre-trained word embeddings with one line of code. For more usage please refer to [Embedding API](https://paddlenlp.readthedocs.io/zh/latest/model_zoo/embeddings.html).
+The full PaddleNLP API reference is available on [readthedocs](https://paddlenlp.readthedocs.io/).
+## Community
+### Slack
+To connect with other users and contributors, join our [Slack channel](https://paddlenlp.slack.com/).
+### WeChat
+Scan the QR code below with WeChat to join the official technical exchange group. We look forward to your participation.
+<div align="center">
+<img src="https://user-images.githubusercontent.com/11793384/212060369-4642d16e-f0ad-4359-aa57-b8303042f9c1.jpg" width="150" height="150" />
+</div>
+## Citation
+If you find PaddleNLP useful in your research, please consider citing it:
+```
+@misc{paddlenlp,
+  title={PaddleNLP: An Easy-to-use and High Performance NLP Library},
+  author={PaddleNLP Contributors},
+  howpublished = {\url{https://github.com/PaddlePaddle/PaddleNLP}},
+  year={2021}
+}
+```
+## Acknowledgements
+We have borrowed from the excellent design of Hugging Face's [Transformers](https://github.com/huggingface/transformers) for pretrained model usage, and we would like to express our gratitude to the authors of Hugging Face and its open source community.
+## License
+PaddleNLP is provided under the [Apache-2.0 License](./LICENSE).
+
+%prep
+%autosetup -n paddlenlp-2.5.2
+
+%build
+%py3_build
+
+%install
+%py3_install
+install -d -m755 %{buildroot}/%{_pkgdocdir}
+if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
+if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
+if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
+if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
+pushd %{buildroot}
+if [ -d usr/lib ]; then
+    find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/lib64 ]; then
+    find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/bin ]; then
+    find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/sbin ]; then
+    find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+touch doclist.lst
+if [ -d usr/share/man ]; then
+    find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
+fi
+popd
+mv %{buildroot}/filelist.lst .
+mv %{buildroot}/doclist.lst .
+
+%files -n python3-paddlenlp -f filelist.lst
+%dir %{python3_sitelib}/*
+
+%files help -f doclist.lst
+%{_docdir}/*
+
+%changelog
+* Tue Apr 11 2023 Python_Bot <Python_Bot@openeuler.org> - 2.5.2-1
+- Package Spec generated
@@ -0,0 +1 @@
+180cdfcfebfa17d14f5537347a5c0e0f paddlenlp-2.5.2.tar.gz
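The last hunk adds a `sources` entry that pairs a checksum with the tarball name. As a rough illustration of how such an entry could be checked against a downloaded file, here is a minimal sketch. It assumes the 32-hex-digit value is an MD5 digest and that each line has the form `<checksum> <filename>`; neither is stated in the diff itself, and the function names are hypothetical.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_sources_line(line):
    """Split an assumed '<checksum> <filename>' line into its two fields."""
    checksum, filename = line.split(maxsplit=1)
    return checksum.lower(), filename.strip()


def matches_sources(path, sources_line):
    """Return True when the file at `path` has the checksum named in `sources_line`."""
    expected, _ = parse_sources_line(sources_line)
    return md5_of_file(path) == expected
```

For example, `matches_sources("paddlenlp-2.5.2.tar.gz", "180cdfcfebfa17d14f5537347a5c0e0f paddlenlp-2.5.2.tar.gz")` would compare a local copy of the tarball with the value recorded in the commit. Reading in chunks keeps memory use flat for large archives.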