automatic import of python-sadedegel

author: CoprDistGit <infra@openeuler.org> 2023-05-31 04:13:21 +0000
committer: CoprDistGit <infra@openeuler.org> 2023-05-31 04:13:21 +0000
commit: 65991f6089d1b3959672fa5505dd0490ea1c8e4d (patch)
tree: e3dad901119e67de1afef4ec6d08079582c98ef6
parent: 427c9230b5d9653fa1f72d270ad7fc9c606740b3 (diff)
3 files changed, 1173 insertions, 0 deletions
diff --git a/.gitignore b/.gitignore
index e69de29..c229b36 100644
--- a/.gitignore
+++ b/.gitignore
@@ -0,0 +1 @@
+/sadedegel-0.21.2.tar.gz
diff --git a/python-sadedegel.spec b/python-sadedegel.spec
new file mode 100644
index 0000000..2ca53e7
--- /dev/null
+++ b/python-sadedegel.spec
@@ -0,0 +1,1171 @@
+%global _empty_manifest_terminate_build 0
+Name:		python-sadedegel
+Version:	0.21.2
+Release:	1
+Summary:	Extraction-based Turkish news summarizer.
+License:	MIT
+URL:		https://github.com/GlobalMaksimum/sadedegel
+Source0:	https://mirrors.nju.edu.cn/pypi/web/packages/da/1a/138f91345a46559f8130190c83722791da5e36166acc2893a54bf97a8343/sadedegel-0.21.2.tar.gz
+BuildArch:	noarch
+
+Requires:	python3-loguru
+Requires:	python3-click
+Requires:	python3-smart-open
+Requires:	python3-uvicorn
+Requires:	python3-fastapi
+Requires:	python3-scikit-learn
+Requires:	python3-nltk
+Requires:	python3-networkx
+Requires:	python3-tabulate
+Requires:	python3-sadedegel-icu
+Requires:	python3-requests
+Requires:	python3-rich
+Requires:	python3-cached-property
+Requires:	python3-h5py
+Requires:	python3-sentence-transformers
+Requires:	python3-gensim
+
+%description
+<a href="http://sadedegel.ai"><img src="https://sadedegel.ai/assets/img/logo-2.png" width="125" height="125" align="right" /></a>
+
+# SadedeGel: A General Purpose NLP library for Turkish
+
+SadedeGel was initially designed as a library for unsupervised extraction-based news summarization using several old and new NLP techniques.
+
+Development of the library started as a part of [Açık Kaynak Hackathon Programı 2020](https://www.acikhack.com/) in which SadedeGel was the **2nd place winner**.
+
+We keep adding features with the goal of becoming a general-purpose open source NLP library for the Turkish language.
+
+
+💫 **Version 0.21 out now!**
+[Check out the release notes here.](https://github.com/GlobalMaksimum/sadedegel/releases)
+
+
+![Python package](https://github.com/GlobalMaksimum/sadedegel/workflows/Python%20package/badge.svg)
+[![Python Version](https://img.shields.io/pypi/pyversions/sadedegel?style=plastic)](https://img.shields.io/pypi/pyversions/sadedegel)
+[![Coverage](https://codecov.io/gh/globalmaksimum/sadedegel/branch/master/graphs/badge.svg?style=plastic)](https://codecov.io/gh/globalmaksimum/sadedegel)
+[![pypi Version](https://img.shields.io/pypi/v/sadedegel?style=plastic&logo=PyPI)](https://pypi.org/project/sadedegel/)
+[![PyPi downloads](https://img.shields.io/pypi/dm/sadedegel?style=plastic&logo=PyPI)](https://pypi.org/project/sadedegel/)
+[![License](https://img.shields.io/pypi/l/sadedegel)](https://github.com/GlobalMaksimum/sadedegel/blob/master/LICENSE)
+![Commit Month](https://img.shields.io/github/commit-activity/m/globalmaksimum/sadedegel?style=plastic&logo=GitHub)
+![Commit Week](https://img.shields.io/github/commit-activity/w/globalmaksimum/sadedegel?style=plastic&logo=GitHub)
+![Last Commit](https://img.shields.io/github/last-commit/globalmaksimum/sadedegel?style=plastic&logo=GitHub)
+[![Binder](https://mybinder.org/badge_logo.svg?style=plastic)](https://mybinder.org/v2/gh/GlobalMaksimum/sadedegel.git/master?filepath=notebook%2FBasics.ipynb)
+[![Slack](https://img.shields.io/static/v1?logo=slack&style=plastic&color=blueviolet&label=slack&labelColor=grey&message=sadedegel)](https://join.slack.com/t/sadedegel/shared_invite/zt-h77u6aeq-VzEorB5QLHyJV90Fv4Ky3A)
+[![Kaggle](http://img.shields.io/static/v1?logo=kaggle&style=plastic&color=blue&label=kaggle&labelColor=grey&message=notebooks)](https://www.kaggle.com/search?q=sadedegel+in%3Anotebooks)
+
+
+## 📖 Documentation
+
+| Documentation   |                                                                |
+| --------------- | -------------------------------------------------------------- |
+| [Contribute]    | How to contribute to the sadedeGel project and code base.          |
+
+[contribute]: https://github.com/GlobalMaksimum/sadedegel/blob/master/CONTRIBUTING.md
+
+## 💬 Where to ask questions
+
+The SadedeGel project was initiated by [@globalmaksimum](https://github.com/GlobalMaksimum) AI team members
+[@dafajon](https://github.com/dafajon),
+[@askarbozcan](https://github.com/askarbozcan),
+[@mccakir](https://github.com/mccakir),
+[@husnusensoy](https://github.com/husnusensoy) and 
+[@ertugruldemir](https://github.com/ertugrul-dmr).
+
+
+Other community maintainers
+
+* [@doruktiktiklar](https://github.com/doruktiktiklar) contributes [TFIDF Summarizer](sadedegel/summarize/tf_idf.py)
+
+| Type                     | Platforms                                              |
+| ------------------------ | ------------------------------------------------------ |
+| 🚨 **Bug Reports**       | [GitHub Issue Tracker]                                 |
+| 🎁 **Feature Requests**  | [GitHub Issue Tracker]                                 |
+| <img width="18" height="18" src="https://www.freeiconspng.com/uploads/slack-icon-2.png"/> **Questions**  | [Slack Workspace]                                 |
+
+[github issue tracker]: https://github.com/GlobalMaksimum/sadedegel/issues
+[Slack Workspace]: https://join.slack.com/t/sadedegel/shared_invite/zt-h77u6aeq-VzEorB5QLHyJV90Fv4Ky3A
+
+
+## Features
+
+* Several datasets
+  * Basic corpus
+      * Raw corpus (`sadedegel.dataset.load_raw_corpus`)
+      * Sentences tokenized corpus (`sadedegel.dataset.load_sentences_corpus`)  
+      * Human annotated summary corpus (`sadedegel.dataset.load_annotated_corpus`)   
+  * [Extended corpus](sadedegel/dataset/README.md)
+      * Raw corpus (`sadedegel.dataset.extended.load_extended_raw_corpus`)
+      * Sentences tokenized corpus (`sadedegel.dataset.extended.load_extended_sents_corpus`)
+      
+  * TsCorpus(`sadedegel.dataset.tscorpus`)
+      * Thanks to [Taner Sezer](https://github.com/tanerim), over 300K documents from tscorpus are also part of sadedegel, allowing us to
+        * [Evaluate](sadedegel/bblock/TOKENIZER.md) our tokenizers (word tokenizers)
+        * Build our [prebuilt news category classifier](sadedegel/prebuilt/README.md) 
+  * Various domain specific [datasets](https://github.com/GlobalMaksimum/sadedegel/tree/develop/sadedegel/dataset) (e-commerce, social media, tourism etc.) 
+* ML based sentence boundary detector (**SBD**) trained for Turkish language
+* Sadedegel Extractive Summarizers
+  * Various baseline summarizers
+    * Position Summarizer
+    * Length Summarizer
+    * Band Summarizer
+    * Random Summarizer
+  
+  * Various unsupervised/supervised summarizers
+    * ROUGE1 Summarizer
+    * TextRank Summarizer
+    * Cluster Summarizer
+    * Lexrank Summarizer
+    * BM25 Summarizer
+    * TfIdf Summarizer
+ 
+* Various Word Tokenizers
+  * BERT Tokenizer - Trained tokenizer (`pip install sadedegel[bert]`)
+  * Simple Tokenizer - Regex Based
+  * ICU Tokenizer (default since `0.19`)
+  
+* Various Sparse and Dense Embeddings implemented for `Sentences` and `Document` objects.
+  * BERT Embeddings (`pip install sadedegel[bert]`)
+  * TfIdf Embeddings
+ 
+* Word Vectors for your tokens (`pip install sadedegel[w2v]`)
+
+* A `sklearn` compatible [Feature Extraction API](https://github.com/GlobalMaksimum/sadedegel/tree/develop/sadedegel/extension) 
+  
+* [**Experimental**] Prebuilt models for several common NLP tasks ([`sadedegel.prebuilt`](sadedegel/prebuilt/README.md)).
+
+```python
+from sadedegel.prebuilt import news_classification
+
+model = news_classification.load()
+
+doc_str = ("Bilişim sektörü, günlük devrimlerin yaşandığı ve hızına yetişilemeyen dev bir alan haline geleli uzun bir zaman olmadı. Günümüz bilgisayarlarının tarihi, yarım asırı yeni tamamlarken; yaşanan gelişmeler çok "
+"daha büyük ölçekte. Türkiye de bu gelişmelere 1960 yılında Karayolları Umum Müdürlüğü (şimdiki Karayolları Genel Müdürlüğü) için IBM’den satın aldığı ilk bilgisayarıyla dahil oldu. IBM 650 Model I adını taşıyan bilgisayarın "
+"satın alınma amacı ise yol yapımında gereken hesaplamaların daha hızlı yapılmasıydı. Türkiye’nin ilk bilgisayar destekli karayolu olan 63 km uzunluğundaki Polatlı - Sivrihisar yolu için yapılan hesaplamalar IBM 650 ile 1 saatte yapıldı. "
+"Daha öncesinde 3 - 4 ayı bulan hesaplamaların 1 saate inmesi; teknolojinin, ekonomik ve toplumsal dönüşüme büyük etkide bulunacağının habercisiydi.")
+
+y_pred = model.predict([doc_str])
+```
+
+📖 **For more details, refer to [sadedegel.ai](http://sadedegel.ai)**
+
+## Install sadedeGel
+
+- **Operating system**: macOS / OS X · Linux · Windows (Cygwin, MinGW, Visual
+  Studio)
+- **Python version**: 3.6+ (only 64 bit)
+- **Package managers**: [pip] 
+
+[pip]: https://pypi.org/project/sadedegel/
+
+### pip
+
+Using pip, sadedeGel releases are available as source packages and binary wheels.
+
+```bash
+pip install sadedegel
+```
+or upgrade an existing installation:
+
+```bash
+pip install sadedegel -U
+```
+
+When using pip it is generally recommended to install packages in a virtual
+environment to avoid modifying system state:
+
+```bash
+python -m venv .env
+source .env/bin/activate
+pip install sadedegel
+```
+
+#### Vocabulary Dump
+
+Certain attributes of SadedeGel's NLP objects depend on shipped vocabulary dumps that are created over `sadedegel.dataset.extended_corpus` by each of the existing SadedeGel tokenizers, which are listed above. If you want to re-train a specific tokenizer's vocabulary with custom settings:
+
+```bash
+python -m sadedegel.bblock.cli build-vocabulary -t [bert|icu|simple] 
+```
+
+This will create a vocabulary dump using `sadedegel.dataset.extended_corpus` based on custom user settings.
+
+For all options to customize your vocabulary dump, refer to:
+
+```bash
+python -m sadedegel.bblock.cli build-vocabulary --help 
+```
+
+#### Optional
+
+To keep core sadedegel as light as possible, we decomposed our initial monolithic design.
+
+To enable BERT embeddings and related capabilities use
+
+```bash
+pip install sadedegel[bert]
+```
+
+We ship 100-dimension word vectors with the library. If you need to re-train those word embeddings you can use
+
+```bash
+python -m sadedegel.bblock.cli build-vocabulary -t [bert|icu|simple] --w2v
+```
+This will create a vocabulary dump with keyed vectors of arbitrary size using `sadedegel.dataset.extended_corpus` based on custom user settings.
+
+The `--w2v` option requires the `w2v` extra to be installed. To install it, use
+
+
+```bash
+pip install sadedegel[w2v]
+```
+
+### Quickstart with SadedeGel
+
+To load SadedeGel, use `sadedegel.load()`
+
+```python
+from sadedegel import Doc
+from sadedegel.dataset import load_raw_corpus
+from sadedegel.summarize import Rouge1Summarizer
+
+raw = load_raw_corpus()
+
+d = Doc(next(raw))
+
+summarizer = Rouge1Summarizer()
+summarizer(d, k=5)
+```
+
+To trigger the sadedeGel NLP pipeline, initialize a `Doc` instance with a document string.
+
+Access all sentences using the Python built-in `list` function.
+
+```python
+from sadedegel import Doc
+
+doc_str = ("Bilişim sektörü, günlük devrimlerin yaşandığı ve hızına yetişilemeyen dev bir alan haline geleli uzun bir zaman olmadı. Günümüz bilgisayarlarının tarihi, yarım asırı yeni tamamlarken; yaşanan gelişmeler çok "
+"daha büyük ölçekte. Türkiye de bu gelişmelere 1960 yılında Karayolları Umum Müdürlüğü (şimdiki Karayolları Genel Müdürlüğü) için IBM’den satın aldığı ilk bilgisayarıyla dahil oldu. IBM 650 Model I adını taşıyan bilgisayarın "
+"satın alınma amacı ise yol yapımında gereken hesaplamaların daha hızlı yapılmasıydı. Türkiye’nin ilk bilgisayar destekli karayolu olan 63 km uzunluğundaki Polatlı - Sivrihisar yolu için yapılan hesaplamalar IBM 650 ile 1 saatte yapıldı. "
+"Daha öncesinde 3 - 4 ayı bulan hesaplamaların 1 saate inmesi; teknolojinin, ekonomik ve toplumsal dönüşüme büyük etkide bulunacağının habercisiydi.")
+
+doc = Doc(doc_str)
+
+list(doc)
+```
+```python
+['Bilişim sektörü, günlük devrimlerin yaşandığı ve hızına yetişilemeyen dev bir alan haline geleli uzun bir zaman olmadı.',
+ 'Günümüz bilgisayarlarının tarihi, yarım asırı yeni tamamlarken; yaşanan gelişmeler çok daha büyük ölçekte.',
+ 'Türkiye de bu gelişmelere 1960 yılında Karayolları Umum Müdürlüğü (şimdiki Karayolları Genel Müdürlüğü) için IBM’den satın aldığı ilk bilgisayarıyla dahil oldu.',
+ 'IBM 650 Model I adını taşıyan bilgisayarın satın alınma amacı ise yol yapımında gereken hesaplamaların daha hızlı yapılmasıydı.',
+ 'Türkiye’nin ilk bilgisayar destekli karayolu olan 63 km uzunluğundaki Polatlı - Sivrihisar yolu için yapılan hesaplamalar IBM 650 ile 1 saatte yapıldı.',
+ 'Daha öncesinde 3 - 4 ayı bulan hesaplamaların 1 saate inmesi; teknolojinin, ekonomik ve toplumsal dönüşüme büyük etkide bulunacağının habercisiydi.']
+```
+
+Access sentences by index.
+
+```python
+doc[2]
+```
+
+```python
+Türkiye de bu gelişmelere 1960 yılında Karayolları Umum Müdürlüğü (şimdiki Karayolları Genel Müdürlüğü) için IBM’den satın aldığı ilk bilgisayarıyla dahil oldu.
+```
+
+## SadedeGel Server
+To integrate with your applications, sadedeGel provides a quick summarizer server.
+
+```bash
+python3 -m sadedegel.server 
+```
+
+### SadedeGel Server on Heroku
+[SadedeGel Server](https://sadedegel.herokuapp.com/api/info) is hosted on the free tier of [Heroku](https://heroku.com) cloud services.
+
+* [OpenAPI Documentation](https://sadedegel.herokuapp.com/docs)
+* [Redoc Documentation](https://sadedegel.herokuapp.com/redoc)
+* [Redirection to sadedegel.ai](https://sadedegel.herokuapp.com)
+
+## PyLint, Flake8 and Bandit
+sadedeGel uses [pylint](https://www.pylint.org/) for static code analysis,
+[flake8](https://flake8.pycqa.org/en/latest) for code styling and [bandit](https://pypi.org/project/bandit)
+for code security checks.
+
+To run all lint checks
+
+```bash
+make lint
+```
+
+## Run tests
+
+sadedeGel comes with an [extensive test suite](sadedegel/tests). In order to run the
+tests, you'll usually want to clone the repository and build sadedeGel from source.
+This will also install the required development dependencies and test utilities
+defined in the `requirements.txt`.
+
+Alternatively, you can find out where sadedeGel is installed and run `pytest` on
+that directory. Don't forget to also install the test utilities via sadedeGel's
+`requirements.txt`:
+
+```bash
+make test
+```
+
+## 📓 Kaggle
+
+* Check the [comprehensive notebook](https://www.kaggle.com/datafan07/clickbait-news-classification-using-sadedegel) by Kaggle Master [Ertugrul Demir](https://www.kaggle.com/datafan07) explaining the capabilities of sadedegel on a Turkish clickbait dataset
+
+
+## Youtube Channel
+Some videos from [sadedeGel YouTube Channel](https://www.youtube.com/channel/UCyNG1Mehl44XWZ8LzkColuw)
+
+### SkyLab YTU Webinar Playlist
+
+[![Youtube](https://img.shields.io/youtube/likes/xoEERspk6Is?label=SadedeGel%20Subprojects%20(Turkish)&style=social&withDislikes)](https://www.youtube.com/watch?v=xoEERspk6Is)
+
+[![Youtube](https://img.shields.io/youtube/likes/HfWIzAwf5u8?label=SadedeGel%20Scraper%20(Turkish)&style=social&withDislikes)](https://www.youtube.com/watch?v=HfWIzAwf5u8)
+
+[![Youtube](https://img.shields.io/youtube/likes/PkUmYhahiMw?label=SadedeGel%20Evaluation-nDCG%20(Turkish)&style=social&withDislikes)](https://www.youtube.com/watch?v=PkUmYhahiMw)
+
+[![Youtube](https://img.shields.io/youtube/likes/AxpK7fOndRQ?label=SadedeGel%20Annotator%20(Turkish)&style=social&withDislikes)](https://www.youtube.com/watch?v=AxpK7fOndRQ)
+
+[![Youtube](https://img.shields.io/youtube/likes/jKh_t9ZOJ-g?label=SadedeGel%20Baseline%20Özetleyiciler%20(Turkish)&style=social&withDislikes)](https://www.youtube.com/watch?v=jKh_t9ZOJ-g)
+
+[![Youtube](https://img.shields.io/youtube/likes/3DO1X7de1FI?label=SadedeGel%20ROUGE1%20Özetleyici%20(Turkish)&style=social&withDislikes)](https://www.youtube.com/watch?v=3DO1X7de1FI)
+
+[![Youtube](https://img.shields.io/youtube/likes/KGg3DJQVH9c?label=SadedeGel%20Kümeleme%20Bazlı%20Özetleyiciler%20(Turkish)&style=social&withDislikes)](https://www.youtube.com/watch?v=KGg3DJQVH9c)
+
+[![Youtube](https://img.shields.io/youtube/likes/G_erifsGGFs?label=SadedeGel%20BERT%20Embeddings%20(Turkish)&style=social&withDislikes)](https://www.youtube.com/watch?v=G_erifsGGFs)
+
+## References
+
+### Special Thanks
+
+* [Starlang Software](https://starlangyazilim.com/) for their contribution to open source Turkish NLP development and corpus preparation.
+
+* [Olcay Taner Yıldız, Ph.D.](https://github.com/olcaytaner), one of our referees in [Açık Kaynak Hackathon Programı 2020](https://www.acikhack.com/), for helping our development on sadedegel.
+
+* [Taner Sezer](https://github.com/tanerim) for his contribution on tokenization corpus and labeled news corpus.
+
+### Our Community Contributors
+
+We would like to thank our community contributors for their bug reports, enhancement requests and questions, which make sadedeGel better every day
+
+* [Burak Işıklı](https://github.com/burakisikli)
+
+### Software Engineering
+* Special thanks to [spaCy](https://github.com/explosion/spaCy) project for their work in showing us the way to implement a proper python module rather than merely explaining it.
+    * We have borrowed much documentation and style-related material from their code base :smile:
+    
+* There are a few free-tier service providers we need to thank:
+  * [GitHub](https://github.com) for
+      * Hosting our projects.
+      * Making it possible to collaborate easily.
+      * Automating our SLM via [Github Actions](https://github.com/features/actions)
+  * [Google Cloud Google Storage Service](https://cloud.google.com/products/storage) for providing low cost storage buckets making it possible to store `sadedegel.dataset.extended` data.
+  * [Heroku](https://heroku.com) for hosting [sadedeGel Server](https://sadedegel.herokuapp.com/api/info) in their free tier dynos.
+  * [CodeCov](https://codecov.io/) for allowing us to transparently share our [test coverage](https://codecov.io/gh/globalmaksimum/sadedegel)
+  * [PyPI](https://pypi.org/) for allowing us to share [sadedegel](https://pypi.org/project/sadedegel) with you.
+  * [binder](https://mybinder.org/) for 
+     * Allowing us to share our example [notebooks](notebook/)
+     * Hosting our learn by example boxes in [sadedegel.ai](http://sadedegel.ai) 
+    
+### Machine Learning (ML), Deep Learning (DL) and Natural Language Processing (NLP)
+* Resources on Extractive Text Summarization:
+
+    * [Leveraging BERT for Extractive Text Summarization on Lectures](https://arxiv.org/abs/1906.04165)  by Derek Miller
+    * [Fine-tune BERT for Extractive Summarization](https://arxiv.org/pdf/1903.10318.pdf) by Yang Liu
+
+* Other NLP related references
+
+    * [ROUGE: A Package for Automatic Evaluation of Summaries](https://www.aclweb.org/anthology/W04-1013.pdf)
+    * [Speech and Language Processing, Second Edition](https://web.stanford.edu/~jurafsky/slp3/)
+
+
+
+
+%package -n python3-sadedegel
+Summary:	Extraction-based Turkish news summarizer.
+Provides:	python-sadedegel
+BuildRequires:	python3-devel
+BuildRequires:	python3-setuptools
+BuildRequires:	python3-pip
+%description -n python3-sadedegel
+<a href="http://sadedegel.ai"><img src="https://sadedegel.ai/assets/img/logo-2.png" width="125" height="125" align="right" /></a>
+
+# SadedeGel: A General Purpose NLP library for Turkish
+
+SadedeGel was initially designed as a library for unsupervised extraction-based news summarization using several old and new NLP techniques.
+
+Development of the library started as a part of [Açık Kaynak Hackathon Programı 2020](https://www.acikhack.com/) in which SadedeGel was the **2nd place winner**.
+
+We keep adding features with the goal of becoming a general-purpose open source NLP library for the Turkish language.
+
+
+💫 **Version 0.21 out now!**
+[Check out the release notes here.](https://github.com/GlobalMaksimum/sadedegel/releases)
+
+
+![Python package](https://github.com/GlobalMaksimum/sadedegel/workflows/Python%20package/badge.svg)
+[![Python Version](https://img.shields.io/pypi/pyversions/sadedegel?style=plastic)](https://img.shields.io/pypi/pyversions/sadedegel)
+[![Coverage](https://codecov.io/gh/globalmaksimum/sadedegel/branch/master/graphs/badge.svg?style=plastic)](https://codecov.io/gh/globalmaksimum/sadedegel)
+[![pypi Version](https://img.shields.io/pypi/v/sadedegel?style=plastic&logo=PyPI)](https://pypi.org/project/sadedegel/)
+[![PyPi downloads](https://img.shields.io/pypi/dm/sadedegel?style=plastic&logo=PyPI)](https://pypi.org/project/sadedegel/)
+[![License](https://img.shields.io/pypi/l/sadedegel)](https://github.com/GlobalMaksimum/sadedegel/blob/master/LICENSE)
+![Commit Month](https://img.shields.io/github/commit-activity/m/globalmaksimum/sadedegel?style=plastic&logo=GitHub)
+![Commit Week](https://img.shields.io/github/commit-activity/w/globalmaksimum/sadedegel?style=plastic&logo=GitHub)
+![Last Commit](https://img.shields.io/github/last-commit/globalmaksimum/sadedegel?style=plastic&logo=GitHub)
+[![Binder](https://mybinder.org/badge_logo.svg?style=plastic)](https://mybinder.org/v2/gh/GlobalMaksimum/sadedegel.git/master?filepath=notebook%2FBasics.ipynb)
+[![Slack](https://img.shields.io/static/v1?logo=slack&style=plastic&color=blueviolet&label=slack&labelColor=grey&message=sadedegel)](https://join.slack.com/t/sadedegel/shared_invite/zt-h77u6aeq-VzEorB5QLHyJV90Fv4Ky3A)
+[![Kaggle](http://img.shields.io/static/v1?logo=kaggle&style=plastic&color=blue&label=kaggle&labelColor=grey&message=notebooks)](https://www.kaggle.com/search?q=sadedegel+in%3Anotebooks)
+
+
+## 📖 Documentation
+
+| Documentation   |                                                                |
+| --------------- | -------------------------------------------------------------- |
+| [Contribute]    | How to contribute to the sadedeGel project and code base.          |
+
+[contribute]: https://github.com/GlobalMaksimum/sadedegel/blob/master/CONTRIBUTING.md
+
+## 💬 Where to ask questions
+
+The SadedeGel project was initiated by [@globalmaksimum](https://github.com/GlobalMaksimum) AI team members
+[@dafajon](https://github.com/dafajon),
+[@askarbozcan](https://github.com/askarbozcan),
+[@mccakir](https://github.com/mccakir),
+[@husnusensoy](https://github.com/husnusensoy) and 
+[@ertugruldemir](https://github.com/ertugrul-dmr).
+
+
+Other community maintainers
+
+* [@doruktiktiklar](https://github.com/doruktiktiklar) contributes [TFIDF Summarizer](sadedegel/summarize/tf_idf.py)
+
+| Type                     | Platforms                                              |
+| ------------------------ | ------------------------------------------------------ |
+| 🚨 **Bug Reports**       | [GitHub Issue Tracker]                                 |
+| 🎁 **Feature Requests**  | [GitHub Issue Tracker]                                 |
+| <img width="18" height="18" src="https://www.freeiconspng.com/uploads/slack-icon-2.png"/> **Questions**  | [Slack Workspace]                                 |
+
+[github issue tracker]: https://github.com/GlobalMaksimum/sadedegel/issues
+[Slack Workspace]: https://join.slack.com/t/sadedegel/shared_invite/zt-h77u6aeq-VzEorB5QLHyJV90Fv4Ky3A
+
+
+## Features
+
+* Several datasets
+  * Basic corpus
+      * Raw corpus (`sadedegel.dataset.load_raw_corpus`)
+      * Sentences tokenized corpus (`sadedegel.dataset.load_sentences_corpus`)  
+      * Human annotated summary corpus (`sadedegel.dataset.load_annotated_corpus`)   
+  * [Extended corpus](sadedegel/dataset/README.md)
+      * Raw corpus (`sadedegel.dataset.extended.load_extended_raw_corpus`)
+      * Sentences tokenized corpus (`sadedegel.dataset.extended.load_extended_sents_corpus`)
+      
+  * TsCorpus(`sadedegel.dataset.tscorpus`)
+      * Thanks to [Taner Sezer](https://github.com/tanerim), over 300K documents from tscorpus are also part of sadedegel, allowing us to
+        * [Evaluate](sadedegel/bblock/TOKENIZER.md) our tokenizers (word tokenizers)
+        * Build our [prebuilt news category classifier](sadedegel/prebuilt/README.md) 
+  * Various domain specific [datasets](https://github.com/GlobalMaksimum/sadedegel/tree/develop/sadedegel/dataset) (e-commerce, social media, tourism etc.) 
+* ML based sentence boundary detector (**SBD**) trained for Turkish language
+* Sadedegel Extractive Summarizers
+  * Various baseline summarizers
+    * Position Summarizer
+    * Length Summarizer
+    * Band Summarizer
+    * Random Summarizer
+  
+  * Various unsupervised/supervised summarizers
+    * ROUGE1 Summarizer
+    * TextRank Summarizer
+    * Cluster Summarizer
+    * Lexrank Summarizer
+    * BM25 Summarizer
+    * TfIdf Summarizer
+ 
+* Various Word Tokenizers
+  * BERT Tokenizer - Trained tokenizer (`pip install sadedegel[bert]`)
+  * Simple Tokenizer - Regex Based
+  * ICU Tokenizer (default since `0.19`)
+  
+* Various Sparse and Dense Embeddings implemented for `Sentences` and `Document` objects.
+  * BERT Embeddings (`pip install sadedegel[bert]`)
+  * TfIdf Embeddings
+ 
+* Word Vectors for your tokens (`pip install sadedegel[w2v]`)
+
+* A `sklearn` compatible [Feature Extraction API](https://github.com/GlobalMaksimum/sadedegel/tree/develop/sadedegel/extension) 
+  
+* [**Experimental**] Prebuilt models for several common NLP tasks ([`sadedegel.prebuilt`](sadedegel/prebuilt/README.md)).
+
+```python
+from sadedegel.prebuilt import news_classification
+
+model = news_classification.load()
+
+doc_str = ("Bilişim sektörü, günlük devrimlerin yaşandığı ve hızına yetişilemeyen dev bir alan haline geleli uzun bir zaman olmadı. Günümüz bilgisayarlarının tarihi, yarım asırı yeni tamamlarken; yaşanan gelişmeler çok "
+"daha büyük ölçekte. Türkiye de bu gelişmelere 1960 yılında Karayolları Umum Müdürlüğü (şimdiki Karayolları Genel Müdürlüğü) için IBM’den satın aldığı ilk bilgisayarıyla dahil oldu. IBM 650 Model I adını taşıyan bilgisayarın "
+"satın alınma amacı ise yol yapımında gereken hesaplamaların daha hızlı yapılmasıydı. Türkiye’nin ilk bilgisayar destekli karayolu olan 63 km uzunluğundaki Polatlı - Sivrihisar yolu için yapılan hesaplamalar IBM 650 ile 1 saatte yapıldı. "
+"Daha öncesinde 3 - 4 ayı bulan hesaplamaların 1 saate inmesi; teknolojinin, ekonomik ve toplumsal dönüşüme büyük etkide bulunacağının habercisiydi.")
+
+y_pred = model.predict([doc_str])
+```
+
+📖 **For more details, refer to [sadedegel.ai](http://sadedegel.ai)**
+
+## Install sadedeGel
+
+- **Operating system**: macOS / OS X · Linux · Windows (Cygwin, MinGW, Visual
+  Studio)
+- **Python version**: 3.6+ (only 64 bit)
+- **Package managers**: [pip] 
+
+[pip]: https://pypi.org/project/sadedegel/
+
+### pip
+
+Using pip, sadedeGel releases are available as source packages and binary wheels.
+
+```bash
+pip install sadedegel
+```
+or upgrade an existing installation:
+
+```bash
+pip install sadedegel -U
+```
+
+When using pip it is generally recommended to install packages in a virtual
+environment to avoid modifying system state:
+
+```bash
+python -m venv .env
+source .env/bin/activate
+pip install sadedegel
+```
+
+#### Vocabulary Dump
+
+Certain attributes of SadedeGel's NLP objects depend on shipped vocabulary dumps that are created over `sadedegel.dataset.extended_corpus` by each of the existing SadedeGel tokenizers, which are listed above. If you want to re-train a specific tokenizer's vocabulary with custom settings:
+
+```bash
+python -m sadedegel.bblock.cli build-vocabulary -t [bert|icu|simple] 
+```
+
+This will create a vocabulary dump using `sadedegel.dataset.extended_corpus` based on custom user settings.
+
+For all options to customize your vocabulary dump, refer to:
+
+```bash
+python -m sadedegel.bblock.cli build-vocabulary --help 
+```
+
+#### Optional
+
+To keep core sadedegel as light as possible, we decomposed our initial monolithic design.
+
+To enable BERT embeddings and related capabilities use
+
+```bash
+pip install sadedegel[bert]
+```
+
+We ship 100-dimension word vectors with the library. If you need to re-train those word embeddings you can use
+
+```bash
+python -m sadedegel.bblock.cli build-vocabulary -t [bert|icu|simple] --w2v
+```
+This will create a vocabulary dump with keyed vectors of arbitrary size using `sadedegel.dataset.extended_corpus` based on custom user settings.
+
+The `--w2v` option requires the `w2v` extra to be installed. To install it, use
+
+
+```bash
+pip install sadedegel[w2v]
+```
+
+### Quickstart with SadedeGel
+
+To load SadedeGel, use `sadedegel.load()`
+
+```python
+from sadedegel import Doc
+from sadedegel.dataset import load_raw_corpus
+from sadedegel.summarize import Rouge1Summarizer
+
+raw = load_raw_corpus()
+
+d = Doc(next(raw))
+
+summarizer = Rouge1Summarizer()
+summarizer(d, k=5)
+```
+
+To trigger the sadedeGel NLP pipeline, initialize a `Doc` instance with a document string.
+
+Access all sentences using the Python built-in `list` function.
+
+```python
+from sadedegel import Doc
+
+doc_str = ("Bilişim sektörü, günlük devrimlerin yaşandığı ve hızına yetişilemeyen dev bir alan haline geleli uzun bir zaman olmadı. Günümüz bilgisayarlarının tarihi, yarım asırı yeni tamamlarken; yaşanan gelişmeler çok "
+"daha büyük ölçekte. Türkiye de bu gelişmelere 1960 yılında Karayolları Umum Müdürlüğü (şimdiki Karayolları Genel Müdürlüğü) için IBM’den satın aldığı ilk bilgisayarıyla dahil oldu. IBM 650 Model I adını taşıyan bilgisayarın "
+"satın alınma amacı ise yol yapımında gereken hesaplamaların daha hızlı yapılmasıydı. Türkiye’nin ilk bilgisayar destekli karayolu olan 63 km uzunluğundaki Polatlı - Sivrihisar yolu için yapılan hesaplamalar IBM 650 ile 1 saatte yapıldı. "
+"Daha öncesinde 3 - 4 ayı bulan hesaplamaların 1 saate inmesi; teknolojinin, ekonomik ve toplumsal dönüşüme büyük etkide bulunacağının habercisiydi.")
+
+doc = Doc(doc_str)
+
+list(doc)
+```
+```python
+['Bilişim sektörü, günlük devrimlerin yaşandığı ve hızına yetişilemeyen dev bir alan haline geleli uzun bir zaman olmadı.',
+ 'Günümüz bilgisayarlarının tarihi, yarım asırı yeni tamamlarken; yaşanan gelişmeler çok daha büyük ölçekte.',
+ 'Türkiye de bu gelişmelere 1960 yılında Karayolları Umum Müdürlüğü (şimdiki Karayolları Genel Müdürlüğü) için IBM’den satın aldığı ilk bilgisayarıyla dahil oldu.',
+ 'IBM 650 Model I adını taşıyan bilgisayarın satın alınma amacı ise yol yapımında gereken hesaplamaların daha hızlı yapılmasıydı.',
+ 'Türkiye’nin ilk bilgisayar destekli karayolu olan 63 km uzunluğundaki Polatlı - Sivrihisar yolu için yapılan hesaplamalar IBM 650 ile 1 saatte yapıldı.',
+ 'Daha öncesinde 3 - 4 ayı bulan hesaplamaların 1 saate inmesi; teknolojinin, ekonomik ve toplumsal dönüşüme büyük etkide bulunacağının habercisiydi.']
+```
+
+Access sentences by index.
+
+```python
+doc[2]
+```
+
+```python
+Türkiye de bu gelişmelere 1960 yılında Karayolları Umum Müdürlüğü (şimdiki Karayolları Genel Müdürlüğü) için IBM’den satın aldığı ilk bilgisayarıyla dahil oldu.
+```
+
+## SadedeGel Server
+To integrate with your applications, sadedeGel provides a quick summarizer server.
+
+```bash
+python3 -m sadedegel.server 
+```
+
+### SadedeGel Server on Heroku
+[SadedeGel Server](https://sadedegel.herokuapp.com/api/info) is hosted on the free tier of [Heroku](https://heroku.com) cloud services.
+
+* [OpenAPI Documentation](https://sadedegel.herokuapp.com/docs)
+* [Redoc Documentation](https://sadedegel.herokuapp.com/redoc)
+* [Redirection to sadedegel.ai](https://sadedegel.herokuapp.com)
+
+## PyLint, Flake8 and Bandit
+sadedeGel utilizes [pylint](https://www.pylint.org/) for static code analysis,
+[flake8](https://flake8.pycqa.org/en/latest) for code styling and [bandit](https://pypi.org/project/bandit)
+for code security checks.
+
+To run all linters
+
+```bash
+make lint
+```
+
+## Run tests
+
+sadedeGel comes with an [extensive test suite](sadedegel/tests). In order to run the
+tests, you'll usually want to clone the repository and build sadedeGel from source.
+This will also install the required development dependencies and test utilities
+defined in the `requirements.txt`.
+
+Alternatively, you can find out where sadedeGel is installed and run `pytest` on
+that directory. Don't forget to also install the test utilities via sadedeGel's
+`requirements.txt`:
+
+```bash
+make test
+```
+
+## 📓 Kaggle
+
+* Check the [comprehensive notebook](https://www.kaggle.com/datafan07/clickbait-news-classification-using-sadedegel) by Kaggle Master [Ertugrul Demir](https://www.kaggle.com/datafan07) explaining the capabilities of sadedegel on a Turkish clickbait dataset.
+
+
+## Youtube Channel
+Some videos from [sadedeGel YouTube Channel](https://www.youtube.com/channel/UCyNG1Mehl44XWZ8LzkColuw)
+
+### SkyLab YTU Webinar Playlist
+
+[![Youtube](https://img.shields.io/youtube/likes/xoEERspk6Is?label=SadedeGel%20Subprojects%20(Turkish)&style=social&withDislikes)](https://www.youtube.com/watch?v=xoEERspk6Is)
+
+[![Youtube](https://img.shields.io/youtube/likes/HfWIzAwf5u8?label=SadedeGel%20Scraper%20(Turkish)&style=social&withDislikes)](https://www.youtube.com/watch?v=HfWIzAwf5u8)
+
+[![Youtube](https://img.shields.io/youtube/likes/PkUmYhahiMw?label=SadedeGel%20Evaluation-nDCG%20(Turkish)&style=social&withDislikes)](https://www.youtube.com/watch?v=PkUmYhahiMw)
+
+[![Youtube](https://img.shields.io/youtube/likes/AxpK7fOndRQ?label=SadedeGel%20Annotator%20(Turkish)&style=social&withDislikes)](https://www.youtube.com/watch?v=AxpK7fOndRQ)
+
+[![Youtube](https://img.shields.io/youtube/likes/jKh_t9ZOJ-g?label=SadedeGel%20Baseline%20Özetleyiciler%20(Turkish)&style=social&withDislikes)](https://www.youtube.com/watch?v=jKh_t9ZOJ-g)
+
+[![Youtube](https://img.shields.io/youtube/likes/3DO1X7de1FI?label=SadedeGel%20ROUGE1%20Özetleyici%20(Turkish)&style=social&withDislikes)](https://www.youtube.com/watch?v=3DO1X7de1FI)
+
+[![Youtube](https://img.shields.io/youtube/likes/KGg3DJQVH9c?label=SadedeGel%20Kümeleme%20Bazlı%20Özetleyiciler%20(Turkish)&style=social&withDislikes)](https://www.youtube.com/watch?v=KGg3DJQVH9c)
+
+[![Youtube](https://img.shields.io/youtube/likes/G_erifsGGFs?label=SadedeGel%20BERT%20Embeddings%20(Turkish)&style=social&withDislikes)](https://www.youtube.com/watch?v=G_erifsGGFs)
+
+## References
+
+### Special Thanks
+
+* [Starlang Software](https://starlangyazilim.com/) for their contribution to open source Turkish NLP development and corpus preparation.
+
+* [Olcay Taner Yıldız, Ph.D.](https://github.com/olcaytaner), one of our referees in [Açık Kaynak Hackathon Programı 2020](https://www.acikhack.com/), for helping our development on sadedegel.
+
+* [Taner Sezer](https://github.com/tanerim) for his contribution on tokenization corpus and labeled news corpus.
+
+### Our Community Contributors
+
+We would like to thank our community contributors for their bug/enhancement requests and questions that make sadedeGel better every day.
+
+* [Burak Işıklı](https://github.com/burakisikli)
+
+### Software Engineering
+* Special thanks to the [spaCy](https://github.com/explosion/spaCy) project for showing us how to implement a proper Python module rather than merely explaining it.
+    * We have borrowed a lot of documentation- and style-related material from their code base :smile:
+    
+* There are a few free-tier service providers we need to thank:
+  * [GitHub](https://github.com) for
+      * Hosting our projects.
+      * Making it possible to collaborate easily.
+      * Automating our SLM via [Github Actions](https://github.com/features/actions)
+  * [Google Cloud Google Storage Service](https://cloud.google.com/products/storage) for providing low cost storage buckets making it possible to store `sadedegel.dataset.extended` data.
+  * [Heroku](https://heroku.com) for hosting [sadedeGel Server](https://sadedegel.herokuapp.com/api/info) in their free tier dynos.
+  * [CodeCov](https://codecov.io/) for allowing us to transparently share our [test coverage](https://codecov.io/gh/globalmaksimum/sadedegel)
+  * [PyPI](https://pypi.org/) for allowing us to share [sadedegel](https://pypi.org/project/sadedegel) with you.
+  * [binder](https://mybinder.org/) for 
+     * Allowing us to share our example [notebooks](notebook/)
+     * Hosting our learn by example boxes in [sadedegel.ai](http://sadedegel.ai) 
+    
+### Machine Learning (ML), Deep Learning (DL) and Natural Language Processing (NLP)
+* Resources on Extractive Text Summarization:
+
+    * [Leveraging BERT for Extractive Text Summarization on Lectures](https://arxiv.org/abs/1906.04165)  by Derek Miller
+    * [Fine-tune BERT for Extractive Summarization](https://arxiv.org/pdf/1903.10318.pdf) by Yang Liu
+
+* Other NLP related references
+
+    * [ROUGE: A Package for Automatic Evaluation of Summaries](https://www.aclweb.org/anthology/W04-1013.pdf)
+    * [Speech and Language Processing, Second Edition](https://web.stanford.edu/~jurafsky/slp3/)
+
+
+
+
+%package help
+Summary:	Development documents and examples for sadedegel
+Provides:	python3-sadedegel-doc
+%description help
+<a href="http://sadedegel.ai"><img src="https://sadedegel.ai/assets/img/logo-2.png" width="125" height="125" align="right" /></a>
+
+# SadedeGel: A General Purpose NLP library for Turkish
+
+SadedeGel is initially designed to be a library for unsupervised extraction-based news summarization using several old and new NLP techniques.
+
+Development of the library started as a part of [Açık Kaynak Hackathon Programı 2020](https://www.acikhack.com/) in which SadedeGel was the **2nd place winner**.
+
+We keep adding features with the goal of becoming a general-purpose open source NLP library for the Turkish language.
+
+
+💫 **Version 0.21 out now!**
+[Check out the release notes here.](https://github.com/GlobalMaksimum/sadedegel/releases)
+
+
+![Python package](https://github.com/GlobalMaksimum/sadedegel/workflows/Python%20package/badge.svg)
+[![Python Version](https://img.shields.io/pypi/pyversions/sadedegel?style=plastic)](https://img.shields.io/pypi/pyversions/sadedegel)
+[![Coverage](https://codecov.io/gh/globalmaksimum/sadedegel/branch/master/graphs/badge.svg?style=plastic)](https://codecov.io/gh/globalmaksimum/sadedegel)
+[![pypi Version](https://img.shields.io/pypi/v/sadedegel?style=plastic&logo=PyPI)](https://pypi.org/project/sadedegel/)
+[![PyPi downloads](https://img.shields.io/pypi/dm/sadedegel?style=plastic&logo=PyPI)](https://pypi.org/project/sadedegel/)
+[![License](https://img.shields.io/pypi/l/sadedegel)](https://github.com/GlobalMaksimum/sadedegel/blob/master/LICENSE)
+![Commit Month](https://img.shields.io/github/commit-activity/m/globalmaksimum/sadedegel?style=plastic&logo=GitHub)
+![Commit Week](https://img.shields.io/github/commit-activity/w/globalmaksimum/sadedegel?style=plastic&logo=GitHub)
+![Last Commit](https://img.shields.io/github/last-commit/globalmaksimum/sadedegel?style=plastic&logo=GitHub)
+[![Binder](https://mybinder.org/badge_logo.svg?style=plastic)](https://mybinder.org/v2/gh/GlobalMaksimum/sadedegel.git/master?filepath=notebook%2FBasics.ipynb)
+[![Slack](https://img.shields.io/static/v1?logo=slack&style=plastic&color=blueviolet&label=slack&labelColor=grey&message=sadedegel)](https://join.slack.com/t/sadedegel/shared_invite/zt-h77u6aeq-VzEorB5QLHyJV90Fv4Ky3A)
+[![Kaggle](http://img.shields.io/static/v1?logo=kaggle&style=plastic&color=blue&label=kaggle&labelColor=grey&message=notebooks)](https://www.kaggle.com/search?q=sadedegel+in%3Anotebooks)
+
+
+## 📖 Documentation
+
+| Documentation   |                                                                |
+| --------------- | -------------------------------------------------------------- |
+| [Contribute]    | How to contribute to the sadedeGel project and code base.          |
+
+[contribute]: https://github.com/GlobalMaksimum/sadedegel/blob/master/CONTRIBUTING.md
+
+## 💬 Where to ask questions
+
+The SadedeGel project was initiated by [@globalmaksimum](https://github.com/GlobalMaksimum) AI team members
+[@dafajon](https://github.com/dafajon),
+[@askarbozcan](https://github.com/askarbozcan),
+[@mccakir](https://github.com/mccakir),
+[@husnusensoy](https://github.com/husnusensoy) and 
+[@ertugruldemir](https://github.com/ertugrul-dmr).
+
+
+Other community maintainers
+
+* [@doruktiktiklar](https://github.com/doruktiktiklar) contributes [TFIDF Summarizer](sadedegel/summarize/tf_idf.py)
+
+| Type                     | Platforms                                              |
+| ------------------------ | ------------------------------------------------------ |
+| 🚨 **Bug Reports**       | [GitHub Issue Tracker]                                 |
+| 🎁 **Feature Requests**  | [GitHub Issue Tracker]                                 |
+| <img width="18" height="18" src="https://www.freeiconspng.com/uploads/slack-icon-2.png"/> **Questions**  | [Slack Workspace]                                 |
+
+[github issue tracker]: https://github.com/GlobalMaksimum/sadedegel/issues
+[Slack Workspace]: https://join.slack.com/t/sadedegel/shared_invite/zt-h77u6aeq-VzEorB5QLHyJV90Fv4Ky3A
+
+
+## Features
+
+* Several datasets
+  * Basic corpus
+      * Raw corpus (`sadedegel.dataset.load_raw_corpus`)
+      * Sentences tokenized corpus (`sadedegel.dataset.load_sentences_corpus`)  
+      * Human annotated summary corpus (`sadedegel.dataset.load_annotated_corpus`)   
+  * [Extended corpus](sadedegel/dataset/README.md)
+      * Raw corpus (`sadedegel.dataset.extended.load_extended_raw_corpus`)
+      * Sentences tokenized corpus (`sadedegel.dataset.extended.load_extended_sents_corpus`)
+      
+  * TsCorpus (`sadedegel.dataset.tscorpus`)
+      * Thanks to [Taner Sezer](https://github.com/tanerim), over 300K documents from tscorpus are also a part of sadedegel, allowing us to
+        * [Evaluate](sadedegel/bblock/TOKENIZER.md) our tokenizers (word tokenizers)
+        * Build our [prebuilt news category classifier](sadedegel/prebuilt/README.md) 
+  * Various domain specific [datasets](https://github.com/GlobalMaksimum/sadedegel/tree/develop/sadedegel/dataset) (e-commerce, social media, tourism etc.) 
+* ML based sentence boundary detector (**SBD**) trained for Turkish language
+* Sadedegel Extractive Summarizers
+  * Various baseline summarizers
+    * Position Summarizer
+    * Length Summarizer
+    * Band Summarizer
+    * Random Summarizer
+  
+  * Various unsupervised/supervised summarizers
+    * ROUGE1 Summarizer
+    * TextRank Summarizer
+    * Cluster Summarizer
+    * Lexrank Summarizer
+    * BM25 Summarizer
+    * TfIdf Summarizer
+ 
+* Various Word Tokenizers
+  * BERT Tokenizer - Trained tokenizer (`pip install sadedegel[bert]`)
+  * Simple Tokenizer - Regex Based
+  * ICU Tokenizer (default since `0.19`)
+  
+* Various Sparse and Dense Embeddings implemented for `Sentences` and `Document` objects.
+  * BERT Embeddings (`pip install sadedegel[bert]`)
+  * TfIdf Embeddings
+ 
+* Word Vectors for your tokens (`pip install sadedegel[w2v]`)
+
+* A `sklearn` compatible [Feature Extraction API](https://github.com/GlobalMaksimum/sadedegel/tree/develop/sadedegel/extension) 
+  
+* [**Experimental**] Prebuilt models for several common NLP tasks ([`sadedegel.prebuilt`](sadedegel/prebuilt/README.md)).
+
+```python
+from sadedegel.prebuilt import news_classification
+
+model = news_classification.load()
+
+doc_str = ("Bilişim sektörü, günlük devrimlerin yaşandığı ve hızına yetişilemeyen dev bir alan haline geleli uzun bir zaman olmadı. Günümüz bilgisayarlarının tarihi, yarım asırı yeni tamamlarken; yaşanan gelişmeler çok "
+"daha büyük ölçekte. Türkiye de bu gelişmelere 1960 yılında Karayolları Umum Müdürlüğü (şimdiki Karayolları Genel Müdürlüğü) için IBM’den satın aldığı ilk bilgisayarıyla dahil oldu. IBM 650 Model I adını taşıyan bilgisayarın "
+"satın alınma amacı ise yol yapımında gereken hesaplamaların daha hızlı yapılmasıydı. Türkiye’nin ilk bilgisayar destekli karayolu olan 63 km uzunluğundaki Polatlı - Sivrihisar yolu için yapılan hesaplamalar IBM 650 ile 1 saatte yapıldı. "
+"Daha öncesinde 3 - 4 ayı bulan hesaplamaların 1 saate inmesi; teknolojinin, ekonomik ve toplumsal dönüşüme büyük etkide bulunacağının habercisiydi.")
+
+y_pred = model.predict([doc_str])
+```
+
+📖 **For more details, refer to [sadedegel.ai](http://sadedegel.ai)**
+
+## Install sadedeGel
+
+- **Operating system**: macOS / OS X · Linux · Windows (Cygwin, MinGW, Visual
+  Studio)
+- **Python version**: 3.6+ (only 64 bit)
+- **Package managers**: [pip] 
+
+[pip]: https://pypi.org/project/sadedegel/
+
+### pip
+
+Using pip, sadedeGel releases are available as source packages and binary wheels.
+
+```bash
+pip install sadedegel
+```
+or update now
+
+```bash
+pip install sadedegel -U
+```
+
+When using pip it is generally recommended to install packages in a virtual
+environment to avoid modifying system state:
+
+```bash
+python -m venv .env
+source .env/bin/activate
+pip install sadedegel
+```
+
+#### Vocabulary Dump
+
+Certain attributes of SadedeGel's NLP objects depend on shipped vocabulary dumps that are created over `sadedegel.dataset.extended_corpus` via each of the existing SadedeGel tokenizers. Those tokenizers are listed above. If you want to re-train a specific tokenizer's vocabulary with custom settings:
+
+```bash
+python -m sadedegel.bblock.cli build-vocabulary -t [bert|icu|simple] 
+```
+
+This will create a vocabulary dump using `sadedegel.dataset.extended_corpus` based on custom user settings.
+
+For all options to customize your vocab dump refer to:
+
+```bash
+python -m sadedegel.bblock.cli build-vocabulary --help 
+```
+
+#### Optional
+
+To keep core sadedegel as light as possible, we decomposed our initial monolithic design.
+
+To enable BERT embeddings and related capabilities use
+
+```bash
+pip install sadedegel[bert]
+```
+
+We ship 100-dimension word vectors with the library. If you need to re-train those word embeddings you can use
+
+```bash
+python -m sadedegel.bblock.cli build-vocabulary -t [bert|icu|simple] --w2v
+```
+
+This will create a vocabulary dump with keyed vectors of arbitrary size using `sadedegel.dataset.extended_corpus` based on custom user settings.
+
+The `--w2v` option requires the `w2v` extra to be installed. To install it, use
+
+```bash
+pip install sadedegel[w2v]
+```
+
+### Quickstart with SadedeGel
+
+To load SadedeGel, use `sadedegel.load()`
+
+```python
+from sadedegel import Doc
+from sadedegel.dataset import load_raw_corpus
+from sadedegel.summarize import Rouge1Summarizer
+
+raw = load_raw_corpus()
+
+d = Doc(next(raw))
+
+summarizer = Rouge1Summarizer()
+summarizer(d, k=5)
+```
+
+To trigger the sadedeGel NLP pipeline, initialize a `Doc` instance with a document string.
+
+Access all sentences using the Python built-in `list` function.
+
+```python
+from sadedegel import Doc
+
+doc_str = ("Bilişim sektörü, günlük devrimlerin yaşandığı ve hızına yetişilemeyen dev bir alan haline geleli uzun bir zaman olmadı. Günümüz bilgisayarlarının tarihi, yarım asırı yeni tamamlarken; yaşanan gelişmeler çok "
+"daha büyük ölçekte. Türkiye de bu gelişmelere 1960 yılında Karayolları Umum Müdürlüğü (şimdiki Karayolları Genel Müdürlüğü) için IBM’den satın aldığı ilk bilgisayarıyla dahil oldu. IBM 650 Model I adını taşıyan bilgisayarın "
+"satın alınma amacı ise yol yapımında gereken hesaplamaların daha hızlı yapılmasıydı. Türkiye’nin ilk bilgisayar destekli karayolu olan 63 km uzunluğundaki Polatlı - Sivrihisar yolu için yapılan hesaplamalar IBM 650 ile 1 saatte yapıldı. "
+"Daha öncesinde 3 - 4 ayı bulan hesaplamaların 1 saate inmesi; teknolojinin, ekonomik ve toplumsal dönüşüme büyük etkide bulunacağının habercisiydi.")
+
+doc = Doc(doc_str)
+
+list(doc)
+```
+```python
+['Bilişim sektörü, günlük devrimlerin yaşandığı ve hızına yetişilemeyen dev bir alan haline geleli uzun bir zaman olmadı.',
+ 'Günümüz bilgisayarlarının tarihi, yarım asırı yeni tamamlarken; yaşanan gelişmeler çok daha büyük ölçekte.',
+ 'Türkiye de bu gelişmelere 1960 yılında Karayolları Umum Müdürlüğü (şimdiki Karayolları Genel Müdürlüğü) için IBM’den satın aldığı ilk bilgisayarıyla dahil oldu.',
+ 'IBM 650 Model I adını taşıyan bilgisayarın satın alınma amacı ise yol yapımında gereken hesaplamaların daha hızlı yapılmasıydı.',
+ 'Türkiye’nin ilk bilgisayar destekli karayolu olan 63 km uzunluğundaki Polatlı - Sivrihisar yolu için yapılan hesaplamalar IBM 650 ile 1 saatte yapıldı.',
+ 'Daha öncesinde 3 - 4 ayı bulan hesaplamaların 1 saate inmesi; teknolojinin, ekonomik ve toplumsal dönüşüme büyük etkide bulunacağının habercisiydi.']
+```
+
+Access sentences by index.
+
+```python
+doc[2]
+```
+
+```python
+Türkiye de bu gelişmelere 1960 yılında Karayolları Umum Müdürlüğü (şimdiki Karayolları Genel Müdürlüğü) için IBM’den satın aldığı ilk bilgisayarıyla dahil oldu.
+```
+
+## SadedeGel Server
+To integrate sadedeGel with your applications, we provide a quick summarizer server.
+
+```bash
+python3 -m sadedegel.server 
+```
+
+### SadedeGel Server on Heroku
+[SadedeGel Server](https://sadedegel.herokuapp.com/api/info) is hosted on the free tier of [Heroku](https://heroku.com) cloud services.
+
+* [OpenAPI Documentation](https://sadedegel.herokuapp.com/docs)
+* [Redoc Documentation](https://sadedegel.herokuapp.com/redoc)
+* [Redirection to sadedegel.ai](https://sadedegel.herokuapp.com)
+
+## PyLint, Flake8 and Bandit
+sadedeGel utilizes [pylint](https://www.pylint.org/) for static code analysis,
+[flake8](https://flake8.pycqa.org/en/latest) for code styling and [bandit](https://pypi.org/project/bandit)
+for code security checks.
+
+To run all linters
+
+```bash
+make lint
+```
+
+## Run tests
+
+sadedeGel comes with an [extensive test suite](sadedegel/tests). In order to run the
+tests, you'll usually want to clone the repository and build sadedeGel from source.
+This will also install the required development dependencies and test utilities
+defined in the `requirements.txt`.
+
+Alternatively, you can find out where sadedeGel is installed and run `pytest` on
+that directory. Don't forget to also install the test utilities via sadedeGel's
+`requirements.txt`:
+
+```bash
+make test
+```
+
+## 📓 Kaggle
+
+* Check the [comprehensive notebook](https://www.kaggle.com/datafan07/clickbait-news-classification-using-sadedegel) by Kaggle Master [Ertugrul Demir](https://www.kaggle.com/datafan07) explaining the capabilities of sadedegel on a Turkish clickbait dataset.
+
+
+## Youtube Channel
+Some videos from [sadedeGel YouTube Channel](https://www.youtube.com/channel/UCyNG1Mehl44XWZ8LzkColuw)
+
+### SkyLab YTU Webinar Playlist
+
+[![Youtube](https://img.shields.io/youtube/likes/xoEERspk6Is?label=SadedeGel%20Subprojects%20(Turkish)&style=social&withDislikes)](https://www.youtube.com/watch?v=xoEERspk6Is)
+
+[![Youtube](https://img.shields.io/youtube/likes/HfWIzAwf5u8?label=SadedeGel%20Scraper%20(Turkish)&style=social&withDislikes)](https://www.youtube.com/watch?v=HfWIzAwf5u8)
+
+[![Youtube](https://img.shields.io/youtube/likes/PkUmYhahiMw?label=SadedeGel%20Evaluation-nDCG%20(Turkish)&style=social&withDislikes)](https://www.youtube.com/watch?v=PkUmYhahiMw)
+
+[![Youtube](https://img.shields.io/youtube/likes/AxpK7fOndRQ?label=SadedeGel%20Annotator%20(Turkish)&style=social&withDislikes)](https://www.youtube.com/watch?v=AxpK7fOndRQ)
+
+[![Youtube](https://img.shields.io/youtube/likes/jKh_t9ZOJ-g?label=SadedeGel%20Baseline%20Özetleyiciler%20(Turkish)&style=social&withDislikes)](https://www.youtube.com/watch?v=jKh_t9ZOJ-g)
+
+[![Youtube](https://img.shields.io/youtube/likes/3DO1X7de1FI?label=SadedeGel%20ROUGE1%20Özetleyici%20(Turkish)&style=social&withDislikes)](https://www.youtube.com/watch?v=3DO1X7de1FI)
+
+[![Youtube](https://img.shields.io/youtube/likes/KGg3DJQVH9c?label=SadedeGel%20Kümeleme%20Bazlı%20Özetleyiciler%20(Turkish)&style=social&withDislikes)](https://www.youtube.com/watch?v=KGg3DJQVH9c)
+
+[![Youtube](https://img.shields.io/youtube/likes/G_erifsGGFs?label=SadedeGel%20BERT%20Embeddings%20(Turkish)&style=social&withDislikes)](https://www.youtube.com/watch?v=G_erifsGGFs)
+
+## References
+
+### Special Thanks
+
+* [Starlang Software](https://starlangyazilim.com/) for their contribution to open source Turkish NLP development and corpus preparation.
+
+* [Olcay Taner Yıldız, Ph.D.](https://github.com/olcaytaner), one of our referees in [Açık Kaynak Hackathon Programı 2020](https://www.acikhack.com/), for helping our development on sadedegel.
+
+* [Taner Sezer](https://github.com/tanerim) for his contribution on tokenization corpus and labeled news corpus.
+
+### Our Community Contributors
+
+We would like to thank our community contributors for their bug/enhancement requests and questions that make sadedeGel better every day.
+
+* [Burak Işıklı](https://github.com/burakisikli)
+
+### Software Engineering
+* Special thanks to the [spaCy](https://github.com/explosion/spaCy) project for showing us how to implement a proper Python module rather than merely explaining it.
+    * We have borrowed a lot of documentation- and style-related material from their code base :smile:
+    
+* There are a few free-tier service providers we need to thank:
+  * [GitHub](https://github.com) for
+      * Hosting our projects.
+      * Making it possible to collaborate easily.
+      * Automating our SLM via [Github Actions](https://github.com/features/actions)
+  * [Google Cloud Google Storage Service](https://cloud.google.com/products/storage) for providing low cost storage buckets making it possible to store `sadedegel.dataset.extended` data.
+  * [Heroku](https://heroku.com) for hosting [sadedeGel Server](https://sadedegel.herokuapp.com/api/info) in their free tier dynos.
+  * [CodeCov](https://codecov.io/) for allowing us to transparently share our [test coverage](https://codecov.io/gh/globalmaksimum/sadedegel)
+  * [PyPI](https://pypi.org/) for allowing us to share [sadedegel](https://pypi.org/project/sadedegel) with you.
+  * [binder](https://mybinder.org/) for 
+     * Allowing us to share our example [notebooks](notebook/)
+     * Hosting our learn by example boxes in [sadedegel.ai](http://sadedegel.ai) 
+    
+### Machine Learning (ML), Deep Learning (DL) and Natural Language Processing (NLP)
+* Resources on Extractive Text Summarization:
+
+    * [Leveraging BERT for Extractive Text Summarization on Lectures](https://arxiv.org/abs/1906.04165)  by Derek Miller
+    * [Fine-tune BERT for Extractive Summarization](https://arxiv.org/pdf/1903.10318.pdf) by Yang Liu
+
+* Other NLP related references
+
+    * [ROUGE: A Package for Automatic Evaluation of Summaries](https://www.aclweb.org/anthology/W04-1013.pdf)
+    * [Speech and Language Processing, Second Edition](https://web.stanford.edu/~jurafsky/slp3/)
+
+
+
+
+%prep
+%autosetup -n sadedegel-0.21.2
+
+%build
+%py3_build
+
+%install
+%py3_install
+install -d -m755 %{buildroot}/%{_pkgdocdir}
+if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
+if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
+if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
+if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
+pushd %{buildroot}
+if [ -d usr/lib ]; then
+	find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/lib64 ]; then
+	find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/bin ]; then
+	find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/sbin ]; then
+	find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+touch doclist.lst
+if [ -d usr/share/man ]; then
+	find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
+fi
+popd
+mv %{buildroot}/filelist.lst .
+mv %{buildroot}/doclist.lst .
+
+%files -n python3-sadedegel -f filelist.lst
+%dir %{python3_sitelib}/*
+
+%files help -f doclist.lst
+%{_docdir}/*
+
+%changelog
+* Wed May 31 2023 Python_Bot <Python_Bot@openeuler.org> - 0.21.2-1
+- Package Spec generated
diff --git a/sources b/sources
new file mode 100644
index 0000000..a57676f
--- /dev/null
+++ b/sources
@@ -0,0 +1 @@
+5c5e317121482a1938f0b1e73c5d51e1  sadedegel-0.21.2.tar.gz
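
The `sources` entry above pairs a 32-character hex digest, which has the shape of an MD5 checksum, with the tarball name. The following is a minimal sketch, assuming that `md5sum`-style layout, of how such a line could be parsed and checked against a downloaded file. The helper names `parse_sources_line` and `verify_checksum` are hypothetical and are not part of any packaging tool; the self-check runs on a throwaway file because the real tarball is not assumed to be present.

```python
import hashlib
import os
import tempfile


def parse_sources_line(line):
    """Split one md5sum-style line into (digest, filename). Hypothetical helper."""
    digest, filename = line.split(None, 1)
    return digest.lower(), filename.strip()


def verify_checksum(path, expected_digest, chunk_size=65536):
    """Return True when the MD5 digest of the file at `path` matches. Hypothetical helper."""
    md5 = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so a large tarball is not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest() == expected_digest.lower()


# Parse the entry recorded in this commit (assumed to be "<md5>  <filename>").
digest, filename = parse_sources_line(
    "5c5e317121482a1938f0b1e73c5d51e1  sadedegel-0.21.2.tar.gz"
)

# Exercise verify_checksum on a temporary file with known content.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"sadedegel")
    tmp_path = tmp.name
try:
    known = hashlib.md5(b"sadedegel").hexdigest()
    matches = verify_checksum(tmp_path, known)    # digest of the same bytes
    mismatch = verify_checksum(tmp_path, digest)  # unrelated digest from `sources`
finally:
    os.remove(tmp_path)
```

To check a real download, one would call `verify_checksum(filename, digest)` in the directory that holds the tarball.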