-rw-r--r--  .gitignore             |    1
-rw-r--r--  python-sadedegel.spec  | 1171
-rw-r--r--  sources                |    1
3 files changed, 1173 insertions, 0 deletions
diff --git a/.gitignore b/.gitignore
index e69de29..c229b36 100644
--- a/.gitignore
+++ b/.gitignore
@@ -0,0 +1 @@
+/sadedegel-0.21.2.tar.gz
diff --git a/python-sadedegel.spec b/python-sadedegel.spec
new file mode 100644
index 0000000..2ca53e7
--- /dev/null
+++ b/python-sadedegel.spec
@@ -0,0 +1,1171 @@
+%global _empty_manifest_terminate_build 0
+Name: python-sadedegel
+Version: 0.21.2
+Release: 1
+Summary: Extraction-based Turkish news summarizer.
+License: MIT
+URL: https://github.com/GlobalMaksimum/sadedegel
+Source0: https://mirrors.nju.edu.cn/pypi/web/packages/da/1a/138f91345a46559f8130190c83722791da5e36166acc2893a54bf97a8343/sadedegel-0.21.2.tar.gz
+BuildArch: noarch
+
+Requires: python3-loguru
+Requires: python3-click
+Requires: python3-smart-open
+Requires: python3-uvicorn
+Requires: python3-fastapi
+Requires: python3-scikit-learn
+Requires: python3-nltk
+Requires: python3-networkx
+Requires: python3-tabulate
+Requires: python3-sadedegel-icu
+Requires: python3-requests
+Requires: python3-rich
+Requires: python3-cached-property
+Requires: python3-h5py
+Requires: python3-sentence-transformers
+Requires: python3-gensim
+
+%description
+<a href="http://sadedegel.ai"><img src="https://sadedegel.ai/assets/img/logo-2.png" width="125" height="125" align="right" /></a>
+
+# SadedeGel: A General Purpose NLP library for Turkish
+
+SadedeGel was initially designed as a library for unsupervised, extraction-based news summarization using several old and new NLP techniques.
+
+Development of the library started as a part of [Açık Kaynak Hackathon Programı 2020](https://www.acikhack.com/) in which SadedeGel was the **2nd place winner**.
+
+We keep adding features with the goal of becoming a general-purpose open-source NLP library for the Turkish language.
+
+
+💫 **Version 0.21 out now!**
+[Check out the release notes here.](https://github.com/GlobalMaksimum/sadedegel/releases)
+
+
+![Python package](https://github.com/GlobalMaksimum/sadedegel/workflows/Python%20package/badge.svg)
+[![Python Version](https://img.shields.io/pypi/pyversions/sadedegel?style=plastic)](https://img.shields.io/pypi/pyversions/sadedegel)
+[![Coverage](https://codecov.io/gh/globalmaksimum/sadedegel/branch/master/graphs/badge.svg?style=plastic)](https://codecov.io/gh/globalmaksimum/sadedegel)
+[![pypi Version](https://img.shields.io/pypi/v/sadedegel?style=plastic&logo=PyPI)](https://pypi.org/project/sadedegel/)
+[![PyPi downloads](https://img.shields.io/pypi/dm/sadedegel?style=plastic&logo=PyPI)](https://pypi.org/project/sadedegel/)
+[![License](https://img.shields.io/pypi/l/sadedegel)](https://github.com/GlobalMaksimum/sadedegel/blob/master/LICENSE)
+![Commit Month](https://img.shields.io/github/commit-activity/m/globalmaksimum/sadedegel?style=plastic&logo=GitHub)
+![Commit Week](https://img.shields.io/github/commit-activity/w/globalmaksimum/sadedegel?style=plastic&logo=GitHub)
+![Last Commit](https://img.shields.io/github/last-commit/globalmaksimum/sadedegel?style=plastic&logo=GitHub)
+[![Binder](https://mybinder.org/badge_logo.svg?style=plastic)](https://mybinder.org/v2/gh/GlobalMaksimum/sadedegel.git/master?filepath=notebook%2FBasics.ipynb)
+[![Slack](https://img.shields.io/static/v1?logo=slack&style=plastic&color=blueviolet&label=slack&labelColor=grey&message=sadedegel)](https://join.slack.com/t/sadedegel/shared_invite/zt-h77u6aeq-VzEorB5QLHyJV90Fv4Ky3A)
+[![Kaggle](http://img.shields.io/static/v1?logo=kaggle&style=plastic&color=blue&label=kaggle&labelColor=grey&message=notebooks)](https://www.kaggle.com/search?q=sadedegel+in%3Anotebooks)
+
+
+## 📖 Documentation
+
+| Documentation | |
+| --------------- | -------------------------------------------------------------- |
+| [Contribute] | How to contribute to the sadedeGel project and code base. |
+
+[contribute]: https://github.com/GlobalMaksimum/sadedegel/blob/master/CONTRIBUTING.md
+
+## 💬 Where to ask questions
+
+The SadedeGel project was initiated by [@globalmaksimum](https://github.com/GlobalMaksimum) AI team members
+[@dafajon](https://github.com/dafajon),
+[@askarbozcan](https://github.com/askarbozcan),
+[@mccakir](https://github.com/mccakir),
+[@husnusensoy](https://github.com/husnusensoy) and
+[@ertugruldemir](https://github.com/ertugrul-dmr).
+
+
+Other community maintainers
+
+* [@doruktiktiklar](https://github.com/doruktiktiklar) contributed the [TFIDF Summarizer](sadedegel/summarize/tf_idf.py)
+
+| Type | Platforms |
+| ------------------------ | ------------------------------------------------------ |
+| 🚨 **Bug Reports** | [GitHub Issue Tracker] |
+| 🎁 **Feature Requests** | [GitHub Issue Tracker] |
+| <img width="18" height="18" src="https://www.freeiconspng.com/uploads/slack-icon-2.png"/> **Questions** | [Slack Workspace] |
+
+[github issue tracker]: https://github.com/GlobalMaksimum/sadedegel/issues
+[Slack Workspace]: https://join.slack.com/t/sadedegel/shared_invite/zt-h77u6aeq-VzEorB5QLHyJV90Fv4Ky3A
+
+
+## Features
+
+* Several datasets
+ * Basic corpus
+ * Raw corpus (`sadedegel.dataset.load_raw_corpus`)
+ * Sentences tokenized corpus (`sadedegel.dataset.load_sentences_corpus`)
+ * Human annotated summary corpus (`sadedegel.dataset.load_annotated_corpus`)
+ * [Extended corpus](sadedegel/dataset/README.md)
+ * Raw corpus (`sadedegel.dataset.extended.load_extended_raw_corpus`)
+ * Sentences tokenized corpus (`sadedegel.dataset.extended.load_extended_sents_corpus`)
+
+ * TsCorpus (`sadedegel.dataset.tscorpus`)
+ * Thanks to [Taner Sezer](https://github.com/tanerim), over 300K documents from tscorpus are also part of sadedegel, allowing us to
+ * [Evaluate](sadedegel/bblock/TOKENIZER.md) our tokenizers (word tokenizers)
+ * Build our [prebuilt news category classifier](sadedegel/prebuilt/README.md)
+ * Various domain specific [datasets](https://github.com/GlobalMaksimum/sadedegel/tree/develop/sadedegel/dataset) (e-commerce, social media, tourism etc.)
+* ML-based sentence boundary detector (**SBD**) trained for the Turkish language (exercised in the short sketch after this list)
+* Sadedegel Extractive Summarizers
+ * Various baseline summarizers
+ * Position Summarizer
+ * Length Summarizer
+ * Band Summarizer
+ * Random Summarizer
+
+ * Various unsupervised/supervised summarizers
+ * ROUGE1 Summarizer
+ * TextRank Summarizer
+ * Cluster Summarizer
+ * Lexrank Summarizer
+ * BM25 Summarizer
+ * TfIdf Summarizer
+
+* Various Word Tokenizers
+ * BERT Tokenizer - Trained tokenizer (`pip install sadedegel[bert]`)
+ * Simple Tokenizer - Regex Based
+ * ICU Tokenizer (default since `0.19`)
+
+* Various Sparse and Dense Embeddings implemented for `Sentences` and `Document` objects.
+ * BERT Embeddings (`pip install sadedegel[bert]`)
+ * TfIdf Embeddings
+
+* Word Vectors for your tokens (`pip install sadedegel[w2v]`)
+
+* A `sklearn` compatible [Feature Extraction API](https://github.com/GlobalMaksimum/sadedegel/tree/develop/sadedegel/extension)
+
+* [**Experimental**] Prebuilt models for several common NLP tasks ([`sadedegel.prebuilt`](sadedegel/prebuilt/README.md)).
+
+```python
+from sadedegel.prebuilt import news_classification
+
+model = news_classification.load()
+
+doc_str = ("Bilişim sektörü, günlük devrimlerin yaşandığı ve hızına yetişilemeyen dev bir alan haline geleli uzun bir zaman olmadı. Günümüz bilgisayarlarının tarihi, yarım asırı yeni tamamlarken; yaşanan gelişmeler çok "
+"daha büyük ölçekte. Türkiye de bu gelişmelere 1960 yılında Karayolları Umum Müdürlüğü (şimdiki Karayolları Genel Müdürlüğü) için IBM’den satın aldığı ilk bilgisayarıyla dahil oldu. IBM 650 Model I adını taşıyan bilgisayarın "
+"satın alınma amacı ise yol yapımında gereken hesaplamaların daha hızlı yapılmasıydı. Türkiye’nin ilk bilgisayar destekli karayolu olan 63 km uzunluğundaki Polatlı - Sivrihisar yolu için yapılan hesaplamalar IBM 650 ile 1 saatte yapıldı. "
+"Daha öncesinde 3 - 4 ayı bulan hesaplamaların 1 saate inmesi; teknolojinin, ekonomik ve toplumsal dönüşüme büyük etkide bulunacağının habercisiydi.")
+
+y_pred = model.predict([doc_str])
+```
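+
+As a quick smoke test of the corpus loaders and the sentence boundary detector listed above, here is a minimal sketch. It relies only on objects already shown in this README (`load_raw_corpus` yields plain document strings and `Doc` splits them into sentences); the number of documents streamed is an arbitrary choice.
+
+```python
+from itertools import islice
+
+from sadedegel import Doc
+from sadedegel.dataset import load_raw_corpus
+
+# Stream the first three raw documents and report how many sentences the
+# sadedegel SBD finds in each.
+for i, text in enumerate(islice(load_raw_corpus(), 3)):
+    doc = Doc(text)
+    print(f"document {i}: {len(list(doc))} sentences")
+```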
+
+📖 **For more details, refer to [sadedegel.ai](http://sadedegel.ai)**
+
+## Install sadedeGel
+
+- **Operating system**: macOS / OS X · Linux · Windows (Cygwin, MinGW, Visual
+ Studio)
+- **Python version**: 3.6+ (only 64 bit)
+- **Package managers**: [pip]
+
+[pip]: https://pypi.org/project/sadedegel/
+
+### pip
+
+Using pip, sadedeGel releases are available as source packages and binary wheels.
+
+```bash
+pip install sadedegel
+```
+or update to the latest version:
+
+```bash
+pip install sadedegel -U
+```
+
+When using pip it is generally recommended to install packages in a virtual
+environment to avoid modifying system state:
+
+```bash
+python -m venv .env
+source .env/bin/activate
+pip install sadedegel
+```
+
+#### Vocabulary Dump
+
+Certain attributes of SadedeGel's NLP objects depend on shipped vocabulary dumps that are created from `sadedegel.dataset.extended_corpus` by each of the existing SadedeGel tokenizers. Those tokenizers are listed above. If you want to re-train a specific tokenizer's vocabulary with custom settings:
+
+```bash
+python -m sadedegel.bblock.cli build-vocabulary -t [bert|icu|simple]
+```
+
+This will create a vocabulary dump using `sadedegel.dataset.extended_corpus` based on custom user settings.
+
+For all options to customize your vocab dump refer to:
+
+```bash
+python -m sadedegel.bblock.cli build-vocabulary --help
+```
+
+#### Optional
+
+To keep core sadedegel as light as possible, we decomposed our initial monolithic design into optional extras.
+
+To enable BERT embeddings and related capabilities, use:
+
+```bash
+pip install sadedegel[bert]
+```
+
+We ship 100-dimensional word vectors with the library. If you need to re-train those word embeddings, use:
+
+```bash
+python -m sadedegel.bblock.cli build-vocabulary -t [bert|icu|simple] --w2v
+```
+This will create a vocabulary dump with keyed vectors of the configured size using `sadedegel.dataset.extended_corpus` based on your settings.
+
+The `--w2v` option requires the `w2v` extra to be installed:
+
+```bash
+pip install sadedegel[w2v]
+```
+
+### Quickstart with SadedeGel
+
+To get started, load a raw corpus and summarize a document:
+
+```python
+from sadedegel import Doc
+from sadedegel.dataset import load_raw_corpus
+from sadedegel.summarize import Rouge1Summarizer
+
+raw = load_raw_corpus()
+
+d = Doc(next(raw))
+
+summarizer = Rouge1Summarizer()
+summarizer(d, k=5)
+```
+
+To trigger the sadedeGel NLP pipeline, initialize a `Doc` instance with a document string.
+
+Access all sentences using the Python built-in `list` function.
+
+```python
+from sadedegel import Doc
+
+doc_str = ("Bilişim sektörü, günlük devrimlerin yaşandığı ve hızına yetişilemeyen dev bir alan haline geleli uzun bir zaman olmadı. Günümüz bilgisayarlarının tarihi, yarım asırı yeni tamamlarken; yaşanan gelişmeler çok "
+"daha büyük ölçekte. Türkiye de bu gelişmelere 1960 yılında Karayolları Umum Müdürlüğü (şimdiki Karayolları Genel Müdürlüğü) için IBM’den satın aldığı ilk bilgisayarıyla dahil oldu. IBM 650 Model I adını taşıyan bilgisayarın "
+"satın alınma amacı ise yol yapımında gereken hesaplamaların daha hızlı yapılmasıydı. Türkiye’nin ilk bilgisayar destekli karayolu olan 63 km uzunluğundaki Polatlı - Sivrihisar yolu için yapılan hesaplamalar IBM 650 ile 1 saatte yapıldı. "
+"Daha öncesinde 3 - 4 ayı bulan hesaplamaların 1 saate inmesi; teknolojinin, ekonomik ve toplumsal dönüşüme büyük etkide bulunacağının habercisiydi.")
+
+doc = Doc(doc_str)
+
+list(doc)
+```
+```python
+['Bilişim sektörü, günlük devrimlerin yaşandığı ve hızına yetişilemeyen dev bir alan haline geleli uzun bir zaman olmadı.',
+ 'Günümüz bilgisayarlarının tarihi, yarım asırı yeni tamamlarken; yaşanan gelişmeler çok daha büyük ölçekte.',
+ 'Türkiye de bu gelişmelere 1960 yılında Karayolları Umum Müdürlüğü (şimdiki Karayolları Genel Müdürlüğü) için IBM’den satın aldığı ilk bilgisayarıyla dahil oldu.',
+ 'IBM 650 Model I adını taşıyan bilgisayarın satın alınma amacı ise yol yapımında gereken hesaplamaların daha hızlı yapılmasıydı.',
+ 'Türkiye’nin ilk bilgisayar destekli karayolu olan 63 km uzunluğundaki Polatlı - Sivrihisar yolu için yapılan hesaplamalar IBM 650 ile 1 saatte yapıldı.',
+ 'Daha öncesinde 3 - 4 ayı bulan hesaplamaların 1 saate inmesi; teknolojinin, ekonomik ve toplumsal dönüşüme büyük etkide bulunacağının habercisiydi.']
+```
+
+Access sentences by index.
+
+```python
+doc[2]
+```
+
+```python
+Türkiye de bu gelişmelere 1960 yılında Karayolları Umum Müdürlüğü (şimdiki Karayolları Genel Müdürlüğü) için IBM’den satın aldığı ilk bilgisayarıyla dahil oldu.
+```
+
+## SadedeGel Server
+To integrate sadedeGel with your applications, we provide a quick summarizer server.
+
+```bash
+python3 -m sadedegel.server
+```
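+
+Once the server is up, any HTTP client can talk to it. Here is a minimal sketch using `requests` (already a sadedegel dependency). The local host and port are assumptions taken from uvicorn's defaults, and `/api/info` mirrors the path exposed by the hosted instance below; adjust both to your deployment.
+
+```python
+import requests
+
+# Assumption: the server listens on uvicorn's default host/port; adjust to
+# match however you launched `python3 -m sadedegel.server`.
+BASE_URL = "http://127.0.0.1:8000"
+
+resp = requests.get(f"{BASE_URL}/api/info")
+print(resp.status_code)
+print(resp.text)
+```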
+
+### SadedeGel Server on Heroku
+[SadedeGel Server](https://sadedegel.herokuapp.com/api/info) is hosted on the free tier of [Heroku](https://heroku.com) cloud services.
+
+* [OpenAPI Documentation](https://sadedegel.herokuapp.com/docs)
+* [Redoc Documentation](https://sadedegel.herokuapp.com/redoc)
+* [Redirection to sadedegel.ai](https://sadedegel.herokuapp.com)
+
+## PyLint, Flake8 and Bandit
+sadedeGel uses [pylint](https://www.pylint.org/) for static code analysis,
+[flake8](https://flake8.pycqa.org/en/latest) for code style checks and [bandit](https://pypi.org/project/bandit)
+for security checks.
+
+To run all of these checks:
+
+```bash
+make lint
+```
+
+## Run tests
+
+sadedeGel comes with an [extensive test suite](sadedegel/tests). In order to run the
+tests, you'll usually want to clone the repository and build sadedeGel from source.
+This will also install the required development dependencies and test utilities
+defined in the `requirements.txt`.
+
+Alternatively, you can find out where sadedeGel is installed and run `pytest` on
+that directory. Don't forget to also install the test utilities via sadedeGel's
+`requirements.txt`:
+
+```bash
+make test
+```
+
+## 📓 Kaggle
+
+* Check out the [comprehensive notebook](https://www.kaggle.com/datafan07/clickbait-news-classification-using-sadedegel) by Kaggle Master [Ertugrul Demir](https://www.kaggle.com/datafan07) demonstrating the capabilities of sadedegel on a Turkish clickbait dataset.
+
+
+## Youtube Channel
+Some videos from the [sadedeGel YouTube Channel](https://www.youtube.com/channel/UCyNG1Mehl44XWZ8LzkColuw):
+
+### SkyLab YTU Webinar Playlist
+
+[![Youtube](https://img.shields.io/youtube/likes/xoEERspk6Is?label=SadedeGel%20Subprojects%20(Turkish)&style=social&withDislikes)](https://www.youtube.com/watch?v=xoEERspk6Is)
+
+[![Youtube](https://img.shields.io/youtube/likes/HfWIzAwf5u8?label=SadedeGel%20Scraper%20(Turkish)&style=social&withDislikes)](https://www.youtube.com/watch?v=HfWIzAwf5u8)
+
+[![Youtube](https://img.shields.io/youtube/likes/PkUmYhahiMw?label=SadedeGel%20Evaluation-nDCG%20(Turkish)&style=social&withDislikes)](https://www.youtube.com/watch?v=PkUmYhahiMw)
+
+[![Youtube](https://img.shields.io/youtube/likes/AxpK7fOndRQ?label=SadedeGel%20Annotator%20(Turkish)&style=social&withDislikes)](https://www.youtube.com/watch?v=AxpK7fOndRQ)
+
+[![Youtube](https://img.shields.io/youtube/likes/jKh_t9ZOJ-g?label=SadedeGel%20Baseline%20Özetleyiciler%20(Turkish)&style=social&withDislikes)](https://www.youtube.com/watch?v=jKh_t9ZOJ-g)
+
+[![Youtube](https://img.shields.io/youtube/likes/3DO1X7de1FI?label=SadedeGel%20ROUGE1%20Özetleyici%20(Turkish)&style=social&withDislikes)](https://www.youtube.com/watch?v=3DO1X7de1FI)
+
+[![Youtube](https://img.shields.io/youtube/likes/KGg3DJQVH9c?label=SadedeGel%20Kümeleme%20Bazlı%20Özetleyiciler%20(Turkish)&style=social&withDislikes)](https://www.youtube.com/watch?v=KGg3DJQVH9c)
+
+[![Youtube](https://img.shields.io/youtube/likes/G_erifsGGFs?label=SadedeGel%20BERT%20Embeddings%20(Turkish)&style=social&withDislikes)](https://www.youtube.com/watch?v=G_erifsGGFs)
+
+## References
+
+### Special Thanks
+
+* [Starlang Software](https://starlangyazilim.com/) for their contribution to open-source Turkish NLP development and corpus preparation.
+
+* [Olcay Taner Yıldız, Ph.D.](https://github.com/olcaytaner), one of our referees in [Açık Kaynak Hackathon Programı 2020](https://www.acikhack.com/), for supporting the development of sadedegel.
+
+* [Taner Sezer](https://github.com/tanerim) for his contributions to the tokenization corpus and the labeled news corpus.
+
+### Our Community Contributors
+
+We would like to thank our community contributors for their bug reports, enhancement requests and questions that make sadedeGel better every day:
+
+* [Burak Işıklı](https://github.com/burakisikli)
+
+### Software Engineering
+* Special thanks to the [spaCy](https://github.com/explosion/spaCy) project for their work in showing us how to implement a proper Python module rather than merely explaining it.
+ * We have borrowed much documentation and style-related material from their code base :smile:
+
+* There are a few free-tier service providers we need to thank:
+ * [GitHub](https://github.com) for
+ * Hosting our projects.
+ * Making it possible to collaborate easily.
+ * Automating our SLM via [Github Actions](https://github.com/features/actions)
+ * [Google Cloud Google Storage Service](https://cloud.google.com/products/storage) for providing low cost storage buckets making it possible to store `sadedegel.dataset.extended` data.
+ * [Heroku](https://heroku.com) for hosting [sadedeGel Server](https://sadedegel.herokuapp.com/api/info) in their free tier dynos.
+ * [CodeCov](https://codecov.io/) for allowing us to transparently share our [test coverage](https://codecov.io/gh/globalmaksimum/sadedegel)
+ * [PyPI](https://pypi.org/) for allowing us to share [sadedegel](https://pypi.org/project/sadedegel) with you.
+ * [binder](https://mybinder.org/) for
+ * Allowing us to share our example [notebooks](notebook/)
+ * Hosting our learn by example boxes in [sadedegel.ai](http://sadedegel.ai)
+
+### Machine Learning (ML), Deep Learning (DL) and Natural Language Processing (NLP)
+* Resources on Extractive Text Summarization:
+
+ * [Leveraging BERT for Extractive Text Summarization on Lectures](https://arxiv.org/abs/1906.04165) by Derek Miller
+ * [Fine-tune BERT for Extractive Summarization](https://arxiv.org/pdf/1903.10318.pdf) by Yang Liu
+
+* Other NLP related references
+
+ * [ROUGE: A Package for Automatic Evaluation of Summaries](https://www.aclweb.org/anthology/W04-1013.pdf)
+ * [Speech and Language Processing, Second Edition](https://web.stanford.edu/~jurafsky/slp3/)
+
+
+
+
+%package -n python3-sadedegel
+Summary: Extraction-based Turkish news summarizer.
+Provides: python-sadedegel
+BuildRequires: python3-devel
+BuildRequires: python3-setuptools
+BuildRequires: python3-pip
+%description -n python3-sadedegel
+<a href="http://sadedegel.ai"><img src="https://sadedegel.ai/assets/img/logo-2.png" width="125" height="125" align="right" /></a>
+
+# SadedeGel: A General Purpose NLP library for Turkish
+
+SadedeGel was initially designed as a library for unsupervised, extraction-based news summarization using several old and new NLP techniques.
+
+Development of the library started as a part of [Açık Kaynak Hackathon Programı 2020](https://www.acikhack.com/) in which SadedeGel was the **2nd place winner**.
+
+We keep adding features with the goal of becoming a general-purpose open-source NLP library for the Turkish language.
+
+
+💫 **Version 0.21 out now!**
+[Check out the release notes here.](https://github.com/GlobalMaksimum/sadedegel/releases)
+
+
+![Python package](https://github.com/GlobalMaksimum/sadedegel/workflows/Python%20package/badge.svg)
+[![Python Version](https://img.shields.io/pypi/pyversions/sadedegel?style=plastic)](https://img.shields.io/pypi/pyversions/sadedegel)
+[![Coverage](https://codecov.io/gh/globalmaksimum/sadedegel/branch/master/graphs/badge.svg?style=plastic)](https://codecov.io/gh/globalmaksimum/sadedegel)
+[![pypi Version](https://img.shields.io/pypi/v/sadedegel?style=plastic&logo=PyPI)](https://pypi.org/project/sadedegel/)
+[![PyPi downloads](https://img.shields.io/pypi/dm/sadedegel?style=plastic&logo=PyPI)](https://pypi.org/project/sadedegel/)
+[![License](https://img.shields.io/pypi/l/sadedegel)](https://github.com/GlobalMaksimum/sadedegel/blob/master/LICENSE)
+![Commit Month](https://img.shields.io/github/commit-activity/m/globalmaksimum/sadedegel?style=plastic&logo=GitHub)
+![Commit Week](https://img.shields.io/github/commit-activity/w/globalmaksimum/sadedegel?style=plastic&logo=GitHub)
+![Last Commit](https://img.shields.io/github/last-commit/globalmaksimum/sadedegel?style=plastic&logo=GitHub)
+[![Binder](https://mybinder.org/badge_logo.svg?style=plastic)](https://mybinder.org/v2/gh/GlobalMaksimum/sadedegel.git/master?filepath=notebook%2FBasics.ipynb)
+[![Slack](https://img.shields.io/static/v1?logo=slack&style=plastic&color=blueviolet&label=slack&labelColor=grey&message=sadedegel)](https://join.slack.com/t/sadedegel/shared_invite/zt-h77u6aeq-VzEorB5QLHyJV90Fv4Ky3A)
+[![Kaggle](http://img.shields.io/static/v1?logo=kaggle&style=plastic&color=blue&label=kaggle&labelColor=grey&message=notebooks)](https://www.kaggle.com/search?q=sadedegel+in%3Anotebooks)
+
+
+## 📖 Documentation
+
+| Documentation | |
+| --------------- | -------------------------------------------------------------- |
+| [Contribute] | How to contribute to the sadedeGel project and code base. |
+
+[contribute]: https://github.com/GlobalMaksimum/sadedegel/blob/master/CONTRIBUTING.md
+
+## 💬 Where to ask questions
+
+The SadedeGel project was initiated by [@globalmaksimum](https://github.com/GlobalMaksimum) AI team members
+[@dafajon](https://github.com/dafajon),
+[@askarbozcan](https://github.com/askarbozcan),
+[@mccakir](https://github.com/mccakir),
+[@husnusensoy](https://github.com/husnusensoy) and
+[@ertugruldemir](https://github.com/ertugrul-dmr).
+
+
+Other community maintainers
+
+* [@doruktiktiklar](https://github.com/doruktiktiklar) contributed the [TFIDF Summarizer](sadedegel/summarize/tf_idf.py)
+
+| Type | Platforms |
+| ------------------------ | ------------------------------------------------------ |
+| 🚨 **Bug Reports** | [GitHub Issue Tracker] |
+| 🎁 **Feature Requests** | [GitHub Issue Tracker] |
+| <img width="18" height="18" src="https://www.freeiconspng.com/uploads/slack-icon-2.png"/> **Questions** | [Slack Workspace] |
+
+[github issue tracker]: https://github.com/GlobalMaksimum/sadedegel/issues
+[Slack Workspace]: https://join.slack.com/t/sadedegel/shared_invite/zt-h77u6aeq-VzEorB5QLHyJV90Fv4Ky3A
+
+
+## Features
+
+* Several datasets
+ * Basic corpus
+ * Raw corpus (`sadedegel.dataset.load_raw_corpus`)
+ * Sentences tokenized corpus (`sadedegel.dataset.load_sentences_corpus`)
+ * Human annotated summary corpus (`sadedegel.dataset.load_annotated_corpus`)
+ * [Extended corpus](sadedegel/dataset/README.md)
+ * Raw corpus (`sadedegel.dataset.extended.load_extended_raw_corpus`)
+ * Sentences tokenized corpus (`sadedegel.dataset.extended.load_extended_sents_corpus`)
+
+ * TsCorpus (`sadedegel.dataset.tscorpus`)
+ * Thanks to [Taner Sezer](https://github.com/tanerim), over 300K documents from tscorpus are also part of sadedegel, allowing us to
+ * [Evaluate](sadedegel/bblock/TOKENIZER.md) our tokenizers (word tokenizers)
+ * Build our [prebuilt news category classifier](sadedegel/prebuilt/README.md)
+ * Various domain specific [datasets](https://github.com/GlobalMaksimum/sadedegel/tree/develop/sadedegel/dataset) (e-commerce, social media, tourism etc.)
+* ML-based sentence boundary detector (**SBD**) trained for the Turkish language (exercised in the short sketch after this list)
+* Sadedegel Extractive Summarizers
+ * Various baseline summarizers
+ * Position Summarizer
+ * Length Summarizer
+ * Band Summarizer
+ * Random Summarizer
+
+ * Various unsupervised/supervised summarizers
+ * ROUGE1 Summarizer
+ * TextRank Summarizer
+ * Cluster Summarizer
+ * Lexrank Summarizer
+ * BM25 Summarizer
+ * TfIdf Summarizer
+
+* Various Word Tokenizers
+ * BERT Tokenizer - Trained tokenizer (`pip install sadedegel[bert]`)
+ * Simple Tokenizer - Regex Based
+ * ICU Tokenizer (default since `0.19`)
+
+* Various Sparse and Dense Embeddings implemented for `Sentences` and `Document` objects.
+ * BERT Embeddings (`pip install sadedegel[bert]`)
+ * TfIdf Embeddings
+
+* Word Vectors for your tokens (`pip install sadedegel[w2v]`)
+
+* A `sklearn` compatible [Feature Extraction API](https://github.com/GlobalMaksimum/sadedegel/tree/develop/sadedegel/extension)
+
+* [**Experimental**] Prebuilt models for several common NLP tasks ([`sadedegel.prebuilt`](sadedegel/prebuilt/README.md)).
+
+```python
+from sadedegel.prebuilt import news_classification
+
+model = news_classification.load()
+
+doc_str = ("Bilişim sektörü, günlük devrimlerin yaşandığı ve hızına yetişilemeyen dev bir alan haline geleli uzun bir zaman olmadı. Günümüz bilgisayarlarının tarihi, yarım asırı yeni tamamlarken; yaşanan gelişmeler çok "
+"daha büyük ölçekte. Türkiye de bu gelişmelere 1960 yılında Karayolları Umum Müdürlüğü (şimdiki Karayolları Genel Müdürlüğü) için IBM’den satın aldığı ilk bilgisayarıyla dahil oldu. IBM 650 Model I adını taşıyan bilgisayarın "
+"satın alınma amacı ise yol yapımında gereken hesaplamaların daha hızlı yapılmasıydı. Türkiye’nin ilk bilgisayar destekli karayolu olan 63 km uzunluğundaki Polatlı - Sivrihisar yolu için yapılan hesaplamalar IBM 650 ile 1 saatte yapıldı. "
+"Daha öncesinde 3 - 4 ayı bulan hesaplamaların 1 saate inmesi; teknolojinin, ekonomik ve toplumsal dönüşüme büyük etkide bulunacağının habercisiydi.")
+
+y_pred = model.predict([doc_str])
+```
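+
+As a quick smoke test of the corpus loaders and the sentence boundary detector listed above, here is a minimal sketch. It relies only on objects already shown in this README (`load_raw_corpus` yields plain document strings and `Doc` splits them into sentences); the number of documents streamed is an arbitrary choice.
+
+```python
+from itertools import islice
+
+from sadedegel import Doc
+from sadedegel.dataset import load_raw_corpus
+
+# Stream the first three raw documents and report how many sentences the
+# sadedegel SBD finds in each.
+for i, text in enumerate(islice(load_raw_corpus(), 3)):
+    doc = Doc(text)
+    print(f"document {i}: {len(list(doc))} sentences")
+```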
+
+📖 **For more details, refer to [sadedegel.ai](http://sadedegel.ai)**
+
+## Install sadedeGel
+
+- **Operating system**: macOS / OS X · Linux · Windows (Cygwin, MinGW, Visual
+ Studio)
+- **Python version**: 3.6+ (only 64 bit)
+- **Package managers**: [pip]
+
+[pip]: https://pypi.org/project/sadedegel/
+
+### pip
+
+Using pip, sadedeGel releases are available as source packages and binary wheels.
+
+```bash
+pip install sadedegel
+```
+or update to the latest version:
+
+```bash
+pip install sadedegel -U
+```
+
+When using pip it is generally recommended to install packages in a virtual
+environment to avoid modifying system state:
+
+```bash
+python -m venv .env
+source .env/bin/activate
+pip install sadedegel
+```
+
+#### Vocabulary Dump
+
+Certain attributes of SadedeGel's NLP objects depend on shipped vocabulary dumps that are created from `sadedegel.dataset.extended_corpus` by each of the existing SadedeGel tokenizers. Those tokenizers are listed above. If you want to re-train a specific tokenizer's vocabulary with custom settings:
+
+```bash
+python -m sadedegel.bblock.cli build-vocabulary -t [bert|icu|simple]
+```
+
+This will create a vocabulary dump using `sadedegel.dataset.extended_corpus` based on custom user settings.
+
+For all options to customize your vocab dump refer to:
+
+```bash
+python -m sadedegel.bblock.cli build-vocabulary --help
+```
+
+#### Optional
+
+To keep core sadedegel as light as possible, we decomposed our initial monolithic design into optional extras.
+
+To enable BERT embeddings and related capabilities, use:
+
+```bash
+pip install sadedegel[bert]
+```
+
+We ship 100-dimensional word vectors with the library. If you need to re-train those word embeddings, use:
+
+```bash
+python -m sadedegel.bblock.cli build-vocabulary -t [bert|icu|simple] --w2v
+```
+This will create a vocabulary dump with keyed vectors of the configured size using `sadedegel.dataset.extended_corpus` based on your settings.
+
+The `--w2v` option requires the `w2v` extra to be installed:
+
+```bash
+pip install sadedegel[w2v]
+```
+
+### Quickstart with SadedeGel
+
+To get started, load a raw corpus and summarize a document:
+
+```python
+from sadedegel import Doc
+from sadedegel.dataset import load_raw_corpus
+from sadedegel.summarize import Rouge1Summarizer
+
+raw = load_raw_corpus()
+
+d = Doc(next(raw))
+
+summarizer = Rouge1Summarizer()
+summarizer(d, k=5)
+```
+
+To trigger the sadedeGel NLP pipeline, initialize a `Doc` instance with a document string.
+
+Access all sentences using the Python built-in `list` function.
+
+```python
+from sadedegel import Doc
+
+doc_str = ("Bilişim sektörü, günlük devrimlerin yaşandığı ve hızına yetişilemeyen dev bir alan haline geleli uzun bir zaman olmadı. Günümüz bilgisayarlarının tarihi, yarım asırı yeni tamamlarken; yaşanan gelişmeler çok "
+"daha büyük ölçekte. Türkiye de bu gelişmelere 1960 yılında Karayolları Umum Müdürlüğü (şimdiki Karayolları Genel Müdürlüğü) için IBM’den satın aldığı ilk bilgisayarıyla dahil oldu. IBM 650 Model I adını taşıyan bilgisayarın "
+"satın alınma amacı ise yol yapımında gereken hesaplamaların daha hızlı yapılmasıydı. Türkiye’nin ilk bilgisayar destekli karayolu olan 63 km uzunluğundaki Polatlı - Sivrihisar yolu için yapılan hesaplamalar IBM 650 ile 1 saatte yapıldı. "
+"Daha öncesinde 3 - 4 ayı bulan hesaplamaların 1 saate inmesi; teknolojinin, ekonomik ve toplumsal dönüşüme büyük etkide bulunacağının habercisiydi.")
+
+doc = Doc(doc_str)
+
+list(doc)
+```
+```python
+['Bilişim sektörü, günlük devrimlerin yaşandığı ve hızına yetişilemeyen dev bir alan haline geleli uzun bir zaman olmadı.',
+ 'Günümüz bilgisayarlarının tarihi, yarım asırı yeni tamamlarken; yaşanan gelişmeler çok daha büyük ölçekte.',
+ 'Türkiye de bu gelişmelere 1960 yılında Karayolları Umum Müdürlüğü (şimdiki Karayolları Genel Müdürlüğü) için IBM’den satın aldığı ilk bilgisayarıyla dahil oldu.',
+ 'IBM 650 Model I adını taşıyan bilgisayarın satın alınma amacı ise yol yapımında gereken hesaplamaların daha hızlı yapılmasıydı.',
+ 'Türkiye’nin ilk bilgisayar destekli karayolu olan 63 km uzunluğundaki Polatlı - Sivrihisar yolu için yapılan hesaplamalar IBM 650 ile 1 saatte yapıldı.',
+ 'Daha öncesinde 3 - 4 ayı bulan hesaplamaların 1 saate inmesi; teknolojinin, ekonomik ve toplumsal dönüşüme büyük etkide bulunacağının habercisiydi.']
+```
+
+Access sentences by index.
+
+```python
+doc[2]
+```
+
+```python
+Türkiye de bu gelişmelere 1960 yılında Karayolları Umum Müdürlüğü (şimdiki Karayolları Genel Müdürlüğü) için IBM’den satın aldığı ilk bilgisayarıyla dahil oldu.
+```
+
+## SadedeGel Server
+To integrate sadedeGel with your applications, we provide a quick summarizer server.
+
+```bash
+python3 -m sadedegel.server
+```
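+
+Once the server is up, any HTTP client can talk to it. Here is a minimal sketch using `requests` (already a sadedegel dependency). The local host and port are assumptions taken from uvicorn's defaults, and `/api/info` mirrors the path exposed by the hosted instance below; adjust both to your deployment.
+
+```python
+import requests
+
+# Assumption: the server listens on uvicorn's default host/port; adjust to
+# match however you launched `python3 -m sadedegel.server`.
+BASE_URL = "http://127.0.0.1:8000"
+
+resp = requests.get(f"{BASE_URL}/api/info")
+print(resp.status_code)
+print(resp.text)
+```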
+
+### SadedeGel Server on Heroku
+[SadedeGel Server](https://sadedegel.herokuapp.com/api/info) is hosted on the free tier of [Heroku](https://heroku.com) cloud services.
+
+* [OpenAPI Documentation](https://sadedegel.herokuapp.com/docs)
+* [Redoc Documentation](https://sadedegel.herokuapp.com/redoc)
+* [Redirection to sadedegel.ai](https://sadedegel.herokuapp.com)
+
+## PyLint, Flake8 and Bandit
+sadedeGel uses [pylint](https://www.pylint.org/) for static code analysis,
+[flake8](https://flake8.pycqa.org/en/latest) for code style checks and [bandit](https://pypi.org/project/bandit)
+for security checks.
+
+To run all of these checks:
+
+```bash
+make lint
+```
+
+## Run tests
+
+sadedeGel comes with an [extensive test suite](sadedegel/tests). In order to run the
+tests, you'll usually want to clone the repository and build sadedeGel from source.
+This will also install the required development dependencies and test utilities
+defined in the `requirements.txt`.
+
+Alternatively, you can find out where sadedeGel is installed and run `pytest` on
+that directory. Don't forget to also install the test utilities via sadedeGel's
+`requirements.txt`:
+
+```bash
+make test
+```
+
+## 📓 Kaggle
+
+* Check out the [comprehensive notebook](https://www.kaggle.com/datafan07/clickbait-news-classification-using-sadedegel) by Kaggle Master [Ertugrul Demir](https://www.kaggle.com/datafan07) demonstrating the capabilities of sadedegel on a Turkish clickbait dataset.
+
+
+## Youtube Channel
+Some videos from the [sadedeGel YouTube Channel](https://www.youtube.com/channel/UCyNG1Mehl44XWZ8LzkColuw):
+
+### SkyLab YTU Webinar Playlist
+
+[![Youtube](https://img.shields.io/youtube/likes/xoEERspk6Is?label=SadedeGel%20Subprojects%20(Turkish)&style=social&withDislikes)](https://www.youtube.com/watch?v=xoEERspk6Is)
+
+[![Youtube](https://img.shields.io/youtube/likes/HfWIzAwf5u8?label=SadedeGel%20Scraper%20(Turkish)&style=social&withDislikes)](https://www.youtube.com/watch?v=HfWIzAwf5u8)
+
+[![Youtube](https://img.shields.io/youtube/likes/PkUmYhahiMw?label=SadedeGel%20Evaluation-nDCG%20(Turkish)&style=social&withDislikes)](https://www.youtube.com/watch?v=PkUmYhahiMw)
+
+[![Youtube](https://img.shields.io/youtube/likes/AxpK7fOndRQ?label=SadedeGel%20Annotator%20(Turkish)&style=social&withDislikes)](https://www.youtube.com/watch?v=AxpK7fOndRQ)
+
+[![Youtube](https://img.shields.io/youtube/likes/jKh_t9ZOJ-g?label=SadedeGel%20Baseline%20Özetleyiciler%20(Turkish)&style=social&withDislikes)](https://www.youtube.com/watch?v=jKh_t9ZOJ-g)
+
+[![Youtube](https://img.shields.io/youtube/likes/3DO1X7de1FI?label=SadedeGel%20ROUGE1%20Özetleyici%20(Turkish)&style=social&withDislikes)](https://www.youtube.com/watch?v=3DO1X7de1FI)
+
+[![Youtube](https://img.shields.io/youtube/likes/KGg3DJQVH9c?label=SadedeGel%20Kümeleme%20Bazlı%20Özetleyiciler%20(Turkish)&style=social&withDislikes)](https://www.youtube.com/watch?v=KGg3DJQVH9c)
+
+[![Youtube](https://img.shields.io/youtube/likes/G_erifsGGFs?label=SadedeGel%20BERT%20Embeddings%20(Turkish)&style=social&withDislikes)](https://www.youtube.com/watch?v=G_erifsGGFs)
+
+## References
+
+### Special Thanks
+
+* [Starlang Software](https://starlangyazilim.com/) for their contribution to open-source Turkish NLP development and corpus preparation.
+
+* [Olcay Taner Yıldız, Ph.D.](https://github.com/olcaytaner), one of our referees in [Açık Kaynak Hackathon Programı 2020](https://www.acikhack.com/), for supporting the development of sadedegel.
+
+* [Taner Sezer](https://github.com/tanerim) for his contributions to the tokenization corpus and the labeled news corpus.
+
+### Our Community Contributors
+
+We would like to thank our community contributors for their bug reports, enhancement requests and questions that make sadedeGel better every day:
+
+* [Burak Işıklı](https://github.com/burakisikli)
+
+### Software Engineering
+* Special thanks to the [spaCy](https://github.com/explosion/spaCy) project for their work in showing us how to implement a proper Python module rather than merely explaining it.
+ * We have borrowed much documentation and style-related material from their code base :smile:
+
+* There are a few free-tier service providers we need to thank:
+ * [GitHub](https://github.com) for
+ * Hosting our projects.
+ * Making it possible to collaborate easily.
+ * Automating our SLM via [Github Actions](https://github.com/features/actions)
+ * [Google Cloud Google Storage Service](https://cloud.google.com/products/storage) for providing low cost storage buckets making it possible to store `sadedegel.dataset.extended` data.
+ * [Heroku](https://heroku.com) for hosting [sadedeGel Server](https://sadedegel.herokuapp.com/api/info) in their free tier dynos.
+ * [CodeCov](https://codecov.io/) for allowing us to transparently share our [test coverage](https://codecov.io/gh/globalmaksimum/sadedegel)
+ * [PyPI](https://pypi.org/) for allowing us to share [sadedegel](https://pypi.org/project/sadedegel) with you.
+ * [binder](https://mybinder.org/) for
+ * Allowing us to share our example [notebooks](notebook/)
+ * Hosting our learn by example boxes in [sadedegel.ai](http://sadedegel.ai)
+
+### Machine Learning (ML), Deep Learning (DL) and Natural Language Processing (NLP)
+* Resources on Extractive Text Summarization:
+
+ * [Leveraging BERT for Extractive Text Summarization on Lectures](https://arxiv.org/abs/1906.04165) by Derek Miller
+ * [Fine-tune BERT for Extractive Summarization](https://arxiv.org/pdf/1903.10318.pdf) by Yang Liu
+
+* Other NLP related references
+
+ * [ROUGE: A Package for Automatic Evaluation of Summaries](https://www.aclweb.org/anthology/W04-1013.pdf)
+ * [Speech and Language Processing, Second Edition](https://web.stanford.edu/~jurafsky/slp3/)
+
+
+
+
+%package help
+Summary: Development documents and examples for sadedegel
+Provides: python3-sadedegel-doc
+%description help
+<a href="http://sadedegel.ai"><img src="https://sadedegel.ai/assets/img/logo-2.png" width="125" height="125" align="right" /></a>
+
+# SadedeGel: A General Purpose NLP library for Turkish
+
+SadedeGel was initially designed as a library for unsupervised, extraction-based news summarization using several old and new NLP techniques.
+
+Development of the library started as a part of [Açık Kaynak Hackathon Programı 2020](https://www.acikhack.com/) in which SadedeGel was the **2nd place winner**.
+
+We keep adding features with the goal of becoming a general-purpose open-source NLP library for the Turkish language.
+
+
+💫 **Version 0.21 out now!**
+[Check out the release notes here.](https://github.com/GlobalMaksimum/sadedegel/releases)
+
+
+![Python package](https://github.com/GlobalMaksimum/sadedegel/workflows/Python%20package/badge.svg)
+[![Python Version](https://img.shields.io/pypi/pyversions/sadedegel?style=plastic)](https://img.shields.io/pypi/pyversions/sadedegel)
+[![Coverage](https://codecov.io/gh/globalmaksimum/sadedegel/branch/master/graphs/badge.svg?style=plastic)](https://codecov.io/gh/globalmaksimum/sadedegel)
+[![pypi Version](https://img.shields.io/pypi/v/sadedegel?style=plastic&logo=PyPI)](https://pypi.org/project/sadedegel/)
+[![PyPi downloads](https://img.shields.io/pypi/dm/sadedegel?style=plastic&logo=PyPI)](https://pypi.org/project/sadedegel/)
+[![License](https://img.shields.io/pypi/l/sadedegel)](https://github.com/GlobalMaksimum/sadedegel/blob/master/LICENSE)
+![Commit Month](https://img.shields.io/github/commit-activity/m/globalmaksimum/sadedegel?style=plastic&logo=GitHub)
+![Commit Week](https://img.shields.io/github/commit-activity/w/globalmaksimum/sadedegel?style=plastic&logo=GitHub)
+![Last Commit](https://img.shields.io/github/last-commit/globalmaksimum/sadedegel?style=plastic&logo=GitHub)
+[![Binder](https://mybinder.org/badge_logo.svg?style=plastic)](https://mybinder.org/v2/gh/GlobalMaksimum/sadedegel.git/master?filepath=notebook%2FBasics.ipynb)
+[![Slack](https://img.shields.io/static/v1?logo=slack&style=plastic&color=blueviolet&label=slack&labelColor=grey&message=sadedegel)](https://join.slack.com/t/sadedegel/shared_invite/zt-h77u6aeq-VzEorB5QLHyJV90Fv4Ky3A)
+[![Kaggle](http://img.shields.io/static/v1?logo=kaggle&style=plastic&color=blue&label=kaggle&labelColor=grey&message=notebooks)](https://www.kaggle.com/search?q=sadedegel+in%3Anotebooks)
+
+
+## 📖 Documentation
+
+| Documentation | |
+| --------------- | -------------------------------------------------------------- |
+| [Contribute] | How to contribute to the sadedeGel project and code base. |
+
+[contribute]: https://github.com/GlobalMaksimum/sadedegel/blob/master/CONTRIBUTING.md
+
+## 💬 Where to ask questions
+
+The SadedeGel project was initiated by [@globalmaksimum](https://github.com/GlobalMaksimum) AI team members
+[@dafajon](https://github.com/dafajon),
+[@askarbozcan](https://github.com/askarbozcan),
+[@mccakir](https://github.com/mccakir),
+[@husnusensoy](https://github.com/husnusensoy) and
+[@ertugruldemir](https://github.com/ertugrul-dmr).
+
+
+Other community maintainers
+
+* [@doruktiktiklar](https://github.com/doruktiktiklar) contributed the [TFIDF Summarizer](sadedegel/summarize/tf_idf.py)
+
+| Type | Platforms |
+| ------------------------ | ------------------------------------------------------ |
+| 🚨 **Bug Reports** | [GitHub Issue Tracker] |
+| 🎁 **Feature Requests** | [GitHub Issue Tracker] |
+| <img width="18" height="18" src="https://www.freeiconspng.com/uploads/slack-icon-2.png"/> **Questions** | [Slack Workspace] |
+
+[github issue tracker]: https://github.com/GlobalMaksimum/sadedegel/issues
+[Slack Workspace]: https://join.slack.com/t/sadedegel/shared_invite/zt-h77u6aeq-VzEorB5QLHyJV90Fv4Ky3A
+
+
+## Features
+
+* Several datasets
+ * Basic corpus
+ * Raw corpus (`sadedegel.dataset.load_raw_corpus`)
+ * Sentences tokenized corpus (`sadedegel.dataset.load_sentences_corpus`)
+ * Human annotated summary corpus (`sadedegel.dataset.load_annotated_corpus`)
+ * [Extended corpus](sadedegel/dataset/README.md)
+ * Raw corpus (`sadedegel.dataset.extended.load_extended_raw_corpus`)
+ * Sentences tokenized corpus (`sadedegel.dataset.extended.load_extended_sents_corpus`)
+
+ * TsCorpus (`sadedegel.dataset.tscorpus`)
+ * Thanks to [Taner Sezer](https://github.com/tanerim), over 300K documents from tscorpus are also part of sadedegel, allowing us to
+ * [Evaluate](sadedegel/bblock/TOKENIZER.md) our tokenizers (word tokenizers)
+ * Build our [prebuilt news category classifier](sadedegel/prebuilt/README.md)
+ * Various domain specific [datasets](https://github.com/GlobalMaksimum/sadedegel/tree/develop/sadedegel/dataset) (e-commerce, social media, tourism etc.)
+* ML-based sentence boundary detector (**SBD**) trained for the Turkish language (exercised in the short sketch after this list)
+* Sadedegel Extractive Summarizers
+ * Various baseline summarizers
+ * Position Summarizer
+ * Length Summarizer
+ * Band Summarizer
+ * Random Summarizer
+
+ * Various unsupervised/supervised summarizers
+ * ROUGE1 Summarizer
+ * TextRank Summarizer
+ * Cluster Summarizer
+ * Lexrank Summarizer
+ * BM25 Summarizer
+ * TfIdf Summarizer
+
+* Various Word Tokenizers
+ * BERT Tokenizer - Trained tokenizer (`pip install sadedegel[bert]`)
+ * Simple Tokenizer - Regex Based
+ * ICU Tokenizer (default since `0.19`)
+
+* Various Sparse and Dense Embeddings implemented for `Sentences` and `Document` objects.
+ * BERT Embeddings (`pip install sadedegel[bert]`)
+ * TfIdf Embeddings
+
+* Word Vectors for your tokens (`pip install sadedegel[w2v]`)
+
+* A `sklearn` compatible [Feature Extraction API](https://github.com/GlobalMaksimum/sadedegel/tree/develop/sadedegel/extension)
+
+* [**Experimental**] Prebuilt models for several common NLP tasks ([`sadedegel.prebuilt`](sadedegel/prebuilt/README.md)).
+
+```python
+from sadedegel.prebuilt import news_classification
+
+model = news_classification.load()
+
+doc_str = ("Bilişim sektörü, günlük devrimlerin yaşandığı ve hızına yetişilemeyen dev bir alan haline geleli uzun bir zaman olmadı. Günümüz bilgisayarlarının tarihi, yarım asırı yeni tamamlarken; yaşanan gelişmeler çok "
+"daha büyük ölçekte. Türkiye de bu gelişmelere 1960 yılında Karayolları Umum Müdürlüğü (şimdiki Karayolları Genel Müdürlüğü) için IBM’den satın aldığı ilk bilgisayarıyla dahil oldu. IBM 650 Model I adını taşıyan bilgisayarın "
+"satın alınma amacı ise yol yapımında gereken hesaplamaların daha hızlı yapılmasıydı. Türkiye’nin ilk bilgisayar destekli karayolu olan 63 km uzunluğundaki Polatlı - Sivrihisar yolu için yapılan hesaplamalar IBM 650 ile 1 saatte yapıldı. "
+"Daha öncesinde 3 - 4 ayı bulan hesaplamaların 1 saate inmesi; teknolojinin, ekonomik ve toplumsal dönüşüme büyük etkide bulunacağının habercisiydi.")
+
+y_pred = model.predict([doc_str])
+```
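+
+As a quick smoke test of the corpus loaders and the sentence boundary detector listed above, here is a minimal sketch. It relies only on objects already shown in this README (`load_raw_corpus` yields plain document strings and `Doc` splits them into sentences); the number of documents streamed is an arbitrary choice.
+
+```python
+from itertools import islice
+
+from sadedegel import Doc
+from sadedegel.dataset import load_raw_corpus
+
+# Stream the first three raw documents and report how many sentences the
+# sadedegel SBD finds in each.
+for i, text in enumerate(islice(load_raw_corpus(), 3)):
+    doc = Doc(text)
+    print(f"document {i}: {len(list(doc))} sentences")
+```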
+
+📖 **For more details, refer to [sadedegel.ai](http://sadedegel.ai)**
+
+## Install sadedeGel
+
+- **Operating system**: macOS / OS X · Linux · Windows (Cygwin, MinGW, Visual
+ Studio)
+- **Python version**: 3.6+ (only 64 bit)
+- **Package managers**: [pip]
+
+[pip]: https://pypi.org/project/sadedegel/
+
+### pip
+
+Using pip, sadedeGel releases are available as source packages and binary wheels.
+
+```bash
+pip install sadedegel
+```
+or update to the latest version:
+
+```bash
+pip install sadedegel -U
+```
+
+When using pip it is generally recommended to install packages in a virtual
+environment to avoid modifying system state:
+
+```bash
+python -m venv .env
+source .env/bin/activate
+pip install sadedegel
+```
+
+#### Vocabulary Dump
+
+Certain attributes of SadedeGel's NLP objects depend on shipped vocabulary dumps that are created from `sadedegel.dataset.extended_corpus` by each of the existing SadedeGel tokenizers. Those tokenizers are listed above. If you want to re-train a specific tokenizer's vocabulary with custom settings:
+
+```bash
+python -m sadedegel.bblock.cli build-vocabulary -t [bert|icu|simple]
+```
+
+This will create a vocabulary dump using `sadedegel.dataset.extended_corpus` based on custom user settings.
+
+For all options to customize your vocab dump refer to:
+
+```bash
+python -m sadedegel.bblock.cli build-vocabulary --help
+```
+
+#### Optional
+
+To keep core sadedegel as light as possible, we decomposed our initial monolithic design into optional extras.
+
+To enable BERT embeddings and related capabilities, use:
+
+```bash
+pip install sadedegel[bert]
+```
+
+We ship 100-dimensional word vectors with the library. If you need to re-train those word embeddings, use:
+
+```bash
+python -m sadedegel.bblock.cli build-vocabulary -t [bert|icu|simple] --w2v
+```
+This will create a vocabulary dump with keyed vectors of the configured size using `sadedegel.dataset.extended_corpus` based on your settings.
+
+The `--w2v` option requires the `w2v` extra to be installed:
+
+```bash
+pip install sadedegel[w2v]
+```
+
+### Quickstart with SadedeGel
+
+To get started, load a raw corpus and summarize a document:
+
+```python
+from sadedegel import Doc
+from sadedegel.dataset import load_raw_corpus
+from sadedegel.summarize import Rouge1Summarizer
+
+raw = load_raw_corpus()
+
+d = Doc(next(raw))
+
+summarizer = Rouge1Summarizer()
+summarizer(d, k=5)
+```
+
+To trigger the sadedeGel NLP pipeline, initialize a `Doc` instance with a document string.
+
+Access all sentences using the Python built-in `list` function.
+
+```python
+from sadedegel import Doc
+
+doc_str = ("Bilişim sektörü, günlük devrimlerin yaşandığı ve hızına yetişilemeyen dev bir alan haline geleli uzun bir zaman olmadı. Günümüz bilgisayarlarının tarihi, yarım asırı yeni tamamlarken; yaşanan gelişmeler çok "
+"daha büyük ölçekte. Türkiye de bu gelişmelere 1960 yılında Karayolları Umum Müdürlüğü (şimdiki Karayolları Genel Müdürlüğü) için IBM’den satın aldığı ilk bilgisayarıyla dahil oldu. IBM 650 Model I adını taşıyan bilgisayarın "
+"satın alınma amacı ise yol yapımında gereken hesaplamaların daha hızlı yapılmasıydı. Türkiye’nin ilk bilgisayar destekli karayolu olan 63 km uzunluğundaki Polatlı - Sivrihisar yolu için yapılan hesaplamalar IBM 650 ile 1 saatte yapıldı. "
+"Daha öncesinde 3 - 4 ayı bulan hesaplamaların 1 saate inmesi; teknolojinin, ekonomik ve toplumsal dönüşüme büyük etkide bulunacağının habercisiydi.")
+
+doc = Doc(doc_str)
+
+list(doc)
+```
+```python
+['Bilişim sektörü, günlük devrimlerin yaşandığı ve hızına yetişilemeyen dev bir alan haline geleli uzun bir zaman olmadı.',
+ 'Günümüz bilgisayarlarının tarihi, yarım asırı yeni tamamlarken; yaşanan gelişmeler çok daha büyük ölçekte.',
+ 'Türkiye de bu gelişmelere 1960 yılında Karayolları Umum Müdürlüğü (şimdiki Karayolları Genel Müdürlüğü) için IBM’den satın aldığı ilk bilgisayarıyla dahil oldu.',
+ 'IBM 650 Model I adını taşıyan bilgisayarın satın alınma amacı ise yol yapımında gereken hesaplamaların daha hızlı yapılmasıydı.',
+ 'Türkiye’nin ilk bilgisayar destekli karayolu olan 63 km uzunluğundaki Polatlı - Sivrihisar yolu için yapılan hesaplamalar IBM 650 ile 1 saatte yapıldı.',
+ 'Daha öncesinde 3 - 4 ayı bulan hesaplamaların 1 saate inmesi; teknolojinin, ekonomik ve toplumsal dönüşüme büyük etkide bulunacağının habercisiydi.']
+```
+
+Access sentences by index.
+
+```python
+doc[2]
+```
+
+```python
+Türkiye de bu gelişmelere 1960 yılında Karayolları Umum Müdürlüğü (şimdiki Karayolları Genel Müdürlüğü) için IBM’den satın aldığı ilk bilgisayarıyla dahil oldu.
+```
+
+## SadedeGel Server
+To integrate sadedeGel with your applications, we provide a quick summarizer server.
+
+```bash
+python3 -m sadedegel.server
+```
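+
+Once the server is up, any HTTP client can talk to it. Here is a minimal sketch using `requests` (already a sadedegel dependency). The local host and port are assumptions taken from uvicorn's defaults, and `/api/info` mirrors the path exposed by the hosted instance below; adjust both to your deployment.
+
+```python
+import requests
+
+# Assumption: the server listens on uvicorn's default host/port; adjust to
+# match however you launched `python3 -m sadedegel.server`.
+BASE_URL = "http://127.0.0.1:8000"
+
+resp = requests.get(f"{BASE_URL}/api/info")
+print(resp.status_code)
+print(resp.text)
+```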
+
+### SadedeGel Server on Heroku
+[SadedeGel Server](https://sadedegel.herokuapp.com/api/info) is hosted on the free tier of [Heroku](https://heroku.com) cloud services.
+
+* [OpenAPI Documentation](https://sadedegel.herokuapp.com/docs)
+* [Redoc Documentation](https://sadedegel.herokuapp.com/redoc)
+* [Redirection to sadedegel.ai](https://sadedegel.herokuapp.com)
+
+## PyLint, Flake8 and Bandit
+sadedeGel uses [pylint](https://www.pylint.org/) for static code analysis,
+[flake8](https://flake8.pycqa.org/en/latest) for code style checks and [bandit](https://pypi.org/project/bandit)
+for security checks.
+
+To run all of these checks:
+
+```bash
+make lint
+```
+
+## Run tests
+
+sadedeGel comes with an [extensive test suite](sadedegel/tests). In order to run the
+tests, you'll usually want to clone the repository and build sadedeGel from source.
+This will also install the required development dependencies and test utilities
+defined in the `requirements.txt`.
+
+Alternatively, you can find out where sadedeGel is installed and run `pytest` on
+that directory. Don't forget to also install the test utilities via sadedeGel's
+`requirements.txt`:
+
+```bash
+make test
+```
+
+## 📓 Kaggle
+
+* Check out the [comprehensive notebook](https://www.kaggle.com/datafan07/clickbait-news-classification-using-sadedegel) by Kaggle Master [Ertugrul Demir](https://www.kaggle.com/datafan07) demonstrating the capabilities of sadedegel on a Turkish clickbait dataset.
+
+
+## Youtube Channel
+Some videos from the [sadedeGel YouTube Channel](https://www.youtube.com/channel/UCyNG1Mehl44XWZ8LzkColuw):
+
+### SkyLab YTU Webinar Playlist
+
+[![Youtube](https://img.shields.io/youtube/likes/xoEERspk6Is?label=SadedeGel%20Subprojects%20(Turkish)&style=social&withDislikes)](https://www.youtube.com/watch?v=xoEERspk6Is)
+
+[![Youtube](https://img.shields.io/youtube/likes/HfWIzAwf5u8?label=SadedeGel%20Scraper%20(Turkish)&style=social&withDislikes)](https://www.youtube.com/watch?v=HfWIzAwf5u8)
+
+[![Youtube](https://img.shields.io/youtube/likes/PkUmYhahiMw?label=SadedeGel%20Evaluation-nDCG%20(Turkish)&style=social&withDislikes)](https://www.youtube.com/watch?v=PkUmYhahiMw)
+
+[![Youtube](https://img.shields.io/youtube/likes/AxpK7fOndRQ?label=SadedeGel%20Annotator%20(Turkish)&style=social&withDislikes)](https://www.youtube.com/watch?v=AxpK7fOndRQ)
+
+[![Youtube](https://img.shields.io/youtube/likes/jKh_t9ZOJ-g?label=SadedeGel%20Baseline%20Özetleyiciler%20(Turkish)&style=social&withDislikes)](https://www.youtube.com/watch?v=jKh_t9ZOJ-g)
+
+[![Youtube](https://img.shields.io/youtube/likes/3DO1X7de1FI?label=SadedeGel%20ROUGE1%20Özetleyici%20(Turkish)&style=social&withDislikes)](https://www.youtube.com/watch?v=3DO1X7de1FI)
+
+[![Youtube](https://img.shields.io/youtube/likes/KGg3DJQVH9c?label=SadedeGel%20Kümeleme%20Bazlı%20Özetleyiciler%20(Turkish)&style=social&withDislikes)](https://www.youtube.com/watch?v=KGg3DJQVH9c)
+
+[![Youtube](https://img.shields.io/youtube/likes/G_erifsGGFs?label=SadedeGel%20BERT%20Embeddings%20(Turkish)&style=social&withDislikes)](https://www.youtube.com/watch?v=G_erifsGGFs)
+
+## References
+
+### Special Thanks
+
+* [Starlang Software](https://starlangyazilim.com/) for their contribution to open-source Turkish NLP development and corpus preparation.
+
+* [Olcay Taner Yıldız, Ph.D.](https://github.com/olcaytaner), one of our referees in [Açık Kaynak Hackathon Programı 2020](https://www.acikhack.com/), for supporting the development of sadedegel.
+
+* [Taner Sezer](https://github.com/tanerim) for his contributions to the tokenization corpus and the labeled news corpus.
+
+### Our Community Contributors
+
+We would like to thank our community contributors for their bug reports, enhancement requests and questions that make sadedeGel better every day:
+
+* [Burak Işıklı](https://github.com/burakisikli)
+
+### Software Engineering
+* Special thanks to the [spaCy](https://github.com/explosion/spaCy) project for their work in showing us how to implement a proper Python module rather than merely explaining it.
+ * We have borrowed much documentation and style-related material from their code base :smile:
+
+* There are a few free-tier service providers we need to thank:
+ * [GitHub](https://github.com) for
+ * Hosting our projects.
+ * Making it possible to collaborate easily.
+ * Automating our SLM via [Github Actions](https://github.com/features/actions)
+ * [Google Cloud Google Storage Service](https://cloud.google.com/products/storage) for providing low cost storage buckets making it possible to store `sadedegel.dataset.extended` data.
+ * [Heroku](https://heroku.com) for hosting [sadedeGel Server](https://sadedegel.herokuapp.com/api/info) in their free tier dynos.
+ * [CodeCov](https://codecov.io/) for allowing us to transparently share our [test coverage](https://codecov.io/gh/globalmaksimum/sadedegel)
+ * [PyPI](https://pypi.org/) for allowing us to share [sadedegel](https://pypi.org/project/sadedegel) with you.
+ * [binder](https://mybinder.org/) for
+ * Allowing us to share our example [notebooks](notebook/)
+ * Hosting our learn by example boxes in [sadedegel.ai](http://sadedegel.ai)
+
+### Machine Learning (ML), Deep Learning (DL) and Natural Language Processing (NLP)
+* Resources on Extractive Text Summarization:
+
+ * [Leveraging BERT for Extractive Text Summarization on Lectures](https://arxiv.org/abs/1906.04165) by Derek Miller
+ * [Fine-tune BERT for Extractive Summarization](https://arxiv.org/pdf/1903.10318.pdf) by Yang Liu
+
+* Other NLP related references
+
+ * [ROUGE: A Package for Automatic Evaluation of Summaries](https://www.aclweb.org/anthology/W04-1013.pdf)
+ * [Speech and Language Processing, Second Edition](https://web.stanford.edu/~jurafsky/slp3/)
+
+
+
+
+%prep
+%autosetup -n sadedegel-0.21.2
+
+%build
+%py3_build
+
+%install
+%py3_install
+install -d -m755 %{buildroot}/%{_pkgdocdir}
+if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
+if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
+if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
+if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
+pushd %{buildroot}
+if [ -d usr/lib ]; then
+ find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/lib64 ]; then
+ find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/bin ]; then
+ find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/sbin ]; then
+ find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+touch doclist.lst
+if [ -d usr/share/man ]; then
+ find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
+fi
+popd
+mv %{buildroot}/filelist.lst .
+mv %{buildroot}/doclist.lst .
+
+%files -n python3-sadedegel -f filelist.lst
+%dir %{python3_sitelib}/*
+
+%files help -f doclist.lst
+%{_docdir}/*
+
+%changelog
+* Wed May 31 2023 Python_Bot <Python_Bot@openeuler.org> - 0.21.2-1
+- Package Spec generated
diff --git a/sources b/sources
new file mode 100644
index 0000000..a57676f
--- /dev/null
+++ b/sources
@@ -0,0 +1 @@
+5c5e317121482a1938f0b1e73c5d51e1 sadedegel-0.21.2.tar.gz