%global _empty_manifest_terminate_build 0

Name: python-whatlies
Version: 0.7.0
Release: 1
Summary: Tools to help uncover `whatlies` in word embeddings.
License: MIT License
URL: https://koaning.github.io/whatlies/
Source0: https://mirrors.aliyun.com/pypi/web/packages/17/54/bfbf6425a40f6fd3af190a751304ce132746c52626f1cc77dc2083d805c0/whatlies-0.7.0.tar.gz
BuildArch: noarch

Requires: python3-scikit-learn
Requires: python3-altair
Requires: python3-matplotlib
Requires: python3-bpemb
Requires: python3-gensim
Requires: python3-tensorflow
Requires: python3-tensorflow-text
Requires: python3-tensorflow-hub
Requires: python3-transformers
Requires: python3-sense2vec
Requires: python3-spacy
Requires: python3-spacy-lookups-data
Requires: python3-sentence-transformers
Requires: python3-fasttext
Requires: python3-umap-learn
Requires: python3-floret
Requires: python3-mkdocs
Requires: python3-mkdocs-material
Requires: python3-mkdocstrings
Requires: python3-jupyterlab
Requires: python3-nbstripout
Requires: python3-nbval
Requires: python3-torch
Requires: python3-flake8
Requires: python3-pytest
Requires: python3-black
Requires: python3-pytest-cov
Requires: python3-pre-commit

%description
![](https://img.shields.io/pypi/v/whatlies)
![](https://img.shields.io/pypi/pyversions/whatlies)
![](https://img.shields.io/github/license/koaning/whatlies)
[![Downloads](https://pepy.tech/badge/whatlies)](https://pepy.tech/project/whatlies)

# whatlies

A library that tries to help you understand (note the pun).

> "What lies in word embeddings?"

This small library offers tools that make it easier to visualise word embeddings, as well as operations on them.

## Produced

This project was initiated at [Rasa](https://rasa.com) as a by-product of our efforts in the developer advocacy and research teams. The project is maintained by [koaning](https://github.com/koaning) in order to support more use-cases.
## Features

This library has tools to help you understand what lies in word embeddings. This includes:

- simple tools to create (interactive) visualisations
- support for many language backends, including spaCy, fastText, TFHub, Hugging Face and BPEmb
- lightweight scikit-learn featurizer support for all these backends

## Installation

You can install the package via pip:

```bash
pip install whatlies
```

This will install the base dependencies. Depending on the transformers and language backends that you'll be using, you may want to install more. Here are some of the possible installation settings you could go for:

```bash
pip install whatlies[spacy]
pip install whatlies[tfhub]
pip install whatlies[transformers]
```

If you want it all, you can also install via:

```bash
pip install whatlies[all]
```

Note that this will install the dependencies but it **will not** install all the language models you might want to visualise. For example, you might still need to manually download spaCy models if you intend to use that backend.

## Getting Started

More in-depth getting-started guides can be found on the [documentation page](https://koaning.github.io/whatlies/).

## Examples

The idea is that you can load embeddings from a language backend and use mathematical operations on them.

```python
from whatlies import EmbeddingSet
from whatlies.language import SpacyLanguage

lang = SpacyLanguage("en_core_web_md")
words = ["cat", "dog", "fish", "kitten", "man", "woman",
         "king", "queen", "doctor", "nurse"]

emb = EmbeddingSet(*[lang[w] for w in words])
emb.plot_interactive(x_axis=emb["man"], y_axis=emb["woman"])
```

![](docs/gif-zero.gif)

You can even do fancy operations, like projecting onto and away from vector embeddings! You can perform these on single embeddings as well as sets of embeddings. In the example below we attempt to filter away gender bias using linear algebra operations.
```python
orig_chart = emb.plot_interactive('man', 'woman')

new_ts = emb | (emb['king'] - emb['queen'])
new_chart = new_ts.plot_interactive('man', 'woman')
```

![](docs/gif-one.gif)

There are also transformations like **PCA** and **UMAP**.

```python
from whatlies.transformers import Pca, Umap

orig_chart = emb.plot_interactive('man', 'woman')

pca_plot = emb.transform(Pca(2)).plot_interactive()
umap_plot = emb.transform(Umap(2)).plot_interactive()

pca_plot | umap_plot
```

![](docs/gif-two.gif)

## Scikit-Learn Support

Every language backend in this library is available as a scikit-learn featurizer as well.

```python
import numpy as np
from whatlies.language import BytePairLanguage
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression

pipe = Pipeline([
    ("embed", BytePairLanguage("en")),
    ("model", LogisticRegression())
])

X = [
    "i really like this post",
    "thanks for that comment",
    "i enjoy this friendly forum",
    "this is a bad post",
    "i dislike this article",
    "this is not well written"
]
y = np.array([1, 1, 1, 0, 0, 0])

pipe.fit(X, y)
```

## Documentation

To learn more and for a getting-started guide, check out the [documentation](https://koaning.github.io/whatlies/).

## Similar Projects

There are some similar projects out there, and we figured it fair to mention and compare them here.
### Julia Bazińska & Piotr Migdal Web App

The original inspiration for this project came from this web app and the accompanying PyData talk. The web app takes a while to load, but it is really fun to play with. The goal of this project is to make it easier to create similar charts from Jupyter using different language backends.

### TensorFlow Projector

From Google there's the TensorFlow Projector. It offers highly interactive 3D visualisations as well as some transformations via TensorBoard.
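The projector can also load your own vectors: it accepts a plain tab-separated vectors file plus an optional metadata file with one label per row. Here is a minimal sketch of producing those two files; the words and random vectors are placeholders purely for illustration (a real workflow would export the vectors from your chosen language backend).

```python
import numpy as np

# Placeholder data -- substitute embeddings from your backend of choice.
words = ["cat", "dog", "fish"]
vectors = np.random.default_rng(0).normal(size=(3, 5))

# One embedding per line, dimensions separated by tabs.
with open("vectors.tsv", "w") as f:
    for row in vectors:
        f.write("\t".join(f"{x:.5f}" for x in row) + "\n")

# One label per line, same row order as the vectors file.
with open("metadata.tsv", "w") as f:
    f.write("\n".join(words) + "\n")
```

Both files can then be uploaded through the projector's "Load" dialog.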

### Parallax

From Uber AI Labs there's Parallax, which is described in a paper. The two tools share a common mindset: both use arbitrary user-defined projections to better understand embedding spaces. That said, there are some differences worth mentioning.
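The "user-defined projections" idea underlying both tools is plain linear algebra. As a minimal sketch in NumPy (deliberately not the whatlies API, and with made-up 3-d vectors purely for illustration), projecting an embedding onto an axis, and removing that direction from it, looks like this:

```python
import numpy as np

def project_onto(v, axis):
    """Component of v that lies along `axis` (vector projection)."""
    axis = axis / np.linalg.norm(axis)
    return np.dot(v, axis) * axis

def project_away(v, axis):
    """What remains of v after removing the `axis` direction (rejection)."""
    return v - project_onto(v, axis)

# Hypothetical 3-d "embeddings", for illustration only.
king = np.array([1.0, 2.0, 3.0])
queen = np.array([1.0, 0.0, 3.0])
word = np.array([2.0, 4.0, 1.0])

gender_axis = king - queen           # a user-defined direction
debiased = project_away(word, gender_axis)
print(debiased)                      # prints [2. 0. 1.]
```

This rejection is what the `emb | (emb['king'] - emb['queen'])` operation shown earlier applies to every embedding in the set.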

## Local Development

If you want to develop locally you can start by running this command:

```bash
make develop
```

### Documentation

This is generated via:

```
make docs
```

### Citation

Please use the following citation if you found `whatlies` helpful for any of your work (find the `whatlies` paper [here](https://www.aclweb.org/anthology/2020.nlposs-1.8)):

```
@inproceedings{warmerdam-etal-2020-going,
    title = "Going Beyond {T}-{SNE}: Exposing whatlies in Text Embeddings",
    author = "Warmerdam, Vincent and Kober, Thomas and Tatman, Rachael",
    booktitle = "Proceedings of Second Workshop for NLP Open Source Software (NLP-OSS)",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.nlposs-1.8",
    doi = "10.18653/v1/2020.nlposs-1.8",
    pages = "52--60",
    abstract = "We introduce whatlies, an open source toolkit for visually inspecting word and sentence embeddings. The project offers a unified and extensible API with current support for a range of popular embedding backends including spaCy, tfhub, huggingface transformers, gensim, fastText and BytePair embeddings. The package combines a domain specific language for vector arithmetic with visualisation tools that make exploring word embeddings more intuitive and concise. It offers support for many popular dimensionality reduction techniques as well as many interactive visualisations that can either be statically exported or shared via Jupyter notebooks. The project documentation is available from https://koaning.github.io/whatlies/.",
}
```

%package -n python3-whatlies
Summary: Tools to help uncover `whatlies` in word embeddings.
Provides: python-whatlies
BuildRequires: python3-devel
BuildRequires: python3-setuptools
BuildRequires: python3-pip

%description -n python3-whatlies
![](https://img.shields.io/pypi/v/whatlies)
![](https://img.shields.io/pypi/pyversions/whatlies)
![](https://img.shields.io/github/license/koaning/whatlies)
[![Downloads](https://pepy.tech/badge/whatlies)](https://pepy.tech/project/whatlies)

# whatlies

A library that tries to help you understand (note the pun).

> "What lies in word embeddings?"

This small library offers tools that make it easier to visualise word embeddings, as well as operations on them.

## Produced

This project was initiated at [Rasa](https://rasa.com) as a by-product of our efforts in the developer advocacy and research teams. The project is maintained by [koaning](https://github.com/koaning) in order to support more use-cases.

## Features

This library has tools to help you understand what lies in word embeddings. This includes:

- simple tools to create (interactive) visualisations
- support for many language backends, including spaCy, fastText, TFHub, Hugging Face and BPEmb
- lightweight scikit-learn featurizer support for all these backends

## Installation

You can install the package via pip:

```bash
pip install whatlies
```

This will install the base dependencies. Depending on the transformers and language backends that you'll be using, you may want to install more. Here are some of the possible installation settings you could go for:

```bash
pip install whatlies[spacy]
pip install whatlies[tfhub]
pip install whatlies[transformers]
```

If you want it all, you can also install via:

```bash
pip install whatlies[all]
```

Note that this will install the dependencies but it **will not** install all the language models you might want to visualise. For example, you might still need to manually download spaCy models if you intend to use that backend.
## Getting Started

More in-depth getting-started guides can be found on the [documentation page](https://koaning.github.io/whatlies/).

## Examples

The idea is that you can load embeddings from a language backend and use mathematical operations on them.

```python
from whatlies import EmbeddingSet
from whatlies.language import SpacyLanguage

lang = SpacyLanguage("en_core_web_md")
words = ["cat", "dog", "fish", "kitten", "man", "woman",
         "king", "queen", "doctor", "nurse"]

emb = EmbeddingSet(*[lang[w] for w in words])
emb.plot_interactive(x_axis=emb["man"], y_axis=emb["woman"])
```

![](docs/gif-zero.gif)

You can even do fancy operations, like projecting onto and away from vector embeddings! You can perform these on single embeddings as well as sets of embeddings. In the example below we attempt to filter away gender bias using linear algebra operations.

```python
orig_chart = emb.plot_interactive('man', 'woman')

new_ts = emb | (emb['king'] - emb['queen'])
new_chart = new_ts.plot_interactive('man', 'woman')
```

![](docs/gif-one.gif)

There are also transformations like **PCA** and **UMAP**.

```python
from whatlies.transformers import Pca, Umap

orig_chart = emb.plot_interactive('man', 'woman')

pca_plot = emb.transform(Pca(2)).plot_interactive()
umap_plot = emb.transform(Umap(2)).plot_interactive()

pca_plot | umap_plot
```

![](docs/gif-two.gif)

## Scikit-Learn Support

Every language backend in this library is available as a scikit-learn featurizer as well.
```python
import numpy as np
from whatlies.language import BytePairLanguage
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression

pipe = Pipeline([
    ("embed", BytePairLanguage("en")),
    ("model", LogisticRegression())
])

X = [
    "i really like this post",
    "thanks for that comment",
    "i enjoy this friendly forum",
    "this is a bad post",
    "i dislike this article",
    "this is not well written"
]
y = np.array([1, 1, 1, 0, 0, 0])

pipe.fit(X, y)
```

## Documentation

To learn more and for a getting-started guide, check out the [documentation](https://koaning.github.io/whatlies/).

## Similar Projects

There are some similar projects out there, and we figured it fair to mention and compare them here.
### Julia Bazińska & Piotr Migdal Web App

The original inspiration for this project came from this web app and the accompanying PyData talk. The web app takes a while to load, but it is really fun to play with. The goal of this project is to make it easier to create similar charts from Jupyter using different language backends.

### TensorFlow Projector

From Google there's the TensorFlow Projector. It offers highly interactive 3D visualisations as well as some transformations via TensorBoard.

### Parallax

From Uber AI Labs there's Parallax, which is described in a paper. The two tools share a common mindset: both use arbitrary user-defined projections to better understand embedding spaces. That said, there are some differences worth mentioning.

## Local Development

If you want to develop locally you can start by running this command:

```bash
make develop
```

### Documentation

This is generated via:

```
make docs
```

### Citation

Please use the following citation if you found `whatlies` helpful for any of your work (find the `whatlies` paper [here](https://www.aclweb.org/anthology/2020.nlposs-1.8)):

```
@inproceedings{warmerdam-etal-2020-going,
    title = "Going Beyond {T}-{SNE}: Exposing whatlies in Text Embeddings",
    author = "Warmerdam, Vincent and Kober, Thomas and Tatman, Rachael",
    booktitle = "Proceedings of Second Workshop for NLP Open Source Software (NLP-OSS)",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.nlposs-1.8",
    doi = "10.18653/v1/2020.nlposs-1.8",
    pages = "52--60",
    abstract = "We introduce whatlies, an open source toolkit for visually inspecting word and sentence embeddings. The project offers a unified and extensible API with current support for a range of popular embedding backends including spaCy, tfhub, huggingface transformers, gensim, fastText and BytePair embeddings. The package combines a domain specific language for vector arithmetic with visualisation tools that make exploring word embeddings more intuitive and concise. It offers support for many popular dimensionality reduction techniques as well as many interactive visualisations that can either be statically exported or shared via Jupyter notebooks. The project documentation is available from https://koaning.github.io/whatlies/.",
}
```

%package help
Summary: Development documents and examples for whatlies
Provides: python3-whatlies-doc

%description help
![](https://img.shields.io/pypi/v/whatlies)
![](https://img.shields.io/pypi/pyversions/whatlies)
![](https://img.shields.io/github/license/koaning/whatlies)
[![Downloads](https://pepy.tech/badge/whatlies)](https://pepy.tech/project/whatlies)

# whatlies

A library that tries to help you understand (note the pun).

> "What lies in word embeddings?"

This small library offers tools that make it easier to visualise word embeddings, as well as operations on them.

## Produced

This project was initiated at [Rasa](https://rasa.com) as a by-product of our efforts in the developer advocacy and research teams. The project is maintained by [koaning](https://github.com/koaning) in order to support more use-cases.

## Features

This library has tools to help you understand what lies in word embeddings. This includes:

- simple tools to create (interactive) visualisations
- support for many language backends, including spaCy, fastText, TFHub, Hugging Face and BPEmb
- lightweight scikit-learn featurizer support for all these backends

## Installation

You can install the package via pip:

```bash
pip install whatlies
```

This will install the base dependencies. Depending on the transformers and language backends that you'll be using, you may want to install more. Here are some of the possible installation settings you could go for:

```bash
pip install whatlies[spacy]
pip install whatlies[tfhub]
pip install whatlies[transformers]
```

If you want it all, you can also install via:

```bash
pip install whatlies[all]
```

Note that this will install the dependencies but it **will not** install all the language models you might want to visualise. For example, you might still need to manually download spaCy models if you intend to use that backend.
## Getting Started

More in-depth getting-started guides can be found on the [documentation page](https://koaning.github.io/whatlies/).

## Examples

The idea is that you can load embeddings from a language backend and use mathematical operations on them.

```python
from whatlies import EmbeddingSet
from whatlies.language import SpacyLanguage

lang = SpacyLanguage("en_core_web_md")
words = ["cat", "dog", "fish", "kitten", "man", "woman",
         "king", "queen", "doctor", "nurse"]

emb = EmbeddingSet(*[lang[w] for w in words])
emb.plot_interactive(x_axis=emb["man"], y_axis=emb["woman"])
```

![](docs/gif-zero.gif)

You can even do fancy operations, like projecting onto and away from vector embeddings! You can perform these on single embeddings as well as sets of embeddings. In the example below we attempt to filter away gender bias using linear algebra operations.

```python
orig_chart = emb.plot_interactive('man', 'woman')

new_ts = emb | (emb['king'] - emb['queen'])
new_chart = new_ts.plot_interactive('man', 'woman')
```

![](docs/gif-one.gif)

There are also transformations like **PCA** and **UMAP**.

```python
from whatlies.transformers import Pca, Umap

orig_chart = emb.plot_interactive('man', 'woman')

pca_plot = emb.transform(Pca(2)).plot_interactive()
umap_plot = emb.transform(Umap(2)).plot_interactive()

pca_plot | umap_plot
```

![](docs/gif-two.gif)

## Scikit-Learn Support

Every language backend in this library is available as a scikit-learn featurizer as well.
```python
import numpy as np
from whatlies.language import BytePairLanguage
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression

pipe = Pipeline([
    ("embed", BytePairLanguage("en")),
    ("model", LogisticRegression())
])

X = [
    "i really like this post",
    "thanks for that comment",
    "i enjoy this friendly forum",
    "this is a bad post",
    "i dislike this article",
    "this is not well written"
]
y = np.array([1, 1, 1, 0, 0, 0])

pipe.fit(X, y)
```

## Documentation

To learn more and for a getting-started guide, check out the [documentation](https://koaning.github.io/whatlies/).

## Similar Projects

There are some similar projects out there, and we figured it fair to mention and compare them here.
### Julia Bazińska & Piotr Migdal Web App

The original inspiration for this project came from this web app and the accompanying PyData talk. The web app takes a while to load, but it is really fun to play with. The goal of this project is to make it easier to create similar charts from Jupyter using different language backends.

### TensorFlow Projector

From Google there's the TensorFlow Projector. It offers highly interactive 3D visualisations as well as some transformations via TensorBoard.

### Parallax

From Uber AI Labs there's Parallax, which is described in a paper. The two tools share a common mindset: both use arbitrary user-defined projections to better understand embedding spaces. That said, there are some differences worth mentioning.

## Local Development

If you want to develop locally you can start by running this command:

```bash
make develop
```

### Documentation

This is generated via:

```
make docs
```

### Citation

Please use the following citation if you found `whatlies` helpful for any of your work (find the `whatlies` paper [here](https://www.aclweb.org/anthology/2020.nlposs-1.8)):

```
@inproceedings{warmerdam-etal-2020-going,
    title = "Going Beyond {T}-{SNE}: Exposing whatlies in Text Embeddings",
    author = "Warmerdam, Vincent and Kober, Thomas and Tatman, Rachael",
    booktitle = "Proceedings of Second Workshop for NLP Open Source Software (NLP-OSS)",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.nlposs-1.8",
    doi = "10.18653/v1/2020.nlposs-1.8",
    pages = "52--60",
    abstract = "We introduce whatlies, an open source toolkit for visually inspecting word and sentence embeddings. The project offers a unified and extensible API with current support for a range of popular embedding backends including spaCy, tfhub, huggingface transformers, gensim, fastText and BytePair embeddings. The package combines a domain specific language for vector arithmetic with visualisation tools that make exploring word embeddings more intuitive and concise. It offers support for many popular dimensionality reduction techniques as well as many interactive visualisations that can either be statically exported or shared via Jupyter notebooks. The project documentation is available from https://koaning.github.io/whatlies/.",
}
```

%prep
%autosetup -n whatlies-0.7.0

%build
%py3_build

%install
%py3_install
install -d -m755 %{buildroot}/%{_pkgdocdir}
if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
pushd %{buildroot}
if [ -d usr/lib ]; then
	find usr/lib -type f -printf "\"/%h/%f\"\n" >> filelist.lst
fi
if [ -d usr/lib64 ]; then
	find usr/lib64 -type f -printf "\"/%h/%f\"\n" >> filelist.lst
fi
if [ -d usr/bin ]; then
	find usr/bin -type f -printf "\"/%h/%f\"\n" >> filelist.lst
fi
if [ -d usr/sbin ]; then
	find usr/sbin -type f -printf "\"/%h/%f\"\n" >> filelist.lst
fi
touch doclist.lst
if [ -d usr/share/man ]; then
	find usr/share/man -type f -printf "\"/%h/%f.gz\"\n" >> doclist.lst
fi
popd
mv %{buildroot}/filelist.lst .
mv %{buildroot}/doclist.lst .

%files -n python3-whatlies -f filelist.lst
%dir %{python3_sitelib}/*

%files help -f doclist.lst
%{_docdir}/*

%changelog
* Thu Jun 08 2023 Python_Bot - 0.7.0-1
- Package Spec generated