%global _empty_manifest_terminate_build 0
Name: python-wikipedia2vec
Version: 1.0.5
Release: 1
Summary: A tool for learning vector representations of words and entities from Wikipedia
License: Apache Software License
URL: http://wikipedia2vec.github.io/
Source0: https://mirrors.nju.edu.cn/pypi/web/packages/89/83/15ab878fe5a93590b80bac8c3a8b0ad5f5dec5d0ea1071f9a17dbce5c33b/wikipedia2vec-1.0.5.tar.gz
BuildArch: noarch

%description
[![Fury badge](https://badge.fury.io/py/wikipedia2vec.png)](http://badge.fury.io/py/wikipedia2vec)
[![CircleCI](https://circleci.com/gh/wikipedia2vec/wikipedia2vec.svg?style=svg)](https://circleci.com/gh/wikipedia2vec/wikipedia2vec)

Wikipedia2Vec is a tool for obtaining embeddings (or vector representations) of words and entities (i.e., concepts that have corresponding pages in Wikipedia) from Wikipedia. It is developed and maintained by [Studio Ousia](http://www.ousia.jp).

This tool enables you to learn embeddings of words and entities simultaneously, placing similar words and entities close to one another in a continuous vector space. Embeddings can be trained with a single command, using a publicly available Wikipedia dump as input. This tool implements the [conventional skip-gram model](https://en.wikipedia.org/wiki/Word2vec) to learn the embeddings of words, and its extension proposed in [Yamada et al. (2016)](https://arxiv.org/abs/1601.01343) to learn the embeddings of entities.

An empirical comparison between Wikipedia2Vec and existing embedding tools (i.e., FastText, Gensim, RDF2Vec, and Wiki2vec) is available [here](https://arxiv.org/abs/1812.06280).

Documentation is available online at [http://wikipedia2vec.github.io/](http://wikipedia2vec.github.io/).

## Basic Usage

Wikipedia2Vec can be installed via PyPI:

```bash
% pip install wikipedia2vec
```

With this tool, embeddings can be learned by running the *train* command with a Wikipedia dump as input. For example, the following commands download the latest English Wikipedia dump and learn embeddings from it:

```bash
% wget https://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles.xml.bz2
% wikipedia2vec train enwiki-latest-pages-articles.xml.bz2 MODEL_FILE
```

The learned embeddings are then written to *MODEL\_FILE*. Note that this command takes many optional parameters; please refer to [our documentation](https://wikipedia2vec.github.io/wikipedia2vec/commands/) for further details.
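Once training finishes, the model can be queried directly from Python. Below is a minimal sketch based on the library's documented API; the model path and query strings are placeholders, and note that words are lowercased by default during training:

```python
from wikipedia2vec import Wikipedia2Vec

# Load a model produced by the `train` command (path is a placeholder).
wiki2vec = Wikipedia2Vec.load('MODEL_FILE')

# Look up the vectors of a word and of a Wikipedia entity.
word_vector = wiki2vec.get_word_vector('paris')
entity_vector = wiki2vec.get_entity_vector('Paris')

# Words and entities live in the same space, so similarity queries can
# return both; this lists the five items closest to the entity "Paris".
for item, score in wiki2vec.most_similar(wiki2vec.get_entity('Paris'), 5):
    print(item, score)
```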
## Pretrained Embeddings

Pretrained embeddings for 12 languages (i.e., English, Arabic, Chinese, Dutch, French, German, Italian, Japanese, Polish, Portuguese, Russian, and Spanish) can be downloaded from [this page](https://wikipedia2vec.github.io/wikipedia2vec/pretrained/).

## Use Cases

Wikipedia2Vec has been applied to the following tasks:

* Entity linking: [Yamada et al., 2016](https://arxiv.org/abs/1601.01343), [Eshel et al., 2017](https://arxiv.org/abs/1706.09147), [Chen et al., 2019](https://arxiv.org/abs/1911.03834), [Poerner et al., 2020](https://arxiv.org/abs/1911.03681), [van Hulst et al., 2020](https://arxiv.org/abs/2006.01969).
* Named entity recognition: [Sato et al., 2017](http://www.aclweb.org/anthology/I17-2017), [Lara-Clares and Garcia-Serrano, 2019](http://ceur-ws.org/Vol-2421/eHealth-KD_paper_6.pdf).
* Question answering: [Yamada et al., 2017](https://arxiv.org/abs/1803.08652), [Poerner et al., 2020](https://arxiv.org/abs/1911.03681).
* Entity typing: [Yamada et al., 2018](https://arxiv.org/abs/1806.02960).
* Text classification: [Yamada et al., 2018](https://arxiv.org/abs/1806.02960), [Yamada and Shindo, 2019](https://arxiv.org/abs/1909.01259), [Alam et al., 2020](https://link.springer.com/chapter/10.1007/978-3-030-61244-3_9).
* Relation classification: [Poerner et al., 2020](https://arxiv.org/abs/1911.03681).
* Paraphrase detection: [Duong et al., 2018](https://ieeexplore.ieee.org/abstract/document/8606845).
* Knowledge graph completion: [Shah et al., 2019](https://aaai.org/ojs/index.php/AAAI/article/view/4162), [Shah et al., 2020](https://www.aclweb.org/anthology/2020.textgraphs-1.9/).
* Fake news detection: [Singh et al., 2019](https://arxiv.org/abs/1906.11126), [Ghosal et al., 2020](https://arxiv.org/abs/2010.10836).
* Plot analysis of movies: [Papalampidi et al., 2019](https://arxiv.org/abs/1908.10328).
* Novel entity discovery: [Zhang et al., 2020](https://arxiv.org/abs/2002.00206).
* Entity retrieval: [Gerritse et al., 2020](https://link.springer.com/chapter/10.1007%2F978-3-030-45439-5_7).
* Deepfake detection: [Zhong et al., 2020](https://arxiv.org/abs/2010.07475).
* Conversational information seeking: [Rodriguez et al., 2020](https://arxiv.org/abs/2005.00172).
* Query expansion: [Rosin et al., 2020](https://arxiv.org/abs/2012.12065).

## References

If you use Wikipedia2Vec in a scientific publication, please cite the following paper:

Ikuya Yamada, Akari Asai, Jin Sakuma, Hiroyuki Shindo, Hideaki Takeda, Yoshiyasu Takefuji, Yuji Matsumoto, [Wikipedia2Vec: An Efficient Toolkit for Learning and Visualizing the Embeddings of Words and Entities from Wikipedia](https://arxiv.org/abs/1812.06280).

```
@inproceedings{yamada2020wikipedia2vec,
  title = "{W}ikipedia2{V}ec: An Efficient Toolkit for Learning and Visualizing the Embeddings of Words and Entities from {W}ikipedia",
  author = {Yamada, Ikuya and Asai, Akari and Sakuma, Jin and Shindo, Hiroyuki and Takeda, Hideaki and Takefuji, Yoshiyasu and Matsumoto, Yuji},
  booktitle = {Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations},
  year = {2020},
  publisher = {Association for Computational Linguistics},
  pages = {23--30}
}
```

The embedding model was originally proposed in the following paper:

Ikuya Yamada, Hiroyuki Shindo, Hideaki Takeda, Yoshiyasu Takefuji, [Joint Learning of the Embedding of Words and Entities for Named Entity Disambiguation](https://arxiv.org/abs/1601.01343).

```
@inproceedings{yamada2016joint,
  title = {Joint Learning of the Embedding of Words and Entities for Named Entity Disambiguation},
  author = {Yamada, Ikuya and Shindo, Hiroyuki and Takeda, Hideaki and Takefuji, Yoshiyasu},
  booktitle = {Proceedings of the 20th SIGNLL Conference on Computational Natural Language Learning},
  year = {2016},
  publisher = {Association for Computational Linguistics},
  pages = {250--259}
}
```

The text classification model implemented in [this example](https://github.com/wikipedia2vec/wikipedia2vec/tree/master/examples/text_classification) was proposed in the following paper:

Ikuya Yamada, Hiroyuki Shindo, [Neural Attentive Bag-of-Entities Model for Text Classification](https://arxiv.org/abs/1909.01259).
```
@inproceedings{yamada2019neural,
  title = {Neural Attentive Bag-of-Entities Model for Text Classification},
  author = {Yamada, Ikuya and Shindo, Hiroyuki},
  booktitle = {Proceedings of the 23rd SIGNLL Conference on Computational Natural Language Learning},
  year = {2019},
  publisher = {Association for Computational Linguistics},
  pages = {563--573}
}
```

## License

[Apache License 2.0](http://www.apache.org/licenses/LICENSE-2.0)

%package -n python3-wikipedia2vec
Summary: A tool for learning vector representations of words and entities from Wikipedia
Provides: python-wikipedia2vec
BuildRequires: python3-devel
BuildRequires: python3-setuptools
BuildRequires: python3-pip

%description -n python3-wikipedia2vec
[![Fury badge](https://badge.fury.io/py/wikipedia2vec.png)](http://badge.fury.io/py/wikipedia2vec)
[![CircleCI](https://circleci.com/gh/wikipedia2vec/wikipedia2vec.svg?style=svg)](https://circleci.com/gh/wikipedia2vec/wikipedia2vec)

Wikipedia2Vec is a tool for obtaining embeddings (or vector representations) of words and entities (i.e., concepts that have corresponding pages in Wikipedia) from Wikipedia. It is developed and maintained by [Studio Ousia](http://www.ousia.jp).

This tool enables you to learn embeddings of words and entities simultaneously, placing similar words and entities close to one another in a continuous vector space. Embeddings can be trained with a single command, using a publicly available Wikipedia dump as input. This tool implements the [conventional skip-gram model](https://en.wikipedia.org/wiki/Word2vec) to learn the embeddings of words, and its extension proposed in [Yamada et al. (2016)](https://arxiv.org/abs/1601.01343) to learn the embeddings of entities.

An empirical comparison between Wikipedia2Vec and existing embedding tools (i.e., FastText, Gensim, RDF2Vec, and Wiki2vec) is available [here](https://arxiv.org/abs/1812.06280).

Documentation is available online at [http://wikipedia2vec.github.io/](http://wikipedia2vec.github.io/).

## Basic Usage

Wikipedia2Vec can be installed via PyPI:

```bash
% pip install wikipedia2vec
```

With this tool, embeddings can be learned by running the *train* command with a Wikipedia dump as input. For example, the following commands download the latest English Wikipedia dump and learn embeddings from it:

```bash
% wget https://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles.xml.bz2
% wikipedia2vec train enwiki-latest-pages-articles.xml.bz2 MODEL_FILE
```

The learned embeddings are then written to *MODEL\_FILE*. Note that this command takes many optional parameters; please refer to [our documentation](https://wikipedia2vec.github.io/wikipedia2vec/commands/) for further details.
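Because words and entities are embedded into the same continuous space, vector arithmetic composes naturally with similarity search. A hedged sketch, assuming a trained model at the placeholder path MODEL_FILE and the most_similar_by_vector method described in the API documentation:

```python
from wikipedia2vec import Wikipedia2Vec

# Placeholder path to a model trained with the `train` command.
wiki2vec = Wikipedia2Vec.load('MODEL_FILE')

# Classic analogy via vector arithmetic: king - man + woman ~ queen.
# get_word_vector returns a NumPy array, so the arithmetic is elementwise.
query = (wiki2vec.get_word_vector('king')
         - wiki2vec.get_word_vector('man')
         + wiki2vec.get_word_vector('woman'))

# Find the five words or entities nearest to the composed vector.
for item, score in wiki2vec.most_similar_by_vector(query, 5):
    print(item, score)
```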
## Pretrained Embeddings

Pretrained embeddings for 12 languages (i.e., English, Arabic, Chinese, Dutch, French, German, Italian, Japanese, Polish, Portuguese, Russian, and Spanish) can be downloaded from [this page](https://wikipedia2vec.github.io/wikipedia2vec/pretrained/).

## Use Cases

Wikipedia2Vec has been applied to the following tasks:

* Entity linking: [Yamada et al., 2016](https://arxiv.org/abs/1601.01343), [Eshel et al., 2017](https://arxiv.org/abs/1706.09147), [Chen et al., 2019](https://arxiv.org/abs/1911.03834), [Poerner et al., 2020](https://arxiv.org/abs/1911.03681), [van Hulst et al., 2020](https://arxiv.org/abs/2006.01969).
* Named entity recognition: [Sato et al., 2017](http://www.aclweb.org/anthology/I17-2017), [Lara-Clares and Garcia-Serrano, 2019](http://ceur-ws.org/Vol-2421/eHealth-KD_paper_6.pdf).
* Question answering: [Yamada et al., 2017](https://arxiv.org/abs/1803.08652), [Poerner et al., 2020](https://arxiv.org/abs/1911.03681).
* Entity typing: [Yamada et al., 2018](https://arxiv.org/abs/1806.02960).
* Text classification: [Yamada et al., 2018](https://arxiv.org/abs/1806.02960), [Yamada and Shindo, 2019](https://arxiv.org/abs/1909.01259), [Alam et al., 2020](https://link.springer.com/chapter/10.1007/978-3-030-61244-3_9).
* Relation classification: [Poerner et al., 2020](https://arxiv.org/abs/1911.03681).
* Paraphrase detection: [Duong et al., 2018](https://ieeexplore.ieee.org/abstract/document/8606845).
* Knowledge graph completion: [Shah et al., 2019](https://aaai.org/ojs/index.php/AAAI/article/view/4162), [Shah et al., 2020](https://www.aclweb.org/anthology/2020.textgraphs-1.9/).
* Fake news detection: [Singh et al., 2019](https://arxiv.org/abs/1906.11126), [Ghosal et al., 2020](https://arxiv.org/abs/2010.10836).
* Plot analysis of movies: [Papalampidi et al., 2019](https://arxiv.org/abs/1908.10328).
* Novel entity discovery: [Zhang et al., 2020](https://arxiv.org/abs/2002.00206).
* Entity retrieval: [Gerritse et al., 2020](https://link.springer.com/chapter/10.1007%2F978-3-030-45439-5_7).
* Deepfake detection: [Zhong et al., 2020](https://arxiv.org/abs/2010.07475).
* Conversational information seeking: [Rodriguez et al., 2020](https://arxiv.org/abs/2005.00172).
* Query expansion: [Rosin et al., 2020](https://arxiv.org/abs/2012.12065).

## References

If you use Wikipedia2Vec in a scientific publication, please cite the following paper:

Ikuya Yamada, Akari Asai, Jin Sakuma, Hiroyuki Shindo, Hideaki Takeda, Yoshiyasu Takefuji, Yuji Matsumoto, [Wikipedia2Vec: An Efficient Toolkit for Learning and Visualizing the Embeddings of Words and Entities from Wikipedia](https://arxiv.org/abs/1812.06280).

```
@inproceedings{yamada2020wikipedia2vec,
  title = "{W}ikipedia2{V}ec: An Efficient Toolkit for Learning and Visualizing the Embeddings of Words and Entities from {W}ikipedia",
  author = {Yamada, Ikuya and Asai, Akari and Sakuma, Jin and Shindo, Hiroyuki and Takeda, Hideaki and Takefuji, Yoshiyasu and Matsumoto, Yuji},
  booktitle = {Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations},
  year = {2020},
  publisher = {Association for Computational Linguistics},
  pages = {23--30}
}
```

The embedding model was originally proposed in the following paper:

Ikuya Yamada, Hiroyuki Shindo, Hideaki Takeda, Yoshiyasu Takefuji, [Joint Learning of the Embedding of Words and Entities for Named Entity Disambiguation](https://arxiv.org/abs/1601.01343).

```
@inproceedings{yamada2016joint,
  title = {Joint Learning of the Embedding of Words and Entities for Named Entity Disambiguation},
  author = {Yamada, Ikuya and Shindo, Hiroyuki and Takeda, Hideaki and Takefuji, Yoshiyasu},
  booktitle = {Proceedings of the 20th SIGNLL Conference on Computational Natural Language Learning},
  year = {2016},
  publisher = {Association for Computational Linguistics},
  pages = {250--259}
}
```

The text classification model implemented in [this example](https://github.com/wikipedia2vec/wikipedia2vec/tree/master/examples/text_classification) was proposed in the following paper:

Ikuya Yamada, Hiroyuki Shindo, [Neural Attentive Bag-of-Entities Model for Text Classification](https://arxiv.org/abs/1909.01259).
```
@inproceedings{yamada2019neural,
  title = {Neural Attentive Bag-of-Entities Model for Text Classification},
  author = {Yamada, Ikuya and Shindo, Hiroyuki},
  booktitle = {Proceedings of the 23rd SIGNLL Conference on Computational Natural Language Learning},
  year = {2019},
  publisher = {Association for Computational Linguistics},
  pages = {563--573}
}
```

## License

[Apache License 2.0](http://www.apache.org/licenses/LICENSE-2.0)

%package help
Summary: Development documents and examples for wikipedia2vec
Provides: python3-wikipedia2vec-doc

%description help
[![Fury badge](https://badge.fury.io/py/wikipedia2vec.png)](http://badge.fury.io/py/wikipedia2vec)
[![CircleCI](https://circleci.com/gh/wikipedia2vec/wikipedia2vec.svg?style=svg)](https://circleci.com/gh/wikipedia2vec/wikipedia2vec)

Wikipedia2Vec is a tool for obtaining embeddings (or vector representations) of words and entities (i.e., concepts that have corresponding pages in Wikipedia) from Wikipedia. It is developed and maintained by [Studio Ousia](http://www.ousia.jp).

This tool enables you to learn embeddings of words and entities simultaneously, placing similar words and entities close to one another in a continuous vector space. Embeddings can be trained with a single command, using a publicly available Wikipedia dump as input. This tool implements the [conventional skip-gram model](https://en.wikipedia.org/wiki/Word2vec) to learn the embeddings of words, and its extension proposed in [Yamada et al. (2016)](https://arxiv.org/abs/1601.01343) to learn the embeddings of entities.

An empirical comparison between Wikipedia2Vec and existing embedding tools (i.e., FastText, Gensim, RDF2Vec, and Wiki2vec) is available [here](https://arxiv.org/abs/1812.06280).

Documentation is available online at [http://wikipedia2vec.github.io/](http://wikipedia2vec.github.io/).

## Basic Usage

Wikipedia2Vec can be installed via PyPI:

```bash
% pip install wikipedia2vec
```

With this tool, embeddings can be learned by running the *train* command with a Wikipedia dump as input. For example, the following commands download the latest English Wikipedia dump and learn embeddings from it:

```bash
% wget https://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles.xml.bz2
% wikipedia2vec train enwiki-latest-pages-articles.xml.bz2 MODEL_FILE
```

The learned embeddings are then written to *MODEL\_FILE*. Note that this command takes many optional parameters; please refer to [our documentation](https://wikipedia2vec.github.io/wikipedia2vec/commands/) for further details.

## Pretrained Embeddings

Pretrained embeddings for 12 languages (i.e., English, Arabic, Chinese, Dutch, French, German, Italian, Japanese, Polish, Portuguese, Russian, and Spanish) can be downloaded from [this page](https://wikipedia2vec.github.io/wikipedia2vec/pretrained/).
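The pretrained files on that page are also distributed in the standard word2vec text format, in which entity entries are distinguished from words by an ENTITY/ prefix. A hedged sketch of loading such a file with gensim; the filename is a placeholder, and the prefix convention should be verified against the download page:

```python
from gensim.models import KeyedVectors

# Placeholder filename for a pretrained text-format embedding file.
vectors = KeyedVectors.load_word2vec_format('enwiki_300d.txt', binary=False)

# Words are plain keys; entities carry the ENTITY/ prefix (an assumption
# based on the pretrained-embeddings documentation).
print(vectors.most_similar('tokyo', topn=5))
print(vectors.most_similar('ENTITY/Tokyo', topn=5))
```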
## Use Cases

Wikipedia2Vec has been applied to the following tasks:

* Entity linking: [Yamada et al., 2016](https://arxiv.org/abs/1601.01343), [Eshel et al., 2017](https://arxiv.org/abs/1706.09147), [Chen et al., 2019](https://arxiv.org/abs/1911.03834), [Poerner et al., 2020](https://arxiv.org/abs/1911.03681), [van Hulst et al., 2020](https://arxiv.org/abs/2006.01969).
* Named entity recognition: [Sato et al., 2017](http://www.aclweb.org/anthology/I17-2017), [Lara-Clares and Garcia-Serrano, 2019](http://ceur-ws.org/Vol-2421/eHealth-KD_paper_6.pdf).
* Question answering: [Yamada et al., 2017](https://arxiv.org/abs/1803.08652), [Poerner et al., 2020](https://arxiv.org/abs/1911.03681).
* Entity typing: [Yamada et al., 2018](https://arxiv.org/abs/1806.02960).
* Text classification: [Yamada et al., 2018](https://arxiv.org/abs/1806.02960), [Yamada and Shindo, 2019](https://arxiv.org/abs/1909.01259), [Alam et al., 2020](https://link.springer.com/chapter/10.1007/978-3-030-61244-3_9).
* Relation classification: [Poerner et al., 2020](https://arxiv.org/abs/1911.03681).
* Paraphrase detection: [Duong et al., 2018](https://ieeexplore.ieee.org/abstract/document/8606845).
* Knowledge graph completion: [Shah et al., 2019](https://aaai.org/ojs/index.php/AAAI/article/view/4162), [Shah et al., 2020](https://www.aclweb.org/anthology/2020.textgraphs-1.9/).
* Fake news detection: [Singh et al., 2019](https://arxiv.org/abs/1906.11126), [Ghosal et al., 2020](https://arxiv.org/abs/2010.10836).
* Plot analysis of movies: [Papalampidi et al., 2019](https://arxiv.org/abs/1908.10328).
* Novel entity discovery: [Zhang et al., 2020](https://arxiv.org/abs/2002.00206).
* Entity retrieval: [Gerritse et al., 2020](https://link.springer.com/chapter/10.1007%2F978-3-030-45439-5_7).
* Deepfake detection: [Zhong et al., 2020](https://arxiv.org/abs/2010.07475).
* Conversational information seeking: [Rodriguez et al., 2020](https://arxiv.org/abs/2005.00172).
* Query expansion: [Rosin et al., 2020](https://arxiv.org/abs/2012.12065).

## References

If you use Wikipedia2Vec in a scientific publication, please cite the following paper:

Ikuya Yamada, Akari Asai, Jin Sakuma, Hiroyuki Shindo, Hideaki Takeda, Yoshiyasu Takefuji, Yuji Matsumoto, [Wikipedia2Vec: An Efficient Toolkit for Learning and Visualizing the Embeddings of Words and Entities from Wikipedia](https://arxiv.org/abs/1812.06280).

```
@inproceedings{yamada2020wikipedia2vec,
  title = "{W}ikipedia2{V}ec: An Efficient Toolkit for Learning and Visualizing the Embeddings of Words and Entities from {W}ikipedia",
  author = {Yamada, Ikuya and Asai, Akari and Sakuma, Jin and Shindo, Hiroyuki and Takeda, Hideaki and Takefuji, Yoshiyasu and Matsumoto, Yuji},
  booktitle = {Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations},
  year = {2020},
  publisher = {Association for Computational Linguistics},
  pages = {23--30}
}
```

The embedding model was originally proposed in the following paper:

Ikuya Yamada, Hiroyuki Shindo, Hideaki Takeda, Yoshiyasu Takefuji, [Joint Learning of the Embedding of Words and Entities for Named Entity Disambiguation](https://arxiv.org/abs/1601.01343).

```
@inproceedings{yamada2016joint,
  title = {Joint Learning of the Embedding of Words and Entities for Named Entity Disambiguation},
  author = {Yamada, Ikuya and Shindo, Hiroyuki and Takeda, Hideaki and Takefuji, Yoshiyasu},
  booktitle = {Proceedings of the 20th SIGNLL Conference on Computational Natural Language Learning},
  year = {2016},
  publisher = {Association for Computational Linguistics},
  pages = {250--259}
}
```

The text classification model implemented in [this example](https://github.com/wikipedia2vec/wikipedia2vec/tree/master/examples/text_classification) was proposed in the following paper:

Ikuya Yamada, Hiroyuki Shindo, [Neural Attentive Bag-of-Entities Model for Text Classification](https://arxiv.org/abs/1909.01259).
```
@inproceedings{yamada2019neural,
  title = {Neural Attentive Bag-of-Entities Model for Text Classification},
  author = {Yamada, Ikuya and Shindo, Hiroyuki},
  booktitle = {Proceedings of the 23rd SIGNLL Conference on Computational Natural Language Learning},
  year = {2019},
  publisher = {Association for Computational Linguistics},
  pages = {563--573}
}
```

## License

[Apache License 2.0](http://www.apache.org/licenses/LICENSE-2.0)

%prep
%autosetup -n wikipedia2vec-1.0.5

%build
%py3_build

%install
%py3_install
install -d -m755 %{buildroot}/%{_pkgdocdir}
if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
pushd %{buildroot}
if [ -d usr/lib ]; then
    find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/lib64 ]; then
    find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/bin ]; then
    find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/sbin ]; then
    find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
fi
touch doclist.lst
if [ -d usr/share/man ]; then
    find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
fi
popd
mv %{buildroot}/filelist.lst .
mv %{buildroot}/doclist.lst .

%files -n python3-wikipedia2vec -f filelist.lst
%dir %{python3_sitelib}/*

%files help -f doclist.lst
%{_docdir}/*

%changelog
* Fri May 05 2023 Python_Bot - 1.0.5-1
- Package Spec generated