 .gitignore                |   1
 python-lbsntransform.spec | 569
 sources                   |   1
 3 files changed, 571 insertions, 0 deletions
diff --git a/.gitignore b/.gitignore
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1 @@
+/lbsntransform-0.25.1.tar.gz
diff --git a/python-lbsntransform.spec b/python-lbsntransform.spec
new file mode 100644
index 0000000..5f97711
--- /dev/null
+++ b/python-lbsntransform.spec
@@ -0,0 +1,569 @@
+%global _empty_manifest_terminate_build 0
+Name:           python-lbsntransform
+Version:        0.25.1
+Release:        1
+Summary:        Location based social network (LBSN) data structure format & transfer tool
+License:        GNU GPLv3 or any higher
+URL:            https://gitlab.vgiscience.de/lbsn/lbsntransform
+Source0:        https://mirrors.nju.edu.cn/pypi/web/packages/18/61/fe7960bc1a74d44fd818f69d6373c755ec926cbde4bcb4549b7fd845d477/lbsntransform-0.25.1.tar.gz
+BuildArch:      noarch
+
+Requires:       python3-lbsnstructure
+Requires:       python3-protobuf
+Requires:       python3-psycopg2-binary
+Requires:       python3-ppygis3
+Requires:       python3-shapely
+Requires:       python3-emoji
+Requires:       python3-requests
+Requires:       python3-geos
+Requires:       python3-numpy
+Requires:       python3-regex
+Requires:       python3-nltk
+
+%description
+[PyPI](https://pypi.org/project/lbsntransform/) [GitLab](https://gitlab.vgiscience.de/lbsn/lbsntransform) [pipeline status](https://gitlab.vgiscience.de/lbsn/lbsntransform/-/commits/master) [documentation](https://lbsn.vgiscience.org/lbsntransform/docs/)
+
+# LBSNTransform
+
+A Python package that uses the [common location based social network (LBSN) data structure][lbsnstructure]
+(ProtoBuf) to import, transform and export Social Media data from services such as Twitter and Flickr.
+
+## Motivation
+
+The goal is to provide a common interface for handling Social Media data,
+without the need to adapt individually to the myriad API endpoints available.
+As an example, consider the ProtoBuf spec [lbsn.Post][lbsnpost], which can be a Tweet on Twitter,
+a photo shared on Flickr, or a post on Reddit. However, all of these objects share
+a common set of attributes, which is reflected in the lbsnstructure.
+
+The tool is based on a 4-Facet conceptual framework for LBSN, introduced in a paper
+by [Dunkel et al. (2018)](https://www.tandfonline.com/doi/full/10.1080/13658816.2018.1546390).
+
+The GDPR explicitly requires Social Media network operators to allow
+users to transfer accounts and data between services.
+While there are attempts by Google, Facebook etc. (e.g. see the [data-transfer-project][data-transfer-project]),
+this is not currently possible. With the lbsnstructure, a primary motivation is to systematically
+characterize LBSN data aspects in a common, cross-network data scheme that enables privacy-by-design
+for connected software, data handling and database design.
+
+## Description
+
+This tool enables data import from a Postgres database, JSON, or CSV, and export to CSV, [LBSN ProtoBuf][lbsnstructure],
+or the [hll][hlldb] and [raw][rawdb] versions of the LBSN-prepared Postgres databases.
+The tool maps Social Media endpoints (e.g. Twitter tweets) to a common [LBSN Interchange Structure][lbsnstructure]
+format in ProtoBuf. LBSNTransform can be used from the command line (CLI) or imported into other Python projects with
+`import lbsntransform`, for on-the-fly conversion.
+
+## Quick Start
+
+The recommended way to install lbsntransform, for both Linux and Windows,
+is through the conda package manager.
+
+1. Create a conda env using `environment.yml`
+
+First, create an environment with the dependencies for lbsntransform using
+the [environment.yml][environment.yml] that is provided in the root of the repository.
+
+```bash
+git clone https://github.com/Sieboldianus/lbsntransform.git
+cd lbsntransform
+# not necessary, but recommended:
+conda config --env --set channel_priority strict
+conda env create -f environment.yml
+```
+
+2. Install lbsntransform without dependencies
+
+Afterwards, install lbsntransform using pip, without dependencies.
+
+```bash
+conda activate lbsntransform
+pip install lbsntransform --no-deps --upgrade
+# or locally, from the latest commits on master
+# pip install . --no-deps --upgrade
+```
+
+3. Import data using a mapping
+
+For each data source, a mapping must be provided that
+defines how data is mapped to the [lbsnstructure][lbsnstructure].
+
+The default mapping is [lbsnraw][lbsnraw].
+
+Additional mappings can be dynamically loaded from a folder.
+
+We have provided two [example mappings][mappings] for the [Flickr YFCC100M dataset][yfcc100m] (CSV)
+and Twitter (JSON).
+
+For example, to import the first 1000 records from Twitter JSON data into the
+[lbsn raw database][rawdb], copy [field_mapping_twitter.py][field_mapping_twitter]
+to a local folder `./resources/mappings/`, start up the Docker [rawdb][rawdb] container,
+and use:
+
+```shell
+lbsntransform --origin 3 \
+    --mappings_path ./resources/mappings/ \
+    --file_input \
+    --file_type "json" \
+    --dbpassword_output "sample-key" \
+    --dbuser_output "postgres" \
+    --dbserveraddress_output "127.0.0.1:5432" \
+    --dbname_output "rawdb" \
+    --dbformat_output "lbsn" \
+    --transferlimit 1000
+```
+
+With the above input args, the tool will:
+- read local JSON from `./01_Input/`
+- store lbsn records in the [lbsn rawdb][rawdb]
+
+Similarly, to import data directly into the privacy-aware
+version of lbsnstructure, called [hlldb][hlldb], start up the
+Docker container and use:
+
+```shell
+lbsntransform --origin 3 \
+    --mappings_path ./resources/mappings/ \
+    --file_input \
+    --file_type "json" \
+    --dbpassword_output "sample-key" \
+    --dbuser_output "postgres" \
+    --dbserveraddress_output "127.0.0.1:25432" \
+    --dbname_output "hlldb" \
+    --dbformat_output "hll" \
+    --dbpassword_hllworker "sample-key" \
+    --dbuser_hllworker "postgres" \
+    --dbserveraddress_hllworker "127.0.0.1:25432" \
+    --dbname_hllworker "hlldb" \
+    --include_lbsn_objects "origin,post" \
+    --include_lbsn_bases hashtag,place,date,community \
+    --transferlimit 1000
+```
+
+With the above input args, the tool will:
+- read local JSON from `./01_Input/`
+- store lbsn records in the privacy-aware [lbsn hlldb][hlldb]
+- convert only lbsn objects of type [origin][lbsnorigin] and [post][lbsnpost]
+- update the HyperLogLog (HLL) target tables `hashtag`, `place`, `date` and `community`
+
+A full list of possible input and output args is available in the
+[documentation](https://lbsn.vgiscience.org/lbsntransform/docs/).
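+
+As noted in the description above, lbsntransform can also be imported as a Python
+package instead of being run from the CLI. The snippet below is only an illustrative
+sketch of that idea: the helper names in the comments are hypothetical placeholders,
+not the actual lbsntransform API; see the documentation linked above for the real
+entry points.
+
+```python
+# Illustrative sketch only: the names in the comments below are hypothetical
+# placeholders, not the real lbsntransform API (see the documentation).
+import lbsntransform  # importable once installed, e.g. via pip as shown above
+
+# Outline of an on-the-fly conversion, kept as pseudocode comments:
+# records = read_input("./01_Input/", file_type="json")   # hypothetical helper
+# lbsn_posts = map_to_lbsn(records, origin=3)             # hypothetical helper
+# write_output(lbsn_posts, target="rawdb")                # hypothetical helper
+```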
+
+## Built With
+
+* [lbsnstructure](https://pypi.org/project/lbsnstructure/) - A common language-independent and cross-network social media data scheme
+* [protobuf](https://github.com/google/protobuf) - Google's data interchange format
+* [psycopg2](https://github.com/psycopg/psycopg2) - Python-PostgreSQL Database Adapter
+* [ppygis3](https://github.com/AlexImmer/ppygis3) - A PPyGIS port for Python
+* [shapely](https://github.com/Toblerity/Shapely) - Geometric objects processing in Python
+* [emoji](https://github.com/carpedm20/emoji/) - Emoji handling in Python
+
+## Authors
+
+* **Alexander Dunkel** - Initial work
+
+See also the list of [contributors](/../graphs/master).
+
+## License
+
+This project is licensed under the GNU GPLv3 or any later version -
+see the [LICENSE.md](LICENSE.md) file for details.
+
+[lbsnstructure]: https://lbsn.vgiscience.org/structure/
+[lbsnpost]: https://lbsn.vgiscience.org/structure/#post
+[lbsnorigin]: https://lbsn.vgiscience.org/structure/#origin
+[data-transfer-project]: https://datatransferproject.dev/
+[rawdb]: https://gitlab.vgiscience.de/lbsn/databases/rawdb
+[hlldb]: https://gitlab.vgiscience.de/lbsn/databases/hlldb
+[lbsnraw]: lbsntransform/input/mappings/field_mapping_lbsn.py
+[mappings]: resources/mappings
+[field_mapping_twitter]: resources/mappings/field_mapping_twitter.py
+[environment.yml]: environment.yml
+[yfcc100m]: http://projects.dfki.uni-kl.de/yfcc100m/
+
+
+%package -n python3-lbsntransform
+Summary:        Location based social network (LBSN) data structure format & transfer tool
+Provides:       python-lbsntransform
+BuildRequires:  python3-devel
+BuildRequires:  python3-setuptools
+BuildRequires:  python3-pip
+%description -n python3-lbsntransform
+[PyPI](https://pypi.org/project/lbsntransform/) [GitLab](https://gitlab.vgiscience.de/lbsn/lbsntransform) [pipeline status](https://gitlab.vgiscience.de/lbsn/lbsntransform/-/commits/master) [documentation](https://lbsn.vgiscience.org/lbsntransform/docs/)
+
+# LBSNTransform
+
+A Python package that uses the [common location based social network (LBSN) data structure][lbsnstructure]
+(ProtoBuf) to import, transform and export Social Media data from services such as Twitter and Flickr.
+
+## Motivation
+
+The goal is to provide a common interface for handling Social Media data,
+without the need to adapt individually to the myriad API endpoints available.
+As an example, consider the ProtoBuf spec [lbsn.Post][lbsnpost], which can be a Tweet on Twitter,
+a photo shared on Flickr, or a post on Reddit. However, all of these objects share
+a common set of attributes, which is reflected in the lbsnstructure.
+
+The tool is based on a 4-Facet conceptual framework for LBSN, introduced in a paper
+by [Dunkel et al. (2018)](https://www.tandfonline.com/doi/full/10.1080/13658816.2018.1546390).
+
+The GDPR explicitly requires Social Media network operators to allow
+users to transfer accounts and data between services.
+While there are attempts by Google, Facebook etc. (e.g. see the [data-transfer-project][data-transfer-project]),
+this is not currently possible. With the lbsnstructure, a primary motivation is to systematically
+characterize LBSN data aspects in a common, cross-network data scheme that enables privacy-by-design
+for connected software, data handling and database design.
+
+## Description
+
+This tool enables data import from a Postgres database, JSON, or CSV, and export to CSV, [LBSN ProtoBuf][lbsnstructure],
+or the [hll][hlldb] and [raw][rawdb] versions of the LBSN-prepared Postgres databases.
+The tool maps Social Media endpoints (e.g. Twitter tweets) to a common [LBSN Interchange Structure][lbsnstructure]
+format in ProtoBuf. LBSNTransform can be used from the command line (CLI) or imported into other Python projects with
+`import lbsntransform`, for on-the-fly conversion.
+
+## Quick Start
+
+The recommended way to install lbsntransform, for both Linux and Windows,
+is through the conda package manager.
+
+1. Create a conda env using `environment.yml`
+
+First, create an environment with the dependencies for lbsntransform using
+the [environment.yml][environment.yml] that is provided in the root of the repository.
+
+```bash
+git clone https://github.com/Sieboldianus/lbsntransform.git
+cd lbsntransform
+# not necessary, but recommended:
+conda config --env --set channel_priority strict
+conda env create -f environment.yml
+```
+
+2. Install lbsntransform without dependencies
+
+Afterwards, install lbsntransform using pip, without dependencies.
+
+```bash
+conda activate lbsntransform
+pip install lbsntransform --no-deps --upgrade
+# or locally, from the latest commits on master
+# pip install . --no-deps --upgrade
+```
+
+3. Import data using a mapping
+
+For each data source, a mapping must be provided that
+defines how data is mapped to the [lbsnstructure][lbsnstructure].
+
+The default mapping is [lbsnraw][lbsnraw].
+
+Additional mappings can be dynamically loaded from a folder.
+
+We have provided two [example mappings][mappings] for the [Flickr YFCC100M dataset][yfcc100m] (CSV)
+and Twitter (JSON).
+
+For example, to import the first 1000 records from Twitter JSON data into the
+[lbsn raw database][rawdb], copy [field_mapping_twitter.py][field_mapping_twitter]
+to a local folder `./resources/mappings/`, start up the Docker [rawdb][rawdb] container,
+and use:
+
+```shell
+lbsntransform --origin 3 \
+    --mappings_path ./resources/mappings/ \
+    --file_input \
+    --file_type "json" \
+    --dbpassword_output "sample-key" \
+    --dbuser_output "postgres" \
+    --dbserveraddress_output "127.0.0.1:5432" \
+    --dbname_output "rawdb" \
+    --dbformat_output "lbsn" \
+    --transferlimit 1000
+```
+
+With the above input args, the tool will:
+- read local JSON from `./01_Input/`
+- store lbsn records in the [lbsn rawdb][rawdb]
+
+Similarly, to import data directly into the privacy-aware
+version of lbsnstructure, called [hlldb][hlldb], start up the
+Docker container and use:
+
+```shell
+lbsntransform --origin 3 \
+    --mappings_path ./resources/mappings/ \
+    --file_input \
+    --file_type "json" \
+    --dbpassword_output "sample-key" \
+    --dbuser_output "postgres" \
+    --dbserveraddress_output "127.0.0.1:25432" \
+    --dbname_output "hlldb" \
+    --dbformat_output "hll" \
+    --dbpassword_hllworker "sample-key" \
+    --dbuser_hllworker "postgres" \
+    --dbserveraddress_hllworker "127.0.0.1:25432" \
+    --dbname_hllworker "hlldb" \
+    --include_lbsn_objects "origin,post" \
+    --include_lbsn_bases hashtag,place,date,community \
+    --transferlimit 1000
+```
+
+With the above input args, the tool will:
+- read local JSON from `./01_Input/`
+- store lbsn records in the privacy-aware [lbsn hlldb][hlldb]
+- convert only lbsn objects of type [origin][lbsnorigin] and [post][lbsnpost]
+- update the HyperLogLog (HLL) target tables `hashtag`, `place`, `date` and `community`
+
+A full list of possible input and output args is available in the
+[documentation](https://lbsn.vgiscience.org/lbsntransform/docs/).
+
+## Built With
+
+* [lbsnstructure](https://pypi.org/project/lbsnstructure/) - A common language-independent and cross-network social media data scheme
+* [protobuf](https://github.com/google/protobuf) - Google's data interchange format
+* [psycopg2](https://github.com/psycopg/psycopg2) - Python-PostgreSQL Database Adapter
+* [ppygis3](https://github.com/AlexImmer/ppygis3) - A PPyGIS port for Python
+* [shapely](https://github.com/Toblerity/Shapely) - Geometric objects processing in Python
+* [emoji](https://github.com/carpedm20/emoji/) - Emoji handling in Python
+
+## Authors
+
+* **Alexander Dunkel** - Initial work
+
+See also the list of [contributors](/../graphs/master).
+
+## License
+
+This project is licensed under the GNU GPLv3 or any later version -
+see the [LICENSE.md](LICENSE.md) file for details.
+
+[lbsnstructure]: https://lbsn.vgiscience.org/structure/
+[lbsnpost]: https://lbsn.vgiscience.org/structure/#post
+[lbsnorigin]: https://lbsn.vgiscience.org/structure/#origin
+[data-transfer-project]: https://datatransferproject.dev/
+[rawdb]: https://gitlab.vgiscience.de/lbsn/databases/rawdb
+[hlldb]: https://gitlab.vgiscience.de/lbsn/databases/hlldb
+[lbsnraw]: lbsntransform/input/mappings/field_mapping_lbsn.py
+[mappings]: resources/mappings
+[field_mapping_twitter]: resources/mappings/field_mapping_twitter.py
+[environment.yml]: environment.yml
+[yfcc100m]: http://projects.dfki.uni-kl.de/yfcc100m/
+
+
+%package help
+Summary:        Development documents and examples for lbsntransform
+Provides:       python3-lbsntransform-doc
+%description help
+[PyPI](https://pypi.org/project/lbsntransform/) [GitLab](https://gitlab.vgiscience.de/lbsn/lbsntransform) [pipeline status](https://gitlab.vgiscience.de/lbsn/lbsntransform/-/commits/master) [documentation](https://lbsn.vgiscience.org/lbsntransform/docs/)
+
+# LBSNTransform
+
+A Python package that uses the [common location based social network (LBSN) data structure][lbsnstructure]
+(ProtoBuf) to import, transform and export Social Media data from services such as Twitter and Flickr.
+
+## Motivation
+
+The goal is to provide a common interface for handling Social Media data,
+without the need to adapt individually to the myriad API endpoints available.
+As an example, consider the ProtoBuf spec [lbsn.Post][lbsnpost], which can be a Tweet on Twitter,
+a photo shared on Flickr, or a post on Reddit. However, all of these objects share
+a common set of attributes, which is reflected in the lbsnstructure.
+
+The tool is based on a 4-Facet conceptual framework for LBSN, introduced in a paper
+by [Dunkel et al. (2018)](https://www.tandfonline.com/doi/full/10.1080/13658816.2018.1546390).
+
+The GDPR explicitly requires Social Media network operators to allow
+users to transfer accounts and data between services.
+While there are attempts by Google, Facebook etc. (e.g. see the [data-transfer-project][data-transfer-project]),
+this is not currently possible. With the lbsnstructure, a primary motivation is to systematically
+characterize LBSN data aspects in a common, cross-network data scheme that enables privacy-by-design
+for connected software, data handling and database design.
+
+## Description
+
+This tool enables data import from a Postgres database, JSON, or CSV, and export to CSV, [LBSN ProtoBuf][lbsnstructure],
+or the [hll][hlldb] and [raw][rawdb] versions of the LBSN-prepared Postgres databases.
+The tool maps Social Media endpoints (e.g. Twitter tweets) to a common [LBSN Interchange Structure][lbsnstructure]
+format in ProtoBuf. LBSNTransform can be used from the command line (CLI) or imported into other Python projects with
+`import lbsntransform`, for on-the-fly conversion.
+
+## Quick Start
+
+The recommended way to install lbsntransform, for both Linux and Windows,
+is through the conda package manager.
+
+1. Create a conda env using `environment.yml`
+
+First, create an environment with the dependencies for lbsntransform using
+the [environment.yml][environment.yml] that is provided in the root of the repository.
+
+```bash
+git clone https://github.com/Sieboldianus/lbsntransform.git
+cd lbsntransform
+# not necessary, but recommended:
+conda config --env --set channel_priority strict
+conda env create -f environment.yml
+```
+
+2. Install lbsntransform without dependencies
+
+Afterwards, install lbsntransform using pip, without dependencies.
+
+```bash
+conda activate lbsntransform
+pip install lbsntransform --no-deps --upgrade
+# or locally, from the latest commits on master
+# pip install . --no-deps --upgrade
+```
+
+3. Import data using a mapping
+
+For each data source, a mapping must be provided that
+defines how data is mapped to the [lbsnstructure][lbsnstructure].
+
+The default mapping is [lbsnraw][lbsnraw].
+
+Additional mappings can be dynamically loaded from a folder.
+
+We have provided two [example mappings][mappings] for the [Flickr YFCC100M dataset][yfcc100m] (CSV)
+and Twitter (JSON).
+
+For example, to import the first 1000 records from Twitter JSON data into the
+[lbsn raw database][rawdb], copy [field_mapping_twitter.py][field_mapping_twitter]
+to a local folder `./resources/mappings/`, start up the Docker [rawdb][rawdb] container,
+and use:
+
+```shell
+lbsntransform --origin 3 \
+    --mappings_path ./resources/mappings/ \
+    --file_input \
+    --file_type "json" \
+    --dbpassword_output "sample-key" \
+    --dbuser_output "postgres" \
+    --dbserveraddress_output "127.0.0.1:5432" \
+    --dbname_output "rawdb" \
+    --dbformat_output "lbsn" \
+    --transferlimit 1000
+```
+
+With the above input args, the tool will:
+- read local JSON from `./01_Input/`
+- store lbsn records in the [lbsn rawdb][rawdb]
+
+Similarly, to import data directly into the privacy-aware
+version of lbsnstructure, called [hlldb][hlldb], start up the
+Docker container and use:
+
+```shell
+lbsntransform --origin 3 \
+    --mappings_path ./resources/mappings/ \
+    --file_input \
+    --file_type "json" \
+    --dbpassword_output "sample-key" \
+    --dbuser_output "postgres" \
+    --dbserveraddress_output "127.0.0.1:25432" \
+    --dbname_output "hlldb" \
+    --dbformat_output "hll" \
+    --dbpassword_hllworker "sample-key" \
+    --dbuser_hllworker "postgres" \
+    --dbserveraddress_hllworker "127.0.0.1:25432" \
+    --dbname_hllworker "hlldb" \
+    --include_lbsn_objects "origin,post" \
+    --include_lbsn_bases hashtag,place,date,community \
+    --transferlimit 1000
+```
+
+With the above input args, the tool will:
+- read local JSON from `./01_Input/`
+- store lbsn records in the privacy-aware [lbsn hlldb][hlldb]
+- convert only lbsn objects of type [origin][lbsnorigin] and [post][lbsnpost]
+- update the HyperLogLog (HLL) target tables `hashtag`, `place`, `date` and `community`
+
+A full list of possible input and output args is available in the
+[documentation](https://lbsn.vgiscience.org/lbsntransform/docs/).
+
+## Built With
+
+* [lbsnstructure](https://pypi.org/project/lbsnstructure/) - A common language-independent and cross-network social media data scheme
+* [protobuf](https://github.com/google/protobuf) - Google's data interchange format
+* [psycopg2](https://github.com/psycopg/psycopg2) - Python-PostgreSQL Database Adapter
+* [ppygis3](https://github.com/AlexImmer/ppygis3) - A PPyGIS port for Python
+* [shapely](https://github.com/Toblerity/Shapely) - Geometric objects processing in Python
+* [emoji](https://github.com/carpedm20/emoji/) - Emoji handling in Python
+
+## Authors
+
+* **Alexander Dunkel** - Initial work
+
+See also the list of [contributors](/../graphs/master).
+
+## License
+
+This project is licensed under the GNU GPLv3 or any later version -
+see the [LICENSE.md](LICENSE.md) file for details.
+
+[lbsnstructure]: https://lbsn.vgiscience.org/structure/
+[lbsnpost]: https://lbsn.vgiscience.org/structure/#post
+[lbsnorigin]: https://lbsn.vgiscience.org/structure/#origin
+[data-transfer-project]: https://datatransferproject.dev/
+[rawdb]: https://gitlab.vgiscience.de/lbsn/databases/rawdb
+[hlldb]: https://gitlab.vgiscience.de/lbsn/databases/hlldb
+[lbsnraw]: lbsntransform/input/mappings/field_mapping_lbsn.py
+[mappings]: resources/mappings
+[field_mapping_twitter]: resources/mappings/field_mapping_twitter.py
+[environment.yml]: environment.yml
+[yfcc100m]: http://projects.dfki.uni-kl.de/yfcc100m/
+
+
+%prep
+%autosetup -n lbsntransform-0.25.1
+
+%build
+%py3_build
+
+%install
+%py3_install
+# copy bundled documentation and examples, if present, into the package documentation directory
+install -d -m755 %{buildroot}/%{_pkgdocdir}
+if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
+if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
+if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
+if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
+# collect the installed files into list files used by the file sections below
+pushd %{buildroot}
+if [ -d usr/lib ]; then
+    find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/lib64 ]; then
+    find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/bin ]; then
+    find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/sbin ]; then
+    find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+touch doclist.lst
+if [ -d usr/share/man ]; then
+    find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
+fi
+popd
+mv %{buildroot}/filelist.lst .
+mv %{buildroot}/doclist.lst .
+
+%files -n python3-lbsntransform -f filelist.lst
+%dir %{python3_sitelib}/*
+
+%files help -f doclist.lst
+%{_docdir}/*
+
+%changelog
+* Mon May 29 2023 Python_Bot <Python_Bot@openeuler.org> - 0.25.1-1
+- Package Spec generated
diff --git a/sources b/sources
--- /dev/null
+++ b/sources
@@ -0,0 +1 @@
+2c6ecd7cc1425756909e426c035f45ed lbsntransform-0.25.1.tar.gz
