-rw-r--r--  .gitignore               1
-rw-r--r--  python-m3inference.spec  690
-rw-r--r--  sources                  1
3 files changed, 692 insertions, 0 deletions
diff --git a/.gitignore b/.gitignore
index e69de29..f0a41f7 100644
--- a/.gitignore
+++ b/.gitignore
@@ -0,0 +1 @@
+/m3inference-1.1.5.tar.gz
diff --git a/python-m3inference.spec b/python-m3inference.spec
new file mode 100644
index 0000000..9f0c450
--- /dev/null
+++ b/python-m3inference.spec
@@ -0,0 +1,690 @@
+%global _empty_manifest_terminate_build 0
+Name: python-m3inference
+Version: 1.1.5
+Release: 1
+Summary: M3 Inference
+License: GNU Affero General Public License v3.0
+URL: https://github.com/euagendas/m3inference
+Source0: https://mirrors.aliyun.com/pypi/web/packages/5a/d4/0a4a1947d84a8f4774b4fa163def170583f63e1088784680ef9cf045a66f/m3inference-1.1.5.tar.gz
+BuildArch: noarch
+
+Requires: python3-torch
+Requires: python3-numpy
+Requires: python3-tqdm
+Requires: python3-Pillow
+Requires: python3-torchvision
+Requires: python3-pycld2
+Requires: python3-requests
+Requires: python3-pandas
+Requires: python3-rauth
+
+%description
+# M3-Inference
+This is a PyTorch implementation of the M3 (Multimodal, Multilingual, and Multi-attribute) system described in the WebConf (WWW) 2019 paper [Demographic Inference and Representative Population Estimates from Multilingual Social Media Data](https://doi.org/10.1145/3308558.3313684).
+
+## Quick Links
+
+- [About](#about)
+- [Install](#install)
+- [FAQs](#faqs)
+- [Citation](#citation)
+- [Contact](#more-questions)
+- [License](#license)
+
+## About
+M3 is a deep learning system for demographic inference that was trained on a massive Twitter dataset. It features three major attributes:
+
+* Multimodal
+ - M3 takes both vision and text inputs. Particularly, the input may contain a profile image, a name (e.g., in the form of a natural language first and last name), a user name (e.g., the Twitter screen_name), and a short self-descriptive text (e.g., a Twitter biography).
+
+* Multilingual
+ - M3 operates in 32 major languages spoken in Europe, but note that these are not all "European" languages (e.g., Arabic is supported). They are `['en', 'cs', 'fr', 'nl', 'ar', 'ro', 'bs', 'da', 'it', 'pt', 'no', 'es', 'hr', 'tr', 'de', 'fi', 'el', 'ru', 'bg', 'hu', 'sk', 'et', 'pl', 'lv', 'sl', 'lt', 'ga', 'eu', 'mt', 'cy', 'rm', 'is', 'un']` in [ISO 639-1 two-letter codes](https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes) (`un` stands for languages that are not in the list). A [list with the full names of the languages is on the wiki](https://github.com/euagendas/m3inference/wiki/Languages).
+
+* Multi-attribute
+ - Thanks to multi-task learning, the model can predict three demographic attributes (gender, age, and human-vs-organization status) at the same time.
+
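For example, before hard-coding a `lang` value you may want to fall back to `un` for unsupported codes. A minimal sketch of that check (the `M3_LANGS` set and `normalize_lang` helper are names introduced here for illustration, not part of the package):

```python
# The 32 supported ISO 639-1 codes plus "un" (unknown/other), per the list above.
M3_LANGS = {
    'en', 'cs', 'fr', 'nl', 'ar', 'ro', 'bs', 'da', 'it', 'pt', 'no', 'es',
    'hr', 'tr', 'de', 'fi', 'el', 'ru', 'bg', 'hu', 'sk', 'et', 'pl', 'lv',
    'sl', 'lt', 'ga', 'eu', 'mt', 'cy', 'rm', 'is', 'un',
}

def normalize_lang(code):
    """Map an ISO 639-1 code to one M3 accepts, falling back to 'un'."""
    return code if code in M3_LANGS else 'un'

print(normalize_lang('ja'))  # 'ja' is not in the list, so -> 'un'
```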
+## Install
+### TL;DR
+`pip install m3inference`
+
+* If there is an error with the installation of `torch`, you may install it with `conda` (see [here](https://pytorch.org/)). Alternatively, you could create a conda environment - see instructions below.
+* Please ensure you have Python 3.6.6 or higher installed.
+
+### Manually Install
+
+
+#### With pip
+You must have `Python>=3.6.6` and `pip` ready to use. Then you can:
+1. Install dependency packages: `pip install -r requirements.txt`
+2. Install the package `python setup.py install`
+
+#### As a conda environment
+1. Simply run `conda env create -f env_conda.yml`; you should then have an "m3env" environment available, which you can enter with `conda activate m3env`. Run everything else from within it.
+2. Install the package `python setup.py install`
+
+
+### How to use
+#### With M3
+M3 takes as input a `jsonl` file containing a list of JSON (dict) objects (or a Python object holding the data itself) and outputs predictions for the three attributes.
+
+Demo with `test` dir:
+
+1. Clone this package (`git clone https://github.com/zijwang/m3inference.git`) and follow [Manually Install](#manually-install) to install the package.
+
+2. Preprocess the images to resize them to the correct shape. To do this, from the same (root) dir, run
+ ```
+ python scripts/preprocess.py --source_dir test/pic/ --output_dir test/pic_resized/ --jsonl_path test/data.jsonl --jsonl_outpath test/data_resized.jsonl --verbose
+ ```
+
+   You may also run `python scripts/preprocess.py --help` to see detailed usage. Further, see [FAQs](#faqs) for more information on images.
+
+3. In Python, run:
+
+```
+from m3inference import M3Inference
+import pprint
+m3 = M3Inference() # see docstring for details
+pred = m3.infer('./test/data_resized.jsonl') # also see docstring for details
+pprint.pprint(pred)
+```
+
+You should see results like the following:
+
+
+```
+OrderedDict([('720389270335135745',
+ {'age': {'19-29': 0.1546,
+ '30-39': 0.114,
+ '<=18': 0.0481,
+ '>=40': 0.6833},
+ 'gender': {'female': 0.0066, 'male': 0.9934},
+ 'org': {'is-org': 0.7508, 'non-org': 0.2492}}),
+ ('21447363',
+ {'age': {'19-29': 0.0157,
+ '30-39': 0.9837,
+ '<=18': 0.0004,
+ '>=40': 0.0002},
+ 'gender': {'female': 0.9866, 'male': 0.0134},
+ 'org': {'is-org': 0.0002, 'non-org': 0.9998}}),
+ ...
+ ...
+```
+
+
+Each entry of the input file (`./test/data.jsonl`) should have the following keys: `id`, `name`, `screen_name`, `description`, `lang`, `img_path`.
+* The first four keys could be extracted directly from the Twitter JSON entry.
+* For `lang`, even if the official Twitter JSON entry contains this field, we recommend using our [cld2](https://github.com/CLD2Owners/cld2) wrapper method (`from m3inference import get_lang`) to detect the language from either the user's biography/description or the user's tweets. You can also hard-code the language if you know the ground truth from other sources.
+* Images should be downloaded from Twitter as 400x400 pixel images and resized to 224x224 pixels using the preprocess code above.
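As a sketch, a single input entry with these keys could be written to a `jsonl` file like this (the field values below are hypothetical examples, not real data):

```python
import json

# Hypothetical user record with the six keys M3 expects.
entry = {
    "id": "21447363",
    "name": "Jane Doe",                            # natural-language name
    "screen_name": "jdoe",                         # Twitter handle
    "description": "NLP researcher.",              # biography text
    "lang": "en",                                  # ISO 639-1 code, or "un"
    "img_path": "test/pic_resized/21447363.jpg",   # 224x224 resized image
}

# jsonl format: one JSON object per line.
with open("data.jsonl", "w") as f:
    f.write(json.dumps(entry) + "\n")
```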
+
+
+The output is a dict in which the `id`s are the keys and the predictions are the nested values. Each value represents the probability of that category (in `[0, 1]`).
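For instance, to reduce the probabilities to a single predicted label per attribute, a small post-processing sketch over output shaped like the sample above (`top_labels` is a helper defined here, not a package function):

```python
from collections import OrderedDict

# A prediction in the format shown above (probabilities in [0, 1]).
pred = OrderedDict([
    ('720389270335135745',
     {'age': {'19-29': 0.1546, '30-39': 0.114, '<=18': 0.0481, '>=40': 0.6833},
      'gender': {'female': 0.0066, 'male': 0.9934},
      'org': {'is-org': 0.7508, 'non-org': 0.2492}}),
])

def top_labels(scores):
    """Keep only the highest-probability label for each attribute."""
    return {attr: max(probs, key=probs.get) for attr, probs in scores.items()}

for user_id, scores in pred.items():
    print(user_id, top_labels(scores))
    # -> 720389270335135745 {'age': '>=40', 'gender': 'male', 'org': 'is-org'}
```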
+
+
+For other model settings (e.g., output format, GPU setting, batch_size, etc.), please use the file `test/data.jsonl` as a sample input file and see the docstrings of the `M3Inference` initialization and `infer` method for detailed usage.
+
+
+#### With M3 Twitter Wrapper
+
+##### Existing JSON Twitter data
+If you have a Twitter JSON object representing a user but do *not* have images ready, you can use our `M3Twitter` class to:
+* Download and resize the images
+* Add a detected language using CLD2 over the biography text
+* Transform the JSON into the input structure required for M3.
+
+```
+from m3inference import M3Twitter
+import pprint
+
+m3twitter=M3Twitter(cache_dir="twitter_cache") #Change the cache_dir parameter to control where profile images are downloaded
+m3twitter.transform_jsonl(input_file="test/twitter_cache/example_tweets.jsonl",output_file="test/twitter_cache/m3_input.jsonl")
+
+pprint.pprint(m3twitter.infer("test/twitter_cache/m3_input.jsonl")) #Same method as M3Inference.infer(...)
+```
+
+If you already have images locally, please include the ``image_path_key`` parameter and set it to the key in your JSON object containing the local path to the image. Similarly, if you have detected languages, you can use the ``lang_key`` parameter. An example is given in `test/test_transform_jsonl.py`.
+
+##### Nothing but a screen_name or numeric id
+You can also run the Twitter wrapper directly for a Twitter screen_name or numeric id.
+
+* Please download the "scripts" folder from this repository.
+* To run these examples, you need Twitter API credentials. Please create a Twitter app at https://developer.twitter.com/en/apps . Once you have an app, copy `scripts/auth_example.txt` to `auth.txt` and insert the API key, API secret, access token, and access token secret into this file.
+
+Then you can run the following commands:
+
+```
+# If you have a screen_name, use
+$ python m3twitter.py --screen-name=computermacgyve --auth auth.txt --skip-cache
+
+# If you have a numeric id, use
+$ python m3twitter.py --id=19854920 --auth auth.txt --skip-cache
+```
+
+The `--skip-cache` option ensures fresh results are retrieved rather than served from the cache. This is useful for debugging but not in a real-world setting, so remove it as needed.
+
+## FAQs
+### What if I just have a Twitter screen name or id?
+
+You can use the M3Twitter class to get all the needed profile information (and image) from the Twitter website. Please note this function should only be used for a small number of screen_names or numeric ids. If you have a large list, please use the Twitter API to get the required information (apart from the profile photo, which can be downloaded separately using the `.transform_jsonl(...)` method described above).
+
+```
+import pprint
+from m3inference import M3Twitter
+m3twitter=M3Twitter()
+
+# initialize twitter api
+m3twitter.twitter_init(api_key=...,api_secret=...,access_token=...,access_secret=...)
+# alternatively, you may do
+m3twitter.twitter_init_from_file('auth.txt')
+
+pprint.pprint(m3twitter.infer_id("2631881902"))
+```
+
+The `.infer_screen_name(...)` method does the same for a Twitter screen name. All results are stored/cached in "~/m3/cache/". This directory can be changed in the M3Twitter constructor and you can skip/update the cache for a single request by setting `skip_cache=True` on the `.infer_id(...)` or `.infer_screen_name(...)` method.
+
+You can also run these examples directly from the terminal to try things out:
+```
+python scripts/m3twitter.py --screen-name=barackobama --auth auth.txt
+```
+
+### How should I get the images?
+
+If you have nothing but a screen name or numeric id, you can use the `M3Twitter.infer_screen_name(...)` or `M3Twitter.infer_id(...)` methods. Please note, however, that these methods access the Twitter website directly, not the API, and are therefore suitable only for small lists. With a large list of screen_names/ids, please use the Twitter API to get user information.
+
+Once you have Twitter JSON, you can use the `M3Twitter.transform_jsonl(...)` to download images, run language detection, and transform the data to the M3 input format.
+
+### What if I cannot have image data?
+The package also provides a standalone text-based model. Set `use_full_model=False` when initializing the `M3Inference` object (i.e., `m3=M3Inference(use_full_model=False)`); you then do not need to provide the `img_path` field in the input json file.
+
+*Warning*: the M3 model is optimized for the case where both image and text inputs are available, so you may experience lower performance with the text-based model. We recommend using image data whenever possible to get the most accurate predictions.
+
+
+
+## Citation
+Please cite our WWW 2019 paper if you use this package in your project.
+
+```
+@inproceedings{wang2019demographic,
+ title={Demographic inference and representative population estimates from multilingual social media data},
+ author={Wang, Zijian and Hale, Scott and Adelani, David Ifeoluwa and Grabowicz, Przemyslaw and Hartman, Timo and Fl{\"o}ck, Fabian and Jurgens, David},
+ booktitle={The World Wide Web Conference},
+ pages={2056--2067},
+ year={2019},
+ organization={ACM}
+}
+```
+
+## More Questions
+
+We use GitHub issues for all questions or suggestions. For specific inquiries, please contact us at `m3@euagendas.org`. Please note that we are unable to release or provide training data for this model due to existing terms of service.
+
+## License
+
+This source code is licensed under the GNU Affero General Public License, which allows for non-commercial re-use of this software. For commercial inquiries, please contact us directly. Please see the LICENSE file in the root directory of this source tree for details.
+
+
+
+
+%package -n python3-m3inference
+Summary: M3 Inference
+Provides: python-m3inference
+BuildRequires: python3-devel
+BuildRequires: python3-setuptools
+BuildRequires: python3-pip
+%description -n python3-m3inference
+# M3-Inference
+This is a PyTorch implementation of the M3 (Multimodal, Multilingual, and Multi-attribute) system described in the WebConf (WWW) 2019 paper [Demographic Inference and Representative Population Estimates from Multilingual Social Media Data](https://doi.org/10.1145/3308558.3313684).
+
+## Quick Links
+
+- [About](#about)
+- [Install](#install)
+- [FAQs](#faqs)
+- [Citation](#citation)
+- [Contact](#more-questions)
+- [License](#license)
+
+## About
+M3 is a deep learning system for demographic inference that was trained on a massive Twitter dataset. It features three major attributes:
+
+* Multimodal
+ - M3 takes both vision and text inputs. Particularly, the input may contain a profile image, a name (e.g., in the form of a natural language first and last name), a user name (e.g., the Twitter screen_name), and a short self-descriptive text (e.g., a Twitter biography).
+
+* Multilingual
+ - M3 operates in 32 major languages spoken in Europe, but note that these are not all "European" languages (e.g., Arabic is supported). They are `['en', 'cs', 'fr', 'nl', 'ar', 'ro', 'bs', 'da', 'it', 'pt', 'no', 'es', 'hr', 'tr', 'de', 'fi', 'el', 'ru', 'bg', 'hu', 'sk', 'et', 'pl', 'lv', 'sl', 'lt', 'ga', 'eu', 'mt', 'cy', 'rm', 'is', 'un']` in [ISO 639-1 two-letter codes](https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes) (`un` stands for languages that are not in the list). A [list with the full names of the languages is on the wiki](https://github.com/euagendas/m3inference/wiki/Languages).
+
+* Multi-attribute
+ - Thanks to multi-task learning, the model can predict three demographic attributes (gender, age, and human-vs-organization status) at the same time.
+
+## Install
+### TL;DR
+`pip install m3inference`
+
+* If there is an error with the installation of `torch`, you may install it with `conda` (see [here](https://pytorch.org/)). Alternatively, you could create a conda environment - see instructions below.
+* Please ensure you have Python 3.6.6 or higher installed.
+
+### Manually Install
+
+
+#### With pip
+You must have `Python>=3.6.6` and `pip` ready to use. Then you can:
+1. Install dependency packages: `pip install -r requirements.txt`
+2. Install the package `python setup.py install`
+
+#### As a conda environment
+1. Simply run `conda env create -f env_conda.yml`; you should then have an "m3env" environment available, which you can enter with `conda activate m3env`. Run everything else from within it.
+2. Install the package `python setup.py install`
+
+
+### How to use
+#### With M3
+M3 takes as input a `jsonl` file containing a list of JSON (dict) objects (or a Python object holding the data itself) and outputs predictions for the three attributes.
+
+Demo with `test` dir:
+
+1. Clone this package (`git clone https://github.com/zijwang/m3inference.git`) and follow [Manually Install](#manually-install) to install the package.
+
+2. Preprocess the images to resize them to the correct shape. To do this, from the same (root) dir, run
+ ```
+ python scripts/preprocess.py --source_dir test/pic/ --output_dir test/pic_resized/ --jsonl_path test/data.jsonl --jsonl_outpath test/data_resized.jsonl --verbose
+ ```
+
+   You may also run `python scripts/preprocess.py --help` to see detailed usage. Further, see [FAQs](#faqs) for more information on images.
+
+3. In Python, run:
+
+```
+from m3inference import M3Inference
+import pprint
+m3 = M3Inference() # see docstring for details
+pred = m3.infer('./test/data_resized.jsonl') # also see docstring for details
+pprint.pprint(pred)
+```
+
+You should see results like the following:
+
+
+```
+OrderedDict([('720389270335135745',
+ {'age': {'19-29': 0.1546,
+ '30-39': 0.114,
+ '<=18': 0.0481,
+ '>=40': 0.6833},
+ 'gender': {'female': 0.0066, 'male': 0.9934},
+ 'org': {'is-org': 0.7508, 'non-org': 0.2492}}),
+ ('21447363',
+ {'age': {'19-29': 0.0157,
+ '30-39': 0.9837,
+ '<=18': 0.0004,
+ '>=40': 0.0002},
+ 'gender': {'female': 0.9866, 'male': 0.0134},
+ 'org': {'is-org': 0.0002, 'non-org': 0.9998}}),
+ ...
+ ...
+```
+
+
+Each entry of the input file (`./test/data.jsonl`) should have the following keys: `id`, `name`, `screen_name`, `description`, `lang`, `img_path`.
+* The first four keys could be extracted directly from the Twitter JSON entry.
+* For `lang`, even if the official Twitter JSON entry contains this field, we recommend using our [cld2](https://github.com/CLD2Owners/cld2) wrapper method (`from m3inference import get_lang`) to detect the language from either the user's biography/description or the user's tweets. You can also hard-code the language if you know the ground truth from other sources.
+* Images should be downloaded from Twitter as 400x400 pixel images and resized to 224x224 pixels using the preprocess code above.
+
+
+The output is a dict in which the `id`s are the keys and the predictions are the nested values. Each value represents the probability of that category (in `[0, 1]`).
+
+
+For other model settings (e.g., output format, GPU setting, batch_size, etc.), please use the file `test/data.jsonl` as a sample input file and see the docstrings of the `M3Inference` initialization and `infer` method for detailed usage.
+
+
+#### With M3 Twitter Wrapper
+
+##### Existing JSON Twitter data
+If you have a Twitter JSON object representing a user but do *not* have images ready, you can use our `M3Twitter` class to:
+* Download and resize the images
+* Add a detected language using CLD2 over the biography text
+* Transform the JSON into the input structure required for M3.
+
+```
+from m3inference import M3Twitter
+import pprint
+
+m3twitter=M3Twitter(cache_dir="twitter_cache") #Change the cache_dir parameter to control where profile images are downloaded
+m3twitter.transform_jsonl(input_file="test/twitter_cache/example_tweets.jsonl",output_file="test/twitter_cache/m3_input.jsonl")
+
+pprint.pprint(m3twitter.infer("test/twitter_cache/m3_input.jsonl")) #Same method as M3Inference.infer(...)
+```
+
+If you already have images locally, please include the ``image_path_key`` parameter and set it to the key in your JSON object containing the local path to the image. Similarly, if you have detected languages, you can use the ``lang_key`` parameter. An example is given in `test/test_transform_jsonl.py`.
+
+##### Nothing but a screen_name or numeric id
+You can also run the Twitter wrapper directly for a Twitter screen_name or numeric id.
+
+* Please download the "scripts" folder from this repository.
+* To run these examples, you need Twitter API credentials. Please create a Twitter app at https://developer.twitter.com/en/apps . Once you have an app, copy `scripts/auth_example.txt` to `auth.txt` and insert the API key, API secret, access token, and access token secret into this file.
+
+Then you can run the following commands:
+
+```
+# If you have a screen_name, use
+$ python m3twitter.py --screen-name=computermacgyve --auth auth.txt --skip-cache
+
+# If you have a numeric id, use
+$ python m3twitter.py --id=19854920 --auth auth.txt --skip-cache
+```
+
+The `--skip-cache` option ensures fresh results are retrieved rather than served from the cache. This is useful for debugging but not in a real-world setting, so remove it as needed.
+
+## FAQs
+### What if I just have a Twitter screen name or id?
+
+You can use the M3Twitter class to get all the needed profile information (and image) from the Twitter website. Please note this function should only be used for a small number of screen_names or numeric ids. If you have a large list, please use the Twitter API to get the required information (apart from the profile photo, which can be downloaded separately using the `.transform_jsonl(...)` method described above).
+
+```
+import pprint
+from m3inference import M3Twitter
+m3twitter=M3Twitter()
+
+# initialize twitter api
+m3twitter.twitter_init(api_key=...,api_secret=...,access_token=...,access_secret=...)
+# alternatively, you may do
+m3twitter.twitter_init_from_file('auth.txt')
+
+pprint.pprint(m3twitter.infer_id("2631881902"))
+```
+
+The `.infer_screen_name(...)` method does the same for a Twitter screen name. All results are stored/cached in "~/m3/cache/". This directory can be changed in the M3Twitter constructor and you can skip/update the cache for a single request by setting `skip_cache=True` on the `.infer_id(...)` or `.infer_screen_name(...)` method.
+
+You can also run these examples directly from the terminal to try things out:
+```
+python scripts/m3twitter.py --screen-name=barackobama --auth auth.txt
+```
+
+### How should I get the images?
+
+If you have nothing but a screen name or numeric id, you can use the `M3Twitter.infer_screen_name(...)` or `M3Twitter.infer_id(...)` methods. Please note, however, that these methods access the Twitter website directly, not the API, and are therefore suitable only for small lists. With a large list of screen_names/ids, please use the Twitter API to get user information.
+
+Once you have Twitter JSON, you can use the `M3Twitter.transform_jsonl(...)` to download images, run language detection, and transform the data to the M3 input format.
+
+### What if I cannot have image data?
+The package also provides a standalone text-based model. Set `use_full_model=False` when initializing the `M3Inference` object (i.e., `m3=M3Inference(use_full_model=False)`); you then do not need to provide the `img_path` field in the input json file.
+
+*Warning*: the M3 model is optimized for the case where both image and text inputs are available, so you may experience lower performance with the text-based model. We recommend using image data whenever possible to get the most accurate predictions.
+
+
+
+## Citation
+Please cite our WWW 2019 paper if you use this package in your project.
+
+```
+@inproceedings{wang2019demographic,
+ title={Demographic inference and representative population estimates from multilingual social media data},
+ author={Wang, Zijian and Hale, Scott and Adelani, David Ifeoluwa and Grabowicz, Przemyslaw and Hartman, Timo and Fl{\"o}ck, Fabian and Jurgens, David},
+ booktitle={The World Wide Web Conference},
+ pages={2056--2067},
+ year={2019},
+ organization={ACM}
+}
+```
+
+## More Questions
+
+We use GitHub issues for all questions or suggestions. For specific inquiries, please contact us at `m3@euagendas.org`. Please note that we are unable to release or provide training data for this model due to existing terms of service.
+
+## License
+
+This source code is licensed under the GNU Affero General Public License, which allows for non-commercial re-use of this software. For commercial inquiries, please contact us directly. Please see the LICENSE file in the root directory of this source tree for details.
+
+
+
+
+%package help
+Summary: Development documents and examples for m3inference
+Provides: python3-m3inference-doc
+%description help
+# M3-Inference
+This is a PyTorch implementation of the M3 (Multimodal, Multilingual, and Multi-attribute) system described in the WebConf (WWW) 2019 paper [Demographic Inference and Representative Population Estimates from Multilingual Social Media Data](https://doi.org/10.1145/3308558.3313684).
+
+## Quick Links
+
+- [About](#about)
+- [Install](#install)
+- [FAQs](#faqs)
+- [Citation](#citation)
+- [Contact](#more-questions)
+- [License](#license)
+
+## About
+M3 is a deep learning system for demographic inference that was trained on a massive Twitter dataset. It features three major attributes:
+
+* Multimodal
+ - M3 takes both vision and text inputs. Particularly, the input may contain a profile image, a name (e.g., in the form of a natural language first and last name), a user name (e.g., the Twitter screen_name), and a short self-descriptive text (e.g., a Twitter biography).
+
+* Multilingual
+ - M3 operates in 32 major languages spoken in Europe, but note that these are not all "European" languages (e.g., Arabic is supported). They are `['en', 'cs', 'fr', 'nl', 'ar', 'ro', 'bs', 'da', 'it', 'pt', 'no', 'es', 'hr', 'tr', 'de', 'fi', 'el', 'ru', 'bg', 'hu', 'sk', 'et', 'pl', 'lv', 'sl', 'lt', 'ga', 'eu', 'mt', 'cy', 'rm', 'is', 'un']` in [ISO 639-1 two-letter codes](https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes) (`un` stands for languages that are not in the list). A [list with the full names of the languages is on the wiki](https://github.com/euagendas/m3inference/wiki/Languages).
+
+* Multi-attribute
+ - Thanks to multi-task learning, the model can predict three demographic attributes (gender, age, and human-vs-organization status) at the same time.
+
+## Install
+### TL;DR
+`pip install m3inference`
+
+* If there is an error with the installation of `torch`, you may install it with `conda` (see [here](https://pytorch.org/)). Alternatively, you could create a conda environment - see instructions below.
+* Please ensure you have Python 3.6.6 or higher installed.
+
+### Manually Install
+
+
+#### With pip
+You must have `Python>=3.6.6` and `pip` ready to use. Then you can:
+1. Install dependency packages: `pip install -r requirements.txt`
+2. Install the package `python setup.py install`
+
+#### As a conda environment
+1. Simply run `conda env create -f env_conda.yml`; you should then have an "m3env" environment available, which you can enter with `conda activate m3env`. Run everything else from within it.
+2. Install the package `python setup.py install`
+
+
+### How to use
+#### With M3
+M3 takes as input a `jsonl` file containing a list of JSON (dict) objects (or a Python object holding the data itself) and outputs predictions for the three attributes.
+
+Demo with `test` dir:
+
+1. Clone this package (`git clone https://github.com/zijwang/m3inference.git`) and follow [Manually Install](#manually-install) to install the package.
+
+2. Preprocess the images to resize them to the correct shape. To do this, from the same (root) dir, run
+ ```
+ python scripts/preprocess.py --source_dir test/pic/ --output_dir test/pic_resized/ --jsonl_path test/data.jsonl --jsonl_outpath test/data_resized.jsonl --verbose
+ ```
+
+   You may also run `python scripts/preprocess.py --help` to see detailed usage. Further, see [FAQs](#faqs) for more information on images.
+
+3. In Python, run:
+
+```
+from m3inference import M3Inference
+import pprint
+m3 = M3Inference() # see docstring for details
+pred = m3.infer('./test/data_resized.jsonl') # also see docstring for details
+pprint.pprint(pred)
+```
+
+You should see results like the following:
+
+
+```
+OrderedDict([('720389270335135745',
+ {'age': {'19-29': 0.1546,
+ '30-39': 0.114,
+ '<=18': 0.0481,
+ '>=40': 0.6833},
+ 'gender': {'female': 0.0066, 'male': 0.9934},
+ 'org': {'is-org': 0.7508, 'non-org': 0.2492}}),
+ ('21447363',
+ {'age': {'19-29': 0.0157,
+ '30-39': 0.9837,
+ '<=18': 0.0004,
+ '>=40': 0.0002},
+ 'gender': {'female': 0.9866, 'male': 0.0134},
+ 'org': {'is-org': 0.0002, 'non-org': 0.9998}}),
+ ...
+ ...
+```
+
+
+Each entry of the input file (`./test/data.jsonl`) should have the following keys: `id`, `name`, `screen_name`, `description`, `lang`, `img_path`.
+* The first four keys could be extracted directly from the Twitter JSON entry.
+* For `lang`, even if the official Twitter JSON entry contains this field, we recommend using our [cld2](https://github.com/CLD2Owners/cld2) wrapper method (`from m3inference import get_lang`) to detect the language from either the user's biography/description or the user's tweets. You can also hard-code the language if you know the ground truth from other sources.
+* Images should be downloaded from Twitter as 400x400 pixel images and resized to 224x224 pixels using the preprocess code above.
+
+
+The output is a dict in which the `id`s are the keys and the predictions are the nested values. Each value represents the probability of that category (in `[0, 1]`).
+
+
+For other model settings (e.g., output format, GPU setting, batch_size, etc.), please use the file `test/data.jsonl` as a sample input file and see the docstrings of the `M3Inference` initialization and `infer` method for detailed usage.
+
+
+#### With M3 Twitter Wrapper
+
+##### Existing JSON Twitter data
+If you have a Twitter JSON object representing a user but do *not* have images ready, you can use our `M3Twitter` class to:
+* Download and resize the images
+* Add a detected language using CLD2 over the biography text
+* Transform the JSON into the input structure required for M3.
+
+```
+from m3inference import M3Twitter
+import pprint
+
+m3twitter=M3Twitter(cache_dir="twitter_cache") #Change the cache_dir parameter to control where profile images are downloaded
+m3twitter.transform_jsonl(input_file="test/twitter_cache/example_tweets.jsonl",output_file="test/twitter_cache/m3_input.jsonl")
+
+pprint.pprint(m3twitter.infer("test/twitter_cache/m3_input.jsonl")) #Same method as M3Inference.infer(...)
+```
+
+If you already have images locally, please include the ``image_path_key`` parameter and set it to the key in your JSON object containing the local path to the image. Similarly, if you have detected languages, you can use the ``lang_key`` parameter. An example is given in `test/test_transform_jsonl.py`.
+
+##### Nothing but a screen_name or numeric id
+You can also run the Twitter wrapper directly for a Twitter screen_name or numeric id.
+
+* Please download the "scripts" folder from this repository.
+* To run these examples, you need Twitter API credentials. Please create a Twitter app at https://developer.twitter.com/en/apps . Once you have an app, copy `scripts/auth_example.txt` to `auth.txt` and insert the API key, API secret, access token, and access token secret into this file.
+
+Then you can run the following commands:
+
+```
+# If you have a screen_name, use
+$ python m3twitter.py --screen-name=computermacgyve --auth auth.txt --skip-cache
+
+# If you have a numeric id, use
+$ python m3twitter.py --id=19854920 --auth auth.txt --skip-cache
+```
+
+The `--skip-cache` option ensures fresh results are retrieved rather than served from the cache. This is useful for debugging but not in a real-world setting, so remove it as needed.
+
+## FAQs
+### What if I just have a Twitter screen name or id?
+
+You can use the M3Twitter class to get all the needed profile information (and image) from the Twitter website. Please note this function should only be used for a small number of screen_names or numeric ids. If you have a large list, please use the Twitter API to get the required information (apart from the profile photo, which can be downloaded separately using the `.transform_jsonl(...)` method described above).
+
+```
+import pprint
+from m3inference import M3Twitter
+m3twitter=M3Twitter()
+
+# initialize twitter api
+m3twitter.twitter_init(api_key=...,api_secret=...,access_token=...,access_secret=...)
+# alternatively, you may do
+m3twitter.twitter_init_from_file('auth.txt')
+
+pprint.pprint(m3twitter.infer_id("2631881902"))
+```
+
+The `.infer_screen_name(...)` method does the same for a Twitter screen name. All results are stored/cached in "~/m3/cache/". This directory can be changed in the M3Twitter constructor and you can skip/update the cache for a single request by setting `skip_cache=True` on the `.infer_id(...)` or `.infer_screen_name(...)` method.
+
+You can also run these examples directly from the terminal to try things out:
+```
+python scripts/m3twitter.py --screen-name=barackobama --auth auth.txt
+```
+
+### How should I get the images?
+
+If you have nothing but a screen name or numeric id, you can use the `M3Twitter.infer_screen_name(...)` or `M3Twitter.infer_id(...)` methods. Please note, however, that these methods access the Twitter website directly, not the API, and are therefore suitable only for small lists. With a large list of screen_names/ids, please use the Twitter API to get user information.
+
+Once you have Twitter JSON, you can use the `M3Twitter.transform_jsonl(...)` to download images, run language detection, and transform the data to the M3 input format.
+
+### What if I cannot have image data?
+The package also provides a standalone text-based model. Set `use_full_model=False` when initializing the `M3Inference` object (i.e., `m3=M3Inference(use_full_model=False)`); you then do not need to provide the `img_path` field in the input json file.
+
+*Warning*: the M3 model is optimized for the case where both image and text inputs are available, so you may experience lower performance with the text-based model. We recommend using image data whenever possible to get the most accurate predictions.
+
+
+
+## Citation
+Please cite our WWW 2019 paper if you use this package in your project.
+
+```
+@inproceedings{wang2019demographic,
+ title={Demographic inference and representative population estimates from multilingual social media data},
+ author={Wang, Zijian and Hale, Scott and Adelani, David Ifeoluwa and Grabowicz, Przemyslaw and Hartman, Timo and Fl{\"o}ck, Fabian and Jurgens, David},
+ booktitle={The World Wide Web Conference},
+ pages={2056--2067},
+ year={2019},
+ organization={ACM}
+}
+```
+
+## More Questions
+
+We use GitHub issues for all questions or suggestions. For specific inquiries, please contact us at `m3@euagendas.org`. Please note that we are unable to release or provide training data for this model due to existing terms of service.
+
+## License
+
+This source code is licensed under the GNU Affero General Public License, which allows for non-commercial re-use of this software. For commercial inquiries, please contact us directly. Please see the LICENSE file in the root directory of this source tree for details.
+
+
+
+
+%prep
+%autosetup -n m3inference-1.1.5
+
+%build
+%py3_build
+
+%install
+%py3_install
+install -d -m755 %{buildroot}/%{_pkgdocdir}
+if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
+if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
+if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
+if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
+pushd %{buildroot}
+if [ -d usr/lib ]; then
+ find usr/lib -type f -printf "\"/%h/%f\"\n" >> filelist.lst
+fi
+if [ -d usr/lib64 ]; then
+ find usr/lib64 -type f -printf "\"/%h/%f\"\n" >> filelist.lst
+fi
+if [ -d usr/bin ]; then
+ find usr/bin -type f -printf "\"/%h/%f\"\n" >> filelist.lst
+fi
+if [ -d usr/sbin ]; then
+ find usr/sbin -type f -printf "\"/%h/%f\"\n" >> filelist.lst
+fi
+touch doclist.lst
+if [ -d usr/share/man ]; then
+ find usr/share/man -type f -printf "\"/%h/%f.gz\"\n" >> doclist.lst
+fi
+popd
+mv %{buildroot}/filelist.lst .
+mv %{buildroot}/doclist.lst .
+
+%files -n python3-m3inference -f filelist.lst
+%dir %{python3_sitelib}/*
+
+%files help -f doclist.lst
+%{_docdir}/*
+
+%changelog
+* Tue Jun 20 2023 Python_Bot <Python_Bot@openeuler.org> - 1.1.5-1
+- Package Spec generated
diff --git a/sources b/sources
new file mode 100644
index 0000000..24e86ba
--- /dev/null
+++ b/sources
@@ -0,0 +1 @@
+3524bf7b2ec2ed2ccd18327760fff7b9 m3inference-1.1.5.tar.gz