Diffstat (limited to 'python-country-converter.spec')
| -rw-r--r-- | python-country-converter.spec | 1723 |
1 files changed, 1723 insertions, 0 deletions
diff --git a/python-country-converter.spec b/python-country-converter.spec new file mode 100644 index 0000000..439d393 --- /dev/null +++ b/python-country-converter.spec @@ -0,0 +1,1723 @@ +%global _empty_manifest_terminate_build 0 +Name: python-country-converter +Version: 1.0.0 +Release: 1 +Summary: The country converter (coco) - a Python package for converting country names between different classifications schemes +License: GNU General Public License v3 (GPLv3) +URL: https://pypi.org/project/country-converter/ +Source0: https://mirrors.nju.edu.cn/pypi/web/packages/1e/19/a34c543fea205366135e968c89ef24a4a52ca4adab0ea957017a53ec27ba/country_converter-1.0.0.tar.gz +BuildArch: noarch + +Requires: python3-pandas +Requires: python3-country-converter[lint,test] +Requires: python3-black +Requires: python3-isort +Requires: python3-coveralls +Requires: python3-pytest +Requires: python3-pytest-black +Requires: python3-pytest-cov +Requires: python3-pytest-datadir +Requires: python3-pytest-mypy + +%description +# country converter + +The country converter (coco) is a Python package to convert and match +country names between different classifications and between different +naming versions. Internally it uses regular expressions to match country +names. Coco can also be used to build aggregation concordance matrices +between different classification schemes. + +[](https://pypi.python.org/pypi/country_converter/) +[](https://anaconda.org/conda-forge/country_converter) +[](https://doi.org/10.5281/zenodo.838035) +[](http://joss.theoj.org/papers/af694f2e5994b8aacbad119c4005e113) + +[](https://github.com/IndEcol/country_converter/actions) +[](https://coveralls.io/github/IndEcol/country_converter?branch=master) +[](https://github.com/psf/black) +[](https://www.gnu.org/licenses/gpl-3.0) + + +## Motivation + +To date, there is no single standard of how to name or specify +individual countries in a (meta) data description. 
While some data +sources follow ISO 3166, this standard defines a two and a three letter +code in addition to a numerical classification. To further complicate +the matter, instead of using one of the existing standards, many +databases use unstandardised country names to classify countries. + +The country converter (coco) automates the conversion from different +standards and version of country names. Internally, coco is based on a +table specifying the different ISO and UN standards per country together +with the official name and a regular expression which aim to match all +English versions of a specific country name. In addition, coco includes +classification based on UN-, EU-, OECD-membership, UN regions +specifications, continents and various MRIO and IAM databases (see +[Classification schemes](#classification-schemes) below). + +## Installation + +Country_converter is registered at PyPI. From the command line: + + pip install country_converter --upgrade + +The country converter is also available from the [conda +forge](https://conda-forge.org/) and can be installed using conda with +(if you don't have the conda_forge channel added to your conda config +add "-c conda-forge", see [the install instructions +here](https://github.com/conda-forge/country_converter-feedstock)): + + conda install country_converter + +Alternatively, the source code is available on +[GitHub](https://github.com/IndEcol/country_converter). + +The package depends on [Pandas](http://pandas.pydata.org/); for testing +[pytest](http://pytest.org/) is required. For further information on +running the tests see [CONTRIBUTING.md](CONTRIBUTING.md). + +## Usage + +### Basic usage + +#### Use within Python + +Convert various country names to some standard names: + +``` python +import country_converter as coco +some_names = ['United Rep. of Tanzania', 'DE', 'Cape Verde', '788', 'Burma', 'COG', + 'Iran (Islamic Republic of)', 'Korea, Republic of', + "Dem. People's Rep. 
of Korea"] +standard_names = coco.convert(names=some_names, to='name_short') +print(standard_names) +``` + +Which results in \['Tanzania', 'Germany', 'Cabo Verde', 'Tunisia', +'Myanmar', 'Congo Republic', 'Iran', 'South Korea', 'North Korea'\]. The +input format is determined automatically, based on ISO two letter, ISO +three letter, ISO numeric or regular expression matching. In case of any +ambiguity, the source format can be specified with the parameter 'src'. + +In case of multiple conversion, better performance can be achieved by +instantiating a single CountryConverter object for all conversions: + +``` python +import country_converter as coco +cc = coco.CountryConverter() + +some_names = ['United Rep. of Tanzania', 'Cape Verde', 'Burma', + 'Iran (Islamic Republic of)', 'Korea, Republic of', + "Dem. People's Rep. of Korea"] + +standard_names = cc.convert(names = some_names, to = 'name_short') +UNmembership = cc.convert(names = some_names, to = 'UNmember') +print(standard_names) +print(UNmembership) +``` + +In order to more efficiently convert Pandas Series, the `pandas_convert()` method can be used. The +performance gain is especially significant for large Series. For a series containing 1 million rows +a 4000x speedup can be achieved, compared to `convert()`. 
+ +``` python +import country_converter as coco +import pandas as pd +cc = coco.CountryConverter() + +some_countries = pd.Series(['Australia', 'Belgium', 'Brazil', 'Bulgaria', 'Cyprus', 'Czech Republic', + 'Guatemala', 'Mexico', 'Honduras', 'Costa Rica', 'Colombia', 'Greece', 'Hungary', + 'India', 'Indonesia', 'Ireland', 'Italy', 'Japan', 'Latvia', 'Lithuania', + 'Luxembourg', 'Malta', 'Jamaica', 'Ireland', 'Turkey', 'United Kingdom', + 'United States'], name='country') + +iso3_codes = cc.pandas_convert(series=some_countries, to='ISO3') +``` + +Convert between classification schemes: + +``` python +iso3_codes = ['USA', 'VUT', 'TKL', 'AUT', 'XXX' ] +iso2_codes = coco.convert(names=iso3_codes, to='ISO2') +print(iso2_codes) +``` + +Which results in \['US', 'VU', 'TK', 'AT', 'not found'\] + +The not found indication can be specified (e.g. not_found = 'not +there'), if None is passed for 'not_found', the original entry gets +passed through: + +``` python +iso2_codes = coco.convert(names=iso3_codes, to='ISO2', not_found=None) +print(iso2_codes) +``` + +results in \['US', 'VU', 'TK', 'AT', 'XXX'\] + +Internally the data is stored in a Pandas DataFrame, which can be +accessed directly. For example, this can be used to filter countries for +membership organisations (per year). Note: for this, an instance of +CountryConverter is required. 
+ +``` python +import country_converter as coco +cc = coco.CountryConverter() + +some_countries = ['Australia', 'Belgium', 'Brazil', 'Bulgaria', 'Cyprus', 'Czech Republic', + 'Denmark', 'Estonia', 'Finland', 'France', 'Germany', 'Greece', 'Hungary', + 'India', 'Indonesia', 'Ireland', 'Italy', 'Japan', 'Latvia', 'Lithuania', + 'Luxembourg', 'Malta', 'Romania', 'Russia', 'Turkey', 'United Kingdom', + 'United States'] + +oecd_since_1995 = cc.data[(cc.data.OECD >= 1995) & cc.data.name_short.isin(some_countries)].name_short +eu_until_1980 = cc.data[(cc.data.EU <= 1980) & cc.data.name_short.isin(some_countries)].name_short +print(oecd_since_1995) +print(eu_until_1980) +``` + +All classifications can be directly accessed by: + +``` python +cc.EU28 +cc.OECD + +cc.EU27as('ISO3') +``` + +and the classification schemes available: + +``` python +cc.valid_class +``` + +There is also a methdod for only getting country classifications (thus +omitting any grouping of countries): + +``` python +cc.valid_country_classifications +``` + +If you rather need a dictionary describing the classification/membership +use: + +``` python +import country_converter as coco +cc = coco.CountryConverter() +cc.get_correspondence_dict('EXIO3', 'ISO3') +``` + +to also include countries not assigned within a specific classification +use: + +``` python +cc.get_correspondence_dict('EU27', 'ISO2', replace_nan='NonEU') +``` + +The regular expressions can also be used to match any list of countries +to any other. For example: + +``` python +match_these = ['norway', 'united_states', 'china', 'taiwan'] +master_list = ['USA', 'The Swedish Kingdom', 'Norway is a Kingdom too', + 'Peoples Republic of China', 'Republic of China' ] + +matching_dict = coco.match(match_these, master_list) +``` + +Country converter by default provides a warning to the python <span +class="title-ref">logging</span> logger if no match is found. 
The +following example demonstrates how to configure the <span +class="title-ref">coco</span> logging behaviour. + +``` python +import logging +import country_converter as coco +logging.basicConfig(level=logging.INFO) +coco.convert("asdf") +# WARNING:country_converter.country_converter:asdf not found in regex +# Out: 'not found' + +coco_logger = coco.logging.getLogger() +coco_logger.setLevel(logging.CRITICAL) +coco.convert("asdf") +# Out: 'not found' +``` + +See the IPython Notebook +([country_converter_examples.ipynb](http://nbviewer.ipython.org/github/IndEcol/country_converter/blob/master/doc/country_converter_examples.ipynb)) +for more information. + +#### Command line usage + +The country converter package also provides a command line interface +called coco. + +Minimal example: + + coco Cyprus DE Denmark Estonia 4 'United Kingdom' AUT + +Converts the given names to ISO3 codes based on matching the input to +ISO2, ISO3, ISOnumeric or regular expression matching. The list of names +must be separated by spaces, country names consisting of multiple words +must be put in quotes (''). + +The input classification can be specified with '--src' or '-s' (or will +be determined automatically), the target classification with '--to' or +'-t'. + +The default output is a space separated list, this can be changed by +passing a separator by '--output_sep' or '-o' (e.g -o '\|'). 
+ +Thus, to convert from ISO3 to UN number codes and receive the output as +comma separated list use: + + coco AUT DEU VAT AUS -s ISO3 -t UNcode -o ', ' + +The command line tool also allows to specify the output for none found +entries, including passing them through to the output by passing None: + + coco CAN Peru US Mexico Venezuela UK Arendelle --not_found=None + +and to specify an additional data file which will overwrite existing +country matching + + coco Congo --additional_data path/to/datafile.csv + +See +<https://github.com/IndEcol/country_converter/tree/master/tests/custom_data_example.txt> +for an example of an additional datafile. + +The flags --UNmember_only (-u) and --include_obsolete (-i) restrict the +search to UN member states only or extend it to also include currently +obsolete countries. For example, the [Netherlands +Antilles](https://en.wikipedia.org/wiki/Netherlands_Antilles) were +dissolved in 2010. + +Thus: + + coco "Netherlands Antilles" + +results in "not found". The search, however, can be extended to recently +dissolved countries by: + + coco "Netherlands Antilles" -i + +which results in 'ANT'. + +In addition to the countries, the coco command line tool also accepts +various country classifications (EXIO1, EXIO2, EXIO3, WIOD, Eora, +MESSAGE, OECD, EU27, EU28, UN, obsolete, Cecilia2050, BRIC, APEC, BASIC, +CIS, G7, G20). One of these can be passed by + + coco G20 + +which lists all countries in that classification. + +For the classifications covering almost all countries (MRIO and IAM +classifications) + + coco EXIO3 + +lists the unique classification names. When passing a --to parameter, a +simplified correspondence of the chosen classification is printed: + + coco EXIO3 --to ISO3 + +For further information call the help by + + coco -h + +#### Use in Matlab + +Newer (tested in 2016a) versions of Matlab allow to directly call Python +functions and libraries. This requires a Python version \>= 3.4 +installed in the system path (e.g. 
through Anaconda). + +To test, try this in Matlab: + +``` matlab +py.print(py.sys.version) +``` + +If this works, you can also use coco after installing it through pip (at +the windows commandline - see the installing instruction above): + +``` matlab +pip install country_converter --upgrade +``` + +And in matlab: + +``` matlab +coco = py.country_converter.CountryConverter() +countries = {'The Swedish Kingdom', 'Norway is a Kingdom too', 'Peoples Republic of China', 'Republic of China'}; +ISO2_pythontype = coco.convert(countries, pyargs('to', 'ISO2')); +ISO2_cellarray = cellfun(@char,cell(ISO2_pythontype),'UniformOutput',false); +``` + +Alternatively, as a long oneliner: + +``` matlab +short_names = cellfun(@char, cell(py.country_converter.convert({56, 276}, pyargs('src', 'UNcode', 'to', 'name_short'))), 'UniformOutput',false); +``` + +All properties of coco as explained above are also available in Matlab: + +``` matlab +coco = py.country_converter.CountryConverter(); +coco.EU27 +EU27ISO3 = coco.EU27as('ISO3'); +``` + +These functions return a Pandas DataFrame. The underlying values can be +access with .values (e.g. + +``` matlab +EU27ISO3.values +``` + +I leave it to professional Matlab users to figure out how to further +process them. + +See also IPython Notebook +([country_converter_examples.ipynb](http://nbviewer.ipython.org/github/IndEcol/country_converter/blob/master/doc/country_converter_examples.ipynb)) +for more information - all functions available in Python (for example +passing additional data files, specifying the output in case of missing +data) work also in Matlab by passing arguments through the pyargs +function. + +### Building concordances for country aggregation + +Coco provides a function for building concordance vectors, matrices and +dictionaries between different classifications. This can be used in +python as well as in matlab. 
For further information see +([country_converter_aggregation_helper.ipynb](http://nbviewer.ipython.org/github/IndEcol/country_converter/blob/master/doc/country_converter_aggregation_helper.ipynb)) + +## Classification schemes + +Currently the following classification schemes are available (see also +Data sources below for further information): + +1. ISO2 (ISO 3166-1 alpha-2) +2. ISO3 (ISO 3166-1 alpha-3) +3. ISO - numeric (ISO 3166-1 numeric) +4. UN numeric code (M.49 - follows to a large extend ISO-numeric) +5. A standard or short name +6. The "official" name +7. Continent +8. UN region +9. [EXIOBASE](http://exiobase.eu/) 1 classification +10. [EXIOBASE](http://exiobase.eu/) 2 classification +11. [EXIOBASE](http://exiobase.eu/) 3 classification +12. [WIOD](http://www.wiod.org/home) classification +13. [Eora](http://www.worldmrio.com/) +14. [OECD](http://www.oecd.org/about/membersandpartners/list-oecd-member-countries.htm) + membership (per year) +15. [MESSAGE](http://www.iiasa.ac.at/web/home/research/researchPrograms/Energy/MESSAGE-model-regions.en.html) + 11-region classification +16. [IMAGE](https://models.pbl.nl/image/index.php/Welcome_to_IMAGE_3.0_Documentation) +17. [REMIND](https://www.pik-potsdam.de/en/institute/departments/transformation-pathways/models/remind) +18. [UN](http://www.un.org/en/members/) membership (per year) +19. [EU](https://ec.europa.eu/eurostat/statistics-explained/index.php/Glossary:EU_enlargements) + membership (including EU12, EU15, EU25, EU27, EU27_2007, EU28) +20. [EEA](https://ec.europa.eu/eurostat/statistics-explained/index.php/Glossary:European_Economic_Area_(EEA)) + membership +21. [Schengen](https://en.wikipedia.org/wiki/Schengen_Area) region +22. [Cecilia](https://cecilia2050.eu/system/files/De%20Koning%20et%20al.%20%282014%29_Scenarios%20for%202050_0.pdf) + 2050 classification +23. [APEC](https://en.wikipedia.org/wiki/Asia-Pacific_Economic_Cooperation) +24. [BRIC](https://en.wikipedia.org/wiki/BRIC) +25. 
[BASIC](https://en.wikipedia.org/wiki/BASIC_countries) +26. [CIS](https://en.wikipedia.org/wiki/Commonwealth_of_Independent_States) + (as by 2019, excl. Turkmenistan) +27. [G7](https://en.wikipedia.org/wiki/Group_of_Seven) +28. [G20](https://en.wikipedia.org/wiki/G20) (listing all EU member + states as individual members) +29. [FAOcode](http://www.fao.org/faostat/en/#definitions) (numeric) +30. [GBDcode](http://ghdx.healthdata.org/) (numeric - Global Burden of + Disease country codes) +31. [IEA](https://www.iea.org/countries) (World Energy Balances 2021) +32. [DACcode](https://www.oecd.org/dac/financing-sustainable-development/development-finance-standards/dacandcrscodelists.htm) + (numeric - OECD Development Assistance Committee) +33. [ccTLD](https://en.wikipedia.org/wiki/Country_code_top-level_domain) - country code top-level domains +34. [GWcode](https://www.tandfonline.com/doi/abs/10.1080/03050629908434958) - Gledisch & Ward numerical codes as published in https://www.andybeger.com/states/articles/statelists.html + + +Coco contains official recognised codes as well as non-standard codes +for disputed or dissolved countries. To restrict the set to only the +official recognized UN members or include obsolete countries, pass + +``` python +import country_converter as coco +cc = coco.CountryConverter() +cc_UN = coco.CountryConverter(only_UNmember=True) +cc_all = coco.CountryConverter(include_obsolete=True) + +cc.convert(['PSE', 'XKX', 'EAZ', 'FRA'], to='name_short') +cc_UN.convert(['PSE', 'XKX', 'EAZ', 'FRA'], to='name_short') +cc_all.convert(['PSE', 'XKX', 'EAZ', 'FRA'], to='name_short') +``` + +cc results in \['Palestine', 'Kosovo', 'not found', 'France'\], whereas +cc_UN converts to \['not found', 'not found', 'not found', 'France'\] +and cc_all converts to \['Palestine', 'Kosovo', 'Zanzibar', 'France'\] +Note that the underlying dataframe is available at the attribute .data +(e.g. cc_all.data). 
+ +## Data sources and further reading + +Most of the underlying data can be found in Wikipedia, the page +describing [ISO 3166-1](https://en.wikipedia.org/wiki/ISO_3166-1) is a +good starting point. UN regions/codes are given on the United Nation +Statistical Division +([unstats](http://unstats.un.org/unsd/methods/m49/m49regin.htm)) +webpage. The differences between the ISO numeric and UN (M.49) codes are +[also explained at wikipedia](https://en.wikipedia.org/wiki/UN_M.49). +[EXIOBASE](http://exiobase.eu/), [WIOD](http://www.wiod.org/home) and +[Eora](http://www.worldmrio.com/) classification were extracted from the +respective databases. For [Eora](http://www.worldmrio.com/), the names +are based on the 'Country names' csv file provided on the webpage, but +updated for different names used in the Eora26 database. The MESSAGE +classification follows the 11-region aggregation given in the +[MESSAGE](http://www.iiasa.ac.at/web/home/research/researchPrograms/Energy/MESSAGE-model-regions.en.html) +model regions description. The +[IMAGE](https://models.pbl.nl/image/index.php/Welcome_to_IMAGE_3.0_Documentation) +classification is based on the "[region classification +map](https://models.pbl.nl/image/index.php/Region_classification_map)", +for +[REMIND](https://www.pik-potsdam.de/en/institute/departments/transformation-pathways/models/remind) +we received a country mapping from the model developers. + +The membership of +[OECD](http://www.oecd.org/about/membersandpartners/list-oecd-member-countries.htm) +and [UN](http://www.un.org/en/members/) can be found at the membership +organisations' webpages, information about obsolete country codes on the +[Statoids](http://www.statoids.com/w3166his.html) webpage. + +The situation for the +[EU](https://ec.europa.eu/eurostat/statistics-explained/index.php/Glossary:EU_enlargements) +got complicated due to the Brexit process. 
For the naming, coco follows +the [Eurostat +glossary](https://ec.europa.eu/eurostat/statistics-explained/index.php/Glossary:EU_enlargements), +thus EU27 refers to the EU without UK, whereas EU27_2007 refers to the +EU without Croatia (the status after the 2007 enlargement). The shortcut +EU always links to the most recent classification. The +[EEA](https://ec.europa.eu/eurostat/statistics-explained/index.php/Glossary:European_Economic_Area_(EEA)) +agreements for the UK ended by 2021-01-01 (which also affects Guernsey, Isle of Man, Jersey and Gibraltar). +Switzerland is not part of the EEA but member of the single market. + +The Global Burden of Disease country codes were extracted form the [GBD +code book available +here.](https://ghdx.healthdata.org/sites/default/files/ihme_query_tool/IHME_GBD_2019_CODEBOOK.zip) + +## Communication, issues, bugs and enhancements + +Please use the issue tracker for documenting bugs, proposing +enhancements and all other communication related to coco. + +You can follow me on [mastodon - @kst@qoto.org](https://qoto.org/@kst) and [twitter](https://twitter.com/kst_stadler) to get +the latest news about all my open-source and research projects (and +occasionally some random retweets/toots). + +## Contributing + +Want to contribute? Great! Please check +[CONTRIBUTING.md](CONTRIBUTING.md) if you want to help to improve coco +and for some pointer for how to add classifications. + +## Related software + +The package [pycountry](https://pypi.python.org/pypi/pycountry) provides +access to the official ISO databases for historic countries, country +subdivisions, languages and currencies. In case you need to convert +non-English country names, +[countrynames](https://github.com/occrp/countrynames) includes an +extensive database of country names in different languages and functions +to convert them to the different ISO 3166 standards. 
+[Python-iso3166](https://github.com/deactivated/python-iso3166) focuses +on conversion between the two-letter, three-letter and three-digit codes +defined in the ISO 3166 standard. + +If you are using R, you should have a look at +[countrycode](https://github.com/vincentarelbundock/countrycode). + +## Citing the country converter + +Version 0.5 of the country converter was published in the [Journal of +Open Source Software](http://joss.theoj.org/). To cite the country +converter in publication please use: + +Stadler, K. (2017). The country converter coco - a Python package for +converting country names between different classification schemes. The +Journal of Open Source Software. doi: +[10.21105/joss.00332](http://dx.doi.org/10.21105/joss.00332) + +For the full bibtex key see [CITATION](CITATION) + +## Acknowledgements + +This package was inspired by (and the regular expression are mostly +based on) the R-package +[countrycode](https://github.com/vincentarelbundock/countrycode) by +[Vincent Arel-Bundock](http://arelbundock.com/) and his (defunct) port +to Python (pycountrycode). Many thanks to [Robert +Gieseke](https://github.com/rgieseke) for the review of the source code +and paper for the publication in the [Journal of Open Source +Software](http://joss.theoj.org/). + + +%package -n python3-country-converter +Summary: The country converter (coco) - a Python package for converting country names between different classifications schemes +Provides: python-country-converter +BuildRequires: python3-devel +BuildRequires: python3-setuptools +BuildRequires: python3-pip +%description -n python3-country-converter +# country converter + +The country converter (coco) is a Python package to convert and match +country names between different classifications and between different +naming versions. Internally it uses regular expressions to match country +names. Coco can also be used to build aggregation concordance matrices +between different classification schemes. 
+ +[](https://pypi.python.org/pypi/country_converter/) +[](https://anaconda.org/conda-forge/country_converter) +[](https://doi.org/10.5281/zenodo.838035) +[](http://joss.theoj.org/papers/af694f2e5994b8aacbad119c4005e113) + +[](https://github.com/IndEcol/country_converter/actions) +[](https://coveralls.io/github/IndEcol/country_converter?branch=master) +[](https://github.com/psf/black) +[](https://www.gnu.org/licenses/gpl-3.0) + + +## Motivation + +To date, there is no single standard of how to name or specify +individual countries in a (meta) data description. While some data +sources follow ISO 3166, this standard defines a two and a three letter +code in addition to a numerical classification. To further complicate +the matter, instead of using one of the existing standards, many +databases use unstandardised country names to classify countries. + +The country converter (coco) automates the conversion from different +standards and version of country names. Internally, coco is based on a +table specifying the different ISO and UN standards per country together +with the official name and a regular expression which aim to match all +English versions of a specific country name. In addition, coco includes +classification based on UN-, EU-, OECD-membership, UN regions +specifications, continents and various MRIO and IAM databases (see +[Classification schemes](#classification-schemes) below). + +## Installation + +Country_converter is registered at PyPI. 
From the command line: + + pip install country_converter --upgrade + +The country converter is also available from the [conda +forge](https://conda-forge.org/) and can be installed using conda with +(if you don't have the conda_forge channel added to your conda config +add "-c conda-forge", see [the install instructions +here](https://github.com/conda-forge/country_converter-feedstock)): + + conda install country_converter + +Alternatively, the source code is available on +[GitHub](https://github.com/IndEcol/country_converter). + +The package depends on [Pandas](http://pandas.pydata.org/); for testing +[pytest](http://pytest.org/) is required. For further information on +running the tests see [CONTRIBUTING.md](CONTRIBUTING.md). + +## Usage + +### Basic usage + +#### Use within Python + +Convert various country names to some standard names: + +``` python +import country_converter as coco +some_names = ['United Rep. of Tanzania', 'DE', 'Cape Verde', '788', 'Burma', 'COG', + 'Iran (Islamic Republic of)', 'Korea, Republic of', + "Dem. People's Rep. of Korea"] +standard_names = coco.convert(names=some_names, to='name_short') +print(standard_names) +``` + +Which results in \['Tanzania', 'Germany', 'Cabo Verde', 'Tunisia', +'Myanmar', 'Congo Republic', 'Iran', 'South Korea', 'North Korea'\]. The +input format is determined automatically, based on ISO two letter, ISO +three letter, ISO numeric or regular expression matching. In case of any +ambiguity, the source format can be specified with the parameter 'src'. + +In case of multiple conversion, better performance can be achieved by +instantiating a single CountryConverter object for all conversions: + +``` python +import country_converter as coco +cc = coco.CountryConverter() + +some_names = ['United Rep. of Tanzania', 'Cape Verde', 'Burma', + 'Iran (Islamic Republic of)', 'Korea, Republic of', + "Dem. People's Rep. 
of Korea"] + +standard_names = cc.convert(names = some_names, to = 'name_short') +UNmembership = cc.convert(names = some_names, to = 'UNmember') +print(standard_names) +print(UNmembership) +``` + +In order to more efficiently convert Pandas Series, the `pandas_convert()` method can be used. The +performance gain is especially significant for large Series. For a series containing 1 million rows +a 4000x speedup can be achieved, compared to `convert()`. + +``` python +import country_converter as coco +import pandas as pd +cc = coco.CountryConverter() + +some_countries = pd.Series(['Australia', 'Belgium', 'Brazil', 'Bulgaria', 'Cyprus', 'Czech Republic', + 'Guatemala', 'Mexico', 'Honduras', 'Costa Rica', 'Colombia', 'Greece', 'Hungary', + 'India', 'Indonesia', 'Ireland', 'Italy', 'Japan', 'Latvia', 'Lithuania', + 'Luxembourg', 'Malta', 'Jamaica', 'Ireland', 'Turkey', 'United Kingdom', + 'United States'], name='country') + +iso3_codes = cc.pandas_convert(series=some_countries, to='ISO3') +``` + +Convert between classification schemes: + +``` python +iso3_codes = ['USA', 'VUT', 'TKL', 'AUT', 'XXX' ] +iso2_codes = coco.convert(names=iso3_codes, to='ISO2') +print(iso2_codes) +``` + +Which results in \['US', 'VU', 'TK', 'AT', 'not found'\] + +The not found indication can be specified (e.g. not_found = 'not +there'), if None is passed for 'not_found', the original entry gets +passed through: + +``` python +iso2_codes = coco.convert(names=iso3_codes, to='ISO2', not_found=None) +print(iso2_codes) +``` + +results in \['US', 'VU', 'TK', 'AT', 'XXX'\] + +Internally the data is stored in a Pandas DataFrame, which can be +accessed directly. For example, this can be used to filter countries for +membership organisations (per year). Note: for this, an instance of +CountryConverter is required. 
+ +``` python +import country_converter as coco +cc = coco.CountryConverter() + +some_countries = ['Australia', 'Belgium', 'Brazil', 'Bulgaria', 'Cyprus', 'Czech Republic', + 'Denmark', 'Estonia', 'Finland', 'France', 'Germany', 'Greece', 'Hungary', + 'India', 'Indonesia', 'Ireland', 'Italy', 'Japan', 'Latvia', 'Lithuania', + 'Luxembourg', 'Malta', 'Romania', 'Russia', 'Turkey', 'United Kingdom', + 'United States'] + +oecd_since_1995 = cc.data[(cc.data.OECD >= 1995) & cc.data.name_short.isin(some_countries)].name_short +eu_until_1980 = cc.data[(cc.data.EU <= 1980) & cc.data.name_short.isin(some_countries)].name_short +print(oecd_since_1995) +print(eu_until_1980) +``` + +All classifications can be directly accessed by: + +``` python +cc.EU28 +cc.OECD + +cc.EU27as('ISO3') +``` + +and the classification schemes available: + +``` python +cc.valid_class +``` + +There is also a methdod for only getting country classifications (thus +omitting any grouping of countries): + +``` python +cc.valid_country_classifications +``` + +If you rather need a dictionary describing the classification/membership +use: + +``` python +import country_converter as coco +cc = coco.CountryConverter() +cc.get_correspondence_dict('EXIO3', 'ISO3') +``` + +to also include countries not assigned within a specific classification +use: + +``` python +cc.get_correspondence_dict('EU27', 'ISO2', replace_nan='NonEU') +``` + +The regular expressions can also be used to match any list of countries +to any other. For example: + +``` python +match_these = ['norway', 'united_states', 'china', 'taiwan'] +master_list = ['USA', 'The Swedish Kingdom', 'Norway is a Kingdom too', + 'Peoples Republic of China', 'Republic of China' ] + +matching_dict = coco.match(match_these, master_list) +``` + +Country converter by default provides a warning to the python <span +class="title-ref">logging</span> logger if no match is found. 
The +following example demonstrates how to configure the <span +class="title-ref">coco</span> logging behaviour. + +``` python +import logging +import country_converter as coco +logging.basicConfig(level=logging.INFO) +coco.convert("asdf") +# WARNING:country_converter.country_converter:asdf not found in regex +# Out: 'not found' + +coco_logger = coco.logging.getLogger() +coco_logger.setLevel(logging.CRITICAL) +coco.convert("asdf") +# Out: 'not found' +``` + +See the IPython Notebook +([country_converter_examples.ipynb](http://nbviewer.ipython.org/github/IndEcol/country_converter/blob/master/doc/country_converter_examples.ipynb)) +for more information. + +#### Command line usage + +The country converter package also provides a command line interface +called coco. + +Minimal example: + + coco Cyprus DE Denmark Estonia 4 'United Kingdom' AUT + +Converts the given names to ISO3 codes based on matching the input to +ISO2, ISO3, ISOnumeric or regular expression matching. The list of names +must be separated by spaces, country names consisting of multiple words +must be put in quotes (''). + +The input classification can be specified with '--src' or '-s' (or will +be determined automatically), the target classification with '--to' or +'-t'. + +The default output is a space separated list, this can be changed by +passing a separator by '--output_sep' or '-o' (e.g -o '\|'). 
+ +Thus, to convert from ISO3 to UN number codes and receive the output as +comma separated list use: + + coco AUT DEU VAT AUS -s ISO3 -t UNcode -o ', ' + +The command line tool also allows to specify the output for none found +entries, including passing them through to the output by passing None: + + coco CAN Peru US Mexico Venezuela UK Arendelle --not_found=None + +and to specify an additional data file which will overwrite existing +country matching + + coco Congo --additional_data path/to/datafile.csv + +See +<https://github.com/IndEcol/country_converter/tree/master/tests/custom_data_example.txt> +for an example of an additional datafile. + +The flags --UNmember_only (-u) and --include_obsolete (-i) restrict the +search to UN member states only or extend it to also include currently +obsolete countries. For example, the [Netherlands +Antilles](https://en.wikipedia.org/wiki/Netherlands_Antilles) were +dissolved in 2010. + +Thus: + + coco "Netherlands Antilles" + +results in "not found". The search, however, can be extended to recently +dissolved countries by: + + coco "Netherlands Antilles" -i + +which results in 'ANT'. + +In addition to the countries, the coco command line tool also accepts +various country classifications (EXIO1, EXIO2, EXIO3, WIOD, Eora, +MESSAGE, OECD, EU27, EU28, UN, obsolete, Cecilia2050, BRIC, APEC, BASIC, +CIS, G7, G20). One of these can be passed by + + coco G20 + +which lists all countries in that classification. + +For the classifications covering almost all countries (MRIO and IAM +classifications) + + coco EXIO3 + +lists the unique classification names. When passing a --to parameter, a +simplified correspondence of the chosen classification is printed: + + coco EXIO3 --to ISO3 + +For further information call the help by + + coco -h + +#### Use in Matlab + +Newer (tested in 2016a) versions of Matlab allow to directly call Python +functions and libraries. This requires a Python version \>= 3.4 +installed in the system path (e.g. 
through Anaconda). + +To test, try this in Matlab: + +``` matlab +py.print(py.sys.version) +``` + +If this works, you can also use coco after installing it through pip (at +the Windows command line - see the installation instructions above): + +``` matlab +pip install country_converter --upgrade +``` + +And in Matlab: + +``` matlab +coco = py.country_converter.CountryConverter() +countries = {'The Swedish Kingdom', 'Norway is a Kingdom too', 'Peoples Republic of China', 'Republic of China'}; +ISO2_pythontype = coco.convert(countries, pyargs('to', 'ISO2')); +ISO2_cellarray = cellfun(@char,cell(ISO2_pythontype),'UniformOutput',false); +``` + +Alternatively, as a long one-liner: + +``` matlab +short_names = cellfun(@char, cell(py.country_converter.convert({56, 276}, pyargs('src', 'UNcode', 'to', 'name_short'))), 'UniformOutput',false); +``` + +All properties of coco as explained above are also available in Matlab: + +``` matlab +coco = py.country_converter.CountryConverter(); +coco.EU27 +EU27ISO3 = coco.EU27as('ISO3'); +``` + +These functions return a Pandas DataFrame. The underlying values can be +accessed with .values, e.g.: + +``` matlab +EU27ISO3.values +``` + +I leave it to professional Matlab users to figure out how to further +process them. + +See also the IPython Notebook +([country_converter_examples.ipynb](http://nbviewer.ipython.org/github/IndEcol/country_converter/blob/master/doc/country_converter_examples.ipynb)) +for more information - all functions available in Python (for example +passing additional data files, specifying the output in case of missing +data) also work in Matlab by passing arguments through the pyargs +function. + +### Building concordances for country aggregation + +Coco provides a function for building concordance vectors, matrices and +dictionaries between different classifications. This can be used in +Python as well as in Matlab. 
For further information see +([country_converter_aggregation_helper.ipynb](http://nbviewer.ipython.org/github/IndEcol/country_converter/blob/master/doc/country_converter_aggregation_helper.ipynb)) + +## Classification schemes + +Currently the following classification schemes are available (see also +Data sources below for further information): + +1. ISO2 (ISO 3166-1 alpha-2) +2. ISO3 (ISO 3166-1 alpha-3) +3. ISO - numeric (ISO 3166-1 numeric) +4. UN numeric code (M.49 - follows to a large extend ISO-numeric) +5. A standard or short name +6. The "official" name +7. Continent +8. UN region +9. [EXIOBASE](http://exiobase.eu/) 1 classification +10. [EXIOBASE](http://exiobase.eu/) 2 classification +11. [EXIOBASE](http://exiobase.eu/) 3 classification +12. [WIOD](http://www.wiod.org/home) classification +13. [Eora](http://www.worldmrio.com/) +14. [OECD](http://www.oecd.org/about/membersandpartners/list-oecd-member-countries.htm) + membership (per year) +15. [MESSAGE](http://www.iiasa.ac.at/web/home/research/researchPrograms/Energy/MESSAGE-model-regions.en.html) + 11-region classification +16. [IMAGE](https://models.pbl.nl/image/index.php/Welcome_to_IMAGE_3.0_Documentation) +17. [REMIND](https://www.pik-potsdam.de/en/institute/departments/transformation-pathways/models/remind) +18. [UN](http://www.un.org/en/members/) membership (per year) +19. [EU](https://ec.europa.eu/eurostat/statistics-explained/index.php/Glossary:EU_enlargements) + membership (including EU12, EU15, EU25, EU27, EU27_2007, EU28) +20. [EEA](https://ec.europa.eu/eurostat/statistics-explained/index.php/Glossary:European_Economic_Area_(EEA)) + membership +21. [Schengen](https://en.wikipedia.org/wiki/Schengen_Area) region +22. [Cecilia](https://cecilia2050.eu/system/files/De%20Koning%20et%20al.%20%282014%29_Scenarios%20for%202050_0.pdf) + 2050 classification +23. [APEC](https://en.wikipedia.org/wiki/Asia-Pacific_Economic_Cooperation) +24. [BRIC](https://en.wikipedia.org/wiki/BRIC) +25. 
[BASIC](https://en.wikipedia.org/wiki/BASIC_countries) +26. [CIS](https://en.wikipedia.org/wiki/Commonwealth_of_Independent_States) + (as of 2019, excl. Turkmenistan) +27. [G7](https://en.wikipedia.org/wiki/Group_of_Seven) +28. [G20](https://en.wikipedia.org/wiki/G20) (listing all EU member + states as individual members) +29. [FAOcode](http://www.fao.org/faostat/en/#definitions) (numeric) +30. [GBDcode](http://ghdx.healthdata.org/) (numeric - Global Burden of + Disease country codes) +31. [IEA](https://www.iea.org/countries) (World Energy Balances 2021) +32. [DACcode](https://www.oecd.org/dac/financing-sustainable-development/development-finance-standards/dacandcrscodelists.htm) + (numeric - OECD Development Assistance Committee) +33. [ccTLD](https://en.wikipedia.org/wiki/Country_code_top-level_domain) - country code top-level domains +34. [GWcode](https://www.tandfonline.com/doi/abs/10.1080/03050629908434958) - Gleditsch & Ward numerical codes as published in https://www.andybeger.com/states/articles/statelists.html + + +Coco contains officially recognised codes as well as non-standard codes +for disputed or dissolved countries. To restrict the set to only the +officially recognised UN members or to include obsolete countries, pass +the corresponding argument when creating the converter: + +``` python +import country_converter as coco +cc = coco.CountryConverter() +cc_UN = coco.CountryConverter(only_UNmember=True) +cc_all = coco.CountryConverter(include_obsolete=True) + +cc.convert(['PSE', 'XKX', 'EAZ', 'FRA'], to='name_short') +cc_UN.convert(['PSE', 'XKX', 'EAZ', 'FRA'], to='name_short') +cc_all.convert(['PSE', 'XKX', 'EAZ', 'FRA'], to='name_short') +``` + +cc results in \['Palestine', 'Kosovo', 'not found', 'France'\], whereas +cc_UN converts to \['not found', 'not found', 'not found', 'France'\] +and cc_all converts to \['Palestine', 'Kosovo', 'Zanzibar', 'France'\]. +Note that the underlying dataframe is available through the attribute .data +(e.g. cc_all.data). 
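As a rough illustration of the filtering described above, the following sketch mimics the behaviour with a hand-written four-row table. It does not use coco and is not how coco is implemented: the table `TOY_TABLE`, the function `toy_convert` and its parameter names are hypothetical, chosen only to reproduce the three results quoted above.

```python
# Toy illustration only: NOT coco's data or implementation.
# Each row: (ISO3 code, short name, is UN member, is obsolete)
TOY_TABLE = [
    ("FRA", "France", True, False),
    ("PSE", "Palestine", False, False),
    ("XKX", "Kosovo", False, False),
    ("EAZ", "Zanzibar", False, True),
]


def toy_convert(codes, only_un_member=False, include_obsolete=False,
                not_found="not found"):
    """Map ISO3 codes to short names using only the rows that pass the filters."""
    lookup = {}
    for iso3, name, un_member, obsolete in TOY_TABLE:
        if only_un_member and not un_member:
            continue  # drop everything that is not a UN member
        if obsolete and not include_obsolete:
            continue  # obsolete entries are hidden unless requested
        lookup[iso3] = name
    # Codes without a matching row fall back to the not_found marker.
    return [lookup.get(code, not_found) for code in codes]


codes = ["PSE", "XKX", "EAZ", "FRA"]
print(toy_convert(codes))                         # default set
print(toy_convert(codes, only_un_member=True))    # UN members only
print(toy_convert(codes, include_obsolete=True))  # obsolete entries included
```

The point of the sketch is that the restriction is applied to the lookup table before any conversion takes place, so a code that has been filtered out is treated exactly like an unknown code.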
+ +## Data sources and further reading + +Most of the underlying data can be found in Wikipedia, the page +describing [ISO 3166-1](https://en.wikipedia.org/wiki/ISO_3166-1) is a +good starting point. UN regions/codes are given on the United Nation +Statistical Division +([unstats](http://unstats.un.org/unsd/methods/m49/m49regin.htm)) +webpage. The differences between the ISO numeric and UN (M.49) codes are +[also explained at wikipedia](https://en.wikipedia.org/wiki/UN_M.49). +[EXIOBASE](http://exiobase.eu/), [WIOD](http://www.wiod.org/home) and +[Eora](http://www.worldmrio.com/) classification were extracted from the +respective databases. For [Eora](http://www.worldmrio.com/), the names +are based on the 'Country names' csv file provided on the webpage, but +updated for different names used in the Eora26 database. The MESSAGE +classification follows the 11-region aggregation given in the +[MESSAGE](http://www.iiasa.ac.at/web/home/research/researchPrograms/Energy/MESSAGE-model-regions.en.html) +model regions description. The +[IMAGE](https://models.pbl.nl/image/index.php/Welcome_to_IMAGE_3.0_Documentation) +classification is based on the "[region classification +map](https://models.pbl.nl/image/index.php/Region_classification_map)", +for +[REMIND](https://www.pik-potsdam.de/en/institute/departments/transformation-pathways/models/remind) +we received a country mapping from the model developers. + +The membership of +[OECD](http://www.oecd.org/about/membersandpartners/list-oecd-member-countries.htm) +and [UN](http://www.un.org/en/members/) can be found at the membership +organisations' webpages, information about obsolete country codes on the +[Statoids](http://www.statoids.com/w3166his.html) webpage. + +The situation for the +[EU](https://ec.europa.eu/eurostat/statistics-explained/index.php/Glossary:EU_enlargements) +got complicated due to the Brexit process. 
For the naming, coco follows +the [Eurostat +glossary](https://ec.europa.eu/eurostat/statistics-explained/index.php/Glossary:EU_enlargements), +thus EU27 refers to the EU without the UK, whereas EU27_2007 refers to the +EU without Croatia (the status after the 2007 enlargement). The shortcut +EU always links to the most recent classification. The +[EEA](https://ec.europa.eu/eurostat/statistics-explained/index.php/Glossary:European_Economic_Area_(EEA)) +agreements for the UK ended by 2021-01-01 (which also affects Guernsey, Isle of Man, Jersey and Gibraltar). +Switzerland is not part of the EEA but is a member of the single market. + +The Global Burden of Disease country codes were extracted from the [GBD +code book available +here.](https://ghdx.healthdata.org/sites/default/files/ihme_query_tool/IHME_GBD_2019_CODEBOOK.zip) + +## Communication, issues, bugs and enhancements + +Please use the issue tracker for documenting bugs, proposing +enhancements and all other communication related to coco. + +You can follow me on [mastodon - @kst@qoto.org](https://qoto.org/@kst) and [twitter](https://twitter.com/kst_stadler) to get +the latest news about all my open-source and research projects (and +occasionally some random retweets/toots). + +## Contributing + +Want to contribute? Great! Please check +[CONTRIBUTING.md](CONTRIBUTING.md) if you want to help to improve coco +and for some pointers on how to add classifications. + +## Related software + +The package [pycountry](https://pypi.python.org/pypi/pycountry) provides +access to the official ISO databases for historic countries, country +subdivisions, languages and currencies. In case you need to convert +non-English country names, +[countrynames](https://github.com/occrp/countrynames) includes an +extensive database of country names in different languages and functions +to convert them to the different ISO 3166 standards. 
[Python-iso3166](https://github.com/deactivated/python-iso3166) focuses +on conversion between the two-letter, three-letter and three-digit codes +defined in the ISO 3166 standard. + +If you are using R, you should have a look at +[countrycode](https://github.com/vincentarelbundock/countrycode). + +## Citing the country converter + +Version 0.5 of the country converter was published in the [Journal of +Open Source Software](http://joss.theoj.org/). To cite the country +converter in a publication, please use: + +Stadler, K. (2017). The country converter coco - a Python package for +converting country names between different classification schemes. The +Journal of Open Source Software. doi: +[10.21105/joss.00332](http://dx.doi.org/10.21105/joss.00332) + +For the full bibtex key see [CITATION](CITATION) + +## Acknowledgements + +This package was inspired by (and the regular expressions are mostly +based on) the R-package +[countrycode](https://github.com/vincentarelbundock/countrycode) by +[Vincent Arel-Bundock](http://arelbundock.com/) and his (defunct) port +to Python (pycountrycode). Many thanks to [Robert +Gieseke](https://github.com/rgieseke) for the review of the source code +and paper for the publication in the [Journal of Open Source +Software](http://joss.theoj.org/). + + +%package help +Summary: Development documents and examples for country-converter +Provides: python3-country-converter-doc +%description help +# country converter + +The country converter (coco) is a Python package to convert and match +country names between different classifications and between different +naming versions. Internally it uses regular expressions to match country +names. Coco can also be used to build aggregation concordance matrices +between different classification schemes. 
+ +[](https://pypi.python.org/pypi/country_converter/) +[](https://anaconda.org/conda-forge/country_converter) +[](https://doi.org/10.5281/zenodo.838035) +[](http://joss.theoj.org/papers/af694f2e5994b8aacbad119c4005e113) + +[](https://github.com/IndEcol/country_converter/actions) +[](https://coveralls.io/github/IndEcol/country_converter?branch=master) +[](https://github.com/psf/black) +[](https://www.gnu.org/licenses/gpl-3.0) + + +## Motivation + +To date, there is no single standard of how to name or specify +individual countries in a (meta) data description. While some data +sources follow ISO 3166, this standard defines a two and a three letter +code in addition to a numerical classification. To further complicate +the matter, instead of using one of the existing standards, many +databases use unstandardised country names to classify countries. + +The country converter (coco) automates the conversion from different +standards and version of country names. Internally, coco is based on a +table specifying the different ISO and UN standards per country together +with the official name and a regular expression which aim to match all +English versions of a specific country name. In addition, coco includes +classification based on UN-, EU-, OECD-membership, UN regions +specifications, continents and various MRIO and IAM databases (see +[Classification schemes](#classification-schemes) below). + +## Installation + +Country_converter is registered at PyPI. 
From the command line: + + pip install country_converter --upgrade + +The country converter is also available from the [conda +forge](https://conda-forge.org/) and can be installed using conda with +(if you don't have the conda_forge channel added to your conda config +add "-c conda-forge", see [the install instructions +here](https://github.com/conda-forge/country_converter-feedstock)): + + conda install country_converter + +Alternatively, the source code is available on +[GitHub](https://github.com/IndEcol/country_converter). + +The package depends on [Pandas](http://pandas.pydata.org/); for testing +[pytest](http://pytest.org/) is required. For further information on +running the tests see [CONTRIBUTING.md](CONTRIBUTING.md). + +## Usage + +### Basic usage + +#### Use within Python + +Convert various country names to some standard names: + +``` python +import country_converter as coco +some_names = ['United Rep. of Tanzania', 'DE', 'Cape Verde', '788', 'Burma', 'COG', + 'Iran (Islamic Republic of)', 'Korea, Republic of', + "Dem. People's Rep. of Korea"] +standard_names = coco.convert(names=some_names, to='name_short') +print(standard_names) +``` + +Which results in \['Tanzania', 'Germany', 'Cabo Verde', 'Tunisia', +'Myanmar', 'Congo Republic', 'Iran', 'South Korea', 'North Korea'\]. The +input format is determined automatically, based on ISO two letter, ISO +three letter, ISO numeric or regular expression matching. In case of any +ambiguity, the source format can be specified with the parameter 'src'. + +In case of multiple conversion, better performance can be achieved by +instantiating a single CountryConverter object for all conversions: + +``` python +import country_converter as coco +cc = coco.CountryConverter() + +some_names = ['United Rep. of Tanzania', 'Cape Verde', 'Burma', + 'Iran (Islamic Republic of)', 'Korea, Republic of', + "Dem. People's Rep. 
of Korea"] + +standard_names = cc.convert(names = some_names, to = 'name_short') +UNmembership = cc.convert(names = some_names, to = 'UNmember') +print(standard_names) +print(UNmembership) +``` + +In order to more efficiently convert Pandas Series, the `pandas_convert()` method can be used. The +performance gain is especially significant for large Series. For a series containing 1 million rows +a 4000x speedup can be achieved, compared to `convert()`. + +``` python +import country_converter as coco +import pandas as pd +cc = coco.CountryConverter() + +some_countries = pd.Series(['Australia', 'Belgium', 'Brazil', 'Bulgaria', 'Cyprus', 'Czech Republic', + 'Guatemala', 'Mexico', 'Honduras', 'Costa Rica', 'Colombia', 'Greece', 'Hungary', + 'India', 'Indonesia', 'Ireland', 'Italy', 'Japan', 'Latvia', 'Lithuania', + 'Luxembourg', 'Malta', 'Jamaica', 'Ireland', 'Turkey', 'United Kingdom', + 'United States'], name='country') + +iso3_codes = cc.pandas_convert(series=some_countries, to='ISO3') +``` + +Convert between classification schemes: + +``` python +iso3_codes = ['USA', 'VUT', 'TKL', 'AUT', 'XXX' ] +iso2_codes = coco.convert(names=iso3_codes, to='ISO2') +print(iso2_codes) +``` + +Which results in \['US', 'VU', 'TK', 'AT', 'not found'\] + +The not found indication can be specified (e.g. not_found = 'not +there'), if None is passed for 'not_found', the original entry gets +passed through: + +``` python +iso2_codes = coco.convert(names=iso3_codes, to='ISO2', not_found=None) +print(iso2_codes) +``` + +results in \['US', 'VU', 'TK', 'AT', 'XXX'\] + +Internally the data is stored in a Pandas DataFrame, which can be +accessed directly. For example, this can be used to filter countries for +membership organisations (per year). Note: for this, an instance of +CountryConverter is required. 
+ +``` python +import country_converter as coco +cc = coco.CountryConverter() + +some_countries = ['Australia', 'Belgium', 'Brazil', 'Bulgaria', 'Cyprus', 'Czech Republic', + 'Denmark', 'Estonia', 'Finland', 'France', 'Germany', 'Greece', 'Hungary', + 'India', 'Indonesia', 'Ireland', 'Italy', 'Japan', 'Latvia', 'Lithuania', + 'Luxembourg', 'Malta', 'Romania', 'Russia', 'Turkey', 'United Kingdom', + 'United States'] + +oecd_since_1995 = cc.data[(cc.data.OECD >= 1995) & cc.data.name_short.isin(some_countries)].name_short +eu_until_1980 = cc.data[(cc.data.EU <= 1980) & cc.data.name_short.isin(some_countries)].name_short +print(oecd_since_1995) +print(eu_until_1980) +``` + +All classifications can be directly accessed by: + +``` python +cc.EU28 +cc.OECD + +cc.EU27as('ISO3') +``` + +and the classification schemes available: + +``` python +cc.valid_class +``` + +There is also a method for only getting country classifications (thus +omitting any grouping of countries): + +``` python +cc.valid_country_classifications +``` + +If you would rather have a dictionary describing the classification/membership, +use: + +``` python +import country_converter as coco +cc = coco.CountryConverter() +cc.get_correspondence_dict('EXIO3', 'ISO3') +``` + +To also include countries not assigned within a specific classification, +use: + +``` python +cc.get_correspondence_dict('EU27', 'ISO2', replace_nan='NonEU') +``` + +The regular expressions can also be used to match any list of countries +to any other. For example: + +``` python +match_these = ['norway', 'united_states', 'china', 'taiwan'] +master_list = ['USA', 'The Swedish Kingdom', 'Norway is a Kingdom too', + 'Peoples Republic of China', 'Republic of China' ] + +matching_dict = coco.match(match_these, master_list) +``` + +Country converter by default provides a warning to the Python `logging` +logger if no match is found. 
The +following example demonstrates how to configure the `coco` +logging behaviour. + +``` python +import logging +import country_converter as coco +logging.basicConfig(level=logging.INFO) +coco.convert("asdf") +# WARNING:country_converter.country_converter:asdf not found in regex +# Out: 'not found' + +coco_logger = coco.logging.getLogger() +coco_logger.setLevel(logging.CRITICAL) +coco.convert("asdf") +# Out: 'not found' +``` + +See the IPython Notebook +([country_converter_examples.ipynb](http://nbviewer.ipython.org/github/IndEcol/country_converter/blob/master/doc/country_converter_examples.ipynb)) +for more information. + +#### Command line usage + +The country converter package also provides a command line interface +called coco. + +Minimal example: + + coco Cyprus DE Denmark Estonia 4 'United Kingdom' AUT + +This converts the given names to ISO3 codes based on matching the input to +ISO2, ISO3, ISOnumeric or regular expression matching. The list of names +must be separated by spaces; country names consisting of multiple words +must be put in quotes (''). + +The input classification can be specified with '--src' or '-s' (or will +be determined automatically), the target classification with '--to' or +'-t'. + +The default output is a space-separated list; this can be changed by +passing a separator with '--output_sep' or '-o' (e.g. -o '\|'). 
+ +Thus, to convert from ISO3 to UN number codes and receive the output as +comma separated list use: + + coco AUT DEU VAT AUS -s ISO3 -t UNcode -o ', ' + +The command line tool also allows to specify the output for none found +entries, including passing them through to the output by passing None: + + coco CAN Peru US Mexico Venezuela UK Arendelle --not_found=None + +and to specify an additional data file which will overwrite existing +country matching + + coco Congo --additional_data path/to/datafile.csv + +See +<https://github.com/IndEcol/country_converter/tree/master/tests/custom_data_example.txt> +for an example of an additional datafile. + +The flags --UNmember_only (-u) and --include_obsolete (-i) restrict the +search to UN member states only or extend it to also include currently +obsolete countries. For example, the [Netherlands +Antilles](https://en.wikipedia.org/wiki/Netherlands_Antilles) were +dissolved in 2010. + +Thus: + + coco "Netherlands Antilles" + +results in "not found". The search, however, can be extended to recently +dissolved countries by: + + coco "Netherlands Antilles" -i + +which results in 'ANT'. + +In addition to the countries, the coco command line tool also accepts +various country classifications (EXIO1, EXIO2, EXIO3, WIOD, Eora, +MESSAGE, OECD, EU27, EU28, UN, obsolete, Cecilia2050, BRIC, APEC, BASIC, +CIS, G7, G20). One of these can be passed by + + coco G20 + +which lists all countries in that classification. + +For the classifications covering almost all countries (MRIO and IAM +classifications) + + coco EXIO3 + +lists the unique classification names. When passing a --to parameter, a +simplified correspondence of the chosen classification is printed: + + coco EXIO3 --to ISO3 + +For further information call the help by + + coco -h + +#### Use in Matlab + +Newer (tested in 2016a) versions of Matlab allow to directly call Python +functions and libraries. This requires a Python version \>= 3.4 +installed in the system path (e.g. 
through Anaconda). + +To test, try this in Matlab: + +``` matlab +py.print(py.sys.version) +``` + +If this works, you can also use coco after installing it through pip (at +the Windows command line - see the installation instructions above): + +``` matlab +pip install country_converter --upgrade +``` + +And in Matlab: + +``` matlab +coco = py.country_converter.CountryConverter() +countries = {'The Swedish Kingdom', 'Norway is a Kingdom too', 'Peoples Republic of China', 'Republic of China'}; +ISO2_pythontype = coco.convert(countries, pyargs('to', 'ISO2')); +ISO2_cellarray = cellfun(@char,cell(ISO2_pythontype),'UniformOutput',false); +``` + +Alternatively, as a long one-liner: + +``` matlab +short_names = cellfun(@char, cell(py.country_converter.convert({56, 276}, pyargs('src', 'UNcode', 'to', 'name_short'))), 'UniformOutput',false); +``` + +All properties of coco as explained above are also available in Matlab: + +``` matlab +coco = py.country_converter.CountryConverter(); +coco.EU27 +EU27ISO3 = coco.EU27as('ISO3'); +``` + +These functions return a Pandas DataFrame. The underlying values can be +accessed with .values, e.g.: + +``` matlab +EU27ISO3.values +``` + +I leave it to professional Matlab users to figure out how to further +process them. + +See also the IPython Notebook +([country_converter_examples.ipynb](http://nbviewer.ipython.org/github/IndEcol/country_converter/blob/master/doc/country_converter_examples.ipynb)) +for more information - all functions available in Python (for example +passing additional data files, specifying the output in case of missing +data) also work in Matlab by passing arguments through the pyargs +function. + +### Building concordances for country aggregation + +Coco provides a function for building concordance vectors, matrices and +dictionaries between different classifications. This can be used in +Python as well as in Matlab. 
For further information see +([country_converter_aggregation_helper.ipynb](http://nbviewer.ipython.org/github/IndEcol/country_converter/blob/master/doc/country_converter_aggregation_helper.ipynb)) + +## Classification schemes + +Currently the following classification schemes are available (see also +Data sources below for further information): + +1. ISO2 (ISO 3166-1 alpha-2) +2. ISO3 (ISO 3166-1 alpha-3) +3. ISO - numeric (ISO 3166-1 numeric) +4. UN numeric code (M.49 - follows to a large extend ISO-numeric) +5. A standard or short name +6. The "official" name +7. Continent +8. UN region +9. [EXIOBASE](http://exiobase.eu/) 1 classification +10. [EXIOBASE](http://exiobase.eu/) 2 classification +11. [EXIOBASE](http://exiobase.eu/) 3 classification +12. [WIOD](http://www.wiod.org/home) classification +13. [Eora](http://www.worldmrio.com/) +14. [OECD](http://www.oecd.org/about/membersandpartners/list-oecd-member-countries.htm) + membership (per year) +15. [MESSAGE](http://www.iiasa.ac.at/web/home/research/researchPrograms/Energy/MESSAGE-model-regions.en.html) + 11-region classification +16. [IMAGE](https://models.pbl.nl/image/index.php/Welcome_to_IMAGE_3.0_Documentation) +17. [REMIND](https://www.pik-potsdam.de/en/institute/departments/transformation-pathways/models/remind) +18. [UN](http://www.un.org/en/members/) membership (per year) +19. [EU](https://ec.europa.eu/eurostat/statistics-explained/index.php/Glossary:EU_enlargements) + membership (including EU12, EU15, EU25, EU27, EU27_2007, EU28) +20. [EEA](https://ec.europa.eu/eurostat/statistics-explained/index.php/Glossary:European_Economic_Area_(EEA)) + membership +21. [Schengen](https://en.wikipedia.org/wiki/Schengen_Area) region +22. [Cecilia](https://cecilia2050.eu/system/files/De%20Koning%20et%20al.%20%282014%29_Scenarios%20for%202050_0.pdf) + 2050 classification +23. [APEC](https://en.wikipedia.org/wiki/Asia-Pacific_Economic_Cooperation) +24. [BRIC](https://en.wikipedia.org/wiki/BRIC) +25. 
[BASIC](https://en.wikipedia.org/wiki/BASIC_countries) +26. [CIS](https://en.wikipedia.org/wiki/Commonwealth_of_Independent_States) + (as of 2019, excl. Turkmenistan) +27. [G7](https://en.wikipedia.org/wiki/Group_of_Seven) +28. [G20](https://en.wikipedia.org/wiki/G20) (listing all EU member + states as individual members) +29. [FAOcode](http://www.fao.org/faostat/en/#definitions) (numeric) +30. [GBDcode](http://ghdx.healthdata.org/) (numeric - Global Burden of + Disease country codes) +31. [IEA](https://www.iea.org/countries) (World Energy Balances 2021) +32. [DACcode](https://www.oecd.org/dac/financing-sustainable-development/development-finance-standards/dacandcrscodelists.htm) + (numeric - OECD Development Assistance Committee) +33. [ccTLD](https://en.wikipedia.org/wiki/Country_code_top-level_domain) - country code top-level domains +34. [GWcode](https://www.tandfonline.com/doi/abs/10.1080/03050629908434958) - Gleditsch & Ward numerical codes as published in https://www.andybeger.com/states/articles/statelists.html + + +Coco contains officially recognised codes as well as non-standard codes +for disputed or dissolved countries. To restrict the set to only the +officially recognised UN members or to include obsolete countries, pass +the corresponding argument when creating the converter: + +``` python +import country_converter as coco +cc = coco.CountryConverter() +cc_UN = coco.CountryConverter(only_UNmember=True) +cc_all = coco.CountryConverter(include_obsolete=True) + +cc.convert(['PSE', 'XKX', 'EAZ', 'FRA'], to='name_short') +cc_UN.convert(['PSE', 'XKX', 'EAZ', 'FRA'], to='name_short') +cc_all.convert(['PSE', 'XKX', 'EAZ', 'FRA'], to='name_short') +``` + +cc results in \['Palestine', 'Kosovo', 'not found', 'France'\], whereas +cc_UN converts to \['not found', 'not found', 'not found', 'France'\] +and cc_all converts to \['Palestine', 'Kosovo', 'Zanzibar', 'France'\]. +Note that the underlying dataframe is available through the attribute .data +(e.g. cc_all.data). 
+ +## Data sources and further reading + +Most of the underlying data can be found in Wikipedia, the page +describing [ISO 3166-1](https://en.wikipedia.org/wiki/ISO_3166-1) is a +good starting point. UN regions/codes are given on the United Nation +Statistical Division +([unstats](http://unstats.un.org/unsd/methods/m49/m49regin.htm)) +webpage. The differences between the ISO numeric and UN (M.49) codes are +[also explained at wikipedia](https://en.wikipedia.org/wiki/UN_M.49). +[EXIOBASE](http://exiobase.eu/), [WIOD](http://www.wiod.org/home) and +[Eora](http://www.worldmrio.com/) classification were extracted from the +respective databases. For [Eora](http://www.worldmrio.com/), the names +are based on the 'Country names' csv file provided on the webpage, but +updated for different names used in the Eora26 database. The MESSAGE +classification follows the 11-region aggregation given in the +[MESSAGE](http://www.iiasa.ac.at/web/home/research/researchPrograms/Energy/MESSAGE-model-regions.en.html) +model regions description. The +[IMAGE](https://models.pbl.nl/image/index.php/Welcome_to_IMAGE_3.0_Documentation) +classification is based on the "[region classification +map](https://models.pbl.nl/image/index.php/Region_classification_map)", +for +[REMIND](https://www.pik-potsdam.de/en/institute/departments/transformation-pathways/models/remind) +we received a country mapping from the model developers. + +The membership of +[OECD](http://www.oecd.org/about/membersandpartners/list-oecd-member-countries.htm) +and [UN](http://www.un.org/en/members/) can be found at the membership +organisations' webpages, information about obsolete country codes on the +[Statoids](http://www.statoids.com/w3166his.html) webpage. + +The situation for the +[EU](https://ec.europa.eu/eurostat/statistics-explained/index.php/Glossary:EU_enlargements) +got complicated due to the Brexit process. 
For the naming, coco follows +the [Eurostat +glossary](https://ec.europa.eu/eurostat/statistics-explained/index.php/Glossary:EU_enlargements), +thus EU27 refers to the EU without the UK, whereas EU27_2007 refers to the +EU without Croatia (the status after the 2007 enlargement). The shortcut +EU always links to the most recent classification. The +[EEA](https://ec.europa.eu/eurostat/statistics-explained/index.php/Glossary:European_Economic_Area_(EEA)) +agreements for the UK ended by 2021-01-01 (which also affects Guernsey, Isle of Man, Jersey and Gibraltar). +Switzerland is not part of the EEA but is a member of the single market. + +The Global Burden of Disease country codes were extracted from the [GBD +code book available +here.](https://ghdx.healthdata.org/sites/default/files/ihme_query_tool/IHME_GBD_2019_CODEBOOK.zip) + +## Communication, issues, bugs and enhancements + +Please use the issue tracker for documenting bugs, proposing +enhancements and all other communication related to coco. + +You can follow me on [mastodon - @kst@qoto.org](https://qoto.org/@kst) and [twitter](https://twitter.com/kst_stadler) to get +the latest news about all my open-source and research projects (and +occasionally some random retweets/toots). + +## Contributing + +Want to contribute? Great! Please check +[CONTRIBUTING.md](CONTRIBUTING.md) if you want to help to improve coco +and for some pointers on how to add classifications. + +## Related software + +The package [pycountry](https://pypi.python.org/pypi/pycountry) provides +access to the official ISO databases for historic countries, country +subdivisions, languages and currencies. In case you need to convert +non-English country names, +[countrynames](https://github.com/occrp/countrynames) includes an +extensive database of country names in different languages and functions +to convert them to the different ISO 3166 standards. 
+[Python-iso3166](https://github.com/deactivated/python-iso3166) focuses +on conversion between the two-letter, three-letter and three-digit codes +defined in the ISO 3166 standard. + +If you are using R, you should have a look at +[countrycode](https://github.com/vincentarelbundock/countrycode). + +## Citing the country converter + +Version 0.5 of the country converter was published in the [Journal of +Open Source Software](http://joss.theoj.org/). To cite the country +converter in publications, please use: + +Stadler, K. (2017). The country converter coco - a Python package for +converting country names between different classification schemes. The +Journal of Open Source Software. doi: +[10.21105/joss.00332](http://dx.doi.org/10.21105/joss.00332) + +For the full bibtex key see [CITATION](CITATION) + +## Acknowledgements + +This package was inspired by (and the regular expressions are mostly +based on) the R-package +[countrycode](https://github.com/vincentarelbundock/countrycode) by +[Vincent Arel-Bundock](http://arelbundock.com/) and his (defunct) port +to Python (pycountrycode). Many thanks to [Robert +Gieseke](https://github.com/rgieseke) for the review of the source code +and paper for the publication in the [Journal of Open Source +Software](http://joss.theoj.org/). 
+ + +%prep +%autosetup -n country-converter-1.0.0 + +%build +%py3_build + +%install +%py3_install +install -d -m755 %{buildroot}/%{_pkgdocdir} +if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi +if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi +if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi +if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi +pushd %{buildroot} +if [ -d usr/lib ]; then + find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst +fi +if [ -d usr/lib64 ]; then + find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst +fi +if [ -d usr/bin ]; then + find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst +fi +if [ -d usr/sbin ]; then + find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst +fi +touch doclist.lst +if [ -d usr/share/man ]; then + find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst +fi +popd +mv %{buildroot}/filelist.lst . +mv %{buildroot}/doclist.lst . + +%files -n python3-country-converter -f filelist.lst +%dir %{python3_sitelib}/* + +%files help -f doclist.lst +%{_docdir}/* + +%changelog +* Mon Apr 10 2023 Python_Bot <Python_Bot@openeuler.org> - 1.0.0-1 +- Package Spec generated |
