author | CoprDistGit <infra@openeuler.org> | 2023-05-31 04:39:17 +0000
---|---|---
committer | CoprDistGit <infra@openeuler.org> | 2023-05-31 04:39:17 +0000
commit | 80c05996df2314a4e330904369fccfae432ade00 (patch) |
tree | 2f5f70d2f8c4d8b0003b529a73befbf7ffcd20ec |
parent | 7277f0caeddabde4b14c8c277c68643678a778c1 (diff) |
automatic import of python-datasette-block-robots
-rw-r--r-- | .gitignore | 1
-rw-r--r-- | python-datasette-block-robots.spec | 490
-rw-r--r-- | sources | 1
3 files changed, 492 insertions, 0 deletions
@@ -0,0 +1 @@
+/datasette-block-robots-1.1.tar.gz
diff --git a/python-datasette-block-robots.spec b/python-datasette-block-robots.spec
new file mode 100644
index 0000000..9396e26
--- /dev/null
+++ b/python-datasette-block-robots.spec
@@ -0,0 +1,490 @@
+%global _empty_manifest_terminate_build 0
+Name: python-datasette-block-robots
+Version: 1.1
+Release: 1
+Summary: Datasette plugin that blocks all robots using robots.txt
+License: Apache License, Version 2.0
+URL: https://github.com/simonw/datasette-block-robots
+Source0: https://mirrors.nju.edu.cn/pypi/web/packages/00/9b/983c94277d304381ee875500e8146e3d3c1456e01cd584c81e3bbcccf1c7/datasette-block-robots-1.1.tar.gz
+BuildArch: noarch
+
+Requires: python3-datasette
+Requires: python3-pytest
+Requires: python3-pytest-asyncio
+Requires: python3-httpx
+
+%description
+# datasette-block-robots
+
+[](https://pypi.org/project/datasette-block-robots/)
+[](https://github.com/simonw/datasette-block-robots/releases)
+[](https://github.com/simonw/datasette-block-robots/blob/master/LICENSE)
+
+Datasette plugin that blocks robots and crawlers using robots.txt
+
+## Installation
+
+Install this plugin in the same environment as Datasette.
+
+    $ pip install datasette-block-robots
+
+## Usage
+
+Having installed the plugin, `/robots.txt` on your Datasette instance will return the following:
+
+    User-agent: *
+    Disallow: /
+
+This will request all robots and crawlers not to visit any of the pages on your site.
+
+Here's a demo of the plugin in action: https://sqlite-generate-demo.datasette.io/robots.txt
+
+## Configuration
+
+By default the plugin will block all access to the site, using `Disallow: /`.
+
+If you want the index page to be indexed by search engines without crawling the database, table or row pages themselves, you can use the following:
+
+```json
+{
+    "plugins": {
+        "datasette-block-robots": {
+            "allow_only_index": true
+        }
+    }
+}
+```
+This will return a `/robots.txt` like so:
+
+    User-agent: *
+    Disallow: /db1
+    Disallow: /db2
+
+With a `Disallow` line for every attached database.
+
+To block access to specific areas of the site using custom paths, add this to your `metadata.json` configuration file:
+
+```json
+{
+    "plugins": {
+        "datasette-block-robots": {
+            "disallow": ["/mydatabase/mytable"]
+        }
+    }
+}
+```
+This will result in a `/robots.txt` that looks like this:
+
+    User-agent: *
+    Disallow: /mydatabase/mytable
+
+Alternatively you can set the full contents of the `robots.txt` file using the `literal` configuration option. Here's how to do that if you are using YAML rather than JSON and have a `metadata.yml` file:
+
+```yaml
+plugins:
+    datasette-block-robots:
+        literal: |-
+            User-agent: *
+            Disallow: /
+            User-agent: Bingbot
+            User-agent: Googlebot
+            Disallow:
+```
+This example would block all crawlers with the exception of Googlebot and Bingbot, which are allowed to crawl the entire site.
+
+## Extending this with other plugins
+
+This plugin adds a new [plugin hook](https://docs.datasette.io/en/stable/plugin_hooks.html) to Datasette called `block_robots_extra_lines()`, which can be used by other plugins to add their own additional lines to the `robots.txt` file.
+
+The hook can optionally accept these parameters:
+
+- `datasette`: The current [Datasette instance](https://docs.datasette.io/en/stable/internals.html#datasette-class). You can use this to execute SQL queries or read plugin configuration settings.
+- `request`: The [Request object](https://docs.datasette.io/en/stable/internals.html#request-object) representing the incoming request to `/robots.txt`.
+
+The hook should return a list of strings, each representing a line to be added to the `robots.txt` file.
+
+It can also return an `async def` function, which will be awaited and used to generate a list of lines. Use this option if you need to make `await` calls inside your hook implementation.
+
+This example uses the hook to add a `Sitemap: http://example.com/sitemap.xml` line to the `robots.txt` file:
+
+```python
+from datasette import hookimpl
+
+@hookimpl
+def block_robots_extra_lines(datasette, request):
+    return [
+        "Sitemap: {}".format(datasette.absolute_url(request, "/sitemap.xml")),
+    ]
+```
+This example blocks access to paths based on a database query:
+
+```python
+@hookimpl
+def block_robots_extra_lines(datasette):
+    async def inner():
+        db = datasette.get_database()
+        result = await db.execute("select path from mytable")
+        return [
+            "Disallow: /{}".format(row["path"]) for row in result
+        ]
+    return inner
+```
+[datasette-sitemap](https://datasette.io/plugins/datasette-sitemap) is an example of a plugin that uses this hook.
+
+## Development
+
+To set up this plugin locally, first check out the code. Then create a new virtual environment:
+
+    cd datasette-block-robots
+    python3 -mvenv venv
+    source venv/bin/activate
+
+Or if you are using `pipenv`:
+
+    pipenv shell
+
+Now install the dependencies and test dependencies:
+
+    pip install -e '.[test]'
+
+To run the tests:
+
+    pytest
+
+
+
+%package -n python3-datasette-block-robots
+Summary: Datasette plugin that blocks all robots using robots.txt
+Provides: python-datasette-block-robots
+BuildRequires: python3-devel
+BuildRequires: python3-setuptools
+BuildRequires: python3-pip
+%description -n python3-datasette-block-robots
+# datasette-block-robots
+
+[](https://pypi.org/project/datasette-block-robots/)
+[](https://github.com/simonw/datasette-block-robots/releases)
+[](https://github.com/simonw/datasette-block-robots/blob/master/LICENSE)
+
+Datasette plugin that blocks robots and crawlers using robots.txt
+
+## Installation
+
+Install this plugin in the same environment as Datasette.
+
+    $ pip install datasette-block-robots
+
+## Usage
+
+Having installed the plugin, `/robots.txt` on your Datasette instance will return the following:
+
+    User-agent: *
+    Disallow: /
+
+This will request all robots and crawlers not to visit any of the pages on your site.
+
+Here's a demo of the plugin in action: https://sqlite-generate-demo.datasette.io/robots.txt
+
+## Configuration
+
+By default the plugin will block all access to the site, using `Disallow: /`.
+
+If you want the index page to be indexed by search engines without crawling the database, table or row pages themselves, you can use the following:
+
+```json
+{
+    "plugins": {
+        "datasette-block-robots": {
+            "allow_only_index": true
+        }
+    }
+}
+```
+This will return a `/robots.txt` like so:
+
+    User-agent: *
+    Disallow: /db1
+    Disallow: /db2
+
+With a `Disallow` line for every attached database.
+
+To block access to specific areas of the site using custom paths, add this to your `metadata.json` configuration file:
+
+```json
+{
+    "plugins": {
+        "datasette-block-robots": {
+            "disallow": ["/mydatabase/mytable"]
+        }
+    }
+}
+```
+This will result in a `/robots.txt` that looks like this:
+
+    User-agent: *
+    Disallow: /mydatabase/mytable
+
+Alternatively you can set the full contents of the `robots.txt` file using the `literal` configuration option. Here's how to do that if you are using YAML rather than JSON and have a `metadata.yml` file:
+
+```yaml
+plugins:
+    datasette-block-robots:
+        literal: |-
+            User-agent: *
+            Disallow: /
+            User-agent: Bingbot
+            User-agent: Googlebot
+            Disallow:
+```
+This example would block all crawlers with the exception of Googlebot and Bingbot, which are allowed to crawl the entire site.
+
+## Extending this with other plugins
+
+This plugin adds a new [plugin hook](https://docs.datasette.io/en/stable/plugin_hooks.html) to Datasette called `block_robots_extra_lines()`, which can be used by other plugins to add their own additional lines to the `robots.txt` file.
+
+The hook can optionally accept these parameters:
+
+- `datasette`: The current [Datasette instance](https://docs.datasette.io/en/stable/internals.html#datasette-class). You can use this to execute SQL queries or read plugin configuration settings.
+- `request`: The [Request object](https://docs.datasette.io/en/stable/internals.html#request-object) representing the incoming request to `/robots.txt`.
+
+The hook should return a list of strings, each representing a line to be added to the `robots.txt` file.
+
+It can also return an `async def` function, which will be awaited and used to generate a list of lines. Use this option if you need to make `await` calls inside your hook implementation.
+
+This example uses the hook to add a `Sitemap: http://example.com/sitemap.xml` line to the `robots.txt` file:
+
+```python
+from datasette import hookimpl
+
+@hookimpl
+def block_robots_extra_lines(datasette, request):
+    return [
+        "Sitemap: {}".format(datasette.absolute_url(request, "/sitemap.xml")),
+    ]
+```
+This example blocks access to paths based on a database query:
+
+```python
+@hookimpl
+def block_robots_extra_lines(datasette):
+    async def inner():
+        db = datasette.get_database()
+        result = await db.execute("select path from mytable")
+        return [
+            "Disallow: /{}".format(row["path"]) for row in result
+        ]
+    return inner
+```
+[datasette-sitemap](https://datasette.io/plugins/datasette-sitemap) is an example of a plugin that uses this hook.
+
+## Development
+
+To set up this plugin locally, first check out the code. Then create a new virtual environment:
+
+    cd datasette-block-robots
+    python3 -mvenv venv
+    source venv/bin/activate
+
+Or if you are using `pipenv`:
+
+    pipenv shell
+
+Now install the dependencies and test dependencies:
+
+    pip install -e '.[test]'
+
+To run the tests:
+
+    pytest
+
+
+
+%package help
+Summary: Development documents and examples for datasette-block-robots
+Provides: python3-datasette-block-robots-doc
+%description help
+# datasette-block-robots
+
+[](https://pypi.org/project/datasette-block-robots/)
+[](https://github.com/simonw/datasette-block-robots/releases)
+[](https://github.com/simonw/datasette-block-robots/blob/master/LICENSE)
+
+Datasette plugin that blocks robots and crawlers using robots.txt
+
+## Installation
+
+Install this plugin in the same environment as Datasette.
+
+    $ pip install datasette-block-robots
+
+## Usage
+
+Having installed the plugin, `/robots.txt` on your Datasette instance will return the following:
+
+    User-agent: *
+    Disallow: /
+
+This will request all robots and crawlers not to visit any of the pages on your site.
+
+Here's a demo of the plugin in action: https://sqlite-generate-demo.datasette.io/robots.txt
+
+## Configuration
+
+By default the plugin will block all access to the site, using `Disallow: /`.
+
+If you want the index page to be indexed by search engines without crawling the database, table or row pages themselves, you can use the following:
+
+```json
+{
+    "plugins": {
+        "datasette-block-robots": {
+            "allow_only_index": true
+        }
+    }
+}
+```
+This will return a `/robots.txt` like so:
+
+    User-agent: *
+    Disallow: /db1
+    Disallow: /db2
+
+With a `Disallow` line for every attached database.
+
+To block access to specific areas of the site using custom paths, add this to your `metadata.json` configuration file:
+
+```json
+{
+    "plugins": {
+        "datasette-block-robots": {
+            "disallow": ["/mydatabase/mytable"]
+        }
+    }
+}
+```
+This will result in a `/robots.txt` that looks like this:
+
+    User-agent: *
+    Disallow: /mydatabase/mytable
+
+Alternatively you can set the full contents of the `robots.txt` file using the `literal` configuration option. Here's how to do that if you are using YAML rather than JSON and have a `metadata.yml` file:
+
+```yaml
+plugins:
+    datasette-block-robots:
+        literal: |-
+            User-agent: *
+            Disallow: /
+            User-agent: Bingbot
+            User-agent: Googlebot
+            Disallow:
+```
+This example would block all crawlers with the exception of Googlebot and Bingbot, which are allowed to crawl the entire site.
+
+## Extending this with other plugins
+
+This plugin adds a new [plugin hook](https://docs.datasette.io/en/stable/plugin_hooks.html) to Datasette called `block_robots_extra_lines()`, which can be used by other plugins to add their own additional lines to the `robots.txt` file.
+
+The hook can optionally accept these parameters:
+
+- `datasette`: The current [Datasette instance](https://docs.datasette.io/en/stable/internals.html#datasette-class). You can use this to execute SQL queries or read plugin configuration settings.
+- `request`: The [Request object](https://docs.datasette.io/en/stable/internals.html#request-object) representing the incoming request to `/robots.txt`.
+
+The hook should return a list of strings, each representing a line to be added to the `robots.txt` file.
+
+It can also return an `async def` function, which will be awaited and used to generate a list of lines. Use this option if you need to make `await` calls inside your hook implementation.
+
+This example uses the hook to add a `Sitemap: http://example.com/sitemap.xml` line to the `robots.txt` file:
+
+```python
+from datasette import hookimpl
+
+@hookimpl
+def block_robots_extra_lines(datasette, request):
+    return [
+        "Sitemap: {}".format(datasette.absolute_url(request, "/sitemap.xml")),
+    ]
+```
+This example blocks access to paths based on a database query:
+
+```python
+@hookimpl
+def block_robots_extra_lines(datasette):
+    async def inner():
+        db = datasette.get_database()
+        result = await db.execute("select path from mytable")
+        return [
+            "Disallow: /{}".format(row["path"]) for row in result
+        ]
+    return inner
+```
+[datasette-sitemap](https://datasette.io/plugins/datasette-sitemap) is an example of a plugin that uses this hook.
+
+## Development
+
+To set up this plugin locally, first check out the code. Then create a new virtual environment:
+
+    cd datasette-block-robots
+    python3 -mvenv venv
+    source venv/bin/activate
+
+Or if you are using `pipenv`:
+
+    pipenv shell
+
+Now install the dependencies and test dependencies:
+
+    pip install -e '.[test]'
+
+To run the tests:
+
+    pytest
+
+
+
+%prep
+%autosetup -n datasette-block-robots-1.1
+
+%build
+%py3_build
+
+%install
+%py3_install
+install -d -m755 %{buildroot}/%{_pkgdocdir}
+if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
+if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
+if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
+if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
+pushd %{buildroot}
+if [ -d usr/lib ]; then
+    find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/lib64 ]; then
+    find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/bin ]; then
+    find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/sbin ]; then
+    find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+touch doclist.lst
+if [ -d usr/share/man ]; then
+    find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
+fi
+popd
+mv %{buildroot}/filelist.lst .
+mv %{buildroot}/doclist.lst .
+
+%files -n python3-datasette-block-robots -f filelist.lst
+%dir %{python3_sitelib}/*
+
+%files help -f doclist.lst
+%{_docdir}/*
+
+%changelog
+* Wed May 31 2023 Python_Bot <Python_Bot@openeuler.org> - 1.1-1
+- Package Spec generated
@@ -0,0 +1 @@
+02d9fef22e47b885b0ec2082ca450a2d datasette-block-robots-1.1.tar.gz