-rw-r--r-- | .gitignore | 1
-rw-r--r-- | python-rtfde.spec | 241
-rw-r--r-- | sources | 1
3 files changed, 243 insertions, 0 deletions
@@ -0,0 +1 @@
+/RTFDE-0.0.2.tar.gz
diff --git a/python-rtfde.spec b/python-rtfde.spec
new file mode 100644
index 0000000..c2e4eeb
--- /dev/null
+++ b/python-rtfde.spec
@@ -0,0 +1,241 @@
+%global _empty_manifest_terminate_build 0
+Name: python-RTFDE
+Version: 0.0.2
+Release: 1
+Summary: A library for extracting HTML content from RTF encapsulated HTML as commonly found in the exchange MSG email format.
+License: GNU Lesser General Public License v3 (LGPLv3)
+URL: https://github.com/seamustuohy/RTFDE
+Source0: https://mirrors.nju.edu.cn/pypi/web/packages/81/ea/28f5ab6b46a072887c8c8fd8c8a1f7b54025fc4bb2e09024668ea6686044/RTFDE-0.0.2.tar.gz
+BuildArch: noarch
+
+Requires: python3-lark-parser
+Requires: python3-oletools
+Requires: python3-lxml
+Requires: python3-extract-msg
+
+%description
+# RTFDE: RTF De-Encapsulator
+
+A python3 library for extracting encapsulated `HTML` & `plain text` content from the `RTF` bodies of .msg files.
+
+De-encapsulation enables previously encapsulated HTML and plain text content to be extracted and rendered as HTML and plain text instead of the encapsulating RTF content. After de-encapsulation, the HTML and plain text should differ only minimally from the original HTML or plain text content.
+
+# Features
+
+- De-encapsulate HTML from RTF encapsulated HTML.
+- De-encapsulate plain text from RTF encapsulated text.
+
+# Known Issues
+
+- This library *fully* unquotes text it de-encapsulates because it does not know which text was quoted in the RTF conversion process and which text was quoted in the original html/text. So, for instance escaped [Quoted-Printable](https://en.wikipedia.org/wiki/Quoted-printable) text will be returned un-escaped.
+- This library currently can't [combine attachments](https://docs.microsoft.com/en-us/openspecs/exchange_server_protocols/ms-oxrtfex/b518f0bc-468c-4218-87a7-8f8859bf5773) from a .MSG Message object with the de-encapsulated HTML. This is mostly because I could not get a good set of examples of encapsulated HTML which had attachment objects that needed to be integrated back into the body of the HTML.
+
+# Anti-Features (I don't intend to have this library do this.)
+
+- Extract plain text from RTF encapsulated HTML. If you want this, then you will have to parse the HTML using another library.
+
+# Installation
+
+**To install from the pip package.**
+
+```
+pip3 install RTFDE
+
+```
+
+# Usage
+
+## De-encapsulating HTML or TEXT
+
+```python
+from RTFDE.deencapsulate import DeEncapsulator
+
+with open('rtf_file', 'r') as fp:
+    raw_rtf = fp.read()
+    rtf_obj = DeEncapsulator(raw_rtf)
+    rtf_obj.deencapsulate()
+    if rtf_obj.content_type == 'html':
+        print(rtf_obj.html)
+    else:
+        print(rtf_obj.text)
+```
+
+# Contribute
+
+Please check the [contributing guidelines](./CONTRIBUTING.md)
+
+# License
+
+Please see the [license file](./LICENSE) for license information on RTFDE. If you have further questions related to licensing PLEASE create an issue about it on github.
+
+
+
+
+%package -n python3-RTFDE
+Summary: A library for extracting HTML content from RTF encapsulated HTML as commonly found in the exchange MSG email format.
+Provides: python-RTFDE
+BuildRequires: python3-devel
+BuildRequires: python3-setuptools
+BuildRequires: python3-pip
+%description -n python3-RTFDE
+# RTFDE: RTF De-Encapsulator
+
+A python3 library for extracting encapsulated `HTML` & `plain text` content from the `RTF` bodies of .msg files.
+
+De-encapsulation enables previously encapsulated HTML and plain text content to be extracted and rendered as HTML and plain text instead of the encapsulating RTF content. After de-encapsulation, the HTML and plain text should differ only minimally from the original HTML or plain text content.
+
+# Features
+
+- De-encapsulate HTML from RTF encapsulated HTML.
+- De-encapsulate plain text from RTF encapsulated text.
+
+# Known Issues
+
+- This library *fully* unquotes text it de-encapsulates because it does not know which text was quoted in the RTF conversion process and which text was quoted in the original html/text. So, for instance escaped [Quoted-Printable](https://en.wikipedia.org/wiki/Quoted-printable) text will be returned un-escaped.
+- This library currently can't [combine attachments](https://docs.microsoft.com/en-us/openspecs/exchange_server_protocols/ms-oxrtfex/b518f0bc-468c-4218-87a7-8f8859bf5773) from a .MSG Message object with the de-encapsulated HTML. This is mostly because I could not get a good set of examples of encapsulated HTML which had attachment objects that needed to be integrated back into the body of the HTML.
+
+# Anti-Features (I don't intend to have this library do this.)
+
+- Extract plain text from RTF encapsulated HTML. If you want this, then you will have to parse the HTML using another library.
+
+# Installation
+
+**To install from the pip package.**
+
+```
+pip3 install RTFDE
+
+```
+
+# Usage
+
+## De-encapsulating HTML or TEXT
+
+```python
+from RTFDE.deencapsulate import DeEncapsulator
+
+with open('rtf_file', 'r') as fp:
+    raw_rtf = fp.read()
+    rtf_obj = DeEncapsulator(raw_rtf)
+    rtf_obj.deencapsulate()
+    if rtf_obj.content_type == 'html':
+        print(rtf_obj.html)
+    else:
+        print(rtf_obj.text)
+```
+
+# Contribute
+
+Please check the [contributing guidelines](./CONTRIBUTING.md)
+
+# License
+
+Please see the [license file](./LICENSE) for license information on RTFDE. If you have further questions related to licensing PLEASE create an issue about it on github.
+
+
+
+
+%package help
+Summary: Development documents and examples for RTFDE
+Provides: python3-RTFDE-doc
+%description help
+# RTFDE: RTF De-Encapsulator
+
+A python3 library for extracting encapsulated `HTML` & `plain text` content from the `RTF` bodies of .msg files.
+
+De-encapsulation enables previously encapsulated HTML and plain text content to be extracted and rendered as HTML and plain text instead of the encapsulating RTF content. After de-encapsulation, the HTML and plain text should differ only minimally from the original HTML or plain text content.
+
+# Features
+
+- De-encapsulate HTML from RTF encapsulated HTML.
+- De-encapsulate plain text from RTF encapsulated text.
+
+# Known Issues
+
+- This library *fully* unquotes text it de-encapsulates because it does not know which text was quoted in the RTF conversion process and which text was quoted in the original html/text. So, for instance escaped [Quoted-Printable](https://en.wikipedia.org/wiki/Quoted-printable) text will be returned un-escaped.
+- This library currently can't [combine attachments](https://docs.microsoft.com/en-us/openspecs/exchange_server_protocols/ms-oxrtfex/b518f0bc-468c-4218-87a7-8f8859bf5773) from a .MSG Message object with the de-encapsulated HTML. This is mostly because I could not get a good set of examples of encapsulated HTML which had attachment objects that needed to be integrated back into the body of the HTML.
+
+# Anti-Features (I don't intend to have this library do this.)
+
+- Extract plain text from RTF encapsulated HTML. If you want this, then you will have to parse the HTML using another library.
+
+# Installation
+
+**To install from the pip package.**
+
+```
+pip3 install RTFDE
+
+```
+
+# Usage
+
+## De-encapsulating HTML or TEXT
+
+```python
+from RTFDE.deencapsulate import DeEncapsulator
+
+with open('rtf_file', 'r') as fp:
+    raw_rtf = fp.read()
+    rtf_obj = DeEncapsulator(raw_rtf)
+    rtf_obj.deencapsulate()
+    if rtf_obj.content_type == 'html':
+        print(rtf_obj.html)
+    else:
+        print(rtf_obj.text)
+```
+
+# Contribute
+
+Please check the [contributing guidelines](./CONTRIBUTING.md)
+
+# License
+
+Please see the [license file](./LICENSE) for license information on RTFDE. If you have further questions related to licensing PLEASE create an issue about it on github.
+
+
+
+
+%prep
+%autosetup -n RTFDE-0.0.2
+
+%build
+%py3_build
+
+%install
+%py3_install
+install -d -m755 %{buildroot}/%{_pkgdocdir}
+if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
+if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
+if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
+if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
+pushd %{buildroot}
+if [ -d usr/lib ]; then
+	find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/lib64 ]; then
+	find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/bin ]; then
+	find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/sbin ]; then
+	find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+touch doclist.lst
+if [ -d usr/share/man ]; then
+	find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
+fi
+popd
+mv %{buildroot}/filelist.lst .
+mv %{buildroot}/doclist.lst .
+
+%files -n python3-RTFDE -f filelist.lst
+%dir %{python3_sitelib}/*
+
+%files help -f doclist.lst
+%{_docdir}/*
+
+%changelog
+* Fri May 05 2023 Python_Bot <Python_Bot@openeuler.org> - 0.0.2-1
+- Package Spec generated
@@ -0,0 +1 @@
+8ea48d10c9dd11b3d2eba95082126b9c RTFDE-0.0.2.tar.gz
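The final hunk adds a `sources` entry that pairs a 32-hex-digit digest with the tarball name. A reviewer who wants to check a downloaded tarball against that entry could use something like the sketch below. It assumes the digest is MD5 (the length is consistent with MD5, but the diff does not say so), and the helper names `parse_sources_line`, `md5_of`, and `verify` are hypothetical, not part of any packaging tool.

```python
import hashlib
from pathlib import Path


def parse_sources_line(line: str) -> tuple[str, str]:
    """Split a '<hex digest> <filename>' line into its two fields."""
    digest, filename = line.split(maxsplit=1)
    return digest.lower(), filename.strip()


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so a large tarball need not fit in memory."""
    h = hashlib.md5()  # assumption: the `sources` digest is MD5
    with path.open("rb") as fp:
        for chunk in iter(lambda: fp.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(line: str, directory: Path = Path(".")) -> bool:
    """Return True if the file named on the line matches its digest."""
    digest, filename = parse_sources_line(line)
    return md5_of(directory / filename) == digest
```

With the tarball in the current directory, `verify("8ea48d10c9dd11b3d2eba95082126b9c RTFDE-0.0.2.tar.gz")` would report whether the local file matches the committed entry.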