%global _empty_manifest_terminate_build 0
Name:           python-RTFDE
Version:        0.0.2
Release:        1
Summary:        A library for extracting HTML content from RTF encapsulated HTML as commonly found in the exchange MSG email format.
License:        GNU Lesser General Public License v3 (LGPLv3)
URL:            https://github.com/seamustuohy/RTFDE
Source0:        https://mirrors.nju.edu.cn/pypi/web/packages/81/ea/28f5ab6b46a072887c8c8fd8c8a1f7b54025fc4bb2e09024668ea6686044/RTFDE-0.0.2.tar.gz
BuildArch:      noarch

Requires:       python3-lark-parser
Requires:       python3-oletools
Requires:       python3-lxml
Requires:       python3-extract-msg

%description
# RTFDE: RTF De-Encapsulator

A python3 library for extracting encapsulated `HTML` & `plain text` content from the `RTF` bodies of .msg files.

De-encapsulation enables previously encapsulated HTML and plain text content to be extracted and rendered as HTML and plain text instead of the encapsulating RTF content. After de-encapsulation, the HTML and plain text should differ only minimally from the original HTML or plain text content.

# Features

- De-encapsulate HTML from RTF encapsulated HTML.
- De-encapsulate plain text from RTF encapsulated text.

# Known Issues

- This library *fully* unquotes text it de-encapsulates because it does not know which text was quoted in the RTF conversion process and which text was quoted in the original html/text. So, for instance escaped [Quoted-Printable](https://en.wikipedia.org/wiki/Quoted-printable) text will be returned un-escaped.
- This library currently can't [combine attachments](https://docs.microsoft.com/en-us/openspecs/exchange_server_protocols/ms-oxrtfex/b518f0bc-468c-4218-87a7-8f8859bf5773) from a .MSG Message object with the de-encapsulated HTML. This is mostly because I could not get a good set of examples of encapsulated HTML which had attachment objects that needed to be integrated back into the body of the HTML.

# Anti-Features

(I don't intend to have this library do this.)

- Extract plain text from RTF encapsulated HTML.
If you want this, then you will have to parse the HTML using another library.

# Installation

**To install from the pip package.**

```
pip3 install RTFDE
```

# Usage

## De-encapsulating HTML or TEXT

```python
from RTFDE.deencapsulate import DeEncapsulator

with open('rtf_file', 'r') as fp:
    raw_rtf = fp.read()

rtf_obj = DeEncapsulator(raw_rtf)
rtf_obj.deencapsulate()
if rtf_obj.content_type == 'html':
    print(rtf_obj.html)
else:
    print(rtf_obj.text)
```

# Contribute

Please check the [contributing guidelines](./CONTRIBUTING.md)

# License

Please see the [license file](./LICENSE) for license information on RTFDE. If you have further questions related to licensing PLEASE create an issue about it on github.

%package -n python3-RTFDE
Summary:        A library for extracting HTML content from RTF encapsulated HTML as commonly found in the exchange MSG email format.
Provides:       python-RTFDE
BuildRequires:  python3-devel
BuildRequires:  python3-setuptools
BuildRequires:  python3-pip

%description -n python3-RTFDE
# RTFDE: RTF De-Encapsulator

A python3 library for extracting encapsulated `HTML` & `plain text` content from the `RTF` bodies of .msg files.

De-encapsulation enables previously encapsulated HTML and plain text content to be extracted and rendered as HTML and plain text instead of the encapsulating RTF content. After de-encapsulation, the HTML and plain text should differ only minimally from the original HTML or plain text content.

# Features

- De-encapsulate HTML from RTF encapsulated HTML.
- De-encapsulate plain text from RTF encapsulated text.

# Known Issues

- This library *fully* unquotes text it de-encapsulates because it does not know which text was quoted in the RTF conversion process and which text was quoted in the original html/text. So, for instance escaped [Quoted-Printable](https://en.wikipedia.org/wiki/Quoted-printable) text will be returned un-escaped.
- This library currently can't [combine attachments](https://docs.microsoft.com/en-us/openspecs/exchange_server_protocols/ms-oxrtfex/b518f0bc-468c-4218-87a7-8f8859bf5773) from a .MSG Message object with the de-encapsulated HTML. This is mostly because I could not get a good set of examples of encapsulated HTML which had attachment objects that needed to be integrated back into the body of the HTML.

# Anti-Features

(I don't intend to have this library do this.)

- Extract plain text from RTF encapsulated HTML. If you want this, then you will have to parse the HTML using another library.

# Installation

**To install from the pip package.**

```
pip3 install RTFDE
```

# Usage

## De-encapsulating HTML or TEXT

```python
from RTFDE.deencapsulate import DeEncapsulator

with open('rtf_file', 'r') as fp:
    raw_rtf = fp.read()

rtf_obj = DeEncapsulator(raw_rtf)
rtf_obj.deencapsulate()
if rtf_obj.content_type == 'html':
    print(rtf_obj.html)
else:
    print(rtf_obj.text)
```

# Contribute

Please check the [contributing guidelines](./CONTRIBUTING.md)

# License

Please see the [license file](./LICENSE) for license information on RTFDE. If you have further questions related to licensing PLEASE create an issue about it on github.

%package help
Summary:        Development documents and examples for RTFDE
Provides:       python3-RTFDE-doc

%description help
# RTFDE: RTF De-Encapsulator

A python3 library for extracting encapsulated `HTML` & `plain text` content from the `RTF` bodies of .msg files.

De-encapsulation enables previously encapsulated HTML and plain text content to be extracted and rendered as HTML and plain text instead of the encapsulating RTF content. After de-encapsulation, the HTML and plain text should differ only minimally from the original HTML or plain text content.

# Features

- De-encapsulate HTML from RTF encapsulated HTML.
- De-encapsulate plain text from RTF encapsulated text.
# Known Issues

- This library *fully* unquotes text it de-encapsulates because it does not know which text was quoted in the RTF conversion process and which text was quoted in the original html/text. So, for instance escaped [Quoted-Printable](https://en.wikipedia.org/wiki/Quoted-printable) text will be returned un-escaped.
- This library currently can't [combine attachments](https://docs.microsoft.com/en-us/openspecs/exchange_server_protocols/ms-oxrtfex/b518f0bc-468c-4218-87a7-8f8859bf5773) from a .MSG Message object with the de-encapsulated HTML. This is mostly because I could not get a good set of examples of encapsulated HTML which had attachment objects that needed to be integrated back into the body of the HTML.

# Anti-Features

(I don't intend to have this library do this.)

- Extract plain text from RTF encapsulated HTML. If you want this, then you will have to parse the HTML using another library.

# Installation

**To install from the pip package.**

```
pip3 install RTFDE
```

# Usage

## De-encapsulating HTML or TEXT

```python
from RTFDE.deencapsulate import DeEncapsulator

with open('rtf_file', 'r') as fp:
    raw_rtf = fp.read()

rtf_obj = DeEncapsulator(raw_rtf)
rtf_obj.deencapsulate()
if rtf_obj.content_type == 'html':
    print(rtf_obj.html)
else:
    print(rtf_obj.text)
```

# Contribute

Please check the [contributing guidelines](./CONTRIBUTING.md)

# License

Please see the [license file](./LICENSE) for license information on RTFDE. If you have further questions related to licensing PLEASE create an issue about it on github.

%prep
%autosetup -n RTFDE-0.0.2

%build
%py3_build

%install
%py3_install
install -d -m755 %{buildroot}/%{_pkgdocdir}
if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
pushd %{buildroot}
if [ -d usr/lib ]; then
    find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/lib64 ]; then
    find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/bin ]; then
    find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/sbin ]; then
    find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
fi
touch doclist.lst
if [ -d usr/share/man ]; then
    find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
fi
popd
mv %{buildroot}/filelist.lst .
mv %{buildroot}/doclist.lst .

%files -n python3-RTFDE -f filelist.lst
%dir %{python3_sitelib}/*

%files help -f doclist.lst
%{_docdir}/*

%changelog
* Fri May 05 2023 Python_Bot - 0.0.2-1
- Package Spec generated