Diffstat (limited to 'python-rtfde.spec')
-rw-r--r-- python-rtfde.spec 241
1 files changed, 241 insertions, 0 deletions
diff --git a/python-rtfde.spec b/python-rtfde.spec
new file mode 100644
index 0000000..c2e4eeb
--- /dev/null
+++ b/python-rtfde.spec
@@ -0,0 +1,241 @@
+%global _empty_manifest_terminate_build 0
+Name: python-RTFDE
+Version: 0.0.2
+Release: 1
+Summary: A library for extracting HTML content from RTF encapsulated HTML as commonly found in the exchange MSG email format.
+License: GNU Lesser General Public License v3 (LGPLv3)
+URL: https://github.com/seamustuohy/RTFDE
+Source0: https://mirrors.nju.edu.cn/pypi/web/packages/81/ea/28f5ab6b46a072887c8c8fd8c8a1f7b54025fc4bb2e09024668ea6686044/RTFDE-0.0.2.tar.gz
+BuildArch: noarch
+
+Requires: python3-lark-parser
+Requires: python3-oletools
+Requires: python3-lxml
+Requires: python3-extract-msg
+
+%description
+# RTFDE: RTF De-Encapsulator
+
+A python3 library for extracting encapsulated `HTML` & `plain text` content from the `RTF` bodies of .msg files.
+
+De-encapsulation enables previously encapsulated HTML and plain text content to be extracted and rendered as HTML and plain text instead of the encapsulating RTF content. After de-encapsulation, the HTML and plain text should differ only minimally from the original HTML or plain text content.
+
+# Features
+
+- De-encapsulate HTML from RTF encapsulated HTML.
+- De-encapsulate plain text from RTF encapsulated text.
+
+# Known Issues
+
+- This library *fully* unquotes text it de-encapsulates because it does not know which text was quoted in the RTF conversion process and which text was quoted in the original html/text. So, for instance escaped [Quoted-Printable](https://en.wikipedia.org/wiki/Quoted-printable) text will be returned un-escaped.
+- This library currently can't [combine attachments](https://docs.microsoft.com/en-us/openspecs/exchange_server_protocols/ms-oxrtfex/b518f0bc-468c-4218-87a7-8f8859bf5773) from a .MSG Message object with the de-encapsulated HTML. This is mostly because I could not get a good set of examples of encapsulated HTML which had attachment objects that needed to be integrated back into the body of the HTML.
+
+# Anti-Features (I don't intend to have this library do this.)
+
+- Extract plain text from RTF encapsulated HTML. If you want this, then you will have to parse the HTML using another library.
+
+# Installation
+
+**To install from the pip package.**
+
+```
+pip3 install RTFDE
+
+```
+
+# Usage
+
+## De-encapsulating HTML or TEXT
+
+```python
+from RTFDE.deencapsulate import DeEncapsulator
+
+with open('rtf_file', 'r') as fp:
+    raw_rtf = fp.read()
+    rtf_obj = DeEncapsulator(raw_rtf)
+    rtf_obj.deencapsulate()
+    if rtf_obj.content_type == 'html':
+        print(rtf_obj.html)
+    else:
+        print(rtf_obj.text)
+```
+
+# Contribute
+
+Please check the [contributing guidelines](./CONTRIBUTING.md)
+
+# License
+
+Please see the [license file](./LICENSE) for license information on RTFDE. If you have further questions related to licensing PLEASE create an issue about it on github.
+
+
+
+
+%package -n python3-RTFDE
+Summary: A library for extracting HTML content from RTF encapsulated HTML as commonly found in the exchange MSG email format.
+Provides: python-RTFDE
+BuildRequires: python3-devel
+BuildRequires: python3-setuptools
+BuildRequires: python3-pip
+%description -n python3-RTFDE
+# RTFDE: RTF De-Encapsulator
+
+A python3 library for extracting encapsulated `HTML` & `plain text` content from the `RTF` bodies of .msg files.
+
+De-encapsulation enables previously encapsulated HTML and plain text content to be extracted and rendered as HTML and plain text instead of the encapsulating RTF content. After de-encapsulation, the HTML and plain text should differ only minimally from the original HTML or plain text content.
+
+# Features
+
+- De-encapsulate HTML from RTF encapsulated HTML.
+- De-encapsulate plain text from RTF encapsulated text.
+
+# Known Issues
+
+- This library *fully* unquotes text it de-encapsulates because it does not know which text was quoted in the RTF conversion process and which text was quoted in the original html/text. So, for instance escaped [Quoted-Printable](https://en.wikipedia.org/wiki/Quoted-printable) text will be returned un-escaped.
+- This library currently can't [combine attachments](https://docs.microsoft.com/en-us/openspecs/exchange_server_protocols/ms-oxrtfex/b518f0bc-468c-4218-87a7-8f8859bf5773) from a .MSG Message object with the de-encapsulated HTML. This is mostly because I could not get a good set of examples of encapsulated HTML which had attachment objects that needed to be integrated back into the body of the HTML.
+
+# Anti-Features (I don't intend to have this library do this.)
+
+- Extract plain text from RTF encapsulated HTML. If you want this, then you will have to parse the HTML using another library.
+
+# Installation
+
+**To install from the pip package.**
+
+```
+pip3 install RTFDE
+
+```
+
+# Usage
+
+## De-encapsulating HTML or TEXT
+
+```python
+from RTFDE.deencapsulate import DeEncapsulator
+
+with open('rtf_file', 'r') as fp:
+    raw_rtf = fp.read()
+    rtf_obj = DeEncapsulator(raw_rtf)
+    rtf_obj.deencapsulate()
+    if rtf_obj.content_type == 'html':
+        print(rtf_obj.html)
+    else:
+        print(rtf_obj.text)
+```
+
+# Contribute
+
+Please check the [contributing guidelines](./CONTRIBUTING.md)
+
+# License
+
+Please see the [license file](./LICENSE) for license information on RTFDE. If you have further questions related to licensing PLEASE create an issue about it on github.
+
+
+
+
+%package help
+Summary: Development documents and examples for RTFDE
+Provides: python3-RTFDE-doc
+%description help
+# RTFDE: RTF De-Encapsulator
+
+A python3 library for extracting encapsulated `HTML` & `plain text` content from the `RTF` bodies of .msg files.
+
+De-encapsulation enables previously encapsulated HTML and plain text content to be extracted and rendered as HTML and plain text instead of the encapsulating RTF content. After de-encapsulation, the HTML and plain text should differ only minimally from the original HTML or plain text content.
+
+# Features
+
+- De-encapsulate HTML from RTF encapsulated HTML.
+- De-encapsulate plain text from RTF encapsulated text.
+
+# Known Issues
+
+- This library *fully* unquotes text it de-encapsulates because it does not know which text was quoted in the RTF conversion process and which text was quoted in the original html/text. So, for instance escaped [Quoted-Printable](https://en.wikipedia.org/wiki/Quoted-printable) text will be returned un-escaped.
+- This library currently can't [combine attachments](https://docs.microsoft.com/en-us/openspecs/exchange_server_protocols/ms-oxrtfex/b518f0bc-468c-4218-87a7-8f8859bf5773) from a .MSG Message object with the de-encapsulated HTML. This is mostly because I could not get a good set of examples of encapsulated HTML which had attachment objects that needed to be integrated back into the body of the HTML.
+
+# Anti-Features (I don't intend to have this library do this.)
+
+- Extract plain text from RTF encapsulated HTML. If you want this, then you will have to parse the HTML using another library.
+
+# Installation
+
+**To install from the pip package.**
+
+```
+pip3 install RTFDE
+
+```
+
+# Usage
+
+## De-encapsulating HTML or TEXT
+
+```python
+from RTFDE.deencapsulate import DeEncapsulator
+
+with open('rtf_file', 'r') as fp:
+    raw_rtf = fp.read()
+    rtf_obj = DeEncapsulator(raw_rtf)
+    rtf_obj.deencapsulate()
+    if rtf_obj.content_type == 'html':
+        print(rtf_obj.html)
+    else:
+        print(rtf_obj.text)
+```
+
+# Contribute
+
+Please check the [contributing guidelines](./CONTRIBUTING.md)
+
+# License
+
+Please see the [license file](./LICENSE) for license information on RTFDE. If you have further questions related to licensing PLEASE create an issue about it on github.
+
+
+
+
+%prep
+%autosetup -n RTFDE-0.0.2
+
+%build
+%py3_build
+
+%install
+%py3_install
+install -d -m755 %{buildroot}/%{_pkgdocdir}
+if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
+if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
+if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
+if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
+pushd %{buildroot}
+if [ -d usr/lib ]; then
+ find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/lib64 ]; then
+ find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/bin ]; then
+ find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/sbin ]; then
+ find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+touch doclist.lst
+if [ -d usr/share/man ]; then
+ find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
+fi
+popd
+mv %{buildroot}/filelist.lst .
+mv %{buildroot}/doclist.lst .
+
+%files -n python3-RTFDE -f filelist.lst
+%dir %{python3_sitelib}/*
+
+%files help -f doclist.lst
+%{_docdir}/*
+
+%changelog
+* Fri May 05 2023 Python_Bot <Python_Bot@openeuler.org> - 0.0.2-1
+- Package Spec generated
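
Review note: the `%install` section above builds `filelist.lst` and `doclist.lst` by walking the buildroot with `find`. The following Python sketch mirrors that logic as a reading aid only; it is not part of the spec. The function name `build_file_lists` is hypothetical, and the output is sorted for reproducibility, whereas `find` makes no ordering guarantee.

```python
from pathlib import Path


def build_file_lists(buildroot):
    """Return (filelist, doclist) of absolute install paths under buildroot.

    Hypothetical helper mirroring the shell loops in %install.
    """
    root = Path(buildroot)

    def regular_files(subdir):
        base = root / subdir
        if not base.is_dir():
            return []
        # Like `find -type f`: regular files only, symlinks excluded.
        return sorted(
            "/" + p.relative_to(root).as_posix()
            for p in base.rglob("*")
            if p.is_file() and not p.is_symlink()
        )

    filelist = []
    for subdir in ("usr/lib", "usr/lib64", "usr/bin", "usr/sbin"):
        filelist += regular_files(subdir)

    # The spec appends ".gz" to every man page path it lists.
    doclist = [path + ".gz" for path in regular_files("usr/share/man")]
    return filelist, doclist
```

Calling `build_file_lists("/path/to/buildroot")` returns two lists whose entries correspond to the lines the spec writes to `filelist.lst` and `doclist.lst`, up to ordering.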