%global _empty_manifest_terminate_build 0
Name:		python-postq
Version:	0.2.8
Release:	1
Summary:	Job queue with DAG workflows, PostgreSQL backend, and choice of job executors.
License:	MPL-2.0
URL:		https://github.com/kruxia/postq
Source0:	https://mirrors.aliyun.com/pypi/web/packages/57/5e/39520a3629305167ea9ec4d7dce4504ac50eaba00c98d9d116d881de86ea/postq-0.2.8.tar.gz
BuildArch:	noarch


%description
# PostQ = Cloud-Native Job Queue and DAG Workflow System

PostQ is a job queue system with 

* workflows that are directed acyclic graphs, with tasks that depend on other tasks
* parallel task execution
* shared files among tasks
* a PostgreSQL database backend
* choice of task executors: {shell, docker, [coming soon: kubernetes]}
* easy on-ramp for developers: `git clone https://github.com/kruxia/postq; cd postq; docker-compose up` and you're running PostQ

## Features 

* **A PostQ Job Workflow is a DAG (Directed Acyclic Graph) of Tasks.** 

    Many existing job queue systems define jobs as single tasks, so it's up to the user to define more complex workflows. But many workflows (like CI/CD pipelines, and data applications) need to be able to define workflows at a higher level as a DAG of tasks, in which a given task might depend on earlier tasks that must first be completed, and which might be run in parallel with other tasks in the workflow.

    PostQ defines Job workflows as a DAG of tasks. For each named task, you list the other tasks that must be completed first, and PostQ will work out (using snazzy graph calculations) the simplest and most direct version of the workflow (i.e., the _transitive reduction_ of the graph). It runs the tasks in the order indicated by the graph of their dependencies, and finishes when all tasks have been either completed or cancelled due to a preceding failure.

* **Workflow Tasks Are Executed in Parallel.**

    When a PostQ Job is started, it begins by launching all the tasks that don't depend on other tasks. Then, as each task finishes, it launches all additional tasks for which the predecessors have been successfully completed. 
    
    At any given time, there might be many tasks in a Job running at the same time on different processors. <!-- (and soon, using Kubernetes, on different machines). --> The more you break down your workflows into tasks that can happen in parallel, the more CPUs your tasks can utilize, and the more quickly your jobs can be completed, limited only by the available resources.

* **Tasks in a Job Workflow Can Share Files.**

    For workflows that process large amounts of data that is stored in files, it's important to be able to share these files among all the tasks in a workflow. PostQ creates shared temporary file storage for each job, and each task is run with that directory as the current working directory. 
    
    So, for example, you can start your workflow with a task that pulls files from permanent storage, then other tasks can process the data in those files, create other files, etc. Then, at the end of the work, the files that need to be saved as artifacts of the job can be pushed to permanent storage. 

* **A PostgreSQL Database Is the (Default) Job Queue.** 

    PostgreSQL provides persistence and ACID transaction guarantees. It is the simplest way to ensure that a job is not lost, but is processed exactly once. PostgreSQL is also already running in many web and microservice application clusters, so building on Postgres enables developers to easily add a Job Queue to their application without substantially increasing the necessary complexity of their application. PostgreSQL combines excellent speed with fantastic reliability, durability, and transactional guarantees. 

* **The Docker Executor Runs each Task in a Container Using any Image.** 

    Many existing task queue systems assume that the programming environment in which the queue worker is written is available for the execution of each task. For example, Celery tasks are written and run in Python.
    
    Instead, PostQ has the ability to run tasks in separate containers. This enables a task to use any software, not just the software that is available in the queue worker system.

    (Author's Note: This was one of the primary motivations for writing PostQ. I am building an application that has workflows with tasks requiring NodeJS, or Java, or Python, or Chromium. It's possible to build an image that includes all of these requirements — and weighs in over a gigabyte! It's much more maintainable to separate the different task programs into different images, with each image including only the software it needs to complete its task.)

* **Easy On-ramp for Developers.**
    ```bash
    git clone https://github.com/kruxia/postq.git
    cd postq
    docker-compose up
    ```
    The default docker-compose.yml cluster definition uses the docker executor (so tasks must define an image), a maximum queue sleep time of 5 seconds, and the default qname=''. The default cluster doesn't expose any ports to the outside world, but you can, for example, shell into the running cluster from a second terminal and start pushing jobs into the queue. More commonly, your PostgreSQL instance is available inside your application cluster, so you can push jobs into PostQ directly from your application.

<!-- * [TODO] **Can use a message broker as the Job Queue.** Applications that need higher performance and throughput than PostgreSQL can provide must be able to shift up to something more performant. For example, RabbitMQ is a very high-performance message broker written in Erlang.

* [TODO] **Can run (persistent) Task workers.** Some Tasks or Task environments (images) are anticipated as being needed continually. In such job environments, the Task workers can be made persistent services that listen to the Job queue for their own Jobs. (In essence, this allows a Task to be a complete sub-workflow being handled by its own Workflow Job queue workers, in which the Tasks are enabled to run inside the Job worker container as subprocesses.) -->

## Usage Examples
    
Here is an example in Python using the running postq container itself. The Python stack is [Databases](https://encode.io/databases), [SQLAlchemy Core](https://docs.sqlalchemy.org/en/13/core/), and data models written in [Pydantic](https://pydantic-docs.helpmanual.io/); the example below talks to the database directly through [asyncpg](https://github.com/MagicStack/asyncpg):

```bash
$ docker-compose exec postq ipython
```

```python
# (Using the ipython shell, which allows async/await without an explicit event loop.)
import os
import time
import asyncpg
from postq import models

queue = models.Queue(qname='playq')
database = await asyncpg.create_pool(dsn=os.getenv('DATABASE_URL'))
connection = await database.acquire()
job = models.Job(
    tasks={'a': {'command': 'echo Hey!', 'params': {'image': 'debian:bullseye-slim'}}}
)
job.update(
    **await database.fetchrow(
        *queue.put(job)
    )
)

# Then, wait a few seconds...
time.sleep(5)

joblog = models.Job(
    **await connection.fetchrow(
        *queue.get_log(id=job.id)
    )
)

print(joblog.tasks['a'].results)  # Hey!
```
Now you have a job log entry with the output of your command in the task results. :tada:
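
The same pattern extends to multi-task DAG jobs. The following is a minimal sketch rather than something copied from the PostQ docs: it assumes that a task definition accepts a `depends` list naming its prerequisite tasks (adjust the field name if your version of `models.Task` differs) and that the docker executor runs each command through a shell, and it reuses the `queue`, `database`, and `connection` objects from the example above. Task `a` writes a file into the job's shared working directory; tasks `b` and `c` both depend on `a`, so they run in parallel once `a` succeeds, and each task can name its own image.

```python
# Hypothetical DAG job: the 'depends' field name and shell-style redirection
# are assumptions made for illustration.
dag_job = models.Job(
    tasks={
        'a': {
            'command': 'echo "hello from a" > shared.txt',
            'params': {'image': 'debian:bullseye-slim'},
        },
        'b': {
            'command': 'cat shared.txt',           # runs after 'a'
            'depends': ['a'],
            'params': {'image': 'debian:bullseye-slim'},
        },
        'c': {
            'command': 'tr a-z A-Z < shared.txt',  # runs in parallel with 'b'
            'depends': ['a'],
            'params': {'image': 'python:3.9-slim'},
        },
    }
)
dag_job.update(**await database.fetchrow(*queue.put(dag_job)))

# Give the worker a few seconds to run the whole graph, then read back the log.
time.sleep(10)
dag_log = models.Job(**await connection.fetchrow(*queue.get_log(id=dag_job.id)))
for name, task in dag_log.tasks.items():
    print(name, task.results)
```

As in the first example, the job is pushed with `queue.put` and its log read back with `queue.get_log`; if your PostQ version spells the dependency field differently, only the task dictionaries need to change.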




%package -n python3-postq
Summary:	Job queue with DAG workflows, PostgreSQL backend, and choice of job executors.
Provides:	python-postq
BuildRequires:	python3-devel
BuildRequires:	python3-setuptools
BuildRequires:	python3-pip
%description -n python3-postq
# PostQ = Cloud-Native Job Queue and DAG Workflow System

PostQ is a job queue system with 

* workflows that are directed acyclic graphs, with tasks that depend on other tasks
* parallel task execution
* shared files among tasks
* a PostgreSQL database backend
* choice of task executors: {shell, docker, [coming soon: kubernetes]}
* easy on-ramp for developers: `git clone https://github.com/kruxia/postq; cd postq; docker-compose up` and you're running PostQ

## Features 

* **A PostQ Job Workflow is a DAG (Directed Acyclic Graph) of Tasks.** 

    Many existing job queue systems define jobs as single tasks, so it's up to the user to define more complex workflows. But many workflows (like CI/CD pipelines, and data applications) need to be able to define workflows at a higher level as a DAG of tasks, in which a given task might depend on earlier tasks that must first be completed, and which might be run in parallel with other tasks in the workflow.

    PostQ defines Job workflows as a DAG of tasks. For each named task, you list the other tasks that must be completed first, and PostQ will work out (using snazzy graph calculations) the simplest and most direct version of the workflow (i.e., the _transitive reduction_ of the graph). It runs the tasks in the order indicated by the graph of their dependencies, and finishes when all tasks have been either completed or cancelled due to a preceding failure.

* **Workflow Tasks Are Executed in Parallel.**

    When a PostQ Job is started, it begins by launching all the tasks that don't depend on other tasks. Then, as each task finishes, it launches all additional tasks for which the predecessors have been successfully completed. 
    
    At any given time, there might be many tasks in a Job running at the same time on different processors. <!-- (and soon, using Kubernetes, on different machines). --> The more you break down your workflows into tasks that can happen in parallel, the more CPUs your tasks can utilize, and the more quickly your jobs can be completed, limited only by the available resources.

* **Tasks in a Job Workflow Can Share Files.**

    For workflows that process large amounts of data that is stored in files, it's important to be able to share these files among all the tasks in a workflow. PostQ creates shared temporary file storage for each job, and each task is run with that directory as the current working directory. 
    
    So, for example, you can start your workflow with a task that pulls files from permanent storage, then other tasks can process the data in those files, create other files, etc. Then, at the end of the work, the files that need to be saved as artifacts of the job can be pushed to permanent storage. 

* **A PostgreSQL Database Is the (Default) Job Queue.** 

    PostgreSQL provides persistence and ACID transaction guarantees. It is the simplest way to ensure that a job is not lost, but is processed exactly once. PostgreSQL is also already running in many web and microservice application clusters, so building on Postgres enables developers to easily add a Job Queue to their application without substantially increasing the necessary complexity of their application. PostgreSQL combines excellent speed with fantastic reliability, durability, and transactional guarantees. 

* **The Docker Executor Runs each Task in a Container Using any Image.** 

    Many existing task queue systems assume that the programming environment in which the queue worker is written is available for the execution of each task. For example, Celery tasks are written and run in Python.
    
    Instead, PostQ has the ability to run tasks in separate containers. This enables a task to use any software, not just the software that is available in the queue worker system.

    (Author's Note: This was one of the primary motivations for writing PostQ. I am building an application that has workflows with tasks requiring NodeJS, or Java, or Python, or Chromium. It's possible to build an image that includes all of these requirements — and weighs in over a gigabyte! It's much more maintainable to separate the different task programs into different images, with each image including only the software it needs to complete its task.)

* **Easy On-ramp for Developers.**
    ```bash
    git clone https://github.com/kruxia/postq.git
    cd postq
    docker-compose up
    ```
    The default docker-compose.yml cluster definition uses the docker executor (so tasks must define an image), a maximum queue sleep time of 5 seconds, and the default qname=''. The default cluster doesn't expose any ports to the outside world, but you can, for example, shell into the running cluster from a second terminal and start pushing jobs into the queue. More commonly, your PostgreSQL instance is available inside your application cluster, so you can push jobs into PostQ directly from your application.

<!-- * [TODO] **Can use a message broker as the Job Queue.** Applications that need higher performance and throughput than PostgreSQL can provide must be able to shift up to something more performant. For example, RabbitMQ is a very high-performance message broker written in Erlang.

* [TODO] **Can run (persistent) Task workers.** Some Tasks or Task environments (images) are anticipated as being needed continually. In such job environments, the Task workers can be made persistent services that listen to the Job queue for their own Jobs. (In essence, this allows a Task to be a complete sub-workflow being handled by its own Workflow Job queue workers, in which the Tasks are enabled to run inside the Job worker container as subprocesses.) -->

## Usage Examples
    
Here is an example in Python using the running postq container itself. The Python stack is [Databases](https://encode.io/databases), [SQLAlchemy Core](https://docs.sqlalchemy.org/en/13/core/), and data models written in [Pydantic](https://pydantic-docs.helpmanual.io/); the example below talks to the database directly through [asyncpg](https://github.com/MagicStack/asyncpg):

```bash
$ docker-compose exec postq ipython
```

```python
# (Using the ipython shell, which allows async/await without an explicit event loop.)
import os
import time
import asyncpg
from postq import models

queue = models.Queue(qname='playq')
database = await asyncpg.create_pool(dsn=os.getenv('DATABASE_URL'))
connection = await database.acquire()
job = models.Job(
    tasks={'a': {'command': 'echo Hey!', 'params': {'image': 'debian:bullseye-slim'}}}
)
job.update(
    **await database.fetchrow(
        *queue.put(job)
    )
)

# Then, wait a few seconds...
time.sleep(5)

joblog = models.Job(
    **await connection.fetchrow(
        *queue.get_log(id=job.id)
    )
)

print(joblog.tasks['a'].results)  # Hey!
```
Now you have a job log entry with the output of your command in the task results. :tada:
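
The same pattern extends to multi-task DAG jobs. The following is a minimal sketch rather than something copied from the PostQ docs: it assumes that a task definition accepts a `depends` list naming its prerequisite tasks (adjust the field name if your version of `models.Task` differs) and that the docker executor runs each command through a shell, and it reuses the `queue`, `database`, and `connection` objects from the example above. Task `a` writes a file into the job's shared working directory; tasks `b` and `c` both depend on `a`, so they run in parallel once `a` succeeds, and each task can name its own image.

```python
# Hypothetical DAG job: the 'depends' field name and shell-style redirection
# are assumptions made for illustration.
dag_job = models.Job(
    tasks={
        'a': {
            'command': 'echo "hello from a" > shared.txt',
            'params': {'image': 'debian:bullseye-slim'},
        },
        'b': {
            'command': 'cat shared.txt',           # runs after 'a'
            'depends': ['a'],
            'params': {'image': 'debian:bullseye-slim'},
        },
        'c': {
            'command': 'tr a-z A-Z < shared.txt',  # runs in parallel with 'b'
            'depends': ['a'],
            'params': {'image': 'python:3.9-slim'},
        },
    }
)
dag_job.update(**await database.fetchrow(*queue.put(dag_job)))

# Give the worker a few seconds to run the whole graph, then read back the log.
time.sleep(10)
dag_log = models.Job(**await connection.fetchrow(*queue.get_log(id=dag_job.id)))
for name, task in dag_log.tasks.items():
    print(name, task.results)
```

As in the first example, the job is pushed with `queue.put` and its log read back with `queue.get_log`; if your PostQ version spells the dependency field differently, only the task dictionaries need to change.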




%package help
Summary:	Development documents and examples for postq
Provides:	python3-postq-doc
%description help
# PostQ = Cloud-Native Job Queue and DAG Workflow System

PostQ is a job queue system with 

* workflows that are directed acyclic graphs, with tasks that depend on other tasks
* parallel task execution
* shared files among tasks
* a PostgreSQL database backend
* choice of task executors: {shell, docker, [coming soon: kubernetes]}
* easy on-ramp for developers: `git clone https://github.com/kruxia/postq; cd postq; docker-compose up` and you're running PostQ

## Features 

* **A PostQ Job Workflow is a DAG (Directed Acyclic Graph) of Tasks.** 

    Many existing job queue systems define jobs as single tasks, so it's up to the user to define more complex workflows. But many workflows (like CI/CD pipelines, and data applications) need to be able to define workflows at a higher level as a DAG of tasks, in which a given task might depend on earlier tasks that must first be completed, and which might be run in parallel with other tasks in the workflow.

    PostQ defines Job workflows as a DAG of tasks. For each named task, you list the other tasks that must be completed first, and PostQ will work out (using snazzy graph calculations) the simplest and most direct version of the workflow (i.e., the _transitive reduction_ of the graph). It runs the tasks in the order indicated by the graph of their dependencies, and finishes when all tasks have been either completed or cancelled due to a preceding failure.

* **Workflow Tasks Are Executed in Parallel.**

    When a PostQ Job is started, it begins by launching all the tasks that don't depend on other tasks. Then, as each task finishes, it launches all additional tasks for which the predecessors have been successfully completed. 
    
    At any given time, there might be many tasks in a Job running at the same time on different processors. <!-- (and soon, using Kubernetes, on different machines). --> The more you break down your workflows into tasks that can happen in parallel, the more CPUs your tasks can utilize, and the more quickly your jobs can be completed, limited only by the available resources.

* **Tasks in a Job Workflow Can Share Files.**

    For workflows that process large amounts of data that is stored in files, it's important to be able to share these files among all the tasks in a workflow. PostQ creates shared temporary file storage for each job, and each task is run with that directory as the current working directory. 
    
    So, for example, you can start your workflow with a task that pulls files from permanent storage, then other tasks can process the data in those files, create other files, etc. Then, at the end of the work, the files that need to be saved as artifacts of the job can be pushed to permanent storage. 

* **A PostgreSQL Database Is the (Default) Job Queue.** 

    PostgreSQL provides persistence and ACID transaction guarantees. It is the simplest way to ensure that a job is not lost, but is processed exactly once. PostgreSQL is also already running in many web and microservice application clusters, so building on Postgres enables developers to easily add a Job Queue to their application without substantially increasing the necessary complexity of their application. PostgreSQL combines excellent speed with fantastic reliability, durability, and transactional guarantees. 

* **The Docker Executor Runs each Task in a Container Using any Image.** 

    Many existing task queue systems assume that the programming environment in which the queue worker is written is available for the execution of each task. For example, Celery tasks are written and run in Python.
    
    Instead, PostQ has the ability to run tasks in separate containers. This enables a task to use any software, not just the software that is available in the queue worker system.

    (Author's Note: This was one of the primary motivations for writing PostQ. I am building an application that has workflows with tasks requiring NodeJS, or Java, or Python, or Chromium. It's possible to build an image that includes all of these requirements — and weighs in over a gigabyte! It's much more maintainable to separate the different task programs into different images, with each image including only the software it needs to complete its task.)

* **Easy On-ramp for Developers.**
    ```bash
    git clone https://github.com/kruxia/postq.git
    cd postq
    docker-compose up
    ```
    The default docker-compose.yml cluster definition uses the docker executor (so tasks must define an image), a maximum queue sleep time of 5 seconds, and the default qname=''. The default cluster doesn't expose any ports to the outside world, but you can, for example, shell into the running cluster from a second terminal and start pushing jobs into the queue. More commonly, your PostgreSQL instance is available inside your application cluster, so you can push jobs into PostQ directly from your application.

<!-- * [TODO] **Can use a message broker as the Job Queue.** Applications that need higher performance and throughput than PostgreSQL can provide must be able to shift up to something more performant. For example, RabbitMQ is a very high-performance message broker written in Erlang.

* [TODO] **Can run (persistent) Task workers.** Some Tasks or Task environments (images) are anticipated as being needed continually. In such job environments, the Task workers can be made persistent services that listen to the Job queue for their own Jobs. (In essence, this allows a Task to be a complete sub-workflow being handled by its own Workflow Job queue workers, in which the Tasks are enabled to run inside the Job worker container as subprocesses.) -->

## Usage Examples
    
Here is an example in Python using the running postq container itself. The Python stack is [Databases](https://encode.io/databases), [SQLAlchemy Core](https://docs.sqlalchemy.org/en/13/core/), and data models written in [Pydantic](https://pydantic-docs.helpmanual.io/); the example below talks to the database directly through [asyncpg](https://github.com/MagicStack/asyncpg):

```bash
$ docker-compose exec postq ipython
```

```python
# (Using the ipython shell, which allows async/await without an explicit event loop.)
import os
import time
import asyncpg
from postq import models

queue = models.Queue(qname='playq')
database = await asyncpg.create_pool(dsn=os.getenv('DATABASE_URL'))
connection = await database.acquire()
job = models.Job(
    tasks={'a': {'command': 'echo Hey!', 'params': {'image': 'debian:bullseye-slim'}}}
)
job.update(
    **await database.fetchrow(
        *queue.put(job)
    )
)

# Then, wait a few seconds...
time.sleep(5)

joblog = models.Job(
    **await connection.fetchrow(
        *queue.get_log(id=job.id)
    )
)

print(joblog.tasks['a'].results)  # Hey!
```
Now you have a job log entry with the output of your command in the task results. :tada:
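
The same pattern extends to multi-task DAG jobs. The following is a minimal sketch rather than something copied from the PostQ docs: it assumes that a task definition accepts a `depends` list naming its prerequisite tasks (adjust the field name if your version of `models.Task` differs) and that the docker executor runs each command through a shell, and it reuses the `queue`, `database`, and `connection` objects from the example above. Task `a` writes a file into the job's shared working directory; tasks `b` and `c` both depend on `a`, so they run in parallel once `a` succeeds, and each task can name its own image.

```python
# Hypothetical DAG job: the 'depends' field name and shell-style redirection
# are assumptions made for illustration.
dag_job = models.Job(
    tasks={
        'a': {
            'command': 'echo "hello from a" > shared.txt',
            'params': {'image': 'debian:bullseye-slim'},
        },
        'b': {
            'command': 'cat shared.txt',           # runs after 'a'
            'depends': ['a'],
            'params': {'image': 'debian:bullseye-slim'},
        },
        'c': {
            'command': 'tr a-z A-Z < shared.txt',  # runs in parallel with 'b'
            'depends': ['a'],
            'params': {'image': 'python:3.9-slim'},
        },
    }
)
dag_job.update(**await database.fetchrow(*queue.put(dag_job)))

# Give the worker a few seconds to run the whole graph, then read back the log.
time.sleep(10)
dag_log = models.Job(**await connection.fetchrow(*queue.get_log(id=dag_job.id)))
for name, task in dag_log.tasks.items():
    print(name, task.results)
```

As in the first example, the job is pushed with `queue.put` and its log read back with `queue.get_log`; if your PostQ version spells the dependency field differently, only the task dictionaries need to change.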




%prep
%autosetup -n postq-0.2.8

%build
%py3_build

%install
%py3_install
install -d -m755 %{buildroot}/%{_pkgdocdir}
if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
pushd %{buildroot}
if [ -d usr/lib ]; then
	find usr/lib -type f -printf "\"/%h/%f\"\n" >> filelist.lst
fi
if [ -d usr/lib64 ]; then
	find usr/lib64 -type f -printf "\"/%h/%f\"\n" >> filelist.lst
fi
if [ -d usr/bin ]; then
	find usr/bin -type f -printf "\"/%h/%f\"\n" >> filelist.lst
fi
if [ -d usr/sbin ]; then
	find usr/sbin -type f -printf "\"/%h/%f\"\n" >> filelist.lst
fi
touch doclist.lst
if [ -d usr/share/man ]; then
	find usr/share/man -type f -printf "\"/%h/%f.gz\"\n" >> doclist.lst
fi
popd
mv %{buildroot}/filelist.lst .
mv %{buildroot}/doclist.lst .

%files -n python3-postq -f filelist.lst
%dir %{python3_sitelib}/*

%files help -f doclist.lst
%{_docdir}/*

%changelog
* Thu Jun 08 2023 Python_Bot <Python_Bot@openeuler.org> - 0.2.8-1
- Package Spec generated