%global _empty_manifest_terminate_build 0
Name:		python-dbt-extractor
Version:	0.4.1
Release:	1
Summary:	A tool to analyze and extract information from Jinja used in dbt projects
License:	Apache-2.0
URL:		https://github.com/dbt-labs/dbt-parser-generator/
Source0:	https://mirrors.nju.edu.cn/pypi/web/packages/af/2e/a110b40212480fd02bff567ff84effea8b9937ccd6ebfad0f10a382183d2/dbt_extractor-0.4.1.tar.gz


%description

# dbt extractor

![demo app](demo/demo.gif)

This repository contains a tool that processes the most common jinja value templates in dbt model files. The tool depends on tree-sitter and the tree-sitter-jinja2 library.

# Strategy

The current strategy is for this processor to extract values only when it is 100% certain that it can do so accurately for a given model file. Anything less than 100% certainty raises an exception so that the model can be rendered with python Jinja instead.

There are two cases we want to avoid because they would put the correctness of users' projects at risk:
1. Confidently extracting values that would not be extracted by python jinja (false positives).
2. Confidently extracting a set of values that is missing values python jinja would have extracted (misses).

If we instead raise an error when we could have confidently extracted values, there is no correctness risk to the user, only an opportunity to expand the rules to cover that class of cases as well.

Even though jinja in dbt is not a typed language, the type checker statically determines whether or not the current implementation can confidently extract values without relying on python jinja rendering, which is when these errors would otherwise surface. This type checker will become more permissive over time as this tool expands to include more dbt and jinja features.
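
The all-or-nothing strategy described above can be sketched in a few lines of Python. This is a minimal illustration only, not the tool's actual implementation: `CannotExtract`, `fast_extract`, `extract_refs`, and the `slow_render` callback are hypothetical names, and a regular expression stands in for the real parser and type checker.

```python
import re


class CannotExtract(Exception):
    """Raised when the fast path is not fully certain (hypothetical name)."""


# Any Jinja expression block, e.g. {{ ... }}.
_ANY_EXPR = re.compile(r"\{\{.*?\}\}", re.DOTALL)
# The only shape this sketch accepts: ref('name') with one string literal.
_SIMPLE_REF = re.compile(r"\{\{\s*ref\(\s*'([A-Za-z_][A-Za-z0-9_]*)'\s*\)\s*\}\}")


def fast_extract(source):
    """Return ref names, or raise if any block is outside the known rules."""
    refs = []
    for block in _ANY_EXPR.finditer(source):
        match = _SIMPLE_REF.fullmatch(block.group(0))
        if match is None:
            # Not certain: refuse rather than risk a false positive or a miss.
            raise CannotExtract(block.group(0))
        refs.append(match.group(1))
    return refs


def extract_refs(source, slow_render):
    """Try the fast path first and fall back to full rendering on any doubt."""
    try:
        return fast_extract(source)
    except CannotExtract:
        return slow_render(source)
```

Under these assumptions, a model containing only `{{ ref('orders') }}` is handled by the fast path, while one containing `{{ ref(var('x')) }}` is handed to `slow_render` unchanged.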

# Architecture

This architecture is optimized for value extraction and for future flexibility. It is expected to change, and is coded in fp-style stages to make those changes easier.

This processor is composed of several stages:
1. parser
2. type checker
3. extractor

Additionally, the following tools utilize the above processor:
1. browser-based demo of dbt extraction as you type

The tree-sitter parser is located in the tree-sitter-jinja2 library. The rust bindings are used to traverse the concrete syntax tree that tree-sitter creates in order to create a typed abstract syntax tree in the type checking stage. The errors in the type checking stage are not raised to the user, and are instead used by developers to debug tests.

The parser is solely responsible for turning text into recognized values, while the type checker does arity checking and enforces argument list types. For example, a nested function call like `{{ config(my_ref=ref('table')) }}` will parse but not type check, even though it is valid dbt syntax. The tool does not yet have an agreed serialization to communicate refs as config values, but could in the future.

The extractor uses the typed abstract syntax tree to easily identify all the refs, sources, and configs present and extract them.
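
The three stages can be sketched as a small pipeline. This is a hedged illustration, not the project's code: Python's standard `ast` module stands in for the tree-sitter parser and Rust bindings, and `parse`, `type_check`, `extract`, `process`, `Extraction`, and `TypeCheckError` are hypothetical names. The arity rules shown (one or two arguments for `ref`, two for `source`, keyword literals for `config`) are assumptions made for the example.

```python
import ast
from dataclasses import dataclass, field


class TypeCheckError(Exception):
    """Raised when a call falls outside the supported rules (hypothetical)."""


@dataclass
class Extraction:
    refs: list = field(default_factory=list)
    sources: list = field(default_factory=list)
    configs: dict = field(default_factory=dict)


def parse(expr):
    """Stage 1: text to syntax tree (Python's ast stands in for tree-sitter)."""
    return ast.parse(expr.strip(), mode="eval").body


def _strings(args):
    # Enforce that every positional argument is a string literal.
    if not all(isinstance(a, ast.Constant) and isinstance(a.value, str) for a in args):
        raise TypeCheckError("expected string literals")
    return [a.value for a in args]


def type_check(node):
    """Stage 2: arity and argument-type checks, returning a typed tuple."""
    if not (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)):
        raise TypeCheckError("expected a function call")
    name = node.func.id
    if name == "ref" and len(node.args) in (1, 2) and not node.keywords:
        return ("ref", _strings(node.args))
    if name == "source" and len(node.args) == 2 and not node.keywords:
        return ("source", _strings(node.args))
    if name == "config" and not node.args:
        values = {}
        for kw in node.keywords:
            # Nested calls such as ref('table') are rejected here.
            if not isinstance(kw.value, ast.Constant):
                raise TypeCheckError("config values must be literals")
            values[kw.arg] = kw.value.value
        return ("config", values)
    raise TypeCheckError("unsupported call: " + name)


def extract(typed_calls):
    """Stage 3: collect refs, sources, and configs from typed calls."""
    result = Extraction()
    for kind, payload in typed_calls:
        if kind == "ref":
            result.refs.append(payload)
        elif kind == "source":
            result.sources.append(payload)
        else:
            result.configs.update(payload)
    return result


def process(expressions):
    """Run each expression through parse, type_check, and extract in order."""
    return extract(type_check(parse(e)) for e in expressions)
```

In this sketch, `process(["config(my_ref=ref('table'))"])` parses successfully but raises `TypeCheckError`, mirroring the parse-but-not-type-check behavior described above.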

## Running The Demo App
To see the full implementation extract dbt values live as you type in a browser, run:
```
make demo
```
It may take a moment for the demo to compile an optimized version of itself.

Kill the server with ctrl+c to end the demo.

## Testing The Project
```
make test
```

## Future Work
- Refactor the tree-sitter jinja parser into its own repository to potentially open source and engage with the community on implementing improvements.
- Remove ref, source, and config type checking as hard coded rules and instead read these function types from external function definition statements.
- Create an input path for a manifest file so the tool can be run on any project without additional pre-processing.



%package -n python3-dbt-extractor
Summary:	A tool to analyze and extract information from Jinja used in dbt projects
Provides:	python-dbt-extractor
BuildRequires:	python3-devel
BuildRequires:	python3-setuptools
BuildRequires:	python3-pip
BuildRequires:	python3-cffi
BuildRequires:	gcc
BuildRequires:	gdb
%description -n python3-dbt-extractor

# dbt extractor

![demo app](demo/demo.gif)

This repository contains a tool that processes the most common jinja value templates in dbt model files. The tool depends on tree-sitter and the tree-sitter-jinja2 library.

# Strategy

The current strategy is for this processor to extract values only when it is 100% certain that it can do so accurately for a given model file. Anything less than 100% certainty raises an exception so that the model can be rendered with python Jinja instead.

There are two cases we want to avoid because they would put the correctness of users' projects at risk:
1. Confidently extracting values that would not be extracted by python jinja (false positives).
2. Confidently extracting a set of values that is missing values python jinja would have extracted (misses).

If we instead raise an error when we could have confidently extracted values, there is no correctness risk to the user, only an opportunity to expand the rules to cover that class of cases as well.

Even though jinja in dbt is not a typed language, the type checker statically determines whether or not the current implementation can confidently extract values without relying on python jinja rendering, which is when these errors would otherwise surface. This type checker will become more permissive over time as this tool expands to include more dbt and jinja features.

# Architecture

This architecture is optimized for value extraction and for future flexibility. It is expected to change, and is coded in fp-style stages to make those changes easier.

This processor is composed of several stages:
1. parser
2. type checker
3. extractor

Additionally, the following tools utilize the above processor:
1. browser-based demo of dbt extraction as you type

The tree-sitter parser is located in the tree-sitter-jinja2 library. The rust bindings are used to traverse the concrete syntax tree that tree-sitter creates in order to create a typed abstract syntax tree in the type checking stage. The errors in the type checking stage are not raised to the user, and are instead used by developers to debug tests.

The parser is solely responsible for turning text into recognized values, while the type checker does arity checking and enforces argument list types. For example, a nested function call like `{{ config(my_ref=ref('table')) }}` will parse but not type check, even though it is valid dbt syntax. The tool does not yet have an agreed serialization to communicate refs as config values, but could in the future.

The extractor uses the typed abstract syntax tree to easily identify all the refs, sources, and configs present and extract them.

## Running The Demo App
To see the full implementation extract dbt values live as you type in a browser, run:
```
make demo
```
It may take a moment for the demo to compile an optimized version of itself.

Kill the server with ctrl+c to end the demo.

## Testing The Project
```
make test
```

## Future Work
- Refactor the tree-sitter jinja parser into its own repository to potentially open source and engage with the community on implementing improvements.
- Remove ref, source, and config type checking as hard coded rules and instead read these function types from external function definition statements.
- Create an input path for a manifest file so the tool can be run on any project without additional pre-processing.



%package help
Summary:	Development documents and examples for dbt-extractor
Provides:	python3-dbt-extractor-doc
%description help

# dbt extractor

![demo app](demo/demo.gif)

This repository contains a tool that processes the most common jinja value templates in dbt model files. The tool depends on tree-sitter and the tree-sitter-jinja2 library.

# Strategy

The current strategy is for this processor to extract values only when it is 100% certain that it can do so accurately for a given model file. Anything less than 100% certainty raises an exception so that the model can be rendered with python Jinja instead.

There are two cases we want to avoid because they would put the correctness of users' projects at risk:
1. Confidently extracting values that would not be extracted by python jinja (false positives).
2. Confidently extracting a set of values that is missing values python jinja would have extracted (misses).

If we instead raise an error when we could have confidently extracted values, there is no correctness risk to the user, only an opportunity to expand the rules to cover that class of cases as well.

Even though jinja in dbt is not a typed language, the type checker statically determines whether or not the current implementation can confidently extract values without relying on python jinja rendering, which is when these errors would otherwise surface. This type checker will become more permissive over time as this tool expands to include more dbt and jinja features.

# Architecture

This architecture is optimized for value extraction and for future flexibility. It is expected to change, and is coded in fp-style stages to make those changes easier.

This processor is composed of several stages:
1. parser
2. type checker
3. extractor

Additionally, the following tools utilize the above processor:
1. browser-based demo of dbt extraction as you type

The tree-sitter parser is located in the tree-sitter-jinja2 library. The rust bindings are used to traverse the concrete syntax tree that tree-sitter creates in order to create a typed abstract syntax tree in the type checking stage. The errors in the type checking stage are not raised to the user, and are instead used by developers to debug tests.

The parser is solely responsible for turning text into recognized values, while the type checker does arity checking and enforces argument list types. For example, a nested function call like `{{ config(my_ref=ref('table')) }}` will parse but not type check, even though it is valid dbt syntax. The tool does not yet have an agreed serialization to communicate refs as config values, but could in the future.

The extractor uses the typed abstract syntax tree to easily identify all the refs, sources, and configs present and extract them.

## Running The Demo App
To see the full implementation extract dbt values live as you type in a browser, run:
```
make demo
```
It may take a moment for the demo to compile an optimized version of itself.

Kill the server with ctrl+c to end the demo.

## Testing The Project
```
make test
```

## Future Work
- Refactor the tree-sitter jinja parser into its own repository to potentially open source and engage with the community on implementing improvements.
- Remove ref, source, and config type checking as hard coded rules and instead read these function types from external function definition statements.
- Create an input path for a manifest file so the tool can be run on any project without additional pre-processing.



%prep
%autosetup -n dbt-extractor-0.4.1

%build
%py3_build

%install
%py3_install
install -d -m755 %{buildroot}/%{_pkgdocdir}
if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
pushd %{buildroot}
if [ -d usr/lib ]; then
	find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/lib64 ]; then
	find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/bin ]; then
	find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/sbin ]; then
	find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
fi
touch doclist.lst
if [ -d usr/share/man ]; then
	find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
fi
popd
mv %{buildroot}/filelist.lst .
mv %{buildroot}/doclist.lst .

%files -n python3-dbt-extractor -f filelist.lst
%dir %{python3_sitearch}/*

%files help -f doclist.lst
%{_docdir}/*

%changelog
* Fri Apr 21 2023 Python_Bot <Python_Bot@openeuler.org> - 0.4.1-1
- Package Spec generated