%global _empty_manifest_terminate_build 0
Name: python-data-warehouse-client
Version: 3.0.2
Release: 1
Summary: This package provides access to the e-Science Central data warehouse that can be used to store, access and analyse data collected in scientific studies, including for healthcare applications
License: Apache Software License
URL: https://github.com/e-science-central/data-warehouse-client
Source0: https://mirrors.nju.edu.cn/pypi/web/packages/6f/a5/c5ec93e3a9a098d245cd20088be16e509bafe9be764b0091dda6fe3fbd2b/data-warehouse-client-3.0.2.tar.gz
BuildArch: noarch
Requires: python3-more-itertools
Requires: python3-matplotlib
Requires: python3-psycopg2
Requires: python3-tabulate
%description
# Data Warehouse Client
This package provides access to the e-Science Central data warehouse that can be used to store, access and analyse
data collected in scientific studies, including for healthcare applications. The primary aim of the warehouse
was to create a general system that enables users to explore data collected in a variety of forms. This might include
data collected through questionnaires, data collected from sensors,
and features extracted from the analysis of sensor data (e.g. activity levels derived from raw accelerometer data).
Researchers might wish to slice, dice, visualise, analyse and explore this data in different ways,
e.g. all results for one participant,
all results for one type of measure in a study,
changes in measurements over time. Others may wish to build models that can then be used in applications
that make predictions about future values.
Traditionally, data collected in studies has been stored in a collection of files,
often with metadata encoded in the filenames.
This makes it difficult and time-consuming for researchers to explore, interpret and analyse the data.
The data warehouse exploits modern database technology to vastly simplify this effort.
In doing this we have drawn heavily on the best practice for data warehouse design.
However, there is more variety in the types of healthcare data to be stored than there is in a typical warehouse,
and so we have been forced to deviate from a conventional data warehouse in some aspects of the design.
There are four guiding principles behind the design:
1. The data warehouse must be able to store any type of data collected in a study without modifying the schema.
This means that when new types of data are collected in studies (e.g. from a new questionnaire,
a new data analysis program, or a new sensor) they can be stored in the warehouse without any changes to its design.
This has three main advantages:
firstly, it enables us to fix and optimise the schema for the tables in which the data is stored;
secondly, it means that applications and tools (e.g. for analysis and visualisation)
built on the warehouse do not have to be updated when new types of data are added;
thirdly, a single, multi-tenant database server can support many studies.
This reduces the overall costs, the start-up time for a new study, and the overheads of managing the warehouse.
2. Descriptive information about the types of measurement is stored in the warehouse so that tools or humans
can interpret the data stored there.
3. The design is optimised for query performance. In several cases, this has led to denormalization
(duplication of data) to reduce the need for expensive joins.
4. It must support a security regime to restrict each user’s access
to the data collected in studies.
For more information see:
P. Watson and H. Hiden, "The e-Science Central Study Data Platform"
2022 IEEE 18th International Conference on e-Science (e-Science),
Salt Lake City, UT, USA, 2022, pp. 55-64, doi: 10.1109/eScience55777.2022.00020.
https://scholar.google.co.uk/citations?view_op=view_citation&hl=en&user=KQJg3lwAAAAJ&sortby=pubdate&citation_for_view=KQJg3lwAAAAJ:z0_F5_TITjQC
For more documentation see [A Data Warehouse for Storing and Analysing Study Data](docs/data_warehouse_guide.pdf).
# Running Instructions
To install from PyPI, run:
pip install data-warehouse-client
In the directory in which your executable is run, create a `db-credentials.json` file containing database
credentials (substituting all `<VARS>`):
```
{"user": "<USER>", "pass": "<PASSWORD>", "IP": "<IP>", "port": <PORT>}
```
%package -n python3-data-warehouse-client
Summary: This package provides access to the e-Science Central data warehouse that can be used to store, access and analyse data collected in scientific studies, including for healthcare applications
Provides: python-data-warehouse-client
BuildRequires: python3-devel
BuildRequires: python3-setuptools
BuildRequires: python3-pip
%description -n python3-data-warehouse-client
# Data Warehouse Client
This package provides access to the e-Science Central data warehouse that can be used to store, access and analyse
data collected in scientific studies, including for healthcare applications. The primary aim of the warehouse
was to create a general system that enables users to explore data collected in a variety of forms. This might include
data collected through questionnaires, data collected from sensors,
and features extracted from the analysis of sensor data (e.g. activity levels derived from raw accelerometer data).
Researchers might wish to slice, dice, visualise, analyse and explore this data in different ways,
e.g. all results for one participant,
all results for one type of measure in a study,
changes in measurements over time. Others may wish to build models that can then be used in applications
that make predictions about future values.
Traditionally, data collected in studies has been stored in a collection of files,
often with metadata encoded in the filenames.
This makes it difficult and time-consuming for researchers to explore, interpret and analyse the data.
The data warehouse exploits modern database technology to vastly simplify this effort.
In doing this we have drawn heavily on the best practice for data warehouse design.
However, there is more variety in the types of healthcare data to be stored than there is in a typical warehouse,
and so we have been forced to deviate from a conventional data warehouse in some aspects of the design.
There are four guiding principles behind the design:
1. The data warehouse must be able to store any type of data collected in a study without modifying the schema.
This means that when new types of data are collected in studies (e.g. from a new questionnaire,
a new data analysis program, or a new sensor) they can be stored in the warehouse without any changes to its design.
This has three main advantages:
firstly, it enables us to fix and optimise the schema for the tables in which the data is stored;
secondly, it means that applications and tools (e.g. for analysis and visualisation)
built on the warehouse do not have to be updated when new types of data are added;
thirdly, a single, multi-tenant database server can support many studies.
This reduces the overall costs, the start-up time for a new study, and the overheads of managing the warehouse.
2. Descriptive information about the types of measurement is stored in the warehouse so that tools or humans
can interpret the data stored there.
3. The design is optimised for query performance. In several cases, this has led to denormalization
(duplication of data) to reduce the need for expensive joins.
4. It must support a security regime to restrict each user’s access
to the data collected in studies.
For more information see:
P. Watson and H. Hiden, "The e-Science Central Study Data Platform"
2022 IEEE 18th International Conference on e-Science (e-Science),
Salt Lake City, UT, USA, 2022, pp. 55-64, doi: 10.1109/eScience55777.2022.00020.
https://scholar.google.co.uk/citations?view_op=view_citation&hl=en&user=KQJg3lwAAAAJ&sortby=pubdate&citation_for_view=KQJg3lwAAAAJ:z0_F5_TITjQC
For more documentation see [A Data Warehouse for Storing and Analysing Study Data](docs/data_warehouse_guide.pdf).
# Running Instructions
To install from PyPI, run:
pip install data-warehouse-client
In the directory in which your executable is run, create a `db-credentials.json` file containing database
credentials (substituting all `<VARS>`):
```
{"user": "<USER>", "pass": "<PASSWORD>", "IP": "<IP>", "port": <PORT>}
```
%package help
Summary: Development documents and examples for data-warehouse-client
Provides: python3-data-warehouse-client-doc
%description help
# Data Warehouse Client
This package provides access to the e-Science Central data warehouse that can be used to store, access and analyse
data collected in scientific studies, including for healthcare applications. The primary aim of the warehouse
was to create a general system that enables users to explore data collected in a variety of forms. This might include
data collected through questionnaires, data collected from sensors,
and features extracted from the analysis of sensor data (e.g. activity levels derived from raw accelerometer data).
Researchers might wish to slice, dice, visualise, analyse and explore this data in different ways,
e.g. all results for one participant,
all results for one type of measure in a study,
changes in measurements over time. Others may wish to build models that can then be used in applications
that make predictions about future values.
Traditionally, data collected in studies has been stored in a collection of files,
often with metadata encoded in the filenames.
This makes it difficult and time-consuming for researchers to explore, interpret and analyse the data.
The data warehouse exploits modern database technology to vastly simplify this effort.
In doing this we have drawn heavily on the best practice for data warehouse design.
However, there is more variety in the types of healthcare data to be stored than there is in a typical warehouse,
and so we have been forced to deviate from a conventional data warehouse in some aspects of the design.
There are four guiding principles behind the design:
1. The data warehouse must be able to store any type of data collected in a study without modifying the schema.
This means that when new types of data are collected in studies (e.g. from a new questionnaire,
a new data analysis program, or a new sensor) they can be stored in the warehouse without any changes to its design.
This has three main advantages:
firstly, it enables us to fix and optimise the schema for the tables in which the data is stored;
secondly, it means that applications and tools (e.g. for analysis and visualisation)
built on the warehouse do not have to be updated when new types of data are added;
thirdly, a single, multi-tenant database server can support many studies.
This reduces the overall costs, the start-up time for a new study, and the overheads of managing the warehouse.
2. Descriptive information about the types of measurement is stored in the warehouse so that tools or humans
can interpret the data stored there.
3. The design is optimised for query performance. In several cases, this has led to denormalization
(duplication of data) to reduce the need for expensive joins.
4. It must support a security regime to restrict each user’s access
to the data collected in studies.
For more information see:
P. Watson and H. Hiden, "The e-Science Central Study Data Platform"
2022 IEEE 18th International Conference on e-Science (e-Science),
Salt Lake City, UT, USA, 2022, pp. 55-64, doi: 10.1109/eScience55777.2022.00020.
https://scholar.google.co.uk/citations?view_op=view_citation&hl=en&user=KQJg3lwAAAAJ&sortby=pubdate&citation_for_view=KQJg3lwAAAAJ:z0_F5_TITjQC
For more documentation see [A Data Warehouse for Storing and Analysing Study Data](docs/data_warehouse_guide.pdf).
# Running Instructions
To install from PyPI, run:
pip install data-warehouse-client
In the directory in which your executable is run, create a `db-credentials.json` file containing database
credentials (substituting all `<VARS>`):
```
{"user": "<USER>", "pass": "<PASSWORD>", "IP": "<IP>", "port": <PORT>}
```
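Before connecting, it can help to check that the credentials file is well formed. The following is a minimal sketch using only the Python standard library; the helper name `load_credentials` is hypothetical and is not part of the data-warehouse-client API. It assumes the four keys shown in the template above and that `<PORT>` is written as a bare number.

```python
import json
from pathlib import Path

# Keys taken from the db-credentials.json template in this README.
REQUIRED_KEYS = ("user", "pass", "IP", "port")


def load_credentials(path="db-credentials.json"):
    """Hypothetical helper: read the credentials file and verify its keys."""
    with Path(path).open(encoding="utf-8") as handle:
        credentials = json.load(handle)
    # Report every missing key at once rather than failing on the first one.
    missing = [key for key in REQUIRED_KEYS if key not in credentials]
    if missing:
        raise KeyError("missing keys in credentials file: " + ", ".join(missing))
    return credentials
```

Calling `load_credentials()` from the directory in which the executable is run returns the parsed dictionary, or raises `KeyError` naming any keys that are absent.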
%prep
%autosetup -n data-warehouse-client-3.0.2
%build
%py3_build
%install
%py3_install
install -d -m755 %{buildroot}/%{_pkgdocdir}
if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
pushd %{buildroot}
if [ -d usr/lib ]; then
find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/lib64 ]; then
find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/bin ]; then
find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/sbin ]; then
find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
fi
touch doclist.lst
if [ -d usr/share/man ]; then
find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
fi
popd
mv %{buildroot}/filelist.lst .
mv %{buildroot}/doclist.lst .
%files -n python3-data-warehouse-client -f filelist.lst
%dir %{python3_sitelib}/*
%files help -f doclist.lst
%{_docdir}/*
%changelog
* Fri May 05 2023 Python_Bot <Python_Bot@openeuler.org> - 3.0.2-1
- Package Spec generated