%global _empty_manifest_terminate_build 0
Name:		python-cmlkit
Version:	2.0.0a26
Release:	1
Summary:	Machine learning tools for computational chemistry and condensed matter physics
License:	MIT
URL:		https://github.com/sirmarcel/cmlkit
Source0:	https://mirrors.aliyun.com/pypi/web/packages/ea/80/62ec14536f7954f7890534061b4524952b35316e698fa931c5cfacf900d0/cmlkit-2.0.0a26.tar.gz
BuildArch:	noarch

Requires:	python3-hyperopt
Requires:	python3-numpy
Requires:	python3-ase
Requires:	python3-PyYAML
Requires:	python3-joblib
Requires:	python3-pebble
Requires:	python3-dill
Requires:	python3-son

%description
# cmlkit 🐫🧰

![PyPI - Python Version](https://img.shields.io/pypi/pyversions/cmlkit.svg) [![PyPI](https://img.shields.io/pypi/v/cmlkit.svg)](https://pypi.org/project/cmlkit/) [![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/python/black) 

Publications: [`repbench`: Langer, Gößmann, Rupp (2020)](https://marcel.science/repbench)

Plugins: [`cscribe 🐫🖋️`](https://github.com/sirmarcel/cscribe) | [`mortimer 🎩⏰`](https://gitlab.com/sirmarcel/mortimer) | [`skrrt 🚗💨`](https://gitlab.com/sirmarcel/skrrt)

***

`cmlkit` is an extensible `python` package providing clean and concise infrastructure to specify, tune, and evaluate machine learning models for computational chemistry and condensed matter physics. Intended as a common foundation for more specialised systems, not a monolithic user-facing tool, it wants to help you build your own tools! ✨

*If you use this code in any scientific work, please mention it in the publication, cite [the paper](https://marcel.science/repbench) and let me know. Thanks! 🐫*

## What exactly is `cmlkit`?

[💡 A tutorial introduction to `cmlkit` courtesy of the NOMAD Analytics Toolkit 💡](https://www.nomad-coe.eu/index.php?page=bigdata-analyticstoolkit)

*Sidenote*: If you've come across this from outside the "ML for materials and chemistry" world, this will unfortunately be of limited use for you! However, if you're interested in ML infrastructure in general, please take a look at `engine` and `tune`, which are not specific to this domain and might be of interest.

### Features

- Reasonably clean, composable, modern codebase with little magic ✨

#### Representations

`cmlkit` provides a unified interface for:

- Many-Body Tensor Representation by [Huo, Rupp (2017)](https://arxiv.org/abs/1704.06439) (`qmmlpack` and `dscribe` implementations)
- Smooth Overlap of Atomic Positions representation by [Bartók, Kondor, Csányi (2013)](https://doi.org/10.1103/PhysRevB.87.184115) (`quippy`‡ and `dscribe` implementations)
- Symmetry Functions representation by [Behler (2011)](https://doi.org/10.1063/1.3553717) (`RuNNer` and `dscribe` implementations), with a semi-automatic parametrisation scheme taken from [Gastegger et al. (2018)](https://doi.org/10.1063/1.5019667).

‡ The `quippy` interface was written for an older version that didn't support `python3`.

#### Regression methods

- Kernel Ridge Regression as implemented in [`qmmlpack`](https://gitlab.com/qmml/qmmlpack) (supporting both global and local/atomic representations)

#### Hyper-parameter tuning

- Robust multi-core support (i.e. it can automatically kill timed out external code, even if it ignores `SIGTERM`)
- No `mongodb` required
- Extensions to the `hyperopt` priors (uniform `log` grids)
- Resumable/recoverable runs backed by a readable, atomically written history of the optimisation (backed by [`son`](https://github.com/flokno/son))
- Search spaces can be defined entirely in text, i.e. they're easily writeable, portable and serialisable
- Possibility to implement multi-step optimisation (experimental at the moment)
- Extensible with custom loss functions or training loops
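
To make the "uniform `log` grids" item a bit more concrete, here is a minimal sketch of what such a grid could look like: exponents are spaced evenly, and the values are powers of a fixed base. The function name `log_grid`, its signature, and the default base of 2 are assumptions made for illustration; they are not taken from `cmlkit` or `hyperopt`.

```python
# Illustrative sketch only: `log_grid` is a hypothetical helper,
# not the cmlkit or hyperopt implementation.


def log_grid(low, high, num, base=2.0):
    """Return `num` values base**x with x evenly spaced in [low, high]."""
    if num < 2:
        raise ValueError("need at least two grid points")
    step = (high - low) / (num - 1)
    return [base ** (low + i * step) for i in range(num)]


# Five candidate values between 2**-2 and 2**2, uniform in the exponent.
grid = log_grid(-2, 2, 5)
print(grid)
```

A search over such a grid treats every order of magnitude equally, which tends to suit parameters like length scales or regularisation strengths.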

#### Various

- Automated loading of datasets by name
- Seamless conversion of properties into per-atom or per-system quantities. Models can do this automatically!
- Plugin system! ☢️ Isolate one-off nightmares! ☢️
- Canonical, stable hashes of models and datasets!
- Automatically train models and compute losses!
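
One common way to get a canonical, stable hash of a `dict`-based specification is to serialise it with sorted keys and hash the resulting bytes. The sketch below shows that general technique using only the standard library; `stable_hash` is a hypothetical helper, and the actual hashing scheme used inside `cmlkit` may well differ.

```python
import hashlib
import json

# Illustrative sketch only: `stable_hash` is a hypothetical helper and
# not necessarily how cmlkit computes its hashes.


def stable_hash(config):
    """Hash a JSON-serialisable dict independently of key order."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Same content, different key order: the hashes agree.
a = {"krr": {"nl": 1e-07, "kernel": "gaussian"}}
b = {"krr": {"kernel": "gaussian", "nl": 1e-07}}
same = stable_hash(a) == stable_hash(b)

# Changing a parameter changes the hash.
c = {"krr": {"kernel": "gaussian", "nl": 1e-06}}
different = stable_hash(a) != stable_hash(c)
print(same, different)
```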

### But what... is it?

At its core, `cmlkit` defines a unified `dict`-based format to specify model components, which can be straightforwardly read and written as `yaml`. Model components are implemented as pure-ish functions, which is conceptually satisfying and opens the door to easy pipelining and caching. Using this format, `cmlkit` provides interfaces to many representations and a fast kernel ridge regression implementation.

Here is an example for a SOAP+KRR model:

```yaml
model:
  per: cell
  regression:
    krr:               # regression method: kernel ridge regression
      kernel:
        kernel_atomic: # soap is a local representation, so we use the appropriate kernel
          kernelf:
            gaussian:  # gaussian kernel
              ls: 80   # ... with length scale 80
      nl: 1.0e-07      # regularisation parameter
  representation:
    ds_soap:           # SOAP representation (dscribe implementation via plugin)
      cutoff: 3
      elems: [8, 13, 31, 49]
      l_max: 8
      n_max: 2
      sigma: 0.5
```
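
Read as a plain Python `dict`, the specification above is a nesting of single-key dictionaries of the form `{kind: parameters}`. The sketch below writes the same model out by hand and picks it apart; `split_component` is a hypothetical helper introduced only for this illustration and is not part of the `cmlkit` API.

```python
# Illustrative sketch only: the dict mirrors the YAML example, and
# `split_component` is a hypothetical helper, not a cmlkit function.

model = {
    "model": {
        "per": "cell",
        "regression": {
            "krr": {
                "kernel": {
                    "kernel_atomic": {
                        "kernelf": {"gaussian": {"ls": 80}},
                    }
                },
                "nl": 1.0e-07,
            }
        },
        "representation": {
            "ds_soap": {
                "cutoff": 3,
                "elems": [8, 13, 31, 49],
                "l_max": 8,
                "n_max": 2,
                "sigma": 0.5,
            }
        },
    }
}


def split_component(config):
    """Split a single-key dict {kind: inner} into (kind, inner)."""
    if len(config) != 1:
        raise ValueError("expected exactly one top-level key")
    kind, inner = next(iter(config.items()))
    return kind, inner


kind, inner = split_component(model)
rep_kind, rep_params = split_component(inner["representation"])
reg_kind, reg_params = split_component(inner["regression"])
print(kind, rep_kind, reg_kind)
```

Because every component is just data, a specification like this can be written to text, compared, or swapped out piece by piece.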

Having a canonical model format allows `cmlkit` to provide a quite pleasant interface to `hyperopt`. The same mechanism *also* enables a simple plugin system, making `cmlkit` easily extensible, so you can isolate one-off task-specific code into separate projects without any problems, while making use of a solid, if opinionated, foundation.

For a gentle, detailed tour please [check out the tutorial](https://www.nomad-coe.eu/index.php?page=bigdata-analyticstoolkit).

### Caveats 😬

Okay then, what are the rough parts?

- `cmlkit` is very inconvenient for interactive and non-automated use: Models cannot be saved and caching is not enabled yet, so all computations (representation, kernel matrices, etc.) must be re-run from scratch upon restart. This is not a problem during HP optimisation, as there the point is to try *different* models, but it is annoying for exploring a single model in detail. Fixing this is an *active* consideration, though! After all, the code is written with caching in mind.
- `cmlkit` is and will remain "scientific research software", i.e. it is prone to somewhat haphazard development practices and periods of hibernation. I'll do my best to avoid breaking changes and abandonment, but you know how it is!
- `cmlkit` is currently in an "alpha" state. While it's pretty stable and well-tested for some specific use cases (like writing a [large-scale benchmarking paper](https://marcel.science/repbench)), it's not tested for more everyday use. There are also some internal loose ends that need to be tied up.
- `cmlkit` is not particularly user friendly at the moment, and expects its users to be python developers. See below for notes on documentation! 😀

## Installation and friends

`cmlkit` is available via pip:

```
pip install cmlkit
```

You can also clone this repository! I'd suggest having a look into the codebase in any case, as there is currently no external documentation.

If you want to do any "real" work with `cmlkit`, you'll need to install [`qmmlpack`](https://gitlab.com/qmml/qmmlpack/-/tree/development) **on the development branch**. It's fairly straightforward!

***

In order to compute representations with `dscribe`, you should install the [`cscribe`](https://github.com/sirmarcel/cscribe) plugin:

```
pip install cscribe
```
You also need to export `CML_PLUGINS=cscribe`.

To set up the `quippy` and `RuNNer` interfaces, please consult the readmes in `cmlkit/representation/soap` and `cmlkit/representation/sf`.

***

For details on environment variables and such things, please consult the readme in the `cmlkit` folder.

## "Frequently" Asked Questions

### Where is the documentation?

At the moment, I don't think it's feasible for me to maintain separate written docs, and I believe that purely auto-generated docs are basically a worse version of just looking at the formatted source on GitHub or in your text editor. So I *highly* encourage you to take a look there!

Most submodules in `cmlkit` have their own `README.md` documenting what's going on in them, and all "outside facing" classes have extensive docstrings. I hope that's sufficient! Please feel free to file an issue if you have any questions.

### I don't work in computational chemistry/condensed matter physics. Should I care?

The short answer is, regrettably, probably no.

However, I think the architecture of this library is quite neat, so maybe it can provide some marginally interesting reading. The `tune` component is very general and provides, in my opinion, a delightfully clean interface to `hyperopt`. The `engine` is also rather general and provides a nice way to serialise specific kinds of python objects to `yaml`.

### Why should I use this?

Well, maybe if you:

- need to use any of the libraries mentioned above, especially if you want to use them in the same project with the same infrastructure,
- are tired of plain `hyperopt`,
- would like to be able to save your model parameters in a readable format,
- think it's neat?

My goal with this is to make it slightly easier for you to build up your own infrastructure for studying models and applications in our field! If you're just starting out, just take a look around!



%package -n python3-cmlkit
Summary:	Machine learning tools for computational chemistry and condensed matter physics
Provides:	python-cmlkit
BuildRequires:	python3-devel
BuildRequires:	python3-setuptools
BuildRequires:	python3-pip
%description -n python3-cmlkit
# cmlkit 🐫🧰

![PyPI - Python Version](https://img.shields.io/pypi/pyversions/cmlkit.svg) [![PyPI](https://img.shields.io/pypi/v/cmlkit.svg)](https://pypi.org/project/cmlkit/) [![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/python/black) 

Publications: [`repbench`: Langer, Gößmann, Rupp (2020)](https://marcel.science/repbench)

Plugins: [`cscribe 🐫🖋️`](https://github.com/sirmarcel/cscribe) | [`mortimer 🎩⏰`](https://gitlab.com/sirmarcel/mortimer) | [`skrrt 🚗💨`](https://gitlab.com/sirmarcel/skrrt)

***

`cmlkit` is an extensible `python` package providing clean and concise infrastructure to specify, tune, and evaluate machine learning models for computational chemistry and condensed matter physics. Intended as a common foundation for more specialised systems, not a monolithic user-facing tool, it wants to help you build your own tools! ✨

*If you use this code in any scientific work, please mention it in the publication, cite [the paper](https://marcel.science/repbench) and let me know. Thanks! 🐫*

## What exactly is `cmlkit`?

[💡 A tutorial introduction to `cmlkit` courtesy of the NOMAD Analytics Toolkit 💡](https://www.nomad-coe.eu/index.php?page=bigdata-analyticstoolkit)

*Sidenote*: If you've come across this from outside the "ML for materials and chemistry" world, this will unfortunately be of limited use for you! However, if you're interested in ML infrastructure in general, please take a look at `engine` and `tune`, which are not specific to this domain and might be of interest.

### Features

- Reasonably clean, composable, modern codebase with little magic ✨

#### Representations

`cmlkit` provides a unified interface for:

- Many-Body Tensor Representation by [Huo, Rupp (2017)](https://arxiv.org/abs/1704.06439) (`qmmlpack` and `dscribe` implementations)
- Smooth Overlap of Atomic Positions representation by [Bartók, Kondor, Csányi (2013)](https://doi.org/10.1103/PhysRevB.87.184115) (`quippy`‡ and `dscribe` implementations)
- Symmetry Functions representation by [Behler (2011)](https://doi.org/10.1063/1.3553717) (`RuNNer` and `dscribe` implementations), with a semi-automatic parametrisation scheme taken from [Gastegger et al. (2018)](https://doi.org/10.1063/1.5019667).

‡ The `quippy` interface was written for an older version that didn't support `python3`.

#### Regression methods

- Kernel Ridge Regression as implemented in [`qmmlpack`](https://gitlab.com/qmml/qmmlpack) (supporting both global and local/atomic representations)

#### Hyper-parameter tuning

- Robust multi-core support (i.e. it can automatically kill timed out external code, even if it ignores `SIGTERM`)
- No `mongodb` required
- Extensions to the `hyperopt` priors (uniform `log` grids)
- Resumable/recoverable runs backed by a readable, atomically written history of the optimisation (backed by [`son`](https://github.com/flokno/son))
- Search spaces can be defined entirely in text, i.e. they're easily writeable, portable and serialisable
- Possibility to implement multi-step optimisation (experimental at the moment)
- Extensible with custom loss functions or training loops

#### Various

- Automated loading of datasets by name
- Seamless conversion of properties into per-atom or per-system quantities. Models can do this automatically!
- Plugin system! ☢️ Isolate one-off nightmares! ☢️
- Canonical, stable hashes of models and datasets!
- Automatically train models and compute losses!

### But what... is it?

At its core, `cmlkit` defines a unified `dict`-based format to specify model components, which can be straightforwardly read and written as `yaml`. Model components are implemented as pure-ish functions, which is conceptually satisfying and opens the door to easy pipelining and caching. Using this format, `cmlkit` provides interfaces to many representations and a fast kernel ridge regression implementation.

Here is an example for a SOAP+KRR model:

```yaml
model:
  per: cell
  regression:
    krr:               # regression method: kernel ridge regression
      kernel:
        kernel_atomic: # soap is a local representation, so we use the appropriate kernel
          kernelf:
            gaussian:  # gaussian kernel
              ls: 80   # ... with length scale 80
      nl: 1.0e-07      # regularisation parameter
  representation:
    ds_soap:           # SOAP representation (dscribe implementation via plugin)
      cutoff: 3
      elems: [8, 13, 31, 49]
      l_max: 8
      n_max: 2
      sigma: 0.5
```

Having a canonical model format allows `cmlkit` to provide a quite pleasant interface to `hyperopt`. The same mechanism *also* enables a simple plugin system, making `cmlkit` easily extensible, so you can isolate one-off task-specific code into separate projects without any problems, while making use of a solid, if opinionated, foundation.

For a gentle, detailed tour please [check out the tutorial](https://www.nomad-coe.eu/index.php?page=bigdata-analyticstoolkit).

### Caveats 😬

Okay then, what are the rough parts?

- `cmlkit` is very inconvenient for interactive and non-automated use: Models cannot be saved and caching is not enabled yet, so all computations (representation, kernel matrices, etc.) must be re-run from scratch upon restart. This is not a problem during HP optimisation, as there the point is to try *different* models, but it is annoying for exploring a single model in detail. Fixing this is an *active* consideration, though! After all, the code is written with caching in mind.
- `cmlkit` is and will remain "scientific research software", i.e. it is prone to somewhat haphazard development practices and periods of hibernation. I'll do my best to avoid breaking changes and abandonment, but you know how it is!
- `cmlkit` is currently in an "alpha" state. While it's pretty stable and well-tested for some specific use cases (like writing a [large-scale benchmarking paper](https://marcel.science/repbench)), it's not tested for more everyday use. There are also some internal loose ends that need to be tied up.
- `cmlkit` is not particularly user friendly at the moment, and expects its users to be python developers. See below for notes on documentation! 😀

## Installation and friends

`cmlkit` is available via pip:

```
pip install cmlkit
```

You can also clone this repository! I'd suggest having a look into the codebase in any case, as there is currently no external documentation.

If you want to do any "real" work with `cmlkit`, you'll need to install [`qmmlpack`](https://gitlab.com/qmml/qmmlpack/-/tree/development) **on the development branch**. It's fairly straightforward!

***

In order to compute representations with `dscribe`, you should install the [`cscribe`](https://github.com/sirmarcel/cscribe) plugin:

```
pip install cscribe
```
You also need to export `CML_PLUGINS=cscribe`.

To set up the `quippy` and `RuNNer` interfaces, please consult the readmes in `cmlkit/representation/soap` and `cmlkit/representation/sf`.

***

For details on environment variables and such things, please consult the readme in the `cmlkit` folder.

## "Frequently" Asked Questions

### Where is the documentation?

At the moment, I don't think it's feasible for me to maintain separate written docs, and I believe that purely auto-generated docs are basically a worse version of just looking at the formatted source on GitHub or in your text editor. So I *highly* encourage you to take a look there!

Most submodules in `cmlkit` have their own `README.md` documenting what's going on in them, and all "outside facing" classes have extensive docstrings. I hope that's sufficient! Please feel free to file an issue if you have any questions.

### I don't work in computational chemistry/condensed matter physics. Should I care?

The short answer is, regrettably, probably no.

However, I think the architecture of this library is quite neat, so maybe it can provide some marginally interesting reading. The `tune` component is very general and provides, in my opinion, a delightfully clean interface to `hyperopt`. The `engine` is also rather general and provides a nice way to serialise specific kinds of python objects to `yaml`.

### Why should I use this?

Well, maybe if you:

- need to use any of the libraries mentioned above, especially if you want to use them in the same project with the same infrastructure,
- are tired of plain `hyperopt`,
- would like to be able to save your model parameters in a readable format,
- think it's neat?

My goal with this is to make it slightly easier for you to build up your own infrastructure for studying models and applications in our field! If you're just starting out, just take a look around!



%package help
Summary:	Development documents and examples for cmlkit
Provides:	python3-cmlkit-doc
%description help
# cmlkit 🐫🧰

![PyPI - Python Version](https://img.shields.io/pypi/pyversions/cmlkit.svg) [![PyPI](https://img.shields.io/pypi/v/cmlkit.svg)](https://pypi.org/project/cmlkit/) [![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/python/black) 

Publications: [`repbench`: Langer, Gößmann, Rupp (2020)](https://marcel.science/repbench)

Plugins: [`cscribe 🐫🖋️`](https://github.com/sirmarcel/cscribe) | [`mortimer 🎩⏰`](https://gitlab.com/sirmarcel/mortimer) | [`skrrt 🚗💨`](https://gitlab.com/sirmarcel/skrrt)

***

`cmlkit` is an extensible `python` package providing clean and concise infrastructure to specify, tune, and evaluate machine learning models for computational chemistry and condensed matter physics. Intended as a common foundation for more specialised systems, not a monolithic user-facing tool, it wants to help you build your own tools! ✨

*If you use this code in any scientific work, please mention it in the publication, cite [the paper](https://marcel.science/repbench) and let me know. Thanks! 🐫*

## What exactly is `cmlkit`?

[💡 A tutorial introduction to `cmlkit` courtesy of the NOMAD Analytics Toolkit 💡](https://www.nomad-coe.eu/index.php?page=bigdata-analyticstoolkit)

*Sidenote*: If you've come across this from outside the "ML for materials and chemistry" world, this will unfortunately be of limited use for you! However, if you're interested in ML infrastructure in general, please take a look at `engine` and `tune`, which are not specific to this domain and might be of interest.

### Features

- Reasonably clean, composable, modern codebase with little magic ✨

#### Representations

`cmlkit` provides a unified interface for:

- Many-Body Tensor Representation by [Huo, Rupp (2017)](https://arxiv.org/abs/1704.06439) (`qmmlpack` and `dscribe` implementations)
- Smooth Overlap of Atomic Positions representation by [Bartók, Kondor, Csányi (2013)](https://doi.org/10.1103/PhysRevB.87.184115) (`quippy`‡ and `dscribe` implementations)
- Symmetry Functions representation by [Behler (2011)](https://doi.org/10.1063/1.3553717) (`RuNNer` and `dscribe` implementations), with a semi-automatic parametrisation scheme taken from [Gastegger et al. (2018)](https://doi.org/10.1063/1.5019667).

‡ The `quippy` interface was written for an older version that didn't support `python3`.

#### Regression methods

- Kernel Ridge Regression as implemented in [`qmmlpack`](https://gitlab.com/qmml/qmmlpack) (supporting both global and local/atomic representations)

#### Hyper-parameter tuning

- Robust multi-core support (i.e. it can automatically kill timed out external code, even if it ignores `SIGTERM`)
- No `mongodb` required
- Extensions to the `hyperopt` priors (uniform `log` grids)
- Resumable/recoverable runs backed by a readable, atomically written history of the optimisation (backed by [`son`](https://github.com/flokno/son))
- Search spaces can be defined entirely in text, i.e. they're easily writeable, portable and serialisable
- Possibility to implement multi-step optimisation (experimental at the moment)
- Extensible with custom loss functions or training loops

#### Various

- Automated loading of datasets by name
- Seamless conversion of properties into per-atom or per-system quantities. Models can do this automatically!
- Plugin system! ☢️ Isolate one-off nightmares! ☢️
- Canonical, stable hashes of models and datasets!
- Automatically train models and compute losses!

### But what... is it?

At its core, `cmlkit` defines a unified `dict`-based format to specify model components, which can be straightforwardly read and written as `yaml`. Model components are implemented as pure-ish functions, which is conceptually satisfying and opens the door to easy pipelining and caching. Using this format, `cmlkit` provides interfaces to many representations and a fast kernel ridge regression implementation.

Here is an example for a SOAP+KRR model:

```yaml
model:
  per: cell
  regression:
    krr:               # regression method: kernel ridge regression
      kernel:
        kernel_atomic: # soap is a local representation, so we use the appropriate kernel
          kernelf:
            gaussian:  # gaussian kernel
              ls: 80   # ... with length scale 80
      nl: 1.0e-07      # regularisation parameter
  representation:
    ds_soap:           # SOAP representation (dscribe implementation via plugin)
      cutoff: 3
      elems: [8, 13, 31, 49]
      l_max: 8
      n_max: 2
      sigma: 0.5
```

Having a canonical model format allows `cmlkit` to provide a quite pleasant interface to `hyperopt`. The same mechanism *also* enables a simple plugin system, making `cmlkit` easily extensible, so you can isolate one-off task-specific code into separate projects without any problems, while making use of a solid, if opinionated, foundation.

For a gentle, detailed tour please [check out the tutorial](https://www.nomad-coe.eu/index.php?page=bigdata-analyticstoolkit).

### Caveats 😬

Okay then, what are the rough parts?

- `cmlkit` is very inconvenient for interactive and non-automated use: Models cannot be saved and caching is not enabled yet, so all computations (representation, kernel matrices, etc.) must be re-run from scratch upon restart. This is not a problem during HP optimisation, as there the point is to try *different* models, but it is annoying for exploring a single model in detail. Fixing this is an *active* consideration, though! After all, the code is written with caching in mind.
- `cmlkit` is and will remain "scientific research software", i.e. it is prone to somewhat haphazard development practices and periods of hibernation. I'll do my best to avoid breaking changes and abandonment, but you know how it is!
- `cmlkit` is currently in an "alpha" state. While it's pretty stable and well-tested for some specific use cases (like writing a [large-scale benchmarking paper](https://marcel.science/repbench)), it's not tested for more everyday use. There are also some internal loose ends that need to be tied up.
- `cmlkit` is not particularly user friendly at the moment, and expects its users to be python developers. See below for notes on documentation! 😀

## Installation and friends

`cmlkit` is available via pip:

```
pip install cmlkit
```

You can also clone this repository! I'd suggest having a look into the codebase in any case, as there is currently no external documentation.

If you want to do any "real" work with `cmlkit`, you'll need to install [`qmmlpack`](https://gitlab.com/qmml/qmmlpack/-/tree/development) **on the development branch**. It's fairly straightforward!

***

In order to compute representations with `dscribe`, you should install the [`cscribe`](https://github.com/sirmarcel/cscribe) plugin:

```
pip install cscribe
```
You also need to export `CML_PLUGINS=cscribe`.

To set up the `quippy` and `RuNNer` interfaces, please consult the readmes in `cmlkit/representation/soap` and `cmlkit/representation/sf`.

***

For details on environment variables and such things, please consult the readme in the `cmlkit` folder.

## "Frequently" Asked Questions

### Where is the documentation?

At the moment, I don't think it's feasible for me to maintain separate written docs, and I believe that purely auto-generated docs are basically a worse version of just looking at the formatted source on GitHub or in your text editor. So I *highly* encourage you to take a look there!

Most submodules in `cmlkit` have their own `README.md` documenting what's going on in them, and all "outside facing" classes have extensive docstrings. I hope that's sufficient! Please feel free to file an issue if you have any questions.

### I don't work in computational chemistry/condensed matter physics. Should I care?

The short answer is, regrettably, probably no.

However, I think the architecture of this library is quite neat, so maybe it can provide some marginally interesting reading. The `tune` component is very general and provides, in my opinion, a delightfully clean interface to `hyperopt`. The `engine` is also rather general and provides a nice way to serialise specific kinds of python objects to `yaml`.

### Why should I use this?

Well, maybe if you:

- need to use any of the libraries mentioned above, especially if you want to use them in the same project with the same infrastructure,
- are tired of plain `hyperopt`,
- would like to be able to save your model parameters in a readable format,
- think it's neat?

My goal with this is to make it slightly easier for you to build up your own infrastructure for studying models and applications in our field! If you're just starting out, just take a look around!



%prep
%autosetup -n cmlkit-2.0.0a26

%build
%py3_build

%install
%py3_install
install -d -m755 %{buildroot}/%{_pkgdocdir}
if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
pushd %{buildroot}
if [ -d usr/lib ]; then
	find usr/lib -type f -printf "\"/%h/%f\"\n" >> filelist.lst
fi
if [ -d usr/lib64 ]; then
	find usr/lib64 -type f -printf "\"/%h/%f\"\n" >> filelist.lst
fi
if [ -d usr/bin ]; then
	find usr/bin -type f -printf "\"/%h/%f\"\n" >> filelist.lst
fi
if [ -d usr/sbin ]; then
	find usr/sbin -type f -printf "\"/%h/%f\"\n" >> filelist.lst
fi
touch doclist.lst
if [ -d usr/share/man ]; then
	find usr/share/man -type f -printf "\"/%h/%f.gz\"\n" >> doclist.lst
fi
popd
mv %{buildroot}/filelist.lst .
mv %{buildroot}/doclist.lst .

%files -n python3-cmlkit -f filelist.lst
%dir %{python3_sitelib}/*

%files help -f doclist.lst
%{_docdir}/*

%changelog
* Tue Jun 20 2023 Python_Bot <Python_Bot@openeuler.org> - 2.0.0a26-1
- Package Spec generated