%global _empty_manifest_terminate_build 0
Name: python-doc-warden
Version: 0.7.2
Release: 1
Summary: Doc-Warden is an internal project created by the Azure SDK Team. It is intended to be used by CI Builds to ensure that documentation standards are met. See readme for more details.
License: MIT License
URL: https://github.com/Azure/azure-sdk-tools/
Source0: https://mirrors.nju.edu.cn/pypi/web/packages/f1/21/c4833c48ce08e85e63726fc0dd36926e114bdb3951699630bc30fdc34b16/doc-warden-0.7.2.zip
BuildArch: noarch
Requires: python3-pyyaml
Requires: python3-markdown2
Requires: python3-docutils
Requires: python3-pygments
Requires: python3-beautifulsoup4
Requires: python3-jinja2
Requires: python3-requests
Requires: python3-pathlib2
%description
# Doc Warden
Every CI build owned by the Azure-SDK team also needs to verify that the documentation within the target repo meets a set of standards. `Doc-warden` is intended to ease the _implementation_ of these checks in CI builds.
Features:
* Enforces Readme Standards
    - Readmes present
    - Readmes have appropriate content
* Enforces Changelog Standards
    - Changelogs present
    - Changelogs contain entry and content for the latest package version
* Generates report for included observed packages
This package is tested on Python 3.4 through 3.8. It has been Python 3 only since `0.7.0`.
## Prerequisites
This package is intended to be run as part of a pipeline within Azure DevOps. As such, [Python](https://www.python.org/downloads/) must be installed before you install or use `Doc-Warden`. While `pip` comes pre-installed on most modern Python installs, if `pip` is an unrecognized command when attempting to install `warden`, run the following command **after** your Python installation is complete.
In addition, `warden` is distributed using `setuptools` and `wheel`, so those packages should also be present prior to install.
```
/:> python -m ensurepip
/:> pip install setuptools wheel
```
## Usage
Right now, `warden` serves two main purposes: (1) readme and changelog enforcement (`scan`, `content`, `presence`) and (2) package indexing (`index`).
### Example usage (for any of the above commands):
```
<pre-step, clone target repository>
...
/:> pip install setuptools wheel
/:> pip install doc-warden
...
<next task, because PATH doesn't update without another one>
/:> ward scan -d $(Build.SourcesDirectory)
```
**Notes for example above**
* Assumption is that the `.docsettings` file is placed at the root of the repository.
To provide a different path (like `azure-sdk-for-java` does...), use:
```
/:> ward scan -d $(Build.SourcesDirectory) -c $(Build.SourcesDirectory)/eng/.docsettings.yml
```
##### Parameter Options
`command`
Currently supports 4 commands. Values: `['scan', 'presence', 'content', 'index']`. **Required.**
* `scan`
    * Run both `content` and `presence` enforcement on the targeted directory.
* `content`
    * Run only `content` enforcement on the target directory. Ensures that:
        - The content in each readme matches the regex patterns defined in the `.docsettings` file.
        - Each changelog contains an entry for the latest version.
* `presence`
    * Run only `presence` enforcement on the target directory. Ensures readmes and changelogs exist where they should.
* `index`
    * Take inventory of the target folder. Attempts to use the selected docsettings to discover all packages within the directory and generate a `packages.md` index file.
`--scan-directory`
The target directory `warden` should be scanning. **Required.**
`--repo-root`
The root of the repo. Entries in the config-file should be relative to this directory. **Optional.**
`--scan-language`
`warden` checks for packages by _convention_, so it needs to understand what language it is looking at. This must be populated either in the `.docsettings` file or by parameter. **Optional.**
`--config-location`
By default, `warden` looks for the `.docsettings` file in the root of the repository. However, populating this location will override this behavior and instead pull the file from the location in this parameter. **Optional.**
`--pipeline-stage`
The stage of the pipeline. Can be `pr`, `ci`, or `release`. **Optional.**
`--target`
Specify which file to run enforcement on: `readme` or `changelog`. Used when running `content` or `presence` verification only. **Optional.**
`--package-output`
Override the default location that the generated `packages.md` file is dropped to during execution of the `index` command.
`--verbose-output`
Enable or disable output of an html report. Defaults to false. **Optional.**
##### Notes for Devops Usage
The `-d` argument should be `$(Build.SourcesDirectory)`. This will point `warden` at the repo that has been associated with CI.
## Methodology
### Enforcing Readme Presence
When should we expect a readme and/or changelog to be present?
**Always:**
* At the root of the repo (Readme only)
* Associated with a `package` directory (Readme and Changelog)
#### .Net
A package directory is indicated by:
* a `*.csproj` file under the `sdk` directory
* Note that this is just a proxy. `warden` attempts to omit test projects by convention.
#### Python
A package directory is indicated by:
* the presence of a `setup.py` file
#### Java
A package directory is indicated by:
* the presence of a `pom.xml` file
* The POM `<packaging>` value within is set to `JAR`
#### Node & JS
A package directory is indicated by:
* The presence of a `package.json` file
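The conventions above can be sketched as a simple directory walk. This is an illustrative sketch only, not `warden`'s actual implementation: the `PACKAGE_MARKERS` table and the `find_package_dirs` function are hypothetical names, and the .NET rule and the Java `<packaging>` check are left out for brevity.

```python
import os

# Hypothetical marker table built from the conventions listed above.
# Simplified: the Java <packaging> check and the .NET *.csproj rule are omitted.
PACKAGE_MARKERS = {
    "python": "setup.py",
    "java": "pom.xml",
    "js": "package.json",
}


def find_package_dirs(scan_directory, language):
    """Return sorted paths (relative to scan_directory) that contain the marker file."""
    marker = PACKAGE_MARKERS[language]
    found = []
    for current, _dirs, files in os.walk(scan_directory):
        if marker in files:
            found.append(os.path.relpath(current, scan_directory))
    return sorted(found)
```

Each returned directory is then a place where a readme and a changelog would be expected.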
### Enforcing Readme Content
`doc-warden` has the ability to check discovered readme files to ensure that a set of configured sections is present. How does it work? `doc-warden` will ensure that each regex defined in `required_readme_sections` matches against at least one section header in the readme. If all the patterns match at least one header, the readme will pass content verification.
Other Notes:
* A `section` header is any markdown or RST that results in an `<h1>` or `<h2>` HTML tag.
* `warden` runs content verification on any `readme.rst` or `readme.md` file found outside the `omitted_paths` in the targeted repo.
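As a rough illustration of the matching rule described above, the sketch below pulls `#` and `##` headers out of markdown text and reports which required patterns match no header. It is a simplified stand-in, not `warden`'s code: the function names are hypothetical, only markdown is handled (no RST, no HTML conversion), and the use of `re.fullmatch` is an assumption about how strictly a pattern must match a title.

```python
import re


def extract_headers(markdown_text):
    """Collect H1/H2 titles written with `#` or `##`, skipping fenced code blocks."""
    headers = []
    in_fence = False
    for line in markdown_text.splitlines():
        if line.startswith("```"):
            in_fence = not in_fence
            continue
        if in_fence:
            continue
        match = re.match(r"^#{1,2}\s+(.*\S)\s*$", line)
        if match:
            headers.append(match.group(1))
    return headers


def missing_sections(markdown_text, required_patterns):
    """Return the patterns that match none of the readme's H1/H2 headers."""
    headers = extract_headers(markdown_text)
    # Assumption: a pattern must match the whole header title.
    return [
        pattern
        for pattern in required_patterns
        if not any(re.fullmatch(pattern, header) for header in headers)
    ]
```

A readme passes this simplified check when `missing_sections` returns an empty list.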
### Enforcing Changelog Content
`doc-warden` checks the latest entry in the changelog file to make sure it matches the latest version of the package. It also checks to make sure that the entry is not empty.
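A minimal sketch of such a check is shown below. It assumes, purely for illustration, that each changelog entry starts with a `## <version>` header and that the newest entry comes first; the `check_latest_entry` function and the header format are hypothetical and may differ from what `warden` actually parses.

```python
import re

# Assumed entry header format: "## 1.2.3" optionally followed by a date.
VERSION_HEADER = re.compile(r"^##\s+(\S+)")


def check_latest_entry(changelog_text, package_version):
    """Return (ok, reason) for the first (newest) entry in the changelog."""
    lines = changelog_text.splitlines()
    starts = [i for i, line in enumerate(lines) if VERSION_HEADER.match(line)]
    if not starts:
        return False, "no version entries found"
    first = starts[0]
    end = starts[1] if len(starts) > 1 else len(lines)
    version = VERSION_HEADER.match(lines[first]).group(1)
    if version != package_version:
        return False, "latest entry is {} but package version is {}".format(
            version, package_version
        )
    # An entry counts as empty when it has no non-blank lines before the next entry.
    body = [line for line in lines[first + 1:end] if line.strip()]
    if not body:
        return False, "entry for {} is empty".format(version)
    return True, "ok"
```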
#### Control, the `.docsettings.yml` File, and You
Special cases often need to be configured. It seems logical that there needs to be a central location (per repo) to override conventional settings. To that end, a new `.docsettings.yml` file will be added to each repo.
```
<repo-root>
│ README.md
│ .docsettings.yml
│
└───.azure-pipelines
│ │ <build def>
│
└───<other files and folders>
```
The presence of this file allows each repository to customize how enforcement takes place within their repo.
**Example DocSettings File for Java Repo**
```
omitted_paths:
- archive/*
- sdk/eventhub/
language: java
root_check_enabled: True
required_readme_sections:
- "(Client Library for Azure .*|Microsoft Azure SDK for .*)"
- Getting Started
- Install Package
- Prerequisites
- Authenticate the Client
known_presence_issues:
- ['cognitiveservices/data-plane/language/bingspellcheck/README.md', '#2847']
- ['cognitiveservices/data-plane/language/bingspellcheck/CHANGELOG.md', '#2847']
known_content_issues:
- ['sdk/template/azure-sdk-template/README.md','#1368']
- ['sdk/template/azure-sdk-template/CHANGELOG.md','#1368']
```
The above configuration tells `warden`...
- The language within the repo is `java`
- To ensure that a `README.md` is present at the root of the repository.
- To omit **any** paths under `archive/` from the readme checks.
- To omit paths found **directly** under `sdk/eventhub/`.
- This means that if there is a readme content issue under `sdk/eventhub/azure-messaging/`, it will still throw an error!
Possible values for `language` right now are `['net', 'java', 'js', 'python']`. More than one target language is not currently supported.
##### `required_readme_sections` Configuration
This section instructs `warden` to verify that there is at least one matching section title for each provided `section` pattern in any discovered readme. Notice that **nested** specifications are supported. Regex is fully supported.
The two items listed from the example `.docsettings` file will:
- Match a header matched by a simple regex expression
- Match a header exactly titled "Getting Started"
- Under the header "Getting Started" validate that 3 additional headings are present.
- `doc-warden` will search up to the next header of equivalent importance for the sub-headings.
- This means that when searching under header `# Getting Started`, doc-warden will scan up to the next `H1` header.
Note that a regex is wrapped in quotation marks wherever it would otherwise break YAML parsing of the configuration file.
##### `known_presence_issues` and `known_content_issues` Configuration
`doc-warden` is designed to crash builds if it detects failures. However, the vast majority of the time, these issues cannot be fixed immediately. In the above configuration, there are two paths highlighted as known issues.
The first, `known_presence_issues`, tells warden that a presence failure detected at the specified paths _should be ignored_ and should not result in a crashed build. A `tuple` describing each known issue specifies both what the known issue is, as well as some sort of justification. Having an exception with an issueId attached is a good justification for not failing the build.
> We're aware of this issue, and it is tracked in the following github issue.
The `known_content_issues` parameter functions _identically_ to the `known_presence_issues` check. If a readme is listed as "already known" to have failures, the entire CI build will not be crashed by Warden.
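The effect of a known-issues list can be sketched as a filter over detected failures. The `split_known_issues` function below is a hypothetical illustration, not part of `warden`; it assumes each known issue is a `[path, justification]` pair as in the example configuration above, and that paths are compared as exact strings.

```python
def split_known_issues(failures, known_issues):
    """Split failing paths into blocking ones and tolerated (already tracked) ones."""
    # Map each known path to its justification, e.g. a GitHub issue id.
    known = {path: justification for path, justification in known_issues}
    blocking = [path for path in failures if path not in known]
    tolerated = [(path, known[path]) for path in failures if path in known]
    return blocking, tolerated
```

Under this sketch, a build would fail only when `blocking` is non-empty, while `tolerated` entries could still be reported alongside their tracking issue.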
##### `package_indexing_exclusion_list` and `package_indexing_traversal_stops` Configuration
Indexing packages is often done as part of nightly (or triggered) automation. With this being the case, sometimes `warden` may detect a PackageId that users wish to omit from the generated `packages.md` file. The Azure SDK team leverages the `package_indexing_exclusion_list` array members to enable just this sort of scenario.
`package_indexing_traversal_stops` is used during parse of .NET language repos _only_. This is due to how the discovery logic for readme and changelog is implemented for .NET projects. Specifically, readmes for a .csproj are often a couple directories up from their parent .csproj location!
For .net, `warden` will traverse **up** one directory at a time, looking for the readme and changelog files in each traversed directory. `warden` will continue to traverse until...
1. It discovers a folder with a `.sln` within it
2. It encounters a folder that exactly matches one present in `package_indexing_traversal_stops`
Note that `warden` will not even execute an index against a .NET repo _unless the traversal stops are set_.
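The upward traversal can be pictured with the sketch below. It is a hypothetical illustration rather than `warden`'s implementation: `find_doc_upward` is an invented name, the file name is matched exactly, a stop is compared against the folder's base name, and the stopping folder itself is checked for the file before the search gives up. All of these are assumptions.

```python
import os


def find_doc_upward(start_dir, filename, traversal_stops):
    """Walk up from start_dir looking for filename; return its path or None."""
    current = os.path.abspath(start_dir)
    while True:
        candidate = os.path.join(current, filename)
        if os.path.isfile(candidate):
            return candidate
        # Stop condition 1: the folder contains a .sln file.
        has_sln = any(name.endswith(".sln") for name in os.listdir(current))
        # Stop condition 2: the folder name is listed as a traversal stop.
        if has_sln or os.path.basename(current) in traversal_stops:
            return None
        parent = os.path.dirname(current)
        if parent == current:
            # Reached the filesystem root without finding the file.
            return None
        current = parent
```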
[SDK for net .docsettings](https://github.com/Azure/azure-sdk-for-net/blob/main/eng/.docsettings.yml) is a great example for both the exclusion list as well as the traversal stops.
## Provide Feedback
If you encounter any bugs or have suggestions, please file an issue [here](https://github.com/Azure/azure-sdk-tools/issues) and assign to `scbedd`.
%package -n python3-doc-warden
Summary: Doc-Warden is an internal project created by the Azure SDK Team. It is intended to be used by CI Builds to ensure that documentation standards are met. See readme for more details.
Provides: python-doc-warden
BuildRequires: python3-devel
BuildRequires: python3-setuptools
BuildRequires: python3-pip
%description -n python3-doc-warden
# Doc Warden
Every CI build owned by the Azure-SDK team also needs to verify that the documentation within the target repo meets a set of standards. `Doc-warden` is intended to ease the _implementation_ of these checks in CI builds.
Features:
* Enforces Readme Standards
    - Readmes present
    - Readmes have appropriate content
* Enforces Changelog Standards
    - Changelogs present
    - Changelogs contain entry and content for the latest package version
* Generates report for included observed packages
This package is tested on Python 3.4 through 3.8. It has been Python 3 only since `0.7.0`.
## Prerequisites
This package is intended to be run as part of a pipeline within Azure DevOps. As such, [Python](https://www.python.org/downloads/) must be installed before you install or use `Doc-Warden`. While `pip` comes pre-installed on most modern Python installs, if `pip` is an unrecognized command when attempting to install `warden`, run the following command **after** your Python installation is complete.
In addition, `warden` is distributed using `setuptools` and `wheel`, so those packages should also be present prior to install.
```
/:> python -m ensurepip
/:> pip install setuptools wheel
```
## Usage
Right now, `warden` serves two main purposes: (1) readme and changelog enforcement (`scan`, `content`, `presence`) and (2) package indexing (`index`).
### Example usage (for any of the above commands):
```
<pre-step, clone target repository>
...
/:> pip install setuptools wheel
/:> pip install doc-warden
...
<next task, because PATH doesn't update without another one>
/:> ward scan -d $(Build.SourcesDirectory)
```
**Notes for example above**
* Assumption is that the `.docsettings` file is placed at the root of the repository.
To provide a different path (like `azure-sdk-for-java` does...), use:
```
/:> ward scan -d $(Build.SourcesDirectory) -c $(Build.SourcesDirectory)/eng/.docsettings.yml
```
##### Parameter Options
`command`
Currently supports 4 commands. Values: `['scan', 'presence', 'content', 'index']`. **Required.**
* `scan`
    * Run both `content` and `presence` enforcement on the targeted directory.
* `content`
    * Run only `content` enforcement on the target directory. Ensures that:
        - The content in each readme matches the regex patterns defined in the `.docsettings` file.
        - Each changelog contains an entry for the latest version.
* `presence`
    * Run only `presence` enforcement on the target directory. Ensures readmes and changelogs exist where they should.
* `index`
    * Take inventory of the target folder. Attempts to use the selected docsettings to discover all packages within the directory and generate a `packages.md` index file.
`--scan-directory`
The target directory `warden` should be scanning. **Required.**
`--repo-root`
The root of the repo. Entries in the config-file should be relative to this directory. **Optional.**
`--scan-language`
`warden` checks for packages by _convention_, so it needs to understand what language it is looking at. This must be populated either in the `.docsettings` file or by parameter. **Optional.**
`--config-location`
By default, `warden` looks for the `.docsettings` file in the root of the repository. However, populating this location will override this behavior and instead pull the file from the location in this parameter. **Optional.**
`--pipeline-stage`
The stage of the pipeline. Can be `pr`, `ci`, or `release`. **Optional.**
`--target`
Specify which file to run enforcement on: `readme` or `changelog`. Used when running `content` or `presence` verification only. **Optional.**
`--package-output`
Override the default location that the generated `packages.md` file is dropped to during execution of the `index` command.
`--verbose-output`
Enable or disable output of an html report. Defaults to false. **Optional.**
##### Notes for Devops Usage
The `-d` argument should be `$(Build.SourcesDirectory)`. This will point `warden` at the repo that has been associated with CI.
## Methodology
### Enforcing Readme Presence
When should we expect a readme and/or changelog to be present?
**Always:**
* At the root of the repo (Readme only)
* Associated with a `package` directory (Readme and Changelog)
#### .Net
A package directory is indicated by:
* a `*.csproj` file under the `sdk` directory
* Note that this is just a proxy. `warden` attempts to omit test projects by convention.
#### Python
A package directory is indicated by:
* the presence of a `setup.py` file
#### Java
A package directory is indicated by:
* the presence of a `pom.xml` file
* The POM `<packaging>` value within is set to `JAR`
#### Node & JS
A package directory is indicated by:
* The presence of a `package.json` file
### Enforcing Readme Content
`doc-warden` has the ability to check discovered readme files to ensure that a set of configured sections is present. How does it work? `doc-warden` will ensure that each regex defined in `required_readme_sections` matches against at least one section header in the readme. If all the patterns match at least one header, the readme will pass content verification.
Other Notes:
* A `section` header is any markdown or RST that results in an `<h1>` or `<h2>` HTML tag.
* `warden` runs content verification on any `readme.rst` or `readme.md` file found outside the `omitted_paths` in the targeted repo.
### Enforcing Changelog Content
`doc-warden` checks the latest entry in the changelog file to make sure it matches the latest version of the package. It also checks to make sure that the entry is not empty.
#### Control, the `.docsettings.yml` File, and You
Special cases often need to be configured. It seems logical that there needs to be a central location (per repo) to override conventional settings. To that end, a new `.docsettings.yml` file will be added to each repo.
```
<repo-root>
│ README.md
│ .docsettings.yml
│
└───.azure-pipelines
│ │ <build def>
│
└───<other files and folders>
```
The presence of this file allows each repository to customize how enforcement takes place within their repo.
**Example DocSettings File for Java Repo**
```
omitted_paths:
- archive/*
- sdk/eventhub/
language: java
root_check_enabled: True
required_readme_sections:
- "(Client Library for Azure .*|Microsoft Azure SDK for .*)"
- Getting Started
- Install Package
- Prerequisites
- Authenticate the Client
known_presence_issues:
- ['cognitiveservices/data-plane/language/bingspellcheck/README.md', '#2847']
- ['cognitiveservices/data-plane/language/bingspellcheck/CHANGELOG.md', '#2847']
known_content_issues:
- ['sdk/template/azure-sdk-template/README.md','#1368']
- ['sdk/template/azure-sdk-template/CHANGELOG.md','#1368']
```
The above configuration tells `warden`...
- The language within the repo is `java`
- To ensure that a `README.md` is present at the root of the repository.
- To omit **any** paths under `archive/` from the readme checks.
- To omit paths found **directly** under `sdk/eventhub/`.
- This means that if there is a readme content issue under `sdk/eventhub/azure-messaging/`, it will still throw an error!
Possible values for `language` right now are `['net', 'java', 'js', 'python']`. More than one target language is not currently supported.
##### `required_readme_sections` Configuration
This section instructs `warden` to verify that there is at least one matching section title for each provided `section` pattern in any discovered readme. Notice that **nested** specifications are supported. Regex is fully supported.
The two items listed from the example `.docsettings` file will:
- Match a header matched by a simple regex expression
- Match a header exactly titled "Getting Started"
- Under the header "Getting Started" validate that 3 additional headings are present.
- `doc-warden` will search up to the next header of equivalent importance for the sub-headings.
- This means that when searching under header `# Getting Started`, doc-warden will scan up to the next `H1` header.
Note that a regex is wrapped in quotation marks wherever it would otherwise break YAML parsing of the configuration file.
##### `known_presence_issues` and `known_content_issues` Configuration
`doc-warden` is designed to crash builds if it detects failures. However, the vast majority of the time, these issues cannot be fixed immediately. In the above configuration, there are two paths highlighted as known issues.
The first, `known_presence_issues`, tells warden that a presence failure detected at the specified paths _should be ignored_ and should not result in a crashed build. A `tuple` describing each known issue specifies both what the known issue is, as well as some sort of justification. Having an exception with an issueId attached is a good justification for not failing the build.
> We're aware of this issue, and it is tracked in the following github issue.
The `known_content_issues` parameter functions _identically_ to the `known_presence_issues` check. If a readme is listed as "already known" to have failures, the entire CI build will not be crashed by Warden.
##### `package_indexing_exclusion_list` and `package_indexing_traversal_stops` Configuration
Indexing packages is often done as part of nightly (or triggered) automation. With this being the case, sometimes `warden` may detect a PackageId that users wish to omit from the generated `packages.md` file. The Azure SDK team leverages the `package_indexing_exclusion_list` array members to enable just this sort of scenario.
`package_indexing_traversal_stops` is used during parse of .NET language repos _only_. This is due to how the discovery logic for readme and changelog is implemented for .NET projects. Specifically, readmes for a .csproj are often a couple directories up from their parent .csproj location!
For .net, `warden` will traverse **up** one directory at a time, looking for the readme and changelog files in each traversed directory. `warden` will continue to traverse until...
1. It discovers a folder with a `.sln` within it
2. It encounters a folder that exactly matches one present in `package_indexing_traversal_stops`
Note that `warden` will not even execute an index against a .NET repo _unless the traversal stops are set_.
[SDK for net .docsettings](https://github.com/Azure/azure-sdk-for-net/blob/main/eng/.docsettings.yml) is a great example for both the exclusion list as well as the traversal stops.
## Provide Feedback
If you encounter any bugs or have suggestions, please file an issue [here](https://github.com/Azure/azure-sdk-tools/issues) and assign to `scbedd`.
%package help
Summary: Development documents and examples for doc-warden
Provides: python3-doc-warden-doc
%description help
# Doc Warden
Every CI build owned by the Azure-SDK team also needs to verify that the documentation within the target repo meets a set of standards. `Doc-warden` is intended to ease the _implementation_ of these checks in CI builds.
Features:
* Enforces Readme Standards
    - Readmes present
    - Readmes have appropriate content
* Enforces Changelog Standards
    - Changelogs present
    - Changelogs contain entry and content for the latest package version
* Generates report for included observed packages
This package is tested on Python 3.4 through 3.8. It has been Python 3 only since `0.7.0`.
## Prerequisites
This package is intended to be run as part of a pipeline within Azure DevOps. As such, [Python](https://www.python.org/downloads/) must be installed before you install or use `Doc-Warden`. While `pip` comes pre-installed on most modern Python installs, if `pip` is an unrecognized command when attempting to install `warden`, run the following command **after** your Python installation is complete.
In addition, `warden` is distributed using `setuptools` and `wheel`, so those packages should also be present prior to install.
```
/:> python -m ensurepip
/:> pip install setuptools wheel
```
## Usage
Right now, `warden` serves two main purposes: (1) readme and changelog enforcement (`scan`, `content`, `presence`) and (2) package indexing (`index`).
### Example usage (for any of the above commands):
```
<pre-step, clone target repository>
...
/:> pip install setuptools wheel
/:> pip install doc-warden
...
<next task, because PATH doesn't update without another one>
/:> ward scan -d $(Build.SourcesDirectory)
```
**Notes for example above**
* Assumption is that the `.docsettings` file is placed at the root of the repository.
To provide a different path (like `azure-sdk-for-java` does...), use:
```
/:> ward scan -d $(Build.SourcesDirectory) -c $(Build.SourcesDirectory)/eng/.docsettings.yml
```
##### Parameter Options
`command`
Currently supports 4 commands. Values: `['scan', 'presence', 'content', 'index']`. **Required.**
* `scan`
    * Run both `content` and `presence` enforcement on the targeted directory.
* `content`
    * Run only `content` enforcement on the target directory. Ensures that:
        - The content in each readme matches the regex patterns defined in the `.docsettings` file.
        - Each changelog contains an entry for the latest version.
* `presence`
    * Run only `presence` enforcement on the target directory. Ensures readmes and changelogs exist where they should.
* `index`
    * Take inventory of the target folder. Attempts to use the selected docsettings to discover all packages within the directory and generate a `packages.md` index file.
`--scan-directory`
The target directory `warden` should be scanning. **Required.**
`--repo-root`
The root of the repo. Entries in the config-file should be relative to this directory. **Optional.**
`--scan-language`
`warden` checks for packages by _convention_, so it needs to understand what language it is looking at. This must be populated either in the `.docsettings` file or by parameter. **Optional.**
`--config-location`
By default, `warden` looks for the `.docsettings` file in the root of the repository. However, populating this location will override this behavior and instead pull the file from the location in this parameter. **Optional.**
`--pipeline-stage`
The stage of the pipeline. Can be `pr`, `ci`, or `release`. **Optional.**
`--target`
Specify which file to run enforcement on: `readme` or `changelog`. Used when running `content` or `presence` verification only. **Optional.**
`--package-output`
Override the default location that the generated `packages.md` file is dropped to during execution of the `index` command.
`--verbose-output`
Enable or disable output of an html report. Defaults to false. **Optional.**
##### Notes for Devops Usage
The `-d` argument should be `$(Build.SourcesDirectory)`. This will point `warden` at the repo that has been associated with CI.
## Methodology
### Enforcing Readme Presence
When should we expect a readme and/or changelog to be present?
**Always:**
* At the root of the repo (Readme only)
* Associated with a `package` directory (Readme and Changelog)
#### .Net
A package directory is indicated by:
* a `*.csproj` file under the `sdk` directory
* Note that this is just a proxy. `warden` attempts to omit test projects by convention.
#### Python
A package directory is indicated by:
* the presence of a `setup.py` file
#### Java
A package directory is indicated by:
* the presence of a `pom.xml` file
* The POM `<packaging>` value within is set to `JAR`
#### Node & JS
A package directory is indicated by:
* The presence of a `package.json` file
### Enforcing Readme Content
`doc-warden` has the ability to check discovered readme files to ensure that a set of configured sections is present. How does it work? `doc-warden` will ensure that each regex defined in `required_readme_sections` matches against at least one section header in the readme. If all the patterns match at least one header, the readme will pass content verification.
Other Notes:
* A `section` header is any markdown or RST that results in an `<h1>` or `<h2>` HTML tag.
* `warden` runs content verification on any `readme.rst` or `readme.md` file found outside the `omitted_paths` in the targeted repo.
### Enforcing Changelog Content
`doc-warden` checks the latest entry in the changelog file to make sure it matches the latest version of the package. It also checks to make sure that the entry is not empty.
#### Control, the `.docsettings.yml` File, and You
Special cases often need to be configured. It seems logical that there needs to be a central location (per repo) to override conventional settings. To that end, a new `.docsettings.yml` file will be added to each repo.
```
<repo-root>
│ README.md
│ .docsettings.yml
│
└───.azure-pipelines
│ │ <build def>
│
└───<other files and folders>
```
The presence of this file allows each repository to customize how enforcement takes place within their repo.
**Example DocSettings File for Java Repo**
```
omitted_paths:
- archive/*
- sdk/eventhub/
language: java
root_check_enabled: True
required_readme_sections:
- "(Client Library for Azure .*|Microsoft Azure SDK for .*)"
- Getting Started
- Install Package
- Prerequisites
- Authenticate the Client
known_presence_issues:
- ['cognitiveservices/data-plane/language/bingspellcheck/README.md', '#2847']
- ['cognitiveservices/data-plane/language/bingspellcheck/CHANGELOG.md', '#2847']
known_content_issues:
- ['sdk/template/azure-sdk-template/README.md','#1368']
- ['sdk/template/azure-sdk-template/CHANGELOG.md','#1368']
```
The above configuration tells `warden`...
- The language within the repo is `java`
- To ensure that a `README.md` is present at the root of the repository.
- To omit **any** paths under `archive/` from the readme checks.
- To omit paths found **directly** under `sdk/eventhub/`.
  - This means that if there is a readme content issue under `sdk/eventhub/azure-messaging/`, it will still throw an error!
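
The difference between the two `omitted_paths` entries can be illustrated with a small matcher. This is a sketch of the semantics described in the list above, not the tool's real matching logic; `is_omitted` is a hypothetical name, and treating a trailing `/*` as "any depth" while every other entry means "direct children only" is an assumption drawn from that description.

```python
from pathlib import PurePosixPath


def is_omitted(path, omitted_paths):
    """Return True if a repo-relative path is excluded by an omitted_paths entry."""
    target = PurePosixPath(path)
    for entry in omitted_paths:
        if entry.endswith("/*"):
            # "archive/*": omit anything below the directory, at any depth.
            if PurePosixPath(entry[:-2]) in target.parents:
                return True
        elif target.parent == PurePosixPath(entry):
            # "sdk/eventhub/": omit only files directly inside the directory.
            return True
    return False
```

With the example configuration, `sdk/eventhub/README.md` is omitted while `sdk/eventhub/azure-messaging/README.md` is still checked.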
Possible values for `language` right now are `['net', 'java', 'js', 'python']`. More than one target language is not currently supported.
##### `required_readme_sections` Configuration
This section instructs `warden` to verify that there is at least one matching section title for each provided `section` pattern in any discovered readme. Notice that **nested** specifications are supported. Regex is fully supported.
The two items listed from the example `.docsettings` file will:
- Match a header matched by a simple regex
- Match a header exactly titled "Getting Started"
- Under the header "Getting Started" validate that 3 additional headings are present.
- `doc-warden` will search up to the next header of equivalent importance for the sub-headings.
- This means that when searching under header `# Getting Started`, doc-warden will scan up to the next `H1` header.
Note that a regex is surrounded by quotation marks wherever it would otherwise break `yml` parsing of the configuration file.
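
The nested search can be sketched as follows: find the parent header, then look for each sub-heading pattern only among the headers that follow it, stopping at the next header of the same or greater importance. This is an illustration for Markdown only, with hypothetical names (`parse_headers`, `missing_subsections`); it does not skip `#` lines inside fenced code blocks, which a real implementation would need to handle.

```python
import re

HEADER_RE = re.compile(r"^(#{1,6})\s+(.*?)\s*$")


def parse_headers(markdown):
    """Return (level, title) pairs for every ATX header, in document order."""
    headers = []
    for line in markdown.splitlines():
        match = HEADER_RE.match(line)
        if match:
            headers.append((len(match.group(1)), match.group(2)))
    return headers


def missing_subsections(headers, parent_pattern, child_patterns):
    """Return the patterns that could not be satisfied for one nested entry."""
    for index, (level, title) in enumerate(headers):
        if not re.search(parent_pattern, title):
            continue
        scope = []
        for next_level, next_title in headers[index + 1:]:
            if next_level <= level:
                break  # next header of equivalent (or greater) importance
            scope.append(next_title)
        return [
            child for child in child_patterns
            if not any(re.search(child, candidate) for candidate in scope)
        ]
    # The parent header itself was never found.
    return [parent_pattern] + list(child_patterns)
```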
##### `known_presence_issues` and `known_content_issues` Configuration
`doc-warden` is designed to crash builds if it detects failures. However, the vast majority of the time, these issues cannot be fixed immediately. In the above configuration, there are two paths highlighted as known issues.
The first, `known_presence_issues`, tells warden that a presence failure detected at the specified paths _should be ignored_ and should not result in a crashed build. A `tuple` describing each known issue specifies both what the known issue is and some sort of justification. Having an exception with an issueId attached is a good justification for not failing the build.
> We're aware of this issue, and it is tracked in the following github issue.
The `known_content_issues` parameter functions _identically_ to the `known_presence_issues` check. If a readme is listed as "already known" to have failures, the entire CI build will not be crashed by Warden.
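
One way to picture how known issues suppress failures is a simple filter over the detected paths. This is a sketch only: `unresolved_failures` is a hypothetical name, and matching paths by exact string equality against the first element of each tuple is an assumption.

```python
def unresolved_failures(failed_paths, known_issues):
    """Drop failures whose path is listed as a known issue.

    known_issues is a list of [path, justification] pairs, as in the
    known_presence_issues and known_content_issues settings.
    """
    known = {path for path, _justification in known_issues}
    return [path for path in failed_paths if path not in known]


known_content_issues = [
    ["sdk/template/azure-sdk-template/README.md", "#1368"],
]
failures = [
    "sdk/template/azure-sdk-template/README.md",
    "sdk/storage/README.md",
]
# Only the failure without a tracked issue should fail the build.
remaining = unresolved_failures(failures, known_content_issues)
```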
##### `package_indexing_exclusion_list` and `package_indexing_traversal_stops` Configuration
Indexing packages is often done as part of nightly (or triggered) automation. With this being the case, sometimes `warden` may detect a PackageId that users wish to omit from the generated `packages.md` file. The Azure SDK team leverages the `package_indexing_exclusion_list` array members to enable just this sort of scenario.
`package_indexing_traversal_stops` is used _only_ when parsing .NET language repos. This is due to how the discovery logic for readme and changelog is implemented for .NET projects. Specifically, readmes for a .csproj are often a couple of directories up from their parent .csproj location!
For .NET, `warden` will traverse **up** one directory at a time, looking for the readme and changelog files in each traversed directory. `warden` will continue to traverse until...
1. It discovers a folder with a `.sln` within it
2. It encounters a folder that exactly matches one present in `package_indexing_traversal_stops`
Note that `warden` will not even execute an index against a .NET repo _unless the traversal stops are set_.
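
The upward traversal can be sketched roughly as below, for the readme only. Several details are assumptions rather than documented behaviour: the directory containing the `.sln` (or matching a stop) is itself checked before traversal ends, traversal stops are compared as full directory paths, and the function name `find_readme_upward` is hypothetical.

```python
from pathlib import Path

README_NAMES = ("readme.md", "readme.rst")


def find_readme_upward(csproj_path, traversal_stops):
    """Walk up from a .csproj until a readme, a .sln folder, or a stop is hit."""
    stops = {Path(stop).resolve() for stop in traversal_stops}
    current = Path(csproj_path).resolve().parent
    while True:
        for child in current.iterdir():
            if child.is_file() and child.name.lower() in README_NAMES:
                return child
        # Stop condition 1: this folder contains a solution file.
        # Stop condition 2: this folder exactly matches a configured stop.
        if any(current.glob("*.sln")) or current in stops:
            return None
        if current.parent == current:
            return None  # reached the filesystem root
        current = current.parent
```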
[SDK for net .docsettings](https://github.com/Azure/azure-sdk-for-net/blob/main/eng/.docsettings.yml) is a great example for both the exclusion list as well as the traversal stops.
## Provide Feedback
If you encounter any bugs or have suggestions, please file an issue [here](https://github.com/Azure/azure-sdk-tools/issues) and assign to `scbedd`.
%prep
%autosetup -n doc-warden-0.7.2
%build
%py3_build
%install
%py3_install
install -d -m755 %{buildroot}/%{_pkgdocdir}
if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
pushd %{buildroot}
if [ -d usr/lib ]; then
find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/lib64 ]; then
find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/bin ]; then
find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/sbin ]; then
find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
fi
touch doclist.lst
if [ -d usr/share/man ]; then
find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
fi
popd
mv %{buildroot}/filelist.lst .
mv %{buildroot}/doclist.lst .
%files -n python3-doc-warden -f filelist.lst
%dir %{python3_sitelib}/*
%files help -f doclist.lst
%{_docdir}/*
%changelog
* Mon Apr 10 2023 Python_Bot <Python_Bot@openeuler.org> - 0.7.2-1
- Package Spec generated
|