%global _empty_manifest_terminate_build 0
Name:           python-sparkaid
Version:        1.0.0
Release:        1
Summary:        Utils for working with Spark
License:        GNU Lesser General Public License v3 or later (LGPLv3+)
URL:            https://github.com/lvhuyen/SparkAid
Source0:        https://mirrors.nju.edu.cn/pypi/web/packages/c1/1e/92e74e641719430d3d6216e4eef34e90a89876080c2013d8a5aa557b4284/sparkaid-1.0.0.tar.gz
BuildArch:      noarch

Requires:       python3-pyspark
Requires:       python3-pip-tools
Requires:       python3-pytest

%description
## Flattening

### StructType

Sample DataFrame:

    from pyspark.sql import Row
    from pyspark.sql.functions import col

    df_struct = spark.createDataFrame([Row(structA=Row(field1=10, field2=1.5),
                                           structB=Row(field3="one", field4=False))])
    df_struct.printSchema()

    root
     |-- structA: struct (nullable = true)
     |    |-- field1: long (nullable = true)
     |    |-- field2: double (nullable = true)
     |-- structB: struct (nullable = true)
     |    |-- field3: string (nullable = true)
     |    |-- field4: boolean (nullable = true)

Spark allows selecting nested columns by using the dot `.` notation:

    df_struct.select("structA.*", "structB.field3").printSchema()

    root
     |-- field1: long (nullable = true)
     |-- field2: double (nullable = true)
     |-- field3: string (nullable = true)

Please note that the current Spark implementation (2.4.3 or below) doesn't keep the outer layer field name (e.g. structA) in the output dataframe.

### ArrayType

To select only some elements from an ArrayType column, either *`getItem()`* or brackets (as when selecting elements from an ordinary array: `[]` in Python, `()` in Scala) would do the trick:

    df_array = spark.createDataFrame([Row(arrayA=[1, 2, 3, 4, 5], fieldB="foo")])
    df_array.select(col("arrayA").getItem(0).alias("element0"),
                    col("arrayA")[4].alias("element5"),
                    col("fieldB")).show()

    +--------+--------+------+
    |element0|element5|fieldB|
    +--------+--------+------+
    |       1|       5|   foo|
    +--------+--------+------+

### MapType

Elements from a MapType column can be selected the same way as with ArrayType, but using the key instead of an index number. The dot notation (`.`) can also be used instead of `getItem()` or brackets:

    df_map = spark.createDataFrame([Row(mapA={2: "TWO", 3: "THREE", 0: "ZERO"}, fieldB="foo")])
    df_map.select(col("mapA")[3].alias("element3"),
                  col("mapA").getItem(2).alias("element2"),
                  col("mapA.0").alias("element0"),
                  col("mapA").getItem(1).alias("element1")).show()

    +--------+--------+--------+--------+
    |element3|element2|element0|element1|
    +--------+--------+--------+--------+
    |   THREE|     TWO|    ZERO|    null|
    +--------+--------+--------+--------+

### StructType nested in StructType

As Spark DataFrame.select() supports passing an array of columns to be selected, to fully flatten a multi-layer nested dataframe, a recursive call would do the trick. Here is a detailed discussion on StackOverflow on how to do this:
https://stackoverflow.com/questions/37471346/automatically-and-elegantly-flatten-dataframe-in-spark-sql
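As an illustration only (a minimal sketch, not the sparkaid implementation), such a recursive flatten for struct columns could look roughly like the helper below; the name `flatten_structs` and the underscore naming scheme are made up for this example, and ArrayType/MapType columns are left untouched:

    from pyspark.sql.types import StructType
    from pyspark.sql.functions import col

    def flatten_structs(df):
        # Stop when no top-level column is a struct any more
        if not any(isinstance(f.dataType, StructType) for f in df.schema.fields):
            return df
        cols = []
        for f in df.schema.fields:
            if isinstance(f.dataType, StructType):
                # Promote each child field to a top-level column, keeping the parent name as a prefix
                cols += [col(f.name + "." + c.name).alias(f.name + "_" + c.name)
                         for c in f.dataType.fields]
            else:
                cols.append(col(f.name))
        # Each pass removes one level of nesting, so recurse on the result
        return flatten_structs(df.select(cols))

    # e.g. flatten_structs(df_struct) should expose structA_field1, structA_field2,
    # structB_field3 and structB_field4 as top-level columns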
### StructType nested in ArrayType

    df_nested = spark.createDataFrame([
        Row(
            arrayA=[
                Row(childStructB=Row(field1=1, field2="foo")),
                Row(childStructB=Row(field1=2, field2="bar"))
            ]
        )])
    df_nested.printSchema()

    root
     |-- arrayA: array (nullable = true)
     |    |-- element: struct (containsNull = true)
     |    |    |-- childStructB: struct (nullable = true)
     |    |    |    |-- field1: long (nullable = true)
     |    |    |    |-- field2: string (nullable = true)

    df_nested.show(1, False)

    +------------------------+
    |arrayA                  |
    +------------------------+
    |[[[1, foo]], [[2, bar]]]|
    +------------------------+

Selecting *field1* or *field2* can be done as with normal structs (not nested inside an array), by using the dot `.` notation. The result would be of the type `ArrayType[ChildFieldType]`, which has been **_vertically sliced_** from the original array:

    df_child = df_nested.select("arrayA.childStructB.field1", "arrayA.childStructB.field2")
    df_child.printSchema()

    root
     |-- field1: array (nullable = true)
     |    |-- element: long (containsNull = true)
     |-- field2: array (nullable = true)
     |    |-- element: string (containsNull = true)

    df_child.show()

    +------+----------+
    |field1|    field2|
    +------+----------+
    |[1, 2]|[foo, bar]|
    +------+----------+

### StructType nested in MapType

As each MapType column has two components, the keys and the values, selecting a nested column inside a MapType column is not straightforward - we cannot just use the `.` to reach the nested fields, as that notation is already used for denoting the key.

    df_map_nested = spark.createDataFrame([
        Row(mapA={"2": Row(type_name="Arabic number", equivalent=2),
                  "THREE": Row(type_name="English Text", equivalent=3)},
            fieldB="foo")])
    df_map_nested.select(col("mapA.type_name"), col("mapA.THREE.type_name")).show()

    +---------+------------+
    |type_name|   type_name|
    +---------+------------+
    |     null|English Text|
    +---------+------------+

A solution for this is to use the builtin function `map_values()`, introduced in Spark 2.3. Note the type of the result column: ArrayType.

    from pyspark.sql.functions import map_values

    result = df_map_nested.select(map_values("mapA")["type_name"], col("mapA.THREE.type_name"))
    result.show(2, False)
    result.printSchema()

    +-----------------------------+------------+
    |map_values(mapA).type_name   |type_name   |
    +-----------------------------+------------+
    |[Arabic number, English Text]|English Text|
    +-----------------------------+------------+

    root
     |-- map_values(mapA).type_name: array (nullable = true)
     |    |-- element: string (containsNull = true)
     |-- type_name: string (nullable = true)

## Hurdles

The above steps work well for most dataframes. The only dataframes for which they fail (as of Spark 2.4.3 or lower) are the ones with a StructType nested inside MORE THAN ONE layer of ArrayType.
Like this one:

    df_nested_B = spark.createDataFrame([
        Row(
            arrayA=[[
                Row(childStructB=Row(field1=1, field2="foo")),
                Row(childStructB=Row(field1=2, field2="bar"))
            ]]
        )])
    df_nested_B.printSchema()

    root
     |-- arrayA: array (nullable = true)
     |    |-- element: array (containsNull = true)
     |    |    |-- element: struct (containsNull = true)
     |    |    |    |-- childStructB: struct (nullable = true)
     |    |    |    |    |-- field1: long (nullable = true)
     |    |    |    |    |-- field2: string (nullable = true)

Or this one:

    df_nested_C = spark.createDataFrame([
        Row(
            arrayA=[
                Row(childStructB=Row(childArrayC=[Row(field1=1, field2="foo")])),
                Row(childStructB=Row(childArrayC=[Row(field1=2, field2="bar")])),
            ]
        )])
    df_nested_C.printSchema()

    root
     |-- arrayA: array (nullable = true)
     |    |-- element: struct (containsNull = true)
     |    |    |-- childStructB: struct (nullable = true)
     |    |    |    |-- childArrayC: array (nullable = true)
     |    |    |    |    |-- element: struct (containsNull = true)
     |    |    |    |    |    |-- field1: long (nullable = true)
     |    |    |    |    |    |-- field2: string (nullable = true)

Selecting `arrayA.childStructB.field1` from `df_nested_B` fails with the error message `AnalysisException: No such struct field field1 in childStructB`.
Selecting `arrayA.childStructB.childArrayC.field1` from `df_nested_C` throws an `AnalysisException`: `cannot resolve 'arrayA.childStructB.childArrayC['field1']' due to data type mismatch: argument 2 requires integral type, however, ''field1'' is of string type.`

## (More) Solutions

With the introduction of the SQL function `transform` in Spark 2.4, the error above can be solved by applying `transform` on every layer of the array.
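As a rough illustration of that idea (a sketch assuming Spark 2.4+, not the sparkaid code itself): PySpark 2.4 only exposes `transform` through SQL expressions, so slicing `field1` out of `df_nested_B` by hand could look like this, where the lambda variable names and the column alias are made up for the example:

    from pyspark.sql.functions import expr

    # One transform() per array layer: the outer lambda walks the outer array,
    # the inner lambda pulls field1 out of each struct element.
    df_nested_B.select(
        expr("transform(arrayA, a -> transform(a, x -> x.childStructB.field1))")
        .alias("arrayA__childStructB_field1")
    ).show(truncate=False)
    # expected to yield [[1, 2]], in line with the flatten() output shown below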
A comprehensive implementation of a flatten function can be found in the Python package `sparkaid`:

    from sparkaid import flatten

    flatten(df_nested_B).printSchema()

    root
     |-- arrayA__childStructB_field1: array (nullable = true)
     |    |-- element: array (containsNull = true)
     |    |    |-- element: long (containsNull = true)
     |-- arrayA__childStructB_field2: array (nullable = true)
     |    |-- element: array (containsNull = true)
     |    |    |-- element: string (containsNull = true)
    flatten(df_nested_B).show()

    +---------------------------+---------------------------+
    |arrayA__childStructB_field1|arrayA__childStructB_field2|
    +---------------------------+---------------------------+
    |                   [[1, 2]]|               [[foo, bar]]|
    +---------------------------+---------------------------+

    flatten(df_nested_C).printSchema()

    root
     |-- arrayA_childStructB_childArrayC_field1: array (nullable = true)
     |    |-- element: array (containsNull = true)
     |    |    |-- element: long (containsNull = true)
     |-- arrayA_childStructB_childArrayC_field2: array (nullable = true)
     |    |-- element: array (containsNull = true)
     |    |    |-- element: string (containsNull = true)

[]: https://spark.apache.org/docs/2.4.0/api/java/org/apache/spark/sql/functions.html#flatten-org.apache.spark.sql.Column

%package -n python3-sparkaid
Summary:        Utils for working with Spark
Provides:       python-sparkaid
BuildRequires:  python3-devel
BuildRequires:  python3-setuptools
BuildRequires:  python3-pip
%description -n python3-sparkaid
## Flattening

### StructType

Sample DataFrame:

    from pyspark.sql import Row
    from pyspark.sql.functions import col

    df_struct = spark.createDataFrame([Row(structA=Row(field1=10, field2=1.5),
                                           structB=Row(field3="one", field4=False))])
    df_struct.printSchema()

    root
     |-- structA: struct (nullable = true)
     |    |-- field1: long (nullable = true)
     |    |-- field2: double (nullable = true)
     |-- structB: struct (nullable = true)
     |    |-- field3: string (nullable = true)
     |    |-- field4: boolean (nullable = true)

Spark allows selecting nested columns by using the dot `.` notation:

    df_struct.select("structA.*", "structB.field3").printSchema()

    root
     |-- field1: long (nullable = true)
     |-- field2: double (nullable = true)
     |-- field3: string (nullable = true)

Please note that the current Spark implementation (2.4.3 or below) doesn't keep the outer layer field name (e.g. structA) in the output dataframe.

### ArrayType

To select only some elements from an ArrayType column, either *`getItem()`* or brackets (as when selecting elements from an ordinary array: `[]` in Python, `()` in Scala) would do the trick:

    df_array = spark.createDataFrame([Row(arrayA=[1, 2, 3, 4, 5], fieldB="foo")])
    df_array.select(col("arrayA").getItem(0).alias("element0"),
                    col("arrayA")[4].alias("element5"),
                    col("fieldB")).show()

    +--------+--------+------+
    |element0|element5|fieldB|
    +--------+--------+------+
    |       1|       5|   foo|
    +--------+--------+------+

### MapType

Elements from a MapType column can be selected the same way as with ArrayType, but using the key instead of an index number. The dot notation (`.`) can also be used instead of `getItem()` or brackets:

    df_map = spark.createDataFrame([Row(mapA={2: "TWO", 3: "THREE", 0: "ZERO"}, fieldB="foo")])
    df_map.select(col("mapA")[3].alias("element3"),
                  col("mapA").getItem(2).alias("element2"),
                  col("mapA.0").alias("element0"),
                  col("mapA").getItem(1).alias("element1")).show()

    +--------+--------+--------+--------+
    |element3|element2|element0|element1|
    +--------+--------+--------+--------+
    |   THREE|     TWO|    ZERO|    null|
    +--------+--------+--------+--------+

### StructType nested in StructType

As Spark DataFrame.select() supports passing an array of columns to be selected, to fully flatten a multi-layer nested dataframe, a recursive call would do the trick. Here is a detailed discussion on StackOverflow on how to do this:
https://stackoverflow.com/questions/37471346/automatically-and-elegantly-flatten-dataframe-in-spark-sql
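As an illustration only (a minimal sketch, not the sparkaid implementation), such a recursive flatten for struct columns could look roughly like the helper below; the name `flatten_structs` and the underscore naming scheme are made up for this example, and ArrayType/MapType columns are left untouched:

    from pyspark.sql.types import StructType
    from pyspark.sql.functions import col

    def flatten_structs(df):
        # Stop when no top-level column is a struct any more
        if not any(isinstance(f.dataType, StructType) for f in df.schema.fields):
            return df
        cols = []
        for f in df.schema.fields:
            if isinstance(f.dataType, StructType):
                # Promote each child field to a top-level column, keeping the parent name as a prefix
                cols += [col(f.name + "." + c.name).alias(f.name + "_" + c.name)
                         for c in f.dataType.fields]
            else:
                cols.append(col(f.name))
        # Each pass removes one level of nesting, so recurse on the result
        return flatten_structs(df.select(cols))

    # e.g. flatten_structs(df_struct) should expose structA_field1, structA_field2,
    # structB_field3 and structB_field4 as top-level columns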
### StructType nested in ArrayType

    df_nested = spark.createDataFrame([
        Row(
            arrayA=[
                Row(childStructB=Row(field1=1, field2="foo")),
                Row(childStructB=Row(field1=2, field2="bar"))
            ]
        )])
    df_nested.printSchema()

    root
     |-- arrayA: array (nullable = true)
     |    |-- element: struct (containsNull = true)
     |    |    |-- childStructB: struct (nullable = true)
     |    |    |    |-- field1: long (nullable = true)
     |    |    |    |-- field2: string (nullable = true)

    df_nested.show(1, False)

    +------------------------+
    |arrayA                  |
    +------------------------+
    |[[[1, foo]], [[2, bar]]]|
    +------------------------+

Selecting *field1* or *field2* can be done as with normal structs (not nested inside an array), by using the dot `.` notation. The result would be of the type `ArrayType[ChildFieldType]`, which has been **_vertically sliced_** from the original array:

    df_child = df_nested.select("arrayA.childStructB.field1", "arrayA.childStructB.field2")
    df_child.printSchema()

    root
     |-- field1: array (nullable = true)
     |    |-- element: long (containsNull = true)
     |-- field2: array (nullable = true)
     |    |-- element: string (containsNull = true)

    df_child.show()

    +------+----------+
    |field1|    field2|
    +------+----------+
    |[1, 2]|[foo, bar]|
    +------+----------+

### StructType nested in MapType

As each MapType column has two components, the keys and the values, selecting a nested column inside a MapType column is not straightforward - we cannot just use the `.` to reach the nested fields, as that notation is already used for denoting the key.

    df_map_nested = spark.createDataFrame([
        Row(mapA={"2": Row(type_name="Arabic number", equivalent=2),
                  "THREE": Row(type_name="English Text", equivalent=3)},
            fieldB="foo")])
    df_map_nested.select(col("mapA.type_name"), col("mapA.THREE.type_name")).show()

    +---------+------------+
    |type_name|   type_name|
    +---------+------------+
    |     null|English Text|
    +---------+------------+

A solution for this is to use the builtin function `map_values()`, introduced in Spark 2.3. Note the type of the result column: ArrayType.

    from pyspark.sql.functions import map_values

    result = df_map_nested.select(map_values("mapA")["type_name"], col("mapA.THREE.type_name"))
    result.show(2, False)
    result.printSchema()

    +-----------------------------+------------+
    |map_values(mapA).type_name   |type_name   |
    +-----------------------------+------------+
    |[Arabic number, English Text]|English Text|
    +-----------------------------+------------+

    root
     |-- map_values(mapA).type_name: array (nullable = true)
     |    |-- element: string (containsNull = true)
     |-- type_name: string (nullable = true)

## Hurdles

The above steps work well for most dataframes. The only dataframes for which they fail (as of Spark 2.4.3 or lower) are the ones with a StructType nested inside MORE THAN ONE layer of ArrayType.
Like this one:

    df_nested_B = spark.createDataFrame([
        Row(
            arrayA=[[
                Row(childStructB=Row(field1=1, field2="foo")),
                Row(childStructB=Row(field1=2, field2="bar"))
            ]]
        )])
    df_nested_B.printSchema()

    root
     |-- arrayA: array (nullable = true)
     |    |-- element: array (containsNull = true)
     |    |    |-- element: struct (containsNull = true)
     |    |    |    |-- childStructB: struct (nullable = true)
     |    |    |    |    |-- field1: long (nullable = true)
     |    |    |    |    |-- field2: string (nullable = true)

Or this one:

    df_nested_C = spark.createDataFrame([
        Row(
            arrayA=[
                Row(childStructB=Row(childArrayC=[Row(field1=1, field2="foo")])),
                Row(childStructB=Row(childArrayC=[Row(field1=2, field2="bar")])),
            ]
        )])
    df_nested_C.printSchema()

    root
     |-- arrayA: array (nullable = true)
     |    |-- element: struct (containsNull = true)
     |    |    |-- childStructB: struct (nullable = true)
     |    |    |    |-- childArrayC: array (nullable = true)
     |    |    |    |    |-- element: struct (containsNull = true)
     |    |    |    |    |    |-- field1: long (nullable = true)
     |    |    |    |    |    |-- field2: string (nullable = true)

Selecting `arrayA.childStructB.field1` from `df_nested_B` fails with the error message `AnalysisException: No such struct field field1 in childStructB`.
Selecting `arrayA.childStructB.childArrayC.field1` from `df_nested_C` throws an `AnalysisException`: `cannot resolve 'arrayA.childStructB.childArrayC['field1']' due to data type mismatch: argument 2 requires integral type, however, ''field1'' is of string type.`

## (More) Solutions

With the introduction of the SQL function `transform` in Spark 2.4, the error above can be solved by applying `transform` on every layer of the array.
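As a rough illustration of that idea (a sketch assuming Spark 2.4+, not the sparkaid code itself): PySpark 2.4 only exposes `transform` through SQL expressions, so slicing `field1` out of `df_nested_B` by hand could look like this, where the lambda variable names and the column alias are made up for the example:

    from pyspark.sql.functions import expr

    # One transform() per array layer: the outer lambda walks the outer array,
    # the inner lambda pulls field1 out of each struct element.
    df_nested_B.select(
        expr("transform(arrayA, a -> transform(a, x -> x.childStructB.field1))")
        .alias("arrayA__childStructB_field1")
    ).show(truncate=False)
    # expected to yield [[1, 2]], in line with the flatten() output shown below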
A comprehensive implementation of a flatten function can be found in the Python package `sparkaid`:

    from sparkaid import flatten

    flatten(df_nested_B).printSchema()

    root
     |-- arrayA__childStructB_field1: array (nullable = true)
     |    |-- element: array (containsNull = true)
     |    |    |-- element: long (containsNull = true)
     |-- arrayA__childStructB_field2: array (nullable = true)
     |    |-- element: array (containsNull = true)
     |    |    |-- element: string (containsNull = true)
    flatten(df_nested_B).show()

    +---------------------------+---------------------------+
    |arrayA__childStructB_field1|arrayA__childStructB_field2|
    +---------------------------+---------------------------+
    |                   [[1, 2]]|               [[foo, bar]]|
    +---------------------------+---------------------------+

    flatten(df_nested_C).printSchema()

    root
     |-- arrayA_childStructB_childArrayC_field1: array (nullable = true)
     |    |-- element: array (containsNull = true)
     |    |    |-- element: long (containsNull = true)
     |-- arrayA_childStructB_childArrayC_field2: array (nullable = true)
     |    |-- element: array (containsNull = true)
     |    |    |-- element: string (containsNull = true)

[]: https://spark.apache.org/docs/2.4.0/api/java/org/apache/spark/sql/functions.html#flatten-org.apache.spark.sql.Column

%package help
Summary:        Development documents and examples for sparkaid
Provides:       python3-sparkaid-doc
%description help
## Flattening

### StructType

Sample DataFrame:

    from pyspark.sql import Row
    from pyspark.sql.functions import col

    df_struct = spark.createDataFrame([Row(structA=Row(field1=10, field2=1.5),
                                           structB=Row(field3="one", field4=False))])
    df_struct.printSchema()

    root
     |-- structA: struct (nullable = true)
     |    |-- field1: long (nullable = true)
     |    |-- field2: double (nullable = true)
     |-- structB: struct (nullable = true)
     |    |-- field3: string (nullable = true)
     |    |-- field4: boolean (nullable = true)

Spark allows selecting nested columns by using the dot `.` notation:

    df_struct.select("structA.*", "structB.field3").printSchema()

    root
     |-- field1: long (nullable = true)
     |-- field2: double (nullable = true)
     |-- field3: string (nullable = true)

Please note that the current Spark implementation (2.4.3 or below) doesn't keep the outer layer field name (e.g. structA) in the output dataframe.

### ArrayType

To select only some elements from an ArrayType column, either *`getItem()`* or brackets (as when selecting elements from an ordinary array: `[]` in Python, `()` in Scala) would do the trick:

    df_array = spark.createDataFrame([Row(arrayA=[1, 2, 3, 4, 5], fieldB="foo")])
    df_array.select(col("arrayA").getItem(0).alias("element0"),
                    col("arrayA")[4].alias("element5"),
                    col("fieldB")).show()

    +--------+--------+------+
    |element0|element5|fieldB|
    +--------+--------+------+
    |       1|       5|   foo|
    +--------+--------+------+

### MapType

Elements from a MapType column can be selected the same way as with ArrayType, but using the key instead of an index number. The dot notation (`.`) can also be used instead of `getItem()` or brackets:

    df_map = spark.createDataFrame([Row(mapA={2: "TWO", 3: "THREE", 0: "ZERO"}, fieldB="foo")])
    df_map.select(col("mapA")[3].alias("element3"),
                  col("mapA").getItem(2).alias("element2"),
                  col("mapA.0").alias("element0"),
                  col("mapA").getItem(1).alias("element1")).show()

    +--------+--------+--------+--------+
    |element3|element2|element0|element1|
    +--------+--------+--------+--------+
    |   THREE|     TWO|    ZERO|    null|
    +--------+--------+--------+--------+

### StructType nested in StructType

As Spark DataFrame.select() supports passing an array of columns to be selected, to fully flatten a multi-layer nested dataframe, a recursive call would do the trick. Here is a detailed discussion on StackOverflow on how to do this:
https://stackoverflow.com/questions/37471346/automatically-and-elegantly-flatten-dataframe-in-spark-sql
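As an illustration only (a minimal sketch, not the sparkaid implementation), such a recursive flatten for struct columns could look roughly like the helper below; the name `flatten_structs` and the underscore naming scheme are made up for this example, and ArrayType/MapType columns are left untouched:

    from pyspark.sql.types import StructType
    from pyspark.sql.functions import col

    def flatten_structs(df):
        # Stop when no top-level column is a struct any more
        if not any(isinstance(f.dataType, StructType) for f in df.schema.fields):
            return df
        cols = []
        for f in df.schema.fields:
            if isinstance(f.dataType, StructType):
                # Promote each child field to a top-level column, keeping the parent name as a prefix
                cols += [col(f.name + "." + c.name).alias(f.name + "_" + c.name)
                         for c in f.dataType.fields]
            else:
                cols.append(col(f.name))
        # Each pass removes one level of nesting, so recurse on the result
        return flatten_structs(df.select(cols))

    # e.g. flatten_structs(df_struct) should expose structA_field1, structA_field2,
    # structB_field3 and structB_field4 as top-level columns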
### StructType nested in ArrayType

    df_nested = spark.createDataFrame([
        Row(
            arrayA=[
                Row(childStructB=Row(field1=1, field2="foo")),
                Row(childStructB=Row(field1=2, field2="bar"))
            ]
        )])
    df_nested.printSchema()

    root
     |-- arrayA: array (nullable = true)
     |    |-- element: struct (containsNull = true)
     |    |    |-- childStructB: struct (nullable = true)
     |    |    |    |-- field1: long (nullable = true)
     |    |    |    |-- field2: string (nullable = true)

    df_nested.show(1, False)

    +------------------------+
    |arrayA                  |
    +------------------------+
    |[[[1, foo]], [[2, bar]]]|
    +------------------------+

Selecting *field1* or *field2* can be done as with normal structs (not nested inside an array), by using the dot `.` notation. The result would be of the type `ArrayType[ChildFieldType]`, which has been **_vertically sliced_** from the original array:

    df_child = df_nested.select("arrayA.childStructB.field1", "arrayA.childStructB.field2")
    df_child.printSchema()

    root
     |-- field1: array (nullable = true)
     |    |-- element: long (containsNull = true)
     |-- field2: array (nullable = true)
     |    |-- element: string (containsNull = true)

    df_child.show()

    +------+----------+
    |field1|    field2|
    +------+----------+
    |[1, 2]|[foo, bar]|
    +------+----------+

### StructType nested in MapType

As each MapType column has two components, the keys and the values, selecting a nested column inside a MapType column is not straightforward - we cannot just use the `.` to reach the nested fields, as that notation is already used for denoting the key.

    df_map_nested = spark.createDataFrame([
        Row(mapA={"2": Row(type_name="Arabic number", equivalent=2),
                  "THREE": Row(type_name="English Text", equivalent=3)},
            fieldB="foo")])
    df_map_nested.select(col("mapA.type_name"), col("mapA.THREE.type_name")).show()

    +---------+------------+
    |type_name|   type_name|
    +---------+------------+
    |     null|English Text|
    +---------+------------+

A solution for this is to use the builtin function `map_values()`, introduced in Spark 2.3. Note the type of the result column: ArrayType.

    from pyspark.sql.functions import map_values

    result = df_map_nested.select(map_values("mapA")["type_name"], col("mapA.THREE.type_name"))
    result.show(2, False)
    result.printSchema()

    +-----------------------------+------------+
    |map_values(mapA).type_name   |type_name   |
    +-----------------------------+------------+
    |[Arabic number, English Text]|English Text|
    +-----------------------------+------------+

    root
     |-- map_values(mapA).type_name: array (nullable = true)
     |    |-- element: string (containsNull = true)
     |-- type_name: string (nullable = true)

## Hurdles

The above steps work well for most dataframes. The only dataframes for which they fail (as of Spark 2.4.3 or lower) are the ones with a StructType nested inside MORE THAN ONE layer of ArrayType.
Like this one:

    df_nested_B = spark.createDataFrame([
        Row(
            arrayA=[[
                Row(childStructB=Row(field1=1, field2="foo")),
                Row(childStructB=Row(field1=2, field2="bar"))
            ]]
        )])
    df_nested_B.printSchema()

    root
     |-- arrayA: array (nullable = true)
     |    |-- element: array (containsNull = true)
     |    |    |-- element: struct (containsNull = true)
     |    |    |    |-- childStructB: struct (nullable = true)
     |    |    |    |    |-- field1: long (nullable = true)
     |    |    |    |    |-- field2: string (nullable = true)

Or this one:

    df_nested_C = spark.createDataFrame([
        Row(
            arrayA=[
                Row(childStructB=Row(childArrayC=[Row(field1=1, field2="foo")])),
                Row(childStructB=Row(childArrayC=[Row(field1=2, field2="bar")])),
            ]
        )])
    df_nested_C.printSchema()

    root
     |-- arrayA: array (nullable = true)
     |    |-- element: struct (containsNull = true)
     |    |    |-- childStructB: struct (nullable = true)
     |    |    |    |-- childArrayC: array (nullable = true)
     |    |    |    |    |-- element: struct (containsNull = true)
     |    |    |    |    |    |-- field1: long (nullable = true)
     |    |    |    |    |    |-- field2: string (nullable = true)

Selecting `arrayA.childStructB.field1` from `df_nested_B` fails with the error message `AnalysisException: No such struct field field1 in childStructB`.
Selecting `arrayA.childStructB.childArrayC.field1` from `df_nested_C` throws an `AnalysisException`: `cannot resolve 'arrayA.childStructB.childArrayC['field1']' due to data type mismatch: argument 2 requires integral type, however, ''field1'' is of string type.`

## (More) Solutions

With the introduction of the SQL function `transform` in Spark 2.4, the error above can be solved by applying `transform` on every layer of the array.
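As a rough illustration of that idea (a sketch assuming Spark 2.4+, not the sparkaid code itself): PySpark 2.4 only exposes `transform` through SQL expressions, so slicing `field1` out of `df_nested_B` by hand could look like this, where the lambda variable names and the column alias are made up for the example:

    from pyspark.sql.functions import expr

    # One transform() per array layer: the outer lambda walks the outer array,
    # the inner lambda pulls field1 out of each struct element.
    df_nested_B.select(
        expr("transform(arrayA, a -> transform(a, x -> x.childStructB.field1))")
        .alias("arrayA__childStructB_field1")
    ).show(truncate=False)
    # expected to yield [[1, 2]], in line with the flatten() output shown below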
A comprehensive implementation of a flatten function can be found in the Python package `sparkaid`:

    from sparkaid import flatten

    flatten(df_nested_B).printSchema()

    root
     |-- arrayA__childStructB_field1: array (nullable = true)
     |    |-- element: array (containsNull = true)
     |    |    |-- element: long (containsNull = true)
     |-- arrayA__childStructB_field2: array (nullable = true)
     |    |-- element: array (containsNull = true)
     |    |    |-- element: string (containsNull = true)
    flatten(df_nested_B).show()

    +---------------------------+---------------------------+
    |arrayA__childStructB_field1|arrayA__childStructB_field2|
    +---------------------------+---------------------------+
    |                   [[1, 2]]|               [[foo, bar]]|
    +---------------------------+---------------------------+

    flatten(df_nested_C).printSchema()

    root
     |-- arrayA_childStructB_childArrayC_field1: array (nullable = true)
     |    |-- element: array (containsNull = true)
     |    |    |-- element: long (containsNull = true)
     |-- arrayA_childStructB_childArrayC_field2: array (nullable = true)
     |    |-- element: array (containsNull = true)
     |    |    |-- element: string (containsNull = true)

[]: https://spark.apache.org/docs/2.4.0/api/java/org/apache/spark/sql/functions.html#flatten-org.apache.spark.sql.Column

%prep
%autosetup -n sparkaid-1.0.0

%build
%py3_build

%install
%py3_install
install -d -m755 %{buildroot}/%{_pkgdocdir}
if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
pushd %{buildroot}
if [ -d usr/lib ]; then
	find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/lib64 ]; then
	find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/bin ]; then
	find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/sbin ]; then
	find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
fi
touch doclist.lst
if [ -d usr/share/man ]; then
	find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
fi
popd
mv %{buildroot}/filelist.lst .
mv %{buildroot}/doclist.lst .

%files -n python3-sparkaid -f filelist.lst
%dir %{python3_sitelib}/*

%files help -f doclist.lst
%{_docdir}/*

%changelog
* Mon Apr 10 2023 Python_Bot - 1.0.0-1
- Package Spec generated