%global _empty_manifest_terminate_build 0
Name:           python-PySparkIP
Version:        1.2.4
Release:        1
Summary:        An API for working with IP addresses in Apache Spark.
License:        Apache Software License
URL:            https://github.com/jshalaby510/PySparkIP
Source0:        https://mirrors.aliyun.com/pypi/web/packages/12/25/1c587c11b9316b8ecf13c8ec16ddd5c331071702ade401adda4f6b609aeb/PySparkIP-1.2.4.tar.gz
BuildArch:      noarch

%description
[![license](https://img.shields.io/badge/license-Apache_2.0-blue.svg)](https://github.com/jshalaby510/PySparkIP/blob/main/LICENSE)
# PySparkIP
An API for working with IP addresses in Apache Spark. Built on top of [ipaddress](https://docs.python.org/3/library/ipaddress.html).

## Usage
* pip install PySparkIP
* from PySparkIP import *

## License
This project is licensed under the Apache License. Please see the [LICENSE](LICENSE) file for more details.

## Tutorial
### Initialize
Before using in SparkSQL, initialize PySparkIP by passing `spark` to `PySparkIP`, then define `IPAddressUDT()` in the schema.
Optionally pass the log level as well (if left unspecified, `PySparkIP` resets the log level to "WARN" and gives a warning message).
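For reference, the tutorial below reads addresses from a file named `ipFile.json`. That file is not part of the package, so the following is only an assumed illustration of a compatible input: one JSON object per line with an `IPAddress` field, which is the default layout `spark.read.json` expects.

```python
import json

# Hypothetical input data; the records and file layout are an assumption,
# not something shipped with PySparkIP.
sample_records = [
    {"IPAddress": "192.0.2.1"},
    {"IPAddress": "2001:db8::1"},
    {"IPAddress": "not an ip"},  # unconvertible values end up as "::" (see NOTE below)
]

with open("ipFile.json", "w") as f:
    for record in sample_records:
        f.write(json.dumps(record) + "\n")
```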
NOTE: Values that are unsuccessfully converted to IP addresses will be converted to "::"

```python
from PySparkIP import *

# Initialize for SparkSQL use (not needed for pure PySpark)
PySparkIP(spark)
# or PySparkIP(spark, "DEBUG"), PySparkIP(spark, "FATAL"), etc if specifying a log level

schema = StructType([StructField("IPAddress", IPAddressUDT())])
ipDF = spark.read.json("ipFile.json", schema=schema)
ipDF.createOrReplaceTempView("IPAddresses")
```

### Functions
**Cast StringType() to IPAddressUDT()**
```python
# PySpark
ipDF = ipDF.select(to_ip('ip_string'))
# SparkSQL
spark.sql("SELECT to_ip(ip_string) FROM IPAddresses")
```

**Check address type**
```python
# Multicast
ipDF.select('*').withColumn("IPColumn", isMulticast("IPAddress"))
spark.sql("SELECT * FROM IPAddresses WHERE isMulticast(IPAddress)")

"""
Other address types:
isPrivate, isGlobal, isUnspecified, isReserved, isLoopback, isLinkLocal,
isIPv4Mapped, is6to4, isTeredo, isIPv4, isIPv6
"""
```

**Output address in different formats**
```python
# Exploded
spark.sql("SELECT explodedIP(IPAddress) FROM IPAddresses")
ipDF.select(explodedIP("IPAddress"))

# Compressed
spark.sql("SELECT compressedIP(IPAddress) FROM IPAddresses")
ipDF.select(compressedIP("IPAddress"))
```

**Sort IP Addresses**
```python
# SparkSQL doesn't support values > LONG_MAX
# To sort IPv6 addresses, use ipAsBinary
# To sort IPv4 addresses, use either ipv4AsNum or ipAsBinary,
# but ipv4AsNum is more efficient

# Sort IPv4 and IPv6
spark.sql("SELECT * FROM IPAddresses SORT BY ipAsBinary(IPAddress)")
ipDF.select('*').sort(ipAsBinary("IPAddress"))

# Sort ONLY IPv4
spark.sql("SELECT * FROM IPv4 SORT BY ipv4AsNum(IPAddress)")
ipv4DF.select('*').sort(ipv4AsNum("IPAddress"))
```

**IP network functions**
```python
# Network contains
spark.sql("SELECT * FROM IPAddresses WHERE networkContains(IPAddress, '195.0.0.0/16')")
ipDF.select('*').filter("networkContains(IPAddress, '195.0.0.0/16')")
ipDF.select('*').withColumn("netCol", networkContains("192.0.0.0/16")("IPAddress"))

# Or use ipaddress.ip_network objects
net1 = ipaddress.ip_network('::/10')
ipDF.select('*').filter(networkContains(net1)("IPAddress"))
```

**IP Set**
#### Create IP Sets (Note: This functionality also works with add and remove):
```python
# Strings
ipStr = '192.0.0.0'
netStr = '225.0.0.0'

# Tuples, lists, or sets
ip_net_mix = ('::5', '5.0.0.0/8', '111.8.9.7')

# ipaddress objects
ipAddr = ipaddress.ip_address('::')

# Dataframes
ipMulticastDF = spark.sql("SELECT IPAddress FROM IPAddresses WHERE isMulticast(IPAddress)")

"""
Or use our predefined networks
(multicastIPs, privateIPs, publicIPs, reservedIPs, unspecifiedIPs, linkLocalIPs,
 loopBackIPs, ipv4MappedIPs, ipv4TranslatedIPs, ipv4ipv6TranslatedIPs, teredoIPs,
 sixToFourIPs, or siteLocalIPs)
"""

# Mix them together
ipSet = IPSet(ipStr, '::/16', '2001::', netStr, ip_net_mix, privateIPs)
ipSet2 = IPSet("6::", "9.0.8.7", ipAddr, ipMulticastDF)

# Use other IPSets
ipSet3 = IPSet(ipSet, ipSet2)

# Or just make an empty set
ipSet4 = IPSet()
```

#### Use IP Sets:
```python
# Initialize an IP Set
setOfIPs = {"192.0.0.0", "5422:6622:1dc6:366a:e728:84d4:257e:655a", "::"}
ipSet = IPSet(setOfIPs)

# Use it!
ipDF.select('*').filter("setContains(IPAddress, 'ipSet')")
ipDF.select('*').withColumn("setCol", setContains(ipSet)("IPAddress"))
```

#### Register IP Sets for use in SparkSQL:
Before using IP Sets in SparkSQL, register them by passing them to `PySparkIPSets`.
```python
ipSet = IPSet('::')
ipSet2 = IPSet()

# Pass the set, then the set name
PySparkIPSets.add(ipSet, 'ipSet')
PySparkIPSets.add(ipSet2, 'ipSet2')
```

#### Remove IP Sets from registered sets in SparkSQL:
```python
PySparkIPSets.remove('ipSet', 'ipSet2')
```

#### Use IP Sets in SparkSQL:
```python
# Note: you have to pass the registered set name when using SparkSQL, not the actual variable

# Initialize an IP Set
setOfIPs = {"192.0.0.0", "5422:6622:1dc6:366a:e728:84d4:257e:655a", "::"}
ipSet = IPSet(setOfIPs)

# Register it
PySparkIPSets.add(ipSet, 'ipSet')

# Use it!
# Set Contains
spark.sql("SELECT * FROM IPAddresses WHERE setContains(IPAddress, 'ipSet')")

# Show sets available to use
PySparkIPSets.setsAvailable()

# Remove a set
PySparkIPSets.remove('ipSet')

# Clear sets available
PySparkIPSets.clear()
```

#### IP Set functions (outside Spark):
```python
ipSet = IPSet()

# Add
ipSet.add('0.0.0.0', '::/16')

# Remove
ipSet.remove('::/16')

# Contains
ipSet.contains('0.0.0.0')

# Clear
ipSet.clear()

# Show all
ipSet.showAll()

# Union
ipSet2 = ('2001::', '::33', 'ffff::f')
ipSet.union(ipSet2)

# Intersection
ipSet.intersects(ipSet2)

# Diff
ipSet.diff(ipSet2)

# Show All
ipSet.showAll()

# Return All
ipSet.returnAll()

# Is empty
ipSet.isEmpty()

# Compare IPSets
ipSet2 = ('2001::', '::33', 'ffff::f')
ipSet == ipSet2
ipSet != ipSet2

# Return the number of elements in the set
len(ipSet)
```

#### Other operations (outside Spark):
```python
# Nets intersect
net1 = '192.0.0.0/16'
net2 = '192.0.0.0/8'  # or ipaddress.ip_network('192.0.0.0/8')
netsIntersect(net1, net2)
```

%package -n python3-PySparkIP
Summary:        An API for working with IP addresses in Apache Spark.
Provides:       python-PySparkIP
BuildRequires:  python3-devel
BuildRequires:  python3-setuptools
BuildRequires:  python3-pip

%description -n python3-PySparkIP
An API for working with IP addresses in Apache Spark, built on top of the Python
ipaddress module. This package installs the Python 3 module; the full tutorial is
included in the main package description and in the upstream README.
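A minimal quick-start sketch, assuming an existing `SparkSession` named `spark` and an input file shaped like the hypothetical `ipFile.json` illustrated in the tutorial above (all file and variable names here are illustrative):

```python
from pyspark.sql.types import StructField, StructType
from PySparkIP import *

# Register the PySparkIP functions for SparkSQL use
# (not needed when calling the PySpark column functions directly).
PySparkIP(spark)

# Read IP addresses with the IPAddressUDT schema and expose them to SparkSQL.
schema = StructType([StructField("IPAddress", IPAddressUDT())])
ipDF = spark.read.json("ipFile.json", schema=schema)
ipDF.createOrReplaceTempView("IPAddresses")

# For example, keep only multicast addresses.
spark.sql("SELECT * FROM IPAddresses WHERE isMulticast(IPAddress)")
```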
%package help
Summary:        Development documents and examples for PySparkIP
Provides:       python3-PySparkIP-doc

%description help
Development documents and examples for PySparkIP. The usage tutorial is included
in the main package description and in the upstream README at
https://github.com/jshalaby510/PySparkIP.
%prep
%autosetup -n PySparkIP-1.2.4

%build
%py3_build

%install
%py3_install
install -d -m755 %{buildroot}/%{_pkgdocdir}
if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
pushd %{buildroot}
if [ -d usr/lib ]; then
	find usr/lib -type f -printf "\"/%h/%f\"\n" >> filelist.lst
fi
if [ -d usr/lib64 ]; then
	find usr/lib64 -type f -printf "\"/%h/%f\"\n" >> filelist.lst
fi
if [ -d usr/bin ]; then
	find usr/bin -type f -printf "\"/%h/%f\"\n" >> filelist.lst
fi
if [ -d usr/sbin ]; then
	find usr/sbin -type f -printf "\"/%h/%f\"\n" >> filelist.lst
fi
touch doclist.lst
if [ -d usr/share/man ]; then
	find usr/share/man -type f -printf "\"/%h/%f.gz\"\n" >> doclist.lst
fi
popd
mv %{buildroot}/filelist.lst .
mv %{buildroot}/doclist.lst .

%files -n python3-PySparkIP -f filelist.lst
%dir %{python3_sitelib}/*

%files help -f doclist.lst
%{_docdir}/*

%changelog
* Tue Jun 20 2023 Python_Bot - 1.2.4-1
- Package Spec generated