-rw-r--r--  .gitignore           1
-rw-r--r--  python-nvgpu.spec  645
-rw-r--r--  sources              1
3 files changed, 647 insertions, 0 deletions
diff --git a/.gitignore b/.gitignore
index e69de29..96e4aed 100644
--- a/.gitignore
+++ b/.gitignore
@@ -0,0 +1 @@
+/nvgpu-0.10.0.tar.gz
diff --git a/python-nvgpu.spec b/python-nvgpu.spec
new file mode 100644
index 0000000..9edbb20
--- /dev/null
+++ b/python-nvgpu.spec
@@ -0,0 +1,645 @@
+%global _empty_manifest_terminate_build 0
+Name: python-nvgpu
+Version: 0.10.0
+Release: 1
+Summary: NVIDIA GPU tools
+License: MIT
+URL: https://github.com/rossumai/nvgpu
+Source0: https://mirrors.nju.edu.cn/pypi/web/packages/1a/95/5b99a5798b366ab242fe0b2190f3814b9321eb98c6e1e9c6b599b2b4ce84/nvgpu-0.10.0.tar.gz
+BuildArch: noarch
+
+
+%description
+# `nvgpu` - NVIDIA GPU tools
+
+It provides information about GPUs and their availability for computation.
+
+Often we want to train an ML model on one of the GPUs installed on a
+multi-GPU machine. Since TensorFlow allocates all GPU memory, only one such
+process can use the GPU at a time. Unfortunately `nvidia-smi` provides only a
+text interface with information about the GPUs. This package wraps it with an
+easier-to-use CLI and Python interface.
+
+It's a quick and dirty solution that calls `nvidia-smi` and parses its output.
+We can take one or more GPUs available for computation based on relative
+memory usage, i.e. it is OK with Xorg taking a few MB.
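+
+To make the idea concrete, here is a minimal sketch of availability detection
+by relative memory usage (an illustration only, not nvgpu's actual code; the
+10 % threshold and the function name are arbitrary choices):
+
+```python
+import subprocess
+
+
+def available_gpus_sketch(max_used_percent=10.0):
+    # Ask nvidia-smi for index, used and total memory as bare CSV numbers.
+    out = subprocess.check_output([
+        'nvidia-smi',
+        '--query-gpu=index,memory.used,memory.total',
+        '--format=csv,noheader,nounits',
+    ]).decode()
+    free = []
+    for line in out.strip().splitlines():
+        index, used, total = [field.strip() for field in line.split(',')]
+        # A GPU counts as available when only a small fraction of its
+        # memory is in use, so Xorg taking a few MB does not disqualify it.
+        if 100.0 * int(used) / int(total) <= max_used_percent:
+            free.append(index)
+    return free
+```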
+
+In addition there is a fancy table of GPUs with more information obtained via
+the Python bindings to NVML.
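+
+With the `pynvml` bindings, roughly this kind of per-GPU query is possible
+(a sketch using the standard pynvml API, not necessarily the exact calls this
+package makes):
+
+```python
+import pynvml
+
+pynvml.nvmlInit()
+for i in range(pynvml.nvmlDeviceGetCount()):
+    handle = pynvml.nvmlDeviceGetHandleByIndex(i)
+    name = pynvml.nvmlDeviceGetName(handle)  # bytes or str, depending on pynvml version
+    util = pynvml.nvmlDeviceGetUtilizationRates(handle).gpu  # %
+    temp = pynvml.nvmlDeviceGetTemperature(handle, pynvml.NVML_TEMPERATURE_GPU)  # °C
+    clock = pynvml.nvmlDeviceGetClockInfo(handle, pynvml.NVML_CLOCK_SM)  # MHz
+    print(i, name, util, temp, clock)
+pynvml.nvmlShutdown()
+```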
+
+For easier monitoring of multiple machines it's possible to deploy agents (that
+provide the GPU information in JSON over a REST API) and show the aggregated
+status in a web application.
+
+## Installing
+
+For a user:
+
+```bash
+pip install nvgpu
+```
+
+or to the system:
+
+```bash
+sudo -H pip install nvgpu
+```
+
+## Usage examples
+
+Command-line interface:
+
+```bash
+# grab all available GPUs
+CUDA_VISIBLE_DEVICES=$(nvgpu available)
+
+# grab at most one available GPU
+CUDA_VISIBLE_DEVICES=$(nvgpu available -l 1)
+```
+
+Print a pretty colored table of devices, availability, users, and processes:
+
+```
+$ nvgpu list
+ status type util. temp. MHz users since pids cmd
+-- -------- ------------------- ------- ------- ----- ------- --------------- ------ --------
+ 0 [ ] GeForce GTX 1070 0 % 44 139
+ 1 [~] GeForce GTX 1080 Ti 0 % 44 139 alice 2 days ago 19028 jupyter
+ 2 [~] GeForce GTX 1080 Ti 0 % 44 139 bob 14 hours ago 8479 jupyter
+ 3 [~] GeForce GTX 1070 46 % 54 1506 bob 7 days ago 20883 train.py
+ 4 [~] GeForce GTX 1070 35 % 64 1480 bob 7 days ago 26228 evaluate.py
+ 5 [!] GeForce GTX 1080 Ti 0 % 44 139 ? 9305
+ 6 [ ] GeForce GTX 1080 Ti 0 % 44 139
+```
+
+Or shortcut:
+
+```
+$ nvl
+```
+
+Python API:
+
+```python
+import nvgpu
+
+nvgpu.available_gpus()
+# ['0', '2']
+
+nvgpu.gpu_info()
+[{'index': '0',
+ 'mem_total': 8119,
+ 'mem_used': 7881,
+ 'mem_used_percent': 97.06860450794433,
+ 'type': 'GeForce GTX 1070',
+ 'uuid': 'GPU-3aa99ee6-4a9f-470e-3798-70aaed942689'},
+ {'index': '1',
+ 'mem_total': 11178,
+ 'mem_used': 10795,
+ 'mem_used_percent': 96.57362676686348,
+ 'type': 'GeForce GTX 1080 Ti',
+ 'uuid': 'GPU-60410ded-5218-7b06-9c7a-124b77a22447'},
+ {'index': '2',
+ 'mem_total': 11178,
+ 'mem_used': 10789,
+ 'mem_used_percent': 96.51994990159241,
+ 'type': 'GeForce GTX 1080 Ti',
+ 'uuid': 'GPU-d0a77bd4-cc70-ca82-54d6-4e2018cfdca6'},
+ ...
+]
+```
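+
+The documented `available_gpus()` call can also be used to pin a process to
+free GPUs directly from Python, by setting `CUDA_VISIBLE_DEVICES` before the
+ML framework initializes CUDA (the one-GPU limit below is just an example):
+
+```python
+import os
+
+import nvgpu
+
+# Restrict this process to at most one currently available GPU.
+os.environ['CUDA_VISIBLE_DEVICES'] = ','.join(nvgpu.available_gpus()[:1])
+```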
+
+## Web application with agents
+
+The setup consists of multiple nodes. Agents take info from the GPUs and
+provide it as JSON via a REST API. The master gathers info from the other
+nodes and displays it in an HTML page. By default, agents also display their
+own status.
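+
+In essence the master just polls each agent and merges the JSON it receives.
+A hypothetical sketch of that aggregation step (the `/gpu_status` path is an
+assumption for illustration, not a documented endpoint):
+
+```python
+import requests
+
+
+def gather_status(agents, path='/gpu_status'):
+    status = {}
+    for agent in agents:
+        try:
+            # Each agent is expected to answer with JSON describing its GPUs.
+            status[agent] = requests.get(agent + path, timeout=5).json()
+        except requests.RequestException as exc:
+            status[agent] = {'error': str(exc)}
+    return status
+```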
+
+### Agent
+
+```bash
+FLASK_APP=nvgpu.webapp flask run --host 0.0.0.0 --port 1080
+```
+
+### Master
+
+List the agents in a config file. An agent is specified either as a URL to a
+remote machine or as `'self'` for direct access to the local machine. Remove
+`'self'` if the master machine itself does not have any GPU. The default is
+`AGENTS = ['self']`, so that agents also display their own status. Set
+`AGENTS = []` to avoid this.
+
+```
+# nvgpu_master.cfg
+AGENTS = [
+ 'self', # node01 - master - direct access without using HTTP
+ 'http://node02:1080',
+ 'http://node03:1080',
+ 'http://node04:1080',
+]
+```
+
+```bash
+NVGPU_CLUSTER_CFG=/path/to/nvgpu_master.cfg FLASK_APP=nvgpu.webapp flask run --host 0.0.0.0 --port 1080
+```
+
+Open the master in the web browser: http://node01:1080.
+
+## Installing as a service
+
+On Ubuntu with `systemd` we can install the agent/master as a service to be
+run automatically at system start.
+
+```bash
+# create an unprivileged system user
+sudo useradd -r nvgpu
+```
+
+Copy [nvgpu-agent.service](nvgpu-agent.service) to:
+
+```bash
+sudo vi /etc/systemd/system/nvgpu-agent.service
+```
+
+Set the agents in the configuration file for the master:
+
+```bash
+sudo vi /etc/nvgpu.conf
+```
+
+```python
+AGENTS = [
+ # direct access without using HTTP
+ 'self',
+ 'http://node01:1080',
+ 'http://node02:1080',
+ 'http://node03:1080',
+ 'http://node04:1080',
+]
+```
+
+Set up and start the service:
+
+```bash
+# enable for automatic startup at boot
+sudo systemctl enable nvgpu-agent.service
+# start
+sudo systemctl start nvgpu-agent.service
+# check the status
+sudo systemctl status nvgpu-agent.service
+```
+
+```bash
+# check the service
+open http://localhost:1080
+```
+
+## Author
+
+- Bohumír Zámečník, [Rossum, Ltd.](https://rossum.ai/)
+- License: MIT
+
+## TODO
+
+- order GPUs by priority (decreasing power, decreasing free memory)
+
+
+%package -n python3-nvgpu
+Summary: NVIDIA GPU tools
+Provides: python-nvgpu
+BuildRequires: python3-devel
+BuildRequires: python3-setuptools
+BuildRequires: python3-pip
+%description -n python3-nvgpu
+# `nvgpu` - NVIDIA GPU tools
+
+It provides information about GPUs and their availability for computation.
+
+Often we want to train an ML model on one of the GPUs installed on a
+multi-GPU machine. Since TensorFlow allocates all GPU memory, only one such
+process can use the GPU at a time. Unfortunately `nvidia-smi` provides only a
+text interface with information about the GPUs. This package wraps it with an
+easier-to-use CLI and Python interface.
+
+It's a quick and dirty solution that calls `nvidia-smi` and parses its output.
+We can take one or more GPUs available for computation based on relative
+memory usage, i.e. it is OK with Xorg taking a few MB.
+
+In addition there is a fancy table of GPUs with more information obtained via
+the Python bindings to NVML.
+
+For easier monitoring of multiple machines it's possible to deploy agents (that
+provide the GPU information in JSON over a REST API) and show the aggregated
+status in a web application.
+
+## Installing
+
+For a user:
+
+```bash
+pip install nvgpu
+```
+
+or to the system:
+
+```bash
+sudo -H pip install nvgpu
+```
+
+## Usage examples
+
+Command-line interface:
+
+```bash
+# grab all available GPUs
+CUDA_VISIBLE_DEVICES=$(nvgpu available)
+
+# grab at most one available GPU
+CUDA_VISIBLE_DEVICES=$(nvgpu available -l 1)
+```
+
+Print a pretty colored table of devices, availability, users, and processes:
+
+```
+$ nvgpu list
+ status type util. temp. MHz users since pids cmd
+-- -------- ------------------- ------- ------- ----- ------- --------------- ------ --------
+ 0 [ ] GeForce GTX 1070 0 % 44 139
+ 1 [~] GeForce GTX 1080 Ti 0 % 44 139 alice 2 days ago 19028 jupyter
+ 2 [~] GeForce GTX 1080 Ti 0 % 44 139 bob 14 hours ago 8479 jupyter
+ 3 [~] GeForce GTX 1070 46 % 54 1506 bob 7 days ago 20883 train.py
+ 4 [~] GeForce GTX 1070 35 % 64 1480 bob 7 days ago 26228 evaluate.py
+ 5 [!] GeForce GTX 1080 Ti 0 % 44 139 ? 9305
+ 6 [ ] GeForce GTX 1080 Ti 0 % 44 139
+```
+
+Or shortcut:
+
+```
+$ nvl
+```
+
+Python API:
+
+```python
+import nvgpu
+
+nvgpu.available_gpus()
+# ['0', '2']
+
+nvgpu.gpu_info()
+[{'index': '0',
+ 'mem_total': 8119,
+ 'mem_used': 7881,
+ 'mem_used_percent': 97.06860450794433,
+ 'type': 'GeForce GTX 1070',
+ 'uuid': 'GPU-3aa99ee6-4a9f-470e-3798-70aaed942689'},
+ {'index': '1',
+ 'mem_total': 11178,
+ 'mem_used': 10795,
+ 'mem_used_percent': 96.57362676686348,
+ 'type': 'GeForce GTX 1080 Ti',
+ 'uuid': 'GPU-60410ded-5218-7b06-9c7a-124b77a22447'},
+ {'index': '2',
+ 'mem_total': 11178,
+ 'mem_used': 10789,
+ 'mem_used_percent': 96.51994990159241,
+ 'type': 'GeForce GTX 1080 Ti',
+ 'uuid': 'GPU-d0a77bd4-cc70-ca82-54d6-4e2018cfdca6'},
+ ...
+]
+```
+
+## Web application with agents
+
+The setup consists of multiple nodes. Agents take info from the GPUs and
+provide it as JSON via a REST API. The master gathers info from the other
+nodes and displays it in an HTML page. By default, agents also display their
+own status.
+
+### Agent
+
+```bash
+FLASK_APP=nvgpu.webapp flask run --host 0.0.0.0 --port 1080
+```
+
+### Master
+
+List the agents in a config file. An agent is specified either as a URL to a
+remote machine or as `'self'` for direct access to the local machine. Remove
+`'self'` if the master machine itself does not have any GPU. The default is
+`AGENTS = ['self']`, so that agents also display their own status. Set
+`AGENTS = []` to avoid this.
+
+```
+# nvgpu_master.cfg
+AGENTS = [
+ 'self', # node01 - master - direct access without using HTTP
+ 'http://node02:1080',
+ 'http://node03:1080',
+ 'http://node04:1080',
+]
+```
+
+```bash
+NVGPU_CLUSTER_CFG=/path/to/nvgpu_master.cfg FLASK_APP=nvgpu.webapp flask run --host 0.0.0.0 --port 1080
+```
+
+Open the master in the web browser: http://node01:1080.
+
+## Installing as a service
+
+On Ubuntu with `systemd` we can install the agent/master as a service to be
+run automatically at system start.
+
+```bash
+# create an unprivileged system user
+sudo useradd -r nvgpu
+```
+
+Copy [nvgpu-agent.service](nvgpu-agent.service) to:
+
+```bash
+sudo vi /etc/systemd/system/nvgpu-agent.service
+```
+
+Set the agents in the configuration file for the master:
+
+```bash
+sudo vi /etc/nvgpu.conf
+```
+
+```python
+AGENTS = [
+ # direct access without using HTTP
+ 'self',
+ 'http://node01:1080',
+ 'http://node02:1080',
+ 'http://node03:1080',
+ 'http://node04:1080',
+]
+```
+
+Set up and start the service:
+
+```bash
+# enable for automatic startup at boot
+sudo systemctl enable nvgpu-agent.service
+# start
+sudo systemctl start nvgpu-agent.service
+# check the status
+sudo systemctl status nvgpu-agent.service
+```
+
+```bash
+# check the service
+open http://localhost:1080
+```
+
+## Author
+
+- Bohumír Zámečník, [Rossum, Ltd.](https://rossum.ai/)
+- License: MIT
+
+## TODO
+
+- order GPUs by priority (decreasing power, decreasing free memory)
+
+
+%package help
+Summary: Development documents and examples for nvgpu
+Provides: python3-nvgpu-doc
+%description help
+# `nvgpu` - NVIDIA GPU tools
+
+It provides information about GPUs and their availability for computation.
+
+Often we want to train an ML model on one of the GPUs installed on a
+multi-GPU machine. Since TensorFlow allocates all GPU memory, only one such
+process can use the GPU at a time. Unfortunately `nvidia-smi` provides only a
+text interface with information about the GPUs. This package wraps it with an
+easier-to-use CLI and Python interface.
+
+It's a quick and dirty solution that calls `nvidia-smi` and parses its output.
+We can take one or more GPUs available for computation based on relative
+memory usage, i.e. it is OK with Xorg taking a few MB.
+
+In addition there is a fancy table of GPUs with more information obtained via
+the Python bindings to NVML.
+
+For easier monitoring of multiple machines it's possible to deploy agents (that
+provide the GPU information in JSON over a REST API) and show the aggregated
+status in a web application.
+
+## Installing
+
+For a user:
+
+```bash
+pip install nvgpu
+```
+
+or to the system:
+
+```bash
+sudo -H pip install nvgpu
+```
+
+## Usage examples
+
+Command-line interface:
+
+```bash
+# grab all available GPUs
+CUDA_VISIBLE_DEVICES=$(nvgpu available)
+
+# grab at most one available GPU
+CUDA_VISIBLE_DEVICES=$(nvgpu available -l 1)
+```
+
+Print a pretty colored table of devices, availability, users, and processes:
+
+```
+$ nvgpu list
+ status type util. temp. MHz users since pids cmd
+-- -------- ------------------- ------- ------- ----- ------- --------------- ------ --------
+ 0 [ ] GeForce GTX 1070 0 % 44 139
+ 1 [~] GeForce GTX 1080 Ti 0 % 44 139 alice 2 days ago 19028 jupyter
+ 2 [~] GeForce GTX 1080 Ti 0 % 44 139 bob 14 hours ago 8479 jupyter
+ 3 [~] GeForce GTX 1070 46 % 54 1506 bob 7 days ago 20883 train.py
+ 4 [~] GeForce GTX 1070 35 % 64 1480 bob 7 days ago 26228 evaluate.py
+ 5 [!] GeForce GTX 1080 Ti 0 % 44 139 ? 9305
+ 6 [ ] GeForce GTX 1080 Ti 0 % 44 139
+```
+
+Or shortcut:
+
+```
+$ nvl
+```
+
+Python API:
+
+```python
+import nvgpu
+
+nvgpu.available_gpus()
+# ['0', '2']
+
+nvgpu.gpu_info()
+[{'index': '0',
+ 'mem_total': 8119,
+ 'mem_used': 7881,
+ 'mem_used_percent': 97.06860450794433,
+ 'type': 'GeForce GTX 1070',
+ 'uuid': 'GPU-3aa99ee6-4a9f-470e-3798-70aaed942689'},
+ {'index': '1',
+ 'mem_total': 11178,
+ 'mem_used': 10795,
+ 'mem_used_percent': 96.57362676686348,
+ 'type': 'GeForce GTX 1080 Ti',
+ 'uuid': 'GPU-60410ded-5218-7b06-9c7a-124b77a22447'},
+ {'index': '2',
+ 'mem_total': 11178,
+ 'mem_used': 10789,
+ 'mem_used_percent': 96.51994990159241,
+ 'type': 'GeForce GTX 1080 Ti',
+ 'uuid': 'GPU-d0a77bd4-cc70-ca82-54d6-4e2018cfdca6'},
+ ...
+]
+```
+
+## Web application with agents
+
+The setup consists of multiple nodes. Agents take info from the GPUs and
+provide it as JSON via a REST API. The master gathers info from the other
+nodes and displays it in an HTML page. By default, agents also display their
+own status.
+
+### Agent
+
+```bash
+FLASK_APP=nvgpu.webapp flask run --host 0.0.0.0 --port 1080
+```
+
+### Master
+
+List the agents in a config file. An agent is specified either as a URL to a
+remote machine or as `'self'` for direct access to the local machine. Remove
+`'self'` if the master machine itself does not have any GPU. The default is
+`AGENTS = ['self']`, so that agents also display their own status. Set
+`AGENTS = []` to avoid this.
+
+```
+# nvgpu_master.cfg
+AGENTS = [
+ 'self', # node01 - master - direct access without using HTTP
+ 'http://node02:1080',
+ 'http://node03:1080',
+ 'http://node04:1080',
+]
+```
+
+```bash
+NVGPU_CLUSTER_CFG=/path/to/nvgpu_master.cfg FLASK_APP=nvgpu.webapp flask run --host 0.0.0.0 --port 1080
+```
+
+Open the master in the web browser: http://node01:1080.
+
+## Installing as a service
+
+On Ubuntu with `systemd` we can install the agent/master as a service to be
+run automatically at system start.
+
+```bash
+# create an unprivileged system user
+sudo useradd -r nvgpu
+```
+
+Copy [nvgpu-agent.service](nvgpu-agent.service) to:
+
+```bash
+sudo vi /etc/systemd/system/nvgpu-agent.service
+```
+
+Set the agents in the configuration file for the master:
+
+```bash
+sudo vi /etc/nvgpu.conf
+```
+
+```python
+AGENTS = [
+ # direct access without using HTTP
+ 'self',
+ 'http://node01:1080',
+ 'http://node02:1080',
+ 'http://node03:1080',
+ 'http://node04:1080',
+]
+```
+
+Set up and start the service:
+
+```bash
+# enable for automatic startup at boot
+sudo systemctl enable nvgpu-agent.service
+# start
+sudo systemctl start nvgpu-agent.service
+# check the status
+sudo systemctl status nvgpu-agent.service
+```
+
+```bash
+# check the service
+open http://localhost:1080
+```
+
+## Author
+
+- Bohumír Zámečník, [Rossum, Ltd.](https://rossum.ai/)
+- License: MIT
+
+## TODO
+
+- order GPUs by priority (decreasing power, decreasing free memory)
+
+
+%prep
+%autosetup -n nvgpu-0.10.0
+
+%build
+%py3_build
+
+%install
+%py3_install
+install -d -m755 %{buildroot}/%{_pkgdocdir}
+if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
+if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
+if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
+if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
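+# Collect every installed file (paths relative to the buildroot) into
+# filelist.lst and man pages into doclist.lst, so the %files sections
+# below can consume the generated lists.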
+pushd %{buildroot}
+if [ -d usr/lib ]; then
+ find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/lib64 ]; then
+ find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/bin ]; then
+ find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/sbin ]; then
+ find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+touch doclist.lst
+if [ -d usr/share/man ]; then
+ find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
+fi
+popd
+mv %{buildroot}/filelist.lst .
+mv %{buildroot}/doclist.lst .
+
+%files -n python3-nvgpu -f filelist.lst
+%dir %{python3_sitelib}/*
+
+%files help -f doclist.lst
+%{_docdir}/*
+
+%changelog
+* Wed Apr 12 2023 Python_Bot <Python_Bot@openeuler.org> - 0.10.0-1
+- Package Spec generated
diff --git a/sources b/sources
new file mode 100644
index 0000000..32fc83f
--- /dev/null
+++ b/sources
@@ -0,0 +1 @@
+83b892a015995031111df47561962709 nvgpu-0.10.0.tar.gz