nvbandwidth Version: v0.4
Built from Git version:

NOTE: This tool reports current measured bandwidth on your system.
Additional system-specific tuning may be required to achieve maximal peak bandwidth.

CUDA Runtime Version: 12040
CUDA Driver Version: 12040
Driver Version: 550.90.07

Device 0: NVIDIA GeForce RTX 4090
Device 1: NVIDIA GeForce RTX 4090
Device 2: NVIDIA GeForce RTX 4090
Device 3: NVIDIA GeForce RTX 4090
Device 4: NVIDIA GeForce RTX 4090
Device 5: NVIDIA GeForce RTX 4090
Device 6: NVIDIA GeForce RTX 4090
Device 7: NVIDIA GeForce RTX 4090

Running device_to_device_memcpy_read_ce.
memcpy CE GPU(row) -> GPU(column) bandwidth (GB/s)
           0         1         2         3         4         5         6         7
 0       N/A     22.60     22.59     22.60     22.59     22.60     22.59     22.59
 1     22.59       N/A     22.60     22.60     22.59     22.60     22.60     22.60
 2     22.60     22.60       N/A     22.60     22.59     22.60     22.60     22.60
 3     22.60     22.60     22.60       N/A     22.60     22.60     22.60     22.60
 4     22.60     22.60     22.60     22.60       N/A     22.59     22.60     22.60
 5     22.60     22.60     22.60     22.60     22.59       N/A     22.60     22.60
 6     22.59     22.60     22.60     22.60     22.60     22.60       N/A     22.60
 7     22.60     22.60     22.60     22.60     22.60     22.60     22.59       N/A

SUM device_to_device_memcpy_read_ce 1265.39

Running device_to_device_memcpy_write_ce.
memcpy CE GPU(row) <- GPU(column) bandwidth (GB/s)
           0         1         2         3         4         5         6         7
 0       N/A     22.68     22.68     22.68     22.68     22.68     22.68     22.68
 1     22.68       N/A     22.68     22.68     22.68     22.68     22.68     22.68
 2     22.68     22.68       N/A     22.68     22.67     22.68     22.67     22.68
 3     22.68     22.68     22.68       N/A     22.68     22.68     22.68     22.68
 4     22.68     22.67     22.68     22.68       N/A     22.68     22.68     22.68
 5     22.68     22.67     22.68     22.68     22.68       N/A     22.68     22.68
 6     22.68     22.68     22.68     22.68     22.68     22.68       N/A     22.68
 7     22.68     22.68     22.68     22.67     22.68     22.68     22.68       N/A

SUM device_to_device_memcpy_write_ce 1269.93

Running host_to_device_memcpy_ce.
memcpy CE CPU(row) -> GPU(column) bandwidth (GB/s)
           0         1         2         3         4         5         6         7
 0     25.05     24.98     24.99     24.99     25.07     24.94     25.14     25.01

SUM host_to_device_memcpy_ce 200.16

Running device_to_host_memcpy_ce.
memcpy CE CPU(row) <- GPU(column) bandwidth (GB/s)
           0         1         2         3         4         5         6         7
 0     26.30     26.30     26.30     26.30     26.30     26.30     26.30     26.30

SUM device_to_host_memcpy_ce 210.43