AI Controller
Dashboard
Monitor nodes, hardware, GPU history, and AI packages in the Qubrid AI Controller
The Dashboard in the Qubrid AI Controller lists every node that has been added to the system. From here, admins can monitor and manage each node. The Action button next to a node opens detailed information about that node's status and configuration.
System Information
Admins can view detailed System Information for each node by clicking the Action button next to a node. This includes critical details about the node's hardware and software setup:
- RAM: Total and available RAM
- CPU: Information about the processor, including the number of cores and clock speed
- GPU: Details on the available GPUs, including model and memory capacity
- Python Version: The version of Python installed on the node
- NVIDIA Driver Versions: The version of the NVIDIA drivers running on the node (important for GPU utilization)
Admins can also open an SSH terminal to the server, which is intended for troubleshooting. To open it, click the primary node; the terminal opens only when the primary node is selected. From this terminal, you can directly access and manage your clusters.
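To cross-check the System Information fields from the SSH terminal, something like the following sketch could be run on the node. It is not how the AI Controller collects these values. It assumes a Linux node that exposes `/proc/meminfo` and has `nvidia-smi` on the PATH, and the function names (`parse_meminfo`, `parse_gpu_csv`, `collect_system_info`) are hypothetical helpers introduced for illustration. CPU clock speed is left out for brevity.

```python
import os
import platform
import shutil
import subprocess


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer kB values."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def parse_gpu_csv(text):
    """Parse 'name, memory.total, driver_version' rows printed by nvidia-smi."""
    gpus = []
    for line in text.strip().splitlines():
        # Split from the right so a comma inside the model name is tolerated
        parts = [p.strip() for p in line.rsplit(",", 2)]
        if len(parts) == 3:
            gpus.append({"model": parts[0], "memory": parts[1], "driver": parts[2]})
    return gpus


def collect_system_info():
    """Gather Python version, CPU core count, RAM, and GPU details if available."""
    info = {"python": platform.python_version(), "cpu_cores": os.cpu_count()}
    try:
        with open("/proc/meminfo") as f:
            mem = parse_meminfo(f.read())
        info["ram_total_kb"] = mem.get("MemTotal")
        info["ram_available_kb"] = mem.get("MemAvailable")
    except OSError:
        pass  # Not a Linux host, or /proc is not readable
    if shutil.which("nvidia-smi"):
        result = subprocess.run(
            ["nvidia-smi",
             "--query-gpu=name,memory.total,driver_version",
             "--format=csv,noheader"],
            capture_output=True, text=True, check=False,
        )
        info["gpus"] = parse_gpu_csv(result.stdout)
    return info


if __name__ == "__main__":
    for key, value in collect_system_info().items():
        print(f"{key}: {value}")
```

If the values printed here differ from those in the dashboard, that difference is a reasonable starting point for troubleshooting the node.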
Network Information
The Network Information section provides an overview of all network interface cards (NICs) available on the node:
- List of NICs: All network interfaces present on the node
- Status: The current status of each NIC (active or inactive)
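The same interface list can be approximated from the SSH terminal. The sketch below assumes a Linux node where each interface appears under `/sys/class/net` with an `operstate` file. Treating `up` as active and everything else as inactive is an assumption made here, not the AI Controller's documented rule. `classify` and `list_nics` are hypothetical names.

```python
import os

SYS_NET = "/sys/class/net"  # Linux sysfs location; assumed to exist on the node


def classify(operstate):
    """Map a kernel operstate string to an active/inactive label (assumed mapping)."""
    return "active" if operstate.strip().lower() == "up" else "inactive"


def list_nics(root=SYS_NET):
    """Return {interface_name: 'active' | 'inactive'} for interfaces under root."""
    nics = {}
    if not os.path.isdir(root):
        return nics
    for name in sorted(os.listdir(root)):
        path = os.path.join(root, name, "operstate")
        try:
            with open(path) as f:
                nics[name] = classify(f.read())
        except OSError:
            # Entries without a readable operstate file are skipped
            continue
    return nics


if __name__ == "__main__":
    for name, status in list_nics().items():
        print(f"{name}: {status}")
```

Note that the loopback interface usually reports `unknown` rather than `up`, so this mapping labels it inactive even though it is working.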
Hardware Monitoring
Admins can access real-time monitoring information for the node by selecting the Monitoring section. This section displays live utilization statistics for key resources:
- RAM Utilization: Displays the current memory usage on the node
- Disk Utilization: Shows the current disk space usage on the node
- GPU Utilization: Provides live statistics on GPU usage, including GPU memory usage and GPU load
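As a rough illustration of how such utilization figures can be derived, the sketch below computes percentages from raw readings. It assumes RAM figures come from `/proc/meminfo`, disk figures from Python's `shutil.disk_usage`, and GPU figures from a row printed by `nvidia-smi --query-gpu=utilization.gpu,memory.used,memory.total --format=csv,noheader,nounits`. The function names are hypothetical, and the AI Controller may compute its numbers differently.

```python
import shutil


def percent_used(used, total):
    """Return used/total as a percentage rounded to one decimal (0.0 if total is 0)."""
    return round(100.0 * used / total, 1) if total else 0.0


def ram_utilization(meminfo_text):
    """Compute RAM utilization from /proc/meminfo text using MemTotal and MemAvailable."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[key.strip()] = int(parts[0])
    total = fields.get("MemTotal", 0)
    available = fields.get("MemAvailable", 0)
    return percent_used(total - available, total)


def disk_utilization(path="/"):
    """Compute disk utilization for the filesystem that contains path."""
    usage = shutil.disk_usage(path)
    return percent_used(usage.used, usage.total)


def gpu_utilization(csv_row):
    """Parse one 'utilization.gpu, memory.used, memory.total' row (no units)."""
    load, mem_used, mem_total = (float(x) for x in csv_row.split(","))
    return {"load_pct": load, "memory_pct": percent_used(mem_used, mem_total)}


if __name__ == "__main__":
    print("disk %:", disk_utilization("/"))
```

`gpu_utilization` raises `ValueError` if a field is not numeric, which can happen when the driver does not report a metric for a given GPU.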
GPU History
The GPU History feature allows administrators to view historical data for GPU performance:
- GPU Memory: Memory usage of the GPU over the past 24 hours
- GPU Temperature: Temperature readings for the GPU over the past 24 hours
- GPU Power Usage: Power consumption data for the GPU over the past 24 hours
This history shows how GPU memory, temperature, and power usage changed over the past day, which can be valuable for troubleshooting and system optimization.
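One way to picture a 24-hour history is as a rolling window of timestamped samples. The sketch below illustrates that idea only and does not describe the AI Controller's storage. `GpuHistory` and `parse_sample` are hypothetical names, and the samples are assumed to come from rows printed by `nvidia-smi --query-gpu=memory.used,temperature.gpu,power.draw --format=csv,noheader,nounits`.

```python
from collections import deque

WINDOW_SECONDS = 24 * 60 * 60  # 24-hour retention window


class GpuHistory:
    """Keep GPU samples from the last `window` seconds, oldest first."""

    def __init__(self, window=WINDOW_SECONDS):
        self.window = window
        self.samples = deque()

    def add(self, timestamp, memory_mib, temperature_c, power_w):
        """Append a sample, then drop samples that fell out of the window."""
        self.samples.append((timestamp, memory_mib, temperature_c, power_w))
        self._prune(timestamp)

    def _prune(self, now):
        cutoff = now - self.window
        while self.samples and self.samples[0][0] < cutoff:
            self.samples.popleft()

    def peaks(self):
        """Return the maximum of each metric in the window, or None if empty."""
        if not self.samples:
            return None
        return {
            "memory_mib": max(s[1] for s in self.samples),
            "temperature_c": max(s[2] for s in self.samples),
            "power_w": max(s[3] for s in self.samples),
        }


def parse_sample(csv_row):
    """Parse one 'memory.used, temperature.gpu, power.draw' row (no units)."""
    memory, temperature, power = (float(x) for x in csv_row.split(","))
    return memory, temperature, power
```

A caller would take a sample at a fixed interval and pass it to `add` with the current time, for example `history.add(time.time(), *parse_sample(row))`.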
AI Package Installation
Admins can install AI packages directly on the bare-metal server from the AI Package section, for example machine learning frameworks or other packages needed to support AI workloads.
1. Locate the package: In the AI Package section, locate the package you want to install.
2. Select the package: Check the checkbox next to the AI package you want to install.
3. Install: Click the Install button to begin the installation process.
This section provides a simple and efficient way for admins to enhance the capabilities of the server by installing the necessary AI packages.
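After an installation finishes, it can be useful to confirm from the SSH terminal that a package is visible to Python. The sketch below assumes the installed AI packages are Python distributions available to the interpreter you run, which may not hold for every package the AI Controller offers. The distribution names are examples only, and `installed_version` and `report` are hypothetical helpers.

```python
from importlib import metadata


def installed_version(distribution):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


def report(distributions):
    """Map each distribution name to its installed version (or None)."""
    return {name: installed_version(name) for name in distributions}


if __name__ == "__main__":
    # Example names only; substitute the packages you selected in the dashboard
    for name, version in report(["torch", "tensorflow"]).items():
        print(f"{name}: {version or 'not installed'}")
```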