Deploy LLMs on-premises and download Hugging Face and NIM models with Qubrid AI Controller
Welcome to the documentation for deploying Large Language Models (LLMs) with Qubrid AI Controller. This guide covers deploying LLMs on-premises and running inference on them, using GPU acceleration for natural language processing tasks.
What are Large Language Models (LLMs)?
Large Language Models (LLMs) are a class of artificial intelligence models that understand and generate human-like text at scale; multimodal variants also work with images, audio, and video. Typically built on deep learning architectures such as the Transformer, these models have advanced natural language processing (NLP) tasks including language understanding, generation, and translation.
Inferencing with LLMs
Inferencing with LLMs involves using trained models to generate text or make predictions based on input data. Whether it's text completion, translation, summarization, or sentiment analysis, LLMs excel at understanding and generating human-like text in various contexts. The inferencing process typically involves:
- Model Deployment: Deploying the trained LLM on production environments or inference servers to handle incoming requests
- Input Processing: Preprocessing input data to ensure compatibility with the LLM's input format, such as tokenization or encoding
- Model Inference: Passing the preprocessed input through the LLM to generate predictions or text outputs
- Output Post-processing: Optionally, post-processing the model outputs to enhance readability or usability, depending on the application requirements
Running LLM inference on our GPU infrastructure platform gives real-time NLP applications a scalable, high-performance way to process natural language inputs quickly and efficiently.
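The stages listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the vocabulary, the `echo_model` stand-in, and every function name here are hypothetical and are not part of Qubrid AI Controller. A real deployment would use the model's own tokenizer and a GPU-backed model in place of the stub.

```python
from typing import Callable, List

# Toy vocabulary (assumption): a real deployment uses the model's tokenizer.
VOCAB = {"<unk>": 0, "hello": 1, "world": 2, "gpu": 3}
ID_TO_TOKEN = {token_id: token for token, token_id in VOCAB.items()}


def preprocess(text: str) -> List[int]:
    """Input processing: lowercase, split on whitespace, map tokens to ids."""
    return [VOCAB.get(tok, VOCAB["<unk>"]) for tok in text.lower().split()]


def echo_model(token_ids: List[int]) -> List[int]:
    """Stand-in for model inference: simply returns the ids in reverse order."""
    return list(reversed(token_ids))


def postprocess(token_ids: List[int]) -> str:
    """Output post-processing: map ids back to tokens and join into text."""
    return " ".join(ID_TO_TOKEN[i] for i in token_ids)


def handle_request(
    text: str, model: Callable[[List[int]], List[int]] = echo_model
) -> str:
    """Entry point an inference server would call for each incoming request."""
    return postprocess(model(preprocess(text)))


if __name__ == "__main__":
    print(handle_request("Hello GPU world"))  # prints "world gpu hello"
```

Keeping the model behind a plain callable means the stub can be swapped for a real model without changing the pre-processing or post-processing code.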
Hugging Face Models
Admins can download models from the Hugging Face repository and store them in a chosen directory on the local server.
Download a Model from Hugging Face
The following are the steps to download a model from the Hugging Face repository.
Start the download
Click the download button next to a model in the list, then enter your Hugging Face token and the directory path where the model should be stored. The path can be a local directory or an NFS path.
Wait for completion
Wait for the download to finish. How long it takes depends on your internet speed.
Make sure the Hugging Face token you provide is correct and has permission to download the desired model; otherwise, the download will fail.
This version supports deploying only Instruct-type language models from Hugging Face for inference.
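Before starting a download, it can help to confirm that the target directory (local or NFS) is writable and has enough free space. The sketch below is one way to do that with the Python standard library. The helper names and the `min_free_gb` parameter are hypothetical and not part of the Controller. For comparison only, `download_model` shows how the same two inputs (token and directory) map onto the `snapshot_download` function of the third-party `huggingface_hub` package; whether the Controller uses that package internally is not stated in this guide.

```python
import os
import shutil
from pathlib import Path


def check_download_dir(path: str, min_free_gb: float = 0.0) -> Path:
    """Return the resolved directory if it is usable as a download target.

    Raises ValueError if the path is not a writable directory or has less
    than min_free_gb GiB free (min_free_gb is an illustrative parameter).
    """
    target = Path(path).expanduser().resolve()
    if target.exists() and not target.is_dir():
        raise ValueError(f"{target} exists but is not a directory")
    target.mkdir(parents=True, exist_ok=True)  # create it if missing
    if not os.access(target, os.W_OK | os.X_OK):
        raise ValueError(f"{target} is not writable by the current user")
    free_gb = shutil.disk_usage(target).free / 1024**3
    if free_gb < min_free_gb:
        raise ValueError(
            f"{target} has {free_gb:.1f} GiB free, need {min_free_gb} GiB"
        )
    return target


def download_model(repo_id: str, token: str, target_dir: str) -> str:
    """Download a model repository outside the UI (assumption, for comparison)."""
    # Imported lazily so the directory check works without the package.
    from huggingface_hub import snapshot_download  # pip install huggingface_hub

    target = check_download_dir(target_dir)
    return snapshot_download(repo_id=repo_id, token=token, local_dir=str(target))
```

If the token is invalid or lacks access to the model, `snapshot_download` raises an error instead of returning a path, which corresponds to the failed download described above.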
NIM (Nvidia Inference Microservice) Models
Admins can download models from the NIM repository and store them in a chosen directory on the local server.
Download a Model from the Nvidia NIM Repository
The following are the steps to download a model from the Nvidia NIM repository.
Add the NIM API key
Add the NIM API key before starting to download the models.
Download the model
Click the download button next to a NIM model in the list.
Wait for completion
Wait for the download to finish. How long it takes depends on your internet speed.