Model Studio

Deploy LLMs on-premises and download Hugging Face and NIM models with Qubrid AI Controller

Welcome to the documentation for deploying Large Language Models (LLMs) with Qubrid AI Controller. This guide covers deploying LLMs on your own premises and running inference on them, so that you can use GPU acceleration for natural language processing tasks.

What are Large Language Models (LLMs)?

Large Language Models (LLMs) are a class of artificial intelligence models that understand and generate human-like text at scale; multimodal variants can also work with images, audio, and video. These models, typically based on deep learning architectures such as the Transformer, have advanced natural language processing (NLP) through strong performance in language understanding, generation, translation, and more.

Inferencing with LLMs

Inferencing with LLMs involves using trained models to generate text or make predictions based on input data. Whether it's text completion, translation, summarization, or sentiment analysis, LLMs excel at understanding and generating human-like text in various contexts. The inferencing process typically involves:

Model Deployment: Deploying the trained LLM on production environments or inference servers to handle incoming requests
Input Processing: Preprocessing input data to ensure compatibility with the LLM's input format, such as tokenization or encoding
Model Inference: Passing the preprocessed input through the LLM to generate predictions or text outputs
Output Post-processing: Optionally, post-processing the model outputs to enhance readability or usability, depending on the application requirements
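The per-request stages above can be sketched in a few lines of Python. This is a toy illustration only: the vocabulary, the `toy_model` function, and every name below are assumptions made for the example, not a real LLM and not the AI Controller's API. It shows how input processing, model inference, and output post-processing chain together once a model has been deployed.

```python
# Toy illustration of the inference stages. The vocabulary and the "model"
# are stand-ins (assumptions), not a real LLM or the controller's API.

VOCAB = ["<unk>", "hello", "world", "gpu", "inference", "!"]
TOKEN_TO_ID = {token: idx for idx, token in enumerate(VOCAB)}


def preprocess(text: str) -> list[int]:
    """Input processing: lowercase, split on whitespace, map tokens to IDs."""
    return [TOKEN_TO_ID.get(tok, 0) for tok in text.lower().split()]


def toy_model(input_ids: list[int], max_new_tokens: int = 2) -> list[int]:
    """Model inference stand-in: each step appends (last ID + 1) mod vocab size."""
    output_ids = list(input_ids)
    for _ in range(max_new_tokens):
        last = output_ids[-1] if output_ids else 0
        output_ids.append((last + 1) % len(VOCAB))
    return output_ids


def postprocess(output_ids: list[int]) -> str:
    """Output post-processing: map IDs back to tokens and join them."""
    return " ".join(VOCAB[i] for i in output_ids)


def generate(text: str) -> str:
    """Run input processing, inference, and post-processing in order."""
    return postprocess(toy_model(preprocess(text)))


if __name__ == "__main__":
    print(generate("Hello world"))  # prints "hello world gpu inference"
```

A real deployment replaces `preprocess` and `postprocess` with the model's own tokenizer and `toy_model` with a forward pass on the GPU, but the order of the stages stays the same.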

Running LLM inference on the GPU infrastructure platform provides scalable, high-performance serving for real-time NLP applications, enabling rapid and efficient processing of natural language inputs.

Hugging Face Models

An admin can download models from the Hugging Face repository and store them in a chosen directory on the local server.

Download a Model from Hugging Face

To download a model from the Hugging Face repository, follow these steps.

Start the download

In the model list, click the download button next to the model you want, then enter your Hugging Face token and the destination directory path. The destination can be a local directory or an NFS path.
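Before starting a download, it can help to confirm that the destination path is usable. The following is a minimal sketch using only the Python standard library; `check_download_dir` and its `min_free_gb` parameter are hypothetical names introduced for this example, and the AI Controller may perform its own checks that differ from these.

```python
import os
import shutil
import tempfile


def check_download_dir(path: str, min_free_gb: float = 0.0) -> list[str]:
    """Return a list of problems with the target directory (empty means OK)."""
    if not os.path.isdir(path):
        return [f"{path} does not exist or is not a directory"]
    problems = []
    # Creating files in a directory needs both write and execute permission.
    if not os.access(path, os.W_OK | os.X_OK):
        problems.append(f"{path} is not writable by the current user")
    # disk_usage also works on mounted network paths such as NFS.
    free_gb = shutil.disk_usage(path).free / 1024**3
    if free_gb < min_free_gb:
        problems.append(f"only {free_gb:.1f} GiB free, need {min_free_gb:.1f} GiB")
    return problems


if __name__ == "__main__":
    # A temporary directory stands in for the real model directory.
    with tempfile.TemporaryDirectory() as target:
        print(check_download_dir(target) or "directory looks usable")
```

Pass the same path you plan to enter in the download dialog, and set `min_free_gb` to roughly the size of the model you intend to fetch.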

Wait for completion

The model is downloaded to the specified directory. The time required depends on the model size and your internet speed.

Make sure the Hugging Face token you provide is correct and has permission to download the desired model; otherwise, the download will fail.
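For reference, a comparable download can be performed outside the controller UI with the `huggingface_hub` library. This is a sketch under assumptions: it does not describe how the AI Controller fetches models internally, `download_model` and its `downloader` parameter are hypothetical names for this example, and the repository ID in the usage note is a placeholder. `snapshot_download` itself is a real `huggingface_hub` function that accepts `repo_id`, `local_dir`, and `token`.

```python
from typing import Callable, Optional


def download_model(
    repo_id: str,
    local_dir: str,
    token: str,
    downloader: Optional[Callable[..., str]] = None,
) -> str:
    """Download all files of a model repository into local_dir; return the path."""
    if not token:
        raise ValueError("a Hugging Face access token is required")
    if downloader is None:
        # Imported lazily so this sketch loads even without huggingface_hub installed.
        from huggingface_hub import snapshot_download

        downloader = snapshot_download
    # The call raises an error if the token is invalid or lacks access to the model.
    return downloader(repo_id=repo_id, local_dir=local_dir, token=token)
```

A call such as `download_model("some-org/some-model-instruct", "/path/to/models", token)` would fetch the repository into the given directory; both arguments shown here are placeholders. The `downloader` parameter exists only so the function can be exercised without network access.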

This version currently supports deploying only Instruct-type (instruction-tuned) language models from Hugging Face for inference.

NIM (NVIDIA Inference Microservices) Models

An admin can download models from NVIDIA NIM and store them in a chosen directory on the local server.
