User guide

This guide walks you through deploying and interacting with Hugging Face text-generation models on Qubrid AI’s cloud platform with on-demand GPUs.

Prerequisites

Before you start, ensure the following conditions are met:

1. Hugging Face Account

  • An active Hugging Face account is required; sign up at https://huggingface.co if you do not have one.

2. Model Access

  • You can deploy:
    • Your own custom-trained models, or
    • Official open-source models from Hugging Face, provided their licenses permit it.

Finding the Model Name

  • The exact model name is required for deployment.
  • To find the model name:
    • Log in to Hugging Face.
    • Go to the "Models" section.
    • Search or browse for your model; click on it.
    • The URL displays the name in the form: username/model-name (e.g., deepseek-ai/DeepSeek-R1).
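
If you handle model URLs in scripts, the model name is simply the path portion of the URL. A minimal helper, assuming standard https://huggingface.co/username/model-name URLs:

```python
from urllib.parse import urlparse

def model_name_from_url(url: str) -> str:
    """Extract 'username/model-name' from a huggingface.co model URL."""
    path = urlparse(url).path.strip("/")
    owner, name = path.split("/")[:2]
    return f"{owner}/{name}"

print(model_name_from_url("https://huggingface.co/deepseek-ai/DeepSeek-R1"))
# -> deepseek-ai/DeepSeek-R1
```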

3. API Token

  • Obtain a Hugging Face access token (required for private models and programmatic deployment):

    • Log into Hugging Face.
    • Click your profile → "Settings" → "Access Tokens".
    • Create a new token: give it a descriptive name and set the appropriate permissions (read access is typically sufficient for pulling models).
    • Copy and keep your token secure.
Warning: Do not share your token. If it is lost or exposed, revoke it and create a new one.
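
To sanity-check a token before deploying, one option is the official huggingface_hub Python client (install with pip install huggingface_hub). This is a convenience sketch, not a required step; the token below is a placeholder:

```python
from huggingface_hub import HfApi

# Placeholder token; keep real tokens out of source control.
api = HfApi(token="hf_your_token_here")

# Raises an error if the token is invalid or lacks read access
# to the repository (which is what deployment needs).
info = api.model_info("deepseek-ai/DeepSeek-R1")
print(info.id)
```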

4. Certification Requirements

  • Ensure your model meets any certification or compliance standards required for deployment.

Step-by-Step Deployment on Qubrid AI

1. Access Model Studio

  • Log into Qubrid AI.
  • Navigate to the Model Studio and select Hugging Face Models.

2. Add Your Model

  • Click “Add Model”.
  • Enter a clear, unique Name and (optionally) a description for easy identification.

  • Paste the Hugging Face model link (e.g., https://huggingface.co/deepseek-ai/DeepSeek-R1) and the API token for access.

  • Wait for token verification to ensure successful connection.

    • If access fails, confirm you copied the full model name and used a valid token with the necessary permissions.

3. Guidance on Model Size & GPU Selection (A10G)

| GPU Configuration | Max Model Size | Typical Usage |
| --- | --- | --- |
| 1 × A10G (24GB) | Up to 20B parameters | Small-to-midsize models, fast deployment |
| 4 × A10G (96GB) | Up to 90B parameters | Large, advanced models |
| 8 × A10G (192GB) | Up to 180B parameters | Largest models, maximal compute capacity |

  • Choose GPU count based on your model’s parameter size.
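
The table implies roughly 1 GB of GPU memory per billion parameters (about one byte per parameter, as with 8-bit weights). A minimal sizing sketch based on that assumption (not an official Qubrid sizing formula):

```python
A10G_MEMORY_GB = 24
OFFERED_GPU_COUNTS = (1, 4, 8)  # configurations listed in the table

def pick_a10g_count(params_billions: float) -> int:
    """Smallest offered A10G count whose total memory covers the model,
    assuming ~1 GB of GPU memory per billion parameters."""
    needed_gb = params_billions
    for n in OFFERED_GPU_COUNTS:
        if n * A10G_MEMORY_GB >= needed_gb:
            return n
    raise ValueError("Model exceeds the listed configurations")

print(pick_a10g_count(20))   # -> 1 (matches "up to 20B" on 1 × A10G)
print(pick_a10g_count(90))   # -> 4
print(pick_a10g_count(180))  # -> 8
```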

4. Select GPU Resources

  • Select the desired GPU count and configure the auto-stop timer, which automatically shuts the instance down after a specified number of hours to help control costs.
  • Click “Deploy”. Deployment time depends on model size (plan for at least 15 minutes for large models).
  • Note: Each A10G GPU instance is billed at $1.41/hour.
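  • Example: a 4 × A10G deployment left running for 3 hours costs 4 × $1.41/hour × 3 h = $16.92.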

5. Monitor Deployment

  • Track status from your dashboard. The process will display progress stages such as:

    • Deployment in progress
    • Model loading
    • Deployment ready
  • Only begin further actions once the status shows Deployment Ready.

6. Interact with Your Model

  • Click the Actions button (available for each deployed model).
  • Access options like:
    • Edit Auto-Stop: Modify shutdown time.
    • Stop: Temporarily halt deployment.
    • Get Code: Retrieve code for CLI, Python, or React integration.
    • Interactive UI: Browser-based prompt/parameter testing.
    • RAG UI: Retrieval-augmented document chat.
    • View Logs: Review deployment logs.
    • Delete: Remove deployment.

7. Retrieve Integration Code ("Get Code")

  • Under Actions, select Get Code:
    • Generate ready-to-use snippets for command-line (cURL), Python scripts, or React apps.
    • Use these for quick model integration into your pipelines or apps.
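
The generated snippets are deployment-specific, so copy them from the Get Code dialog. As a rough illustration of what a Python integration typically looks like, here is a sketch in which the endpoint URL, auth header, and payload fields are all placeholders (hypothetical, not Qubrid’s actual API):

```python
import requests

# Placeholders: copy the real endpoint, headers, and payload
# shape from the "Get Code" dialog for your deployment.
ENDPOINT = "https://<your-deployment-endpoint>/generate"
HEADERS = {"Authorization": "Bearer <your-api-key>"}

payload = {
    "prompt": "Summarize the benefits of on-demand GPUs in one sentence.",
    "max_tokens": 128,
    "temperature": 0.7,
}

resp = requests.post(ENDPOINT, json=payload, headers=HEADERS, timeout=60)
resp.raise_for_status()
print(resp.json())
```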

8. Interactive UI

  • Use the Interactive UI:
    • Fine-tune model prompts and adjust parameters via the left panel.
    • No coding required—ideal for iterative experimentation.

9. Retrieval-Augmented Generation (RAG UI)

  • Use the RAG UI to upload documents (PDF and other supported formats) and query them with your deployed model:
    • Under Actions, pick RAG UI, upload documents, and wait for parsing/indexing.
    • Use the chat interface to ask document-related questions.
    • Get real-time, context-aware answers.

10. Edit Auto-Stop

  • Adjust instance auto-shutdown timer anytime to control usage.

11. View Logs

  • Use View Logs to debug, troubleshoot, or track deployment actions.

12. Stop/Start Deployment

  • Stop: Immediately pause deployment to reduce costs.
  • Start again when you need to interact with the model.

13. Delete Deployment

  • Delete the model (this action is irreversible) to free resources and remove it from your workspace. Confirm the deletion when prompted.

Additional Tips

  • Keep credential tokens confidential.
  • Monitor instance duration and cost using the dashboard.
  • Regularly check deployment logs for possible errors or warnings.
  • Always shut down or delete idle instances to optimize costs.
  • Join our Discord Group to connect with the community and share feedback, or email us at digital@qubrid.com.

By following these steps, you can deploy, manage, and interact with Hugging Face text-generation models on Qubrid AI, leveraging powerful on-demand GPUs with full integration flexibility and straightforward post-deployment workflows.