
DeepSeek · Chat / LLM · 70B Parameters · 64K Context

Streaming · Reasoning · Chain-of-Thought · Code · Long Context · Chat

Overview
DeepSeek R1 Distill LLaMA 70B is a knowledge-distilled reasoning model built on the LLaMA-3.1-70B architecture and trained on high-quality reasoning outputs from DeepSeek R1. It delivers near frontier-level analytical performance while running on significantly smaller hardware than the full R1 model — making it ideal for teams that need powerful chain-of-thought reasoning without the infrastructure overhead of a 671B-parameter system. It is served instantly via the Qubrid AI Serverless API.

🧠 Frontier reasoning. Distilled efficiency. Run DeepSeek R1 intelligence on Qubrid AI — no GPUs, no setup, no ops.
Model Specifications
| Field | Details |
|---|---|
| Model ID | deepseek-ai/deepseek-r1-distill-llama-70b |
| Provider | DeepSeek |
| Kind | Chat / LLM |
| Architecture | LLaMA-3.1-70B (Distilled) |
| Parameters | 70B |
| Context Length | 64,000 Tokens |
| MoE | No |
| Release Date | January 2025 |
| License | DeepSeek R1 License (MIT) |
| Training Data | Distilled from DeepSeek R1 high-quality reasoning outputs with Llama 70B |
| Function Calling | Not Supported |
| Image Support | N/A |
| Serverless API | Available |
| Fine-tuning | Coming Soon |
| On-demand | Coming Soon |
| State | 🟢 Ready |
Pricing
💳 Access via the Qubrid AI Serverless API with pay-per-token pricing. No infrastructure management required.
| Token Type | Price per 1M Tokens |
|---|---|
| Input Tokens | $1.20 |
| Output Tokens | $1.80 |
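At these rates, per-request cost is straightforward to estimate. A minimal sketch in Python, with the rates copied from the table above:

```python
# Pay-per-token rates for DeepSeek R1 Distill LLaMA 70B (USD per 1M tokens),
# taken from the pricing table above.
INPUT_RATE = 1.20
OUTPUT_RATE = 1.80

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost for a single request."""
    return (input_tokens / 1_000_000) * INPUT_RATE + (output_tokens / 1_000_000) * OUTPUT_RATE

# Example: a 2,000-token prompt with a 1,500-token reasoning response.
print(f"${estimate_cost(2000, 1500):.4f}")  # → $0.0051
```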
Quickstart
Prerequisites
- Create a free account at platform.qubrid.com
- Generate your API key from the API Keys section
- Replace QUBRID_API_KEY in the code below with your actual key
Code examples are available in Python, JavaScript, Go, and cURL.
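As a starting point, here is a minimal Python sketch using only the standard library and an OpenAI-style chat-completions request. The base URL below is a placeholder, not the real endpoint — check docs.platform.qubrid.com for the actual API URL:

```python
# Minimal chat-completions sketch (stdlib only). BASE_URL is a placeholder —
# substitute the real Qubrid endpoint from the docs before running.
import json
import urllib.request

API_KEY = "QUBRID_API_KEY"  # replace with your key from platform.qubrid.com
BASE_URL = "https://api.qubrid.example/v1"  # placeholder, not the real endpoint
MODEL_ID = "deepseek-ai/deepseek-r1-distill-llama-70b"

def build_payload(prompt: str, stream: bool = False) -> dict:
    """Assemble an OpenAI-style chat-completions request body."""
    return {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,
    }

def chat(prompt: str) -> str:
    """Send one chat request and return the assistant's reply."""
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(build_payload(prompt)).encode(),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# chat("Explain quantum computing in simple terms")  # requires a valid key
```

Because the API is OpenAI-compatible, the official OpenAI SDKs should also work once you point them at the Qubrid base URL.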
Live Example
Prompt: Explain quantum computing in simple terms
Response:
Playground Features
The Qubrid AI Playground lets you interact with DeepSeek R1 Distill LLaMA 70B directly in your browser — no setup, no code, no cost to explore.

🧠 System Prompt
Shape the model’s reasoning approach, output format, and domain focus before the conversation begins — ideal for technical assistants, structured analysis pipelines, and multi-turn problem-solving workflows. Set your system prompt once in the Qubrid Playground and it applies across every turn of the conversation.
🎯 Few-Shot Examples
Guide the model’s reasoning depth and output structure with concrete examples — no fine-tuning, no retraining required.

| User Input | Assistant Response |
|---|---|
| What is the time complexity of merge sort? | Merge sort has O(n log n) time complexity in all cases — best, average, and worst. This is because the array is divided log n times and each division requires O(n) work to merge. |
| Solve: if 3x + 7 = 22, what is x? | Step 1: Subtract 7 from both sides → 3x = 15. Step 2: Divide by 3 → x = 5. |
💡 Stack multiple few-shot examples in the Qubrid Playground to shape reasoning style and output format — no fine-tuning required.
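Over the API, the same stacking maps to alternating user/assistant turns in an OpenAI-style messages list. A sketch using the two examples from the table above:

```python
# Few-shot examples become alternating user/assistant turns that precede
# the real query. The pairs below are copied from the table above.
FEW_SHOT = [
    ("What is the time complexity of merge sort?",
     "Merge sort has O(n log n) time complexity in all cases — best, average, and worst."),
    ("Solve: if 3x + 7 = 22, what is x?",
     "Step 1: Subtract 7 from both sides → 3x = 15. Step 2: Divide by 3 → x = 5."),
]

def build_few_shot_messages(system_prompt: str, user_query: str) -> list:
    """Prepend the system prompt and few-shot turns to a new user query."""
    messages = [{"role": "system", "content": system_prompt}]
    for question, answer in FEW_SHOT:
        messages.append({"role": "user", "content": question})
        messages.append({"role": "assistant", "content": answer})
    messages.append({"role": "user", "content": user_query})
    return messages

msgs = build_few_shot_messages("You are a concise math tutor.", "Solve: 5x - 4 = 21")
print(len(msgs))  # → 6  (1 system + 2 examples × 2 turns + 1 query)
```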
Inference Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| Streaming | boolean | true | Enable streaming responses for real-time output |
| Temperature | number | 0.3 | Controls creativity and randomness. Higher values produce more diverse output |
| Max Tokens | number | 10000 | Defines the maximum number of tokens the model is allowed to generate |
| Top P | number | 1 | Nucleus sampling: limits token selection to a subset of top probability mass |
| Reasoning Effort | select | medium | Adjusts the depth of reasoning and problem-solving effort. Higher settings yield more thorough responses at the cost of latency |
| Reasoning Summary | select | auto | Controls verbosity of reasoning explanations. auto lets the model decide; concise gives brief summaries; detailed offers in-depth explanations |
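In an OpenAI-style request body, the core sampling parameters map directly onto top-level fields. A sketch with the defaults from the table above (the exact field names for the reasoning-effort and reasoning-summary controls are not documented here, so they are omitted — check the Qubrid docs for those keys):

```python
# Request-body sketch carrying the parameters from the table above.
# Defaults mirror the table: temperature 0.3, max_tokens 10000, top_p 1, stream on.
def build_request(prompt: str, *, temperature: float = 0.3,
                  max_tokens: int = 10000, top_p: float = 1.0,
                  stream: bool = True) -> dict:
    """Build an OpenAI-style chat-completions body with sampling controls."""
    return {
        "model": "deepseek-ai/deepseek-r1-distill-llama-70b",
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
        "max_tokens": max_tokens,
        "top_p": top_p,
        "stream": stream,
    }

# Lower temperature for a deterministic math task:
req = build_request("Prove that the sum of two even numbers is even", temperature=0.1)
print(req["temperature"], req["max_tokens"])  # → 0.1 10000
```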
Use Cases
- Advanced reasoning and problem solving
- Conversational AI
- Technical and coding assistance
- Long-form text generation
- Math and logic tasks
- Research and analysis
Strengths & Limitations
| Strengths | Limitations |
|---|---|
| Excellent reasoning and chain-of-thought capability | Slightly slower than smaller distilled models |
| Lower GPU memory requirement compared to the full R1 model | Reasoning quality may vary in very complex tasks |
| Strong performance across technical and multilingual tasks | Function calling not supported |
| Open-source and suitable for on-prem deployment | |
Why Qubrid AI?
- 🚀 No infrastructure setup — serverless API, pay only for what you use
- 🔁 OpenAI-compatible — drop-in replacement using the same SDK, just swap the base URL
- 🧠 Reasoning at scale — distilled R1 intelligence served with Qubrid’s low-latency infrastructure
- 🧪 Built-in Playground — prototype with system prompts and few-shot examples instantly at platform.qubrid.com
- 📊 Full observability — API logs and usage tracking built into the Qubrid dashboard
- 🌐 Multi-language support — Python, JavaScript, Go, cURL out of the box
Resources
| Resource | Link |
|---|---|
| 📖 Qubrid Docs | docs.platform.qubrid.com |
| 🎮 Playground | Try DeepSeek R1 Distill LLaMA 70B live |
| 🔑 API Keys | Get your API Key |
| 🤗 Hugging Face | deepseek-ai/deepseek-r1-distill-llama-70b |
| 💬 Discord | Join the Qubrid Community |
Built with ❤️ by Qubrid AI
Frontier models. Serverless infrastructure. Zero friction.