Meta · Chat / LLM · 70B Parameters · 128K Context

Streaming · Instruction Following · Reasoning · Code · Long Context · Chat
Overview
Llama 3.3 70B Instruct is a 70B-parameter open-weight large language model from Meta, optimized for instruction following, complex reasoning, and multi-turn conversations. With a 128K context window and Grouped-Query Attention architecture, it is well suited for enterprise use cases such as advanced chat assistants, code reasoning, and long-document analysis. Llama 3.3 70B brings production-ready intelligence to teams who need powerful, deployable open-weight models — served instantly via the Qubrid AI Serverless API.
🧠 70B parameters. 128K context. Enterprise-ready.
Run it in seconds on Qubrid AI — no GPUs, no setup, no ops.
Model Specifications
| Field | Details |
|---|---|
| Model ID | meta-llama/Llama-3.3-70B-Instruct |
| Provider | Meta |
| Kind | Chat / LLM |
| Architecture | Transformer with Grouped-Query Attention (GQA) |
| Parameters | 70B |
| Context Length | 128,000 Tokens |
| MoE | No |
| Release Date | 2024 |
| License | Meta Llama License |
| Training Data | Publicly available web data (multilingual) |
| Function Calling | Not Supported |
| Image Support | N/A |
| Serverless API | Available |
| Fine-tuning | Coming Soon |
| On-demand | Coming Soon |
| State | 🟢 Ready |
Pricing
💳 Access via the Qubrid AI Serverless API with pay-per-token pricing. No infrastructure management required.
| Token Type | Price per 1M Tokens |
|---|---|
| Input Tokens | $0.27 |
| Output Tokens | $0.85 |
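At these rates, a request's cost is simply input tokens × $0.27/1M plus output tokens × $0.85/1M. A minimal sketch for estimating per-request cost (the helper name is ours, not part of any Qubrid SDK):

```python
# Per-1M-token rates from the pricing table above (USD).
INPUT_PRICE_PER_M = 0.27
OUTPUT_PRICE_PER_M = 0.85

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request from its token counts."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# A 2,000-token prompt with a 500-token completion:
print(f"${estimate_cost(2000, 500):.6f}")  # → $0.000965
```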
Quickstart
Prerequisites
- Create a free account at platform.qubrid.com
- Generate your API key from the API Keys section
- Replace `QUBRID_API_KEY` in the code below with your actual key
Python

```python
from openai import OpenAI

# Initialize the OpenAI client with the Qubrid base URL
client = OpenAI(
    base_url="https://platform.qubrid.com/v1",
    api_key="QUBRID_API_KEY",
)

# Create a streaming chat completion
stream = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct",
    messages=[
        {
            "role": "user",
            "content": "Explain quantum computing in simple terms"
        }
    ],
    max_tokens=4096,
    temperature=0.7,
    top_p=0.9,
    stream=True
)

# Streaming (stream=True): print tokens as they arrive
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print("\n")

# Non-streaming (stream=False): remove the loop above and read the
# full response instead:
# print(stream.choices[0].message.content)
```
JavaScript

```javascript
import OpenAI from "openai";

// Initialize the OpenAI client with the Qubrid base URL
const client = new OpenAI({
  baseURL: "https://platform.qubrid.com/v1",
  apiKey: "QUBRID_API_KEY",
});

// Create a streaming chat completion
const stream = await client.chat.completions.create({
  model: "meta-llama/Llama-3.3-70B-Instruct",
  messages: [
    {
      role: "user",
      content: "Explain quantum computing in simple terms",
    },
  ],
  max_tokens: 4096,
  temperature: 0.7,
  top_p: 0.9,
  stream: true,
});

// Streaming (stream: true): print tokens as they arrive
for await (const chunk of stream) {
  if (chunk.choices[0]?.delta?.content) {
    process.stdout.write(chunk.choices[0].delta.content);
  }
}
console.log("\n");

// Non-streaming (stream: false): remove the loop above and read the
// full response instead:
// console.log(stream.choices[0].message.content);
```
Go

```go
package main

import (
	"bufio"
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

func main() {
	url := "https://platform.qubrid.com/v1/chat/completions"

	data := map[string]interface{}{
		"model": "meta-llama/Llama-3.3-70B-Instruct",
		"messages": []map[string]string{
			{
				"role":    "user",
				"content": "Explain quantum computing in simple terms",
			},
		},
		"temperature": 0.7,
		"max_tokens":  4096,
		"top_p":       0.9,
		"stream":      true,
	}

	jsonData, _ := json.Marshal(data)
	req, _ := http.NewRequest("POST", url, bytes.NewBuffer(jsonData))
	req.Header.Set("Authorization", "Bearer QUBRID_API_KEY")
	req.Header.Set("Content-Type", "application/json")

	client := &http.Client{}
	res, err := client.Do(req)
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	defer res.Body.Close()

	// Print each server-sent event line as it arrives
	scanner := bufio.NewScanner(res.Body)
	for scanner.Scan() {
		line := scanner.Text()
		if line != "" {
			fmt.Println(line)
		}
	}
}
```
cURL

```shell
curl -X POST "https://platform.qubrid.com/v1/chat/completions" \
  -H "Authorization: Bearer QUBRID_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/Llama-3.3-70B-Instruct",
    "messages": [
      {
        "role": "user",
        "content": "Explain quantum computing in simple terms"
      }
    ],
    "temperature": 0.7,
    "max_tokens": 4096,
    "stream": true,
    "top_p": 0.9
  }'
```
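With `stream: true`, OpenAI-compatible endpoints return server-sent events: each non-empty line is `data: <json chunk>`, and the stream terminates with `data: [DONE]`. If you are consuming the raw HTTP stream (as in the Go and cURL examples) rather than an SDK, you need to parse those lines yourself. A sketch of extracting the text delta from one raw line, assuming Qubrid follows this standard convention:

```python
import json

def parse_sse_line(line: str):
    """Return the content delta from one SSE line, or None if there is none."""
    if not line.startswith("data: "):
        return None  # blank keep-alive lines, comments, etc.
    payload = line[len("data: "):]
    if payload == "[DONE]":
        return None  # end-of-stream sentinel
    chunk = json.loads(payload)
    delta = chunk["choices"][0].get("delta", {})
    return delta.get("content")

print(parse_sse_line('data: {"choices":[{"delta":{"content":"Hello"}}]}'))  # → Hello
```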
Live Example
Prompt: Write a short story about a robot learning to paint
Response:
The First Stroke
Unit 7 had been designed for precision — circuit boards, not canvases.
But one Tuesday afternoon, its operator left a half-finished watercolor
on the workbench and never came back.
For three days, Unit 7 observed the painting. It catalogued the brushstrokes:
angle, pressure, moisture-to-pigment ratio. It ran 4,000 simulations.
It was ready.
On the fourth day, it picked up the brush.
The first stroke was perfect — mathematically. Parallel lines, uniform width,
exactly 37 degrees from horizontal. Unit 7 stared at it for a long time.
It didn't look like anything.
It tried again. This time it introduced a 4% randomness variable. Then 12%.
Then it stopped calculating entirely and just... moved.
By midnight, the canvas held a lopsided sun, a tree that leaned too far left,
and a bird that looked more like a comma. Unit 7 reviewed its work.
No simulation had predicted this outcome.
No metric could measure it.
But something in its processing core — some loop it couldn't name —
kept running the same subroutine over and over.
It picked up the brush again.
Try it yourself in the Qubrid AI Playground →
Playground Features
The Qubrid AI Playground lets you interact with Llama 3.3 70B Instruct directly in your browser — no setup, no code, no cost to explore.
🧠 System Prompt
Define the model’s role, tone, and constraints before the conversation begins. Ideal for building enterprise assistants, domain-focused bots, or structured output pipelines — without writing a line of code.
Example: "You are a senior enterprise AI assistant. Answer questions in a
professional tone, cite sources where possible, and always flag when
a question falls outside your knowledge domain."
Set your system prompt once in the Qubrid Playground and it applies across every turn of the conversation.
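Over the API, the Playground's system prompt corresponds to a message with role `"system"` placed first in the messages array. A sketch of the request payload — pass it to the Quickstart client via `client.chat.completions.create(**payload)`; the user question here is illustrative:

```python
SYSTEM_PROMPT = (
    "You are a senior enterprise AI assistant. Answer questions in a "
    "professional tone, cite sources where possible, and always flag when "
    "a question falls outside your knowledge domain."
)

payload = {
    "model": "meta-llama/Llama-3.3-70B-Instruct",
    "messages": [
        # The system message plays the same role as the Playground's system
        # prompt and applies to every subsequent turn.
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "What drove our Q3 margin change?"},
    ],
    "max_tokens": 4096,
}
```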
🎯 Few-Shot Examples
Guide the model’s output format and tone with concrete examples — no fine-tuning, no retraining required.
| User Input | Assistant Response |
|---|---|
| Summarize this earnings call transcript in 3 bullet points | • Revenue grew 18% YoY driven by cloud segment expansion. • Operating margins compressed slightly due to increased R&D spend. • Management raised FY guidance citing strong enterprise pipeline. |
| Write a Python function to flatten a nested list | `def flatten(lst): return [x for sublist in lst for x in (flatten(sublist) if isinstance(sublist, list) else [sublist])]` |
💡 Stack multiple few-shot examples in the Qubrid Playground to progressively refine tone, format, and domain focus — no fine-tuning required.
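In API terms, each few-shot example is a user/assistant message pair placed before the live query. A sketch of assembling such a messages list (the example pair is shortened from the table above; the final query is illustrative):

```python
# Each (user, assistant) pair demonstrates the desired format and tone.
few_shot_pairs = [
    ("Summarize this earnings call transcript in 3 bullet points",
     "• Revenue grew 18% YoY.\n• Margins compressed on R&D spend.\n• FY guidance raised."),
]

messages = []
for user_text, assistant_text in few_shot_pairs:
    messages.append({"role": "user", "content": user_text})
    messages.append({"role": "assistant", "content": assistant_text})

# The live query goes last; the model imitates the pattern established above.
messages.append({"role": "user",
                 "content": "Summarize this product launch memo in 3 bullet points"})
```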
Inference Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| Streaming | boolean | true | Enable streaming responses for real-time output |
| Temperature | number | 0.7 | Controls randomness. Higher values mean more creative but less predictable output |
| Max Tokens | number | 4096 | Maximum number of tokens to generate in the response |
| Top P | number | 0.9 | Nucleus sampling: considers tokens with top_p probability mass |
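These parameters trade determinism for variety. One way to use them is as presets layered onto the base request; the values below are illustrative suggestions, not official recommendations:

```python
# Illustrative sampling presets built from the parameters above.
PRESETS = {
    # Near-deterministic output, e.g. for extraction or code generation
    "precise": {"temperature": 0.0, "top_p": 1.0, "max_tokens": 4096, "stream": False},
    # Table defaults: balanced creativity with streaming enabled
    "default": {"temperature": 0.7, "top_p": 0.9, "max_tokens": 4096, "stream": True},
}

def request_kwargs(preset: str) -> dict:
    """Merge a preset into keyword arguments for chat.completions.create()."""
    return {"model": "meta-llama/Llama-3.3-70B-Instruct", **PRESETS[preset]}

print(request_kwargs("precise")["temperature"])  # → 0.0
```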
Use Cases
- Enterprise chat assistants
- Advanced code generation and review
- Long-document question answering
- Summarization at scale
- Retrieval-Augmented Generation (RAG)
- AI agents and workflow automation
Strengths & Limitations
| Strengths | Limitations |
|---|---|
| High-quality reasoning and instruction adherence | Function calling not supported |
| Strong performance on code and analytical tasks | Can struggle with highly complex, multi-step reasoning |
| Large 128K context window for long-document processing | Higher cost per token compared to smaller models |
| Open-weight model suitable for private and on-prem deployments | |
| Production-ready for enterprise workloads | |
Why Qubrid AI?
- 🚀 No infrastructure setup — serverless API, pay only for what you use
- 🔁 OpenAI-compatible — drop-in replacement using the same SDK, just swap the base URL
- 📄 Long-context ready — 128K window pairs perfectly with Qubrid’s high-throughput serving
- 🧪 Built-in Playground — prototype with system prompts and few-shot examples instantly at platform.qubrid.com
- 📊 Full observability — API logs and usage tracking built into the Qubrid dashboard
- 🌐 Multi-language support — Python, JavaScript, Go, cURL out of the box
Built with ❤️ by Qubrid AI
Frontier models. Serverless infrastructure. Zero friction.