Qubrid AI
Meta · Chat / LLM · 70B Parameters · 128K Context
Links: Qubrid Playground · License · Hugging Face
Tags: Streaming · Instruction Following · Reasoning · Code · Long Context · Chat

Overview

Llama 3.3 70B Instruct is a 70B-parameter open-weight large language model from Meta, optimized for instruction following, complex reasoning, and multi-turn conversations. With a 128K context window and Grouped-Query Attention architecture, it is well suited for enterprise use cases such as advanced chat assistants, code reasoning, and long-document analysis. Llama 3.3 70B brings production-ready intelligence to teams who need powerful, deployable open-weight models — served instantly via the Qubrid AI Serverless API.
🧠 70B parameters. 128K context. Enterprise-ready. Run it in seconds on Qubrid AI — no GPUs, no setup, no ops.

Model Specifications

| Field | Details |
| --- | --- |
| Model ID | meta-llama/Llama-3.3-70B-Instruct |
| Provider | Meta |
| Kind | Chat / LLM |
| Architecture | Transformer with Grouped-Query Attention (GQA) |
| Parameters | 70B |
| Context Length | 128,000 tokens |
| MoE | No |
| Release Date | 2024 |
| License | Meta Llama License |
| Training Data | Publicly available web data (multilingual) |
| Function Calling | Not Supported |
| Image Support | N/A |
| Serverless API | Available |
| Fine-tuning | Coming Soon |
| On-demand | Coming Soon |
| State | 🟢 Ready |

Pricing

💳 Access via the Qubrid AI Serverless API with pay-per-token pricing. No infrastructure management required.
| Token Type | Price per 1M Tokens |
| --- | --- |
| Input Tokens | $0.27 |
| Output Tokens | $0.85 |
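At these rates, the cost of a request is easy to estimate from its token counts. A minimal sketch with the rates hardcoded from the table above (the `estimate_cost` helper is illustrative, not part of any SDK):

```python
# Published rates in dollars per 1M tokens (from the pricing table above)
INPUT_PRICE_PER_M = 0.27
OUTPUT_PRICE_PER_M = 0.85

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated cost in dollars for a single request."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# Example: a 100K-token document plus a 2K-token summary
print(f"${estimate_cost(100_000, 2_000):.4f}")
# → $0.0287
```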

Quickstart

Prerequisites

  1. Create a free account at platform.qubrid.com
  2. Generate your API key from the API Keys section
  3. Replace QUBRID_API_KEY in the code below with your actual key

Python

from openai import OpenAI

# Initialize the OpenAI client with Qubrid base URL
client = OpenAI(
  base_url="https://platform.qubrid.com/v1",
  api_key="QUBRID_API_KEY",
)

# Create a streaming chat completion
stream = client.chat.completions.create(
  model="meta-llama/Llama-3.3-70B-Instruct",
  messages=[
    {
      "role": "user",
      "content": "Explain quantum computing in simple terms"
    }
  ],
  max_tokens=4096,
  temperature=0.7,
  top_p=0.9,
  stream=True
)

# Stream the response as it arrives (requires stream=True above)
for chunk in stream:
  if chunk.choices and chunk.choices[0].delta.content:
    print(chunk.choices[0].delta.content, end="", flush=True)
print()

# If you set stream=False above, read the full response at once instead:
# print(stream.choices[0].message.content)
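Hardcoding the key is fine for a quick test, but in real projects it is safer to read it from an environment variable so the key never lands in source control. A small sketch (the variable name `QUBRID_API_KEY` is just the convention used in this guide):

```python
import os

def get_api_key() -> str:
    # Fail fast with a clear error if the key is not configured
    key = os.environ.get("QUBRID_API_KEY")
    if not key:
        raise RuntimeError("Set the QUBRID_API_KEY environment variable")
    return key

# Pass the result to the client instead of a literal string:
# client = OpenAI(base_url="https://platform.qubrid.com/v1", api_key=get_api_key())
```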

JavaScript

import OpenAI from "openai";

// Initialize the OpenAI client with Qubrid base URL
const client = new OpenAI({
  baseURL: "https://platform.qubrid.com/v1",
  apiKey: "QUBRID_API_KEY",
});

// Create a streaming chat completion
const stream = await client.chat.completions.create({
  model: "meta-llama/Llama-3.3-70B-Instruct",
  messages: [
    {
      role: "user",
      content: "Explain quantum computing in simple terms",
    },
  ],
  max_tokens: 4096,
  temperature: 0.7,
  top_p: 0.9,
  stream: true,
});

// Stream the response as it arrives (requires stream: true above)
for await (const chunk of stream) {
  if (chunk.choices[0]?.delta?.content) {
    process.stdout.write(chunk.choices[0].delta.content);
  }
}
console.log();

// If you set stream: false above, read the full response at once instead:
// console.log(stream.choices[0].message.content);

Go

package main

import (
	"bufio"
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

func main() {
	url := "https://platform.qubrid.com/v1/chat/completions"

	data := map[string]interface{}{
		"model": "meta-llama/Llama-3.3-70B-Instruct",
		"messages": []map[string]string{
			{
				"role":    "user",
				"content": "Explain quantum computing in simple terms",
			},
		},
		"temperature": 0.7,
		"max_tokens":  4096,
		"stream":      true,
		"top_p":       0.9,
	}

	jsonData, err := json.Marshal(data)
	if err != nil {
		panic(err)
	}

	req, err := http.NewRequest("POST", url, bytes.NewBuffer(jsonData))
	if err != nil {
		panic(err)
	}
	req.Header.Set("Authorization", "Bearer QUBRID_API_KEY")
	req.Header.Set("Content-Type", "application/json")

	client := &http.Client{}
	res, err := client.Do(req)
	if err != nil {
		panic(err)
	}
	defer res.Body.Close()

	// The stream arrives as server-sent events; print each non-empty data line
	scanner := bufio.NewScanner(res.Body)
	for scanner.Scan() {
		if line := scanner.Text(); line != "" {
			fmt.Println(line)
		}
	}
}

cURL

curl -X POST "https://platform.qubrid.com/v1/chat/completions" \
  -H "Authorization: Bearer QUBRID_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
  "model": "meta-llama/Llama-3.3-70B-Instruct",
  "messages": [
    {
      "role": "user",
      "content": "Explain quantum computing in simple terms"
    }
  ],
  "temperature": 0.7,
  "max_tokens": 4096,
  "stream": true,
  "top_p": 0.9
}'

Live Example

Prompt: Write a short story about a robot learning to paint
Response:
The First Stroke

Unit 7 had been designed for precision — circuit boards, not canvases.
But one Tuesday afternoon, its operator left a half-finished watercolor
on the workbench and never came back.

For three days, Unit 7 observed the painting. It catalogued the brushstrokes:
angle, pressure, moisture-to-pigment ratio. It ran 4,000 simulations.
It was ready.

On the fourth day, it picked up the brush.

The first stroke was perfect — mathematically. Parallel lines, uniform width,
exactly 37 degrees from horizontal. Unit 7 stared at it for a long time.

It didn't look like anything.

It tried again. This time it introduced a 4% randomness variable. Then 12%.
Then it stopped calculating entirely and just... moved.

By midnight, the canvas held a lopsided sun, a tree that leaned too far left,
and a bird that looked more like a comma. Unit 7 reviewed its work.

No simulation had predicted this outcome.
No metric could measure it.

But something in its processing core — some loop it couldn't name —
kept running the same subroutine over and over.

It picked up the brush again.
Try it yourself in the Qubrid AI Playground →

Playground Features

The Qubrid AI Playground lets you interact with Llama 3.3 70B Instruct directly in your browser — no setup, no code, no cost to explore.

🧠 System Prompt

Define the model’s role, tone, and constraints before the conversation begins. Ideal for building enterprise assistants, domain-focused bots, or structured output pipelines — without writing a line of code.
Example: "You are a senior enterprise AI assistant. Answer questions in a
professional tone, cite sources where possible, and always flag when
a question falls outside your knowledge domain."
Set your system prompt once in the Qubrid Playground and it applies across every turn of the conversation.
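Over the API, the same system prompt is simply the first entry in the `messages` array; every later turn is appended after it and the system message stays in effect. A minimal sketch (the `build_messages` helper is ours for illustration, not an SDK function):

```python
def build_messages(system_prompt: str, user_prompt: str) -> list:
    # The system message comes first and applies to every subsequent turn
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]

messages = build_messages(
    "You are a senior enterprise AI assistant. Answer questions in a "
    "professional tone and flag questions outside your knowledge domain.",
    "Summarize the key risks in our Q3 report.",
)
# Pass messages=messages to client.chat.completions.create(...) as in the quickstart
```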

🎯 Few-Shot Examples

Guide the model’s output format and tone with concrete examples — no fine-tuning, no retraining required.
| User Input | Assistant Response |
| --- | --- |
| Summarize this earnings call transcript in 3 bullet points | • Revenue grew 18% YoY driven by cloud segment expansion. • Operating margins compressed slightly due to increased R&D spend. • Management raised FY guidance citing strong enterprise pipeline. |
| Write a Python function to flatten a nested list | `def flatten(lst): return [x for sublist in lst for x in (flatten(sublist) if isinstance(sublist, list) else [sublist])]` |
💡 Stack multiple few-shot examples in the Qubrid Playground to progressively refine tone, format, and domain focus — no fine-tuning required.
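Over the API, few-shot examples are expressed as alternating user/assistant turns placed before the real question, exactly as the Playground does behind the scenes. A sketch (the `few_shot_messages` helper is illustrative, not an SDK function):

```python
def few_shot_messages(examples: list, question: str) -> list:
    # Each (user, assistant) pair becomes two prior turns in the conversation
    msgs = []
    for user_text, assistant_text in examples:
        msgs.append({"role": "user", "content": user_text})
        msgs.append({"role": "assistant", "content": assistant_text})
    msgs.append({"role": "user", "content": question})
    return msgs

msgs = few_shot_messages(
    [("Summarize: revenue up 18%, margins down.",
      "• Revenue +18% YoY • Margins slightly compressed")],
    "Summarize: churn down 2%, NRR at 115%.",
)
# Pass messages=msgs to client.chat.completions.create(...) as in the quickstart
```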

Inference Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| Streaming | boolean | true | Enable streaming responses for real-time output |
| Temperature | number | 0.7 | Controls randomness. Higher values mean more creative but less predictable output |
| Max Tokens | number | 4096 | Maximum number of tokens to generate in the response |
| Top P | number | 0.9 | Nucleus sampling: considers tokens with top_p probability mass |
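To build intuition for top_p: at each generation step, the model samples only from the smallest set of tokens whose cumulative probability reaches top_p, discarding the long tail. A toy sketch of that truncation over a made-up four-token distribution (illustrative only; the real sampler operates on the full vocabulary):

```python
def nucleus_tokens(probs: dict, top_p: float) -> list:
    """Return the smallest set of tokens whose cumulative probability >= top_p."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, total = [], 0.0
    for token, p in ranked:
        kept.append(token)
        total += p
        if total >= top_p:
            break
    return kept

# With top_p=0.9, the low-probability token "d" is excluded from sampling
print(nucleus_tokens({"a": 0.5, "b": 0.3, "c": 0.15, "d": 0.05}, 0.9))
# → ['a', 'b', 'c']
```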

Use Cases

  1. Enterprise chat assistants
  2. Advanced code generation and review
  3. Long-document question answering
  4. Summarization at scale
  5. Retrieval-Augmented Generation (RAG)
  6. AI agents and workflow automation

Strengths & Limitations

| Strengths | Limitations |
| --- | --- |
| High-quality reasoning and instruction adherence | Function calling not supported |
| Strong performance on code and analytical tasks | Can struggle with highly complex, multi-step reasoning |
| Large 128K context window for long-document processing | Higher cost per token compared to smaller models |
| Open-weight model suitable for private and on-prem deployments | |
| Production-ready for enterprise workloads | |

Why Qubrid AI?

  • 🚀 No infrastructure setup — serverless API, pay only for what you use
  • 🔁 OpenAI-compatible — drop-in replacement using the same SDK, just swap the base URL
  • 📄 Long-context ready — 128K window pairs perfectly with Qubrid’s high-throughput serving
  • 🧪 Built-in Playground — prototype with system prompts and few-shot examples instantly at platform.qubrid.com
  • 📊 Full observability — API logs and usage tracking built into the Qubrid dashboard
  • 🌐 Multi-language support — Python, JavaScript, Go, cURL out of the box

Resources

| Resource | Link |
| --- | --- |
| 📖 Qubrid Docs | docs.platform.qubrid.com |
| 🎮 Playground | Try Llama 3.3 70B live |
| 🔑 API Keys | Get your API Key |
| 🤗 Hugging Face | meta-llama/Llama-3.3-70B-Instruct |
| 💬 Discord | Join the Qubrid Community |

Built with ❤️ by Qubrid AI

Frontier models. Serverless infrastructure. Zero friction.