Qubrid AI
Meta · Chat / LLM · 70B Parameters · 128K Context
Links: Qubrid Playground · License · Hugging Face
Tags: Streaming · Instruction Following · Reasoning · Code · Long Context · Chat

Overview

Llama 3.3 70B Instruct is a 70B-parameter open-weight large language model from Meta, optimized for instruction following, complex reasoning, and multi-turn conversations. With a 128K context window and Grouped-Query Attention architecture, it is well suited for enterprise use cases such as advanced chat assistants, code reasoning, and long-document analysis. Llama 3.3 70B brings production-ready intelligence to teams who need powerful, deployable open-weight models — served instantly via the Qubrid AI Serverless API.
🧠 70B parameters. 128K context. Enterprise-ready. Run it in seconds on Qubrid AI — no GPUs, no setup, no ops.

Model Specifications

| Field | Details |
| --- | --- |
| Model ID | meta-llama/Llama-3.3-70B-Instruct |
| Provider | Meta |
| Kind | Chat / LLM |
| Architecture | Transformer with Grouped-Query Attention (GQA) |
| Parameters | 70B |
| Context Length | 128,000 tokens |
| MoE | No |
| Release Date | 2024 |
| License | Meta Llama License |
| Training Data | Publicly available web data (multilingual) |
| Function Calling | Not Supported |
| Image Support | N/A |
| Serverless API | Available |
| Fine-tuning | Coming Soon |
| On-demand | Coming Soon |
| State | 🟢 Ready |

Pricing

💳 Access via the Qubrid AI Serverless API with pay-per-token pricing. No infrastructure management required.
| Token Type | Price per 1M Tokens |
| --- | --- |
| Input Tokens | $0.27 |
| Output Tokens | $0.85 |
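At these rates, the cost of a request is easy to estimate from its token counts. A minimal sketch with the rates hardcoded from the table above (the `estimate_cost` helper is illustrative, not part of any SDK):

```python
# Published rates in dollars per 1M tokens (from the pricing table above)
INPUT_PRICE_PER_M = 0.27
OUTPUT_PRICE_PER_M = 0.85

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated cost in dollars for a single request."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# Example: a 100K-token document plus a 2K-token summary
print(f"${estimate_cost(100_000, 2_000):.4f}")
# → $0.0287
```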

Quickstart

Prerequisites

  1. Create a free account at platform.qubrid.com
  2. Generate your API key from the API Keys section
  3. Replace QUBRID_API_KEY in the code below with your actual key

Python

from openai import OpenAI

# Initialize the OpenAI client with Qubrid base URL
client = OpenAI(
  base_url="https://platform.qubrid.com/v1",
  api_key="QUBRID_API_KEY",
)

# Create a streaming chat completion
stream = client.chat.completions.create(
  model="meta-llama/Llama-3.3-70B-Instruct",
  messages=[
    {
      "role": "user",
      "content": "Explain quantum computing in simple terms"
    }
  ],
  max_tokens=4096,
  temperature=0.7,
  top_p=0.9,
  stream=True
)

# Stream the response as it arrives (requires stream=True above)
for chunk in stream:
  if chunk.choices and chunk.choices[0].delta.content:
    print(chunk.choices[0].delta.content, end="", flush=True)
print()

# If you set stream=False above, read the full response at once instead:
# print(stream.choices[0].message.content)
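Hardcoding the key is fine for a quick test, but in real projects it is safer to read it from an environment variable so the key never lands in source control. A small sketch (the variable name `QUBRID_API_KEY` is just the convention used in this guide):

```python
import os

def get_api_key() -> str:
    # Fail fast with a clear error if the key is not configured
    key = os.environ.get("QUBRID_API_KEY")
    if not key:
        raise RuntimeError("Set the QUBRID_API_KEY environment variable")
    return key

# Pass the result to the client instead of a literal string:
# client = OpenAI(base_url="https://platform.qubrid.com/v1", api_key=get_api_key())
```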

JavaScript

import OpenAI from "openai";

// Initialize the OpenAI client with Qubrid base URL
const client = new OpenAI({
  baseURL: "https://platform.qubrid.com/v1",
  apiKey: "QUBRID_API_KEY",
});

// Create a streaming chat completion
const stream = await client.chat.completions.create({
  model: "meta-llama/Llama-3.3-70B-Instruct",
  messages: [
    {
      role: "user",
      content: "Explain quantum computing in simple terms",
    },
  ],
  max_tokens: 4096,
  temperature: 0.7,
  top_p: 0.9,
  stream: true,
});

// Stream the response as it arrives (requires stream: true above)
for await (const chunk of stream) {
  if (chunk.choices[0]?.delta?.content) {
    process.stdout.write(chunk.choices[0].delta.content);
  }
}
console.log();

// If you set stream: false above, read the full response at once instead:
// console.log(stream.choices[0].message.content);

Go

package main

import (
	"bufio"
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

func main() {
	url := "https://platform.qubrid.com/v1/chat/completions"

	data := map[string]interface{}{
		"model": "meta-llama/Llama-3.3-70B-Instruct",
		"messages": []map[string]string{
			{
				"role":    "user",
				"content": "Explain quantum computing in simple terms",
			},
		},
		"temperature": 0.7,
		"max_tokens":  4096,
		"stream":      true,
		"top_p":       0.9,
	}

	jsonData, err := json.Marshal(data)
	if err != nil {
		panic(err)
	}

	req, err := http.NewRequest("POST", url, bytes.NewBuffer(jsonData))
	if err != nil {
		panic(err)
	}
	req.Header.Set("Authorization", "Bearer QUBRID_API_KEY")
	req.Header.Set("Content-Type", "application/json")

	client := &http.Client{}
	res, err := client.Do(req)
	if err != nil {
		panic(err)
	}
	defer res.Body.Close()

	// The stream arrives as server-sent events; print each non-empty data line
	scanner := bufio.NewScanner(res.Body)
	for scanner.Scan() {
		if line := scanner.Text(); line != "" {
			fmt.Println(line)
		}
	}
}

cURL

curl -X POST "https://platform.qubrid.com/v1/chat/completions" \
  -H "Authorization: Bearer QUBRID_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
  "model": "meta-llama/Llama-3.3-70B-Instruct",
  "messages": [
    {
      "role": "user",
      "content": "Explain quantum computing in simple terms"
    }
  ],
  "temperature": 0.7,
  "max_tokens": 4096,
  "stream": true,
  "top_p": 0.9
}'

Live Example

Prompt: Write a short story about a robot learning to paint
Response:
The First Stroke

Unit 7 had been designed for precision — circuit boards, not canvases.
But one Tuesday afternoon, its operator left a half-finished watercolor
on the workbench and never came back.

For three days, Unit 7 observed the painting. It catalogued the brushstrokes:
angle, pressure, moisture-to-pigment ratio. It ran 4,000 simulations.
It was ready.

On the fourth day, it picked up the brush.

The first stroke was perfect — mathematically. Parallel lines, uniform width,
exactly 37 degrees from horizontal. Unit 7 stared at it for a long time.

It didn't look like anything.

It tried again. This time it introduced a 4% randomness variable. Then 12%.
Then it stopped calculating entirely and just... moved.

By midnight, the canvas held a lopsided sun, a tree that leaned too far left,
and a bird that looked more like a comma. Unit 7 reviewed its work.

No simulation had predicted this outcome.
No metric could measure it.

But something in its processing core — some loop it couldn't name —
kept running the same subroutine over and over.

It picked up the brush again.
Try it yourself in the Qubrid AI Playground →

Playground Features

The Qubrid AI Playground lets you interact with Llama 3.3 70B Instruct directly in your browser — no setup, no code, no cost to explore.

🧠 System Prompt

Define the model’s role, tone, and constraints before the conversation begins. Ideal for building enterprise assistants, domain-focused bots, or structured output pipelines — without writing a line of code.
Example: "You are a senior enterprise AI assistant. Answer questions in a
professional tone, cite sources where possible, and always flag when
a question falls outside your knowledge domain."
Set your system prompt once in the Qubrid Playground and it applies across every turn of the conversation.
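Over the API, the same system prompt is simply the first entry in the `messages` array; every later turn is appended after it and the system message stays in effect. A minimal sketch (the `build_messages` helper is ours for illustration, not an SDK function):

```python
def build_messages(system_prompt: str, user_prompt: str) -> list:
    # The system message comes first and applies to every subsequent turn
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]

messages = build_messages(
    "You are a senior enterprise AI assistant. Answer questions in a "
    "professional tone and flag questions outside your knowledge domain.",
    "Summarize the key risks in our Q3 report.",
)
# Pass messages=messages to client.chat.completions.create(...) as in the quickstart
```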

🎯 Few-Shot Examples

Guide the model’s output format and tone with concrete examples — no fine-tuning, no retraining required.
| User Input | Assistant Response |
| --- | --- |
| Summarize this earnings call transcript in 3 bullet points | • Revenue grew 18% YoY driven by cloud segment expansion. • Operating margins compressed slightly due to increased R&D spend. • Management raised FY guidance citing strong enterprise pipeline. |
| Write a Python function to flatten a nested list | `def flatten(lst): return [x for sublist in lst for x in (flatten(sublist) if isinstance(sublist, list) else [sublist])]` |
💡 Stack multiple few-shot examples in the Qubrid Playground to progressively refine tone, format, and domain focus — no fine-tuning required.
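Over the API, few-shot examples are expressed as alternating user/assistant turns placed before the real question, exactly as the Playground does behind the scenes. A sketch (the `few_shot_messages` helper is illustrative, not an SDK function):

```python
def few_shot_messages(examples: list, question: str) -> list:
    # Each (user, assistant) pair becomes two prior turns in the conversation
    msgs = []
    for user_text, assistant_text in examples:
        msgs.append({"role": "user", "content": user_text})
        msgs.append({"role": "assistant", "content": assistant_text})
    msgs.append({"role": "user", "content": question})
    return msgs

msgs = few_shot_messages(
    [("Summarize: revenue up 18%, margins down.",
      "• Revenue +18% YoY • Margins slightly compressed")],
    "Summarize: churn down 2%, NRR at 115%.",
)
# Pass messages=msgs to client.chat.completions.create(...) as in the quickstart
```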

Inference Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| Streaming | boolean | true | Enable streaming responses for real-time output |
| Temperature | number | 0.7 | Controls randomness. Higher values mean more creative but less predictable output |
| Max Tokens | number | 4096 | Maximum number of tokens to generate in the response |
| Top P | number | 0.9 | Nucleus sampling: considers tokens with top_p probability mass |
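To build intuition for top_p: at each generation step, the model samples only from the smallest set of tokens whose cumulative probability reaches top_p, discarding the long tail. A toy sketch of that truncation over a made-up four-token distribution (illustrative only; the real sampler operates on the full vocabulary):

```python
def nucleus_tokens(probs: dict, top_p: float) -> list:
    """Return the smallest set of tokens whose cumulative probability >= top_p."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, total = [], 0.0
    for token, p in ranked:
        kept.append(token)
        total += p
        if total >= top_p:
            break
    return kept

# With top_p=0.9, the low-probability token "d" is excluded from sampling
print(nucleus_tokens({"a": 0.5, "b": 0.3, "c": 0.15, "d": 0.05}, 0.9))
# → ['a', 'b', 'c']
```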

Use Cases

  1. Enterprise chat assistants
  2. Advanced code generation and review
  3. Long-document question answering
  4. Summarization at scale
  5. Retrieval-Augmented Generation (RAG)
  6. AI agents and workflow automation

Strengths & Limitations

| Strengths | Limitations |
| --- | --- |
| High-quality reasoning and instruction adherence | Function calling not supported |
| Strong performance on code and analytical tasks | Can struggle with highly complex, multi-step reasoning |
| Large 128K context window for long-document processing | Higher cost per token compared to smaller models |
| Open-weight model suitable for private and on-prem deployments | |
| Production-ready for enterprise workloads | |

Why Qubrid AI?

  • 🚀 No infrastructure setup — serverless API, pay only for what you use
  • 🔁 OpenAI-compatible — drop-in replacement using the same SDK, just swap the base URL
  • 📄 Long-context ready — 128K window pairs perfectly with Qubrid’s high-throughput serving
  • 🧪 Built-in Playground — prototype with system prompts and few-shot examples instantly at platform.qubrid.com
  • 📊 Full observability — API logs and usage tracking built into the Qubrid dashboard
  • 🌐 Multi-language support — Python, JavaScript, Go, cURL out of the box

Resources

| Resource | Link |
| --- | --- |
| 📖 Qubrid Docs | docs.platform.qubrid.com |
| 🎮 Playground | Try Llama 3.3 70B live |
| 🔑 API Keys | Get your API Key |
| 🤗 Hugging Face | meta-llama/Llama-3.3-70B-Instruct |
| 💬 Discord | Join the Qubrid Community |

Built with ❤️ by Qubrid AI

Frontier models. Serverless infrastructure. Zero friction.