MiniMax M2.1
MiniMax · Chat / LLM · 230B Parameters (10B Active) · 200K Context
Qubrid Playground · License · Hugging Face
Streaming · Agentic Coding · Long Context · Code · Tool Use · Polyglot

Overview

MiniMax M2.1 is the flagship open-source coding and agentic model from MiniMax, a Chinese AI research company focused on building large-scale open-source foundation models for coding, reasoning, and agentic workflows. With 230B total parameters and only 10B active per token (a 23:1 sparsity ratio), it scores 74% on SWE-bench Verified, competitive with Claude Sonnet 4.5, at a fraction of the cost. It delivers best-in-class polyglot coding across Python, Java, Go, Rust, C++, TypeScript, and Kotlin, with a 200K context window and native FP8 quantization for production-grade efficiency. Served instantly via the Qubrid AI Serverless API.
💻 74% SWE-bench Verified. 23:1 sparsity. Claude Sonnet 4.5-level coding at open-source cost. Deploy on Qubrid AI with no multi-GPU cluster required.

Model Specifications

| Field | Details |
| --- | --- |
| Model ID | MiniMaxAI/MiniMax-M2.1 |
| Provider | MiniMax |
| Kind | Chat / LLM |
| Architecture | Sparse MoE Transformer, 230B total / 10B active per token, FP8 quantization |
| Parameters | 230B total (10B active per forward pass) |
| Context Length | 200,000 tokens |
| MoE | Yes |
| Release Date | December 2025 |
| License | Modified MIT License |
| Training Data | Large-scale multilingual code and instruction datasets across major programming languages |
| Function Calling | Not Supported |
| Image Support | N/A |
| Serverless API | Available |
| Fine-tuning | Coming Soon |
| On-demand | Coming Soon |
| State | 🟢 Ready |

Pricing

💳 Access via the Qubrid AI Serverless API with pay-per-token pricing. No infrastructure management required.

| Token Type | Price per 1M Tokens |
| --- | --- |
| Input Tokens | $0.30 |
| Input Tokens (Cached) | $0.03 |
| Output Tokens | $1.20 |
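At these rates, request cost can be estimated directly from token counts. A minimal sketch; the helper function is illustrative, not part of any Qubrid SDK:

```python
# Qubrid serverless rates for MiniMax M2.1, in USD per 1M tokens
RATES = {"input": 0.30, "cached_input": 0.03, "output": 1.20}

def estimate_cost(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Estimate request cost in USD; cached_tokens is the cached share of input."""
    fresh = input_tokens - cached_tokens
    return (fresh * RATES["input"]
            + cached_tokens * RATES["cached_input"]
            + output_tokens * RATES["output"]) / 1_000_000

# 100K input tokens (half of them cached) plus 10K output tokens
print(round(estimate_cost(100_000, 10_000, cached_tokens=50_000), 4))  # → 0.0285
```

Cached tokens cut the input bill by 10x, which matters for long-context agentic loops that resend the same prefix every turn.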

Quickstart

Prerequisites

  1. Create a free account at platform.qubrid.com
  2. Generate your API key from the API Keys section
  3. Replace QUBRID_API_KEY in the code below with your actual key
💡 Recommended parameters: use temperature=1.0, top_p=0.95, top_k=40 for best performance, as specified by MiniMax on the official model card.

Python

from openai import OpenAI

# Initialize the OpenAI client with Qubrid base URL
client = OpenAI(
    base_url="https://platform.qubrid.com/v1",
    api_key="QUBRID_API_KEY",
)

# Create a streaming chat completion
stream = client.chat.completions.create(
    model="MiniMaxAI/MiniMax-M2.1",
    messages=[
      {
        "role": "user",
        "content": "Explain quantum computing in simple terms"
      }
    ],
    max_tokens=8192,
    temperature=1,
    top_p=0.95,
    stream=True
)

# With stream=True, print tokens as they arrive
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()

# With stream=False, the full response is returned instead:
# print(stream.choices[0].message.content)

JavaScript

import OpenAI from "openai";

// Initialize the OpenAI client with Qubrid base URL
const client = new OpenAI({
  baseURL: "https://platform.qubrid.com/v1",
  apiKey: "QUBRID_API_KEY",
});

// Create a streaming chat completion
const stream = await client.chat.completions.create({
  model: "MiniMaxAI/MiniMax-M2.1",
  messages: [
    {
      role: "user",
      content: "Explain quantum computing in simple terms",
    },
  ],
  max_tokens: 8192,
  temperature: 1,
  top_p: 0.95,
  stream: true,
});

// With stream: true, print tokens as they arrive
for await (const chunk of stream) {
  if (chunk.choices[0]?.delta?.content) {
    process.stdout.write(chunk.choices[0].delta.content);
  }
}
console.log();

// With stream: false, the full response is returned instead:
// console.log(stream.choices[0].message.content);

Go

package main

import (
	"bufio"
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

func main() {
	url := "https://platform.qubrid.com/v1/chat/completions"

	data := map[string]interface{}{
		"model": "MiniMaxAI/MiniMax-M2.1",
		"messages": []map[string]string{
			{
				"role":    "user",
				"content": "Explain quantum computing in simple terms",
			},
		},
		"temperature": 1,
		"max_tokens":  8192,
		"stream":      true,
		"top_p":       0.95,
	}

	jsonData, err := json.Marshal(data)
	if err != nil {
		fmt.Println("marshal error:", err)
		return
	}

	req, err := http.NewRequest("POST", url, bytes.NewBuffer(jsonData))
	if err != nil {
		fmt.Println("request error:", err)
		return
	}
	req.Header.Set("Authorization", "Bearer QUBRID_API_KEY")
	req.Header.Set("Content-Type", "application/json")

	client := &http.Client{}
	res, err := client.Do(req)
	if err != nil {
		fmt.Println("HTTP error:", err)
		return
	}
	defer res.Body.Close()

	// With stream=true the body is a server-sent event stream;
	// each non-empty line looks like: data: {...}
	scanner := bufio.NewScanner(res.Body)
	for scanner.Scan() {
		line := scanner.Text()
		if line != "" {
			fmt.Println(line)
		}
	}
}

cURL

curl -X POST "https://platform.qubrid.com/v1/chat/completions" \
  -H "Authorization: Bearer QUBRID_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
  "model": "MiniMaxAI/MiniMax-M2.1",
  "messages": [
    {
      "role": "user",
      "content": "Explain quantum computing in simple terms"
    }
  ],
  "temperature": 1,
  "max_tokens": 8192,
  "stream": true,
  "top_p": 0.95
}'

Live Example

Prompt: Write a type-safe REST API client in TypeScript with error handling and retry logic
Response:
import axios, { AxiosInstance, AxiosRequestConfig, AxiosResponse } from "axios";

interface RetryConfig {
  maxRetries: number;
  delayMs: number;
  backoffMultiplier: number;
}

interface ApiClientConfig {
  baseURL: string;
  timeout?: number;
  retry?: RetryConfig;
  headers?: Record<string, string>;
}

class ApiError extends Error {
  constructor(
    public statusCode: number,
    public message: string,
    public data?: unknown,
  ) {
    super(message);
    this.name = "ApiError";
  }
}

async function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

class RestApiClient {
  private client: AxiosInstance;
  private retryConfig: RetryConfig;

  constructor(config: ApiClientConfig) {
    this.client = axios.create({
      baseURL: config.baseURL,
      timeout: config.timeout ?? 10000,
      headers: {
        "Content-Type": "application/json",
        ...config.headers,
      },
    });

    this.retryConfig = config.retry ?? {
      maxRetries: 3,
      delayMs: 500,
      backoffMultiplier: 2,
    };
  }

  private async executeWithRetry<T>(
    requestFn: () => Promise<AxiosResponse<T>>,
    attempt = 0,
  ): Promise<T> {
    try {
      const response = await requestFn();
      return response.data;
    } catch (error) {
      if (axios.isAxiosError(error)) {
        const status = error.response?.status ?? 0;

        // Don't retry client errors (4xx) except 429 (rate limit)
        if (status >= 400 && status < 500 && status !== 429) {
          throw new ApiError(status, error.message, error.response?.data);
        }

        if (attempt < this.retryConfig.maxRetries) {
          const delay =
            this.retryConfig.delayMs *
            Math.pow(this.retryConfig.backoffMultiplier, attempt);
          await sleep(delay);
          return this.executeWithRetry(requestFn, attempt + 1);
        }

        throw new ApiError(status, `Max retries exceeded: ${error.message}`);
      }
      throw error;
    }
  }

  async get<T>(path: string, config?: AxiosRequestConfig): Promise<T> {
    return this.executeWithRetry(() => this.client.get<T>(path, config));
  }

  async post<T, B = unknown>(
    path: string,
    body: B,
    config?: AxiosRequestConfig,
  ): Promise<T> {
    return this.executeWithRetry(() => this.client.post<T>(path, body, config));
  }
}

// Usage example
const api = new RestApiClient({
  baseURL: "https://api.example.com",
  retry: { maxRetries: 3, delayMs: 500, backoffMultiplier: 2 },
});

const data = await api.get<{ id: number; name: string }>("/users/1");
console.log(data.name);
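The generated client retries with exponential backoff. With the example configuration (delayMs: 500, backoffMultiplier: 2, maxRetries: 3), the wait before each retry works out as follows:

```python
delay_ms, backoff_multiplier, max_retries = 500, 2, 3

# wait before retry attempt i is delay_ms * backoff_multiplier**i
delays = [delay_ms * backoff_multiplier**attempt for attempt in range(max_retries)]
print(delays)  # → [500, 1000, 2000]
```

So a request that keeps failing waits at most 3.5 seconds in total before the client gives up with "Max retries exceeded".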
Try it yourself in the Qubrid AI Playground →

Playground Features

The Qubrid AI Playground lets you interact with MiniMax M2.1 directly in your browser: no setup, no code, no cost to explore.

🧠 System Prompt

Define the model's coding language, style, and workflow constraints before the conversation begins; ideal for polyglot development sessions and long-horizon agentic coding workflows.
Example: "You are a senior full-stack engineer. For every coding task:
1. Write production-ready code with proper error handling and type safety.
2. Include inline comments for non-obvious logic.
3. Add a brief complexity analysis and note any edge cases."
Set your system prompt once in the Qubrid Playground and it applies across every turn of the conversation.
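The same system prompt works over the Serverless API by prepending a message with the `system` role, sketched here with the example prompt above:

```python
SYSTEM_PROMPT = (
    "You are a senior full-stack engineer. For every coding task:\n"
    "1. Write production-ready code with proper error handling and type safety.\n"
    "2. Include inline comments for non-obvious logic.\n"
    "3. Add a brief complexity analysis and note any edge cases."
)

# The system message leads the list and applies to every following turn
messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": "Write a rate limiter in Go"},
]
print(messages[0]["role"])  # → system
```

Pass this `messages` list to `client.chat.completions.create(...)` exactly as in the quickstart.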

🎯 Few-Shot Examples

Establish your preferred code style and language conventions with concrete examples; no fine-tuning required.
| User Input | Assistant Response |
| --- | --- |
| Write a Go function to check if a number is prime | `func isPrime(n int) bool { if n < 2 { return false }; for i := 2; i*i <= n; i++ { if n%i == 0 { return false } }; return true }` |
| Refactor: nested for loops checking duplicates in a list | Use a hash set: `seen := make(map[int]bool); for _, v := range list { if seen[v] { return true }; seen[v] = true }; return false` (O(n) vs O(n²)) |
💡 Stack multiple few-shot examples in the Qubrid Playground to lock in language preference, code style, and output format.
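Over the API, the same effect comes from seeding the message list with alternating user/assistant pairs ahead of the real request. A sketch using the first example pair from the table above:

```python
# (user prompt, assistant reply) pairs that demonstrate the desired style
few_shot = [
    ("Write a Go function to check if a number is prime",
     "func isPrime(n int) bool { if n < 2 { return false }; "
     "for i := 2; i*i <= n; i++ { if n%i == 0 { return false } }; return true }"),
]

messages = []
for user_text, assistant_text in few_shot:
    messages.append({"role": "user", "content": user_text})
    messages.append({"role": "assistant", "content": assistant_text})

# The real request goes last
messages.append({"role": "user", "content": "Write a Go function to reverse a string"})
print([m["role"] for m in messages])  # → ['user', 'assistant', 'user']
```

The model treats the seeded pairs as prior turns and imitates their language and format in its answer.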

Inference Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| Streaming | boolean | true | Enable streaming responses for real-time output |
| Temperature | number | 1 | Recommended at 1.0 for best performance |
| Max Tokens | number | 8192 | Maximum number of tokens the model can generate |
| Top P | number | 0.95 | Controls nucleus sampling |
| Top K | number | 40 | Limits sampling to the top-k most likely tokens |
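The OpenAI Python SDK has no named `top_k` argument, so when targeting Qubrid's endpoint it can be passed through `extra_body`, the SDK's escape hatch for provider-specific fields. A sketch of the recommended settings bundled as keyword arguments:

```python
# Recommended MiniMax M2.1 settings; top_k rides in extra_body because
# the OpenAI SDK does not expose it as a named parameter
params = {
    "model": "MiniMaxAI/MiniMax-M2.1",
    "temperature": 1.0,
    "top_p": 0.95,
    "max_tokens": 8192,
    "stream": True,
    "extra_body": {"top_k": 40},
}
# usage: client.chat.completions.create(messages=messages, **params)
print(params["extra_body"]["top_k"])  # → 40
```

Fields in `extra_body` are merged into the JSON request body, so the server receives `top_k` exactly as the cURL example would send it.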

Use Cases

  1. Multilingual software development
  2. Long-horizon agentic coding
  3. Code review and optimization
  4. Full-stack app generation
  5. Office automation workflows
  6. Complex multi-step tool use

Strengths & Limitations

| Strengths | Limitations |
| --- | --- |
| 74% SWE-bench Verified, competitive with Claude Sonnet 4.5 | Less reliable than frontier closed models for deep debugging |
| 230B MoE with only 10B active (23:1 sparsity) for extreme efficiency | Sparse activation may miss niche language idioms |
| 200K context window for full-codebase analysis | Very large model size requires a multi-GPU setup for self-hosting |
| Best-in-class polyglot coding across 7 major languages | Function calling not supported via the API |
| FP8 native quantization for production-grade efficiency | |
| Open weights, fully available for local and on-premise deployment | |

Why Qubrid AI?

  • 🚀 No infrastructure setup: a 230B MoE served serverlessly at just $0.30/1M input tokens
  • 🔄 OpenAI-compatible: a drop-in replacement using the same SDK; just swap the base URL
  • 💰 Cached input pricing: $0.03/1M for cached tokens, the lowest cached rate across all models on the platform
  • 💻 Polyglot by design: MiniMax M2.1's 7-language coding strength pairs with Qubrid's low-latency infrastructure for fast development loops
  • 🧪 Built-in Playground: prototype with system prompts and few-shot examples instantly at platform.qubrid.com
  • 📊 Full observability: API logs and usage tracking built into the Qubrid dashboard

Resources

| Resource | Link |
| --- | --- |
| 📖 Qubrid Docs | docs.platform.qubrid.com |
| 🎮 Playground | Try MiniMax M2.1 live |
| 🔑 API Keys | Get your API Key |
| 🤗 Hugging Face | MiniMaxAI/MiniMax-M2.1 |
| 💬 Discord | Join the Qubrid Community |

Built with โค๏ธ by Qubrid AI

Frontier models. Serverless infrastructure. Zero friction.