Models

102 of 102 models

MiniMax M3

MiniMax

MiniMax M3 Preview is a multimodal MoE vision-language model with strong reasoning, coding, and tool-calling capabilities.

1M context | Free/M in

DiffusionGemma 26B

Gemma

Diffusion-based 26B parameter LLM enabling parallel token generation for real-time text apps.

8K context | Free/M in

Nemotron 3 Ultra 550B

Nemotron

Open, efficient hybrid Mamba-Transformer MoE with 1M context, excelling in agentic reasoning, coding, planning, tool calling, and more.

1M context | Free/M in

Nemotron 3.5 Content Safety

Nemotron

Multilingual, multimodal model for detecting unsafe and toxic content.

8K context | Free/M in

Cosmos 3 Nano

Cosmos

Generates physics-aware videos from text prompts or an image prompt for physical AI development.

8K context | Free/M in

Cosmos 3 Nano Reasoner

Cosmos

Vision language model that excels in understanding the physical world using structured reasoning on videos or images.

131K context | Free/M in

Step 3.7 Flash

Step

A sparse MoE multimodal reasoning model good for enterprise, agentic and coding tasks.

131K context | Free/M in

Kimi K2.6

Kimi

1T multimodal MoE for long-horizon coding, agentic tool use, and image/video understanding.

2M context | Free/M in

Mistral Medium 3.5 128B

Mistral

A high performing model for text generation, coding and agentic use cases.

128K context | Free/M in

Nemotron 3 Nano Omni 30B

Nemotron

Nemotron 3 Nano Omni is an omni-modal reasoning model that understands images, video, speech, and text.

131K context | Free/M in

DeepSeek V4 Flash

DeepSeek

DeepSeek V4 Flash is a 284B MoE model with 1M-token context optimized for fast coding and agents.

1M context | Free/M in

DeepSeek V4 Pro

DeepSeek

DeepSeek V4 scales to 1M-token context windows with efficient MoE architecture for coding tasks.

1M context | Free/M in

GLM-5.1

GLM

GLM-5.1 is a flagship LLM for agentic workflows, coding, and long-horizon reasoning tasks.

131K context | Free/M in

Nemotron 3 Content Safety

Nemotron

Multilingual, multimodal model for detecting unsafe and toxic content.

8K context | Free/M in

Synthetic Video Detector

NVIDIA

NVIDIA Synthetic Video Detector is an AI-powered micro-service for detecting AI-generated (synthetic) videos.

8K context | Free/M in

Active Speaker Detection

NVIDIA

Detect and track speaker identities across video frames.

8K context | Free/M in

Ising Calibration 1 35B

NVIDIA

Open VLM for quantum computer calibration chart understanding across a range of qubit modalities.

8K context | Free/M in

MiniMax M2.7

MiniMax

MiniMax M2.7 is a 230B-parameter text-to-text AI model excelling in coding, reasoning, and office tasks.

131K context | Free/M in

Gemma 4 31B

Gemma

Dense 31B model delivering frontier reasoning for coding, agentic workflows, and fine-tuning.

131K context | Free/M in

Nemotron VoiceChat

Nemotron

Nemotron 3 VoiceChat is a real-time voice conversation AI.

8K context | Free/M in

Mistral Small 4 119B

Mistral

Hybrid MoE model unifying instruct, reasoning, and coding with multimodal input and 256k context.

256K context | Free/M in

Nemotron 3 Super 120B

Nemotron

Open, efficient hybrid Mamba-Transformer MoE with 1M context, excelling in agentic reasoning, coding, planning, tool calling, and more.

1M context | Free/M in

Qwen 3.5 122B

Qwen

122B MoE LLM (10B active) for coding, reasoning, multimodal chat. Agent-ready.

131K context | Free/M in

GLiNER PII

NVIDIA

GLiNER PII detects Personally Identifiable Information in text.

8K context | Free/M in

Cosmos Transfer 2.5 2B

Cosmos

Generates physics-aware video world states for physical AI development using text prompts and multiple spatial control inputs derived from real-world data or simulation.

8K context | Free/M in

Qwen 3.5 397B

Qwen

Next-gen Qwen 3.5 VLM (400B MoE) brings advanced vision, chat, RAG, and agentic capabilities.

131K context | Free/M in

Step 3.5 Flash

Step

200B open-source reasoning engine with sparse MoE powering frontier agentic AI.

131K context | Free/M in

Nemotron Content Safety Reasoning 4B

Nemotron

A context-aware safety model that applies reasoning to enforce domain-specific policies.

8K context | Free/M in

Nemotron 3 Nano 30B

Nemotron

Open, efficient MoE model with 1M context, excelling in coding, reasoning, instruction following, tool calling, and more.

1M context | Free/M in

Riva Translate 4B

Riva

Translation model for 12 languages with support for few-shot example prompts.

8K context | Free/M in

Mistral Large 3 675B

Mistral

A state-of-the-art general-purpose MoE VLM ideal for chat, agentic, and instruction-based use cases.

128K context | Free/M in

Ministral 14B

Mistral

A general-purpose VLM ideal for chat and instruction-based use cases.

128K context | Free/M in

StreamPETR

NVIDIA

StreamPETR offers efficient 3D object detection for autonomous driving by propagating sparse object queries temporally.

8K context | Free/M in

Llama 3.1 Nemotron Safety Guard 8B v3

Nemotron

Leading multilingual content safety model for enhancing the safety and moderation capabilities of LLMs.

8K context | Free/M in

Nemotron Nano 12B v2 VL

Nemotron

Nemotron Nano 12B v2 VL enables multi-image and video understanding, along with visual Q&A and summarization capabilities.

131K context | Free/M in

Stockmark 2 100B

Stockmark

Japanese-specialized large language model for enterprises to read and understand complex business documents.

131K context | Free/M in

Qwen 3 Next 80B

Qwen

Qwen3-Next Instruct blends hybrid attention, sparse MoE, and stability boosts for ultra-long context AI.

256K context | Free/M in

Seed OSS 36B

Seed

ByteDance open-source LLM with long-context, reasoning, and agentic intelligence.

131K context | Free/M in

Nemotron Nano 9B v2

Nemotron

High-efficiency LLM with hybrid Transformer-Mamba design, excelling in reasoning and agentic tasks.

131K context | Free/M in

GPT-OSS 120B

GPT-OSS

Mixture of Experts (MoE) reasoning LLM (text-only) designed to fit within 80GB GPU.

131K context | Free/M in

GPT-OSS 20B

GPT-OSS

Smaller Mixture of Experts (MoE) text-only LLM for efficient AI reasoning and math.

131K context | Free/M in

Llama 3.3 Nemotron Super 49B v1.5

Nemotron

High efficiency model with leading accuracy for reasoning, tool calling, chat, and instruction following.

131K context | Free/M in

Sarvam M

Sarvam

Multilingual, hybrid-reasoning model optimized for Indian language tasks, programming, and mathematical reasoning.

131K context | Free/M in

NV-Embed v1

NVIDIA

Generates high-quality numerical embeddings from text inputs.

8K context | Free/M in

Llama 3.3 Nemotron Super 49B v1

Nemotron

High efficiency model with leading accuracy for reasoning, tool calling, chat, and instruction following.

131K context | Free/M in

SparseDrive

NVIDIA

End-to-end autonomous driving stack integrating perception, prediction, and planning with sparse scene representations for efficiency and safety.

8K context | Free/M in

BEVFormer

NVIDIA

Advanced transformer for multi-frame bird's-eye-view 3D perception in autonomous driving.

8K context | Free/M in

Mixtral 8x7B Instruct v0.1

Mixtral

An MoE LLM that follows instructions, completes requests, and generates creative text.

33K context | Free/M in

Llama 4 Maverick 17B

Llama

A general-purpose multimodal, multilingual MoE model with 128 experts and 17B active parameters.

131K context | Free/M in

Gemma 3n E4B

Gemma

An edge computing AI model which accepts text, audio and image input, ideal for resource-constrained environments.

33K context | Free/M in

Gemma 3n E2B

Gemma

An edge computing AI model which accepts text, audio and image input, ideal for resource-constrained environments.

33K context | Free/M in

Llama 3.1 8B Instruct

Llama

Advanced state-of-the-art model with language understanding, superior reasoning, and text generation.

131K context | Free/M in

Llama 3.1 Nemotron Nano VL 8B v1

Nemotron

Multi-modal vision-language model that understands text and images and creates informative responses.

131K context | Free/M in

Cosmos Transfer 1 7B

Cosmos

Generates physics-aware video world states for physical AI development using text prompts and multiple spatial control inputs derived from real-world data or simulation.

8K context | Free/M in

Llama Guard 4 12B

Llama Guard

Multi-modal model to classify safety for input prompts as well as output responses.

131K context | Free/M in

Llama 3.1 Nemotron Nano 8B v1

Nemotron

Model with leading reasoning and agentic AI accuracy for PC and edge.

131K context | Free/M in

Background Noise Removal

NVIDIA

Removes unwanted noise from audio, improving speech intelligibility.

8K context | Free/M in

Studio Voice

NVIDIA

Enhance input speech recorded with low-quality microphones in noisy or reverberant environments, producing studio-quality speech.

8K context | Free/M in

Magpie TTS Zero-shot

NVIDIA

Expressive and engaging text-to-speech, generated from a short audio sample.

8K context | Free/M in

Llama 3.3 70B Instruct

Llama

Advanced LLM for reasoning, math, general knowledge, and function calling.

131K context | Free/M in

Llama 3.1 70B Instruct

Llama

Powers complex conversations with superior contextual understanding, reasoning and text generation.

131K context | Free/M in

ESMFold

ESM

Predicts the 3D structure of a protein from its amino acid sequence.

8K context | Free/M in

Mistral Nemotron

Mistral

Built for agentic workflows, this model excels in coding, instruction following, and function calling.

128K context | Free/M in

Llama 3.2 90B Vision

Llama

Cutting-edge vision-language model excelling in high-quality reasoning from images.

131K context | Free/M in

Llama 3.2 11B Vision

Llama

Cutting-edge vision-language model excelling in high-quality reasoning from images.

131K context | Free/M in

NV-EmbedCode 7B

NVIDIA

The NV-EmbedCode model is a 7B Mistral-based embedding model optimized for code retrieval, supporting text, code, and hybrid queries.

8K context | Free/M in

Phi-4 Multimodal

Phi

Cutting-edge open multimodal model excelling in high-quality reasoning from image and audio inputs.

8K context | Free/M in

Gemma 2 2B

Gemma

Advanced small language generative AI model for edge applications.

8K context | Free/M in

Llama 3.2 3B Instruct

Llama

Advanced state-of-the-art small language model with language understanding, superior reasoning, and text generation.

131K context | Free/M in

Phi-4 Mini

Phi

Lightweight multilingual LLM powering AI applications in latency-bound, memory- and compute-constrained environments.

8K context | Free/M in

Dracarys Llama 3.1 70B

Dracarys

Fine-tuned Llama 3.1 70B model for code generation, summarization, and multi-language tasks.

131K context | Free/M in

Llama 3.2 1B Instruct

Llama

Advanced state-of-the-art small language model with language understanding, superior reasoning, and text generation.

131K context | Free/M in

Solar 10.7B Instruct

Solar

Excels in NLP tasks, particularly in instruction-following, reasoning, and mathematics.

8K context | Free/M in

ESM2 650M

ESM

Generates embeddings of proteins from their amino acid sequences.

8K context | Free/M in

Claude 3.7 Sonnet

Claude

Most intelligent Claude model with extended thinking mode. Best for complex reasoning, coding, and agentic tasks.

200K context | $3/M in

Grok 3

Grok

Frontier model from xAI with real-time knowledge access. Strong at math, reasoning, and coding.

131K context | $3/M in

Grok 3 Mini

Grok

Reasoning-focused mini model. Faster and cheaper than Grok 3 while maintaining strong reasoning capability.

131K context | $0.30/M in

Sonar Pro

Sonar

Online model with real-time web search. Returns cited answers with up-to-date information.

200K context | $3/M in

Gemini 2.0 Flash

Gemini

Fast, capable multimodal model with native tool use, spatial understanding, and real-time audio/video.

1M context | $0.10/M in

o3-mini

o-series

Reasoning model optimized for STEM and coding. Delivers high capability at a fraction of o1 pricing.

200K context | $1/M in

Gemini 2.0 Flash Thinking

Gemini

Reasoning variant of Gemini 2.0 Flash with visible thought process. Excels at complex multi-step problems.

1M context | $0.10/M in

DeepSeek R1

DeepSeek

Open-weight reasoning model rivalling o1 at a fraction of the cost. Excellent for math and coding.

64K context | $0.55/M in

Rerank QA Mistral 4B

NVIDIA

GPU-accelerated model optimized for providing a probability score that a given passage contains the information to answer a question.

8K context | Free/M in

MiniMax-01

MiniMax

Large Mixture-of-Experts model with 456B parameters and 45.9B active. Massive 1M token context window.

1M context | $0.20/M in

DeepSeek V3

DeepSeek

671B parameter Mixture-of-Experts model with 37B active. Strong general-purpose performance at low cost.

64K context | $0.27/M in

QwQ 32B

Qwen

Reasoning model from Alibaba with strong math and coding performance. Open-weight and cost-effective.

131K context | $0.15/M in

o1

o-series

Frontier reasoning model that thinks before answering. Excels at complex math, science, and coding.

200K context | $15/M in

Qwen 2.5 72B Instruct

Qwen

Versatile multilingual model with 72B parameters. Strong at code generation and multilingual tasks.

131K context | $0.23/M in

Qwen 2.5 Coder 32B

Qwen

Code-specialized model matching GPT-4o on coding benchmarks. Supports 29+ programming languages.

131K context | $0.10/M in

Claude 3.5 Haiku

Claude

Fast, affordable model with strong performance. Great for high-volume, latency-sensitive workloads.

200K context | $0.80/M in

PaliGemma

PaliGemma

Vision language model adept at comprehending text and visual inputs to produce informative responses.

8K context | Free/M in

Nemotron Mini 4B

Nemotron

Optimized SLM for on-device inference and fine-tuned for roleplay, RAG and function calling.

8K context | Free/M in

Hermes 3 405B

Hermes

Fine-tune of Llama 3.1 405B with enhanced roleplay, creative writing, and agentic capabilities.

131K context | $0.80/M in

GPT-4o Mini

GPT

Affordable, intelligent small model for fast, lightweight tasks. Supports vision and function calling.

128K context | $0.15/M in

GPT-4o

GPT

Multimodal flagship model with vision, audio, and text capabilities. Faster and cheaper than GPT-4 Turbo.

128K context | $3/M in

Llama 3 Lumimaid 8B

Lumimaid

Llama 3 8B fine-tune trained on curated roleplay data. Excellent for creative and conversational use.

25K context | $0.04/M in

GPT-4 Turbo

GPT

Previous-generation flagship with 128K context. Strong general-purpose model with vision support.

128K context | $10/M in

Claude 3 Opus

Claude

Powerful model for highly complex tasks. Top-tier performance on reasoning, analysis, and creative writing.

200K context | $15/M in

Claude 3 Sonnet

Claude

Balanced model for enterprise workloads. Good balance of speed and intelligence.

200K context | $3/M in

Gemini 1.5 Pro

Gemini

Mid-generation multimodal model with 2M token context. Strong at long-document and video understanding.

2M context | $1/M in

Gemini 1.5 Flash

Gemini

Fast and versatile model across modalities. Optimized for high-volume, cost-sensitive tasks.

1M context | $0.07/M in

GPT-3.5 Turbo

GPT

Fast, inexpensive model for simple tasks. Good for classification, summarization, and basic chat.

16K context | $0.50/M in