Models

Next-gen Qwen 3.5 VLM (400B MoE) brings advanced vision, chat, RAG, and agentic capabilities.

Step 3.5 Flash

Step

200B open-source reasoning engine with sparse MoE powering frontier agentic AI.

Nemotron Content Safety Reasoning 4B

A context-aware safety model that applies reasoning to enforce domain-specific policies.

Nemotron 3 Nano 30B

Open, efficient MoE model with 1M context, excelling in coding, reasoning, instruction following, tool calling, and more.

Riva Translate 4B

Riva

Translation model covering 12 languages, with support for few-shot example prompts.

Mistral Large 3 675B

Mistral

A state-of-the-art general-purpose MoE VLM ideal for chat, agentic, and instruction-based use cases.

Ministral 14B

Mistral

A general-purpose VLM ideal for chat and instruction-based use cases.

StreamPETR

StreamPETR offers efficient 3D object detection for autonomous driving by propagating sparse object queries temporally.

Llama 3.1 Nemotron Safety Guard 8B v3

Leading multilingual content safety model for enhancing the safety and moderation capabilities of LLMs.

Nemotron Nano 12B v2 VL

Nemotron Nano 12B v2 VL enables multi-image and video understanding, along with visual Q&A and summarization capabilities.

Stockmark 2 100B

Stockmark

Japanese-specialized large language model that helps enterprises read and understand complex business documents.

Qwen 3 Next 80B

Qwen3-Next Instruct blends hybrid attention, sparse MoE, and stability boosts for ultra-long context AI.

Seed OSS 36B

Seed

ByteDance open-source LLM with long-context, reasoning, and agentic intelligence.

Nemotron Nano 9B v2

High-efficiency LLM with hybrid Transformer-Mamba design, excelling in reasoning and agentic tasks.

GPT-OSS 120B

GPT-OSS

Mixture of Experts (MoE) reasoning LLM (text-only) designed to fit within an 80GB GPU.

GPT-OSS 20B

GPT-OSS

Smaller Mixture of Experts (MoE) text-only LLM for efficient AI reasoning and math.

Llama 3.3 Nemotron Super 49B v1.5

High efficiency model with leading accuracy for reasoning, tool calling, chat, and instruction following.

Sarvam M

Sarvam

Multilingual, hybrid-reasoning model optimized for Indian-language tasks, programming, and mathematical reasoning.

NV-Embed v1

Generates high-quality numerical embeddings from text inputs.

Llama 3.3 Nemotron Super 49B v1

High efficiency model with leading accuracy for reasoning, tool calling, chat, and instruction following.

SparseDrive

End-to-end autonomous driving stack integrating perception, prediction, and planning with sparse scene representations for efficiency and safety.

BEVFormer

Advanced transformer for multi-frame bird's-eye-view 3D perception in autonomous driving.

Mixtral 8x7B Instruct v0.1

Mixtral

An MoE LLM that follows instructions, completes requests, and generates creative text.

Llama 4 Maverick 17B

A general-purpose multimodal, multilingual MoE model with 128 experts and 17B active parameters.

Gemma 3n E4B

Gemma

An edge-computing AI model that accepts text, audio, and image input, ideal for resource-constrained environments.

Gemma 3n E2B

Gemma

An edge-computing AI model that accepts text, audio, and image input, ideal for resource-constrained environments.

Llama 3.1 8B Instruct

Advanced state-of-the-art model with language understanding, superior reasoning, and text generation.

Llama 3.1 Nemotron Nano VL 8B v1

Multimodal vision-language model that understands text and images and generates informative responses.

Cosmos Transfer 1 7B

Cosmos

Generates physics-aware video world states for physical AI development using text prompts and multiple spatial control inputs derived from real-world data or simulation.

Llama Guard 4 12B

Llama Guard

Multimodal model that classifies the safety of input prompts as well as output responses.

Llama 3.1 Nemotron Nano 8B v1

Model with leading reasoning and agentic AI accuracy for PC and edge.

Background Noise Removal

Removes unwanted noise from audio, improving speech intelligibility.

Studio Voice

Enhances input speech recorded with low-quality microphones in noisy or reverberant environments, producing studio-quality speech.

Magpie TTS Zero-shot

Expressive and engaging text-to-speech, generated from a short audio sample.

Llama 3.3 70B Instruct

Advanced LLM for reasoning, math, general knowledge, and function calling.

Llama 3.1 70B Instruct

Powers complex conversations with superior contextual understanding, reasoning and text generation.

ESMFold

ESM

Predicts the 3D structure of a protein from its amino acid sequence.

Mistral Nemotron

Mistral

Built for agentic workflows, this model excels in coding, instruction following, and function calling.

Llama 3.2 90B Vision

Cutting-edge vision-language model excelling in high-quality reasoning from images.

Llama 3.2 11B Vision

Cutting-edge vision-language model excelling in high-quality reasoning from images.

NV-EmbedCode 7B

The NV-EmbedCode model is a 7B Mistral-based embedding model optimized for code retrieval, supporting text, code, and hybrid queries.

Phi-4 Multimodal

Phi

Cutting-edge open multimodal model excelling in high-quality reasoning from image and audio inputs.

Gemma 2 2B

Gemma

Advanced small language generative AI model for edge applications.

Llama 3.2 3B Instruct

Advanced state-of-the-art small language model with language understanding, superior reasoning, and text generation.

Phi-4 Mini

Phi

Lightweight multilingual LLM powering AI applications in latency-bound, memory- and compute-constrained environments.

Dracarys Llama 3.1 70B

Dracarys

Fine-tuned Llama 3.1 70B model for code generation, summarization, and multi-language tasks.

Llama 3.2 1B Instruct

Advanced state-of-the-art small language model with language understanding, superior reasoning, and text generation.

Solar 10.7B Instruct

Solar

Excels in NLP tasks, particularly in instruction-following, reasoning, and mathematics.

ESM2 650M

ESM

Generates embeddings of proteins from their amino acid sequences.

Claude 3.7 Sonnet

Most intelligent Claude model with extended thinking mode. Best for complex reasoning, coding, and agentic tasks.

Grok 3

Grok

Frontier model from xAI with real-time knowledge access. Strong at math, reasoning, and coding.

Grok 3 Mini

Grok

Reasoning-focused mini model. Faster and cheaper than Grok 3 while maintaining strong reasoning capability.

Sonar Pro

Sonar

Online model with real-time web search. Returns cited answers with up-to-date information.

Gemini 2.0 Flash

Fast, capable multimodal model with native tool use, spatial understanding, and real-time audio/video.

o3-mini

o-series

Reasoning model optimized for STEM and coding. Delivers high capability at a fraction of o1 pricing.

Gemini 2.0 Flash Thinking

Reasoning variant of Gemini 2.0 Flash with visible thought process. Excels at complex multi-step problems.

DeepSeek R1

DeepSeek

Open-weight reasoning model rivalling o1 at a fraction of the cost. Excellent for math and coding.

Rerank QA Mistral 4B

GPU-accelerated model optimized for providing a probability score that a given passage contains the information to answer a question.

MiniMax-01

MiniMax

Large Mixture-of-Experts model with 456B parameters and 45.9B active. Massive 1M token context window.

DeepSeek V3

DeepSeek

671B parameter Mixture-of-Experts model with 37B active. Strong general-purpose performance at low cost.

QwQ 32B

Reasoning model from Alibaba with strong math and coding performance. Open-weight and cost-effective.

o1

o-series

Frontier reasoning model that thinks before answering. Excels at complex math, science, and coding.

Qwen 2.5 72B Instruct

Versatile multilingual model with 72B parameters. Strong at code generation and multilingual tasks.

Qwen 2.5 Coder 32B

Code-specialized model matching GPT-4o on coding benchmarks. Supports 29+ programming languages.

Claude 3.5 Haiku

Fast, affordable model with strong performance. Great for high-volume, latency-sensitive workloads.

PaliGemma

Vision language model adept at comprehending text and visual inputs to produce informative responses.

Nemotron Mini 4B

Optimized SLM for on-device inference and fine-tuned for roleplay, RAG and function calling.

Hermes 3 405B

Hermes

Fine-tune of Llama 3.1 405B with enhanced roleplay, creative writing, and agentic capabilities.

GPT-4o Mini

Affordable, intelligent small model for fast, lightweight tasks. Supports vision and function calling.

GPT-4o

Multimodal flagship model with vision, audio, and text capabilities. Faster and cheaper than GPT-4 Turbo.

Llama 3 Lumimaid 8B

Lumimaid

Llama 3 8B fine-tune trained on curated roleplay data. Excellent for creative and conversational use.

GPT-4 Turbo

Previous-generation flagship with 128K context. Strong general-purpose model with vision support.

Claude 3 Opus

Powerful model for highly complex tasks. Top-tier performance on reasoning, analysis, and creative writing.

Claude 3 Sonnet

Balanced model for enterprise workloads, trading off speed and intelligence.

Gemini 1.5 Pro

Mid-generation multimodal model with 2M token context. Strong at long-document and video understanding.

Gemini 1.5 Flash

Fast and versatile model across modalities. Optimized for high-volume, cost-sensitive tasks.

GPT-3.5 Turbo