Models
102 of 102 models
MiniMax M3
MiniMax: MiniMax M3 Preview is a multimodal MoE vision-language model with strong reasoning, coding, and tool-calling capabilities.
DiffusionGemma 26B
Gemma: Diffusion-based 26B-parameter LLM enabling parallel token generation for real-time text apps.
Nemotron 3 Ultra 550B
Nemotron: Open, efficient hybrid Mamba-Transformer MoE with 1M context, excelling in agentic reasoning, coding, planning, tool calling, and more.
Nemotron 3.5 Content Safety
Nemotron: Multilingual, multimodal model for detecting unsafe and toxic content.
Cosmos 3 Nano
Cosmos: Generates physics-aware videos from a text or image prompt for physical AI development.
Cosmos 3 Nano Reasoner
Cosmos: Vision language model that excels in understanding the physical world using structured reasoning on videos or images.
Step 3.7 Flash
Step: A sparse MoE multimodal reasoning model suited to enterprise, agentic, and coding tasks.
Kimi K2.6
Kimi: 1T multimodal MoE for long-horizon coding, agentic tool use, and image/video understanding.
Mistral Medium 3.5 128B
Mistral: A high-performing model for text generation, coding, and agentic use cases.
Nemotron 3 Nano Omni 30B
Nemotron: Nemotron 3 Nano Omni is an omni-modal reasoning model that understands images, video, speech, and text.
DeepSeek V4 Flash
DeepSeek: DeepSeek V4 Flash is a 284B MoE model with 1M-token context optimized for fast coding and agents.
DeepSeek V4 Pro
DeepSeek: DeepSeek V4 scales to 1M-token context windows with an efficient MoE architecture for coding tasks.
GLM-5.1
GLM: GLM-5.1 is a flagship LLM for agentic workflows, coding, and long-horizon reasoning tasks.
Nemotron 3 Content Safety
Nemotron: Multilingual, multimodal model for detecting unsafe and toxic content.
Synthetic Video Detector
NVIDIA: NVIDIA Synthetic Video Detector is an AI-powered microservice for detecting AI-generated (synthetic) videos.
Active Speaker Detection
NVIDIA: Detects and tracks speaker identities across video frames.
Ising Calibration 1 35B
NVIDIA: Open VLM for quantum computer calibration chart understanding across a range of qubit modalities.
MiniMax M2.7
MiniMax: MiniMax M2.7 is a 230B-parameter text-to-text AI model excelling in coding, reasoning, and office tasks.
Gemma 4 31B
Gemma: Dense 31B model delivering frontier reasoning for coding, agentic workflows, and fine-tuning.
Nemotron VoiceChat
Nemotron: Nemotron 3 VoiceChat is a real-time voice conversation AI.
Mistral Small 4 119B
Mistral: Hybrid MoE model unifying instruct, reasoning, and coding with multimodal input and 256k context.
Nemotron 3 Super 120B
Nemotron: Open, efficient hybrid Mamba-Transformer MoE with 1M context, excelling in agentic reasoning, coding, planning, tool calling, and more.
Qwen 3.5 122B
Qwen: 122B MoE LLM (10B active) for coding, reasoning, and multimodal chat. Agent-ready.
GLiNER PII
NVIDIA: GLiNER PII detects personally identifiable information (PII) in text.
Cosmos Transfer 2.5 2B
Cosmos: Generates physics-aware video world states for physical AI development using text prompts and multiple spatial control inputs derived from real-world data or simulation.
Qwen 3.5 397B
Qwen: Next-gen Qwen 3.5 VLM (400B MoE) brings advanced vision, chat, RAG, and agentic capabilities.
Step 3.5 Flash
Step: 200B open-source reasoning engine with sparse MoE powering frontier agentic AI.
Nemotron Content Safety Reasoning 4B
Nemotron: A context-aware safety model that applies reasoning to enforce domain-specific policies.
Nemotron 3 Nano 30B
Nemotron: Open, efficient MoE model with 1M context, excelling in coding, reasoning, instruction following, tool calling, and more.
Riva Translate 4B
Riva: Translation model covering 12 languages, with support for few-shot example prompts.
Mistral Large 3 675B
Mistral: A state-of-the-art general-purpose MoE VLM ideal for chat, agentic, and instruction-based use cases.
Ministral 14B
Mistral: A general-purpose VLM ideal for chat and instruction-based use cases.
StreamPETR
NVIDIA: StreamPETR offers efficient 3D object detection for autonomous driving by propagating sparse object queries temporally.
Llama 3.1 Nemotron Safety Guard 8B v3
Nemotron: Leading multilingual content safety model for enhancing the safety and moderation capabilities of LLMs.
Nemotron Nano 12B v2 VL
Nemotron: Nemotron Nano 12B v2 VL enables multi-image and video understanding, along with visual Q&A and summarization capabilities.
Stockmark 2 100B
Stockmark: Japanese-specialized large language model that helps enterprises read and understand complex business documents.
Qwen 3 Next 80B
Qwen: Qwen3-Next Instruct blends hybrid attention, sparse MoE, and stability boosts for ultra-long context AI.
Seed OSS 36B
Seed: ByteDance open-source LLM with long-context, reasoning, and agentic intelligence.
Nemotron Nano 9B v2
Nemotron: High-efficiency LLM with hybrid Transformer-Mamba design, excelling in reasoning and agentic tasks.
GPT-OSS 120B
GPT-OSS: Mixture of Experts (MoE) reasoning LLM (text-only) designed to fit within a single 80GB GPU.
GPT-OSS 20B
GPT-OSS: Smaller Mixture of Experts (MoE) text-only LLM for efficient AI reasoning and math.
Llama 3.3 Nemotron Super 49B v1.5
Nemotron: High-efficiency model with leading accuracy for reasoning, tool calling, chat, and instruction following.
Sarvam M
Sarvam: Multilingual, hybrid-reasoning model optimized for Indian-language tasks, programming, and mathematical reasoning.
NV-Embed v1
NVIDIA: Generates high-quality numerical embeddings from text inputs.
Llama 3.3 Nemotron Super 49B v1
Nemotron: High-efficiency model with leading accuracy for reasoning, tool calling, chat, and instruction following.
SparseDrive
NVIDIA: End-to-end autonomous driving stack integrating perception, prediction, and planning with sparse scene representations for efficiency and safety.
BEVFormer
NVIDIA: Advanced transformer for multi-frame bird's-eye-view 3D perception in autonomous driving.
Mixtral 8x7B Instruct v0.1
Mixtral: An MoE LLM that follows instructions, completes requests, and generates creative text.
Llama 4 Maverick 17B
Llama: A general-purpose multimodal, multilingual 128-expert MoE model with 17B active parameters.
Gemma 3n E4B
Gemma: An edge-computing AI model that accepts text, audio, and image input, ideal for resource-constrained environments.
Gemma 3n E2B
Gemma: An edge-computing AI model that accepts text, audio, and image input, ideal for resource-constrained environments.
Llama 3.1 8B Instruct
Llama: Advanced state-of-the-art model with language understanding, superior reasoning, and text generation.
Llama 3.1 Nemotron Nano VL 8B v1
Nemotron: Multi-modal vision-language model that understands text and images and creates informative responses.
Cosmos Transfer 1 7B
Cosmos: Generates physics-aware video world states for physical AI development using text prompts and multiple spatial control inputs derived from real-world data or simulation.
Llama Guard 4 12B
Llama Guard: Multi-modal model that classifies the safety of input prompts as well as output responses.
Llama 3.1 Nemotron Nano 8B v1
Nemotron: Leading reasoning and agentic AI accuracy model for PC and edge.
Background Noise Removal
NVIDIA: Removes unwanted noise from audio, improving speech intelligibility.
Studio Voice
NVIDIA: Enhances input speech recorded with low-quality microphones in noisy or reverberant environments, producing studio-quality speech.
Magpie TTS Zero-shot
NVIDIA: Expressive and engaging text-to-speech, generated from a short audio sample.
Llama 3.3 70B Instruct
Llama: Advanced LLM for reasoning, math, general knowledge, and function calling.
Llama 3.1 70B Instruct
Llama: Powers complex conversations with superior contextual understanding, reasoning, and text generation.
ESMFold
ESM: Predicts the 3D structure of a protein from its amino acid sequence.
Mistral Nemotron
Mistral: Built for agentic workflows, this model excels in coding, instruction following, and function calling.
Llama 3.2 90B Vision
Llama: Cutting-edge vision-language model excelling in high-quality reasoning from images.
Llama 3.2 11B Vision
Llama: Cutting-edge vision-language model excelling in high-quality reasoning from images.
NV-EmbedCode 7B
NVIDIA: The NV-EmbedCode model is a 7B Mistral-based embedding model optimized for code retrieval, supporting text, code, and hybrid queries.
Phi-4 Multimodal
Phi: Cutting-edge open multimodal model excelling in high-quality reasoning from image and audio inputs.
Gemma 2 2B
Gemma: Advanced small generative language model for edge applications.
Llama 3.2 3B Instruct
Llama: Advanced state-of-the-art small language model with language understanding, superior reasoning, and text generation.
Phi-4 Mini
Phi: Lightweight multilingual LLM powering AI applications in latency-bound, memory- and compute-constrained environments.
Dracarys Llama 3.1 70B
Dracarys: Fine-tuned Llama 3.1 70B model for code generation, summarization, and multi-language tasks.
Llama 3.2 1B Instruct
Llama: Advanced state-of-the-art small language model with language understanding, superior reasoning, and text generation.
Solar 10.7B Instruct
Solar: Excels in NLP tasks, particularly instruction following, reasoning, and mathematics.
ESM2 650M
ESM: Generates embeddings of proteins from their amino acid sequences.
Claude 3.7 Sonnet
Claude: Most intelligent Claude model with extended thinking mode. Best for complex reasoning, coding, and agentic tasks.
Grok 3
Grok: Frontier model from xAI with real-time knowledge access. Strong at math, reasoning, and coding.
Grok 3 Mini
Grok: Reasoning-focused mini model. Faster and cheaper than Grok 3 while maintaining strong reasoning capability.
Sonar Pro
Sonar: Online model with real-time web search. Returns cited answers with up-to-date information.
Gemini 2.0 Flash
Gemini: Fast, capable multimodal model with native tool use, spatial understanding, and real-time audio/video.
o3-mini
o-series: Reasoning model optimized for STEM and coding. Delivers high capability at a fraction of o1 pricing.
Gemini 2.0 Flash Thinking
Gemini: Reasoning variant of Gemini 2.0 Flash with visible thought process. Excels at complex multi-step problems.
DeepSeek R1
DeepSeek: Open-weight reasoning model rivalling o1 at a fraction of the cost. Excellent for math and coding.
Rerank QA Mistral 4B
NVIDIA: GPU-accelerated model that scores the probability that a given passage contains the information needed to answer a question.
MiniMax-01
MiniMax: Large Mixture-of-Experts model with 456B parameters and 45.9B active. Massive 1M token context window.
DeepSeek V3
DeepSeek: 671B-parameter Mixture-of-Experts model with 37B active. Strong general-purpose performance at low cost.
QwQ 32B
Qwen: Reasoning model from Alibaba with strong math and coding performance. Open-weight and cost-effective.
o1
o-series: Frontier reasoning model that thinks before answering. Excels at complex math, science, and coding.
Qwen 2.5 72B Instruct
Qwen: Versatile multilingual model with 72B parameters. Strong at code generation and multilingual tasks.
Qwen 2.5 Coder 32B
Qwen: Code-specialized model matching GPT-4o on coding benchmarks. Supports 29+ programming languages.
Claude 3.5 Haiku
Claude: Fast, affordable model with strong performance. Great for high-volume, latency-sensitive workloads.
PaliGemma
PaliGemma: Vision language model adept at comprehending text and visual inputs to produce informative responses.
Nemotron Mini 4B
Nemotron: SLM optimized for on-device inference and fine-tuned for roleplay, RAG, and function calling.
Hermes 3 405B
Hermes: Fine-tune of Llama 3.1 405B with enhanced roleplay, creative writing, and agentic capabilities.
GPT-4o Mini
GPT: Affordable, intelligent small model for fast, lightweight tasks. Supports vision and function calling.
GPT-4o
GPT: Multimodal flagship model with vision, audio, and text capabilities. Faster and cheaper than GPT-4 Turbo.
Llama 3 Lumimaid 8B
Lumimaid: Llama 3 8B fine-tune trained on curated roleplay data. Excellent for creative and conversational use.
GPT-4 Turbo
GPT: Previous-generation flagship with 128K context. Strong general-purpose model with vision support.
Claude 3 Opus
Claude: Powerful model for highly complex tasks. Top-tier performance on reasoning, analysis, and creative writing.
Claude 3 Sonnet
Claude: Balanced model for enterprise workloads. Good trade-off between speed and intelligence.
Gemini 1.5 Pro
Gemini: Mid-generation multimodal model with 2M token context. Strong at long-document and video understanding.
Gemini 1.5 Flash
Gemini: Fast and versatile model across modalities. Optimized for high-volume, cost-sensitive tasks.
GPT-3.5 Turbo
GPT: Fast, inexpensive model for simple tasks. Good for classification, summarization, and basic chat.