VLM Run Orion

Introducing Orion – the first visual agent that sees, reasons and acts.

Orion is the first visual agent that can analyze images, videos, and documents – and act on them with precision, through a unified chat-completions API.

Try Orion

Orion Whitepaper

Loved by leading AI companies

Interact with images, videos, and documents through a single API.

Meet the agent that sees, reasons and acts

Frontier models like GPT, Claude, and Gemini can describe what they see – but they can’t act on it. Orion unites the reasoning power of large Vision-Language Models with the accuracy of specialized computer-vision tools – all through one unified API.

One interface for all your visual AI needs.

Images, documents, and videos – all through a single chat-completions interface.

Compose through conversation.

Chain visual operations such as detect → crop → enhance → analyze in a single conversation.
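One way to drive such a chain is as a multi-turn conversation, with each operation sent as a new user turn on top of the full history. This is a minimal sketch under assumptions, not Orion's documented workflow: the endpoint and model name are taken from this page's own SDK example, while `first_turn`, `run_chain`, `orion_send`, the step prompts, the `VLMRUN_API_KEY` environment variable, and the placeholder image URL are hypothetical.

```python
import os

# Hypothetical prompts, one per step of detect -> crop -> enhance -> analyze.
STEPS = [
    "Detect the main product in this image.",
    "Crop the image to the detected product.",
    "Enhance the cropped image.",
    "Analyze the enhanced image and describe the product.",
]


def first_turn(image_url, prompt):
    """Build the opening user message: text prompt plus image reference."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }


def run_chain(send, image_url, steps=STEPS):
    """Run each step as a new turn, carrying the full history forward.

    `send` is any callable that takes the message list and returns the
    assistant's reply text, so the loop can be exercised without a network.
    """
    messages = [first_turn(image_url, steps[0])]
    replies = []
    for i, prompt in enumerate(steps):
        if i > 0:
            messages.append({"role": "user", "content": prompt})
        reply = send(messages)
        messages.append({"role": "assistant", "content": reply})
        replies.append(reply)
    return replies, messages


def orion_send(messages):
    """Send the history via the OpenAI SDK (needs `pip install openai`)."""
    from openai import OpenAI

    client = OpenAI(
        base_url="https://agent.vlm.run/v1/openai",
        api_key=os.environ["VLMRUN_API_KEY"],  # assumed variable name
    )
    result = client.chat.completions.create(
        model="vlmrun-orion-1", messages=messages
    )
    return result.choices[0].message.content


# Only call the API when a key is configured; the URL is a placeholder.
if __name__ == "__main__" and "VLMRUN_API_KEY" in os.environ:
    replies, _ = run_chain(orion_send, "https://example.com/product.jpg")
    print(replies[-1])
```

Passing `send` in as a parameter keeps the conversation logic separate from the transport, so the same loop can be tested with a stub before pointing it at the real endpoint.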

Integrate at warp speed

Drop-in replacement for the OpenAI SDK.
Same API pattern – new visual powers.

Auditable outputs

Every response comes with visual proof. Build and integrate with confidence.

One Agent – Infinite Possibilities.

At the core of Orion is a unified architecture that powers understanding, reasoning, and action across every visual modality.

Ridiculously versatile.

Whatever your visual task – Orion knows how to act. Built with dozens of specialized computer-vision and multi-modal tools.

01

Image AI

Caption & Tag

Generate rich, contextual descriptions and semantic labels for any image.

Detection

Segmentation

Pointing

Generate & Edit

UI Parsing

Image Tools

02

Document AI

Document Parsing

Parse, summarize or split documents in seconds.

Structured OCR

Redaction

03

Video AI

Caption & Tag

Generate comprehensive descriptions and metadata for video content.

Generate & Edit

Video Tools

OpenAI

from openai import OpenAI

# Point the standard OpenAI client at the Orion endpoint.
client = OpenAI(
    base_url="https://agent.vlm.run/v1/openai",
    api_key="<VLMRUN_API_KEY>",
)

# One user message carrying a text instruction and an image reference.
result = client.chat.completions.create(
    model="vlmrun-orion-1",
    messages=[
        {"role": "user", "content": [
            {"type": "text", "text": "Analyze the image & animate into a video."},
            {"type": "image_url", "image_url": {"url": "https://..."}},
        ]},
    ],
)

print(result.choices[0].message.content)
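The example above references the image by URL. For a local file, a common OpenAI-style convention is to inline the bytes as a base64 data URL in the same `image_url` field. The sketch below assumes Orion accepts such data URLs, which this page does not confirm; `image_part_from_bytes` and `image_part_from_path` are hypothetical helper names.

```python
import base64
import mimetypes


def image_part_from_bytes(data, filename):
    """Wrap raw image bytes as an OpenAI-style `image_url` content part."""
    # Guess the MIME type from the file extension; fall back to a generic type.
    mime, _ = mimetypes.guess_type(filename)
    mime = mime or "application/octet-stream"
    encoded = base64.b64encode(data).decode("ascii")
    url = f"data:{mime};base64,{encoded}"
    return {"type": "image_url", "image_url": {"url": url}}


def image_part_from_path(path):
    """Read a local file and wrap it as a content part."""
    with open(path, "rb") as f:
        return image_part_from_bytes(f.read(), path)


# Build the same message shape as the example above, with inline bytes.
# The payload here is just the 8-byte PNG signature, used as stand-in data.
part = image_part_from_bytes(b"\x89PNG\r\n\x1a\n", "photo.png")
message = {
    "role": "user",
    "content": [{"type": "text", "text": "Analyze the image."}, part],
}
print(part["image_url"]["url"])
```

Data URLs grow the request body by roughly a third over the raw file size, so for large videos or documents a hosted URL is likely the better fit.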

For Developers

Designed for developers.

Familiar API, unfamiliar power

All your favorite vision tools, in a single box.

Drop-in replacement for the OpenAI SDK.

Handles images, documents, and videos via URL or upload.

Streaming support for real-time responses.

Structured outputs with Pydantic / Zod support.
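As a sketch of the streaming bullet above: with the OpenAI SDK, passing `stream=True` to `chat.completions.create` returns an iterator of chunks whose `choices[0].delta.content` holds each text fragment. Whether Orion streams in exactly this shape is an assumption here; `collect_stream`, `stream_orion`, and `fake_chunk` are hypothetical helper names, and the `VLMRUN_API_KEY` environment variable is a naming assumption.

```python
import os
from types import SimpleNamespace


def collect_stream(chunks, on_text=None):
    """Join the text fragments of an OpenAI-style chat-completions stream."""
    parts = []
    for chunk in chunks:
        # Some chunks carry no choices or no text (e.g. role or final chunks).
        if not chunk.choices:
            continue
        text = chunk.choices[0].delta.content
        if text:
            parts.append(text)
            if on_text is not None:
                on_text(text)
    return "".join(parts)


def stream_orion(prompt):
    """Stream a reply from Orion (needs `pip install openai` and an API key)."""
    from openai import OpenAI

    client = OpenAI(
        base_url="https://agent.vlm.run/v1/openai",
        api_key=os.environ["VLMRUN_API_KEY"],  # assumed variable name
    )
    stream = client.chat.completions.create(
        model="vlmrun-orion-1",
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    # Print fragments as they arrive and return the full text at the end.
    return collect_stream(stream, on_text=lambda t: print(t, end="", flush=True))


def fake_chunk(text):
    """Stand-in chunk with the same attribute layout, for offline testing."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


demo = [fake_chunk(None), fake_chunk("Two cats "), fake_chunk("on a sofa.")]
print(collect_stream(demo))
```

The offline `demo` run exercises only the accumulation logic; `stream_orion` is the part that would talk to the real endpoint.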

API Docs

For Enterprises

The new visual intelligence layer for your enterprise.

Deploy Orion securely inside your VPC or private cloud – bringing visual intelligence directly to your infrastructure. Power document, image, and video understanding across teams. SOC 2 Type II and HIPAA-ready.

Frequently asked questions

FAQs

How is Orion different from GPT-5, Claude, or Gemini?

Frontier models can describe what they see, but not act on it. Orion goes beyond perception by planning, executing, and validating visual tasks. Instead of just describing an image, Orion can detect, crop, segment, generate, and analyze in sequence – reliably and deterministically.

What can I build with Orion?

Developers are already using Orion for a wide range of applications – from visual ETL pipelines that detect, extract, and structure information, to automated product and marketing asset generation, document parsing and redaction, video summarization and clipping, and even medical and geospatial visual analysis. If your workflow involves visual data, Orion can make it intelligent and interactive.

How is Orion priced?

Orion’s pricing is designed to be flexible and transparent, based on the tools you use and the volume of visual tasks you run. Each visual capability (detection, segmentation, OCR, generation, etc.) is priced per use, allowing you to scale from experimentation to production without committing to fixed model tiers. Enterprise plans include predictable monthly or on-prem options for teams that need volume pricing, VPC deployments, or compliance guarantees.

How accurate is Orion compared to traditional CV models?

Orion taps into both modern VLM architectures and traditional computer-vision models, allowing it to reason about and accurately perform visual tasks. In benchmarks across several multi-modal tasks (MMMU, MMBench, DocVQA, RefCOCO, etc.), Orion consistently outperformed leading VLMs on multi-step visual reasoning, structured OCR, and traditional CV tasks like detection, segmentation, and tracking. See our whitepaper for more details.

How do you keep data private?

Our Pro-tier offering runs entirely in our private cloud deployment. Requests made to our APIs are logged and made available in our observability dashboard. For our enterprise tier, we can enforce stricter privacy requirements (SOC 2, GDPR, HIPAA).

Can I run Orion on-prem or in-VPC?

Yes. For enterprise deployments, VLM Run offers both VPC Peering and In-VPC hosting options, ensuring data never leaves your environment. We’re SOC 2 Type II and HIPAA-ready for teams with compliance requirements.

Try Orion today.

Chat with Orion

Book a Demo

Visual Intelligence for Enterprise.

Company

Product

Developers

Integrations

Resources

by Autonomi AI Inc. All rights reserved. © 2025

Terms & Conditions
