> For the complete documentation index, see [llms.txt](https://docs.stackai.com/llms.txt). Markdown versions of documentation pages are available by appending `.md` to page URLs; this page is available as [Markdown](https://docs.stackai.com/workflow-builder/apps/vlm.md).

# Vlm

The **Vlm Node** represents an integration with a Visual Language Model (VLM) provider. This type of node is typically used for advanced AI tasks that involve both visual and language understanding, such as analyzing images, generating text from images, creating images from text, or performing multimodal reasoning.

#### What the Vlm Node Is Best Used For

* **Image Analysis:** Extracting information, objects, or text from images.
* **Image Generation:** Creating new images based on text prompts or modifying existing images.
* **Document Understanding:** Reading and summarizing visual documents (e.g., PDFs, scanned pages).
* **Multimodal Tasks:** Combining text and image inputs for richer AI outputs (e.g., answering questions about a chart or diagram).
* **File Management:** Uploading, downloading, and managing files for use in AI workflows.

### Establishing a Connection

The Vlm Node requires establishing a new connection using an API key before use.&#x20;

#### Available Actions (Exhaustive List)

Here are all the actions available for the Vlm provider:

1. **Get Files** – Get a list of files.
2. **Upload File** – Upload a file.
3. **Get File (by ID)** – Get a file by its ID.
4. **Get File (by Hash)** – Get a file by its hash.
5. **Generate Presigned URL** – Generate a presigned URL for file upload.
6. **Verify File Upload** – Verify a file upload.
7. **Check Health** – Check the health of the OpenAI VLM service.
8. **chat\_completions\_v1\_openai\_chat\_completions\_post** – Generate chat completions (multimodal).
9. **models\_v1\_openai\_models\_get** – List available models.
10. **model\_info\_v1\_openai\_models\_\_model\_\_get** – Get information about a specific model.
11. **info\_v1\_hub\_info\_get** – Get hub information.
12. **list\_domains\_v1\_hub\_domains\_get** – List available domains.
13. **get\_domain\_schema\_v1\_hub\_schema\_post** – Get the schema for a domain.
14. **image\_generate\_v1\_image\_generate\_post** – Generate an image from a prompt.
15. **schema\_generate\_image\_v1\_image\_schema\_post** – Get the schema for image generation.
16. **document\_generate\_v1\_document\_generate\_post** – Generate a document from a prompt.
17. **document\_execute\_v1\_document\_execute\_post** – Execute a document generation task.
18. **schema\_generate\_document\_v1\_document\_schema\_post** – Get the schema for document generation.
19. **video\_generate\_v1\_video\_generate\_post** – Generate a video from a prompt.
20. **audio\_generate\_v1\_audio\_generate\_post** – Generate audio from a prompt.
21. **agent\_execute\_v1\_agent\_execute\_post** – Execute an agent task (multimodal agent).
22. **get\_predictions\_v1\_predictions\_get** – List predictions.
23. **get\_prediction\_v1\_predictions\_\_id\_\_get** – Get a specific prediction.
24. **get\_prediction\_domain\_v1\_predictions\_\_id\_\_domain\_get** – Get the domain of a prediction.
25. **health\_v1\_health\_get** – General health check.
26. **get\_models\_v1\_models\_get** – List all models.
27. **get\_domains\_v1\_domains\_get** – List all domains.
28. **get\_schema\_v1\_schema\_post** – Get a schema for a task.

> **Note:** Each action is designed for a specific type of multimodal or file-related task. Some are for file management, others for generating or analyzing content, and some for managing or querying models and domains.


---

# Agent Instructions
This documentation is published with GitBook. GitBook is the documentation platform designed so that both humans and AI agents can read, navigate, and reason over technical content effectively. Learn more at gitbook.com.

## Querying This Documentation
If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter, and the optional `goal` query parameter:

```
GET https://docs.stackai.com/workflow-builder/apps/vlm.md?ask=<question>&goal=<endgoal>
```

`ask` is the immediate question: it should be specific, self-contained, and written in natural language.
`goal` is optional and describes the broader end goal you are ultimately trying to accomplish on behalf of the user. GitBook uses it to tailor the answer towards what is most useful for that goal.

The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.