# Using Multiple LLMs

When a project uses multiple LLMs, the following points help ensure they work well together.

***

### Clear Input/Output Flow

* **Explicit Connections:** Each LLM node should have clearly defined input and output connections. Use Input nodes (`in-0`, `in-1`, etc.) to gather user data, and connect them to the relevant LLM nodes.
* **Output Handling:** Route the output of each LLM node to Output nodes or downstream processing nodes (like Template or Python nodes) for further formatting or logic.
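
As a minimal sketch of what explicit connections mean in practice, the check below models a workflow as a list of edges and flags LLM nodes that lack an incoming or outgoing connection. The edge-list format, the `out-0` node ID, and the `find_unconnected_llms` helper are hypothetical illustrations, not the platform's project format; only the `in-0` and `llm-0` ID style comes from the text above.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]  # (source node ID, target node ID)


def find_unconnected_llms(nodes: List[str], edges: List[Edge]) -> Dict[str, List[str]]:
    """Return LLM nodes that are missing an input or an output connection."""
    has_input = {target for _, target in edges}
    has_output = {source for source, _ in edges}
    problems: Dict[str, List[str]] = {}
    for node in nodes:
        if not node.startswith("llm-"):
            continue  # only LLM nodes are checked in this sketch
        missing = []
        if node not in has_input:
            missing.append("input")
        if node not in has_output:
            missing.append("output")
        if missing:
            problems[node] = missing
    return problems


# Hypothetical workflow: llm-1 receives input, but its output goes nowhere.
nodes = ["in-0", "llm-0", "llm-1", "out-0"]
edges = [("in-0", "llm-0"), ("llm-0", "out-0"), ("in-0", "llm-1")]
print(find_unconnected_llms(nodes, edges))
```

Here `llm-1` is reported because nothing consumes its output; adding an edge from `llm-1` to an Output or downstream node clears the report.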

### Sequential vs. Parallel LLMs

* **Sequential Orchestration:** If the output of one LLM is needed as input for another, connect them in sequence (e.g., `llm-0` → `llm-1`). This is useful for multi-step reasoning or refinement. Having upstream LLMs return structured output makes their results easier for downstream LLMs to consume.
* **Parallel Orchestration:** If you want to compare or aggregate results from multiple LLMs, connect the same input to several LLM nodes in parallel, then merge their outputs downstream using the Combine Node or an additional LLM that summarizes and logically merges them.
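
The two orchestration styles can be sketched in plain Python. This is an illustration of the data flow only, assuming each LLM is a function from text to text; `run_sequential`, `run_parallel`, `draft`, and `refine` are hypothetical names, and the stand-in functions make no API calls.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

LLM = Callable[[str], str]  # assumption: a model is a text-in, text-out function


def run_sequential(steps: List[LLM], user_input: str) -> str:
    """Feed each model's output into the next one."""
    text = user_input
    for step in steps:
        text = step(text)
    return text


def run_parallel(models: List[LLM], user_input: str,
                 merge: Callable[[List[str]], str]) -> str:
    """Send the same input to every model, then merge the answers."""
    with ThreadPoolExecutor(max_workers=max(1, len(models))) as pool:
        # pool.map returns results in the same order as `models`
        answers = list(pool.map(lambda model: model(user_input), models))
    return merge(answers)


# Stand-in "models": plain functions so the sketch runs without any provider.
def draft(text: str) -> str:
    return f"draft({text})"


def refine(text: str) -> str:
    return f"refine({text})"


print(run_sequential([draft, refine], "q"))
print(run_parallel([draft, refine], "q", merge=" | ".join))
```

In the sequential call `refine` sees the output of `draft`; in the parallel call both see the original input, and the `merge` function plays the role of the combining step.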

### Memory and State

* **Sliding Window Memory:** Use the memory feature in LLM nodes to maintain context across turns or steps, especially in multi-turn workflows.
* **Stateful Processing:** If you need to track or update state, consider using Python nodes between LLMs to manipulate or store intermediate results.
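
To illustrate the idea behind sliding window memory, the sketch below keeps only the most recent turns and drops older ones. The `SlidingWindowMemory` class is a hypothetical stand-in written for this example; it does not describe how the LLM node's memory feature is implemented.

```python
from collections import deque
from typing import Deque, Tuple


class SlidingWindowMemory:
    """Keep only the most recent `window` conversation turns."""

    def __init__(self, window: int) -> None:
        self.turns: Deque[Tuple[str, str]] = deque(maxlen=window)

    def add(self, role: str, text: str) -> None:
        # When the deque is full, appending discards the oldest turn.
        self.turns.append((role, text))

    def as_context(self) -> str:
        """Render the remembered turns as text to prepend to the next prompt."""
        return "\n".join(f"{role}: {text}" for role, text in self.turns)


memory = SlidingWindowMemory(window=2)
memory.add("user", "Hi")
memory.add("assistant", "Hello")
memory.add("user", "Summarize our chat")
print(memory.as_context())
```

With a window of two, the first user turn has already been dropped by the time the third turn is added, which is the trade-off a sliding window makes: bounded context size at the cost of forgetting older turns.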

### Error Handling and Fallbacks

* **On Failure Branches:** Configure `on_failure_branch` and retry settings for each LLM node to handle errors gracefully.
* **Fallback LLMs:** Use the fallback options to specify alternative models/providers if the primary LLM fails.
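
The retry-then-fallback behavior described above can be sketched as follows. This is a plain-Python illustration, assuming each model is a function that may raise; `call_with_fallback`, `primary`, and `backup` are hypothetical names, and the code does not use the node's actual `on_failure_branch` or retry settings.

```python
from typing import Callable, List

LLM = Callable[[str], str]  # assumption: a model is a text-in, text-out function


def call_with_fallback(models: List[LLM], prompt: str, retries: int = 1) -> str:
    """Try each model in order, retrying each `retries` extra times before moving on."""
    errors = []
    for model in models:
        for attempt in range(retries + 1):
            try:
                return model(prompt)
            except Exception as exc:  # broad on purpose: any failure triggers the next attempt
                errors.append(f"{model.__name__} attempt {attempt + 1}: {exc}")
    # Every model failed: this is where a failure branch would take over.
    raise RuntimeError("all models failed: " + "; ".join(errors))


def primary(prompt: str) -> str:
    raise TimeoutError("provider unavailable")  # simulated outage


def backup(prompt: str) -> str:
    return f"backup answer to {prompt!r}"


print(call_with_fallback([primary, backup], "ping"))
```

The primary model is tried twice (one attempt plus one retry) before the backup answers. If the backup also failed, the raised error would carry the full list of attempts, which helps when tracing why a failure branch fired.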

### Data Formatting and Validation

* **Template Nodes:** Use Template nodes to format or merge outputs from multiple LLMs before presenting to the user.
* **Output Validation:** If LLMs are expected to return structured data (e.g., JSON), use the `json_schema` parameter to enforce output format and validate results.
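
Even with a schema enforced on the LLM node, a downstream check can catch malformed output before it reaches the user. The sketch below is a hand-rolled check using only the standard library, not a JSON Schema validator and not the `json_schema` parameter itself; the `EXPECTED_FIELDS` shape and `validate_llm_json` helper are hypothetical.

```python
import json
from typing import Any, Dict, List

# Hypothetical expected shape: field name -> required Python type.
EXPECTED_FIELDS: Dict[str, type] = {"sentiment": str, "score": float}


def validate_llm_json(raw: str, expected: Dict[str, type] = EXPECTED_FIELDS) -> List[str]:
    """Return a list of problems; an empty list means the output is usable."""
    try:
        data: Any = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value is not an object"]
    problems = []
    for field, expected_type in expected.items():
        if field not in data:
            problems.append(f"missing field: {field}")
        elif not isinstance(data[field], expected_type):
            problems.append(f"wrong type for field: {field}")
    return problems


print(validate_llm_json('{"sentiment": "positive", "score": 0.9}'))
print(validate_llm_json('{"sentiment": "positive"}'))
```

The first call returns an empty list and the second reports the missing `score` field. Returning a list of problems rather than raising makes it easy to route invalid output to a retry or a failure branch.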

### Chaining with Other Nodes

* **Integration with Actions:** LLM outputs can be passed to Action nodes (e.g., sending emails, updating databases) for real-world effects.
* **Custom Logic:** Insert Python nodes between LLMs for custom logic, filtering, or aggregation.
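
As one example of custom logic between LLMs, the function below cleans a list of candidate answers before they are passed downstream. It is a sketch only: `clean_candidates` and its `max_chars` limit are hypothetical, and how a Python node receives its inputs depends on the platform and is not shown here.

```python
from typing import List


def clean_candidates(outputs: List[str], max_chars: int = 200) -> List[str]:
    """Drop empty and duplicate outputs, truncate long ones, and preserve order."""
    seen = set()
    cleaned = []
    for text in outputs:
        normalized = text.strip()
        key = normalized.lower()  # case-insensitive duplicate detection
        if not normalized or key in seen:
            continue
        seen.add(key)
        cleaned.append(normalized[:max_chars])
    return cleaned


print(clean_candidates(["Paris", "  paris ", "", "Lyon"]))
```

Filtering like this keeps the downstream prompt short and avoids asking a later LLM to reconcile answers that differ only in whitespace or casing.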

### Citations and Traceability

* **Citations:** Enable citations in LLM nodes if you want to track sources or provide references in the output.
* **Auditability:** Use Output nodes and logs to trace the flow of data and decisions across multiple LLMs.

### Performance and Latency

* **Parallelization:** Where possible, run LLMs in parallel to reduce overall latency.
* **Token and Cost Management:** Set `max_tokens` to cap response length, which limits both cost and latency, and tune the temperature setting to balance consistency against variety in responses.
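
The latency benefit of parallelization comes down to simple arithmetic: chained calls add up, while independent calls overlap. The sketch below makes that concrete under the simplifying assumption that orchestration overhead is negligible; the latency figures are made up for illustration.

```python
from typing import List


def sequential_latency(latencies_s: List[float]) -> float:
    """Chained calls wait for each other, so their latencies add up."""
    return sum(latencies_s)


def parallel_latency(latencies_s: List[float]) -> float:
    """Independent calls overlap, so the slowest one sets the total."""
    return max(latencies_s, default=0.0)


latencies = [2.0, 3.5, 1.5]  # made-up per-call latencies in seconds
print(sequential_latency(latencies))  # 7.0
print(parallel_latency(latencies))    # 3.5
```

Parallelization only applies to calls that do not depend on each other's output; a refinement chain still pays the sequential cost.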

***

**Summary Table:**

| Aspect                 | Best Practice                                           |
| ---------------------- | ------------------------------------------------------- |
| Input/Output Flow      | Use explicit node connections and references            |
| Orchestration Style    | Choose sequential or parallel based on use case         |
| Prompt Engineering     | Customize prompts and use context passing               |
| Memory/State           | Use memory features and Python nodes for stateful logic |
| Error Handling         | Configure retries, fallbacks, and failure branches      |
| Data Formatting        | Use Template nodes and output validation                |
| Chaining/Integration   | Connect to Action nodes and use Python for custom logic |
| Citations/Traceability | Enable citations and use Output nodes for auditability  |
| Performance            | Parallelize where possible, manage tokens and latency   |


