# Handling Large Input Files

Sometimes the files you upload as input to a workflow are very large, for example an 800-page PDF. A single file of that size can exceed the LLM's context window, often regardless of which model you use.

A common solution is a chain of nodes that breaks the file into pieces that fit within the context window: Split Files tool -> Python node -> StackAI Project node.

<figure><img src="/files/AZoJcY2HmTDpAXONwTVI" alt=""><figcaption></figcaption></figure>

### Overview of nodes

* **Split Files tool** splits text content from files into smaller pieces using different strategies: by character chunks, by pages, or by files.
* **Python node** lets you write and execute custom Python code as part of your workflow. Alternatively, you can use the Code Node. See [Python Code](/workflow-builder/utils-logic-and-others/logic/python-code.md) or [Code Node](/workflow-builder/utils-logic-and-others/logic/code-node.md).
* **StackAI Project node** allows you to run (or "call") another StackAI project from within your current workflow. See [StackAI Project Node](/workflow-builder/utils-logic-and-others/utils/stackai-project-node.md).

### How this hack works

This chain of nodes accomplishes a few things:

* **Split Files tool** splits the file into "digestible" chunks (by pages, by files, or by character chunks). It outputs a JSON object.
* **Python node** converts the JSON object into a list format that an LLM can easily take as input.
* **StackAI Project node** runs a subagent that takes the output from the Python node and runs it through a pre-selected LLM. You don't have to use the StackAI Project node here: if your workflow is straightforward, you can use an LLM node directly instead.
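
To make the data flow concrete, here is a minimal sketch of the three stages in plain Python. It is not how the nodes are implemented: `split_by_characters`, `to_json_list`, `run_subagent`, and `fake_llm` are hypothetical stand-ins, and calling the model once per chunk is only one possible arrangement for the subagent.

```python
import json
from typing import Callable, List


def split_by_characters(text: str, chunk_size: int) -> dict:
    """Stand-in for the Split Files tool: fixed-size character chunks."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    return {"chunks": chunks}


def to_json_list(split_output: dict) -> str:
    """Stand-in for the Python node: unwrap 'chunks' and serialize the list."""
    return json.dumps(split_output["chunks"])


def run_subagent(chunks_json: str, llm: Callable[[str], str]) -> List[str]:
    """Stand-in for the StackAI Project node: one model call per chunk (assumption)."""
    return [llm(chunk) for chunk in json.loads(chunks_json)]


def fake_llm(prompt: str) -> str:
    """Placeholder model so the sketch runs without any external service."""
    return f"summary of {len(prompt)} chars"


split_output = split_by_characters("abcdefghij", chunk_size=4)
chunks_json = to_json_list(split_output)
answers = run_subagent(chunks_json, fake_llm)
print(answers)
```

Each stage only depends on the shape of the previous stage's output, which is why the StackAI Project node can be swapped for a plain LLM node without touching the first two steps.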

#### Output from Split Files tool

This node returns a JSON object with a `chunks` field.

```json
{
  "chunks": [
    "Chunk 1 text…",
    "Chunk 2 text…",
    "Chunk 3 text…"
  ]
}
```

#### Code for Python node

The sample code below normalizes the JSON object into a single list.

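```python
import ast
import json
import typing


def extract_chunks(data: typing.Any) -> typing.List[typing.Any]:
    """
    Return the inner list from {'chunks': [...]}.
    Accepts:
      - dict: {'chunks': [...]}
      - str: JSON or Python-literal form of the above
      - list: returns as-is
    Raises KeyError if a dict has no 'chunks' key, and TypeError for
    any other input that cannot be unwrapped.
    """
    if isinstance(data, list):
        return data
    if isinstance(data, dict):
        return data["chunks"]  # raises KeyError if missing

    if isinstance(data, str):
        s = data.strip()
        # Try JSON first, then fall back to a Python literal
        # (e.g. a dict printed with single quotes).
        try:
            obj = json.loads(s)
        except json.JSONDecodeError:
            obj = ast.literal_eval(s)  # raises ValueError/SyntaxError if invalid

        if isinstance(obj, list):
            return obj
        if isinstance(obj, dict) and "chunks" in obj:
            return obj["chunks"]

    # Fail loudly instead of silently returning None.
    raise TypeError(f"Cannot extract chunks from {type(data).__name__}")

# Example: extract_chunks({'chunks': ['a', 'b']}) returns ['a', 'b'].
# `action_0` is assumed to hold the upstream Split Files output.
result = extract_chunks(action_0)
# Top-level `return` assumes the node runs this code as a function body.
return json.dumps(result)    # e.g. '["a", "b"]'
```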

#### Output from Python node

The Python node returns a JSON array string that is easy to feed into an LLM node or subagent:

```json
[
  "Chunk 1 text…",
  "Chunk 2 text…",
  "Chunk 3 text…"
]
```
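
If the full list is still too large for a single model call, one option is to group consecutive chunks into batches that each stay under a size budget. The sketch below is an illustration only: `batch_chunks` and its `max_chars` parameter are hypothetical names, and the budget is measured in characters rather than tokens to keep the example simple.

```python
import json
from typing import List


def batch_chunks(chunks_json: str, max_chars: int) -> List[List[str]]:
    """Group consecutive chunks so each batch holds at most max_chars characters.

    A single chunk larger than max_chars still gets a batch of its own.
    """
    batches: List[List[str]] = []
    current: List[str] = []
    used = 0
    for chunk in json.loads(chunks_json):
        # Close the current batch if adding this chunk would exceed the budget.
        if current and used + len(chunk) > max_chars:
            batches.append(current)
            current, used = [], 0
        current.append(chunk)
        used += len(chunk)
    if current:
        batches.append(current)
    return batches


batches = batch_chunks('["aaaa", "bbbb", "cc"]', max_chars=8)
print(batches)
```

Keeping chunks in their original order inside each batch preserves the document's reading order, which matters when the model has to reason across neighbouring chunks.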

### Advanced technique

Set this chain of nodes up as a fallback path so small files go straight to the LLM.

Recommended setup:

1. Start with your primary **LLM node**.
2. Turn on a fallback path using either:
   * the node-level “On Error” fallback branch (good when the failure mode is “context exceeded”), or
   * an explicit router like [If/Else Node](/workflow-builder/utils-logic-and-others/logic/if-else-node.md) (good when you can predict size).
3. In the fallback path, run:
   * Split Files tool → Python node → StackAI Project node (or another LLM node).

{% hint style="info" %}
If you’re using “On Error”, pair it with [Fallback & Error Handling](/guides-and-tips/stackai-hacks/handling-errors-and-fallback.md) settings like **Retry on Failure** and **LLM Fallback Mode**.
{% endhint %}
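
When you route explicitly by size, the condition can be a rough token estimate compared against the model's context window. The following is a minimal sketch under stated assumptions: `choose_path`, `reserve_tokens`, and the four-characters-per-token ratio are illustrative, not StackAI settings, and real token counts depend on the model's tokenizer.

```python
CHARS_PER_TOKEN = 4  # assumption: coarse heuristic, varies by model and language


def choose_path(text: str, context_window_tokens: int, reserve_tokens: int = 1000) -> str:
    """Return "direct" if the text likely fits in one call, otherwise "split".

    reserve_tokens leaves room for the prompt and the model's answer.
    """
    if reserve_tokens >= context_window_tokens:
        raise ValueError("reserve_tokens must be smaller than the context window")
    budget = context_window_tokens - reserve_tokens
    estimated_tokens = len(text) / CHARS_PER_TOKEN
    return "direct" if estimated_tokens <= budget else "split"


small = choose_path("x" * 400, context_window_tokens=1000, reserve_tokens=100)
large = choose_path("x" * 4000, context_window_tokens=1000, reserve_tokens=100)
print(small, large)
```

Because the estimate is approximate, a conservative `reserve_tokens` value reduces the chance that a borderline file is sent down the direct path and fails anyway.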

### Tips & best practices

**1. Test early with small inputs and pin nodes**\
Before running large documents, validate the workflow using small files or by pinning nodes. This makes debugging easier and helps catch parsing or context-limit issues early.

**2. Choose the right LLM for the job**\
LLMs differ in context window size and how well they reason over long lists of chunks. Select the model based on total token volume and whether cross-chunk synthesis is required.

**3. Optimize paths by file type**\
Spreadsheets and documents behave differently. Spreadsheets often benefit from row- or sheet-based processing, while Word/PDF files work best with page- or chunk-based splitting. In mixed workflows, consider branching early by file type.
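
As a rough illustration of branching early by file type, the sketch below maps a filename's extension to a branch label. The labels, the extension table, and the `pick_branch` helper are all hypothetical; adapt them to the file types your workflow actually receives.

```python
from pathlib import Path

# Assumption: illustrative extension-to-branch table, not an exhaustive list.
BRANCH_BY_EXTENSION = {
    ".csv": "spreadsheet",
    ".xlsx": "spreadsheet",
    ".pdf": "document",
    ".docx": "document",
}


def pick_branch(filename: str, default: str = "other") -> str:
    """Pick a workflow branch from the file extension (case-insensitive)."""
    return BRANCH_BY_EXTENSION.get(Path(filename).suffix.lower(), default)


print(pick_branch("Report.PDF"), pick_branch("data.xlsx"), pick_branch("notes"))
```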

