# Handling Large Input Files

Sometimes the files you upload as input to a workflow are very large, e.g. an 800-page PDF. In that case, a single input file exceeds the LLM's context window no matter which model you use.

We often use a chain of nodes to break the file into pieces that fit within the context window: Split Files tool → Python node → StackAI Project node.

<figure><img src="https://3697023207-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FFSlso1Kjob5CLDrh0dVn%2Fuploads%2F7NKoaGagndrzHE6ar1Vj%2Fimage.png?alt=media&#x26;token=9943dbb2-b10b-491d-a911-0d9a00e0f27f" alt=""><figcaption></figcaption></figure>

### Overview of nodes

* **Split Files tool** splits text content from files into smaller pieces using different strategies: by character chunks, by pages, or by files.
* **Python node** lets you write and execute custom Python code as part of your workflow. Alternatively, you can use the Code node. See [Python Code](https://docs.stackai.com/workflow-builder/utils-logic-and-others/logic/python-code) or [Code Node](https://docs.stackai.com/workflow-builder/utils-logic-and-others/logic/code-node).
* **StackAI Project node** allows you to run (or "call") another Stack AI project from within your current workflow. See [StackAI Project Node](https://docs.stackai.com/workflow-builder/utils-logic-and-others/utils/stackai-project-node).

### How this hack works

This chain of nodes accomplishes a few things:

* **Split Files tool** splits the file into "digestible" chunks (by pages, files, or chunks/characters). It outputs a JSON object.
* **Python node** converts the JSON object into a list format that an LLM could easily take as input.
* **StackAI Project node** runs a subagent that takes the output from the Python node and runs it through a pre-selected LLM. You don't have to use the StackAI Project node here; if your workflow is straightforward, you can use an LLM node directly.

#### Output from Split Files tool

This node returns a JSON object with a `chunks` field.

```json
{
  "chunks": [
    "Chunk 1 text…",
    "Chunk 2 text…",
    "Chunk 3 text…"
  ]
}
```

#### Code for Python node

Using the sample code below, you can normalize the JSON object into a single flat list.

```python
import ast
import json
import typing


def extract_chunks(data: typing.Any) -> typing.List[typing.Any]:
    """
    Return the inner list from {'chunks': [...]}.
    Accepts:
      - dict: {'chunks': [...]}
      - str: JSON or Python-literal form of the above
      - list: returned as-is
    """
    if isinstance(data, list):
        return data
    if isinstance(data, dict):
        return data["chunks"]  # raises KeyError if missing

    if isinstance(data, str):
        s = data.strip()
        # Try JSON first, then fall back to a Python literal
        try:
            obj = json.loads(s)
        except json.JSONDecodeError:
            obj = ast.literal_eval(s)

        if isinstance(obj, list):
            return obj
        if isinstance(obj, dict) and "chunks" in obj:
            return obj["chunks"]

    raise TypeError(f"Cannot extract chunks from input of type {type(data).__name__}")


# result = extract_chunks({'chunks': ['a', 'b']})  # for local testing
result = extract_chunks(action_0)  # action_0 holds the upstream Split Files output
return json.dumps(result)  # e.g. '["a", "b"]'
```
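To verify the normalization before wiring it into the workflow, you can exercise the same logic as a standalone script (no `action_0` or top-level `return`, which are Python node conventions). The sample inputs here are made up for illustration:

```python
import ast
import json


def extract_chunks(data):
    """Normalize Split Files output (dict, list, or string) to a plain list."""
    if isinstance(data, list):
        return data
    if isinstance(data, dict):
        return data["chunks"]
    if isinstance(data, str):
        try:
            obj = json.loads(data.strip())
        except json.JSONDecodeError:
            obj = ast.literal_eval(data.strip())
        return obj if isinstance(obj, list) else obj["chunks"]
    raise TypeError(f"Unsupported input type: {type(data).__name__}")


# The three input shapes the node may receive:
print(extract_chunks({"chunks": ["a", "b"]}))    # ['a', 'b']
print(extract_chunks('{"chunks": ["a", "b"]}'))  # ['a', 'b'] (JSON string)
print(extract_chunks("{'chunks': ['a', 'b']}"))  # ['a', 'b'] (Python literal)
```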

#### Output from Python node

The node returns a JSON array string that is easy to feed into an LLM node or sub-agent:

```json
[
  "Chunk 1 text…",
  "Chunk 2 text…",
  "Chunk 3 text…"
]
```

### Advanced technique

Set this chain of nodes up as a fallback path so small files go straight to the LLM.

Recommended setup:

1. Start with your primary **LLM node**.
2. Turn on a fallback path using either:
   * the node-level “On Error” fallback branch (good when the failure mode is “context exceeded”), or
   * an explicit router like [If/Else Node](https://docs.stackai.com/workflow-builder/utils-logic-and-others/logic/if-else-node) (good when you can predict size).
3. In the fallback path, run:
   * Split Files tool → Python node → StackAI Project node (or another LLM node).

{% hint style="info" %}
If you’re using “On Error”, pair it with [Fallback & Error Handling](https://docs.stackai.com/guides-and-tips/stackai-hacks/handling-errors-and-fallback) settings like **Retry on Failure** and **LLM Fallback Mode**.
{% endhint %}
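If you route with an If/Else condition instead of On Error, a rough size check works well. This is a minimal sketch assuming the common ~4-characters-per-token heuristic; the context budget and function names are illustrative, not part of the Stack AI API:

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate: ~4 characters per token for English text."""
    return len(text) // 4


def needs_split(text: str, context_tokens: int = 128_000, reserve: float = 0.25) -> bool:
    """Route to the split path when the input would fill too much of the
    context window, keeping `reserve` headroom for the prompt and output."""
    budget = int(context_tokens * (1 - reserve))
    return estimate_tokens(text) > budget


# A small input goes straight to the LLM; a huge one takes the split path.
print(needs_split("short document"))  # False
print(needs_split("x" * 1_000_000))   # True
```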

### Tips & best practices

**1. Test early with small inputs and pin nodes**\
Before running large documents, validate the workflow using small files or by pinning nodes. This makes debugging easier and helps catch parsing or context-limit issues early.

**2. Choose the right LLM for the job**\
LLMs differ in context window size and how well they reason over long lists of chunks. Select the model based on total token volume and whether cross-chunk synthesis is required.

**3. Optimize paths by file type**\
Spreadsheets and documents behave differently. Spreadsheets often benefit from row- or sheet-based processing, while Word/PDF files work best with page- or chunk-based splitting. In mixed workflows, consider branching early by file type.
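A simple way to branch early is by file extension. In this sketch the strategy names echo the Split Files options (pages, files, chunks), but the mapping itself is an assumption you should tune per workflow:

```python
from pathlib import Path

# Assumed mapping from extension to splitting strategy; adjust to your data.
SPLIT_STRATEGY = {
    ".xlsx": "sheets",  # spreadsheets: split by sheet or row range
    ".csv": "rows",
    ".pdf": "pages",    # documents: split by page
    ".docx": "pages",
    ".txt": "chunks",   # plain text: split by character chunks
}


def pick_strategy(filename: str) -> str:
    """Choose a splitting strategy from the file extension, defaulting to chunks."""
    return SPLIT_STRATEGY.get(Path(filename).suffix.lower(), "chunks")


print(pick_strategy("report.PDF"))   # pages
print(pick_strategy("budget.xlsx"))  # sheets
print(pick_strategy("notes.md"))     # chunks
```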
