# Cerebras

### What is Cerebras?

Cerebras AI is a high-performance inference provider built on wafer-scale engine technology - the world's largest chip - purpose-built to accelerate large language model inference. Unlike GPU-based providers, Cerebras hardware processes entire models on a single wafer, eliminating memory bandwidth bottlenecks and delivering significantly faster token generation speeds. Stack AI integrates natively with Cerebras, letting you connect your Cerebras API key and use Llama-family models directly inside your workflows through both the LLM node and the Cerebras action node.

***

### How to use it?

Add an **LLM** node or a **Cerebras** action node to your workflow, select Cerebras as the provider, choose a model, configure your generation parameters, and run. Stack AI handles authentication, request routing, and output parsing automatically.

<figure><img src="/files/EctGMj9M6q3xG3EJbuGe" alt=""><figcaption></figcaption></figure>

***

### Benefits

#### Speed and Throughput

1. **Wafer-scale inference**: Cerebras processes models on a single chip, eliminating inter-chip communication overhead that slows GPU clusters.
2. **Low latency for real-time applications**: High token-per-second throughput makes Cerebras well-suited for chatbots, live document generation, and interactive pipelines.
3. **Consistent performance under load**: The hardware architecture maintains throughput without the variability common in shared GPU infrastructure.

#### Model Selection

1. **Llama 4 Scout 17B**: A compact, instruction-tuned model optimized for speed-critical tasks where low latency matters most.
2. **Llama 3.3 70B**: A high-capability model balancing quality and speed for complex reasoning, summarization, and generation tasks.
3. **Llama 3.1 8B**: The lightest option for high-volume, cost-sensitive pipelines with straightforward generation requirements.

#### Ease of Use

1. **Direct API key authentication**: Connect with a single API key - no OAuth flow or additional setup required.
2. **Works in LLM node and action node**: Use the standard LLM node for quick setup or the dedicated Cerebras action node when you need granular parameter control and structured outputs.
3. **Managed connection available**: Stack AI provides an organization-level managed connection, so your team can use Cerebras without each member supplying their own key.

***

### How It Works

#### Authentication Flow

* Stack AI stores your Cerebras API key as an encrypted credential in your organization's connection vault.
* At workflow runtime, Stack AI retrieves the credential and attaches it as a Bearer token on each request to `https://api.cerebras.ai/v1`.
* The connection health check validates your key by listing available models before any workflow execution.

#### Request Execution

* When a Cerebras node fires, Stack AI constructs a chat message from your `prompt` input, applies your generation parameters (`temperature`, `max_tokens`, `top_p`, `frequency_penalty`, `presence_penalty`), and sends the request to the Cerebras API.
* The response is parsed and surfaced as structured outputs: generated text, finish reason, model identifier, and token usage counts.
* Finish reasons are normalized across providers: `stop` (natural completion), `length` (hit `max_tokens`), `tool_calls`, or `error`.

#### LLM Node vs. Action Node

* The **LLM node** uses a shared provider interface — select Cerebras from the provider dropdown, pick a model, and the node handles the rest. Best for standard conversational or generation patterns.
* The **Cerebras action node** exposes all generation parameters explicitly and outputs structured fields including token counts. Best when your downstream nodes need precise control or usage data.

***

### Setting Up a Connection

#### Step 1: Get your Cerebras API key

<figure><img src="/files/8t4F3wrHtO5xqJNQdODC" alt=""><figcaption></figcaption></figure>

Navigate to [cloud.cerebras.ai](https://cloud.cerebras.ai/), sign in to your account, and go to the **API Keys** section. Create a new key and copy it — you will not be able to view it again after leaving the page.

#### Step 2: Open the Connections page in Stack AI

In your Stack AI workspace, navigate to **Settings > Connections**. Click **New Connection** and search for or select **Cerebras** from the provider list.

#### Step 3: Enter your API key and save

<figure><img src="/files/NyNvzZLXpFH0cj6sIZrT" alt=""><figcaption></figcaption></figure>

Paste your API key into the **API Key** field. Click **Test** to verify the connection - Stack AI will call the Cerebras API to confirm your key is valid. Once the test passes, click **Save**. Your connection is now available to all workflows in your organization.

#### Step 4: Verify the connection

After saving, the connection appears in your **Connections** list. The health-check indicator confirms Stack AI can reach the Cerebras API with your key.

***

### Available Models

| Model ID                         | Display Name      | Best For                                                                     |
| -------------------------------- | ----------------- | ---------------------------------------------------------------------------- |
| `llama-4-scout-17b-16e-instruct` | Llama 4 Scout 17B | Low-latency tasks, real-time applications, high-throughput pipelines         |
| `llama3.3-70b`                   | Llama 3.3 70B     | Complex reasoning, summarization, structured generation, default general use |
| `llama3.1-8b`                    | Llama 3.1 8B      | High-volume, cost-sensitive pipelines with simple generation requirements    |

***

### Using Cerebras in a Workflow

#### Option A - LLM Node

1. Add an **LLM** node to your workflow canvas.
2. In the node configuration panel, set **Provider** to **Cerebras**.
3. Select a model from the **Model** dropdown.
4. Connect your prompt input and configure any standard LLM node parameters.
5. Wire the node output to downstream nodes and run.

#### Option B - Cerebras Action Node

1. Add a **Cerebras** action node from the integrations panel — search for "Cerebras" in the node sidebar.
2. Select **Text Completion** as the action.
3. Connect or configure the input parameters described below.
4. Map the output fields to downstream nodes.

**Input Parameters**

| Parameter           | Type   | Required | Default                          | Range      | Description                                                             |
| ------------------- | ------ | -------- | -------------------------------- | ---------- | ----------------------------------------------------------------------- |
| `model`             | Select | Yes      | `llama-4-scout-17b-16e-instruct` | -          | The model to use for generation                                         |
| `prompt`            | String | Yes      | `1.0`                            | 0.0 – 2.0  | Controls output randomness; higher values produce more varied responses |
| `temperature`       | Number | No       | `1000`                           | 0.0 – 2.0  | Maximum number of tokens to generate                                    |
| `top_p`             | Number | No       | `1.0`                            | 0.0 – 1.0  | Lower values restrict to higher-probability tokens                      |
| `frequency_penalty` | Number | No       | `0.0`                            | -2.0 – 2.0 | Reduces repetition of tokens that appear frequently in the output       |
| `presence_penalty`  | Number | No       | `0.0`                            | -2.0 – 2.0 | Reduces repetition of any token that has already appeared in the output |

**Output Parameters**

| Parameter                 | Type   | Description                                                        |
| ------------------------- | ------ | ------------------------------------------------------------------ |
| `content`                 | String | The generated text response                                        |
| `finish_reason`           | String | Why generation stopped: `stop`, `length`, `error`, or `tool_calls` |
| `model_used`              | String | The model identifier used for the request                          |
| `usage_total_tokens`      | Number | Total tokens consumed (prompt + completion)                        |
| `usage_prompt_tokens`     | Number | Tokens used by the input prompt                                    |
| `usage_completion_tokens` | Number | Tokens used by the generated response                              |

***

### Best Practices

* **Use `llama3.3-70b` as your default starting point.** It is the default model in the LLM node and offers the best balance of capability and speed for most use cases.
* **Lower `temperature` for factual or structured outputs.** Values between `0.0` and `0.4` produce more deterministic responses suited to extraction, classification, and data transformation tasks.
* **Set `max_tokens` explicitly for pipeline reliability.** An unbounded token count can cause unexpected `length` finish reasons downstream; set a value appropriate for your expected output length.
* **Use `frequency_penalty` and `presence_penalty` together to reduce repetition.** For long-form generation, values between `0.3` and `0.8` on both fields help maintain output variety without degrading coherence.
* **Use the action node when token usage matters.** The `usage_total_tokens`, `usage_prompt_tokens`, and `usage_completion_tokens` outputs let you track costs and enforce budget limits in your workflow logic.
* **Prefer the managed connection for team deployments.** Using the organization-level Stack AI managed connection avoids credential sprawl and ensures your team shares a single auditable connection.

***

### Summary

Cerebras brings wafer-scale inference speed to Stack AI workflows, making it a strong choice for latency-sensitive applications and high-throughput pipelines. Connect once with your API key, then access Llama 4 Scout 17B, Llama 3.3 70B, or Llama 3.1 8B through either the LLM node for quick setup or the Cerebras action node for full parameter control and structured output.


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.stackai.com/workflow-builder/apps/cerebras.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
