
  ![Production AI Agents in Laravel: Streaming, Token Budgets, and Structured Output Contracts](https://cdn.msaied.com/262/4220ef4ee3875717c6384f298d891a3f.png)

  #laravel   #ai   #llm   #php  

 Production AI Agents in Laravel: Streaming, Token Budgets, and Structured Output Contracts 
============================================================================================

22 Jun 2026 · 3 min read · Mohamed Said

Table of contents

1. [The Problem With Naive AI Integration](#the-problem-with-naive-ai-integration)
2. [Streaming Responses to the Browser](#streaming-responses-to-the-browser)
3. [Token Budget Guard](#token-budget-guard)
4. [Structured Output Contracts](#structured-output-contracts)
5. [Keeping the Domain Clean](#keeping-the-domain-clean)
6. [Takeaways](#takeaways)

 The Problem With Naive AI Integration
-------------------------------------

Most Laravel + LLM tutorials stop at `Http::post('https://api.openai.com/v1/chat/completions', [...])` and call it done. That works for demos. In production you need three things the tutorials skip:

1. **Streaming** — users shouldn't stare at a spinner for 8 seconds.
2. **Token budgets** — unbounded prompts destroy your billing and latency SLAs.
3. **Structured output contracts** — raw JSON strings from an LLM are not domain objects.

Let's solve all three without a heavy third-party SDK.

---

Streaming Responses to the Browser
----------------------------------

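With `stream: true`, OpenAI returns the completion as server-sent events (SSE). Laravel's `response()->stream()` (a Symfony `StreamedResponse` under the hood) lets you read those events as they arrive and forward each text delta to the browser.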

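```php
// app/Http/Controllers/AgentController.php
use Illuminate\Http\Request;
use Illuminate\Support\Facades\Http;
use Symfony\Component\HttpFoundation\StreamedResponse;

// ... inside AgentController:
public function stream(Request $request): StreamedResponse
{
    // validated() only exists on FormRequest; validate() works on a plain Request.
    $prompt = $request->validate(['prompt' => ['required', 'string']])['prompt'];

    return response()->stream(function () use ($prompt) {
        $body = Http::withToken(config('services.openai.key'))
            ->withOptions(['stream' => true]) // Guzzle hands back the body unbuffered
            ->post('https://api.openai.com/v1/chat/completions', [
                'model'    => 'gpt-4o-mini',
                'stream'   => true,
                'messages' => [['role' => 'user', 'content' => $prompt]],
            ])->toPsrResponse()->getBody();

        $buffer = '';

        while (! $body->eof()) {
            // read() returns arbitrary byte chunks, not lines: buffer until a newline arrives.
            $buffer .= $body->read(1024);

            while (($pos = strpos($buffer, "\n")) !== false) {
                $line   = trim(substr($buffer, 0, $pos));
                $buffer = substr($buffer, $pos + 1);

                if (! str_starts_with($line, 'data: ')) {
                    continue; // blank event separators and other SSE fields
                }

                $payload = substr($line, 6);
                if ($payload === '[DONE]') {
                    return;
                }

                $delta = json_decode($payload, true)['choices'][0]['delta']['content'] ?? '';
                // A newline inside the delta must start a new "data:" line, or it ends the event early.
                echo 'data: '.preg_replace('/\r\n|\r|\n/', "\ndata: ", $delta)."\n\n";

                if (ob_get_level() > 0) {
                    ob_flush(); // only flush PHP's buffer when one is active
                }
                flush();
            }
        }
    }, 200, [
        'Content-Type'      => 'text/event-stream',
        'Cache-Control'     => 'no-cache',
        'X-Accel-Buffering' => 'no',
    ]);
}

```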

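The `X-Accel-Buffering: no` header is essential when Nginx sits in front of PHP: Nginx buffers proxied and FastCGI responses by default, so without it the browser receives tokens in bursts, or only after the stream ends.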
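`read()` hands back arbitrary byte chunks, so a single `data:` line can arrive split across two reads and has to be buffered until its newline shows up. That buffering is easier to trust once it is unit-tested, so here is a minimal, framework-free sketch of the same logic pulled into a hypothetical `SseDeltaParser` class (the class name and its `push()`/`isDone()` methods are illustrative, not part of Laravel or OpenAI). It assumes OpenAI's chat-completions chunk shape, `choices[0].delta.content`.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: turns raw SSE byte chunks into text deltas.
 * Assumes OpenAI chat-completions chunks: data: {"choices":[{"delta":{"content":"..."}}]}
 */
final class SseDeltaParser
{
    private string $buffer = '';
    private bool $done = false;

    /** Feed one raw chunk; returns every delta completed by it (possibly none). */
    public function push(string $chunk): array
    {
        $deltas = [];
        $this->buffer .= $chunk;

        // Consume only complete lines; a partial line stays buffered for the next chunk.
        while (! $this->done && ($pos = strpos($this->buffer, "\n")) !== false) {
            $line = trim(substr($this->buffer, 0, $pos));
            $this->buffer = substr($this->buffer, $pos + 1);

            if (! str_starts_with($line, 'data: ')) {
                continue; // blank event separators, comments, other SSE fields
            }

            $payload = substr($line, 6);
            if ($payload === '[DONE]') {
                $this->done = true; // ignore anything after the terminator
                break;
            }

            $delta = json_decode($payload, true)['choices'][0]['delta']['content'] ?? '';
            if ($delta !== '') {
                $deltas[] = $delta;
            }
        }

        return $deltas;
    }

    public function isDone(): bool
    {
        return $this->done;
    }
}
```

Keeping the parser free of I/O is the design point: the controller only echoes what `push()` returns, and the tricky split-chunk cases can be tested with plain strings.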

---

Token Budget Guard
------------------

Never let user input dictate prompt size. Enforce a budget before the HTTP call.

```php
// app/AI/TokenBudget.php
final class TokenBudget
{
    private const CHARS_PER_TOKEN = 4; // rough heuristic
    private const MAX_INPUT_TOKENS = 1_500;

    public static function enforce(string $text): string
    {
        $limit = self::MAX_INPUT_TOKENS * self::CHARS_PER_TOKEN;

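        if (strlen($text) <= $limit) {
            return $text;
        }

        // Over budget: truncate. mb_strcut() cuts by bytes but never splits a UTF-8 character.
        return mb_strcut($text, 0, $limit);
    }
}

```

Structured Output Contracts
---------------------------

Ask the API for JSON that must match a schema, then map it straight into a typed DTO.

```php
// app/AI/SentimentAgent.php
use Illuminate\Support\Facades\Http;

class SentimentAgent
{
    public function analyse(string $text): SentimentResult
    {
        $response = Http::withToken(config('services.openai.key'))
            ->post('https://api.openai.com/v1/chat/completions', [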
                'model'           => 'gpt-4o-mini',
                'max_tokens'      => 256,
                'response_format' => [
                    'type'        => 'json_schema',
                    'json_schema' => [
                        'name'   => 'sentiment_result',
                        'strict' => true,
                        'schema' => [
                            'type'       => 'object',
                            'properties' => [
                                'label'  => ['type' => 'string', 'enum' => ['positive','neutral','negative']],
                                'score'  => ['type' => 'number'],
                                'reason' => ['type' => 'string'],
                            ],
                            'required'            => ['label','score','reason'],
                            'additionalProperties'=> false,
                        ],
                    ],
                ],
                'messages' => [
                    ['role' => 'system', 'content' => 'Classify the sentiment of the user text. Reply only with JSON matching the schema.'],
                    ['role' => 'user',   'content' => $text],
                ],
            ])->throw()->json();

        $raw = json_decode(
            $response['choices'][0]['message']['content'],
            true,
            flags: JSON_THROW_ON_ERROR
        );

        return SentimentResult::fromArray($raw);
    }
}

```

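With `strict: true` the API constrains the model to the schema, so mismatches are rare, but you still validate on your side. Two cases still need handling: a refusal (the message carries a `refusal` string and `content` is `null`) and a completion cut off by `max_tokens` (`finish_reason` is `length`), which leaves truncated JSON that `JSON_THROW_ON_ERROR` turns into an exception.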
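`SentimentResult::fromArray()` is the natural place for that validation. A minimal sketch, assuming a PHP 8.1+ class with promoted `readonly` properties whose constructor takes `label`, `score`, `reason` in that order (the order the Pest stub passes them). The specific checks are illustrative; `score` is only type-checked because the schema places no range on it.

```php
<?php
declare(strict_types=1);

// Sketch: readonly DTO mirroring the sentiment_result JSON schema.
final class SentimentResult
{
    private const LABELS = ['positive', 'neutral', 'negative'];

    public function __construct(
        public readonly string $label,
        public readonly float $score,
        public readonly string $reason,
    ) {
        // Enforce the schema's enum even when the DTO is built by hand.
        if (! in_array($label, self::LABELS, true)) {
            throw new InvalidArgumentException("Unknown sentiment label: {$label}");
        }
    }

    /** Map decoded LLM JSON into the DTO, failing loudly on shape drift. */
    public static function fromArray(array $data): self
    {
        foreach (['label', 'score', 'reason'] as $key) {
            if (! array_key_exists($key, $data)) {
                throw new InvalidArgumentException("Missing key: {$key}");
            }
        }

        if (! is_string($data['label']) || ! is_string($data['reason'])) {
            throw new InvalidArgumentException('label and reason must be strings');
        }

        // JSON numbers decode to int or float; accept both, store as float.
        if (! is_int($data['score']) && ! is_float($data['score'])) {
            throw new InvalidArgumentException('score must be a number');
        }

        return new self($data['label'], (float) $data['score'], $data['reason']);
    }
}
```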

---

Keeping the Domain Clean
------------------------

The `SentimentAgent` returns a typed DTO. Nothing in your domain layer touches raw LLM strings. If you swap providers tomorrow, only the agent changes — every caller keeps working.
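To make "only the agent changes" explicit, callers can depend on an interface instead of the concrete `SentimentAgent`. A minimal sketch, where `SentimentClassifier`, `KeywordClassifier` and `ReviewTagger` are hypothetical names and a trivial keyword matcher stands in for "another provider"; a compact `SentimentResult` is redeclared so the snippet runs on its own.

```php
<?php
declare(strict_types=1);

// Minimal stand-in for the SentimentResult DTO so this sketch runs on its own.
final class SentimentResult
{
    public function __construct(
        public readonly string $label,
        public readonly float $score,
        public readonly string $reason,
    ) {}
}

// Hypothetical contract: callers depend on this, never on a specific provider.
interface SentimentClassifier
{
    public function analyse(string $text): SentimentResult;
}

// Hypothetical offline implementation standing in for a second provider.
final class KeywordClassifier implements SentimentClassifier
{
    public function analyse(string $text): SentimentResult
    {
        $lower = strtolower($text);

        return match (true) {
            str_contains($lower, 'great') => new SentimentResult('positive', 0.9, 'keyword: great'),
            str_contains($lower, 'awful') => new SentimentResult('negative', 0.9, 'keyword: awful'),
            default                       => new SentimentResult('neutral', 0.5, 'no keyword matched'),
        };
    }
}

// Hypothetical domain service: sees only the interface and the typed DTO.
final class ReviewTagger
{
    public function __construct(private SentimentClassifier $classifier) {}

    public function tag(string $review): string
    {
        return 'sentiment:' . $this->classifier->analyse($review)->label;
    }
}
```

In Laravel, assuming `SentimentAgent` implemented the interface, a service provider would pick the implementation with `$this->app->bind(SentimentClassifier::class, SentimentAgent::class)`, and swapping providers becomes a one-line change.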

Wrap the agent in a queued job for non-interactive workloads, and inject it via the service container so Pest can swap in a fake:

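```php
// tests/Feature/SentimentTest.php
use App\AI\SentimentAgent;
use App\AI\SentimentResult;

it('swaps in a fake sentiment agent', function () {
    // Extending SentimentAgent keeps the fake type-compatible with anything that type-hints it.
    $this->instance(SentimentAgent::class, new class extends SentimentAgent {
        public function analyse(string $text): SentimentResult
        {
            return new SentimentResult('positive', 0.95, 'stub');
        }
    });

    // In a real suite, resolve the class that consumes the agent; this only proves the binding.
    $result = app(SentimentAgent::class)->analyse('Great product!');

    expect($result->label)->toBe('positive');
});

```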

---

Takeaways
---------

- Stream via `response()->stream()` and set `X-Accel-Buffering: no` for Nginx.
- Enforce token budgets *before* the HTTP call, not after.
- Use `response_format.json_schema` with `strict: true` to pin model output shape.
- Map LLM JSON to typed readonly DTOs immediately — keep raw strings out of your domain.
- Inject agents through the container so tests can swap fakes without HTTP calls.


 Frequently Asked Questions 
----------------------------


**Q01. Does `response_format: json_schema` work with all OpenAI models?**

No. Structured Outputs (`json_schema` with `strict: true`) requires `gpt-4o-mini`, `gpt-4o-2024-08-06` or a later `gpt-4o` snapshot, or a newer model. Older models such as `gpt-3.5-turbo` support `json_object` mode only, which returns JSON but does not enforce a schema.

**Q02. How do I handle streaming in a queued job rather than an HTTP response?**

In a job you don't need SSE. Disable streaming (`stream: false`), collect the full completion, then persist or broadcast the result. Streaming is only valuable when a human is waiting in real time.
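Without streaming, the completion arrives as one JSON document. A minimal sketch of a hypothetical `completionContent()` helper that pulls out `choices[0].message.content` and stops when the output was cut off (`finish_reason` of `length`) or the model declined (`refusal` set), since neither case leaves complete JSON to decode. The field names follow OpenAI's chat-completions response; the helper and its exception choices are illustrative.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: extract the text of a non-streamed chat completion.
 * Field names follow OpenAI's chat-completions response body.
 */
function completionContent(array $response): string
{
    $choice = $response['choices'][0] ?? throw new UnexpectedValueException('No choices in response');

    // max_tokens was reached: whatever came back is truncated.
    if (($choice['finish_reason'] ?? null) === 'length') {
        throw new RuntimeException('Completion hit max_tokens; output is truncated');
    }

    // Structured Outputs reports refusals in a separate field, with no usable content.
    if (isset($choice['message']['refusal'])) {
        throw new UnexpectedValueException('Model refused: ' . $choice['message']['refusal']);
    }

    $content = $choice['message']['content'] ?? null;
    if (! is_string($content)) {
        throw new UnexpectedValueException('Completion has no text content');
    }

    return $content;
}
```

A job would call this on the decoded response body, then persist or broadcast the returned string.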

**Q03. Is the 4-characters-per-token heuristic accurate enough for production?**

It's a rough average for English text, not a guarantee: code and non-English text usually produce more tokens per character, so the heuristic can undercount. For precise budgeting, use a tokeniser library such as `yethee/tiktoken`, which implements the actual BPE encoding. As a cheap pre-flight guard, the heuristic is fine.
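The pre-flight check is simple arithmetic: with `MAX_INPUT_TOKENS = 1_500` and 4 characters per token, the guard admits 6,000 characters. A small sketch with hypothetical `estimateTokens()` and `fitsBudget()` helpers, counting bytes with `strlen()` as the `TokenBudget` guard does:

```php
<?php
declare(strict_types=1);

// Hypothetical helpers illustrating the 4-characters-per-token pre-flight check.
const CHARS_PER_TOKEN = 4;
const MAX_INPUT_TOKENS = 1_500;

function estimateTokens(string $text): int
{
    // Round up: a 5-byte string still counts as 2 tokens under the heuristic.
    return (int) ceil(strlen($text) / CHARS_PER_TOKEN);
}

function fitsBudget(string $text): bool
{
    return estimateTokens($text) <= MAX_INPUT_TOKENS;
}
```

So a 6,000-byte prompt estimates at exactly 1,500 tokens and passes, while one extra byte pushes it to 1,501 and fails.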
