
  ![Practical RAG in Laravel: pgvector, Embeddings, and Retrieval Pipelines](https://cdn.msaied.com/343/2f38128287a6e9daeaafa8cf91abe79a.png)

  #laravel   #ai   #pgvector   #postgresql   #embeddings  

 Practical RAG in Laravel: pgvector, Embeddings, and Retrieval Pipelines 
=========================================================================

2 Jul 2026 · 3 min read · Mohamed Said

Table of contents

1. [Why RAG Instead of Fine-Tuning?](#why-rag-instead-of-fine-tuning)
2. [Setting Up pgvector in Laravel](#setting-up-pgvector-in-laravel)
3. [Embedding Service](#embedding-service)
4. [Ingestion Pipeline](#ingestion-pipeline)
5. [Retrieval Service](#retrieval-service)
6. [Prompt Assembly](#prompt-assembly)
7. [Key Takeaways](#key-takeaways)

 Why RAG Instead of Fine-Tuning?
-------------------------------

Retrieval-augmented generation (RAG) lets you ground an LLM's answers in your own data without the cost and complexity of fine-tuning. The pattern is straightforward: embed your documents, store the vectors, embed the user query at runtime, retrieve the closest chunks, and inject them into the prompt. PostgreSQL's `pgvector` extension makes this viable without a dedicated vector store.

Setting Up pgvector in Laravel
------------------------------

Enable the extension in a migration:

```php
public function up(): void
{
    DB::statement('CREATE EXTENSION IF NOT EXISTS vector');

    Schema::create('document_chunks', function (Blueprint $table) {
        $table->id();
        $table->foreignId('document_id')->constrained()->cascadeOnDelete();
        $table->text('content');
        $table->string('embedding_model', 64)->default('text-embedding-3-small');
        // Store as text; cast to vector in queries
        $table->text('embedding');
        $table->timestamps();
    });

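    // IVFFlat learns its cluster centroids at build time, so in practice
    // run this statement in a later migration, after bulk-loading data.
    // An expression index needs its own parentheses around the cast.
    DB::statement(
        'CREATE INDEX document_chunks_embedding_idx
         ON document_chunks
         USING ivfflat ((embedding::vector(1536)) vector_cosine_ops)
         WITH (lists = 100)'
    );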
}

```

> **Note:** IVFFlat derives its cluster centroids from the rows present when the index is built, so an index created on an empty table gives poor recall. For smaller datasets (< 100k rows) an exact scan without an index is often fast enough.
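
To make the `lists` value above less of a magic number, here is a small sketch of the starting-point heuristic from the pgvector README: `rows / 1000` lists up to one million rows, `sqrt(rows)` beyond that, and `sqrt(lists)` probes at query time. The function name `suggestIvfflatParams` is hypothetical, and the numbers are starting points to benchmark, not guarantees.

```php
<?php

declare(strict_types=1);

/**
 * Suggest starting values for IVFFlat `lists` and `probes`.
 * Hypothetical helper; the heuristic follows the pgvector README.
 *
 * @return array{lists: int, probes: int}
 */
function suggestIvfflatParams(int $rowCount): array
{
    // rows / 1000 up to 1M rows, sqrt(rows) for larger tables
    $lists = $rowCount <= 1_000_000
        ? intdiv($rowCount, 1000)
        : (int) round(sqrt($rowCount));

    // Never go below one list or one probe
    $lists = max(1, $lists);
    $probes = max(1, (int) round(sqrt($lists)));

    return ['lists' => $lists, 'probes' => $probes];
}

$params = suggestIvfflatParams(100_000);
// $params is ['lists' => 100, 'probes' => 10]
```

For roughly 100k rows this lands on the `lists = 100` used in the migration. `probes` is a session setting rather than an index option, so it would be applied before querying, for example with `DB::statement('SET ivfflat.probes = 10')`.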

Embedding Service
-----------------

Wrap the OpenAI call behind an interface so you can swap providers or mock in tests:

```php
interface EmbeddingProvider
{
    /** @return float[] */
    public function embed(string $text): array;
}

final class OpenAiEmbeddingProvider implements EmbeddingProvider
{
    public function __construct(
        private readonly OpenAI\Client $client,
        private readonly string $model = 'text-embedding-3-small',
    ) {}

    public function embed(string $text): array
    {
        $response = $this->client->embeddings()->create([
            'model' => $this->model,
            'input' => $text,
        ]);

        return $response->embeddings[0]->embedding;
    }
}

```

Bind it in a service provider:

```php
$this->app->singleton(
    EmbeddingProvider::class,
    fn () => new OpenAiEmbeddingProvider(
        client: OpenAI::client(config('services.openai.key')),
    )
);

```
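Because callers depend only on `EmbeddingProvider`, tests can swap in a double that never touches the network. The sketch below is one possible shape: `FakeEmbeddingProvider` is a hypothetical class (not part of any package) that derives a deterministic, unit-length vector from a checksum of the input. The vectors carry no semantic meaning; they only make assertions repeatable.

```php
<?php

declare(strict_types=1);

interface EmbeddingProvider
{
    /** @return float[] */
    public function embed(string $text): array;
}

/**
 * Hypothetical test double: same text in, same vector out.
 */
final class FakeEmbeddingProvider implements EmbeddingProvider
{
    public function __construct(private readonly int $dimensions = 8) {}

    public function embed(string $text): array
    {
        $vector = [];

        for ($i = 0; $i < $this->dimensions; $i++) {
            // crc32 of text plus index gives a stable value in [-1.0, 1.0]
            $vector[] = (crc32($text . '|' . $i) % 2001 - 1000) / 1000.0;
        }

        // Normalize to unit length, mirroring normalized real embeddings
        $norm = sqrt(array_sum(array_map(
            static fn (float $v): float => $v * $v,
            $vector
        )));

        if ($norm === 0.0) {
            return $vector;
        }

        return array_map(static fn (float $v): float => $v / $norm, $vector);
    }
}
```

In a Laravel test, the double could be registered with `$this->app->instance(EmbeddingProvider::class, new FakeEmbeddingProvider())` so that the job and retriever resolve it instead of the OpenAI-backed provider.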

Ingestion Pipeline
------------------

Chunk documents and persist embeddings as a queued job:

```php
final class IngestDocumentChunks implements ShouldQueue
{
    use Dispatchable, Queueable;

    public function __construct(private readonly Document $document) {}

    public function handle(EmbeddingProvider $embedder): void
    {
        $chunks = TextSplitter::splitByTokens($this->document->body, maxTokens: 512);

        foreach ($chunks as $content) {
            $vector = $embedder->embed($content);
            $literal = '[' . implode(',', $vector) . ']';

            DB::table('document_chunks')->insert([
                'document_id' => $this->document->id,
                'content'     => $content,
                'embedding'   => $literal,
                'created_at'  => now(),
                'updated_at'  => now(),
            ]);
        }
    }
}

```
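The job above calls `TextSplitter::splitByTokens()`, which the article does not show. As a rough illustration only, the sketch below approximates tokens with whitespace-separated words; a production splitter would count tokens with the tokenizer that matches the embedding model. The `$overlap` parameter is an added assumption, not something the original call uses.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical splitter: treats each whitespace-separated word as one token.
 */
final class TextSplitter
{
    /** @return string[] */
    public static function splitByTokens(string $text, int $maxTokens = 512, int $overlap = 0): array
    {
        if ($maxTokens < 1 || $overlap < 0 || $overlap >= $maxTokens) {
            throw new InvalidArgumentException(
                'Require maxTokens >= 1 and 0 <= overlap < maxTokens.'
            );
        }

        // Words stand in for tokens in this approximation
        $words = preg_split('/\s+/u', trim($text), -1, PREG_SPLIT_NO_EMPTY) ?: [];

        $chunks = [];
        $step = $maxTokens - $overlap;

        for ($start = 0; $start < count($words); $start += $step) {
            $chunks[] = implode(' ', array_slice($words, $start, $maxTokens));

            // Stop once this window has reached the end of the text
            if ($start + $maxTokens >= count($words)) {
                break;
            }
        }

        return $chunks;
    }
}

$chunks = TextSplitter::splitByTokens('a b c d e', maxTokens: 2);
// $chunks is ['a b', 'c d', 'e']
```

A small overlap between neighbouring chunks can help when an answer straddles a chunk boundary, at the cost of more rows and more embedding calls.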

Retrieval Service
-----------------

At query time, embed the question and pull the top-k chunks by cosine similarity:

```php
final class ChunkRetriever
{
    public function __construct(private readonly EmbeddingProvider $embedder) {}

    /** @return Collection<int, \stdClass> */
    public function retrieve(string $query, int $topK = 5): Collection
    {
        $vector = $this->embedder->embed($query);
        $literal = '[' . implode(',', $vector) . ']';

        // <=> is pgvector's cosine-distance operator; ascending order
        // returns the nearest chunks first.
        // The pipe operator (|>) below requires PHP 8.5+.
        return DB::select(
            "SELECT content,
                    1 - (embedding::vector(1536) <=> ?::vector(1536)) AS similarity
             FROM document_chunks
             ORDER BY embedding::vector(1536) <=> ?::vector(1536)
             LIMIT ?",
            [$literal, $literal, $topK]
        ) |> collect(...);
    }
}

```

The `<=>` operator is pgvector's cosine distance; subtracting it from 1 gives cosine similarity.
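
To see what the database computes for each row, here is the same cosine-distance arithmetic in plain PHP. It is only a reference sketch for intuition and for checking results in tests; `cosineDistance` is a hypothetical helper name, and in the pipeline itself the calculation stays in PostgreSQL.

```php
<?php

declare(strict_types=1);

/**
 * Cosine distance: 1 - (a . b) / (|a| * |b|).
 *
 * @param float[] $a
 * @param float[] $b
 */
function cosineDistance(array $a, array $b): float
{
    $a = array_values($a);
    $b = array_values($b);

    if ($a === [] || count($a) !== count($b)) {
        throw new InvalidArgumentException('Vectors must be non-empty and of equal length.');
    }

    $dot = 0.0;
    $normA = 0.0;
    $normB = 0.0;

    foreach ($a as $i => $value) {
        $dot += $value * $b[$i];
        $normA += $value * $value;
        $normB += $b[$i] * $b[$i];
    }

    if ($normA === 0.0 || $normB === 0.0) {
        throw new InvalidArgumentException('Cosine distance is undefined for zero vectors.');
    }

    return 1.0 - $dot / (sqrt($normA) * sqrt($normB));
}

// Identical direction: distance 0, similarity 1
$same = cosineDistance([1.0, 0.0], [1.0, 0.0]);

// Orthogonal vectors: distance 1, similarity 0
$orthogonal = cosineDistance([1.0, 0.0], [0.0, 1.0]);

// Opposite direction: distance 2, similarity -1
$opposite = cosineDistance([1.0, 0.0], [-1.0, 0.0]);
```

The distance ranges from 0 to 2, so the `similarity` column returned by the query ranges from 1 down to -1.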

Prompt Assembly
---------------

```php
$chunks = $retriever->retrieve($userQuestion);
$context = $chunks->pluck('content')->implode("\n\n---\n\n");

$messages = [
    ['role' => 'system', 'content' => "Answer using only the context below.\n\n{$context}"],
    ['role' => 'user',   'content' => $userQuestion],
];

```

Keep the system prompt tight. Stuffing too many chunks degrades answer quality and burns tokens.
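
One way to enforce that limit is to cap the context by size before building the prompt. The sketch below is a hypothetical helper (`fitChunksToBudget` is not from the article) that keeps chunks in ranking order until a byte budget is exhausted. Bytes are a crude proxy for tokens; a real budget would be measured with the model's tokenizer.

```php
<?php

declare(strict_types=1);

/**
 * Keep the highest-ranked chunks that fit within a size budget.
 *
 * @param string[] $chunks Chunk texts, most similar first
 * @return string[]
 */
function fitChunksToBudget(array $chunks, int $maxBytes): array
{
    $selected = [];
    $used = 0;

    foreach ($chunks as $chunk) {
        $length = strlen($chunk);

        // Stop at the first chunk that does not fit, preserving rank order
        if ($used + $length > $maxBytes) {
            break;
        }

        $selected[] = $chunk;
        $used += $length;
    }

    return $selected;
}

$kept = fitChunksToBudget(['aaaa', 'bbb', 'cc'], 7);
// $kept is ['aaaa', 'bbb']
```

With the retriever above, this could be applied as `fitChunksToBudget($chunks->pluck('content')->all(), $maxBytes)` before the `implode`, where `$maxBytes` is a value you choose based on the model's context window.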

Key Takeaways
-------------

- **pgvector removes the need for a separate vector database** for most Laravel applications.
- **IVFFlat indexes** trade recall for speed; tune `lists` and `probes` based on your dataset size.
- **Interface-backed embedding providers** make unit testing and provider swaps trivial.
- **Chunk size matters**: 256–512 tokens per chunk balances retrieval precision and context coverage.
- **Cosine distance (`<=>`)** is the right operator for normalized OpenAI embeddings.
- Queue ingestion jobs; embedding API calls are slow and should never block a request cycle.


 Frequently Asked Questions 
----------------------------

**Q01. Do I need a dedicated vector database like Pinecone or Weaviate for RAG in Laravel?**

Not for most applications. PostgreSQL with the pgvector extension handles millions of vectors efficiently, especially with an IVFFlat or HNSW index. A dedicated vector store only becomes necessary when you need multi-tenanted vector isolation at very large scale or features like hybrid BM25+vector search that pgvector does not yet support natively.

**Q02. How do I handle embedding model changes without re-ingesting all documents?**

Store the model name alongside each embedding (as shown in the migration). When you switch models, run a background job that re-embeds only the chunks whose `embedding_model` column does not match the current model. This lets you migrate incrementally without downtime.

**Q03. What chunk size should I use for text splitting?**

256–512 tokens is a practical starting point. Smaller chunks improve retrieval precision but increase the number of rows and API calls during ingestion. Larger chunks carry more context per result but can dilute relevance scores. Benchmark against your specific documents and query patterns.
