RAG Overview

Retrieval Augmented Generation (RAG) is a powerful approach that combines the strengths of neural language models with retrieval from a large corpus of documents to produce more accurate and informed outputs. A RAG system retrieves highly relevant documents using natural language search and scales to billions of records, functioning much like a traditional database. A key advantage of RAG is that it avoids context stuffing: rather than packing everything into the prompt ahead of time, only the relevant context is pulled from the database at query time.

RAG is particularly effective at addressing hallucinations in large language models (LLMs), instances where a model generates factually incorrect or nonsensical information. By using vector databases that store embeddings of text, RAG anchors the model to real-world, structured data. When a query is made, the system retrieves semantically relevant content from the vector database, and the LLM uses that content to generate its response. This grounds the output in information that has been previously validated and stored in the database.

The architecture of RAG involves several components working together. An application sends a query to an embedding model, which converts the query into a vector representation. This vector is then matched against a vector database via an API to find the most relevant documents or chunks of text. The results are filtered using metadata to ensure relevance and compliance with access controls, and the selected content is streamed into the LLM's context window, giving it a foundation for generating informed responses.
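
In code, the query-time flow looks roughly like the sketch below, where embedQuery, vectorDb, and llm are hypothetical stand-ins for your embedding model, vector database client, and LLM client:

// Hypothetical helpers stand in for the embedding model, vector DB, and LLM clients
async function answerWithRag(query: string): Promise<string> {
    // 1. Convert the query into a vector representation
    const queryVector = await embedQuery(query);

    // 2. Match the vector against the vector database, filtering on metadata
    const matches = await vectorDb.query({
        vector: queryVector,
        topK: 5,
        filter: { accessLevel: 'public' }, // enforce access controls via metadata
        includeMetadata: true,
    });

    // 3. Stream the retrieved content into the LLM's context window
    const context = matches.map((m) => m.metadata.text).join('\n');
    return llm.generate(`Context: ${context}\n\nQuery: ${query}`);
}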

Additionally, RAG incorporates a continuous cycle of ingestion and updating. New documents are crawled, chunked, and their embeddings are added to the vector database, which ensures that the knowledge base is constantly expanding and staying up to date with the latest information. This dynamic nature makes RAG a robust system for applications that require access to a vast, evolving pool of data and the ability to provide accurate, context-aware responses.
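
That ingestion cycle can be sketched the same way, again with hypothetical crawl, chunk, embed, and vectorDb helpers:

// Hypothetical ingestion loop: crawl, chunk, embed, upsert
async function ingest(source: string): Promise<void> {
    const documents = await crawl(source);
    for (const doc of documents) {
        for (const piece of chunk(doc.text)) {
            const vector = await embed(piece.text);
            await vectorDb.upsert({
                id: `${doc.id}-${piece.index}`,
                values: vector,
                metadata: { text: piece.text, source: doc.url },
            });
        }
    }
}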

Messaging Interface

We have worked hard to provide the ultimate communication interface for developing RAG AI Agents. Each channel subscribes to a GraphQL API that lets you query and mutate data in the RAG system, and you can use this API to build your own custom UI or integrate with your existing systems.


import { gql } from '@apollo/client';

const GET_MY_CHANNEL_MESSAGES = gql`
  subscription (
    $last_received_id: String
    $last_received_ts: String
    $first_received_date: date
    $chat_id: uuid
  ) {
    message(
      order_by: { date: asc, timestamp: asc }
      where: {
        _and: {
          id: { _neq: $last_received_id }
          timestamp: { _gte: $last_received_ts }
          date: { _gte: $first_received_date }
          _and: {
            user: { username: { _neq: "null" } }
            _and: { chat_id: { _eq: $chat_id } }
          }
        }
      }
      limit: 200
    ) {
      id
      body
      username
      from
      time
      timestamp
      date
      created_at
      isCode
      likes
      channel_id
      isCommerce
      isReason
      chartContent
      image_url
      agent_id
      voice_model_id
      user {
        id
        username
        avatar
      }
    }
  }
`;
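
As a usage sketch, assuming Apollo Client on the React side (swap the imports if you use a different GraphQL client), a component can subscribe to a channel and render messages as they arrive; the variable values here are placeholders:

import { useSubscription } from '@apollo/client';

function ChannelFeed({ chatId }: { chatId: string }) {
    const { data, loading, error } = useSubscription(GET_MY_CHANNEL_MESSAGES, {
        variables: {
            chat_id: chatId,
            last_received_id: '',
            last_received_ts: '0',
            first_received_date: '2024-01-01',
        },
    });

    if (loading) return <p>Loading messages…</p>;
    if (error) return <p>Error: {error.message}</p>;

    return (
        <ul>
            {data.message.map((m: any) => (
                <li key={m.id}>{m.username}: {m.body}</li>
            ))}
        </ul>
    );
}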

Generating Returns Embeddings from the StateSet API

The function below pages through return records from the StateSet API, maps each record into a metadata document, embeds each batch with OpenAI's text-embedding-3-large model, and upserts the resulting vectors into a Pinecone index. The pinecone_index, pinecone_api_key, and access_token variables are assumed to be configured elsewhere in the application.


// Required dependencies
const axios = require('axios');
const { v4: uuid } = require('uuid');

// Main function to create the returns feed and index it in Pinecone
const createReturnsFeed = async (req, res) => {

    async function getReturns(offset, token) {

        // Page size doubles as the embed/upsert batch size
        const DOCUMENT_UPSERT_BATCH_SIZE = 3;
        const limit = DOCUMENT_UPSERT_BATCH_SIZE;
        console.log({ offset, limit });

        try {

            const response = await axios.post(`https://api.stateset.com/api/get-returns`, { limit: limit, offset: offset, "order_direction": "desc" }, {
                headers: {
                    'Content-Type': 'application/json',
                    'token': "Bearer " + token,
                },
            });

            const returns = response.data;

            // Stop recursing once the API returns no more records
            if (Array.isArray(returns) && returns.length > 0) {

                const documents = [];

                for (const item of returns) {

                    try {

                        // Map the StateSet return record onto a flat metadata object
                        const metadata = {
                            returnId: item.id || null,
                            orderId: item.order_id || null,
                            actionNeeded: item.action_needed || null,
                            description: item.description || null,
                            status: item.status || null,
                            customerEmail: item.customerEmail || null,
                            customerEmailNormalized: item.customer_email_normalized || null,
                            zendeskNumber: item.zendesk_number || null,
                            serialNumber: item.serial_number || null,
                            reportedCondition: item.reported_condition || null,
                            trackingNumber: item.tracking_number || null,
                            rma: item.rma || null,
                            reasonCategory: item.reason_category || null,
                            country: item.country || null,
                            ssoId: item.sso_id || null,
                            enteredBy: item.entered_by || null,
                            scannedSerialNumber: item.scanned_serial_number || null,
                            issue: item.issue || null,
                            condition: item.condition || null,
                            amount: item.amount || null,
                            taxRefunded: item.tax_refunded || null,
                            totalRefunded: item.total_refunded || null,
                            createdDate: item.created_date || null,
                            orderDate: item.order_date || null,
                            shippedDate: item.shipped_date || null,
                            requestedDate: item.requested_date || null,
                            flatRateShipping: item.flat_rate_shipping || null,
                            warehouseReceivedDate: item.warehouse_received_date || null,
                            warehouseConditionDate: item.warehouse_condition_date || null,
                        };

                        const document = {
                            id: uuid(),
                            returnId: metadata.returnId,
                            metadata,
                        };

                        documents.push(document);


                    } catch (error) {
                        console.log(`Error processing item: ${JSON.stringify(item)}`);
                        console.log(error);
                    }

                }

                // Embed and upsert the documents in batches, outside the item loop
                // so that each batch is processed exactly once
                for (let i = 0; i < documents.length; i += DOCUMENT_UPSERT_BATCH_SIZE) {

                    // Split documents into batches and serialize each batch
                    const batchDocuments = documents.slice(i, i + DOCUMENT_UPSERT_BATCH_SIZE);
                    const batchDocumentString = JSON.stringify(batchDocuments);

                    const user_id = "domsteil";

                    // Request an embedding for the batch from OpenAI
                    console.log('Creating embedding...');

                    const embeddingsResponse = await fetch("https://api.openai.com/v1/embeddings", {
                        method: 'POST',
                        headers: {
                            'Content-Type': 'application/json',
                            // The OpenAI API expects a Bearer token
                            'Authorization': 'Bearer ' + process.env.OPEN_AI,
                        },
                        body: JSON.stringify({
                            input: batchDocumentString,
                            model: "text-embedding-3-large",
                            user: user_id,
                        }),
                    });

                    const vectorsEmbeddings = await embeddingsResponse.json();

                    // Build the Pinecone record for this batch
                    const vectorsObject = {
                        id: uuid(),
                        values: vectorsEmbeddings.data[0].embedding,
                        metadata: { text: batchDocumentString, user: user_id },
                    };

                    // Upsert into Pinecone; the API expects `vectors` to be an array
                    console.log('Upserting to Pinecone...');

                    await fetch(`https://${pinecone_index}/vectors/upsert`, {
                        method: "POST",
                        headers: {
                            "Content-Type": "application/json",
                            "Api-Key": pinecone_api_key,
                        },
                        body: JSON.stringify({ vectors: [vectorsObject], namespace: "return_data" }),
                        redirect: "follow",
                    })
                        .then((response) => response.text())
                        .then((text) => console.log(text))
                        .catch((error) => console.error(error));
                }

                // Fetch the next page
                await getReturns(offset + limit, token);

            }

        } catch (error) {
            console.log(offset);
            console.error(error);
        }
    }

    // Start at offset 0; access_token is assumed to be available in the enclosing scope
    await getReturns(0, access_token);
}
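
If the feed runs behind an HTTP endpoint, the handler can be mounted like any Express route; the path below is illustrative:

const express = require('express');

const app = express();

// Kick off returns ingestion; the response is sent once all pages are processed
app.post('/feeds/returns', async (req, res) => {
    await createReturnsFeed(req, res);
    res.status(200).json({ status: 'returns feed ingested' });
});

app.listen(3000, () => console.log('Listening on port 3000'));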

Stream Object Interface

Now that the embeddings are stored in the vector database, we can use the streamObject API to generate return object details for a returns management system: the relevant information is retrieved from the vector database, and a language model generates a structured return object from it.


import { createStreamableValue } from 'ai/rsc';
import { streamObject } from 'ai';
import { openai } from '@ai-sdk/openai';
import { z } from 'zod';

export type Return = {
    returnId: string | null;
    orderId: string | null;
    actionNeeded: string | null;
    description: string | null;
    status: string | null;
    customerEmail: string | null;
    customerEmailNormalized: string | null;
    zendeskNumber: string | null;
    serialNumber: string | null;
    reportedCondition: string | null;
    trackingNumber: string | null;
    rma: string | null;
    reasonCategory: string | null;
    country: string | null;
    ssoId: string | null;
    enteredBy: string | null;
    scannedSerialNumber: string | null;
    issue: string | null;
    condition: string | null;
    amount: string | null;
    taxRefunded: string | null;
    totalRefunded: string | null;
    createdDate: string | null;
    orderDate: string | null;
    shippedDate: string | null;
    requestedDate: string | null;
    flatRateShipping: string | null;
    warehouseReceivedDate: string | null;
    warehouseConditionDate: string | null;
};

export async function generate(input: string) {
    'use server';

    const stream = createStreamableValue();

    let contexts: string[] = [];

    try {
        // Embed the query first: Pinecone expects a vector, not raw text, and the
        // embedding model must match the one used at ingestion time
        const embeddingResponse = await fetch('https://api.openai.com/v1/embeddings', {
            method: 'POST',
            headers: {
                'Content-Type': 'application/json',
                'Authorization': 'Bearer ' + process.env.OPEN_AI,
            },
            body: JSON.stringify({ input, model: 'text-embedding-3-large' }),
        });
        const embedding = await embeddingResponse.json();
        const queryVector = embedding.data[0].embedding;

        const pineconeQueryResponse = await fetch(
            `https://${process.env.PINECONE_INDEX}/query`,
            {
                method: 'POST',
                headers: {
                    'Content-Type': 'application/json',
                    // Pinecone requires the API key on query requests as well
                    'Api-Key': process.env.PINECONE_API_KEY ?? '',
                },
                body: JSON.stringify({
                    topK: 1,
                    vector: queryVector,
                    includeMetadata: true,
                    // Query the same namespace the returns were upserted into
                    namespace: 'return_data',
                }),
            }
        );
        const json = await pineconeQueryResponse.json();
        contexts = json.matches.map((x: any) => x.metadata.text);
    } catch (error) {
        console.error('Error fetching context:', error);
        return { error: 'An error occurred while querying Pinecone' };
    }

    const promptWithContext = `Context: ${contexts.join(' ')}, Query: ${input}`;

    (async () => {
        const { partialObjectStream } = await streamObject({
            model: openai('gpt-4-turbo'),
            system: 'You generate return object details for a returns management system',
            prompt: promptWithContext,
            schema: z.object({
                returns: z.array(
                    z.object({
                        returnId: z.string().nullable().describe('ID of the return.'),
                        orderId: z.string().nullable().describe('ID of the order.'),
                        actionNeeded: z.string().nullable().describe('Action needed for the return.'),
                        description: z.string().nullable().describe('Description of the return.'),
                        status: z.string().nullable().describe('Status of the return.'),
                        customerEmail: z.string().nullable().describe('Email of the customer.'),
                        customerEmailNormalized: z.string().nullable().describe('Normalized email of the customer.'),
                        zendeskNumber: z.string().nullable().describe('Zendesk ticket number.'),
                        serialNumber: z.string().nullable().describe('Serial number of the returned item.'),
                        reportedCondition: z.string().nullable().describe('Reported condition of the returned item.'),
                        trackingNumber: z.string().nullable().describe('Tracking number for the return shipment.'),
                        rma: z.string().nullable().describe('Return Merchandise Authorization number.'),
                        reasonCategory: z.string().nullable().describe('Category of the return reason.'),
                        country: z.string().nullable().describe('Country of the return.'),
                        ssoId: z.string().nullable().describe('SSO ID associated with the return.'),
                        enteredBy: z.string().nullable().describe('User who entered the return.'),
                        scannedSerialNumber: z.string().nullable().describe('Scanned serial number of the returned item.'),
                        issue: z.string().nullable().describe('Issue reported with the returned item.'),
                        condition: z.string().nullable().describe('Condition of the returned item.'),
                        amount: z.string().nullable().describe('Amount refunded for the return.'),
                        taxRefunded: z.string().nullable().describe('Tax refunded for the return.'),
                        totalRefunded: z.string().nullable().describe('Total amount refunded for the return.'),
                        createdDate: z.string().nullable().describe('Date the return was created.'),
                        orderDate: z.string().nullable().describe('Date the order was placed.'),
                        shippedDate: z.string().nullable().describe('Date the order was shipped.'),
                        requestedDate: z.string().nullable().describe('Date the return was requested.'),
                        flatRateShipping: z.string().nullable().describe('Flat rate shipping cost for the return.'),
                        warehouseReceivedDate: z.string().nullable().describe('Date the return was received at the warehouse.'),
                        warehouseConditionDate: z.string().nullable().describe('Date the return condition was assessed at the warehouse.'),
                    }),
                ),
            }),
        });

        for await (const partialObject of partialObjectStream) {
            stream.update(partialObject);
        }

        stream.done();
    })();

    return { object: stream.value };
}
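
On the client, the streamable value returned by generate can be read incrementally with readStreamableValue from the AI SDK; the component and the './actions' import path below are illustrative:

'use client';

import { useState } from 'react';
import { readStreamableValue } from 'ai/rsc';
import { generate, type Return } from './actions';

export function ReturnLookup() {
    const [returns, setReturns] = useState<Return[] | null>(null);

    async function handleQuery(query: string) {
        const result = await generate(query);
        if ('error' in result) return;
        // Each iteration yields a progressively more complete partial object
        for await (const partialObject of readStreamableValue(result.object)) {
            if (partialObject) setReturns((partialObject as any).returns);
        }
    }

    return (
        <button onClick={() => handleQuery('Find the return for order 1234')}>
            Look up return
        </button>
    );
}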

Conclusion

RAG combines the strengths of neural language models with retrieval from a large corpus of documents to produce more accurate, better-informed outputs. By pairing vector databases with continuous ingestion, it grounds responses in validated, up-to-date information, making it well suited to applications that draw on vast, evolving pools of data. With the right architecture and components in place, RAG can be a game-changer for AI applications that need to deliver reliable, fact-based answers to users.