ReSponse AI

ReSponse uses the OpenAI Embeddings API to generate vector embeddings.


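// requestOptions (method, headers with the API key, JSON body) is assumed to be defined elsewhere.
let vectors_embeddings;

try {
    const response = await fetch("https://api.openai.com/v1/embeddings", requestOptions);
    if (!response.ok) {
        throw new Error("Embeddings request failed with status " + response.status);
    }
    const json = await response.json();
    // The embedding of the first input string is returned in data[0].embedding.
    vectors_embeddings = json.data[0].embedding;
} catch (error) {
    console.error(error);
}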

OpenAI’s text embeddings measure the relatedness of text strings. Embeddings are commonly used for:

  • Search (where results are ranked by relevance to a query string)
  • Clustering (where text strings are grouped by similarity)
  • Recommendations (where items with related text strings are recommended)
  • Anomaly detection (where outliers with little relatedness are identified)
  • Diversity measurement (where similarity distributions are analyzed)
  • Classification (where text strings are classified by their most similar label)

An embedding is a vector (list) of floating point numbers. The distance between two vectors measures their relatedness. Small distances suggest high relatedness and large distances suggest low relatedness.
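To make the idea of distance concrete, here is a minimal sketch that compares embedding vectors with cosine similarity, one common choice of measure. The function names and the three-element toy vectors are illustrative assumptions; real embeddings have hundreds or thousands of dimensions, and this document does not state which measure ReSponse uses.

```javascript
// Illustrative helper (not part of ReSponse): cosine similarity of two equal-length vectors.
function cosineSimilarity(a, b) {
    if (a.length !== b.length) {
        throw new Error("Vectors must have the same length");
    }
    let dot = 0;
    let normA = 0;
    let normB = 0;
    for (let i = 0; i < a.length; i++) {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Cosine distance: close to 0 for closely related vectors, larger for less related ones.
function cosineDistance(a, b) {
    return 1 - cosineSimilarity(a, b);
}

// Toy vectors, made up for illustration only.
const refund = [0.9, 0.1, 0.0];
const moneyBack = [0.8, 0.2, 0.1];
const shipping = [0.0, 0.2, 0.9];

// The two refund-related vectors are closer to each other than to the shipping vector.
console.log(cosineDistance(refund, moneyBack) < cosineDistance(refund, shipping)); // true
```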

Embeddings are central to AI, offering a way to represent large knowledge bases in a continuous vector space. They are numerical representations that encapsulate the features and relationships of discrete objects, like words or documents. For applications like Retrieval-Augmented Generation (RAG), handling embeddings at scale is crucial. With potentially millions of documents, the sheer number of embeddings can be overwhelming, necessitating a pipeline that continuously processes and updates the embedded knowledge base.

Internally, many AI models are built on neural networks, which are loosely inspired by the human brain and composed of interconnected layers of neurons. These networks learn from massive amounts of data, adjusting the connections between layers to extract high-level features. These features, represented as vectors in high-dimensional space, are what we refer to as embeddings.

The geometric representation of vector embeddings is often visualized in a space where the position of each point (embedding) corresponds to the characteristics of the object it represents. This spatial arrangement allows models to infer relationships and similarities between objects based on their proximity.

Contextualized embeddings go a step further by considering the context in which a word or phrase is used. This means that the same word in different sentences can have different embeddings, reflecting its varying meanings depending on usage.

At a very high level, the workflow that ReSponse uses can be divided into three stages:

1. Data preprocessing / embedding

This stage involves storing private data (FAQ / Customer Support documents, in our example) to be retrieved later. Typically, the documents are broken into chunks, passed through an embedding model, then stored in a specialized database called a vector database.
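The chunking step can be sketched as follows. This is a minimal character-based splitter under stated assumptions: the helper name chunkText, the chunk size, and the overlap are hypothetical, and this document does not say how ReSponse actually splits documents.

```javascript
// Hypothetical helper: split text into chunks of at most `size` characters,
// with `overlap` characters shared between neighbouring chunks.
function chunkText(text, size = 200, overlap = 20) {
    if (overlap >= size) {
        throw new Error("overlap must be smaller than size");
    }
    const chunks = [];
    for (let start = 0; start < text.length; start += size - overlap) {
        chunks.push(text.slice(start, start + size));
        if (start + size >= text.length) break; // this chunk already reached the end
    }
    return chunks;
}

// Each chunk would then be passed through the embedding model and stored.
const faq = "Refunds are issued within 5 business days. ".repeat(10);
const chunks = chunkText(faq, 100, 10);
console.log(chunks.length, chunks[0]);
```

The overlap keeps a sentence that straddles a chunk boundary at least partly visible in both chunks, at the cost of storing some text twice.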

2. Prompt construction / retrieval

When a user submits a query (a customer question, in this case), the application constructs a series of prompts to submit to the language model. A compiled prompt typically combines a prompt template hard-coded by the developer; examples of valid outputs called few-shot examples; any necessary information retrieved from external APIs; and a set of relevant documents retrieved from the vector database.
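As a rough illustration of how such a prompt might be compiled, the sketch below joins a fixed instruction, retrieved documents, few-shot examples, and the customer question into one string. The function name buildPrompt and the template wording are assumptions made for this example, not the prompt ReSponse actually uses.

```javascript
// Hypothetical prompt builder; the template text is made up for illustration.
function buildPrompt(question, retrievedDocs, fewShotExamples = []) {
    // Number the retrieved documents so they can be told apart in the prompt.
    const context = retrievedDocs.map((doc, i) => `[${i + 1}] ${doc}`).join("\n");
    // Few-shot examples show the model what a valid answer looks like.
    const examples = fewShotExamples
        .map(ex => `Q: ${ex.question}\nA: ${ex.answer}`)
        .join("\n");
    return [
        "Answer the customer question using only the context below.",
        "",
        "Context:",
        context,
        "",
        "Examples:",
        examples,
        "",
        `Question: ${question}`,
        "Answer:",
    ].join("\n");
}

const prompt = buildPrompt(
    "How long do refunds take?",
    ["Refunds are issued within 5 business days."],
    [{ question: "Is shipping free?", answer: "Yes, for orders over $50." }]
);
console.log(prompt);
```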

3. Prompt execution / inference

Once the prompts have been compiled, they are submitted to a pre-trained LLM for inference. This can be a proprietary model API or an open-source or self-trained model. Some developers also add operational systems such as logging, caching, and validation at this stage.

Upsert, Retrieve and Update

The Upsert function generates an embedding and saves it to Pinecone as a vector array, with the source text stored in the metadata. The Retrieve function generates an embedding from the question, queries the vector database with it, and returns the text from the metadata of the matching vectors as context for the completions API. The Update function generates an embedding, looks up the corresponding vector ID in Pinecone, and overwrites that record with the new vector array and the new text in the metadata.
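The three operations can be sketched with an in-memory stand-in for the vector database. This is not the Pinecone client API: the class InMemoryVectorStore, its method signatures, and the two-element toy vectors are assumptions made for illustration, and the sketch assumes the embedding and the vector ID have already been obtained.

```javascript
// Cosine similarity between two equal-length vectors.
function similarity(a, b) {
    let dot = 0, normA = 0, normB = 0;
    for (let i = 0; i < a.length; i++) {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// In-memory stand-in for a vector database (NOT the Pinecone client).
class InMemoryVectorStore {
    constructor() {
        this.records = new Map(); // id -> { values, metadata }
    }

    // Upsert: save the embedding together with the source text as metadata.
    upsert(id, values, text) {
        this.records.set(id, { values, metadata: { text } });
    }

    // Retrieve: return the topK records most similar to the query embedding.
    retrieve(queryValues, topK = 1) {
        return [...this.records.entries()]
            .map(([id, rec]) => ({
                id,
                score: similarity(queryValues, rec.values),
                text: rec.metadata.text,
            }))
            .sort((x, y) => y.score - x.score)
            .slice(0, topK);
    }

    // Update: overwrite an existing record; fail if the id is unknown.
    update(id, values, text) {
        if (!this.records.has(id)) {
            throw new Error(`Unknown vector id: ${id}`);
        }
        this.upsert(id, values, text);
    }
}

const store = new InMemoryVectorStore();
store.upsert("faq-1", [1, 0], "Refunds take 5 business days.");
store.upsert("faq-2", [0, 1], "Shipping is free over $50.");
store.update("faq-1", [1, 0], "Refunds take 3 business days.");
console.log(store.retrieve([0.9, 0.1])[0].text); // Refunds take 3 business days.
```

The text returned by retrieve is what would be passed on as context to the completions API.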

ReSponse uses the chat functionality to call the /add, /answer and /update commands.

  • /add calls the Upsert API
  • /answer calls the Retrieve API
  • /update calls the Update API
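One way the mapping from chat command to operation could look is sketched below. The function names parseCommand and route and the returned object shape are hypothetical; how ReSponse parses chat messages internally is not described in this document.

```javascript
// Hypothetical parser: splits "/add some text" into a command and its argument.
function parseCommand(message) {
    const match = message.trim().match(/^\/(add|answer|update)\b\s*([\s\S]*)$/);
    if (!match) return null; // not one of the three commands
    return { command: match[1], text: match[2] };
}

// Maps each chat command to the operation it triggers, as listed above.
const OPERATIONS = { add: "Upsert", answer: "Retrieve", update: "Update" };

function route(message) {
    const parsed = parseCommand(message);
    if (!parsed) return null;
    return { operation: OPERATIONS[parsed.command], text: parsed.text };
}

console.log(route("/add Refunds take 5 business days."));
console.log(route("/answer How long do refunds take?"));
```

Messages that do not start with one of the three commands return null here, so they can be handled as ordinary chat messages.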

Getting Started

Sign up on response.cx and create a new organization.

To add data to the knowledge base, use the /add command.

To update data in the knowledge base, use the /update command.

Managing your Knowledge Base

To see what data is in your knowledge base, use the /list command or go to the knowledge base tab.