Key takeaways
We recently onboarded more than 500 engineers for one of our clients, one of the biggest AI companies in the world, and quickly found ourselves fielding about 10 developer queries an hour. Our developer success team was overwhelmed and our costs shot up. Instead of hiring a wave of new developer success managers, we built a bot that made the existing team 2x faster. The bot now handles 50% of queries on its own and saves us about $20k/month.
- 51.4% of queries are resolved instantly and don’t require a human response
- It does the work of roughly 5 humans, is up 24/7, and answers instantly
- It makes each dev success manager ~2x faster, which translates to roughly $20k/month in savings
- It answers within Slack threads
- The heart of the bot is a custom-developed Retrieval-Augmented Generation technique we’re calling RAG 2.0
- We’ve added a data pipeline that lets the bot learn from the human responses within Slack
Sometimes it gets vulnerable and lets you know it doesn’t know… which usually gets a nice laugh.
Also, we’ve decided to open source how we achieved this.
Here are the technical details of this ‘RAG 2.0’ method.
1. Abstract
We have developed a tool that improves model accuracy when retrieving information from a corpus of question-answer pairs. Typically, such systems embed each question together with its answer; when that combined text is compared against a user query, the similarity score is diluted by the answer content. Our approach addresses this by assessing similarity against the questions alone and consulting the answers only when generating responses. We also draw on multiple sources of information so the model has broad coverage across varied areas, and we enforce a rule that the model answers only when it fully understands the question being asked. This makes the tool both smarter and more useful. Here’s how we did it.
2. Introduction
The integration of Retrieval-Augmented Generation models with custom retrievers and multiple databases represents a significant leap in the development of useful chatbots. By focusing on question similarity and selectively drawing from a broader range of information sources, our approach substantially improves the precision and relevance of responses. This document outlines the technical foundation and implementation details of our enhanced RAG model, highlighting the benefits of our methodology.
3. The Challenge of Precision in AI Responses
In the realm of AI-driven question-answering systems, achieving high precision and relevance in responses is a complex challenge. Traditional Retrieval-Augmented Generation (RAG) models often struggle with redundancy and relevance, especially when pulling information from extensive databases without a refined focus. Our objective is to enhance the precision of these models by integrating a custom retriever that exclusively matches user queries against a curated database of questions and also leverages multiple databases to enrich the contextual understanding of the AI. We’re calling this RAG 2.0.
4. Technical Stack Overview
The development and success of our Developer Success Bot are rooted in a sophisticated technical stack, meticulously chosen to ensure optimal performance, efficiency, and scalability. At the core of our system, we leverage Python as the programming language, GPT-4 for our Language Model (LLM), LangChain as the library for implementing the Retrieval-Augmented Generation (RAG) approach, and ChromaDb for storing the embeddings.
Step 1: Designing the Custom Retriever
The cornerstone of our custom-developed RAG model is the custom retriever, meticulously engineered to compare incoming queries against a specifically prepared database of questions. This step involves several key technical advancements:
1. Preprocessing Techniques: Prior to similarity assessment, we apply a series of preprocessing steps that standardize the textual input, ensuring both queries and database questions are in a uniform format conducive to accurate similarity matching. We also convert all question-answer pairs into JSON for consistent downstream handling.
2. Similarity Metrics: To determine the closeness between the processed query and the database questions, we employ similarity metrics such as cosine similarity over embeddings for deeper semantic analysis. Crucially, we compare the query against only the question part of each stored document (the FAQ question-answer pairs), which yields a more faithful similarity score.
3. Threshold-Based Filtering: A crucial component of our retriever design is the introduction of a similarity threshold. Only those questions in the database that meet or exceed this threshold are considered valid matches, ensuring that the system prioritizes high-confidence queries for response generation.
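The three components above can be sketched together as a minimal retriever. The hashed bag-of-words `embed` function below is a self-contained stand-in for the sentence embeddings we use in production, and the 0.7 threshold is illustrative rather than our tuned value:

```python
import math
import zlib

def preprocess(text):
    # Standardize input: lowercase and strip punctuation so queries and
    # stored questions are compared in a uniform format.
    return "".join(c for c in text.lower() if c.isalnum() or c.isspace()).strip()

def embed(text, dim=256):
    # Hashed bag-of-words vector: a deterministic stand-in for the real
    # embedding-model vectors used in production.
    vec = [0.0] * dim
    for token in preprocess(text).split():
        vec[zlib.crc32(token.encode()) % dim] += 1.0
    return vec

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve(query, faq, threshold=0.7):
    """Score the query against the *question* field only; answers are
    carried along but never influence the similarity score."""
    q_vec = embed(query)
    matches = [
        (cosine(q_vec, embed(pair["question"])), pair)
        for pair in faq  # each pair: {"question": ..., "answer": ...}
    ]
    # Threshold-based filtering: keep only high-confidence matches.
    return sorted(
        (m for m in matches if m[0] >= threshold),
        key=lambda m: m[0],
        reverse=True,
    )
```

In production, anything below the threshold triggers the bot’s “I don’t know” response instead of a generated answer.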
Step 2: Integrating with Multiple Databases
To broaden the scope of accessible knowledge and enhance response relevance, our system dynamically interacts with multiple databases. This process involves:
1. Database Ranking and Selection: Each database is initially ranked based on the relevance scores determined by preliminary query matching. We then select a subset of the highest-scoring databases for detailed analysis.
2. Cross-Database Query Matching: Within the chosen databases, we apply the custom retriever to identify questions that surpass the similarity threshold. This multi-layered filtering ensures that the system selects the most pertinent databases and questions for response generation.
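The two-stage flow above can be sketched as follows. The Jaccard `similarity` here is a deterministic stand-in for the embedding-based cosine similarity described in Step 1, and the function names and thresholds are illustrative:

```python
def similarity(a, b):
    # Stand-in for embedding cosine similarity: Jaccard overlap of word sets.
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def best_score(query, db):
    """Preliminary query matching: the best question-similarity in this database."""
    return max((similarity(query, pair["question"]) for pair in db), default=0.0)

def select_databases(query, databases, top_k=2):
    """Stage 1: rank databases by preliminary relevance, keep the top k."""
    return sorted(databases, key=lambda db: best_score(query, db), reverse=True)[:top_k]

def cross_database_retrieve(query, databases, top_k=2, threshold=0.5):
    """Stage 2: within the selected databases, keep only question matches
    that surpass the similarity threshold, pooled and re-sorted."""
    hits = []
    for db in select_databases(query, databases, top_k):
        for pair in db:
            score = similarity(query, pair["question"])
            if score >= threshold:
                hits.append((score, pair))
    return sorted(hits, key=lambda h: h[0], reverse=True)
```

Ranking first and filtering second keeps the expensive detailed matching confined to the handful of databases most likely to hold the answer.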
Step 3: Enhancing Language Model Integration
The final step in our methodology is the integration of the retrieved context with language models to generate accurate and relevant answers. This involves feeding the selected answers as context into the language model, which then synthesizes this information to produce a coherent and precise response.
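A minimal sketch of this final step, with a pluggable `llm` callable standing in for the GPT-4 chat-completion call (the prompt wording and function names are illustrative, not our production prompt):

```python
def build_prompt(query, matches):
    """Assemble retrieved answers into context for the language model.
    `matches` is the (score, {"question", "answer"}) list the retriever
    produces; only the answer text is injected at this stage."""
    if not matches:
        return None  # nothing cleared the threshold: refuse rather than guess
    context = "\n".join(f"- {pair['answer']}" for _, pair in matches)
    return (
        "Answer the developer's question using only the context below.\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )

def respond(query, matches, llm):
    prompt = build_prompt(query, matches)
    if prompt is None:
        return "I'm not sure about this one - escalating to a human."
    return llm(prompt)  # in production, a GPT-4 call via the chat API
```

Passing only high-confidence answers as context is what lets the model synthesize a precise response instead of hallucinating one.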
Conclusion: A New Benchmark in AI Precision
Through the detailed development and integration of a custom retriever and multi-database strategy, our enhanced RAG 2.0 framework sets a new benchmark for accuracy and efficiency in AI-driven question-answering systems. By focusing on semantic similarity, implementing preprocessing and similarity metrics, and optimizing database selection, we have significantly improved the relevance and precision of AI responses. This approach not only addresses the fundamental challenges in retrieval-augmented models but also opens new avenues for the development of intelligent, context-aware AI systems across various domains.