Retrieval Augmented Generation

RAG (Retrieval-Augmented Generation)

Overview

In our Retrieval-Augmented Generation (RAG) system, documents are indexed and made accessible through an agent. Each RAG instance has its own index, a structured repository that organizes information so it can be retrieved efficiently. This article describes the key functions of the RAG system: document processing with OCR, ingestion and indexing, index management, document access, and querying data from a PostgreSQL server through our platform.

Key Concepts

Index

In RAG, an index is a structured reference within the agent that organizes document data for efficient retrieval. Once a document has been ingested and indexed, it can be searched and queried through the agent.
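To make the idea of an index concrete, here is a minimal sketch of an in-memory inverted index. It is only an illustration of the general concept, not a description of how the platform stores its index. The class name SimpleIndex and its methods are hypothetical.

```python
import re
from collections import defaultdict


class SimpleIndex:
    """Hypothetical in-memory inverted index: term -> set of document IDs."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> IDs of documents containing it
        self.documents = {}               # document ID -> original text

    @staticmethod
    def _terms(text):
        # Lowercase and keep alphanumeric runs as terms.
        return re.findall(r"[a-z0-9]+", text.lower())

    def add(self, doc_id, text):
        """Store the document and register each of its terms."""
        self.documents[doc_id] = text
        for term in self._terms(text):
            self.postings[term].add(doc_id)

    def search(self, query):
        """Return the IDs of documents that contain every query term."""
        terms = self._terms(query)
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result


index = SimpleIndex()
index.add("doc-1", "Invoice for March")
index.add("doc-2", "Invoice for April")
print(index.search("invoice march"))
```

Looking up a term is a dictionary access rather than a scan over every document, which is the sense in which an index supports efficient retrieval.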

Storage Locations

RAG supports various storage options for accessing and processing documents:

Amazon S3: We connect to your S3 bucket to access remotely stored documents.
Google Drive: We access your cloud-stored files through your Google Drive account.
Local Disk: We support files stored locally on your device.
PostgreSQL Server: Through our platform, you can run queries against PostgreSQL databases to retrieve structured data.

OAuth (delegated authorization) is used to secure connections to these systems.
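As an illustration of how the four storage options might be told apart, the sketch below classifies a location string by its URI scheme. It is an assumption made for this example and not the platform's actual configuration format. The function classify_source, the mapping SCHEME_TO_SOURCE, and the "gdrive" scheme are hypothetical.

```python
from urllib.parse import urlparse

# Hypothetical mapping from URI scheme to storage option.
# "gdrive" is an invented scheme used only for this illustration.
SCHEME_TO_SOURCE = {
    "s3": "Amazon S3",
    "gdrive": "Google Drive",
    "postgresql": "PostgreSQL Server",
    "file": "Local Disk",
    "": "Local Disk",  # bare paths such as /home/user/report.pdf
}


def classify_source(location: str) -> str:
    """Return the storage option that a location string refers to."""
    scheme = urlparse(location).scheme.lower()
    if scheme not in SCHEME_TO_SOURCE:
        raise ValueError(f"Unsupported storage location: {location!r}")
    return SCHEME_TO_SOURCE[scheme]


for location in (
    "s3://my-bucket/reports/q1.pdf",
    "gdrive://shared/contracts/nda.docx",
    "/home/user/report.pdf",
    "postgresql://db.example.com:5432/sales",
):
    print(location, "->", classify_source(location))
```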

RAG Tool for Document Ingestion and Indexing

Our RAG tool enables users to efficiently process hundreds of documents within the system. The workflow includes selecting documents and configuring their ingestion and indexing in the agent.

Document Selection

The first step is selecting the files you want to process. You can choose documents stored locally, in Amazon S3, Google Drive, or on a PostgreSQL server. Secure connections are established via OAuth for protected data access.

Processing Documents with OCR

We use Optical Character Recognition (OCR) technology to extract the text from documents. The extracted text is then split into manageable parts based on its sections and formatting, and each part is converted into a semantic vector, which enables efficient search and retrieval of relevant information.
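The splitting and vectorization steps can be sketched as follows, starting from text that OCR has already extracted. This is a simplified illustration under stated assumptions: chunk_text and embed are hypothetical names, and the hashed bag-of-words vector is a toy stand-in for a real embedding model, which the platform would use instead.

```python
import hashlib
import math
import re


def chunk_text(text, max_words=50):
    """Split text on blank lines, then cap each chunk at max_words words."""
    chunks = []
    for paragraph in re.split(r"\n\s*\n", text.strip()):
        words = paragraph.split()
        for start in range(0, len(words), max_words):
            chunks.append(" ".join(words[start:start + max_words]))
    return chunks


def embed(text, dims=64):
    """Toy embedding: hashed bag-of-words, L2-normalized.

    A stand-in for a real embedding model; it captures word overlap only.
    """
    vector = [0.0] * dims
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        digest = hashlib.md5(token.encode("utf-8")).digest()
        vector[int.from_bytes(digest[:4], "big") % dims] += 1.0
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm else vector


ocr_output = "Refund policy\n\nCustomers may request a refund within 30 days."
for chunk in chunk_text(ocr_output):
    print(chunk, len(embed(chunk)))
```

Splitting on blank lines keeps each chunk aligned with a paragraph or section where possible, and the word cap keeps long paragraphs from producing oversized chunks.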

Document Ingestion and Indexing

Once documents are processed with OCR, the system ingests and indexes them into the agent. This step is crucial for creating an effective index that supports efficient queries and retrieval.

Core Processes in RAG

1. Adding Documents to the Agent

After ingestion and indexing, you can specify which documents to assign to the RAG agent. This step ensures that only relevant files are accessible for queries and retrieval.
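A minimal sketch of this assignment step, assuming an agent simply tracks the IDs of the documents assigned to it. The Agent class and its methods are hypothetical and only illustrate how assignment limits what a query can see.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    """Hypothetical agent that can only see documents assigned to it."""

    name: str
    assigned: set = field(default_factory=set)

    def assign(self, *doc_ids):
        """Make the given indexed documents available to this agent."""
        self.assigned.update(doc_ids)

    def unassign(self, doc_id):
        """Remove a document from this agent; no error if it was not assigned."""
        self.assigned.discard(doc_id)

    def visible(self, indexed_doc_ids):
        """Filter the indexed documents down to those this agent may query."""
        return [doc_id for doc_id in indexed_doc_ids if doc_id in self.assigned]


agent = Agent("support-agent")
agent.assign("faq.pdf", "refund-policy.pdf")
print(agent.visible(["faq.pdf", "payroll.xlsx", "refund-policy.pdf"]))
```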

2. Index Management

Effective index management is essential for maintaining a robust RAG system. It enables you to organize and structure data for quick and precise retrieval.
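One common index management task is deciding when a document needs to be indexed again. The sketch below tracks a content hash per document so that unchanged files can be skipped. It is an assumption about how such bookkeeping could work, not a description of the platform's index management. The IndexRegistry class is hypothetical.

```python
import hashlib


class IndexRegistry:
    """Hypothetical registry tracking which document versions are indexed."""

    def __init__(self):
        self._hashes = {}  # document ID -> SHA-256 of the indexed content

    @staticmethod
    def _fingerprint(content: bytes) -> str:
        return hashlib.sha256(content).hexdigest()

    def needs_reindex(self, doc_id: str, content: bytes) -> bool:
        """True if the document is new or its content has changed."""
        return self._hashes.get(doc_id) != self._fingerprint(content)

    def mark_indexed(self, doc_id: str, content: bytes) -> None:
        """Record that this version of the document is now in the index."""
        self._hashes[doc_id] = self._fingerprint(content)

    def remove(self, doc_id: str) -> bool:
        """Forget a document; returns True if it was being tracked."""
        return self._hashes.pop(doc_id, None) is not None


registry = IndexRegistry()
registry.mark_indexed("faq.pdf", b"version 1")
print(registry.needs_reindex("faq.pdf", b"version 1"))
print(registry.needs_reindex("faq.pdf", b"version 2"))
```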

3. Performing Queries

With documents ingested and indexed, users can perform queries within the RAG system. The agent retrieves and compiles relevant information based on the indexed data, acting as an efficient search tool that provides real-time answers.
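The retrieval part of a query can be sketched as a similarity search over indexed vectors. The example below ranks stored chunks by cosine similarity to a query vector. It assumes the vectors have already been produced by some embedding step, and the functions cosine and top_k are hypothetical names, not part of the platform.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vector, indexed_chunks, k=3):
    """Return the k (score, text) pairs most similar to the query.

    indexed_chunks is a list of (chunk_text, vector) pairs.
    """
    scored = [(cosine(query_vector, vector), text) for text, vector in indexed_chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Two-dimensional vectors chosen by hand purely for illustration.
indexed = [
    ("refund policy", [1.0, 0.0]),
    ("shipping times", [0.0, 1.0]),
    ("refunds and returns", [0.9, 0.1]),
]
for score, text in top_k([1.0, 0.0], indexed, k=2):
    print(round(score, 3), text)
```

In a full RAG flow, the highest-ranked chunks would then be passed to the language model as context for composing the answer.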

Summary

The RAG system is a tool for managing and retrieving information from large volumes of documents stored across various locations. Its capabilities include document OCR processing, ingestion and indexing, and efficient query execution, making it a robust solution for information retrieval.

By integrating storage options like Amazon S3, Google Drive, local disks, and PostgreSQL servers, RAG ensures seamless access to your documents and structured data, empowering agents with accurate and organized information retrieval.
