GraphFusion-AI: Building a Hybrid RAG System with Vector + Graph Retrieval

June 17, 2026 Dhiman Sutradhar 15 min read

Python FastAPI LangGraph LangChain Neo4j Pinecone Streamlit Tesseract OCR OpenAI

Introduction

Traditional RAG (Retrieval-Augmented Generation) systems rely on a single retrieval strategy, usually vector similarity search. That works well for finding semantically similar passages, but it can miss relationships between pieces of information that live in different chunks. GraphFusion-AI addresses this by offering three retrieval strategies in a single platform: Normal RAG, Graph RAG, and Hybrid RAG.

In this post, I'll walk through the complete architecture, implementation details, and design decisions behind building this system.

Architecture Overview

┌─────────────────┐         ┌────────────────────────────┐
│                 │         │       FastAPI Backend       │
│   Streamlit     │  HTTP   │                            │
│   Frontend      │────────▶│  ┌──────────────────────┐  │
│                 │         │  │   LangGraph Workflow  │  │
│  • Chat UI      │         │  │                      │  │
│  • File Upload  │         │  │  ┌────────────────┐  │  │
│  • RAG Selector │         │  │  │  Route by Mode │  │  │
│                 │         │  │  └───────┬────────┘  │  │
└─────────────────┘         │  │          │           │  │
                            │  │    ┌─────┼─────┐     │  │
                            │  │    ▼     ▼     ▼     │  │
                            │  │ Normal Graph Hybrid   │  │
                            │  └───┬─────┬─────┬──────┘  │
                            │      │     │     │         │
                            └──────┼─────┼─────┼─────────┘
                                   │     │     │
                         ┌─────────┘     │     └──────────┐
                         ▼               ▼                ▼
                  ┌─────────────┐ ┌─────────────┐ ┌───────────┐
                  │  Pinecone   │ │    Neo4j    │ │  Both +   │
                  │  (Vectors)  │ │   (Graph)   │ │  Merge    │
                  └─────────────┘ └─────────────┘ └───────────┘

The Problem with Traditional RAG

Standard RAG pipelines work like this:

Split documents into chunks
Convert chunks into vector embeddings
When a user asks a question, convert it to a vector
Find the most similar chunks via cosine similarity
Pass those chunks as context to an LLM
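
The similarity step above can be sketched in plain Python. This is only an illustration of cosine-similarity ranking, not the code GraphFusion-AI runs (Pinecone performs this search server-side); the function names `cosine_similarity` and `top_k` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as "not similar to anything"
    return dot / (norm_a * norm_b)


def top_k(query_vec, chunk_vecs, k=5):
    """Return (chunk_index, score) pairs for the k most similar chunks."""
    scored = [(cosine_similarity(query_vec, v), i) for i, v in enumerate(chunk_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(i, score) for score, i in scored[:k]]
```

A vector database does the same ranking with an approximate index, so it does not have to compare the query against every stored vector.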

This works well for direct factual questions, but struggles with:

Multi-hop reasoning — when the answer requires connecting information from multiple chunks
Structural relationships — when document structure (sections, hierarchy) matters
Entity relationships — when understanding how entities relate is key

💡 Key Insight: By storing documents in both a vector database AND a knowledge graph, we can combine the precision of semantic search with the relational power of graph traversal.

Dual Ingestion Pipeline

When a document is uploaded, GraphFusion-AI processes it through a dual ingestion pipeline:

    Document Upload
         │
         ▼
┌─────────────────┐
│  OCR / Text     │  ← Tesseract for scanned docs
│  Extraction     │  ← pdf2image for PDF pages
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  Text Chunking  │  ← 1000 chars, 200 overlap
│  (LangChain)    │
└────────┬────────┘
         │
    ┌────┴────┐
    ▼         ▼
┌────────┐ ┌────────┐
│Pinecone│ │ Neo4j  │
│Upsert  │ │ Create │
│Vectors │ │ Nodes  │
└────────┘ └────────┘
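
To make the chunking step in the diagram concrete (1000 characters with a 200-character overlap), here is a minimal fixed-window sketch. It is a simplification: the project uses a LangChain splitter, and splitters such as `RecursiveCharacterTextSplitter` also try to break on separators like paragraphs and sentences. The helper name `chunk_text` is hypothetical.

```python
def chunk_text(text, chunk_size=1000, overlap=200):
    """Split text into fixed-size windows that overlap by `overlap` characters."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing some text twice.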

Pinecone Ingestion

Each chunk is embedded using OpenAI's text-embedding-ada-002 model (1536 dimensions) and stored in Pinecone with metadata including the document ID, user email, and the original text.

# Generate embeddings for all chunks
vectors = embeddings.embed_documents(chunks)

# Upsert to Pinecone with metadata
records = [
    {
        "id": f"{doc_id}_{i}",
        "values": vector,
        "metadata": {"text": chunk, "doc_id": doc_id, "user": user_email}
    }
    for i, (vector, chunk) in enumerate(zip(vectors, chunks))
]
index.upsert(vectors=records)
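
For large documents, sending every record in one request may run into request-size limits. A hedged sketch of batching the upsert is shown below; the batch size of 100 is an assumed value rather than something taken from the project, and `upsert_in_batches` is a hypothetical helper that works with any object exposing `upsert(vectors=...)`.

```python
from itertools import islice


def batched(records, batch_size=100):
    """Yield successive lists of at most batch_size records."""
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def upsert_in_batches(index, records, batch_size=100):
    """Upsert records batch by batch and return how many were sent."""
    total = 0
    for batch in batched(records, batch_size):
        index.upsert(vectors=batch)  # same call as above, on a smaller list
        total += len(batch)
    return total
```

With the `records` list built above, the final call would become `upsert_in_batches(index, records)`.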

Neo4j Graph Ingestion

At the same time, we create a graph structure in which each Chunk node points to its Document node via a PART_OF relationship:

# Create document node
session.run("MERGE (d:Document {id: $id}) SET d.filename = $fn, d.user = $user",
            id=doc_id, fn=filename, user=user_email)

# Create chunk nodes with relationships
for i, chunk in enumerate(chunks):
    session.run(
        "MATCH (d:Document {id: $doc_id}) "
        "CREATE (c:Chunk {id: $cid, text: $text, index: $idx})-[:PART_OF]->(d)",
        doc_id=doc_id, cid=f"{doc_id}_{i}", text=chunk, idx=i
    )
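
The loop above issues one query per chunk and uses CREATE, so re-ingesting the same document would add duplicate Chunk nodes. One possible variant, sketched here under the assumption that chunk IDs keep the `{doc_id}_{i}` format, sends all chunks in a single query with UNWIND and uses MERGE so that it is safe to re-run. `CHUNK_QUERY`, `build_chunk_rows`, and `write_chunks` are hypothetical names.

```python
# Cypher that writes every chunk of one document in a single round trip.
CHUNK_QUERY = (
    "MATCH (d:Document {id: $doc_id}) "
    "UNWIND $rows AS row "
    "MERGE (c:Chunk {id: row.id}) "
    "SET c.text = row.text, c.index = row.index "
    "MERGE (c)-[:PART_OF]->(d)"
)


def build_chunk_rows(doc_id, chunks):
    """Turn a list of chunk strings into parameter rows for UNWIND."""
    return [
        {"id": f"{doc_id}_{i}", "text": chunk, "index": i}
        for i, chunk in enumerate(chunks)
    ]


def write_chunks(session, doc_id, chunks):
    """Write all chunks through an object that has a Neo4j-style run()."""
    rows = build_chunk_rows(doc_id, chunks)
    session.run(CHUNK_QUERY, doc_id=doc_id, rows=rows)
    return len(rows)
```

The Document node still has to exist before this runs, as in the MERGE statement above.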

Three RAG Strategies

1. Normal RAG (Vector Search)

Converts the user's question into a vector and finds the top-5 most semantically similar chunks from Pinecone.

Question → Embed → Query Pinecone (top-5) → Context → LLM → Answer

Best for: Direct factual questions, keyword-heavy queries, finding specific passages.

2. Graph RAG (Knowledge Graph)

Queries the Neo4j graph with Cypher for chunks that belong to the target document, returning up to 10 of them. Chunks are selected by document membership rather than by similarity to the question.

MATCH (c:Chunk)-[:PART_OF]->(d:Document {id: $doc_id})
RETURN c.text AS text LIMIT 10

Best for: Questions requiring broader document context, structural understanding, relationship-based queries.

3. Hybrid RAG (Combined)

Runs both vector search and graph traversal, then merges the results before sending them to the LLM as context.

Question ──┬──▶ Pinecone (top-3) ──┐
           │                       ├──▶ Merge Context ──▶ LLM ──▶ Answer
           └──▶ Neo4j (top-5)   ───┘

Best for: Complex questions requiring both semantic similarity and structural context.
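
The merge step can be sketched as follows. This is an illustration, not the project's actual merge logic: it assumes each hit is a dict with `id` and `text` keys (so the graph query would also need to return `c.id`), and it relies on both stores using the same `{doc_id}_{i}` chunk IDs to drop duplicates. The names `merge_contexts` and `build_context` are hypothetical.

```python
def merge_contexts(vector_hits, graph_hits, max_chunks=8):
    """Combine hits from both stores, keeping the first copy of each chunk ID."""
    seen = set()
    merged = []
    # Vector hits go first so the most semantically similar chunks lead the context.
    for hit in list(vector_hits) + list(graph_hits):
        if hit["id"] in seen:
            continue
        seen.add(hit["id"])
        merged.append(hit)
    return merged[:max_chunks]


def build_context(hits):
    """Join chunk texts into one context string for the LLM prompt."""
    return "\n\n".join(hit["text"] for hit in hits)
```

Deduplicating matters here because a chunk that ranks highly in Pinecone is also likely to be among the chunks returned from Neo4j for the same document.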

LangGraph Workflow Orchestration

Rather than using simple if/else logic, GraphFusion-AI uses LangGraph to orchestrate the RAG pipeline as a state machine:

from typing import TypedDict

from langgraph.graph import StateGraph, END

class RAGState(TypedDict):
    query: str
    mode: str           # "normal", "graph", or "hybrid"
    document_id: str | None
    response: str
    web_results: list

def build_rag_workflow():
    # route_rag and web_search_node are node functions defined elsewhere
    workflow = StateGraph(RAGState)
    workflow.add_node("route_rag", route_rag)
    workflow.add_node("web_search", web_search_node)
    workflow.set_entry_point("route_rag")
    workflow.add_edge("route_rag", END)
    # web_search has no incoming edge in this excerpt
    workflow.add_edge("web_search", END)
    return workflow.compile()

🔧 Why LangGraph? It makes the workflow extensible. Adding new nodes (reranking, fact-checking, multi-step reasoning) is as simple as adding a node and an edge — no refactoring needed.
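
To illustrate the "Route by Mode" decision from the architecture diagram, here is a library-free sketch of a routing function. It is an assumption about how such routing could look, not the project's `route_rag` implementation; `select_branch`, `run_pipeline`, and the handler mapping are hypothetical. In LangGraph, a function like `select_branch` is the kind of callable that can be passed to `add_conditional_edges` to pick the next node.

```python
VALID_MODES = ("normal", "graph", "hybrid")


def select_branch(state):
    """Return the name of the retrieval branch for this request."""
    mode = str(state.get("mode", "")).strip().lower()
    if mode not in VALID_MODES:
        raise ValueError(f"unknown RAG mode: {mode!r}")
    return mode


def run_pipeline(state, handlers):
    """Dispatch the state to the handler registered for its mode."""
    branch = select_branch(state)
    return handlers[branch](state)
```

Keeping the decision in one small function means adding a fourth mode only touches `VALID_MODES` and the handler mapping.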

OCR Pipeline with Tesseract

Many real-world documents are scanned PDFs or photos of printed pages. GraphFusion-AI handles this transparently:

import io

import pytesseract
from pdf2image import convert_from_bytes
from PIL import Image

def extract_text(content: bytes, filename: str) -> str:
    ext = filename.rsplit(".", 1)[-1].lower()
    if ext == "pdf":
        # Convert each PDF page to an image, then OCR
        images = convert_from_bytes(content)
        return "\n".join(pytesseract.image_to_string(img) for img in images)
    elif ext in ("png", "jpg", "jpeg", "tiff", "bmp"):
        img = Image.open(io.BytesIO(content))
        return pytesseract.image_to_string(img)
    else:
        # Anything else is treated as plain text
        return content.decode("utf-8", errors="ignore")

The pipeline uses pdf2image (backed by Poppler) to render PDF pages as images, then Tesseract extracts the text. Because every page is rasterized before OCR, the same code path handles both digital PDFs and scanned documents.

Authentication with Google SSO

Authentication is handled via Google OAuth 2.0 using the authlib library:

User ──▶ /auth/login ──▶ Google OAuth ──▶ /auth/callback
                                              │
                                              ▼
                                     Create JWT Token
                                              │
                                              ▼
                                   Redirect to Frontend
                                   with token in URL

Every subsequent API request includes the JWT in the Authorization: Bearer header. The backend validates it before processing any request.
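
To show what validating a Bearer token involves, here is a standard-library sketch of signing and verifying an HS256 JWT. It makes several assumptions the post does not state: the HS256 algorithm, an `exp` claim in Unix seconds, and the helper names `create_token` and `verify_bearer`. A production backend would normally rely on a maintained JWT library instead of hand-rolled code like this.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url_encode(data):
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text):
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input, secret):
    return hmac.new(secret.encode(), signing_input.encode("ascii"), hashlib.sha256).digest()


def create_token(payload, secret):
    """Build a compact HS256 JWT from a dict of claims."""
    header = {"alg": "HS256", "typ": "JWT"}
    parts = [
        _b64url_encode(json.dumps(header, separators=(",", ":")).encode()),
        _b64url_encode(json.dumps(payload, separators=(",", ":")).encode()),
    ]
    signing_input = ".".join(parts)
    return signing_input + "." + _b64url_encode(_sign(signing_input, secret))


def verify_bearer(authorization, secret, now=None):
    """Validate an 'Authorization: Bearer <jwt>' value and return its claims."""
    scheme, _, token = authorization.partition(" ")
    if scheme.lower() != "bearer" or not token:
        raise ValueError("expected a Bearer token")
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed token")
    header_b64, payload_b64, signature_b64 = parts
    header = json.loads(_b64url_decode(header_b64))
    if header.get("alg") != "HS256":
        raise ValueError("unexpected signing algorithm")
    expected = _sign(header_b64 + "." + payload_b64, secret)
    # Constant-time comparison avoids leaking how many bytes matched.
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    claims = json.loads(_b64url_decode(payload_b64))
    current = time.time() if now is None else now
    if "exp" in claims and claims["exp"] < current:
        raise ValueError("token expired")
    return claims
```

The signature is checked before the claims are trusted, and the algorithm is pinned so that a token declaring a different `alg` is rejected.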

Internet Search Integration

Sometimes the uploaded document doesn't contain all the information needed. GraphFusion-AI includes a Tavily-powered internet search feature that lets users find additional context on any topic:

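import asyncio

from tavily import TavilyClient

async def internet_search(query: str) -> list[dict]:
    client = TavilyClient(api_key=settings.tavily_api_key)
    # client.search is a blocking call, so run it in a worker thread
    # to avoid stalling the event loop
    results = await asyncio.to_thread(client.search, query, max_results=5)
    return [{"title": r["title"], "url": r["url"], "content": r["content"]}
            for r in results["results"]]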

Frontend: Streamlit Dashboard

The user interface is built with Streamlit, providing:

Google SSO login — single click authentication
RAG mode selector — dropdown to switch between Normal/Graph/Hybrid
Document upload — drag-and-drop with progress indication
Chat interface — conversational Q&A with message history
Internet search — sidebar search with clickable results

┌───────────────────────────────────────────────────────┐
│  ┌──────────────┐  ┌──────────────────────────────┐  │
│  │  SIDEBAR     │  │        MAIN AREA             │  │
│  │              │  │                              │  │
│  │ RAG Mode: ▼  │  │  🧑 What is the revenue?    │  │
│  │ [Hybrid RAG] │  │                              │  │
│  │              │  │  🤖 Based on the document,  │  │
│  │ ─────────── │  │     the revenue for Q4 was  │  │
│  │ Upload Doc   │  │     $2.3M, representing a   │  │
│  │ [Browse...]  │  │     15% increase...         │  │
│  │              │  │                              │  │
│  │ ─────────── │  │                              │  │
│  │ Web Search   │  │  ┌─────────────────────────┐│  │
│  │ [...........] │  │  │ Ask a question...      ││  │
│  │ [🔍 Search]  │  │  └─────────────────────────┘│  │
│  └──────────────┘  └──────────────────────────────┘  │
└───────────────────────────────────────────────────────┘

Deployment

Local (Docker Compose)

The entire stack runs locally with a single command:

docker-compose up --build
# Backend:  http://localhost:8000
# Frontend: http://localhost:8501
# Neo4j:    http://localhost:7474

Production (AWS ECS + Terraform)

For production, the project includes Terraform scripts that provision:

ECS Fargate — serverless containers for backend and frontend
Application Load Balancer — HTTPS termination, path-based routing
Route53 — DNS management
S3 — document storage
ECR — container image registry

Key Design Decisions

Decision	Rationale
Dual storage (Pinecone + Neo4j)	Each excels at different retrieval patterns
LangGraph over plain functions	Extensibility — easy to add reranking, routing, guardrails
Tesseract OCR	Open-source, no API costs, handles most document types
FastAPI + Streamlit separation	Independent scaling, API reusable by other clients
JWT over sessions	Stateless auth, works across services

Performance Comparison

In testing with a 50-page technical document:

RAG Mode	Accuracy	Latency	Best For
Normal RAG	Good	~2s	Factual lookups
Graph RAG	Good	~1.5s	Structural queries
Hybrid RAG	Best	~3s	Complex questions

What's Next

Add entity extraction during ingestion to create richer graph relationships
Implement reranking with Cohere or a cross-encoder for better precision
Add conversation memory for multi-turn follow-up questions
Support multi-document chat — query across all uploaded documents
Implement streaming responses for better UX

Try It Out

The full source code is available on GitHub. Deploy it locally in under 5 minutes with Docker Compose:

git clone https://github.com/bergerlijk07/GraphFusion-AI.git
cd GraphFusion-AI
cp .env.example .env    # Fill in your API keys
docker-compose up --build

Written by Dhiman Sutradhar — Technical Lead & Backend Architect