feat: Add Indexation API for tracking sync jobs

zakariae yahya requested to merge feature/accounts-folders into develop

Summary

This MR implements a complete indexation tracking system that monitors document synchronization from external sources (Google Drive, SharePoint, S3) and integrates with n8n workflows for automated RAG processing.

Key Features

  • Indexation Job Tracking: Track sync jobs with status, progress, and error handling
  • Document Tracking: Indexed documents are recorded automatically in the database
  • Immediate Sync: Trigger file synchronization immediately after wizard completion
  • n8n Integration: n8n workflows are started by webhook and report progress and completion back to the API
  • RAG Pipeline: Documents are processed, chunked, and stored in Qdrant

Changes

New Models & Schemas

  • IndexationJob model with status tracking (pending, running, completed, failed)
  • IndexedDocument model for tracking processed documents
  • Pydantic schemas for API payloads and responses
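
For reviewers, the shape of the two models can be sketched roughly as follows. This is a minimal illustration using standard-library dataclasses rather than the MR's actual ORM models or Pydantic schemas; only the four status values and the two model names come from this MR, and every field name is an assumption.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class JobStatus(str, Enum):
    # The four statuses listed in this MR.
    PENDING = "pending"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"


@dataclass
class IndexationJob:
    # All field names here are hypothetical, chosen for illustration.
    id: int
    source_type: str  # e.g. "google_drive", "sharepoint", "s3"
    status: JobStatus = JobStatus.PENDING
    total_files: int = 0
    processed_files: int = 0
    error_message: Optional[str] = None
    created_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )

    @property
    def progress(self) -> float:
        """Fraction of files processed, in [0, 1]; 0 when the total is unknown."""
        if self.total_files <= 0:
            return 0.0
        return min(self.processed_files / self.total_files, 1.0)


@dataclass
class IndexedDocument:
    # Hypothetical fields: one row per document processed by a job.
    id: int
    job_id: int
    file_name: str
    chunk_count: int = 0
```

Deriving progress from the two counters, instead of storing it, keeps a job record consistent when only the counters are updated.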

New API Endpoints

  • GET /indexations/stats - Global indexation statistics
  • GET /indexations/stats/timeline - Daily stats for charts
  • GET /indexations/jobs - List jobs with filters and pagination
  • POST /indexations/jobs - Create new indexation job
  • DELETE /indexations/jobs/{id} - Cancel a job
  • GET /indexations/documents - List indexed documents
  • DELETE /indexations/documents/{id} - Delete document from index
  • POST /indexations/trigger-sync - Manual sync trigger
  • POST /indexations/webhooks/job-progress - n8n progress updates
  • POST /indexations/webhooks/job-complete - n8n job completion
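
The two webhook endpoints drive the job status transitions. The sketch below shows one plausible way to apply their payloads, using an in-memory dict in place of the database. The payload keys (`job_id`, `processed_files`, `total_files`, `error`) and the function names are assumptions, not the schemas defined in this MR.

```python
from typing import Any, Dict

# In-memory stand-in for the jobs table (assumption, for illustration only).
JOBS: Dict[int, Dict[str, Any]] = {}


def create_job(job_id: int, source_type: str) -> Dict[str, Any]:
    """Register a new job in the 'pending' state."""
    job = {
        "id": job_id,
        "source_type": source_type,
        "status": "pending",
        "processed_files": 0,
        "total_files": 0,
        "error": None,
    }
    JOBS[job_id] = job
    return job


def on_job_progress(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Apply a job-progress update; late updates to finished jobs are ignored."""
    job = JOBS[payload["job_id"]]
    if job["status"] in ("completed", "failed"):
        return job
    job["status"] = "running"
    job["processed_files"] = payload["processed_files"]
    job["total_files"] = payload["total_files"]
    return job


def on_job_complete(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Mark a job completed, or failed when the payload carries an error."""
    job = JOBS[payload["job_id"]]
    error = payload.get("error")
    job["status"] = "failed" if error else "completed"
    job["error"] = error
    return job
```

Ignoring progress updates that arrive after completion is a design choice of this sketch: webhook calls can arrive out of order, and a terminal status should not revert to running.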

Wizard Integration

  • Automatic sync trigger after wizard activation
  • Creates indexation jobs for each datasource config
  • Triggers n8n webhooks to start processing immediately
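
The flow above (one job per datasource config, then an immediate webhook call) might look like the following sketch. The function name, the config and payload keys, and the injected `trigger_webhook` callable are all hypothetical; the real implementation presumably performs an HTTP call to n8n and persists the jobs.

```python
from typing import Callable, Dict, List


def start_sync_after_wizard(
    datasource_configs: List[Dict[str, str]],
    trigger_webhook: Callable[[Dict[str, object]], bool],
) -> List[Dict[str, object]]:
    """Create one job per datasource config and trigger processing for each.

    trigger_webhook stands in for the HTTP call to n8n; it returns True
    when the workflow accepted the request.
    """
    jobs: List[Dict[str, object]] = []
    for index, config in enumerate(datasource_configs, start=1):
        job: Dict[str, object] = {
            "id": index,
            "source_type": config["type"],
            "status": "pending",
        }
        try:
            accepted = trigger_webhook({"job_id": job["id"], "config": config})
        except Exception as exc:
            # A failing webhook must not abort the remaining datasources.
            accepted = False
            job["error"] = str(exc)
        if not accepted:
            job["status"] = "failed"
            job.setdefault("error", "webhook not accepted")
        jobs.append(job)
    return jobs
```

Passing the webhook call in as a parameter keeps the sketch testable without a running n8n instance.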

n8n Workflow Updates

  • Google Drive immediate sync workflow with subfolder support
  • Progress tracking via webhooks
  • PDF/DOCX/TXT file filtering (no images)
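
The file filter lives in the n8n workflow itself, so the snippet below is only a Python illustration of the rule, not the workflow's code. It assumes the filter is based on the file extension and is case-insensitive; the function name is hypothetical.

```python
from pathlib import PurePosixPath
from typing import Iterable, List

# Extensions named in this MR; everything else (images included) is skipped.
ALLOWED_EXTENSIONS = {".pdf", ".docx", ".txt"}


def filter_indexable(names: Iterable[str]) -> List[str]:
    """Keep only file names whose extension is PDF, DOCX, or TXT."""
    return [
        name
        for name in names
        if PurePosixPath(name).suffix.lower() in ALLOWED_EXTENSIONS
    ]
```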

Bug Fixes

  • Fixed document tracking condition in /documents/process
  • Added torch multiprocessing spawn mode for ASGI compatibility
  • Added fallback for batch embedding errors
  • Added pad_token for gpt2 tokenizer
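
As an illustration of the batch embedding fallback, one common pattern is to retry item by item when the batch call fails, so that a single bad input does not lose the whole batch. The sketch below assumes that pattern; `embed_with_fallback` and the `embed_batch` callable are hypothetical names, and the fix in this MR may differ.

```python
from typing import Callable, List, Optional, Sequence

Vector = List[float]


def embed_with_fallback(
    texts: Sequence[str],
    embed_batch: Callable[[Sequence[str]], List[Vector]],
) -> List[Optional[Vector]]:
    """Embed texts in one batch; on failure, retry each text on its own.

    Texts that still fail individually yield None, so the output stays
    aligned with the input order.
    """
    try:
        return list(embed_batch(texts))
    except Exception:
        results: List[Optional[Vector]] = []
        for text in texts:
            try:
                results.append(embed_batch([text])[0])
            except Exception:
                results.append(None)
        return results
```

Returning None for a failed item, rather than dropping it, lets the caller match each embedding to its chunk by position.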
