Tamas Demeter
HR/Recruitment · 3 months

Zero-Touch CV Processing: 4-Stage AI Pipeline on n8n

Built a fully automated CV processing pipeline with 4-stage AI analysis, dual-model failover (Claude Sonnet primary, Gemini Flash backup), quality scoring, 3-tier routing, and a dedicated error handler workflow. Zero manual intervention for the happy path.

Role: Workflow Automation Architect & AI Integration Designer
Tools: n8n, Google Drive, Google Sheets, Claude Sonnet, Gemini Flash, Slack

The Problem

Manual CV Processing That Does Not Scale

Every CV required a recruiter to open the PDF, read through the content, manually extract contact details, skills, experience, and preferences, and then type that data into a spreadsheet. A single CV took 15-20 minutes. At volume, this created a backlog that delayed candidate responses and frustrated hiring managers waiting for structured candidate data.

Inconsistent Data Extraction and No Visibility

Different people extracted different fields. One recruiter captured LinkedIn URLs, another did not. Seniority levels were labeled inconsistently. Years of experience were calculated differently. The spreadsheet was full of gaps. CVs were dropped into a shared Google Drive folder with no tracking of which files had been processed, which were duplicates, and which had failed. Corrupted files sat indefinitely.

No Audit Trail for Compliance or Quality Review

Recruiter decisions about candidate data were not logged. There was no record of when a CV was processed, what data was extracted, or whether the extraction was accurate. Quality review was impossible because there was nothing to review against. PII handling for sharing with hiring managers was fully manual and error-prone.

The Solution

[Architecture diagram]

Stage 1: Scheduled Trigger with Two-Layer Deduplication

The system runs on a scheduled trigger and scans a designated Google Drive folder for new CV files. Before processing, it performs two layers of deduplication. First, it checks file URLs against the master database to skip files already recorded. Second, it computes an MD5 hash of the binary file content and compares against a hash registry, catching re-uploads or renamed duplicates that URL matching would miss.
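The two-layer check can be sketched roughly as follows. This is a minimal Python illustration, not the actual n8n node configuration: the function name `is_duplicate` and the in-memory sets standing in for the master database and the hash registry are assumptions made for the example.

```python
import hashlib
from typing import Set, Tuple


def is_duplicate(
    file_url: str,
    content: bytes,
    known_urls: Set[str],
    known_hashes: Set[str],
) -> Tuple[bool, str]:
    """Return (is_duplicate, reason) for a candidate CV file.

    known_urls and known_hashes are hypothetical stand-ins for the
    master database and the hash registry described above.
    """
    # Layer 1: skip files whose URL is already recorded.
    if file_url in known_urls:
        return True, "url"

    # Layer 2: hash the binary content to catch re-uploads and renames.
    digest = hashlib.md5(content).hexdigest()
    if digest in known_hashes:
        return True, "hash"

    # New file: register both keys so later runs skip it.
    known_urls.add(file_url)
    known_hashes.add(digest)
    return False, "new"
```

MD5 is used here only as a content fingerprint, not for security, which is why a fast non-cryptographic-grade hash is acceptable for this purpose.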

Stage 2: File Type Routing

A Switch node routes files by MIME type. PDFs proceed to the processing pipeline. Non-PDF files trigger an immediate Slack notification to the operations channel with the file name and MIME type, then route to a "needs conversion" folder. This ensures no file is silently ignored.
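The routing decision itself is simple enough to express in a few lines. The sketch below is illustrative only: the real logic lives in an n8n Switch node, and the function name `route_file`, the route labels, and the message wording are assumptions.

```python
from typing import Dict, Optional

PDF_MIME = "application/pdf"


def route_file(name: str, mime_type: str) -> Dict[str, Optional[str]]:
    """Decide where a file goes and whether a Slack message is needed."""
    if mime_type == PDF_MIME:
        # PDFs go straight into the processing pipeline, no alert.
        return {"route": "process", "notify": None}

    # Anything else is parked for conversion and reported, so it is
    # never silently dropped.
    message = f"Unsupported CV file: {name} ({mime_type})"
    return {"route": "needs_conversion", "notify": message}
```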

Stage 3: 4-Stage AI Pipeline with Dual-Model Failover

This is the core of the system: a four-stage LLM chain in which each stage runs on Claude Sonnet as the primary model, with Gemini Flash as an automatic fallback. (1) Analyze CV: extracts structured data including core identity, role and experience, skills and tools, languages, location, salary, and work type. (2) Score and Validate: a dedicated validation LLM checks every field against rules and produces a quality score out of 100. (3) Rescan: takes rescan instructions and re-reads the original CV to find what the first pass missed. A deep merge then combines the original extraction with the rescan findings. (4) Summarize: generates a full summary and a PII-stripped shareable summary for hiring managers.
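Two mechanisms in this stage lend themselves to a short sketch: the per-stage failover and the deep merge. The Python below is a hedged illustration rather than the workflow's actual code. The model calls are passed in as plain callables (hypothetical stand-ins for the Claude Sonnet and Gemini Flash nodes), and the merge policy shown, where original values win and rescan findings only fill gaps or extend lists, is an assumption about how the merge behaves.

```python
from typing import Any, Callable, Dict, Tuple


def run_with_failover(
    prompt: str,
    primary: Callable[[str], str],
    backup: Callable[[str], str],
) -> Tuple[str, str]:
    """Try the primary model; on any error, fall back to the backup."""
    try:
        return primary(prompt), "primary"
    except Exception:
        # Rate limits, timeouts, provider errors: all trigger the fallback.
        return backup(prompt), "backup"


def is_empty(value: Any) -> bool:
    """Treat None, empty strings, and empty containers as missing."""
    return value is None or value == "" or value == [] or value == {}


def deep_merge(original: Dict[str, Any], rescan: Dict[str, Any]) -> Dict[str, Any]:
    """Combine the first-pass extraction with rescan findings.

    Assumed policy: keep original values, fill gaps from the rescan,
    recurse into nested objects, and append new list items.
    """
    merged = dict(original)
    for key, new in rescan.items():
        old = merged.get(key)
        if isinstance(old, dict) and isinstance(new, dict):
            merged[key] = deep_merge(old, new)
        elif isinstance(old, list) and isinstance(new, list):
            merged[key] = old + [item for item in new if item not in old]
        elif is_empty(old) and not is_empty(new):
            merged[key] = new
    return merged
```

With this policy a rescan can never overwrite a value the first pass already extracted, which keeps the merge safe to repeat.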

Stage 4: Quality Gate + Master Database Upsert

The quality gate routes CVs based on their score. A score of 60 or above continues through the full pipeline. A score of 30-59 also continues but is flagged "needs review." A score under 30 is treated as too unreliable for the master database: the system sends a Slack alert, moves the file to the failed CVs folder, and skips the database write. Successful records are written to the master Google Sheets database using append-or-update, matched on email.
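The thresholds and the append-or-update behaviour can be sketched as below. The thresholds come from the description above; the tier labels, the function names, and the list of dicts standing in for the Google Sheets rows are assumptions made for illustration.

```python
from typing import Dict, List


def quality_tier(score: int) -> str:
    """Map a 0-100 quality score to one of the three routing tiers."""
    if score >= 60:
        return "pass"
    if score >= 30:
        return "needs_review"
    return "reject"  # Slack alert, move to failed folder, no DB write


def upsert_by_email(rows: List[Dict[str, str]], record: Dict[str, str]) -> str:
    """Append-or-update a record, matching existing rows on email.

    rows is a hypothetical in-memory stand-in for the master sheet.
    """
    for row in rows:
        if row["email"] == record["email"]:
            row.update(record)  # refresh the existing candidate row
            return "updated"
    rows.append(dict(record))
    return "appended"
```

Matching on email means a candidate who submits a newer CV updates their existing row instead of creating a second one.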

Stage 5: Error Handler & Audit Trail

A dedicated "CV Parser Error Handler" workflow catches any unhandled errors from the main pipeline. It extracts the execution ID, failed node, error message, file context, and timestamps, writes a FAILED row to the Processing Log spreadsheet, and sends a detailed Slack alert. Every CV that passes through the system, whether it succeeds, fails, or is flagged for review, gets a row in the Processing Log with 13 tracked fields.
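As a rough illustration of what the error handler assembles, the sketch below builds a FAILED log row from an error event. It covers only the fields named in the paragraph above, not the full 13-field schema, which is not listed here. The input shape (an `execution` object with `id`, `lastNodeExecuted`, and `error.message`) and all column names are assumptions for the example.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Optional


def build_failed_row(
    error_event: Dict[str, Any],
    file_name: str,
    now: Optional[datetime] = None,
) -> Dict[str, str]:
    """Turn an error event into a FAILED row for the Processing Log.

    The event layout and column names are hypothetical.
    """
    now = now or datetime.now(timezone.utc)
    execution = error_event.get("execution", {})
    return {
        "status": "FAILED",
        "execution_id": str(execution.get("id", "")),
        "failed_node": execution.get("lastNodeExecuted", ""),
        "error_message": execution.get("error", {}).get("message", ""),
        "file_name": file_name,
        "logged_at": now.isoformat(),
    }
```

Missing keys fall back to empty strings, so the handler still writes a row even when the error event is incomplete.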

The Impact

Quantitative Results

  • CV processing reduced from 15-20 minutes of manual work per candidate to a fully automated, zero-touch operation
  • Every CV produces the same structured output regardless of format, layout, or writing style
  • Three-tier quality gate prevents low-confidence data from polluting the master database
  • Complete audit trail for every CV processed including failures and partial extractions

Strategic Value

  • Dual-model failover keeps the pipeline operational even when the primary AI provider rate-limits or errors
  • PII-stripped summaries enable safe candidate sharing with hiring managers without compliance risk
  • Processing Log enables trend analysis: which CV formats cause problems, what quality scores look like over time, where the pipeline fails most often

15-20 min → 0: Manual work per CV eliminated
4 stages + failover: AI pipeline with Claude + Gemini
100% audited: Every CV logged with status and quality score
