Anthropic Application Startups Solutions Architect, Applied AI

building a production
rag assistant

a sovereign HR assistant for French public sector teams

EF Edouard Foussier · AI Engineer

the challenge

context and constraints

15,000+ pages

HR regulations scattered across 4 sources: Légifrance API, service-public.gouv.fr XML, ministry PDFs, Excel spreadsheets.

300+ HR managers

Each spends 4+ hours per week searching dense legal documentation. They need instant, reliable answers.

sovereignty first

No external APIs. All inference through Albert API — France's sovereign cloud (SecNumCloud certified).

Building an assistant to help French public sector HR teams navigate complex employment regulations.

Goal: Turn hours of searching into seconds of answers.

basic rag architecture

initial baseline pipeline (v1)

Query (user question) → Embedding (query embedding) → Retrieval (union of 4 tables) → Reranking (BGE reranker) → Generation (Albert API)
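The v1 flow can be sketched end to end. Everything below is illustrative: the Albert embedding call, pgvector search, and BGE reranker are replaced with stubs, and all function names are assumptions, not the production code.

```python
# Minimal sketch of the v1 pipeline; every external call is stubbed.
from dataclasses import dataclass

@dataclass
class Chunk:
    table: str
    text: str
    score: float = 0.0

def embed(text):
    # placeholder for the Albert API embedding endpoint
    return [float(ord(c)) for c in text[:8]]

def retrieve(query_vec, tables):
    # union of top-k hits from the 4 source tables (stubbed)
    return [Chunk(t, f"doc from {t}") for t in tables]

def rerank(query, chunks):
    # BGE-style cross-encoder reranking (stubbed: keeps input order)
    for rank, c in enumerate(chunks):
        c.score = 1.0 / (rank + 1)
    return sorted(chunks, key=lambda c: -c.score)

def generate(query, chunks):
    sources = ", ".join(c.table for c in chunks)
    return f"Answer to {query!r} citing: {sources}"

def answer(query):
    tables = ["legifrance", "service_public", "ministry_pdf", "excel"]
    chunks = rerank(query, retrieve(embed(query), tables))
    return generate(query, chunks[:5])
```

The point is the shape of the pipeline, not the stubs: one linear pass, with the union of all four tables flowing into a single reranker.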

100% Sovereign Stack

  • Albert API for LLM, embeddings, reranking
  • On-premise PostgreSQL + pgvector
  • No data leaves French infrastructure

Built-in Traceability

  • Every answer cites source documents
  • Clickable links to original legal texts
  • Full audit trail for compliance

real-life sprint

4-hour session with 13 HR managers: 508 questions asked, and I analyzed every failure

User satisfaction from 347 evaluations

52% Happy (4-5★)
22% Neutral (3★)
26% Failed (1-2★)

Error Taxonomy (92 failures)

48% Missing info in KB
11% Wrong chunks
11% Bad synthesis
9% Unclear question
6% Hallucination

Diagnosis (grouped)

Retrieval issues 59%
Generation issues 33%
59% of failures came from retrieval. Instead of blindly tuning prompts, I rebuilt the retrieval pipeline from scratch. This taxonomy became my roadmap.

key improvements

3 architectural changes that moved the needle

Semantic Chunking

Structure-aware parser preserving legal article boundaries.

1 article = 1 chunk

Markdown formatting preserves document hierarchy.
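Article-boundary chunking can be sketched as a lookahead split on article headings. The regex below is an illustrative guess at the heading pattern, not the production parser:

```python
import re

# One legal article = one chunk: split on "Article L…/R…/D…" headings
# rather than fixed-size windows, so no article is cut mid-sentence.
ARTICLE_RE = re.compile(r"(?=^Article\s+[LRD]?\s?\d[\d.-]*)", re.MULTILINE)

def chunk_by_article(text: str) -> list[str]:
    parts = [p.strip() for p in ARTICLE_RE.split(text)]
    return [p for p in parts if p]
```

Because the pattern is a zero-width lookahead, each chunk keeps its own heading, which the pipeline can later render as a Markdown title.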

Parallel Retrieval

4 concurrent searches across different sources.

Parameters tuned per source

All sources fairly represented.
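Per-source fan-out is a natural fit for asyncio. The sketch below stubs the pgvector query, and the per-source parameters are invented for illustration (the slide only says parameters are tuned per source):

```python
import asyncio

# Per-source tuning: each table gets its own top_k so large corpora
# (Légifrance) don't drown out small ones (ministry PDFs).
SOURCES = {
    "legifrance":     {"top_k": 8},
    "service_public": {"top_k": 5},
    "ministry_pdf":   {"top_k": 5},
    "excel":          {"top_k": 2},
}

async def search_one(table, query, params):
    # placeholder for a pgvector similarity query against one table
    await asyncio.sleep(0)
    return [f"{table}-{i}" for i in range(params["top_k"])]

async def parallel_retrieve(query):
    results = await asyncio.gather(
        *(search_one(t, query, p) for t, p in SOURCES.items())
    )
    return [chunk for hits in results for chunk in hits]
```

The four searches run concurrently and their results are concatenated, so every source is guaranteed a seat at the table before selection.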

LLM Selector

Dedicated agent with priority rules filters chunks.

20 → 2.5 chunks on average

Reasoning exposed → explainable selections.

Trade-off: adds processing time, but improves precision and user trust.

llm selector in action

a dedicated filtering agent beats self-filtering

input: 20 chunks from 4 tables

Source Hierarchy

[MINISTRY] → P1 Ministry PDFs
[GOV_PORTAL] → P2 Gov website
[LEGAL] → P3 Legal texts
Rules:
1. Prefer practical guides
2. Eliminate off-topic docs
3. Order by relevance
4. Keep ~5 docs max
output: structured json
{
  "selected_ordered": ["MINISTRY-3", "MINISTRY-1", "GOV_PORTAL-2"],
  "dropped": ["LEGAL-1", "LEGAL-2", "GOV_PORTAL-1", ...],
  "primary_source": "ministry",
  "reasoning": "Question about leave policy → ministry guides sufficient. Legal texts too generic."
}
Result: LLM Selector reduces context from 20 chunks → 2.5 curated chunks on average.
Generator receives only high-quality, relevant information.
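Because the selector returns structured JSON, its output can be validated before it reaches the generator. `parse_selection` and the candidate-id check below are assumptions about how one might guard against a malformed answer, not the production code:

```python
import json

def parse_selection(raw: str, candidate_ids: set[str]) -> list[str]:
    """Validate the selector's JSON so a malformed or hallucinated
    chunk ID can never reach the generation step."""
    data = json.loads(raw)
    selected = [c for c in data.get("selected_ordered", []) if c in candidate_ids]
    return selected[:5]  # rule 4: keep ~5 docs max
```

IDs the selector invents are silently dropped, and the ordering it chose is preserved for the generator's context window.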

other useful ideas

additional modules in the system

Query Processor

Acronym expansion
CDD → Fixed-term contract

Handles French HR jargon.
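Acronym expansion can be as simple as a lookup table applied before embedding. The table below is a tiny invented excerpt; the real mapping covers French public-sector HR jargon:

```python
import re

# Illustrative excerpt of the acronym table (assumed entries).
ACRONYMS = {
    "CDD": "CDD (contrat à durée déterminée, fixed-term contract)",
    "CDI": "CDI (contrat à durée indéterminée, permanent contract)",
    "RTT": "RTT (réduction du temps de travail)",
}

def expand_acronyms(query: str) -> str:
    # Expand known all-caps tokens in place, keeping the original acronym
    # so both forms are available to embedding and retrieval.
    def sub(m):
        return ACRONYMS.get(m.group(0), m.group(0))
    return re.sub(r"\b[A-Z]{2,}\b", sub, query)
```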

Intent Gating

Filters out-of-scope questions
before retrieval even starts.

Saves compute, avoids hallucination.
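The gating idea can be illustrated with a trivial keyword check. In a production system this would more plausibly be an LLM classifier call via the Albert API; the hint list below is purely illustrative:

```python
# Assumed in-scope keywords for illustration only; the real gate
# would be an LLM intent classifier, not keyword matching.
IN_SCOPE_HINTS = ("congé", "contrat", "salaire", "mutation", "retraite")

def in_scope(query: str) -> bool:
    # Decide before retrieval: out-of-scope questions never hit the
    # vector store, saving compute and avoiding confident nonsense.
    q = query.lower()
    return any(hint in q for hint in IN_SCOPE_HINTS)
```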

Context Expansion

Legal reference extraction
Detects "Article L3141-1" in chunks.

Auto-fetches full text from law API.

These modules address edge cases and improve robustness.

new rag architecture

runtime pipeline (v2): query to answer

Query Processor (acronym + intent) → Parallel Retrieval (4 tables) → LLM Selector (20 → 2.5 chunks) → Context Expansion (legal references) → Generation (Albert API)
4 new modules tackle the 59% retrieval failures.
Same sovereign stack. Same traceability. Better answers.
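The v2 runtime path above reduces to a short orchestrator. Every stage below is a stub standing in for a real service (Albert API, PostgreSQL/pgvector); only the ordering is the point:

```python
# Illustrative v2 orchestrator; each stage is a placeholder for the
# real module described on the preceding slides.
def process_query(q):      return q.upper()                       # acronyms + intent
def retrieve_parallel(q):  return [f"chunk-{i}" for i in range(20)]  # 4 tables
def select_chunks(cs):     return cs[:3]                          # LLM selector
def expand_context(cs):    return cs                              # legal refs
def generate(q, cs):       return f"{q}: {len(cs)} chunks"        # Albert API

def answer_v2(query):
    q = process_query(query)
    chunks = select_chunks(retrieve_parallel(q))
    return generate(q, expand_context(chunks))
```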

metrics measured

V1 vs V2 comparison with RAGAS

Metric             | V1 Baseline | V2 Production | Improvement
Faithfulness       | 0.45        | 0.79          | +78%
Answer Relevancy   | 0.53        | 0.81          | +53%
Context Relevancy  | 0.83        | 0.89          | +7%
Global Score       | 0.60        | 0.83          | +38%
Evaluation framework: RAGAS (industry-standard)
Test set: 347 real user questions from November 2025 sprint
Key insight: the biggest gain is in Faithfulness (+78%); the LLM Selector sharply reduces hallucinations

next step

beta testing in production

Started January 8, 2026 · 1 month

Beta Test with 70+ Users

HR managers across French public administration are now using V2 in production

Goal: Measure user satisfaction on V2 architecture to complete our analysis.
Comparing real-world feedback against RAGAS metrics to validate improvements.

what I'd bring to Anthropic

as a Startups Solutions Architect, Applied AI

  • Founder DNA
    Co-founded a web agency and closed CAC40 clients (Carrefour, EY). I speak founder language because I've been there.
  • Builder Credibility
    Demo > Deck: won a beta.gouv mission by shipping a working prototype, not a proposal.
  • Win Technical Evaluations
    Global Score 0.60 → 0.83 (+38%). I prove value with data, I don't promise it.
  • Understand Startup Velocity
    7 years at Le Wagon (Paris GM). I know founders need to ship fast, iterate faster.
Founder empathy. Technical depth. Ready to help startups build on Claude.

thank you

I measure, diagnose, build, iterate. And I communicate.
Let's do this together at Anthropic.

EF Edouard Foussier · AI Engineer