Anthropic Application Startups Solutions Architect, Applied AI

building a production
rag assistant

a sovereign HR assistant for French public sector teams

EF Edouard Foussier · AI Engineer

the challenge

context and constraints

15,000+ pages

HR regulations scattered across 4 sources: Légifrance API, service-public.gouv.fr XML, ministry PDFs, Excel spreadsheets.

300+ HR managers

Each spends 4+ hours per week searching dense legal documentation. They need instant, reliable answers.

sovereignty first

No external APIs. All inference through Albert API — France's sovereign cloud (SecNumCloud certified).

Building an assistant to help French public sector HR teams navigate complex employment regulations.

Goal: Turn hours of searching into seconds of answers.

basic rag architecture

initial baseline pipeline (v1)

Query (user question) → Embedding (query embedding) → Retrieval (union of 4 tables) → Reranking (BGE reranker) → Generation (Albert API)
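The v1 flow can be sketched end to end. Everything below is illustrative: the Albert embedding call, pgvector search, and BGE reranker are replaced with stubs, and all function names are assumptions, not the production code.

```python
# Minimal sketch of the v1 pipeline; every external call is stubbed.
from dataclasses import dataclass

@dataclass
class Chunk:
    table: str
    text: str
    score: float = 0.0

def embed(text):
    # placeholder for the Albert API embedding endpoint
    return [float(ord(c)) for c in text[:8]]

def retrieve(query_vec, tables):
    # union of top-k hits from the 4 source tables (stubbed)
    return [Chunk(t, f"doc from {t}") for t in tables]

def rerank(query, chunks):
    # BGE-style cross-encoder reranking (stubbed: keeps input order)
    for rank, c in enumerate(chunks):
        c.score = 1.0 / (rank + 1)
    return sorted(chunks, key=lambda c: -c.score)

def generate(query, chunks):
    sources = ", ".join(c.table for c in chunks)
    return f"Answer to {query!r} citing: {sources}"

def answer(query):
    tables = ["legifrance", "service_public", "ministry_pdf", "excel"]
    chunks = rerank(query, retrieve(embed(query), tables))
    return generate(query, chunks[:5])
```

The point is the shape of the pipeline, not the stubs: one linear pass, with the union of all four tables flowing into a single reranker.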

100% Sovereign Stack

  • Albert API for LLM, embeddings, reranking
  • On-premise PostgreSQL + pgvector
  • No data leaves French infrastructure

Built-in Traceability

  • Every answer cites source documents
  • Clickable links to original legal texts
  • Full audit trail for compliance

real-life sprint

4-hour session with 13 HR managers: 508 questions asked, and I analyzed every failure

User satisfaction from 347 evaluations

52% Happy (4-5★)
22% Neutral (3★)
26% Failed (1-2★)

Error Taxonomy (92 failures)

48% Missing info in KB
11% Wrong chunks
11% Bad synthesis
9% Unclear question
6% Hallucination

Diagnosis (grouped)

Retrieval issues 59%
Generation issues 33%
59% of failures came from retrieval. Instead of blindly tuning prompts, I rebuilt the retrieval pipeline from scratch. This taxonomy became my roadmap.

key improvements

3 architectural changes that moved the needle

Semantic Chunking

Structure-aware parser preserving legal article boundaries.

1 article = 1 chunk

Markdown formatting preserves document hierarchy.
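Article-boundary chunking can be sketched as a lookahead split on article headings. The regex below is an illustrative guess at the heading pattern, not the production parser:

```python
import re

# One legal article = one chunk: split on "Article L…/R…/D…" headings
# rather than fixed-size windows, so no article is cut mid-sentence.
ARTICLE_RE = re.compile(r"(?=^Article\s+[LRD]?\s?\d[\d.-]*)", re.MULTILINE)

def chunk_by_article(text: str) -> list[str]:
    parts = [p.strip() for p in ARTICLE_RE.split(text)]
    return [p for p in parts if p]
```

Because the pattern is a zero-width lookahead, each chunk keeps its own heading, which the pipeline can later render as a Markdown title.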

Parallel Retrieval

4 concurrent searches across different sources.

Parameters tuned per source

All sources fairly represented.
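Per-source fan-out is a natural fit for asyncio. The sketch below stubs the pgvector query, and the per-source parameters are invented for illustration (the slide only says parameters are tuned per source):

```python
import asyncio

# Per-source tuning: each table gets its own top_k so large corpora
# (Légifrance) don't drown out small ones (ministry PDFs).
SOURCES = {
    "legifrance":     {"top_k": 8},
    "service_public": {"top_k": 5},
    "ministry_pdf":   {"top_k": 5},
    "excel":          {"top_k": 2},
}

async def search_one(table, query, params):
    # placeholder for a pgvector similarity query against one table
    await asyncio.sleep(0)
    return [f"{table}-{i}" for i in range(params["top_k"])]

async def parallel_retrieve(query):
    results = await asyncio.gather(
        *(search_one(t, query, p) for t, p in SOURCES.items())
    )
    return [chunk for hits in results for chunk in hits]
```

The four searches run concurrently and their results are concatenated, so every source is guaranteed a seat at the table before selection.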

LLM Selector

Dedicated agent with priority rules filters chunks.

20 → 2.5 chunks on average

Reasoning exposed → explainable selections.

Trade-off: adds processing time, but improves precision and user trust.

llm selector in action

a dedicated filtering agent beats self-filtering

input: 20 chunks from 4 tables

Source Hierarchy

[MINISTRY] → P1 Ministry PDFs
[GOV_PORTAL] → P2 Gov website
[LEGAL] → P3 Legal texts
Rules:
1. Prefer practical guides
2. Eliminate off-topic docs
3. Order by relevance
4. Keep ~5 docs max
output: structured json
{
  "selected_ordered": ["MINISTRY-3", "MINISTRY-1", "GOV_PORTAL-2"],
  "dropped": ["LEGAL-1", "LEGAL-2", "GOV_PORTAL-1", ...],
  "primary_source": "ministry",
  "reasoning": "Question about leave policy → ministry guides sufficient. Legal texts too generic."
}
Result: LLM Selector reduces context from 20 chunks → 2.5 curated chunks on average.
Generator receives only high-quality, relevant information.
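Because the selector returns structured JSON, its output can be validated before it reaches the generator. `parse_selection` and the candidate-id check below are assumptions about how one might guard against a malformed answer, not the production code:

```python
import json

def parse_selection(raw: str, candidate_ids: set[str]) -> list[str]:
    """Validate the selector's JSON so a malformed or hallucinated
    chunk ID can never reach the generation step."""
    data = json.loads(raw)
    selected = [c for c in data.get("selected_ordered", []) if c in candidate_ids]
    return selected[:5]  # rule 4: keep ~5 docs max
```

IDs the selector invents are silently dropped, and the ordering it chose is preserved for the generator's context window.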

other useful ideas

additional modules in the system

Query Processor

Acronym expansion
CDD → Fixed-term contract

Handles French HR jargon.
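Acronym expansion can be as simple as a lookup table applied before embedding. The table below is a tiny invented excerpt; the real mapping covers French public-sector HR jargon:

```python
import re

# Illustrative excerpt of the acronym table (assumed entries).
ACRONYMS = {
    "CDD": "CDD (contrat à durée déterminée, fixed-term contract)",
    "CDI": "CDI (contrat à durée indéterminée, permanent contract)",
    "RTT": "RTT (réduction du temps de travail)",
}

def expand_acronyms(query: str) -> str:
    # Expand known all-caps tokens in place, keeping the original acronym
    # so both forms are available to embedding and retrieval.
    def sub(m):
        return ACRONYMS.get(m.group(0), m.group(0))
    return re.sub(r"\b[A-Z]{2,}\b", sub, query)
```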

Intent Gating

Filters out-of-scope questions
before retrieval even starts.

Saves compute, avoids hallucination.
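The gating idea can be illustrated with a trivial keyword check. In a production system this would more plausibly be an LLM classifier call via the Albert API; the hint list below is purely illustrative:

```python
# Assumed in-scope keywords for illustration only; the real gate
# would be an LLM intent classifier, not keyword matching.
IN_SCOPE_HINTS = ("congé", "contrat", "salaire", "mutation", "retraite")

def in_scope(query: str) -> bool:
    # Decide before retrieval: out-of-scope questions never hit the
    # vector store, saving compute and avoiding confident nonsense.
    q = query.lower()
    return any(hint in q for hint in IN_SCOPE_HINTS)
```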

Context Expansion

Legal reference extraction
Detects "Article L3141-1" in chunks.

Auto-fetches full text from law API.

These modules address edge cases and improve robustness.

new rag architecture

runtime pipeline (v2): query to answer

Query Processor (acronym + intent) → Parallel Retrieval (4 tables) → LLM Selector (20 → 2.5 chunks) → Context Expansion (legal references) → Generation (Albert API)
4 new modules tackle the 59% retrieval failures.
Same sovereign stack. Same traceability. Better answers.
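The v2 runtime path above reduces to a short orchestrator. Every stage below is a stub standing in for a real service (Albert API, PostgreSQL/pgvector); only the ordering is the point:

```python
# Illustrative v2 orchestrator; each stage is a placeholder for the
# real module described on the preceding slides.
def process_query(q):      return q.upper()                       # acronyms + intent
def retrieve_parallel(q):  return [f"chunk-{i}" for i in range(20)]  # 4 tables
def select_chunks(cs):     return cs[:3]                          # LLM selector
def expand_context(cs):    return cs                              # legal refs
def generate(q, cs):       return f"{q}: {len(cs)} chunks"        # Albert API

def answer_v2(query):
    q = process_query(query)
    chunks = select_chunks(retrieve_parallel(q))
    return generate(q, expand_context(chunks))
```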

metrics measured

V1 vs V2 comparison with RAGAS

Metric             | V1 Baseline | V2 Production | Improvement
Faithfulness       | 0.45        | 0.79          | +78%
Answer Relevancy   | 0.53        | 0.81          | +53%
Context Relevancy  | 0.83        | 0.89          | +7%
Global Score       | 0.60        | 0.83          | +38%
Evaluation framework: RAGAS (industry-standard)
Test set: 347 real user questions from November 2025 sprint
Key insight: the biggest gain is in Faithfulness (+78%); the LLM Selector sharply reduces hallucinations

next step

beta testing in production

Started January 8, 2026 · 1 month

Beta Test with 70+ Users

HR managers across French public administration are now using V2 in production

Goal: Measure user satisfaction on V2 architecture to complete our analysis.
Comparing real-world feedback against RAGAS metrics to validate improvements.

what I'd bring to Anthropic

as a Startups Solutions Architect, Applied AI

  • Founder DNA
    Co-founded a web agency and closed CAC40 clients (Carrefour, EY). I speak founder language because I've been there.
  • Builder Credibility
    Demo > Deck: won a beta.gouv mission by shipping a working prototype, not a proposal.
  • Win Technical Evaluations
    Global Score 0.60 → 0.83 (+38%). I prove value with data, I don't promise it.
  • Understand Startup Velocity
    7 years at Le Wagon (Paris GM). I know founders need to ship fast, iterate faster.
Founder empathy. Technical depth. Ready to help startups build on Claude.

thank you

I measure, diagnose, build, iterate. And I communicate.
Let's do this together at Anthropic.

EF Edouard Foussier · AI Engineer