Magento Association
Developer Documentation Project

Developer Docs Infrastructure

How we built a three-project pipeline — parser, generation engine, and HTML builder — to produce structured developer docs for Magento's 344 core modules.

344 core modules
9 doc types each
3,096 total documents
TypeScript · LangChain · Qdrant · Handlebars · Tailwind CSS · Alpine.js
Presenter: Carl Simpson
Date: March 2026
Evolution

How We Built It

From David Lambauer's merchant RAG foundation to a full developer documentation pipeline.

Stage 1 David Lambauer
merchant-mastra
David built the original Mastra AI agent with RAG retrieval for generating merchant documentation — the store owner guides, getting-started content, and task-oriented how-tos. This was the foundation: a TypeScript RAG pipeline that could query Magento knowledge and produce structured Markdown.
Stage 2 Carl Simpson
html-docs-transformer
The merchant docs were raw Markdown, with no styling and no structure. I built the HTML docs transformer to retroactively convert every .md file into production HTML. It applies 7 semantic transformations, including callout boxes, step cards, breadcrumb chips and an auto-generated table of contents, on top of a Bauhaus-inspired design system.
Stage 3 Carl Simpson
developer-mastra
Took over the RAG pipeline and refined it. Built developer-mastra — a new engine focused on developer documentation. Added 5-source RAG retrieval, Handlebars templates for consistent structure, and a 3-agent review loop (Reviewer → Qualifier → Updater) that iterates until quality hits 9.5/10. Integrated the HTML transformer directly into the generation pipeline.
Stage 4 Current
the-core + Learning Paths
Built the-core — a source-level parser that maps Magento's entire codebase. XML parsers extract every plugin, observer, and dependency across all 344 modules. This feeds the RAG pipeline with real source data instead of general knowledge. Added learning paths using the Divio framework, with certification tracks planned next.
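Here is a rough illustration of the kind of extraction the-core performs. This minimal sketch pulls plugin declarations out of a `di.xml` string and observer declarations out of an `events.xml` string. It is a deliberate simplification built on several assumptions:

- It uses regular expressions instead of a real XML parser.
- It only handles simple, well-formed tags.
- The helper names (`attr`, `collect`, `extractPlugins`, `extractObservers`) are hypothetical.
- The `Vendor\Module` classes in the sample are hypothetical.

None of this is the-core's actual API.

```typescript
// Hypothetical sketch: regex-based extraction of Magento plugin and observer
// declarations. A production parser should use a real XML parser instead.

interface PluginDecl {
  targetType: string; // class or interface being intercepted (<type name="...">)
  pluginName: string; // <plugin name="...">
  pluginType: string; // plugin class (<plugin type="...">)
}

interface ObserverDecl {
  eventName: string;    // <event name="...">
  observerName: string; // <observer name="...">
  instance: string;     // observer class (<observer instance="...">)
}

// Read one attribute value from a tag's attribute string ("" if absent).
function attr(attrs: string, name: string): string {
  const m = new RegExp(`\\b${name}="([^"]*)"`).exec(attrs);
  return m ? m[1] : "";
}

// Collect every `child` tag inside each non-self-closing `parent` block,
// returning [parent attributes, child attributes] pairs.
function collect(xml: string, parent: string, child: string): Array<[string, string]> {
  const pairs: Array<[string, string]> = [];
  const parentRe = new RegExp(`<${parent}\\s+([^>]*[^/>])>([\\s\\S]*?)</${parent}>`, "g");
  let p: RegExpExecArray | null;
  while ((p = parentRe.exec(xml)) !== null) {
    const childRe = new RegExp(`<${child}\\s+([^>]*?)/?>`, "g");
    let c: RegExpExecArray | null;
    while ((c = childRe.exec(p[2])) !== null) {
      pairs.push([p[1], c[1]]);
    }
  }
  return pairs;
}

function extractPlugins(diXml: string): PluginDecl[] {
  return collect(diXml, "type", "plugin").map(([typeAttrs, pluginAttrs]) => ({
    targetType: attr(typeAttrs, "name"),
    pluginName: attr(pluginAttrs, "name"),
    pluginType: attr(pluginAttrs, "type"),
  }));
}

function extractObservers(eventsXml: string): ObserverDecl[] {
  return collect(eventsXml, "event", "observer").map(([eventAttrs, observerAttrs]) => ({
    eventName: attr(eventAttrs, "name"),
    observerName: attr(observerAttrs, "name"),
    instance: attr(observerAttrs, "instance"),
  }));
}

// Usage with hypothetical Vendor_Module declarations.
const diXml = `
<config>
  <type name="Magento\\Catalog\\Api\\ProductRepositoryInterface">
    <plugin name="vendor_product_repo" type="Vendor\\Module\\Plugin\\ProductRepositoryPlugin" sortOrder="10"/>
  </type>
</config>`;

const eventsXml = `
<config>
  <event name="sales_order_place_after">
    <observer name="vendor_order_placed" instance="Vendor\\Module\\Observer\\OrderPlaced"/>
  </event>
</config>`;

console.log(extractPlugins(diXml));
console.log(extractObservers(eventsXml));
```

Running records like these across every module's `di.xml` and `events.xml` yields the plugin and observer inventory that grounds the generated docs.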
developer-mastra

The Engine: How It Works

Generation Pipeline
1. the-core parses all XML → module graphs
2. Module Scanner discovers 344 modules + API surface
3. Gap Analysis diffs existing docs vs taxonomy
4. RAG retrieves from 5 sources (listed below)
5. LLM + Handlebars template → Markdown
6. 3-Agent Review Loop → 9.5/10 quality gate
7. html-docs-transformer → production HTML
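Gap analysis (step 3) can be pictured as a set difference. Every (module, doc type) pair in the taxonomy that has no existing document counts as a gap. The sketch below rests on two assumptions:

- `existing` is a hypothetical map from a module name to the doc-type slugs already written for it.
- The slugs mirror the nine per-module doc types, but the identifiers themselves are illustrative.

```typescript
// Illustrative slugs for the nine per-module doc types (names are assumptions).
const DOC_TYPES: string[] = [
  "readme",
  "architecture",
  "execution-flows",
  "plugins-observers",
  "integrations",
  "anti-patterns",
  "known-issues",
  "version-compatibility",
  "performance",
];

interface Gap {
  moduleName: string;
  docType: string;
}

// Return every (module, doc type) pair from the taxonomy that has no document yet.
function findGaps(modules: string[], existing: Record<string, string[]>): Gap[] {
  const gaps: Gap[] = [];
  for (const moduleName of modules) {
    const have = existing[moduleName] || [];
    for (const docType of DOC_TYPES) {
      if (have.indexOf(docType) === -1) {
        gaps.push({ moduleName, docType });
      }
    }
  }
  return gaps;
}

// Usage: Catalog is fully documented, Sales only has a README.
const gaps = findGaps(["Magento_Catalog", "Magento_Sales"], {
  Magento_Catalog: DOC_TYPES.slice(),
  Magento_Sales: ["readme"],
});
console.log(gaps.length); // 8 missing Sales docs
```

With no existing docs, the same function returns 344 × 9 = 3,096 gaps, which matches the total document count.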
5-Source RAG Retrieval
Qdrant vector store (semantic similarity)
Local docs from the-core (keyword match)
Magento source code (di.xml, Api/, Model/)
Module dependency graphs (JSON)
External Magento docs (Context7 MCP)
Results are ranked, deduplicated and given a source-diversity bonus. The top 5 chunks are sent to the LLM.
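The ranking step can be sketched as deduplication followed by a greedy top-k pick. During the pick, a chunk earns a bonus if its source is not yet represented. This is an assumed scheme, not developer-mastra's actual code. The following are all illustrative choices:

- the `Chunk` shape
- the 0..1 score scale
- text-based deduplication
- the default bonus of `0.1`

```typescript
type SourceName = "qdrant" | "local-docs" | "source-code" | "dep-graph" | "context7";

interface Chunk {
  id: string;
  source: SourceName;
  text: string;
  score: number; // relevance, assumed normalised to 0..1 across sources
}

// Dedupe, then greedily pick k chunks, adding a bonus to chunks whose source
// has not been picked yet so a single source cannot crowd out the rest.
function selectChunks(candidates: Chunk[], k = 5, diversityBonus = 0.1): Chunk[] {
  // 1. Deduplicate on whitespace/case-normalised text, keeping the higher score.
  const byText: Record<string, Chunk> = {};
  for (const c of candidates) {
    const key = c.text.trim().toLowerCase().replace(/\s+/g, " ");
    if (!byText[key] || byText[key].score < c.score) byText[key] = c;
  }
  const pool = Object.keys(byText).map((key) => byText[key]);

  // 2. Greedy selection with a bonus for each source not yet picked.
  const selected: Chunk[] = [];
  const seenSources: Record<string, boolean> = {};
  while (selected.length < k && pool.length > 0) {
    let bestIndex = 0;
    let bestValue = -Infinity;
    for (let i = 0; i < pool.length; i++) {
      const bonus = seenSources[pool[i].source] ? 0 : diversityBonus;
      if (pool[i].score + bonus > bestValue) {
        bestValue = pool[i].score + bonus;
        bestIndex = i;
      }
    }
    const picked = pool.splice(bestIndex, 1)[0];
    selected.push(picked);
    seenSources[picked.source] = true;
  }
  return selected;
}

// Usage: "b" is a near-duplicate of "a", and "d" comes from a second source.
const cands: Chunk[] = [
  { id: "a", source: "qdrant", text: "Plugins wrap public methods.", score: 0.9 },
  { id: "b", source: "qdrant", text: "plugins wrap  public methods.", score: 0.8 },
  { id: "c", source: "qdrant", text: "Around plugins can skip the original call.", score: 0.85 },
  { id: "d", source: "source-code", text: '<plugin name="vendor_repo"/>', score: 0.8 },
];
console.log(selectChunks(cands, 2).map((c) => c.id)); // ["a", "d"]
```

Without the bonus, the second slot would go to another Qdrant chunk ("c"). Greedy selection keeps the logic easy to audit. Maximal marginal relevance over embeddings would be a natural alternative.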
3-Agent Review Loop
REVIEWER: Scores each draft on 5 dimensions (accuracy, completeness, clarity, developer value, structure).
QUALIFIER: Filters out vague or incorrect feedback. Only actionable, technically sound revisions pass.
UPDATER: Applies the qualified revisions and rolls back if quality drops. Loops until the draft reaches 9.5/10.
Quality Gate: 9.5/10
Grounded in source code: the LLM sees the actual XML configs rather than relying on general knowledge, which leaves far less room for hallucination.
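The review loop can be sketched as a bounded loop with rollback. The sketch makes three assumptions:

- The three agents are injected as plain synchronous functions. In practice they would be asynchronous LLM calls.
- The overall score is the unweighted mean of the five dimensions.
- `maxRounds` is a hypothetical safety cap that the slide does not mention.

```typescript
type Dimension = "accuracy" | "completeness" | "clarity" | "developerValue" | "structure";

interface Review {
  scores: Record<Dimension, number>; // each dimension scored 0..10
  suggestions: string[];
}

// Hypothetical interfaces standing in for the Reviewer, Qualifier and Updater agents.
interface Agents {
  review(draft: string): Review;
  qualify(draft: string, suggestions: string[]): string[]; // keeps only actionable feedback
  update(draft: string, revisions: string[]): string;      // applies qualified revisions
}

// Assumed aggregation: unweighted mean of the five dimensions.
function overall(r: Review): number {
  const s = r.scores;
  return (s.accuracy + s.completeness + s.clarity + s.developerValue + s.structure) / 5;
}

function reviewLoop(draft: string, agents: Agents, gate = 9.5, maxRounds = 5) {
  let current = draft;
  let review = agents.review(current);
  let rounds = 0;
  while (overall(review) < gate && rounds < maxRounds) {
    rounds++;
    const revisions = agents.qualify(current, review.suggestions);
    if (revisions.length === 0) break; // nothing actionable left to apply
    const candidate = agents.update(current, revisions);
    const candidateReview = agents.review(candidate);
    // Rollback: if the revision scored lower, keep the previous draft.
    if (overall(candidateReview) < overall(review)) continue;
    current = candidate;
    review = candidateReview;
  }
  return { draft: current, score: overall(review), rounds };
}

// Usage with fake agents: each applied revision appends "+" and adds 0.5 to every dimension.
function uniform(v: number): Record<Dimension, number> {
  return { accuracy: v, completeness: v, clarity: v, developerValue: v, structure: v };
}

const fakeAgents: Agents = {
  review: (d) => ({
    scores: uniform(8 + 0.5 * (d.match(/\+/g) || []).length),
    suggestions: ["vague: make it better", "add a di.xml example"],
  }),
  qualify: (_d, suggestions) => suggestions.filter((s) => s.indexOf("vague") !== 0),
  update: (d) => d + "+",
};

console.log(reviewLoop("draft", fakeAgents)); // draft "draft+++", score 9.5, after 3 rounds
```

The Qualifier step is what stops the loop from churning on unusable feedback. If nothing actionable survives, the loop exits early instead of rewriting for its own sake.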
Output Architecture

9 Doc Types + HTML Transformer

Per-Module Documentation Set
01 README & Overview
02 Architecture — schema, contracts, deps
03 Execution Flows — operation sequences
04 Plugins & Observers
05 Integrations — cross-module APIs
06 Anti-Patterns — mistakes + fixes
07 Known Issues & Workarounds
08 Version Compatibility
09 Performance Optimisation
html-docs-transformer
Markdown → Production HTML
Takes raw AI-generated Markdown and transforms it into styled, accessible HTML using Chris's Bauhaus design system.
• Callout boxes (Note, Tip, Warning, Caution)
• Numbered step cards with visual badges
• UI path breadcrumbs (System > Tools > Cache)
• Auto-generated table of contents
• Syntax-highlighted code blocks
• Idempotent — safe to run multiple times
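Two of these transformations, callout boxes and UI path breadcrumbs, can be sketched as line-level rewrites. The following details are hypothetical and not necessarily what html-docs-transformer uses:

- the Markdown conventions assumed here: `> **Note:** ...` blockquotes and bold paths such as `**System > Tools > Cache**`
- the CSS class names

The sketch is idempotent because each rewrite consumes the Markdown pattern it matches, so a second pass finds nothing to change.

```typescript
// Hypothetical input conventions:
//   callout:  "> **Note:** text"  (also Tip, Warning, Caution)
//   UI path:  "**System > Tools > Cache**"
const CALLOUT_RE = /^> \*\*(Note|Tip|Warning|Caution):\*\*\s*(.*)$/;
const UI_PATH_RE = /\*\*([^*]+?(?: > [^*]+?)+)\*\*/g;

// Turn a callout blockquote line into a styled <div>; other lines pass through.
function transformCallout(line: string): string {
  const m = CALLOUT_RE.exec(line);
  if (!m) return line;
  return `<div class="callout callout-${m[1].toLowerCase()}" role="note"><strong>${m[1]}</strong> ${m[2]}</div>`;
}

// Turn each bold "A > B > C" path into a row of breadcrumb chips.
function transformUiPaths(line: string): string {
  return line.replace(UI_PATH_RE, (_match: string, path: string) => {
    const chips = path.split(" > ").map((part) => `<span class="chip">${part.trim()}</span>`);
    return `<span class="ui-path">${chips.join('<span aria-hidden="true"> &rsaquo; </span>')}</span>`;
  });
}

// Apply callouts first, then UI paths (which may sit inside a callout).
function transformMarkdown(markdown: string): string {
  return markdown
    .split("\n")
    .map((line) => transformUiPaths(transformCallout(line)))
    .join("\n");
}

// Usage
const input = "> **Warning:** Flush caches via **System > Tools > Cache Management**.";
const once = transformMarkdown(input);
console.log(once);
console.log(transformMarkdown(once) === once); // true: a second pass changes nothing
```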
Design System
Bauhaus principles: angular geometry, a Magento orange/gold/charcoal palette, WCAG AA accessibility and no build step. Tailwind CSS (via CDN) + Alpine.js.
[Image: Before / After comparison]
Learning Paths + Output

What's Live Today

Module Reference Docs (the-core)
10 Core Modules
Catalog (2,120 components), Sales (861), Customer (499), Checkout, Quote, Payment, Shipping, Store, EAV, ConfigurableProduct
90 Module Docs
4,860+ Components
160+ Total Docs
Developer Guides
6 Tutorials, 11 How-Tos, 21 Advanced Guides
Docker, Declarative Schema, Plugins, Service Contracts, GraphQL, B2B, Indexers, Message Queues, and more
Divio Documentation Framework
Every document is classified into one of four types — designed for different developer needs:
TUTORIAL: Learning-oriented. Step-by-step, hands-on building exercises.
HOW-TO: Task-oriented. Solve a specific problem quickly.
EXPLANATION: Understanding-oriented. Why things work the way they do.
REFERENCE: Information-oriented. API specs, CLI commands, config options.
5 Learning Paths
Beginner → Module fundamentals
Intermediate → Plugins, observers, DI
Advanced → Service contracts, GraphQL
Expert → B2B, multi-store, ERP
Enterprise → Architecture, scaling
carlsimpson.co.uk/magento-association/
Roadmap

Scaling to 90 Priority Modules

13 phases. 90 prioritised modules out of 344. All 9 doc types per module. We need reviewers and domain experts.

Delivery Phases (13 total)
Phase 1
Pipeline + 10 Core Modules
Parsers, templates, design system, review loop — 10 modules live
Phase 2
Content Analysis + Taxonomy
Gap mapping across all 344 modules, priority ranking
Phase 3–4
Tooling + Specialist Agents
PHP introspection, execution flow tracer, domain-specific agents
Phase 5–8
Commerce Modules
Inventory, Tax, CMS, Search, Wishlist, Review, Bundle, Downloadable...
Phase 9–13
Full Coverage + QA
Admin, API, integration modules. Community review. 90 modules total.
Progress: 10 / 90 modules (11%)
How You Can Help
Review generated docs for modules you know well
Share domain knowledge — anti-patterns, gotchas, integration details the AI can't know
Prioritise — tell us which modules matter most to your daily work
Adopt — help bring this into the official Magento Association docs
Get In Touch
The Magento community built the platform.
Let's document it — together.