Government data / e-filing · 2025 · LEARN Québec · CRA Portal
CRA e-filing portal — Excel-to-XML pipeline with XSD validation, PDF prefill, and multi-year support
An internal T4/T4A e-filing preparation portal for LEARN Québec — converts payroll Excel workbooks into CRA-compliant XML, validates the output against 217 official XSD schemas offline, prefills CRA fillable PDF slips with a JSON-mapped field engine, and supports T4 and T4A across 2025 and 2026 submission years.
The problem
Preparing CRA T4/T4A XML submissions was a manual, error-prone process with no systematic validation against CRA schema rules and no way to pre-check workbooks before generation. The team needed a governed portal that caught formatting and compliance errors before submission.
Approach
- Excel workbook parsing pipeline (openpyxl) — reads T619 transmitter data, employer summary, and individual slip sheets from structured XLSX files; preflight AJAX validation identifies missing sheets, keys, and columns before generation runs
- lxml-based programmatic XML construction with dotted-path column mapping (e.g. `T4A_AMT.pens_spran_amt` → nested elements), automatic 2-decimal amount formatting, NR4 validation, and CRA-required country code normalization
- XSD validation against 217 official CRA schema files (T4, T4A, T5, T5008, T5013, T4RIF, T4RSP, T4FHSA, T4E, and 12+ other return types) with a custom local schema resolver — fully offline, no network calls
- PDF prefill engine (pypdf) — fills official CRA fillable PDFs from generated XML using JSON field mapping files; supports multi-field mapping, per-slip index selection, all-slips merge mode, and configurable field redaction
- Multi-year, multi-mode operation — T4 and T4A for 2025 and 2026, with mode-aware sheet detection, per-year schema routing, and official Excel template downloads per year and mode
- Schema documentation and fields guide — scrapes and caches CRA documentation pages (24-hour TTL) to display every schema field with occurrence rules (required, optional, repeating), human-readable labels, and presence status against the uploaded workbook
- File management with UUID-based tracking — lists the 10 most recent files with timestamps, per-file and bulk deletion, and persistent validation sidecars that survive page refresh
- Session-signed authentication with HMAC-SHA256, bootstrap admin via environment variables for first-run setup, and XXE protection via disabled XML entity resolution
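As an illustration of the dotted-path column mapping and 2-decimal amount formatting described in the list above, here is a minimal sketch. It uses the standard library's `xml.etree.ElementTree` as a stand-in for lxml so that it runs without dependencies. The helper names (`format_amount`, `set_by_path`, `build_slip`), the `Slip` root element, the `example_*` columns, and the half-up rounding mode are all assumptions made for the example, not the portal's actual code. Only `T4A_AMT.pens_spran_amt` comes from the description.

```python
from decimal import Decimal, ROUND_HALF_UP
import xml.etree.ElementTree as ET


def format_amount(value) -> str:
    """Render a numeric cell value with exactly two decimal places."""
    # Converting through str() keeps binary float noise out of the Decimal.
    # Half-up rounding is an assumption; the source only says "2-decimal".
    return str(Decimal(str(value)).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP))


def set_by_path(parent: ET.Element, dotted_path: str, text: str) -> ET.Element:
    """Walk a dotted path under parent, creating missing elements, then set the leaf text."""
    node = parent
    for tag in dotted_path.split("."):
        child = node.find(tag)
        if child is None:
            # Reuse an existing container if present, otherwise create it.
            child = ET.SubElement(node, tag)
        node = child
    node.text = text
    return node


def build_slip(row: dict, amount_columns: set) -> ET.Element:
    """Turn one worksheet row (column header -> cell value) into a nested element."""
    slip = ET.Element("Slip")  # hypothetical root name
    for column, value in row.items():
        if value is None:
            continue  # empty cells produce no element
        text = format_amount(value) if column in amount_columns else str(value)
        set_by_path(slip, column, text)
    return slip


row = {
    "T4A_AMT.pens_spran_amt": 1234.5,
    "T4A_AMT.example_amt": 200,   # hypothetical column
    "example_note": "demo",       # hypothetical column
}
slip = build_slip(row, amount_columns={"T4A_AMT.pens_spran_amt", "T4A_AMT.example_amt"})
print(ET.tostring(slip, encoding="unicode"))
```

Both amount columns land under a single shared `T4A_AMT` container because `set_by_path` looks for an existing child before creating one. A real pipeline would also need to emit elements in the order the XSD requires, which this sketch does not attempt.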
Outcome
- T4/T4A XML generation and XSD validation reduced from a manual multi-step process to a single workbook upload, with preflight checks flagging missing sheets, keys, and columns before generation runs
- Offline schema validation against 217 CRA files means the portal catches formatting, required-field, and structural errors without network access or manual cross-referencing
- PDF prefill eliminates manual data re-entry into CRA slip forms — generated XML maps directly to fillable PDF fields in one operation
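The preflight check that gates generation can be sketched roughly as follows. This is a simplified, dependency-free illustration: it takes a plain mapping of sheet name to header row rather than an open workbook, and the sheet names, column names, and `PreflightReport` structure are hypothetical, chosen only to show the shape of the check.

```python
from dataclasses import dataclass, field

# Hypothetical layout: required sheet name -> required column headers.
REQUIRED_LAYOUT = {
    "T619": ["transmitter_name"],
    "Summary": ["employer_name", "tax_year"],
    "Slips": ["T4A_AMT.pens_spran_amt"],
}


@dataclass
class PreflightReport:
    missing_sheets: list = field(default_factory=list)
    missing_columns: dict = field(default_factory=dict)

    @property
    def ok(self) -> bool:
        """True only when no required sheet or column is missing."""
        return not self.missing_sheets and not self.missing_columns


def preflight(headers_by_sheet: dict, required: dict = REQUIRED_LAYOUT) -> PreflightReport:
    """Compare the uploaded workbook's sheets and headers with the required layout."""
    report = PreflightReport()
    for sheet, columns in required.items():
        if sheet not in headers_by_sheet:
            report.missing_sheets.append(sheet)
            continue
        # Ignore empty header cells and surrounding whitespace.
        present = {str(h).strip() for h in headers_by_sheet[sheet] if h is not None}
        absent = [c for c in columns if c not in present]
        if absent:
            report.missing_columns[sheet] = absent
    return report


uploaded = {
    "Summary": ["employer_name"],
    "Slips": ["T4A_AMT.pens_spran_amt", None],
}
report = preflight(uploaded)
print(report.ok, report.missing_sheets, report.missing_columns)
```

With openpyxl, the input mapping could plausibly be built from each worksheet's title and first row. Returning a structured report instead of raising on the first problem lets the page list every missing sheet and column in a single pass.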