    March 2026 · 6 min read

    If You Can't Trace Where a Document Came From, You Can't Prove What's Been Fixed

    Matt

    Founder, Foresera

    There's a particular problem that surfaces when organizations begin remediating documents at scale. It starts with a simple question: "Do we already have this one?"

    The honest answer, for most organizations, is that they don't know. Documents accumulate across years and departments. The same policy brief gets updated and re-uploaded under a different filename. A CMS migration strips the original URLs. A PDF that was remediated last quarter gets replaced by a newer version from the source department, and no one catches it because nothing flagged the change.

    Remediation without provenance tracking means you can do significant work and still have no reliable way to demonstrate what was fixed, whether it stayed fixed, or which version of a document the public is currently accessing.

    The hashed filename problem

    Many enterprise CMS platforms and content delivery pipelines generate hashed filenames when documents are uploaded or published. A file that an author names annual-report-2025.pdf becomes a3f8c2d1e9b74f1d.pdf at the CDN layer. This is common in Drupal installations, SharePoint integrations, and a number of civic technology platforms.

    Hashed filenames are fine for caching purposes. They're a significant problem for accessibility tracking. When you download a document for remediation, the filename tells you nothing about where the document lives on the public-facing site, which page links to it, or how it relates to other versions of the same document. Two files with identical names in a remediation batch may be completely different documents. Two files from different departments, under different names, may be the same document, in which case remediating one covers both. Without provenance, you can't tell.
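
    One way to tell the two cases apart is to compare file contents rather than filenames. The sketch below groups a batch by SHA-256 digest. The helper names, folder labels, and file bytes are all made up for illustration; a real batch would read the bytes from disk.

```python
import hashlib
from collections import defaultdict


def sha256_of(data: bytes) -> str:
    """Return the hex SHA-256 digest of a file's raw bytes."""
    return hashlib.sha256(data).hexdigest()


def group_by_content(files: dict[str, bytes]) -> dict[str, list[str]]:
    """Map each content hash to the sorted list of file labels that share it."""
    groups: dict[str, list[str]] = defaultdict(list)
    for label, data in files.items():
        groups[sha256_of(data)].append(label)
    return {digest: sorted(labels) for digest, labels in groups.items()}


# Hypothetical batch: labels and contents are invented for this example.
batch = {
    "housing/a3f8c2d1e9b74f1d.pdf": b"%PDF-1.7 rental assistance guidelines",
    "finance/77be01c4d2aa9e10.pdf": b"%PDF-1.7 rental assistance guidelines",
    "clerk/a3f8c2d1e9b74f1d.pdf": b"%PDF-1.7 council meeting minutes",
}

# Groups with more than one label are byte-identical copies.
duplicates = [labels for labels in group_by_content(batch).values() if len(labels) > 1]
```

    Here the housing and finance files end up in one group despite their different names, while the two files named a3f8c2d1e9b74f1d.pdf do not. Byte-level hashing only catches exact copies: the same document exported twice can differ in embedded metadata and will hash differently.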

    This problem compounds at scale. An organization with several thousand documents in its remediation queue that can't connect each file to its source URL is essentially working without a map. Work gets duplicated. Some documents get missed. And there's no clean answer when someone asks for documentation of what was addressed.

    What a proper audit trail looks like

    Document provenance, in the context of accessibility remediation, means connecting each document to its origin and tracking its state over time. A complete record for a single document includes:

    • Source URL — the canonical link on your public-facing site where the document is accessible
    • Page context — the breadcrumb path and page title of the page that links to it (e.g., Services > Housing > Rental Assistance > Program Guidelines)
    • Content hash — a SHA-256 fingerprint of the file at the time of discovery, used to detect when a document has changed after remediation
    • Version chain — a record of each remediation pass: what the document's accessibility score was before, what corrections were applied, and what the score became after
    • Remediation timestamp — when each version was processed, so you can demonstrate that a document was in a known state on a specific date

    Together these form the evidence layer that connects your remediation effort to real, specific documents on your public site — not just files in a folder.
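
    The record described above could be modeled roughly as follows. This is a minimal sketch, not Foresera's actual schema: the class names, field names, scoring scale, URL, and sample values are all assumptions chosen for illustration.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class RemediationPass:
    """One entry in the version chain."""
    score_before: int        # scoring scale is an assumption, not a standard
    score_after: int
    corrections: list[str]
    processed_at: datetime   # remediation timestamp
    content_hash: str        # SHA-256 of the file this pass produced


@dataclass
class ProvenanceRecord:
    source_url: str          # canonical public link
    page_title: str
    breadcrumb: list[str]    # page context, outermost level first
    discovered_hash: str     # SHA-256 at time of discovery
    version_chain: list[RemediationPass] = field(default_factory=list)

    def expected_hash(self) -> str:
        """Hash the live file should match: latest pass, else discovery."""
        if self.version_chain:
            return self.version_chain[-1].content_hash
        return self.discovered_hash


# Every value below is a made-up illustration.
original = b"%PDF-1.7 untagged export"
remediated = b"%PDF-1.7 tagged export"

record = ProvenanceRecord(
    source_url="https://www.example.gov/files/a3f8c2d1e9b74f1d.pdf",
    page_title="Program Guidelines",
    breadcrumb=["Services", "Housing", "Rental Assistance", "Program Guidelines"],
    discovered_hash=hashlib.sha256(original).hexdigest(),
)
record.version_chain.append(
    RemediationPass(
        score_before=41,
        score_after=96,
        corrections=["added tags", "set document language"],
        processed_at=datetime(2026, 3, 2, tzinfo=timezone.utc),
        content_hash=hashlib.sha256(remediated).hexdigest(),
    )
)
```

    Storing a hash on each pass, not only at discovery, means the record always knows which bytes the public site is supposed to be serving.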

    Why this matters under the DOJ Final Rule

    The DOJ's implementing regulations for Title II of the ADA (28 CFR Part 35, with an April 2026 compliance date for large entities) require covered organizations to make their digital content accessible. The regulation doesn't prescribe a documentation format, but the concept of "systematic effort" is central to how compliance is evaluated. An organization that can demonstrate a structured, traceable remediation process (here are our documents, here are the ones we've addressed, here is the version history, here is what changed) is in a fundamentally different position than one that remediated files in a folder with no record of what was done.

    This isn't about preparing for litigation. It's about having an accurate picture of your own program. Organizations that track document provenance find it much easier to answer operational questions that come up routinely: "Has this form been updated since we remediated it?" "Do we have a record of what this document looked like before?" "We changed our template — which documents were affected?" Without provenance tracking, these questions require manual investigation every time.

    The re-upload problem

    One of the most common ways remediated documents become inaccessible again is simple: someone re-uploads the original, unremediated version.

    This happens because accessibility remediation typically takes place outside the normal document workflow. A document is exported, remediated, and re-published. But the original source file (the Word document, the InDesign export, the template output) still exists in the organization's systems. When that document is updated for content reasons and re-exported, the new export goes through the normal publication process without automatically going through remediation. The remediated version is replaced by a new inaccessible version, and nothing flags the change.

    Content hashing addresses this directly. If each document has a known hash at the time of remediation, any subsequent version can be compared against it. A hash mismatch means the document has changed and needs to be re-evaluated. This makes it possible to detect re-upload events systematically, rather than discovering them by accident months later.
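
    As a rough sketch of that comparison, the function below checks the bytes currently served at each source URL against the hash recorded at remediation time. The function names, URLs, and file contents are hypothetical, and fetching the live files is left out; the example passes them in as a dictionary.

```python
import hashlib


def sha256_of(data: bytes) -> str:
    """Return the hex SHA-256 digest of a file's raw bytes."""
    return hashlib.sha256(data).hexdigest()


def find_changed(known_hashes: dict[str, str], live_files: dict[str, bytes]) -> list[str]:
    """Return source URLs whose live bytes no longer match the recorded hash."""
    changed = []
    for url, recorded in known_hashes.items():
        live = live_files.get(url)
        # A file that has disappeared is flagged too, since it also needs review.
        if live is None or sha256_of(live) != recorded:
            changed.append(url)
    return sorted(changed)


# Hypothetical URLs and contents, for illustration only.
guidelines_url = "https://www.example.gov/files/guidelines.pdf"
form_url = "https://www.example.gov/files/application-form.pdf"

remediated_guidelines = b"%PDF-1.7 tagged guidelines"
remediated_form = b"%PDF-1.7 tagged form"

known = {
    guidelines_url: sha256_of(remediated_guidelines),
    form_url: sha256_of(remediated_form),
}

# The form was re-exported from its source file and re-uploaded.
live = {
    guidelines_url: remediated_guidelines,
    form_url: b"%PDF-1.7 untagged form, 2026 revision",
}

flagged = find_changed(known, live)
```

    In this example only the application form is flagged. A mismatch does not say the new version is inaccessible, only that it is no longer the file that was remediated and needs to be looked at again.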

    Starting with a crawler

    For organizations that don't have a current inventory of their public-facing documents, the starting point is a crawl. A web crawler that traverses your public site — following links, recording the page context around each PDF it finds, capturing the breadcrumb path — produces the raw material for a provenance system. It transforms an unstructured collection of files into a map: document, source page, breadcrumb path, depth in the site hierarchy, and the links that lead to it.
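
    The per-page step of such a crawl might look something like the sketch below, which uses only Python's standard library. It takes a page's HTML as a string and returns one inventory row per PDF link. The class name, function name, URL, and sample markup are assumptions; fetching pages, following links, and extracting breadcrumbs (which depends on each site's markup) are omitted.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class PdfLinkParser(HTMLParser):
    """Collects the page title and every link whose path ends in .pdf."""

    def __init__(self, page_url: str) -> None:
        super().__init__()
        self.page_url = page_url
        self.title = ""
        self.pdf_links: list[str] = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page that contains them.
                absolute = urljoin(self.page_url, href)
                if urlparse(absolute).path.lower().endswith(".pdf"):
                    self.pdf_links.append(absolute)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def inventory_page(page_url: str, html: str) -> list[dict]:
    """Return one row per PDF link found on the page."""
    parser = PdfLinkParser(page_url)
    parser.feed(html)
    parser.close()
    return [
        {"document_url": link, "source_page": page_url, "page_title": parser.title.strip()}
        for link in parser.pdf_links
    ]


# Hypothetical page, for illustration only.
page_url = "https://www.example.gov/services/housing/rental-assistance/"
sample_html = """
<html><head><title>Program Guidelines</title></head>
<body>
  <a href="/files/a3f8c2d1e9b74f1d.pdf">Guidelines (PDF)</a>
  <a href="/services/housing">Housing</a>
  <a href="docs/Application.PDF?v=2">Application form</a>
</body></html>
"""

rows = inventory_page(page_url, sample_html)
```

    Checking the parsed URL path rather than the raw href means links with query strings or uppercase extensions are still picked up. PDFs served from URLs with no .pdf extension would need a content-type check instead.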

    This is the foundation that makes everything else tractable. With a complete inventory connected to source locations, prioritization becomes possible: which documents are most frequently accessed, which sections of the site have the highest concentration of accessibility failures, which document types account for the most issues. Without it, remediation tends to proceed in whatever order files happen to be encountered — which is rarely the order that produces the most benefit.
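
    Once each document carries its breadcrumb path, ranking sections by issue count is a small aggregation. The sketch below assumes each inventory row has a breadcrumb list and an issue count; the field names and all the numbers are invented for illustration.

```python
from collections import Counter


def issues_by_section(inventory: list[dict], depth: int = 2) -> list[tuple[str, int]]:
    """Total issue counts per site section, highest first.

    A section is the first `depth` levels of a document's breadcrumb path.
    """
    totals = Counter()
    for doc in inventory:
        section = " > ".join(doc["breadcrumb"][:depth])
        totals[section] += doc["issues"]
    return totals.most_common()


# Made-up inventory rows; real counts would come from an accessibility audit.
inventory = [
    {"breadcrumb": ["Services", "Housing", "Rental Assistance"], "issues": 14},
    {"breadcrumb": ["Services", "Housing", "Inspections"], "issues": 9},
    {"breadcrumb": ["Government", "Budget"], "issues": 20},
    {"breadcrumb": ["Services", "Permits"], "issues": 3},
]

ranking = issues_by_section(inventory)
```

    With this sample data, Services > Housing comes first with 23 issues across its two documents, ahead of the single budget document with 20. The same grouping works with access counts in place of issue counts if traffic data is available.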

    Foresera's document provenance system tracks source URLs, breadcrumb paths, content hashes, and version chains through the full remediation lifecycle. Documents that change after remediation are flagged automatically. Every version of every document has a complete record of what was done and when. For organizations managing large document inventories, this is the infrastructure that makes a remediation program sustainable rather than a one-time effort.
