MCP-Compliant Server for Parsing Loan Estimates (LE) and Closing Disclosures (CD)

Loan Estimate PDF being converted to JSON format and powering tools like chatbots, compliance agents, and dashboards
A Loan Estimate (PDF) parsed into MISMO-compliant JSON, enabling downstream mortgage tools—chatbots, compliance checks, and dashboards.

Why the Mortgage Industry Needs a Canonical Adapter for PDF-to-LLM Transformation—And How We Built It

The Problem: Inconsistent, Unstructured, and LLM-Unfriendly LE and CD Data

Every mortgage transaction starts and ends with two critical documents: the Loan Estimate (LE) and Closing Disclosure (CD). These documents hold the fees, the timelines, the APRs, the tolerances—everything a lender, fintech platform, regulator, or analytics engine needs to understand the economics and compliance of a deal. Even if you want to use AI all the way from origination to post-close data analytics, including HMDA reporting and investor delivery, you still need structured data that maps cleanly to both MISMO standards and LLM inputs.

But here’s the problem: the data in these documents is trapped in PDFs. Every time a system needs to extract information from them, whether to pre-fill underwriting systems, feed AI models, or audit compliance, its builders have to reinvent the wheel.

Some banks have in-house tools, but even they struggle: most still operate with manual steps in the loop, and even those at the 80th percentile of automation maturity still require human intervention about 20% of the time. They too could benefit from a self-healing AI layer that brings that down to roughly 2% human effort.

Others rely on vendors. Many still do this manually. Worse, every implementation parses fields differently, maps them to slightly different names, and fails silently when layouts change. Despite over a decade of effort, a large portion of the industry remains non-compliant with MISMO standards, especially in automated workflows.

That means:

  • Fintechs spend weeks reverse-engineering LE formats just to get basic fields.

  • LLMs hallucinate because inputs lack structure, field meaning, or domain nuance.

  • The same fee might show up in multiple sections, and there’s no reliable cross-validation for what should appear where. LLMs need deterministic inputs, and our structured mapping provides that certainty.

  • Compliance teams miss violations because tolerances or delivery timelines aren’t calculated.

  • No two systems speak the same mortgage language.

This is expensive, risky, and unsustainable.

The Opportunity: A Shared, Open MCP Server for LE and CD Parsing

Imagine a world where anyone—a bank, a bot, a compliance platform—could simply send a PDF URL and get back:

  • A structured JSON that matches the MISMO data standard.

  • LLM-enriched context like field explanations, summaries, and flags.

  • A portable, reviewable codebase that runs on-prem or via a hosted API.

This isn’t just convenient. It’s critical infrastructure.

We believe this canonical transformation layer should live in an open MCP (Model Context Protocol) server dedicated to LE and CD parsing. It’s not trying to replace your models. It’s not trying to own your data. It’s a standardized gateway that transforms messy PDFs into a structured, machine-ready context your systems and LLMs can reason over.

Our Solution: The LE/CD MCP Server

We’ve built an MCP server that:

  1. Parses PDFs of LEs and CDs

  2. Returns MISMO-compliant JSON (e.g., fees.origination_charges mapped to GFEOriginationCharges)

  3. Adds metadata for LLM context, including:

    • Field definitions

    • Flags (“Fee exceeds standard tolerance”)

    • Delivery timeline logic

    • Calculated APR deltas
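To make the "calculated APR deltas" item concrete, here is a minimal sketch, not the server's actual implementation. It assumes the delta is the absolute difference between the APR on the LE and the APR on the CD, in percentage points, and compares it with the Regulation Z accuracy tolerance (1/8 of a percentage point for regular transactions, 1/4 for irregular ones). The function name and return shape are hypothetical.

```python
from decimal import Decimal


def apr_delta(le_apr: str, cd_apr: str, irregular: bool = False) -> dict:
    """Compare two APRs given as percent strings, e.g. "6.500"."""
    # Decimal avoids binary floating-point noise in the subtraction.
    delta = abs(Decimal(cd_apr) - Decimal(le_apr))
    tolerance = Decimal("0.250") if irregular else Decimal("0.125")
    return {
        "APRDelta": float(delta),
        "exceeds_tolerance": delta > tolerance,
    }


result = apr_delta("6.500", "6.810")
print(result)
```

Passing the APRs as strings keeps the arithmetic exact; a caller that already holds floats would want to round before comparing.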


MCP server converting Loan Estimate PDFs into JSON with data flowing into validation engines, dashboards, and compliance systems

MVP Scope:

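| Tool                               | Input      | Output             |
|------------------------------------|------------|--------------------|
| parse_le_to_mismo_json             | pdf_url    | JSON + LLM context |
| parse_cd_to_mismo_json             | pdf_url    | JSON + LLM context |
| (Later) validate_le_cd_consistency | both JSONs | pass/fail + flags  |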
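As a rough sketch of how a client might invoke one of these tools: MCP messages are JSON-RPC 2.0, and tool invocations use the `tools/call` method with a tool name and an arguments object. The tool and argument names below come from the MVP scope above; the request id, the example URL, and the `build_tool_call` helper are placeholders, and the transport (stdio or HTTP) is left out.

```python
import json


def build_tool_call(tool_name: str, pdf_url: str, request_id: int = 1) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    request = {
        "jsonrpc": "2.0",
        "id": request_id,  # placeholder; any value unique per request works
        "method": "tools/call",
        "params": {
            "name": tool_name,
            "arguments": {"pdf_url": pdf_url},
        },
    }
    return json.dumps(request)


message = build_tool_call("parse_le_to_mismo_json", "https://example.com/le.pdf")
print(message)
```

In practice an MCP client library would build and send this message for you; the point is only that the tool surface is small and uniform.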

What Makes This Different?

We’re not trying to parse every document. We’re going deep, not wide. We specialize in LE and CD, because those two documents are the heart of every mortgage transaction.

We don’t replace your LLMs. We supercharge them with enriched, trustworthy inputs. You can use any LLM you want. You can create your own reasoning pipelines. But no matter how good your model is, it still needs clean, structured input. Rather than spending time on brittle mapping logic or letting models hallucinate, we give you standardized, MISMO-compliant JSON, built by mortgage pros.

We aren’t asking for your data. We give you the tools. You can run this on-prem. You can inspect every line of code. You can modify it to meet your regulatory, internal, or privacy standards.

This isn’t vendor lock-in. This is industry collaboration. We’re building this out in the open, for the benefit of everyone—so no lender, fintech, or compliance team ever needs to solve this problem again. We believe this should be a shared layer—like HTTP, not another proprietary tool.

Let’s work together. Help us make this the standard the industry deserves.

How We Built It

We started with best-in-class PDF parsers and fine-tuned the pipeline to extract mortgage-specific data blocks. Each block is mapped to MISMO field names and enriched with metadata.
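The mapping step might look something like the sketch below. The lookup table, the `EnrichedField` class, and the `enrich` helper are illustrative assumptions rather than the project's code. The `GFEOriginationCharges` name and the metadata keys follow the sample output in this post, while the raw key `fees.origination_charges` is an assumed parser-side name.

```python
from dataclasses import dataclass, field, asdict

# Hypothetical lookup from raw parser keys to MISMO-style field names.
RAW_TO_MISMO = {
    "fees.origination_charges": {
        "name": "GFEOriginationCharges",
        "description": "Charges by lender for originating the loan",
    },
}


@dataclass
class EnrichedField:
    value: float
    description: str
    source_location: str
    flags: list = field(default_factory=list)


def enrich(raw_key: str, value: float, source_location: str) -> dict:
    """Map one extracted value to its MISMO-style name plus metadata."""
    if raw_key not in RAW_TO_MISMO:
        # Fail loudly instead of silently dropping an unmapped field.
        raise KeyError(f"No MISMO mapping for {raw_key!r}")
    target = RAW_TO_MISMO[raw_key]
    enriched = EnrichedField(value, target["description"], source_location)
    return {target["name"]: asdict(enriched)}


mapped = enrich("fees.origination_charges", 2500, "Page 2, Section A")
print(mapped)
```

Raising on unmapped keys is a deliberate choice in this sketch: it surfaces layout changes instead of letting them fail silently.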

mortgage MCP server architecture

View the full source code on GitHub: confersolutions/mcp-mortgage-server

Why the Industry Needs It Now

Every LLM-powered mortgage assistant. Every pre-underwriting automation. Every compliance validation engine. They all need the same thing: structured, contextual data from LE and CD PDFs.

Even consumer-facing AI apps—something as simple as a chatbot that answers “What fees are listed on my LE?”—need a deterministic, reliable way to extract information from these documents without hallucinations or inconsistencies. Whether it’s fetching fee breakdowns, answering borrower questions, or performing agentic flows like summarizing key terms, the foundation has to be structured, mapped, and consistent.
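For instance, a chatbot could answer the fee question from the parsed JSON with a plain lookup instead of asking the model to read the PDF. The sketch below assumes fee entries are top-level objects carrying a `value` and a `tolerance_bucket`, as in the sample output later in this post; `list_fees` is a hypothetical helper.

```python
def list_fees(le_json: dict) -> list:
    """Return sorted (field name, amount) pairs for every fee-like entry."""
    fees = []
    for name, entry in le_json.items():
        # Treat an entry as a fee only if it has both keys (an assumption).
        if isinstance(entry, dict) and "value" in entry and "tolerance_bucket" in entry:
            fees.append((name, entry["value"]))
    return sorted(fees)


sample = {
    "GFEOriginationCharges": {"value": 2500, "tolerance_bucket": "Limited Increase"},
    "APRDelta": 0.31,
}
fees = list_fees(sample)
for name, amount in fees:
    print(f"{name}: ${amount:,.2f}")
```

Because the answer comes from a deterministic lookup, the model only has to phrase it, not extract it.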

And for decisioning engines—whether rule-based or powered by AI—the need is even greater. Lenders building systems to decide next-best actions, pricing strategies, or exception handling workflows depend on clean inputs. Without standardized mappings and validation logic, these systems are prone to error.

Not to mention compliance. Critical Control Validation (CCV) tasks, tolerance flagging, and delivery checks must someday be automated too—and they will likely be performed by AI agents. These agents will need a canonical, explainable, and trusted data format to operate at scale.
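As a rough illustration of what automated tolerance flagging involves, the sketch below checks the cumulative 10% category: the sum of CD charges in that category may not exceed the sum of the corresponding LE charges by more than 10%. The fee names and amounts are made up, and real checks also have to handle zero-tolerance charges, changed circumstances, and revised estimates, none of which are modeled here.

```python
from decimal import Decimal


def check_ten_percent_bucket(le_fees: dict, cd_fees: dict) -> dict:
    """Compare summed LE and CD charges for the 10% cumulative category."""
    le_total = sum(Decimal(str(v)) for v in le_fees.values())
    cd_total = sum(Decimal(str(v)) for v in cd_fees.values())
    limit = le_total * Decimal("1.10")  # LE total plus 10%
    return {
        "le_total": float(le_total),
        "cd_total": float(cd_total),
        "limit": float(limit),
        "status": "Pass" if cd_total <= limit else "Fail",
    }


# Made-up amounts for two charges assumed to sit in the 10% category.
le_fees = {"RecordingFee": 120, "TitleSearchFee": 500}
cd_fees = {"RecordingFee": 150, "TitleSearchFee": 540}
check = check_ten_percent_bucket(le_fees, cd_fees)
print(check)
```

The comparison is on category totals, not individual fees, so one fee can rise by more than 10% as long as the bucket as a whole stays within the limit.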

By standardizing this once and for all, we save every player in the ecosystem hundreds of engineering hours, reduce risk, and eliminate fragmentation. But more importantly, we free up innovation.

Let every team—big bank, nimble fintech, AI startup—focus on building reasoning models, rich workflows, and intelligent assistants. Let them think about how to think. And let this open-source MCP server be the dependable bridge—transparent, interpretable, and trusted—to get them the data they need to do it.


				
# Example: Parsing LE PDF to JSON
result = parse_le_to_mismo_json(pdf_url="https://example.com/le.pdf")

# Output (early structure, actively expanding for CCV, HMDA, and compliance reasoning)
{
  "GFEOriginationCharges": {
    "value": 2500,
    "description": "Charges by lender for originating the loan",
    "flags": ["Above typical range for 1% origination cap"],
    "tolerance_bucket": "Limited Increase",
    "source_location": "Page 2, Section A"
  },
  "APRDelta": 0.31,
  "DeliveryTimeline": {
    "received_by_borrower": "2024-03-01",
    "days_to_close": 12,
    "compliance_check": "Pass"
  },
  "HMDA": {
    "loan_purpose": "Home purchase",
    "property_type": "Single-family",
    "loan_term": 30
  },
  "CCVChecks": [
    {
      "check": "Origination Fee within Tolerance",
      "status": "Pass",
      "reference_rule": "12 CFR 1026.19"
    },
    {
      "check": "Delivery within Required Timeline",
      "status": "Pass",
      "reference_rule": "TRID Timing Requirements"
    }
  ]
}
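The `compliance_check` value in `DeliveryTimeline` could be derived along these lines. This is a simplified sketch: it counts every day except Sunday as a business day, ignores federal holidays, and takes the minimum waiting period as a parameter, since the required number of business days depends on the disclosure and on how it was delivered. The function names and the seven-day default are assumptions for illustration, not the server's logic.

```python
from datetime import date, timedelta


def business_days_between(start: date, end: date) -> int:
    """Count days after `start` up to and including `end`, skipping Sundays."""
    count = 0
    current = start
    while current < end:
        current += timedelta(days=1)
        if current.weekday() != 6:  # 6 == Sunday
            count += 1
    return count


def delivery_check(received: str, days_to_close: int, minimum: int = 7) -> dict:
    """Recompute the timeline check from a receipt date and days to closing."""
    received_on = date.fromisoformat(received)
    closing_on = received_on + timedelta(days=days_to_close)
    waited = business_days_between(received_on, closing_on)
    return {
        "received_by_borrower": received,
        "days_to_close": days_to_close,
        "business_days": waited,
        "compliance_check": "Pass" if waited >= minimum else "Fail",
    }


timeline = delivery_check("2024-03-01", 12)
print(timeline)
```

A production check would need a holiday calendar and the business-day definition that applies to the specific timing rule being tested.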
				
			

Self-Hosting Sample Setup

				
# Clone and run MCP server locally
git clone https://github.com/confersolutions/mcp-mortgage-server.git
cd mcp-mortgage-server
pip install -r requirements.txt
python main.py

				
			

Agent Prompting Example with LLM Integration

				
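# Example: embedding the parsed JSON and enriched metadata in an LLM prompt
import json

from openai import OpenAI  # openai>=1.0; older releases used openai.ChatCompletion.create

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# le_json is assumed to be the dict returned by parse_le_to_mismo_json
prompt = f"""
You are a mortgage compliance AI assistant.
Given the following structured LE data:
{json.dumps(le_json, indent=2)}
Identify any tolerance violations or unusual fees.
"""
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)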

				
			

Validating LE and CD Consistency (Future scope)

				
# Example: Validating consistency between LE and CD
result = validate_le_cd_consistency(le_json, cd_json)

# Output (JSON)
{
  "status": "fail",
  "issues": [
    {
      "field": "TotalClosingCosts",
      "le_value": 8234,
      "cd_value": 8910,
      "discrepancy": 676,
      "tolerance_exceeded": true
    }
  ]
}
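Since this tool is not built yet, the following is only a guess at its shape. It compares numeric fields present in both documents and reports any difference, marking increases from LE to CD above a caller-supplied threshold. The flat input dicts, the `max_increase` parameter, and its zero default are assumptions; a real implementation would apply the appropriate tolerance category per fee.

```python
def validate_le_cd_consistency(le_json: dict, cd_json: dict, max_increase: float = 0) -> dict:
    """Report numeric fields that differ, marking increases above max_increase."""
    issues = []
    for field_name in sorted(le_json.keys() & cd_json.keys()):
        le_value, cd_value = le_json[field_name], cd_json[field_name]
        if not isinstance(le_value, (int, float)) or not isinstance(cd_value, (int, float)):
            continue  # only numeric fields are compared in this sketch
        discrepancy = cd_value - le_value
        if discrepancy != 0:
            issues.append({
                "field": field_name,
                "le_value": le_value,
                "cd_value": cd_value,
                "discrepancy": discrepancy,
                "tolerance_exceeded": discrepancy > max_increase,
            })
    status = "fail" if any(i["tolerance_exceeded"] for i in issues) else "pass"
    return {"status": status, "issues": issues}


result = validate_le_cd_consistency({"TotalClosingCosts": 8234}, {"TotalClosingCosts": 8910})
print(result)
```

With the sample values above, this reproduces the example output: a 676 discrepancy on `TotalClosingCosts` and a `fail` status.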

				
			

Please help shape the future of AI-driven mortgage data infrastructure.

Join the Movement

We’re inviting banks, fintechs, AI companies, and regulators to adopt this as the industry standard. Help us refine it. Help us keep it accurate and up-to-date. Help make the mortgage stack finally LLM-ready.

Explore the GitHub repo: github.com/confersolutions/mcp-mortgage-server

Let’s stop reinventing the wheel. Let’s build the adapter layer the industry needs.