Product / Parse & Extract

Extract Every Debt Tranche from Any Bankruptcy Filing

Upload a disclosure statement or plan of reorganization. TrancheLab parses the PDF, classifies sections, filters noise, and returns a structured capital table with confidence scores on every field. Minutes, not hours.

TrancheLab extraction pipeline: PDF inputs flowing through parsing, classification, extraction, confidence scoring, and deduplication to structured output
297 → ~40: pages filtered before any LLM call
< 10 min: average extraction time
0.0 to 1.0: confidence score on every field
3 formats: PDF, JSON, and CSV output

How TrancheLab Extract Works

Step 1:

Upload your filing

Drop a disclosure statement, plan of reorganization, or DIP order. TrancheLab accepts any PDF, including scanned documents. No preprocessing required.

Docs: Supported Filing Types
Upload interface showing drag-and-drop zone with hertz_disclosure_statement_297p.pdf uploaded

Step 2:

PDF parsing with OCR fallback

TrancheLab runs a three-stage parsing chain: PyMuPDF for native text, pdfminer for layout-sensitive extraction, and Tesseract OCR as a fallback for scanned pages. Bad scans do not break the pipeline. Pages with fewer than 80 characters of extracted text automatically trigger OCR.

Docs: PDF Parsing Chain
Three-stage parsing chain: PyMuPDF to pdfminer to Tesseract OCR with decision points
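The fallback logic above can be sketched in a few lines. This is a minimal illustration of the decision rule only; the function and parameter names are hypothetical stand-ins for the PyMuPDF, pdfminer, and Tesseract stages:

```python
OCR_THRESHOLD = 80  # pages yielding fewer extracted characters trigger OCR


def choose_text(native_text, layout_text, run_ocr):
    """Pick the first extraction stage that yields enough text.

    native_text and layout_text are the outputs of the text-based stages
    (PyMuPDF and pdfminer in the described chain); run_ocr is a callable
    so the expensive OCR pass only executes when both stages come up short.
    """
    for text in (native_text, layout_text):
        if len(text.strip()) >= OCR_THRESHOLD:
            return text
    return run_ocr()  # assume a scanned page


# A blank native layer plus a blank layout pass falls through to OCR:
page_text = choose_text("", "   ", lambda: "OCR-recovered page text")
```

Passing OCR as a callable keeps bad scans from breaking the pipeline while avoiding needless OCR on clean pages.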

Step 3:

Section classification and pre-filter

Before any LLM call, a deterministic classifier scans every page and tags it: capital structure, classification of claims, recovery analysis, risk factors, legal boilerplate. Only relevant pages pass through. A 297-page filing typically reduces to approximately 40 pages.

Docs: Section Classifier
297 pages filtered down to approximately 40 relevant pages by section classifier
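A deterministic keyword pre-filter like the one described can be sketched as follows. The keyword lists and tag names here are illustrative assumptions, not the shipped classifier's rules:

```python
# Illustrative page tags and trigger phrases (assumed, simplified).
SECTION_KEYWORDS = {
    "capital_structure": ("capital structure", "prepetition indebtedness"),
    "claims_classification": ("classification of claims", "class of claims"),
    "recovery_analysis": ("estimated recovery", "liquidation analysis"),
    "risk_factors": ("risk factors",),
    "boilerplate": ("disclaimer", "forward-looking statements"),
}
RELEVANT = {"capital_structure", "claims_classification", "recovery_analysis"}


def classify_page(text):
    """Tag a page by the first matching keyword group, else 'other'."""
    lowered = text.lower()
    for tag, keywords in SECTION_KEYWORDS.items():
        if any(k in lowered for k in keywords):
            return tag
    return "other"


def filter_pages(pages):
    """Keep only pages tagged with a section relevant to extraction."""
    return [p for p in pages if classify_page(p) in RELEVANT]
```

Because this pre-filter is pure string matching, it runs in milliseconds and never sends a boilerplate page to the LLM.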

Step 4:

Tranche extraction with confidence scoring

The extraction pipeline identifies every debt tranche and pulls face amounts, outstanding balances, interest rates, maturity dates, seniority rankings, and recovery estimates where disclosed. Every extracted value gets a deterministic confidence score from 0.0 to 1.0. If a value has a raw text excerpt backing it, confidence reflects match quality. If it does not, confidence is forced to 0.

Docs: Confidence Calibration
Extracted tranche table showing debt classes with amounts, rates, maturities, seniority, and confidence scores
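The scoring rule above (no backing excerpt means zero) can be illustrated with a small sketch. The similarity formula here is an assumption using Python's `difflib`; only the zero-when-unbacked rule comes from the description:

```python
from difflib import SequenceMatcher


def confidence(value, excerpt):
    """Score an extracted value against its backing excerpt.

    No excerpt: confidence is forced to 0.0 rather than guessed.
    Exact substring match: 1.0. Otherwise a similarity ratio stands
    in for the production match-quality metric (an assumption).
    """
    if not excerpt:
        return 0.0
    if str(value) in excerpt:
        return 1.0
    return round(SequenceMatcher(None, str(value), excerpt).ratio(), 2)
```

The key property is that an inferred value can never outscore a quoted one.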

Step 5:

Fuzzy deduplication

Levenshtein matching groups tranches that appear under different names across sections or plan amendments. 'First Lien Notes,' 'Existing First Lien Facility,' and 'Prepetition First Lien Credit Agreement' resolve to one entry instead of three. Amounts must be within 5% to merge. When both amounts are missing, name similarity must exceed 90%.

Docs: Deduplication Engine
Before and after deduplication: 8 entries with duplicates merged into 4 clean tranches via Levenshtein fuzzy matching
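The two merge rules stated above (amounts within 5%, or name similarity above 90% when both amounts are missing) can be sketched as a pairwise predicate. `difflib`'s ratio stands in for the Levenshtein similarity used in production, and the looser name check when amounts agree is an assumption:

```python
from difflib import SequenceMatcher

AMOUNT_TOLERANCE = 0.05       # amounts must agree within 5% to merge
NAME_SIMILARITY_FLOOR = 0.90  # required when neither tranche has an amount


def same_tranche(a, b):
    """Decide whether two (name, amount) pairs describe one instrument.

    amount may be None when the filing does not disclose it.
    """
    name_sim = SequenceMatcher(None, a[0].lower(), b[0].lower()).ratio()
    amt_a, amt_b = a[1], b[1]
    if amt_a is None and amt_b is None:
        return name_sim > NAME_SIMILARITY_FLOOR
    if amt_a is None or amt_b is None:
        return False  # sketch: require evidence on both sides
    amounts_close = abs(amt_a - amt_b) / max(amt_a, amt_b) <= AMOUNT_TOLERANCE
    return amounts_close and name_sim > 0.5  # assumed looser name check
```

Running this predicate over all pairs and merging the matches collapses 'First Lien Notes' and its aliases into a single entry.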

Step 6:

Structured output

Results export as a sortable data table in the UI, downloadable JSON, or CSV. Every field links back to the source text excerpt from the filing. Click any row to see the exact sentence the value was extracted from, the page number, and the confidence breakdown.

Docs: API Reference
Structured output table with download options for JSON and CSV
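A record in the JSON export might look like the sketch below. All field names and values here are illustrative assumptions, not the documented schema; only the ingredients (amounts, rates, maturities, seniority, per-field confidence, and a linked source excerpt with page number) come from the description above:

```python
import json

# Hypothetical shape of one exported tranche record.
tranche = {
    "name": "First Lien Notes",
    "face_amount_usd": 1_250_000_000,
    "interest_rate": "7.625%",
    "maturity": "2027-06-01",
    "seniority": "senior secured",
    "confidence": {"face_amount_usd": 0.97, "maturity": 0.0},
    "source": {"page": 143, "excerpt": "aggregate principal amount of $1,250,000,000"},
}

print(json.dumps(tranche, indent=2))
```

A `confidence` of 0.0 on `maturity` signals a value with no backing excerpt, so a reviewer knows exactly which fields to verify by hand.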

Extraction pipeline architecture

Your filing goes through a deterministic pre-filter before any LLM call. Extraction runs on the filtered pages only.

1. PDF Upload
2. Parse Chain: PyMuPDF → pdfminer → Tesseract OCR
3. Section Classifier: deterministic
4. LLM Extraction: Groq / llama-3.3-70b (only ~40 of 297 pages reach this stage)
5. Confidence + Dedup: scored 0.0 to 1.0, with Levenshtein fuzzy matching on tranche names
6. Structured Output: table, JSON, or CSV

Supported filing types

Upload any bankruptcy document. More filing types added based on demand.

Disclosure Statements

Plans of Reorganization

DIP Orders

RSA Exhibits

First Day Declarations

Amended Plans

Bar Date Motions

Cash Collateral Orders

Liquidating Plans

TrancheLab vs. doing it yourself

Compare TrancheLab to manual analyst work and existing terminal subscriptions.

| Capability | TrancheLab | Manual (Analyst) | Terminal Subscription |
| --- | --- | --- | --- |
| Time to structured output | < 10 minutes | 4 to 6 hours | Varies (if available) |
| Confidence scoring | 0.0 to 1.0 per field | Analyst judgment | Not offered |
| OCR for scanned filings | Automatic fallback | Manual retype | Depends on vendor |
| Deduplication across amendments | Automatic (Levenshtein) | Manual cross-reference | Not offered |
| Source text excerpts | Linked per value | Analyst notes | Sometimes |
| Cost | API call | Analyst hourly rate | $30K to $50K/year |
| Coverage | Any Chapter 11 filing | Any Chapter 11 filing | Curated universe only |

FAQ

Do I need to preprocess my PDFs before uploading?

No. TrancheLab runs a three-stage parsing chain: it tries PyMuPDF first for native text extraction, falls back to pdfminer for layout-sensitive documents, and uses Tesseract OCR as a final fallback for scanned pages.

What does a confidence score of 0 mean?

A confidence score of 0 means TrancheLab could not find a raw text excerpt in the filing to back the extracted value. This can happen when a value is inferred from context rather than stated explicitly. Rather than guess, TrancheLab flags it.

What happens when the same tranche appears under different names?

The deduplication engine uses Levenshtein fuzzy matching to group tranches that appear under slightly different names across sections or plan amendments. You get one clean entry per tranche, not three near-duplicates.

Can I compare two versions of a plan?

Yes. The diff engine accepts two filings and highlights changes in tranche definitions, recovery estimates, and creditor class treatments between versions.

What filing types does TrancheLab support?

Any Chapter 11 bankruptcy filing in PDF format: disclosure statements, plans of reorganization, DIP motions, RSA exhibits, first day declarations, and amended plans.

Can TrancheLab monitor a case for new filings?

Yes. Subscribe to a case via the API and TrancheLab will notify you when new docket entries appear via CourtListener integration.

Start Extracting Capital Structure Data

See how TrancheLab turns hundreds of pages into a structured tranche table with confidence scores, in minutes.

Book a Demo