DEEP VISA LABS · METHODOLOGY
Methodology
Every number on the site comes from a DOL, USCIS, travel.state.gov, or IRS publication. This page documents which dataset, which formula, what cadence, and which edge cases we surface or hide.
Principles
- Federal sources first. If a question can be answered by a DOL, USCIS, travel.state.gov, or IRS publication, that is the answer we publish. Lawyer-blog summaries and forum threads are not substitutes.
- Reproducible math. Every computed number on the site corresponds to a deterministic ETL pipeline reading from a versioned federal dataset. The same inputs always produce the same outputs.
- Transparent cadence. Each page carries a "Last verified: YYYY-MM-DD" stamp synced with the ETL build that produced its numbers. If the source's publication cadence has elapsed without an update, we flag it.
- Edge cases surfaced, not smoothed. When BLS top-codes a wage or USCIS withholds a priority date, we report it as a structural data limitation rather than imputing a value.
- No predictions about individual cases. We compute distributions, medians, and queue lengths from federal data. We do not estimate the approval probability of any specific case.
1. Prevailing wage (DOL OFLC)
Source: DOL OFLC quarterly Prevailing Wage Determinations CSV plus quarterly LCA disclosure CSV from dol.gov/agencies/eta/foreign-labor/performance and the H-1B / PERM disclosure portal at flag.dol.gov. Releases are quarterly and land in January, April, July, and October.
Pipeline: etl/fetch_dol_oflc.py downloads the quarterly CSVs, validates row counts against DOL's published case-disclosure totals, normalizes SOC codes (legacy SOC 2010 → SOC 2018), and writes per-SOC × per-MSA × per-wage-level lookups to a SQLite database.
Wage-level mapping: Levels 1-4 map to BLS OEWS percentiles 17 / 34 / 50 / 67 per the H-1B Visa Reform Act of 2004 — applied uniformly across SOC codes. Where the OEWS sample size for an MSA × SOC cell is below the BLS reliability threshold, DOL publishes a NULL wage and we surface that NULL rather than imputing.
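As an illustration of the mapping above, here is a minimal sketch of how a single SOC × MSA × wage-level cell might be rendered so that a NULL from DOL stays visible. The function name and the output wording are hypothetical, not taken from the production pipeline.

```python
from typing import Optional

# Percentile each DOL wage level corresponds to, per the mapping above.
LEVEL_TO_PERCENTILE = {1: 17, 2: 34, 3: 50, 4: 67}


def describe_wage(level: int, wage: Optional[float]) -> str:
    """Render one cell, surfacing a NULL wage instead of imputing a value.

    Hypothetical helper: the name and message text are assumptions.
    """
    if level not in LEVEL_TO_PERCENTILE:
        raise ValueError(f"unknown wage level: {level}")
    pct = LEVEL_TO_PERCENTILE[level]
    if wage is None:
        # DOL published no wage for this cell; say so rather than estimate.
        return f"Level {level} (P{pct}): not published by DOL"
    return f"Level {level} (P{pct}): ${wage:,.0f}"
```

Keeping the NULL branch explicit means a missing cell can never be mistaken for a zero or silently replaced with a neighboring MSA's figure.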
Top-coding: BLS top-codes high-wage SOC codes (executive, physician sub-categories) at the OEWS 90th-percentile reporting cap. Pages displaying P90 or P95 wages explicitly note the top-coded ceiling rather than implying a known value.
2. PERM processing time (DOL OFLC PERM disclosure)
Source: DOL OFLC PERM disclosure CSV (case received date and final determination date for every adjudicated case in the quarter). The DOL FLAG dashboard at flag.dol.gov/processingtimes publishes the headline processing-time numbers monthly; we recompute from the underlying disclosure CSV for finer cuts (audited vs unaudited, BALCA appeals, supervised recruitment).
Computation: running median of (final determination date − case received date) by determination month for the 12 most recent published months. We separate audited from unaudited cases via the case-status field and report distributions for each.
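A simplified sketch of that computation follows. It groups decided cases by determination month and audit status and takes a plain per-month median; the tuple layout and function name are assumptions, and the real disclosure CSV uses its own column names.

```python
from collections import defaultdict
from datetime import date
from statistics import median


def median_days_by_month(cases, months=12):
    """cases: iterable of (received, determined, audited) tuples.

    Returns {(year, month, audited): median days}, limited to the most
    recent `months` determination months present in the data.
    The input layout is an assumption made for illustration.
    """
    buckets = defaultdict(list)
    for received, determined, audited in cases:
        key = (determined.year, determined.month, audited)
        buckets[key].append((determined - received).days)
    # Keep only the latest N determination months, whatever their audit status.
    recent = sorted({key[:2] for key in buckets})[-months:]
    return {key: median(days) for key, days in buckets.items() if key[:2] in recent}
```

Audited and unaudited cases land in separate buckets because the audit flag is part of the key, so one slow audited case cannot shift the unaudited median.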
Known limitation: the PERM disclosure CSV does not include cases still pending at the publication date. Because only decided cases appear, the published median for any filing cohort is biased toward that cohort's faster-adjudicated cases. We surface this caveat on every PERM page.
3. Priority dates (travel.state.gov visa bulletin)
Source: travel.state.gov monthly visa bulletin HTML. Cadence is monthly (typically published mid-month for the following month).
Pipeline: etl/fetch_visa_bulletin.py scrapes the "Final Action Dates" and "Dates for Filing" tables for each preference category × chargeability area, validates against the prior month's bulletin (no nonsensical century-scale jumps), and writes per-category trend lines back to the earliest digitized bulletin (FY1990 onward).
Retrogression handling: when a cut-off moves backward, we report the new cut-off and the magnitude of retrogression. We do not predict future retrogression. The only authoritative forward-looking guidance comes from the State Department itself; its "Chats with Charlie" sessions with Charles Oppenheim served that role until his retirement at the end of 2021.
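The cell handling in this section can be sketched as follows: each bulletin cell is parsed into one of three states (C, U, or a dated cut-off), and retrogression is measured only between two dated cut-offs. The class and function names are hypothetical, and the DDMMMYY cell format and the two-digit-year pivot are assumptions about the scraped tables.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

MONTHS = {name: i for i, name in enumerate(
    ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
     "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"], start=1)}


@dataclass(frozen=True)
class Cutoff:
    status: str                    # "current", "unavailable", or "dated"
    cutoff: Optional[date] = None  # set only when status == "dated"


def parse_cell(text: str) -> Cutoff:
    """Parse one bulletin cell, keeping C and U as distinct states."""
    cell = text.strip().upper()
    if cell == "C":
        return Cutoff("current")
    if cell == "U":
        return Cutoff("unavailable")
    # Assumed cell format: DDMMMYY, for example "01JAN22".
    day, month, yy = int(cell[:2]), MONTHS[cell[2:5]], int(cell[5:7])
    # Assumption: two-digit years above 68 belong to the 1900s.
    year = 1900 + yy if yy > 68 else 2000 + yy
    return Cutoff("dated", date(year, month, day))


def retrogression_days(prev: Cutoff, new: Cutoff) -> Optional[int]:
    """Days the cut-off moved backward; None if it did not, or if either
    month is C or U and no date arithmetic is possible."""
    if prev.status != "dated" or new.status != "dated":
        return None
    delta = (prev.cutoff - new.cutoff).days
    return delta if delta > 0 else None
```

Returning None for C and U cells keeps those months out of the magnitude calculation instead of forcing them onto a date scale.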
"Current" vs "Unavailable": "C" (current) means visa numbers are available; "U" (unavailable) means none are available regardless of priority date. We surface both cases distinctly rather than collapsing into a single "open / closed" binary.
4. USCIS processing times
Source: egov.uscis.gov/processing-times/ per-form per-service-center matrices. Synced monthly.
Reading convention: USCIS publishes the 80th-percentile processing time, meaning the time within which 80% of cases were adjudicated. We do not interpolate, smooth, or report a median. We show only the published 80th percentile, because that is the metric USCIS uses for its public service standards.
Per-form, per-service-center: each I-129, I-140, I-485, I-539, I-765, I-907 row carries a service-center column (CSC, NSC, TSC, VSC, NBC, Potomac). When USCIS reroutes filings between service centers — as happens periodically — we annotate the page rather than treating the new center's number as historically continuous.
5. Substantial Presence Test (IRS Pub 519)
Source: IRS Publication 519: U.S. Tax Guide for Aliens, current edition.
Computation: exactly as published in Pub 519, chapter 1. Count all days present in the current year + ⅓ of days in the prior year + ⅙ of days in the year before that. The test is met when that total reaches at least 183 days and the current year alone accounts for at least 31 days.
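A minimal sketch of the arithmetic, assuming the day counts passed in already exclude any days that do not count toward presence (for example, days as an exempt individual). The function name is hypothetical. The test requires a weighted total of at least 183 days over the three years in addition to the 31-day current-year minimum.

```python
from fractions import Fraction


def meets_substantial_presence(current: int, prior1: int, prior2: int) -> bool:
    """Apply the Pub 519 day-count formula.

    current: countable days present in the current year
    prior1:  countable days present in the prior year
    prior2:  countable days present in the year before that
    """
    if current < 31:
        # The 31-day current-year minimum fails regardless of prior years.
        return False
    # Fractions keep the 1/3 and 1/6 weights exact at the 183-day boundary.
    weighted = current + Fraction(prior1, 3) + Fraction(prior2, 6)
    return weighted >= 183
```

For instance, 120 days in each of the three years gives 120 + 40 + 20 = 180, which falls short of 183, while 122 days in each year gives exactly 183 and meets the test.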
Exempt individuals: we surface F / J / M / Q exempt-individual rules (5 calendar years for F-1 / J-1 students; 2 of 6 years for J-1 teachers / Q-1 trainees) per the Pub 519 schedule. The First-Year Choice election under IRC §7701(b)(4) and the Closer Connection exception under §7701(b)(3)(B) are also documented.
Federal vs state: the Substantial Presence Test governs federal residency only. State residency is determined separately by each state's domicile and statutory-residency rules — we cite the federal test, then explicitly note the state-test divergence.
6. LCA wage data (DOL FLAG H-1B disclosure)
Source: DOL FLAG H-1B / H-1B1 / E-3 disclosure CSV — every certified Labor Condition Application with employer, SOC code, worksite, wage offered, and wage level.
What we compute: per-employer wage distributions (P10, P25, P50, P75, P90), wage-level rollups by occupation, and worksite-level prevailing-wage benchmarks. We treat the LCA wage as the legal floor — the binding minimum the employer attested to pay.
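The per-employer distribution can be sketched with the standard library alone. This is an illustration under stated assumptions: the (employer, annual_wage) row layout is hypothetical, wages are taken to be already annualized, and the inclusive interpolation method is one reasonable choice rather than a documented one.

```python
from collections import defaultdict
from statistics import quantiles

PERCENTILES = (10, 25, 50, 75, 90)


def employer_wage_percentiles(rows):
    """rows: iterable of (employer, annual_wage) pairs.

    Returns {employer: {"P10": ..., "P25": ..., ...}}, or None for an
    employer with fewer than two filings, where no distribution exists.
    """
    by_employer = defaultdict(list)
    for employer, wage in rows:
        by_employer[employer].append(wage)

    result = {}
    for employer, wages in by_employer.items():
        if len(wages) < 2:
            result[employer] = None  # surfaced as missing, not estimated
            continue
        # quantiles(n=100) returns 99 cut points; index p - 1 is the p-th percentile.
        cuts = quantiles(wages, n=100, method="inclusive")
        result[employer] = {f"P{p}": cuts[p - 1] for p in PERCENTILES}
    return result
```

Single-filing employers return None instead of a degenerate distribution, in line with the no-imputation rule stated elsewhere on this page.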
What we don't compute: "actual paid wages." LCA wages are commitments, not realized compensation. We do not blend LCA data with self-reported sources (Glassdoor, levels.fyi) — selection bias in voluntary salary platforms is severe and undocumented.
Validation
- Row-count parity: after each ETL run, total ingested rows must match the DOL / USCIS published row count for the period. Failures block the build.
- Trend continuity: month-on-month changes that exceed historical variance bounds (3σ) are flagged for manual review before publishing.
- Source URL HEAD checks: every cited source URL is HEAD-checked at build time. 404s block the build.
- SOC code mapping: wherever the federal source still uses legacy SOC 2010 codes, the SOC 2010 → SOC 2018 transition for those rows is documented in the ETL log.
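As one example of these checks, the trend-continuity rule could look roughly like the sketch below. It compares the newest month-on-month change against the mean and standard deviation of past changes; the function name, the choice of sample standard deviation, and the behavior with short histories are all assumptions.

```python
from statistics import mean, stdev


def needs_review(history, new_value, sigmas=3.0):
    """history: past published values, oldest first.

    Returns True when the change from history[-1] to new_value lies more
    than `sigmas` standard deviations from the mean historical change.
    """
    changes = [later - earlier for earlier, later in zip(history, history[1:])]
    if len(changes) < 2:
        # Too little history to set bounds, so default to manual review.
        return True
    mu, sd = mean(changes), stdev(changes)
    latest = new_value - history[-1]
    if sd == 0:
        # A perfectly steady series: any different change is out of bounds.
        return latest != mu
    return abs(latest - mu) > sigmas * sd
```

A flagged value is not necessarily wrong, since federal series do occasionally jump; the flag only routes it to a person before it is published.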
What we don't do
- We don't predict approval probability for individual cases.
- We don't aggregate self-reported H-1B salary data — we cite DOL LCA filings, which are the legal floor.
- We don't combine government data with lawyer-blog estimates without naming the lawyer-blog source.
- We don't run scaled-content matrices over state × SOC × employer for SEO purposes — every page corresponds to a federal-data question with a Semrush-validated search query.
- We don't impute missing values. NULLs in the federal source are surfaced as NULL, not estimated.
Last sync timestamps
Each data-driven page surfaces a "Last verified: YYYY-MM-DD" stamp synced with the ETL build that produced its numbers. The stamp reflects the date the underlying federal source was last successfully ingested, not just the page's last edit. If a stamp is older than the source's published cadence, we flag it explicitly with a "Pending refresh — federal source publishes [cadence]" notice.
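The staleness rule can be sketched as a comparison between the last successful ingest date and the source's cadence. The cadence table and function name below are hypothetical, and the day thresholds are illustrative rather than the production values.

```python
from datetime import date, timedelta

# Hypothetical cadence windows, in days, for each publication rhythm.
CADENCE_DAYS = {"monthly": 31, "quarterly": 92}


def render_stamp(last_ingested: date, cadence: str, today: date):
    """Return (stamp_text, is_stale) for a page.

    is_stale is True when more time has passed since the last successful
    ingest than the source's publication cadence allows.
    """
    stamp_text = f"Last verified: {last_ingested.isoformat()}"
    is_stale = today - last_ingested > timedelta(days=CADENCE_DAYS[cadence])
    return stamp_text, is_stale
```

The stale flag is what would trigger the pending-refresh notice; the stamp itself always shows the ingest date, never the page's edit date.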
Corrections policy
If a number on a page disagrees with the underlying DOL / USCIS / IRS source, we treat the source as authoritative and update the page within 5 business days of a confirmed report. See Corrections for the full procedure and Editorial Policy for the broader review framework.