Regulation | 5 min read | 5 June 2026

IG Report Blames NIST for NVD Backlog — Severity Scores Match Only 12% of the Time

A Commerce Department watchdog formally faulted NIST for strategic failures, duplicated enrichment work, and CVSS scores so inconsistent that independent evaluators agreed with them barely one time in eight.

Priya Natarajan, Compliance & Risk Analyst

The National Institute of Standards and Technology has been formally blamed by the Commerce Department's Office of Inspector General for mismanaging the National Vulnerability Database — the federal backbone that security teams across the globe use to prioritize patching.

What the IG Found

The OIG report lands hard on three failures: poor strategic planning, an active refusal to coordinate with CISA, and enrichment workflows that have wasted an estimated $200,000 since May 2024. NIST's Acting Director Craig Burkhardt accepted the technical recommendations but disputed the report's framing, arguing the draft's language went beyond what he called "objective, factual evaluation."

The coordination breakdown is where the damage becomes concrete. CISA launched its Vulnrichment program in May 2024 and invited NIST to co-sign a joint public statement. NIST declined. For roughly two years before that, CISA had already been independently generating nearly all the same enrichment data NIST was producing, and NIST refused to ingest it. The reason? NVD's internal systems could not attribute data to its actual source, a technical limitation that was not resolved until March 2025. The IG's conclusion is blunt: two federal agencies did the same work in parallel because they could not settle a metadata credit line.

Fix the efficiency problem, the OIG argues, and NIST could redirect approximately $800,000 toward better uses over the next two years.

The Scoring Problem Is Worse Than the Backlog

Severity scoring draws the sharpest fire. NIST uses CVSS — the Common Vulnerability Scoring System — to assign numerical risk ratings that IT and security teams treat as ground truth when deciding what to patch first. OIG conducted its own testing. Independent evaluators matched NIST's scores just 12% of the time.
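To make the 12% figure concrete, here is a minimal sketch of how an agreement rate between two sets of scores could be computed. The article does not say how the OIG defined a "match", so the sketch assumes two plausible definitions: identical base scores, and identical qualitative severity bands on the CVSS v3.x rating scale. The CVE labels and scores are invented for illustration and are not figures from the report.

```python
def severity_band(score: float) -> str:
    """Map a CVSS v3.x base score to its qualitative severity rating."""
    if score == 0.0:
        return "None"
    if score < 4.0:
        return "Low"
    if score < 7.0:
        return "Medium"
    if score < 9.0:
        return "High"
    return "Critical"


def agreement_rates(
    nvd_scores: dict[str, float], reviewer_scores: dict[str, float]
) -> tuple[float, float]:
    """Return (exact-score agreement, severity-band agreement) over shared CVEs."""
    shared = sorted(set(nvd_scores) & set(reviewer_scores))
    if not shared:
        raise ValueError("no CVEs in common")
    exact = sum(nvd_scores[c] == reviewer_scores[c] for c in shared)
    band = sum(
        severity_band(nvd_scores[c]) == severity_band(reviewer_scores[c])
        for c in shared
    )
    return exact / len(shared), band / len(shared)


# Hypothetical data for illustration only.
nvd = {"CVE-A": 9.8, "CVE-B": 7.5, "CVE-C": 5.3, "CVE-D": 8.8}
reviewer = {"CVE-A": 9.8, "CVE-B": 6.5, "CVE-C": 6.1, "CVE-D": 9.1}

exact_rate, band_rate = agreement_rates(nvd, reviewer)
print(f"exact: {exact_rate:.0%}, same band: {band_rate:.0%}")
```

Which definition is used matters: a reviewer who lands in the same severity band but on a different decimal would count as a mismatch under the stricter one.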

Jeff Williams, CTO at Contrast Security, called the implication unavoidable: the metric teams use to prioritize remediation "is barely better than guessing." That is not a minor calibration issue. It means a critical patch that genuinely demands immediate attention may be sitting in a queue behind dozens of lower-risk items because its published score came out lower than it should have.

For security teams using NVD data to drive their patch management programs — and most do — this inconsistency is a quiet, compounding liability. Organizations that train staff to critically evaluate threat intelligence rather than blindly trust external scores are far better positioned to catch these gaps before they become exploitable windows.

The Funding Equation Nobody Wants to Say Out Loud

Williams didn't stop at scoring. He noted that CISA previously covered close to half of NVD's funding before stepping back, and NIST's lab budget absorbed further cuts on top of that. "You can't pull that kind of money out of something this important and then act surprised when it breaks," he said.

Braden Perry, a regulatory attorney at Kennyhertz Perry, rejected NIST's statutory defense directly. The federal mandate NIST cites applies specifically to severity metrics for open-source software vulnerabilities — not all CVEs, and not necessarily via CVSS. Perry's read: "The mandate is narrow and the practice is broad." NIST was not legally required to recalculate scores that vendors or CISA had already produced. That was a policy decision, not a statutory one.

GenAI Just Made the Volume Problem Permanent

Underneath the bureaucratic friction sits a structural challenge the IG report doesn't fully reckon with. Generative AI tooling has sharply accelerated vulnerability discovery over the past two years. Researchers using AI-assisted analysis are surfacing flaws faster than any manual enrichment workflow can absorb. The NVD backlog isn't just a management problem. It's a capacity problem that will not resolve itself even if NIST executes every recommendation perfectly.

Williams argues the automation priority inside security programs has been inverted from the start. Scanning and ticket generation got automated early. Threat modeling and architectural review remain manual tasks, performed by a small number of senior practitioners under constant pressure. "We automated the wrong half," he said. Vulnerability records pile up in NVD. Analysts fall behind. Teams deprioritize patches based on scores that independent evaluators failed to reproduce 88% of the time.

Verizon's 2024 Data Breach Investigations Report found that exploitation of vulnerabilities as an initial access method grew 180% year over year. The pipeline feeding defenders' decisions about which vulnerabilities to fix first is the NVD. That pipeline is backed up, underfunded, and producing scores that don't hold up to independent scrutiny.

What Controls Actually Failed Here

This isn't a breach with a clear patient-zero moment. It's an institutional failure across several layers that defenders should study carefully.

First, inter-agency coordination collapsed because of a technical limitation — source attribution — that should have been a sprint-level fix, not a multi-year stalemate. When two government programs duplicate effort for two years over a metadata problem, the real failure is governance: no shared accountability model, no escalation path, no joint SLA.

Second, the CVSS inconsistency exposes a broader industry habit: treating automated scores as authoritative without validation. CVSS was designed as a starting framework, not a final verdict. The NIST NVD documentation itself notes that base scores don't account for environmental or temporal factors. Security teams that accept scores uncritically and let them drive patch sequencing are operating on a foundation the OIG just demonstrated is unreliable.
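As a rough illustration of what validating a score against local context can look like, the sketch below adjusts a base score using three facts about the affected asset. It is not the CVSS environmental formula. The `AssetContext` fields and the multipliers are assumptions chosen only to show the idea, and a real program would tune or replace them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AssetContext:
    """Hypothetical local facts about where the vulnerable software runs."""
    internet_facing: bool
    business_critical: bool
    exploit_observed: bool  # e.g. public exploit code or exploitation reports


def contextual_priority(base_score: float, ctx: AssetContext) -> float:
    """Adjust a CVSS base score with local context (illustrative weights only).

    This is NOT the CVSS environmental formula; the multipliers are
    assumptions, and the result is capped at 10.
    """
    score = base_score
    score *= 1.2 if ctx.internet_facing else 0.8
    score *= 1.1 if ctx.business_critical else 0.9
    if ctx.exploit_observed:
        score *= 1.3
    return round(min(score, 10.0), 1)


# The same base score lands very differently depending on context.
exposed = AssetContext(internet_facing=True, business_critical=True, exploit_observed=True)
isolated = AssetContext(internet_facing=False, business_critical=False, exploit_observed=False)

print(contextual_priority(7.5, exposed))
print(contextual_priority(7.5, isolated))
```

With these assumed weights, a base score of 7.5 hits the cap on an exposed, business-critical system with observed exploitation and falls into the medium range on an isolated one.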

Third, cutting the budget of critical infrastructure (a national vulnerability database qualifies) without a corresponding reduction in scope or a compensating investment in automation is a recognized failure pattern. The Verizon DBIR and every major risk framework, including NIST's own SP 800-53, treat resource adequacy as a control requirement, not an aspiration.

Security awareness training plays a specific role here: analysts, developers, and even procurement teams need to understand that published severity scores are inputs, not answers. An analyst who knows CVSS limitations will apply environmental context. One who doesn't will patch the wrong things — or nothing at all.

What Comes Next

NIST agreed with the OIG's recommendations. Burkhardt's formal acceptance is on record. Whether agreement translates into execution — faster enrichment, genuine CISA coordination, a rethought scoring review process — is the open question. The backlog existed before the IG report. It will exist after, unless structural changes follow the words.

Organizations cannot wait for federal databases to catch up. Patch prioritization programs need secondary validation layers: vendor advisories, CISA's Known Exploited Vulnerabilities catalog, and internal threat modeling that doesn't depend solely on NVD base scores. If your team's remediation workflow begins and ends with a CVSS number, the IG report just explained why that's a risk. A good place to start rebuilding that workflow is understanding what a mature vulnerability management standard actually requires.
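A secondary validation layer of this kind can be sketched as a sort order: known-exploited CVEs first, then everything else by the highest score any source reports. The `Finding` structure, the identifiers, and the choice to treat an unscored CVE as worst case are all assumptions made for illustration. The KEV membership set would in practice be loaded from CISA's published catalog rather than hard-coded.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Finding:
    cve_id: str
    nvd_score: Optional[float]     # None if NVD has not scored the CVE yet
    vendor_score: Optional[float]  # score from the vendor advisory, if any


def sort_key(finding: Finding, kev_ids: set[str]) -> tuple[int, float]:
    """Known-exploited CVEs first, then by the highest available score."""
    in_kev = finding.cve_id in kev_ids
    scores = [s for s in (finding.nvd_score, finding.vendor_score) if s is not None]
    # An unscored CVE is treated as worst case rather than silently ignored.
    best_guess = max(scores) if scores else 10.0
    return (0 if in_kev else 1, -best_guess)


def prioritize(findings: list[Finding], kev_ids: set[str]) -> list[str]:
    """Return CVE ids in the order they should be reviewed."""
    ordered = sorted(findings, key=lambda f: sort_key(f, kev_ids))
    return [f.cve_id for f in ordered]


# Hypothetical identifiers and scores for illustration only.
kev = {"CVE-0000-0003"}
findings = [
    Finding("CVE-0000-0001", nvd_score=9.8, vendor_score=7.2),
    Finding("CVE-0000-0002", nvd_score=None, vendor_score=None),
    Finding("CVE-0000-0003", nvd_score=6.5, vendor_score=6.5),
    Finding("CVE-0000-0004", nvd_score=5.0, vendor_score=8.1),
]

print(prioritize(findings, kev))
```

Two design choices are worth noting. The CVE in the KEV set outranks a 9.8 that has no evidence of exploitation, and the CVE still waiting for enrichment is reviewed early instead of dropping to the bottom of the queue.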

How your team can stop depending on scores that may be wrong

Train analysts to apply environmental and threat context to CVSS scores rather than accepting base ratings as final prioritization decisions.
Build a multi-source vulnerability intake process that includes CISA's KEV catalog, vendor advisories, and internal risk modeling alongside NVD data.
Run tabletop exercises that test your team's patch prioritization workflow against scenarios where the published severity score is incorrect or delayed.
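A tabletop scenario like the last one can be backed by a small check: given a fixed patching capacity, which CVEs would the team miss if a published score turned out to be wrong? The sketch below is one way to frame that, with invented CVE labels and scores. The capacity `n` and the corrected score are exercise inputs, not real data.

```python
def top_n(scores: dict[str, float], n: int) -> set[str]:
    """Return the n highest-scored CVE ids (ties broken by id for determinism)."""
    ranked = sorted(scores, key=lambda cve: (-scores[cve], cve))
    return set(ranked[:n])


def missed_by_published(
    published: dict[str, float], corrected: dict[str, float], n: int
) -> set[str]:
    """CVEs that belong in the top n under corrected scores but are absent
    from the top n when ranking by published scores alone."""
    return top_n(corrected, n) - top_n(published, n)


# Hypothetical exercise data: the published score for CVE-X is too low.
published = {"CVE-W": 9.1, "CVE-X": 5.4, "CVE-Y": 8.2, "CVE-Z": 7.0}
corrected = {**published, "CVE-X": 9.6}

print(missed_by_published(published, corrected, n=2))
```

If the result is non-empty, the exercise has found a case where a score-only workflow would have left a high-risk flaw unpatched, which is the gap the secondary sources are meant to close.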

Frequently asked questions

Why is the NVD backlog dangerous for my organization?

The NVD is the primary database most patch management tools and security teams use to score and prioritize vulnerabilities. When entries are delayed or scored inconsistently, teams may deprioritize critical flaws or waste effort on lower-risk ones — creating exploitable windows attackers can reach before defenders do.

Is CVSS still reliable for prioritizing patches?

CVSS base scores are a starting point, not a final answer. The OIG's independent testing found evaluators matched NIST's scores only 12% of the time. NIST's own documentation notes that base scores don't account for environmental or temporal factors. Teams should cross-reference CVSS with CISA's Known Exploited Vulnerabilities catalog and vendor-specific advisories.

What is CISA's Vulnrichment program and how does it relate to NVD?

CISA launched Vulnrichment in May 2024 to add enrichment data — CVSS scores, CWE classifications, CPE strings — to CVE records independently. For roughly two years prior, CISA and NIST were producing nearly identical enrichment data in parallel. A technical limitation inside NVD's systems prevented NIST from ingesting CISA's work, a problem not resolved until March 2025.

What should security teams do while the NVD backlog persists?

Don't rely solely on NVD data. Use CISA's KEV catalog as a prioritization layer, monitor vendor security advisories directly, apply environmental context to any CVSS score you receive, and build internal threat modeling into your remediation process rather than treating published scores as authoritative.