adverse event Any undesirable medical occurrence following the use of a medical product — reported into systems like VAERS, but not by itself proof that the product caused the event.
An adverse event is any undesirable medical occurrence — a symptom, a diagnosis, a hospitalization, a death — that happens after someone takes a medical product. In safety surveillance, the term is deliberately broad: an event qualifies as “adverse” simply because it is unwanted and happened in proximity to product use, not because anyone has yet decided whether the product caused it.
That distinction matters. A report of an adverse event in VAERS does not mean the vaccine caused the event. It is a starting point for investigation, not a conclusion. Pharmacovigilance systems collect adverse-event reports in bulk so that statistical methods like EB data mining can look for patterns — combinations of product and event that show up more often than chance would predict. Those patterns are what analysts call “safety signals.”
Bates stamp A unique identifier stamped onto each page of a document production so that any page can be cited unambiguously, e.g. "PSI-HHS-000008257443".
A Bates stamp (or Bates number, or Bates ID) is a sequential identifier applied to each page of a document during a legal or congressional production. The practice gets its name from the Bates Manufacturing Company, which made the original mechanical numbering devices. Today the stamping is digital, but the purpose is the same: every page gets a unique label so that any reader, anywhere, can point to exactly the same page.
In this report, the Subcommittee applied Bates stamps with the prefix “PSI-HHS-” to records produced by HHS. Footnotes throughout the report cite specific Bates IDs (for example, “PSI-HHS-000008257443-44”) so that any quoted document can be traced back to its source.
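A citation such as “PSI-HHS-000008257443-44” is read here as a two-page range whose end number is abbreviated to its last digits. That reading is an assumption based on common legal citation practice, not something the report spells out. Under that assumption, a minimal sketch for expanding a cited range into individual page IDs might look like this (the function name expand_bates_range is hypothetical):

```python
import re


def expand_bates_range(citation: str) -> list[str]:
    """Expand 'PREFIX-000123-25' into one ID per page.

    Assumes a trailing '-NN' abbreviates the end page by its last digits.
    """
    # Prefix: letters and hyphens ending in a hyphen; then the page number;
    # then an optional abbreviated end number.
    match = re.fullmatch(r"([A-Za-z][A-Za-z-]*-)(\d+)(?:-(\d+))?", citation)
    if match is None:
        raise ValueError(f"not a recognizable Bates citation: {citation!r}")
    prefix, start, end_suffix = match.groups()
    width = len(start)
    first = int(start)
    if end_suffix is None:
        return [f"{prefix}{first:0{width}d}"]
    # Replace the last digits of the start number with the abbreviated suffix.
    last = int(start[: max(width - len(end_suffix), 0)] + end_suffix)
    if last < first:
        raise ValueError(f"range end precedes range start: {citation!r}")
    return [f"{prefix}{n:0{width}d}" for n in range(first, last + 1)]


pages = expand_bates_range("PSI-HHS-000008257443-44")
```

Zero-padding is preserved so that each expanded ID matches the stamp as it would appear on the page.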
bivalent booster A COVID-19 booster shot designed to target two strains of the virus at once — the original strain and an Omicron variant — authorized by the FDA in 2022.
A “bivalent” vaccine is one that targets two distinct antigens. The bivalent COVID-19 boosters authorized in 2022 combined the original SARS-CoV-2 spike-protein design used in the first mRNA vaccines with an updated component matching one of the Omicron subvariants then in circulation. Pfizer-BioNTech and Moderna both produced bivalent boosters under FDA Emergency Use Authorization.
The report mentions the bivalent booster specifically in connection with safety signals: “statistically significant safety signals appeared for ischemic stroke in individuals 65 years and older following injection of the Pfizer-BioNTech bivalent booster in February and March 2023.”
CBER The FDA's Center for Biologics Evaluation and Research — the unit responsible for vaccine regulation and, during the pandemic, COVID-19 vaccine safety surveillance.
The Center for Biologics Evaluation and Research, or CBER, is the FDA center responsible for regulating biological products, including vaccines, blood products, and gene therapies. As the report explains, CBER was “the unit responsible for COVID-19 vaccine safety surveillance” during the pandemic. Its director throughout much of the relevant period was Dr. Peter Marks.
The Office of Biostatistics and Pharmacovigilance (OBPV), which appears throughout the timeline, is housed within CBER. Many of the FDA officials quoted in the report — Dr. Steven Anderson, Dr. Bethany Baer, Dr. Richard Forshee, Dr. David Menschik, Dr. Narayan Nair, Dr. Manette Niu, Dr. Craig Zinderman, and Dr. Sarah Walinsky — worked at CBER.
CDC The Centers for Disease Control and Prevention — the federal public-health agency that co-manages VAERS with the FDA and runs the Immunization Safety Office.
The Centers for Disease Control and Prevention is a federal agency under HHS whose mission spans disease surveillance, outbreak response, and immunization safety. CDC co-manages the Vaccine Adverse Event Reporting System (VAERS) with the FDA, and runs the Immunization Safety Office, which monitors the safety of vaccines after they are licensed.
Several CDC officials appear in the report’s record, including Dr. Tom Shimabukuro and Dr. John Su of the Immunization Safety Office, and former CDC Director Dr. Rochelle Walensky. CDC officials publicly described EB data mining as the “gold standard” for disproportionality analysis, while internal emails reflect their reliance on FDA’s data-mining outputs.
CDER The FDA's Center for Drug Evaluation and Research — the unit responsible for most drugs (rather than vaccines) and the home of Dr. Ana Szarfman.
The Center for Drug Evaluation and Research, or CDER, is the FDA center responsible for most prescription and over-the-counter drugs. Although vaccines are regulated by CBER, CDER plays an important role in this report because Dr. Ana Szarfman — the FDA medical officer who flagged the masking limitation in the COVID-19 vaccine analyses — was a CDER employee.
Dr. Patrizia Cavazzoni was Director of CDER during the relevant period, and Dr. Peter Stein served as Director of CDER’s Office of New Drugs. Dr. Norman Stockbridge directed CDER’s Division of Cardiology and Nephrology, where Dr. Szarfman worked.
confounding When an outside factor — like age or calendar year — distorts an apparent link between a product and an adverse event, producing false positives or hiding real effects.
Confounding happens when something other than the product itself explains the relationship between the product and an adverse event. If older patients are more likely to receive a particular vaccine and also more likely to have heart attacks, the apparent association between the vaccine and heart attacks may simply reflect the age of the people who took it. Age, in that case, is the confounder.
In data-mining safety surveillance, common confounders include calendar year (because reporting volume changes over time), patient age, and exposure patterns that differ across products. RGPS, the algorithm Dr. Szarfman advocated for, “automatically identifies and corrects for confounders,” in her words — by contrast, MGPS requires analysts to handle confounding through stratification or other manual adjustments.
Confounding can produce false positives (apparent signals that vanish once you adjust for the lurking variable) or false negatives (real effects that disappear into the noise).
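One way to picture the manual stratification mentioned above is to compute the expected count separately inside each stratum (for example, each age band) and then add the results, instead of computing it once from pooled totals. The sketch below uses invented counts and hypothetical names; it is an illustration of the idea, not the FDA's procedure.

```python
from dataclasses import dataclass


@dataclass
class Stratum:
    """Report counts within one stratum, such as one age band."""
    product_and_event: int  # reports naming both the product and the event
    product_total: int      # all reports naming the product
    event_total: int        # all reports naming the event
    all_reports: int        # every report in the stratum


def crude_expected(strata: list[Stratum]) -> float:
    """Expected count from pooled totals, ignoring the strata."""
    product = sum(s.product_total for s in strata)
    event = sum(s.event_total for s in strata)
    total = sum(s.all_reports for s in strata)
    return product * event / total


def stratified_expected(strata: list[Stratum]) -> float:
    """Expected count computed within each stratum, then summed."""
    return sum(s.product_total * s.event_total / s.all_reports for s in strata)


# Invented counts: the product is reported mostly for older patients, and the
# event is more common among older patients regardless of product.
strata = [
    Stratum(product_and_event=160, product_total=800, event_total=200, all_reports=1000),  # older
    Stratum(product_and_event=2, product_total=100, event_total=20, all_reports=1000),     # younger
]
observed = sum(s.product_and_event for s in strata)
crude_ratio = observed / crude_expected(strata)
stratified_ratio = observed / stratified_expected(strata)
```

With these counts the crude ratio is about 1.64, while the stratified ratio is exactly 1.0: the apparent excess is explained entirely by age.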
disproportionality An excess of reports for a particular product-event pair compared to other products in the same database — the basic statistical concept that drives signal detection.
Disproportionality is the core idea behind data-mining safety surveillance: a particular adverse event is reported much more often for one product than for the rest of the products in the database. If heart-attack reports make up ten times as large a share of the reports for Vaccine A as they do of the reports for all other vaccines combined, that is a disproportionate reporting pattern worth a closer look.
A disproportionality analysis is the formal version of this comparison. The report quotes HHS describing EB data mining as a “statistical method for identifying disproportionality (excess of reported [adverse events]) for [a] product relative to other products) in large database[s].” Disproportionality alone does not prove that a product caused an event — it only flags a statistical association that warrants further investigation.
Disproportionality analyses are vulnerable to masking: if many products in the database produce similar reports, the “excess” for any one of them can be diluted, and a real signal can fail to cross the alert threshold.
E The expected count — how many reports of a given product+event pair you'd see under the null hypothesis (no association), based on marginal totals.
E is the expected count for a product+event cell of the disproportionality contingency table — the number of reports you would expect under the null hypothesis that the product and the event are statistically independent. It is computed from the marginal row and column totals: roughly, (reports for this product) times (reports for this event) divided by (total reports). The ratio of observed to expected, N / E, is the relative reporting ratio (RR), which Empirical Bayes algorithms like MGPS and RGPS then shrink toward the group average to produce more reliable estimates such as EBGM and ERAM.
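The arithmetic described above can be sketched in a few lines. The counts are invented for illustration, and the function names are hypothetical; this is the unstratified "roughly" version of the calculation, not a reproduction of any agency's software.

```python
def expected_count(product_reports: int, event_reports: int, total_reports: int) -> float:
    """Expected reports for a product+event pair if the two were independent."""
    return product_reports * event_reports / total_reports


def relative_reporting_ratio(n: int, e: float) -> float:
    """Raw observed-over-expected ratio, N / E, with no shrinkage applied."""
    return n / e


# Invented counts: 2,000 reports name the product, 500 name the event,
# and the database holds 100,000 reports in total.
e = expected_count(product_reports=2_000, event_reports=500, total_reports=100_000)
rr = relative_reporting_ratio(n=30, e=e)
```

Here E works out to 10 reports, so 30 observed reports give a raw ratio of 3.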
EB data mining Empirical Bayesian data mining — the statistical method federal health agencies use to scan adverse-event databases for disproportionate combinations of products and events.
As the report defines it, Empirical Bayesian (“EB”) data mining is “the data mining method utilized by federal health agencies to identify statistical associations between products and adverse events.” HHS describes it as a “statistical method for identifying disproportionality (excess of reported [adverse events]) for [a] product relative to other products) in large database[s].”
In practice, EB data mining looks at every product-event pair in a database like VAERS and asks whether a particular pair is reported at a higher rate than would be expected by chance, given the overall reporting volume. The “Bayesian” part refers to the way the algorithm shrinks unstable estimates from rare events toward a more reliable group average, a statistical technique meant to reduce noise without erasing real signals.
CDC officials publicly called EB data mining the “gold standard” for disproportionality analysis. The FDA’s preferred implementation has been MGPS, run through Oracle’s Empirica Signal software. The method has known blind spots, however — most notably the masking phenomenon documented in this report.
EB05 The lower bound of an Empirical Bayesian disproportionality score. The FDA treated an EB05 above 2.0 as the threshold for a statistically significant safety signal.
EB05 is the 5th percentile of the posterior distribution of the Empirical Bayesian disproportionality score produced by algorithms like MGPS. In plain terms, it is a deliberately conservative estimate of how much more often a particular product-event pair is being reported compared to what would be expected by chance. Using the lower bound rather than the central estimate guards against being misled by small numbers.
EB05 is the lower bound of a 90% credible interval whose central estimate is the EBGM (Empirical Bayes Geometric Mean); EB95 is the upper bound. FDA’s signal threshold is EB05 > 2.0, meaning that even the lower bound of the credible interval exceeds twice the expected reporting rate, though Szarfman’s analyses noted that any EBGM value above 1 already indicates disproportional reporting.
The report describes the FDA’s threshold this way: “FDA’s threshold for determining a statistically significant safety signal was when the lower bound of the reporting estimate (EB05) exceeded 2.0.” That is, the FDA treated an EB05 over 2.0 as a “signal” worth attention, and anything below as not yet worth flagging. An FDA presentation cited in the report acknowledged that “technically, any [Empirical Bayes Geometric Mean] value above one indicates disproportional reporting” — meaning the 2.0 cutoff filtered out alerts that, by a more permissive standard, would have qualified.
Setting the cutoff at an EB05 of 2.0 rather than 1.0 is one of the ways in which, researchers like David Wiseman argue, real signals were “filtered out by an inappropriately high detection threshold.”
EBGM Empirical Bayes Geometric Mean — the central (shrinkage) estimate of the relative reporting ratio under MGPS. Together with EB05 and EB95, it forms the 90% credible interval.
EBGM is the Empirical Bayes Geometric Mean: the central, shrinkage-corrected point estimate of the relative reporting ratio produced by MGPS for a given product+event pair. It is the geometric mean of the posterior distribution. The 5th and 95th percentiles of that same posterior are reported as EB05 and EB95, which together form a 90% credible interval around EBGM.
The shrinkage step is what distinguishes EBGM from the raw N/E ratio (RR): when N is small, the raw ratio is noisy and easily inflated, and the Empirical Bayes machinery pulls the estimate toward the group average, leaving only differences that the data actually support. FDA’s signal threshold has historically been EB05 > 2.0 — meaning the lower bound of the credible interval must exceed twice the expected reporting rate — though Dr. Ana Szarfman’s analyses noted that any EBGM above 1 already indicates disproportional reporting.
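The shrinkage described above can be sketched using the gamma-Poisson mixture form of the model: the true reporting ratio is given a prior that mixes two gamma distributions, the observed count N is treated as Poisson with mean equal to that ratio times E, and EBGM is the exponential of the posterior mean of the log ratio. The five hyperparameter values below are placeholders chosen for illustration; in practice they are estimated from the whole database. EB05 and EB95 would require quantiles of the same posterior mixture and are omitted here. All function names are hypothetical.

```python
import math


def digamma(x: float) -> float:
    """Digamma function for x > 0, via recurrence plus an asymptotic series."""
    result = 0.0
    while x < 6.0:  # shift x upward until the series is accurate
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252))
    return result + math.log(x) - 0.5 / x - series


def log_marginal(n: int, e: float, alpha: float, beta: float) -> float:
    """Log probability of n reports under a Gamma(alpha, rate=beta) prior."""
    return (math.lgamma(alpha + n) - math.lgamma(alpha) - math.lgamma(n + 1)
            + alpha * math.log(beta / (beta + e))
            + n * math.log(e / (beta + e)))


def ebgm(n: int, e: float, alpha1: float, beta1: float,
         alpha2: float, beta2: float, p: float) -> float:
    """Geometric mean of the posterior mixture for the reporting ratio."""
    log_w1 = math.log(p) + log_marginal(n, e, alpha1, beta1)
    log_w2 = math.log(1.0 - p) + log_marginal(n, e, alpha2, beta2)
    diff = log_w2 - log_w1
    # Posterior weight of the first component (guarded against overflow).
    q = 1.0 / (1.0 + math.exp(diff)) if diff < 700 else 0.0
    mean_log = (q * (digamma(alpha1 + n) - math.log(beta1 + e))
                + (1.0 - q) * (digamma(alpha2 + n) - math.log(beta2 + e)))
    return math.exp(mean_log)


# Placeholder hyperparameters (assumed, not taken from any FDA analysis).
PRIOR = dict(alpha1=0.2, beta1=0.1, alpha2=2.0, beta2=4.0, p=1 / 3)

sparse = ebgm(n=1, e=0.01, **PRIOR)     # raw N / E is 100
dense = ebgm(n=1000, e=100.0, **PRIOR)  # raw N / E is 10
```

Under these assumed hyperparameters, the single-report cell is pulled far below its raw ratio of 100, while the well-supported cell stays close to its raw ratio of 10.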
Empirica Signal Oracle's software platform that implements EB data mining (MGPS and RGPS); used by FDA to run signal-detection analyses on VAERS data.
The report’s PDF glossary defines Empirica Signal as “Oracle’s software platform, which utilizes EB data mining, used by FDA for data mining” (page 33). It is the software environment in which MGPS, RGPS, and the standard disproportionality outputs (EBGM, EB05, EB95, ERAM, ER05, ER95, RR, PRR, PRR_CHISQ) are computed against VAERS data.
Brian Hendrix and James Sydnor — both named in the report as Commonwealth Informatics contractors — supported FDA’s operation of Empirica Signal during the period the report covers. The platform itself is the FDA’s day-to-day tool for running the kinds of disproportionality analyses that produced the “49 examples of extreme masking” table.
ERAM Empirical Regression-Adjusted Mean — the central estimate of the relative reporting ratio under RGPS. ER05 and ER95 are the 5th and 95th percentiles.
ERAM is the RGPS counterpart of EBGM: the central, shrinkage-corrected point estimate of the relative reporting ratio for a product+event pair, but produced by the Regression-Adjusted Gamma Poisson Shrinker rather than the original MGPS. The accompanying ER05 and ER95 are the 5th and 95th percentiles of the posterior distribution, forming a 90% credible interval around ERAM.
The “regression-adjusted” part is what differentiates ERAM from EBGM. RGPS layers a regression on top of the MGPS framework that adjusts for confounding and, most importantly, for the masking effects that occur when one product (such as a high-volume COVID-19 vaccine) inflates the comparison baseline and hides real signals for similar products. When applied to the VAERS COVID-19 data, RGPS surfaced safety signals that MGPS did not.
EUA Emergency Use Authorization — the FDA mechanism that lets unapproved medical products reach the public during a declared emergency, before full approval.
An Emergency Use Authorization, or EUA, is a legal pathway that lets the FDA allow the use of medical products — drugs, vaccines, diagnostic tests — that have not gone through the normal full approval process, when there is a declared public-health emergency and no adequate alternative is available.
The FDA issued EUAs for the Pfizer-BioNTech COVID-19 vaccine on December 11, 2020, and for the Moderna COVID-19 vaccine on December 18, 2020. The bivalent boosters and the Janssen (Johnson & Johnson) vaccine also reached the public initially under EUA. Because EUA products are still under active investigation, the report points out, vigilant safety surveillance after authorization is especially important.
FDA The U.S. Food and Drug Administration — the federal agency that authorizes drugs, biologics, and vaccines, and that runs much of the post-market safety surveillance described in this report.
The Food and Drug Administration is the federal regulator responsible for authorizing and overseeing drugs, biologics (including vaccines), and medical devices in the United States. It sits within the Department of Health and Human Services. The FDA both authorized the COVID-19 vaccines (initially under EUA) and ran the data-mining analyses that are the central subject of this report.
Two FDA centers appear repeatedly: the Center for Biologics Evaluation and Research (CBER), which oversees vaccines, and the Center for Drug Evaluation and Research (CDER), which oversees most drugs. Several of the officials quoted in the report — including Dr. Peter Marks, Dr. Patrizia Cavazzoni, Dr. Robert Califf, Dr. Ana Szarfman, and others — held positions at FDA during the events the report describes.
filtering Excluding certain reports from a safety analysis — for example, by report type or threshold — which can suppress real signals if the filter is set too aggressively.
Filtering means setting rules that exclude certain adverse-event reports from a disproportionality analysis. Filters can be reasonable: analysts may exclude duplicate reports, reports lacking key fields, or reports flagged as foreign. Filters can also be problematic when they remove signal alongside noise.
Researcher David Wiseman, cited in the report, has argued that signals associated with the COVID-19 vaccines were “filtered out by an inappropriately high detection threshold” — specifically, that requiring an EB05 above 2.0 (rather than above 1.0) hid alerts that would technically have qualified as disproportional. In Wiseman’s framing, “truancy, masking, and filtering” together form a set of mechanisms by which real safety information gets dropped before it reaches a human reviewer.
FOIA Freedom of Information Act — the federal statute that lets members of the public request records from federal agencies.
The Freedom of Information Act, or FOIA, is the federal law that gives any person the right to request access to records held by U.S. federal agencies. Agencies must release responsive records unless they qualify for one of several narrow exemptions (for personal privacy, ongoing law enforcement, classified material, and so on). FOIA is the principal tool the press, researchers, and the public use to see what federal agencies are doing.
FOIA appears repeatedly in this report. Children’s Health Defense, the Informed Consent Action Network, and others filed FOIA requests for the FDA’s EB data mining records. Internal CDC emails reproduced in the report suggest that FOIA pressure influenced FDA’s decision to stop circulating its weekly data-mining reports — one CDC official wrote, “I think that because of the FOIAs we may have asked FDA to stop sending these weekly data mining outputs.”
GPS Gamma Poisson Shrinker — the original Empirical Bayesian data-mining algorithm developed by William DuMouchel; the predecessor to MGPS and RGPS.
GPS, short for Gamma Poisson Shrinker, is the original Empirical Bayesian data-mining algorithm invented by William DuMouchel and applied to large databases of spontaneous adverse-event reports. The report quotes HHS records crediting Dr. DuMouchel as the individual who “invented the empirical Bayesian data mining algorithm known as Gamma-Poisson Shrinker (GPS) and its successor MGPS [Multi-item Gamma Poisson Shrinker], which have been applied to the detection of safety signals in databases of spontaneous adverse drug event reports.”
GPS works on single drug-event pairs. MGPS extended the approach to handle many items at once, and RGPS later added regression adjustments to control for masking and confounding. All three are part of the same family of disproportionality-analysis tools used in pharmacovigilance.
HHS The U.S. Department of Health and Human Services — the cabinet department that contains the FDA, CDC, NIH, and other federal health agencies.
The Department of Health and Human Services is the cabinet-level federal department in charge of public health and human-services programs in the United States. It is the parent department of the FDA, the CDC, the NIH, and several other agencies that appear in this report.
Documents produced to the Senate Permanent Subcommittee on Investigations (PSI) by HHS form the bulk of the evidentiary record summarized in the report. The Subcommittee applied Bates stamps to those records (visible as identifiers like “PSI-HHS-000008257443”) so that they can be cited consistently.
masking A statistical phenomenon where a high volume of reports for one product drowns out reports for a similar product, hiding safety signals that would otherwise stand out.
As the report defines it, masking is “a statistical phenomenon in which the volume of adverse event reports from a similar drug or vaccine product drowns out reports from other drug or vaccine products, thus distorting the baseline group being compared to the drug or vaccine of interest being screened.” It is sometimes called “muting.” When masking is severe, real safety signals can go undetected.
The report uses a plain-language analogy: imagine comparing the adverse events of hemlock against a baseline that mixes arsenic with saline. Hemlock is dangerous, but its harm looks unremarkable next to a baseline that already contains a similarly toxic substance. In the COVID-19 context, an adverse event reported for the Moderna vaccine could appear unremarkable when the baseline it is compared against is dominated by similar reports for the Pfizer vaccine.
Dr. Ana Szarfman and Dr. William DuMouchel published a 2022 paper in Drug Safety concluding that masking “is roughly eight times more likely to occur with COVID-19 vaccines than with other vaccines,” and they identified 49 examples of “extreme masking” in the VAERS data for COVID-19 vaccines.
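The mechanism can be illustrated with a toy calculation. The sketch below computes the raw observed-over-expected ratio for a product B twice: once against a background of unrelated products, and once after a high-volume product A with the same event rate has been added to the database. All counts are invented, the function name is hypothetical, and the raw ratio stands in for the shrunken scores (such as EB05) that the real analyses use.

```python
def raw_ratio(n: int, product_reports: int, event_reports: int, total_reports: int) -> float:
    """Observed count divided by the count expected under independence."""
    expected = product_reports * event_reports / total_reports
    return n / expected


# Product B: 1,000 reports, 50 of which name the event of interest.
b_reports, b_events = 1_000, 50
# Background products: 10,000 reports, 100 of which name the event.
bg_reports, bg_events = 10_000, 100
# High-volume product A: 20,000 reports, 1,000 of which name the event
# (the same 5% event share as product B).
a_reports, a_events = 20_000, 1_000

without_a = raw_ratio(
    b_events, b_reports,
    event_reports=b_events + bg_events,
    total_reports=b_reports + bg_reports,
)
with_a = raw_ratio(
    b_events, b_reports,
    event_reports=b_events + bg_events + a_events,
    total_reports=b_reports + bg_reports + a_reports,
)
```

With these made-up counts, product B's ratio falls from about 3.7 to about 1.3 once product A joins the comparison group, even though nothing about product B's own reports has changed.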
MedDRA Medical Dictionary for Regulatory Activities — the standard medical-terminology dictionary used by FDA, EMA, and other regulators to code adverse events in databases like VAERS.
MedDRA — the Medical Dictionary for Regulatory Activities — is the standardized medical vocabulary that regulators worldwide (including the FDA, the EMA, and Japan’s PMDA) use to code adverse events, drug indications, and related concepts in safety databases such as VAERS and FAERS. Both Preferred Terms (PTs) and Standardized MedDRA Queries (SMQs) are MedDRA constructs: PTs are the granular individual labels, SMQs are curated groupings of related PTs.
MedDRA is maintained by the MedDRA Maintenance and Support Services Organization (MSSO) under the auspices of the International Council for Harmonisation (ICH) and is updated twice a year. Using a single dictionary across agencies and companies makes adverse-event data comparable across studies, products, and jurisdictions.
MGPS Multi-item Gamma Poisson Shrinker — the data-mining algorithm the FDA used to scan VAERS for safety signals. Invented by William DuMouchel in 1999.
The Multi-item Gamma Poisson Shrinker, or MGPS, is the Empirical Bayesian data-mining algorithm the FDA has long used to identify statistically significant safety signals in adverse event reporting databases like VAERS. As the report describes it, MGPS “was originally developed by William DuMouchel in 1999” and is the algorithm at the heart of the FDA’s “gold standard” disproportionality analysis.
In practice, MGPS scans large databases of spontaneous adverse-event reports and flags product-event pairs that show up disproportionately often compared to a baseline of all other reports. The FDA treated a pair as a statistically significant signal when the lower bound of MGPS’s reporting estimate (EB05) exceeded 2.0.
MGPS has a known limitation: it is vulnerable to masking. When one or more products in the database generate enormous volumes of reports — as the COVID-19 vaccines did — the MGPS-generated baseline can be inflated to the point where real signals for similar products are hidden. Dr. Szarfman, one of the FDA officials who helped adopt MGPS in the first place, advocated internally for replacing it with a newer algorithm (RGPS) that adjusts for masking.
mRNA vaccine A vaccine that delivers a small piece of messenger RNA (mRNA) instructing the body's own cells to produce a viral protein, which the immune system then learns to recognize.
An mRNA vaccine works by delivering a strand of messenger RNA — a temporary set of genetic instructions — into a person’s cells. The cells read those instructions, briefly produce a small viral protein (in the COVID-19 case, the spike protein on the surface of SARS-CoV-2), and the immune system learns to recognize it. The mRNA itself does not enter the cell nucleus and degrades quickly.
The two FDA-authorized mRNA COVID-19 vaccines mentioned in this report are made by Pfizer-BioNTech and Moderna. Because both vaccines target the same protein and were administered to overlapping populations at very high volumes, the report argues that they create the conditions in which masking is most likely: each one’s signals can wash out the other’s in disproportionality analyses. As an FDA official put it in a draft manuscript, “[I]f the comparison group is enriched with so many mRNA COVID-vaccine reports, tha[n] it becomes very difficult to exceed the EB05>2 alert threshold.”
myocarditis Inflammation of the heart muscle (myocarditis) or the sac around the heart (pericarditis); together, myopericarditis. A signal of these events appeared in young men after mRNA COVID-19 vaccination.
Myocarditis is inflammation of the heart muscle. Pericarditis is inflammation of the pericardium, the thin sac surrounding the heart. When both occur together — or when it is hard to tell them apart — clinicians use the combined term myopericarditis. Symptoms can include chest pain, shortness of breath, and palpitations. Most cases are mild and resolve, but the condition is taken seriously.
The report describes how, on May 24, 2021, draft notes from an FDA meeting recorded the question, “Is [the Vaccine Adverse Event Reporting System (“VAERS”)] signaling for myopericarditis?” The answer, in the report’s account, was: “For the age groups 16-17 years and 18-24 years, yes.” Despite that internal acknowledgment, the FDA did not update vaccine labels to warn about myocarditis and pericarditis until June 25, 2021.
Acute myocardial infarction — a heart attack, distinct from myocarditis but sometimes discussed alongside it in this record — also appears as a signal in Drs. Szarfman and DuMouchel’s RGPS analyses of the COVID-19 vaccines.
N The observed count — the actual number of adverse-event reports in VAERS for a given product+event pair.
In a disproportionality analysis, N is the raw observed count for a single cell of the contingency table: the number of adverse-event reports that have actually been filed in VAERS pairing a specific product (e.g., a particular COVID-19 vaccine) with a specific event (e.g., a Preferred Term or SMQ). N is what was reported, full stop. It is contrasted with E, the expected count under the null hypothesis of no association, and the ratio N/E is the relative reporting ratio (RR).
OBPV The FDA's Office of Biostatistics and Pharmacovigilance, within CBER — the office that ran the data-mining analyses for COVID-19 vaccine safety surveillance.
The Office of Biostatistics and Pharmacovigilance, or OBPV, is a unit within the FDA’s Center for Biologics Evaluation and Research (CBER). It is staffed by biostatisticians, epidemiologists, and medical officers who run the post-market safety analyses for FDA-regulated biological products, including vaccines.
Many of the officials whose decisions are documented in the report worked at OBPV during the COVID-19 pandemic — including Dr. Steven Anderson (Director), Dr. Richard Forshee (Acting Deputy Director and later Deputy Director), Dr. Narayan Nair (Division Director, Division of Pharmacovigilance), Dr. David Menschik (Associate Director for Surveillance Informatics), Dr. Bethany Baer, Dr. Manette Niu, and Dr. Craig Zinderman.
OCHEN The FDA office (within CDER) where Dr. Ana Szarfman served as medical officer and safety data mining developer.
OCHEN is shorthand for the FDA’s Office of Cardiology, Hematology, Endocrinology, and Nephrology, an office within the Center for Drug Evaluation and Research (CDER). The report identifies Dr. Ana Szarfman as a “Medical Officer, Safety Data Mining Developer and Medical Informatics Analyst” in the Division of Cardiology, Hematology, Endocrinology, and Nephrology at CDER.
It is from this office, rather than CBER (which oversees vaccines), that Dr. Szarfman raised concerns about the masking limitation in the FDA’s MGPS-based analyses of COVID-19 vaccine data — making her, in effect, an outside observer pressing CBER officials to use a different algorithm.
pharmacovigilance The science and practice of monitoring drugs and vaccines for adverse effects after they reach the market — the discipline behind VAERS analysis, signal detection, and EB data mining.
Pharmacovigilance is the science and practice of detecting, assessing, and preventing harm from medicines after they have been authorized for use. It covers everything from individual case reports filed by clinicians to large-scale statistical scans of national databases like VAERS or FAERS. The word literally means “drug-watching.”
In the federal government, pharmacovigilance is concentrated in offices like the FDA’s Office of Biostatistics and Pharmacovigilance (OBPV) and the CDC’s Immunization Safety Office. Most of the people quoted in this report — biostatisticians, epidemiologists, medical officers — are pharmacovigilance professionals. The masking, filtering, and threshold debates documented in the report are, at their core, debates about how the federal pharmacovigilance system should work.
PRR Proportional Reporting Ratio — a disproportionality metric: (events for product / total reports for product) ÷ (events for other products / total reports for other products).
The Proportional Reporting Ratio (PRR) is a disproportionality metric that compares how often a particular adverse event shows up among the reports for one product against how often it shows up in the reports for every other product. In formula form:
PRR = (events of interest for the product / all reports for the product) ÷ (events of interest for all other products / all reports for all other products)
A PRR of 1 means the event appears at the same rate for the product as it does in the rest of the database; a PRR substantially above 1 suggests disproportionate reporting. The European Medicines Agency and the UK’s MHRA traditionally favored PRR-based signal detection, while the FDA leaned on MGPS/EBGM, but both metrics often appear together in disproportionality output tables to give analysts complementary views.
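The formula above maps directly onto the four cells of a two-by-two table. A minimal sketch, with invented counts and a hypothetical function name:

```python
def prr(a: int, b: int, c: int, d: int) -> float:
    """Proportional Reporting Ratio from a 2x2 table of report counts.

    a: product of interest, event of interest
    b: product of interest, all other events
    c: all other products, event of interest
    d: all other products, all other events
    """
    return (a / (a + b)) / (c / (c + d))


# Invented counts: 30 of 2,000 reports for the product name the event,
# against 470 of 98,000 reports for every other product.
example = prr(a=30, b=1_970, c=470, d=97_530)

# When the event share is identical in both groups, the PRR is 1.
no_excess = prr(a=10, b=90, c=100, d=900)
```

In the first case the event makes up 1.5% of the product's reports and roughly 0.48% of everyone else's, giving a PRR a little above 3.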
PRR_CHISQ Chi-squared test statistic associated with the PRR — a measure of statistical significance for the proportional reporting ratio.
PRR_CHISQ is the chi-squared test statistic that accompanies the Proportional Reporting Ratio in disproportionality output tables. Where PRR tells you the size of the reporting disproportion, PRR_CHISQ tells you how unlikely that disproportion is to have arisen by chance alone, given the observed and expected counts. Higher chi-squared values correspond to lower p-values and a stronger statistical case that the disproportion is real rather than noise. Common signal-detection rules combine a PRR threshold with a minimum PRR_CHISQ (often around 4) to filter out small or unstable cells.
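As a rough illustration, the statistic can be computed from the same two-by-two table used for the PRR. The sketch below uses the Pearson chi-squared formula without a continuity correction, which is an assumption: the report does not say which variant the output tables use. The default PRR cutoff of 2 and minimum count of 3 in the screening rule are likewise illustrative values, and the function names are hypothetical.

```python
def prr(a: int, b: int, c: int, d: int) -> float:
    """Proportional Reporting Ratio from a 2x2 table of report counts."""
    return (a / (a + b)) / (c / (c + d))


def prr_chisq(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-squared statistic for a 2x2 table (no continuity correction).

    a: product and event; b: product, other events;
    c: other products, event; d: other products, other events.
    """
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def passes_screen(a: int, b: int, c: int, d: int,
                  min_prr: float = 2.0, min_chisq: float = 4.0,
                  min_count: int = 3) -> bool:
    """Combine a PRR cutoff, a chi-squared cutoff, and a minimum cell count."""
    return (a >= min_count
            and prr(a, b, c, d) >= min_prr
            and prr_chisq(a, b, c, d) >= min_chisq)


flagged = passes_screen(a=30, b=1_970, c=470, d=97_530)
not_flagged = passes_screen(a=10, b=90, c=100, d=900)
```

The minimum-count condition is what keeps a cell with one or two reports from passing on the strength of a large ratio alone.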
PT MedDRA Preferred Term — a specific medical concept (e.g., 'Bell's palsy', 'Sudden cardiac death') used to code adverse events in VAERS.
A Preferred Term, or PT, is the standard granular adverse-event label in MedDRA, the medical-terminology dictionary regulators use to code reports in databases like VAERS. Each PT names a single distinct clinical concept — examples include “Bell’s palsy,” “Sudden cardiac death,” and “Pulmonary embolism.” When an adverse event is reported, MedDRA-trained coders map the narrative to the closest PT so that downstream signal-detection tools can group identical events together regardless of how the original report was worded. PTs are the most common unit of analysis in disproportionality output, often appearing alongside broader SMQ groupings.
PT_plus_SMQ A column in MGPS/RGPS output whose value in each row is the adverse-event identifier: either a single Preferred Term (PT) or a Standardized MedDRA Query (SMQ).
PT_plus_SMQ is the column header used in MGPS and RGPS output tables to identify the adverse event under analysis. Each row’s identifier is either a single MedDRA Preferred Term (a granular medical concept like “Bell’s palsy”) or a Standardized MedDRA Query (a curated grouping of related PTs like “Cardiac arrhythmias terms”). Mixing the two in one column lets the analysis surface signals at multiple levels of clinical specificity in the same run — granular enough to spot a single PT spike, broad enough to detect patterns that only become visible when related PTs are pooled.
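To make the pooling concrete, the sketch below counts reports per row identifier when PT rows and SMQ rows share one column. The PT names and the SMQ grouping are placeholders, not real MedDRA content, and the function name is hypothetical. One design choice is worth noting: a report that carries two PTs from the same SMQ is counted once for that SMQ, which is an assumption about how pooled rows are tallied.

```python
def count_rows(reports: list[set[str]], smq_members: dict[str, set[str]]) -> dict[str, int]:
    """Count reports per row identifier, where a row is a PT or an SMQ.

    reports: one set of PT labels per adverse-event report
    smq_members: SMQ name mapped to the PT labels it groups together
    """
    counts: dict[str, int] = {}
    for pts in reports:
        # One count per PT named in the report.
        for pt in pts:
            counts[pt] = counts.get(pt, 0) + 1
        # One count per SMQ if the report names any of its member PTs.
        for smq, members in smq_members.items():
            if pts & members:
                counts[smq] = counts.get(smq, 0) + 1
    return counts


# Placeholder labels standing in for real MedDRA terms.
reports = [{"PT A"}, {"PT B"}, {"PT A", "PT B"}, {"PT C"}]
smq_members = {"Example SMQ": {"PT A", "PT B"}}
rows = count_rows(reports, smq_members)
```

Here "PT A" and "PT B" each appear in two reports, but the pooled "Example SMQ" row counts three reports rather than four, because the report naming both PTs is counted once.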
RGPS Regression-Adjusted Gamma Poisson Shrinker — an updated data-mining algorithm that controls for masking effects, first described in a 2012 Oracle white paper.
The Regression-Adjusted Gamma Poisson Shrinker, or RGPS, is an Empirical Bayesian data-mining algorithm that builds on MGPS. As the report defines it, RGPS was “first outlined in a 2012 Oracle white paper by William DuMouchel” and is “an update to MGPS that controls for masking effects.”
In a March 2021 briefing to senior FDA officials, Dr. Ana Szarfman described RGPS’s performance as “superior” to MGPS, explaining that RGPS “can better adjust for both, masking (false negatives) and confounding (false positives)” because it “incorporates more information into the signal generation process. This leads to a lower rate of missed signals and less false alerts.”
When applied to VAERS data on the COVID-19 vaccines, RGPS surfaced safety signals that MGPS did not, including signals for sudden cardiac death, Bell’s palsy, and pulmonary infarction. Despite repeated internal advocacy from the algorithm’s own co-authors, the FDA continued to rely on MGPS rather than adopt RGPS during the COVID-19 vaccination program.
RR Relative Reporting Ratio — the raw observed-over-expected ratio (N / E) for a product+event pair. No Bayesian shrinkage applied; sensitive to small counts.
In MGPS/RGPS output tables, RR is the Relative Reporting Ratio (sometimes labeled “Relative Risk”): the raw ratio of observed to expected reports for a product+event pair, N / E. Unlike EBGM or ERAM, RR has no Bayesian shrinkage applied — it is the direct output of the contingency table without correction for small-sample noise.
That makes RR useful as a sanity check (a quick view of how disproportionate the raw reporting is) but unreliable as a signal threshold, because a single report against an extremely low expected count can produce a huge RR that does not survive shrinkage. The Empirical Bayes algorithms exist precisely to discount these unstable cells.
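A minimal sketch of that instability, using hypothetical counts: the function name and the example values are assumptions for illustration only, and no shrinkage is attempted here.

```python
def relative_reporting_ratio(n_observed: int, n_expected: float) -> float:
    """Raw observed-over-expected ratio, N / E, with no shrinkage."""
    if n_expected <= 0:
        raise ValueError("expected count must be positive")
    return n_observed / n_expected


# Hypothetical cells: one report against a tiny expected count,
# and many reports against a substantial expected count.
sparse_rr = relative_reporting_ratio(1, 0.01)
stable_rr = relative_reporting_ratio(200, 100.0)
print(f"sparse cell RR = {sparse_rr:.0f}, stable cell RR = {stable_rr:.0f}")
```

The sparse cell's RR comes out far larger than the stable cell's even though it rests on a single report, which is the pattern shrinkage is meant to discount.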
safety signal A statistical pattern in adverse-event data suggesting a possible link between a product and an event — a flag for further investigation, not a final conclusion.
A safety signal is a statistical pattern in adverse-event data suggesting that a particular product may be associated with a particular event more often than expected. Signals are generated by tools like MGPS or RGPS scanning a database such as VAERS for disproportionate combinations of product and event.
A signal is a beginning, not an end. It tells regulators that something deserves a closer look — through chart reviews, epidemiological studies, or formal causal analysis — before any conclusion can be drawn about whether the product actually caused the event. The FDA’s threshold for considering a pattern to be a “statistically significant safety signal” was an EB05 above 2.0.
The central question in this report is whether the FDA’s chosen methodology systematically failed to detect signals it should have detected — particularly because of masking — and whether officials acted appropriately on the signals it did detect.
SMQ Standardized MedDRA Query — a curated grouping of MedDRA Preferred Terms that together represent a clinical concept (e.g., 'Cardiac arrhythmias terms').
A Standardized MedDRA Query, or SMQ, is an officially curated grouping of MedDRA Preferred Terms that together represent a clinically meaningful concept — for example, “Cardiac arrhythmias terms” bundles dozens of individual PTs (atrial fibrillation, ventricular tachycardia, bradycardia, and so on) under a single label. SMQs are published and maintained alongside MedDRA itself, so analysts across the FDA, EMA, and industry use the same definitions.
In safety-signal detection, querying by SMQ lets analysts surface signals at a clinically coherent level instead of having to enumerate every relevant PT by hand. Disproportionality outputs frequently show both PT-level and SMQ-level rows in the same table.
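The pooling idea can be sketched as follows. The SMQ definition here is a hypothetical, heavily abbreviated stand-in (real SMQs list many more PTs), and the data layout, one set of PTs per report, is an assumption made for illustration.

```python
from collections import Counter

# Hypothetical, abbreviated SMQ definition for illustration only.
SMQ_DEFINITIONS = {
    "Cardiac arrhythmias terms": {
        "Atrial fibrillation",
        "Ventricular tachycardia",
        "Bradycardia",
    },
}


def count_by_pt_and_smq(reports):
    """Count reports per PT and per SMQ.

    reports: iterable of sets of PT names, one set per report.
    A report counts once toward an SMQ if it mentions any member PT.
    """
    counts = Counter()
    for pts in reports:
        pts = set(pts)
        counts.update(pts)  # PT-level rows
        for smq_name, members in SMQ_DEFINITIONS.items():
            if pts & members:  # SMQ-level row
                counts[smq_name] += 1
    return counts


# Made-up reports: no single arrhythmia PT dominates, but the pooled SMQ does.
example_reports = [
    {"Atrial fibrillation"},
    {"Bradycardia", "Headache"},
    {"Ventricular tachycardia", "Atrial fibrillation"},
    {"Headache"},
]
smq_counts = count_by_pt_and_smq(example_reports)
print(smq_counts)
```

Counting each report at most once per SMQ keeps a report that mentions two member PTs from being double-counted in the pooled row.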
stratification Splitting an analysis into subgroups — by age, sex, or year — so each subgroup is compared against a more relevant baseline, helping control for confounding.
Stratification is a technique for controlling confounding by dividing the data into subgroups and analyzing each one separately. A safety analysis stratified by age, for example, would compare reports for older adults only against other reports for older adults, rather than mixing all ages into a single baseline.
In the FDA’s data-mining work, MGPS analyses are typically stratified by year to account for time-dependent reporting patterns. The report quotes Dr. Craig Zinderman discussing a Szarfman-DuMouchel analysis: “I don’t pretend to understand it, but sounds like they are suggesting an analysis not stratified by year.” Stratification is the older, manual approach to confounding control; RGPS uses regression instead, which can adjust for several variables simultaneously without slicing the data into smaller and smaller bins.
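To make the effect of stratification concrete, the sketch below computes the expected count for a product+event pair with and without strata, summing the per-stratum expectation (product reports times event reports, divided by total reports). The data are invented, each report is simplified to a single product and a single event, and the function name and data layout are assumptions for illustration rather than the FDA's implementation.

```python
from collections import defaultdict


def expected_count(reports, product, event, stratify=True):
    """Expected count under independence, optionally summed over strata.

    reports: list of (stratum, product, event) tuples.
    """
    totals = defaultdict(int)
    with_product = defaultdict(int)
    with_event = defaultdict(int)
    for stratum, p, e in reports:
        key = stratum if stratify else "all"
        totals[key] += 1
        if p == product:
            with_product[key] += 1
        if e == event:
            with_event[key] += 1
    return sum(with_product[k] * with_event[k] / totals[k] for k in totals)


def repeat(stratum, product, event, k):
    return [(stratum, product, event)] * k


# Made-up data: event "E" is common in older adults regardless of product,
# and product "X" is reported mostly in older adults.
reports = (
    repeat("65+", "X", "E", 40) + repeat("65+", "X", "other", 40)
    + repeat("65+", "Y", "E", 10) + repeat("65+", "Y", "other", 10)
    + repeat("<65", "X", "E", 2) + repeat("<65", "X", "other", 18)
    + repeat("<65", "Y", "E", 8) + repeat("<65", "Y", "other", 72)
)
observed = sum(1 for _s, p, e in reports if p == "X" and e == "E")
stratified_e = expected_count(reports, "X", "E", stratify=True)
pooled_e = expected_count(reports, "X", "E", stratify=False)
print(observed, stratified_e, pooled_e)
```

In this invented example the pooled expected count is lower than the stratified one, so the unstratified ratio looks elevated even though, within each age group, product X shows the same event rate as everything else.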
truncation Cutting off part of a dataset — for example, by date — in a way that can hide signals or distort the comparison group used in disproportionality analysis.
Truncation refers to the practice of restricting an analysis to a subset of the available data, typically by time. For instance, an analyst might limit a VAERS query to reports filed within a specific date range, or exclude very old reports that pre-date a particular surveillance system.
Truncation is sometimes legitimate — analysts may want to focus on a particular reporting period — but it can also introduce bias. If recent reports are excluded, lagging adverse-event reports may never enter the analysis. If old reports are excluded, the comparison baseline can shift in ways that change which signals appear above threshold. The report and outside researchers (notably David Wiseman) have flagged truncation, alongside masking and filtering, as a way that signals can be lost from disproportionality analyses of VAERS data.
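The baseline shift can be illustrated with a small sketch. All dates, products, and counts below are invented, and the helper names are hypothetical; the ratio computed is the raw observed-over-expected ratio with no shrinkage.

```python
from datetime import date


def truncate(reports, start=None, end=None):
    """Keep only reports whose date falls within [start, end]."""
    return [
        r for r in reports
        if (start is None or r[0] >= start) and (end is None or r[0] <= end)
    ]


def raw_ratio(reports, product, event):
    """Observed / expected for a product+event pair, no shrinkage."""
    n = sum(1 for _d, p, e in reports if p == product and e == event)
    n_product = sum(1 for _d, p, _e in reports if p == product)
    n_event = sum(1 for _d, _p, e in reports if e == event)
    expected = n_product * n_event / len(reports)
    return n / expected


old, new = date(2015, 6, 1), date(2021, 6, 1)
# Made-up data: older reports for product "Y" are rich in event "E".
reports = (
    [(old, "Y", "E")] * 50 + [(old, "Y", "other")] * 50
    + [(new, "X", "E")] * 10 + [(new, "X", "other")] * 90
    + [(new, "Y", "E")] * 5 + [(new, "Y", "other")] * 95
)
full_ratio = raw_ratio(reports, "X", "E")
recent_ratio = raw_ratio(truncate(reports, start=date(2021, 1, 1)), "X", "E")
print(full_ratio, recent_ratio)
```

With these numbers the same product+event pair falls below 1 against the full database and above 1 once the older reports are cut, which shows how the choice of date window alone can move a pair across a threshold in either direction.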
VAERS The Vaccine Adverse Event Reporting System — a U.S. database, run jointly by the FDA and CDC, where anyone can report a health event after vaccination.
VAERS, the Vaccine Adverse Event Reporting System, is the United States’ main early-warning database for vaccine safety. It is co-managed by the FDA and CDC, and accepts reports from clinicians, vaccine manufacturers, and members of the public. The data are public, and analysts at FDA and CDC use them to look for patterns that might warrant further study.
VAERS is a “passive” or “spontaneous” reporting system: it collects what people choose to send in, rather than systematically following a defined population. That means reports can be incomplete, duplicated, or filed by people with varying levels of expertise. A report does not mean the vaccine caused the event — it just means someone observed the event after vaccination and thought it was worth reporting.
Because VAERS is one of the few near-real-time data sources for vaccine safety, the way it is analyzed matters a great deal. Most of the analytical questions in this report (about masking, filtering, EB05 thresholds, and which algorithm to run) are questions about how the FDA processes VAERS data.