What signal masking is, and why it matters
When a new vaccine is rolled out to tens of millions of people, regulators need a way to tell, quickly, whether it is causing harm at unusual rates. The U.S. Food and Drug Administration's main early-warning tool is a computer program that scans VAERS (the public adverse-event reporting database) and flags vaccine-event combinations that show up far more often than expected. The program does this by comparing reports for one product against a baseline drawn from reports for everything else. That baseline is what makes the system work, and it is also where it has a known blind spot. When one product, or a small group of similar products, dominates the database, the baseline gets distorted, and real signals can be hidden inside it. Records released by the U.S. Senate Permanent Subcommittee on Investigations describe what FDA scientists saw when they applied a newer method to the COVID-19 vaccine data, and what happened internally when they reported the results.
What "masking" actually means
Masking happens whenever the baseline a surveillance system compares a product against is itself made up largely of similar products with similar adverse-event profiles. When that happens, a real safety signal can be drowned out by the very thing it should stand out against.
The Subcommittee's report uses a plain analogy. Imagine you want to know whether hemlock is dangerous. You compare hemlock's adverse-event reports against a baseline. If that baseline is just saline (an inert substance), hemlock's harms stand out clearly. But if the baseline is a mixture of arsenic and saline, hemlock's reports look unremarkable next to a comparison group that already contains a similarly toxic ingredient. Hemlock has not become safer; the comparison has been corrupted.
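The analogy can be put in numbers. The sketch below is illustrative only: the counts are invented, and the function `reporting_ratio` is a hypothetical helper that computes a plain observed-to-expected ratio, not the FDA's actual algorithm. It assumes the expected count for a product is its total number of reports multiplied by the share of baseline reports that name the event.

```python
def reporting_ratio(product_event, product_total, baseline_event, baseline_total):
    """Observed / expected count of an event for one product.

    Expected = product_total * (share of baseline reports naming the event).
    This is a simplified disproportionality ratio, assumed for illustration.
    """
    expected = product_total * (baseline_event / baseline_total)
    return product_event / expected


# Hypothetical counts, invented for illustration only.
hemlock_event, hemlock_total = 300, 1_000  # 30% of hemlock reports name the event

# Baseline 1: saline only, where 1% of reports name the event.
saline_only = (10, 1_000)

# Baseline 2: the same saline reports plus 1,000 arsenic reports,
# 290 of which name the event.
arsenic_and_saline = (10 + 290, 1_000 + 1_000)

clean = reporting_ratio(hemlock_event, hemlock_total, *saline_only)
masked = reporting_ratio(hemlock_event, hemlock_total, *arsenic_and_saline)

print(f"hemlock vs saline only:      {clean:.1f}")
print(f"hemlock vs arsenic + saline: {masked:.1f}")
```

With these invented counts the ratio falls from 30 to 2, although nothing about the hemlock reports has changed. Only the comparison group is different.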
The COVID-19 vaccines created an analogous situation in VAERS. By 2021, reports for the Pfizer-BioNTech and Moderna mRNA vaccines made up such a large share of the database that they began appearing on both sides of the comparison: as the product being screened, and as a major component of the baseline they were screened against. Dr. William DuMouchel, the statistician who invented the FDA's data-mining algorithm, and his co-authors put a number on the effect: masking was "roughly eight times more likely to occur with COVID-19 vaccines than with other vaccines."
Rave Harpaz et al., Signaling COVID-19 Vaccine Adverse Events, Drug Safety (2022).
The two methods: MGPS and RGPS
To detect safety signals in VAERS, the FDA used a method called the Multi-item Gamma Poisson Shrinker, or MGPS. DuMouchel published the algorithm in 1999, and Dr. Ana Szarfman — a senior FDA medical officer and safety data-mining developer — was one of the agency officials who helped adopt it.
Robert O'Neill and Ana Szarfman, Some US Food and Drug Administration perspectives on data mining for pediatric safety Assessment, Current Therapeutic Research (2001), https://www.sciencedirect.com/science/article/abs/pii/S0011393X01800710; Ana Szarfman et al., Use of screening algorithms and computer systems to efficiently signal higher-than-expected combinations of drugs and events in the US FDA's spontaneous reports database, Drug Safety (2002); David Wiseman, Signal loss by truancy, masking, and filtering, and underestimation of potential risks and suspected adverse reactions in the Disproportionality Signal Analyses of VAERS data associated with COVID-19 pro-vaccines, ResearchGate, Sept. 2025, https://www.researchgate.net/publication/395382959_Signal_loss_by_truancy_masking_and_filtering_and_underestimation_of_potential_risks_and_suspected_adverse_reactions_in_the_Disproportionality_Signal_Analyses_of_VAERS_data_associated_with_COVID-19_pro at 8-9; PSI-HHS-000008257238. One FDA official referred to Dr. Szarfman as someone who "worked to develop the data mining system[.]" Id. In 2002, Dr. Szarfman was reportedly awarded "the FDA and CDER Outstanding Scientific Achievement Awards for contributions to safety data mining." See Professional Activities, Ana Szarfman, ORCID, https://orcid.org/0000-0001-6680-1423.
PSI's March 25, 2026 document release on EB data mining: at 166; FOIA production: https://www.fda.gov/media/184988/download?attachment at 23. Although FDA officials used 2.0 as their threshold for a statistically significant signal, an FDA official's April 5, 2021 PowerPoint presentation before the Advisory Committee on Immunization Practices appeared to recognize that, "Technically, any [Empirical Bayes Geometric Mean] value above one indicates disproportional reporting." See PSI's March 25, 2026 document release on EB data mining: https://www.hsgac.senate.gov/wp-content/uploads/2025.05.21-Supporting-Documents-Failure-to-Warn-Part-08.pdf at 47. As medical researcher Dr. David Wiseman wrote in his September 2025 Preprint article, because health officials used the higher threshold of 2.0, as opposed to 1.0 which would "technically" indicate a signal, "[s]ignals were filtered out by an inappropriately high detection threshold." David Wiseman, Signal loss by truancy, masking, and filtering, and underestimation of potential risks and suspected adverse reactions in the Disproportionality Signal Analyses of VAERS data associated with COVID-19 pro-vaccines, ResearchGate, Sept. 2025, https://www.researchgate.net/publication/395382959_Signal_loss_by_truancy_masking_and_filtering_and_underestimation_of_potential_risks_and_suspected_adverse_reactions_in_the_Disproportionality_Signal_Analyses_of_VAERS_data_associated_with_COVID-19_pro at 3.
MGPS has a well-known limitation: it is vulnerable to masking. In 2012, DuMouchel and a colleague published an updated method called the Regression-Adjusted Gamma Poisson Shrinker, or RGPS, designed to correct for that limitation. RGPS still scans for disproportionality, but it adjusts the comparison so that one large product cannot drown out signals from a similar product. In a March 1, 2021 briefing to senior FDA officials, Szarfman described RGPS as "superior" to MGPS because it "can better adjust for both, masking (false negatives) and confounding (false positives)."
RGPS is the state of the art... incorporates more information into the signal generation process. This leads to a lower rate of missed signals and less false alerts. — Dr. Ana Szarfman, briefing to FDA leadership, March 1, 2021
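To see why adjusting the comparison matters, consider a rough sketch. It does not implement RGPS, which is regression-based. It uses a much cruder stand-in, assumed here purely for illustration: dropping the similar product from the comparison group. All counts and product names are hypothetical, and the 2.0 cutoff is borrowed from the threshold FDA officials reportedly used, which in practice applies to an empirical Bayes score rather than to a raw ratio like this one.

```python
SIGNAL_THRESHOLD = 2.0  # cutoff cited in the records; applied here to a raw ratio

# Hypothetical (event_reports, total_reports) per product group. Not VAERS data.
counts = {
    "vaccine_A": (400, 40_000),  # product being screened
    "vaccine_B": (450, 50_000),  # similar product with a similar event rate
    "all_other": (40, 10_000),   # everything else in the database
}


def ratio(target, comparators):
    """Observed/expected event count for `target` against pooled comparators."""
    event, total = counts[target]
    base_event = sum(counts[name][0] for name in comparators)
    base_total = sum(counts[name][1] for name in comparators)
    expected = total * base_event / base_total
    return event / expected


# Baseline includes the similar product, so the signal is diluted.
masked_ratio = ratio("vaccine_A", ["vaccine_B", "all_other"])

# Crude unmasking (an assumption, not RGPS): leave the similar product out.
unmasked_ratio = ratio("vaccine_A", ["all_other"])

for label, value in [("masked", masked_ratio), ("unmasked", unmasked_ratio)]:
    flagged = value >= SIGNAL_THRESHOLD
    print(f"{label:9s} ratio = {value:.2f}  flagged = {flagged}")
```

In this toy setup the same reports fall below the cutoff when the similar product sits in the baseline and rise above it when that product is left out. A regression-adjusted method aims for a comparable correction without simply discarding data.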
What Szarfman and DuMouchel found
On March 26, 2021, Szarfman sent senior FDA officials a spreadsheet from DuMouchel comparing MGPS and RGPS side by side on the same VAERS data. The comparison identified "49 examples of extreme masking" in the COVID-19 vaccine data.
Of those 49, more than twenty adverse events showed a statistically significant safety signal once RGPS adjusted for masking. These were signals that the MGPS method was not surfacing. Among the events that emerged when masking was accounted for were sudden cardiac death, Bell's palsy, and pulmonary infarction.
Over the following months, Szarfman and DuMouchel ran further analyses and reported additional signals to FDA colleagues, including acute myocardial infarction associated with the Moderna and Pfizer vaccines, non-site-specific embolism and thrombosis associated with the Janssen (Johnson & Johnson) and Pfizer vaccines, dementia associated with the Pfizer vaccine, and "Death and sudden death" associated with the Moderna and Pfizer vaccines.
PSI-HHS-000008258306 (with attachment); PSI-HHS-00008258202-03; PSI-HHS-000002208944-45; PSI-HHS-000004592364-65 (with attachment) (emphasis added).
I am not astonished that [FDA's data mining system] was unable to detect these signals. — Dr. Ana Szarfman, email to FDA colleague, June 11, 2021
In a 2022 paper in the journal Drug Safety, Szarfman, DuMouchel, and co-authors described their method as one that "automatically unmask[s] signals that remain hidden by other data mining methodologies" and concluded that masking "is roughly eight times more likely to occur with COVID-19 vaccines than with other vaccines."
What happened internally
Szarfman raised the masking issue with FDA colleagues repeatedly between February 2021 and 2022. The internal correspondence shows officials acknowledging her expertise and the limits of their own. In a March 2021 email, one CBER official wrote, "I know Ana [Szarfman] worked to develop the data mining system," and later added, "I know that she knows a lot more about data mining than I do!"
In an April 2021 email discussing Szarfman's and DuMouchel's analysis, another CBER official wrote, "I don't pretend to understand it, but sounds like they are suggesting an analysis not stratified by year."
In mid-April 2021, after Szarfman circulated an analysis of FDA's current system, a senior FDA official wrote to colleagues:
Before we potentially reach out to Ana, we should meet internally — many considerations not suited to email… — Senior FDA official, internal email, April 15, 2021
On May 7, 2021, after a draft email circulated among Drs. David Menschik, Narayan Nair, and Craig Zinderman of FDA's Center for Biologics Evaluation and Research, Zinderman sent Szarfman a directive asking her to "hold off on creating and sending data mining reports and analyses using COVID-19 vaccine [adverse event] data."
PSICOVID_00017246; PSI-HHS-000008251530; PSI-HHS-000008251912-13; PSI-HHS-000001195617-19; PSI_HHS-000008253450-51; PSI-HHS-000001175745-47; PSICOVID_00017246-47.
In September 2021, then-CBER Director Dr. Peter Marks wrote to CDER Director Dr. Patrizia Cavazzoni (to whom Szarfman reported) that Szarfman "has been asked to cease and desist" her vaccine data-mining work, and that her approach "could create erroneous conflicts that feed in to anti-vaccination rhetoric."
PSI-HHS-000002213753; PSICOVID_00017246-47. See also, Testimony of Dr. Peter Marks before the Select Subcomm. on the Coronavirus Pandemic Comm. on Oversight and Accountability, U.S. House of Representatives, Feb. 15, 2024.
PSI-HHS-000002213752-53 (ellipses in original).
In July 2022, the final weekly COVID-19 vaccine data-mining report was distributed inside FDA, ending a practice that had been routine since the vaccines were authorized. Internal CDC emails from later that fall noted that the change in distribution coincided with a series of public records requests for the same data: one CDC official wrote, "I think that because of the FOIAs [Freedom of Information Act requests] we may have asked FDA to stop sending these weekly data mining outputs."
What the records show
The Subcommittee published the underlying records on which this account rests, including the contemporaneous emails quoted above. The site organizes them three ways: a day-by-day timeline of internal correspondence and external events; a quotes index grouped by speaker and topic; and a document archive indexed by Bates number, where each footnote on this page links to the underlying record where one is publicly posted. The glossary defines the pharmacovigilance terms used here and in the source materials.
The substantive finding
A known statistical limitation of the FDA's then-current data-mining method went unaddressed during a period when novel vaccines were under active monitoring. A more sensitive method, designed by the same statistician who built the original system, existed and was offered. The scientists who developed both methods raised the issue inside the agency between early 2021 and 2022, identified specific safety signals (for sudden cardiac death, Bell's palsy, pulmonary infarction, acute myocardial infarction, dementia, and others) that the newer method surfaced and the older one did not, and were asked to stop. The records released by the Subcommittee make it possible to follow that sequence step by step, in the participants' own words.