[Read: The strongest evidence yet that an animal started the pandemic]

A Major Clue to COVID’s Origins Is Just Out of Reach

But what might otherwise have been a straightforward story on new evidence has rapidly morphed into a mystery centered on the origins debate’s data gaps. Within a day or so of nabbing the sequences off a database called GISAID, the researchers told me, they reached out to the Chinese scientists who had uploaded the data to share some preliminary results. The next day, public access to the sequences was locked—according to GISAID, at the request of the Chinese researchers, who had previously analyzed the data and drawn distinctly different conclusions about what they contained.

Yesterday evening, the international team behind the new Huanan-market analysis released a report on its findings—but did not post the underlying data. The write-up confirms that genetic material from raccoon dogs and several other mammals was found in some of the same spots at the wet market, as were bits of SARS-CoV-2’s genome around the time the outbreak began. Some of that animal genetic material, which was collected just days or weeks after the market was shut down, appears to be RNA—a particularly fast-degrading molecule. That strongly suggests that the mammals were present at the market not long before the samples were collected, making them a plausible channel for the virus to travel on its way to us. “I think we’re moving toward more and more evidence that this was an animal spillover at the market,” says Ravindra Gupta, a virologist at the University of Cambridge, who was not involved in the new research. “A year and a half ago, my confidence in the animal origin was 80 percent, something like that. Now it’s 95 percent or above.”

For now, the report is just that: a report, not yet formally reviewed by other scientists or even submitted for publication to a journal. That will remain the case as long as this team continues to leave space for the researchers who originally collected the market samples, many of them based at the Chinese Center for Disease Control and Prevention, to prepare a paper of their own. And still missing are the raw sequence files that sparked the reanalysis in the first place, before vanishing from the public eye.


Every researcher I asked emphasized just how important the release of that evidence is to the origins investigation: Without data, there’s no base-level proof—nothing for the broader scientific community to independently scrutinize to confirm or refute the international team’s results. Absent raw data, “some people will say that this isn’t real,” says Gigi Gronvall, a senior scholar at the Johns Hopkins Center for Health Security, who wasn’t involved in the new analysis. Data that flicker on and off publicly accessible parts of the internet also raise questions about other clues on the pandemic’s origins. Still more evidence might be out there, yet undisclosed.

Transparency is always an essential facet of research, but all the more so when the stakes are so high. SARS-CoV-2 has already killed nearly 7 million people, at least, and saddled countless people with chronic illness; it will kill and debilitate many more in the decades to come. Every investigation into how it began to spread among humans must be “conducted as openly as possible,” says Sarah Cobey, an infectious-disease modeler at the University of Chicago, who wasn’t involved in the new analysis.

The team behind the reanalysis still has copies of the genetic sequences its members downloaded earlier this month. But they’ve decided that they won’t be the ones to share them, several of them told me. For one, they don’t have sequences from the complete set of samples that the Chinese team collected in early 2020—just the fraction that they spotted and grabbed off GISAID. Even if they did have all of the data, the researchers contend that it’s not their place to post them publicly. That’s up to the China CDC team that originally collected and generated the data.

Part of the international team’s reasoning is rooted in academic decorum. There isn’t a set-in-stone guidebook among scientists, but adhering to unofficial rules on etiquette smooths successful collaborations across disciplines and international borders—especially during a global crisis such as this one. Releasing someone else’s data, the product of another team’s hard work, is a faux pas. It risks misattribution of credit, and opens the door to the Chinese researchers’ findings getting scooped before they publish a high-profile paper in a prestigious journal. “It isn’t right to share the original authors’ data without their consent,” says Niema Moshiri, a computational biologist at UC San Diego and one of the authors of the new report. “They produced the data, so it’s their data to share with the world.”

If the international team released what data it has, it could potentially stoke the fracas in other ways. The World Health Organization has publicly indicated that the data should come from the researchers who collected them first: On Friday, at a press briefing, Tedros Adhanom Ghebreyesus, the WHO’s director-general, admonished the Chinese researchers for keeping their data under wraps for so long, and called on them to release the sequences again. “These data could have and should have been shared three years ago,” he said. And the fact that they weren’t is “disturbing,” given just how much they might have aided investigations early on, says Gregory Koblentz, a biodefense expert at George Mason University, who wasn’t involved in the new analysis.

Publishing the current report has already gotten the researchers into trouble with GISAID, the database where they found the genetic sequences. During the pandemic, the database has been a crucial hub for researchers sharing viral genome data; founded to provide open access to avian influenza genomes, it is also where researchers from the China CDC published the first whole-genome sequences of SARS-CoV-2, back in January 2020. A few days after the researchers downloaded the sequences, they told me, several of them were contacted by a GISAID administrator who chastised them about not being sufficiently collaborative with the China CDC team and warned them against publishing a paper using the China CDC data. They were in danger, the email said, of violating the site’s terms of use and would risk getting their database access revoked. Distributing the data to any non-GISAID users—including the broader research community—would also be a breach.

This morning, hours after the researchers released their report online, many of them found that they could no longer log in to GISAID—they received an error message when they input their username and password. “They may indeed be accusing us of having violated their terms,” Moshiri told me, though he can’t be sure. The ban was instated with absolutely no warning. Moshiri and his colleagues maintain that they did act in good faith and haven’t violated any of the database’s terms—that, contrary to GISAID’s accusations, they reached out multiple times with offers to collaborate with the China CDC, which has “thus far declined,” per the international team’s report.

GISAID didn’t respond when I reached out about the data’s disappearing act, its emails to the international team, and the group-wide ban. But in a statement released shortly after I contacted the database—one that echoes language in the emails sent to researchers—GISAID doubled down on accusing the international team of violating its terms of use by posting “an analysis report in direct contravention of the terms they agreed to as a condition to accessing the data, and despite having knowledge that the data generators are undergoing peer review assessment of their own publication.” GISAID also “strongly” suggested “that the complete and updated dataset will be made available as soon as possible,” but gave no timeline.

Why, exactly, the sequences were first made public only so recently, and why they have yet to reappear, remain unclear. In a recent statement, the WHO said that access to the data was withdrawn “apparently to allow further data updates by China CDC” to its original analysis on the market samples, which went under review for publication at the journal Nature last week. There’s no clarity, however, on what will happen if the paper is not published at all. When I reached out to three of the Chinese researchers—George Gao, William Liu, and Guizhen Wu—to ask about their intentions for the data, I didn’t receive a response.

“We want the data to come out more than anybody,” says Saskia Popescu, an infectious-disease epidemiologist at George Mason University and one of the authors on the new analysis. Until then, the international team will be fielding accusations, already flooding in, that it falsified its analyses and overstated its conclusions.


Researchers around the world have been raising questions about these particular genetic sequences for at least a year. In February 2022, the Chinese researchers and their close collaborators released their analysis of the same market samples probed in the new report, as well as other bits of genetic data that haven’t yet been made public. But their interpretations deviate pretty drastically from the international team’s. The Chinese team contended that any shreds of virus found at the market had most likely been brought in by infected humans. “No animal host of SARS-CoV-2 can be deduced,” the researchers asserted at the time. Although the market had perhaps been an “amplifier” of the outbreak, their analysis read, “more work involving international coordination” would be needed to determine the “real origins of SARS-CoV-2.” When reached by Jon Cohen of Science magazine last week, Gao described the sequences that fleetingly appeared on GISAID as “[n]othing new. It had been known there was illegal animal dealing and this is why the market was immediately shut down.”

There is, then, a clear divergence between the two reports. Gao’s assessment indicates that finding animal genetic material in the market swabs merely confirms that live mammals were being illegally traded at the venue prior to January 2020. The researchers behind the new report insist that the narrative can now go a step further—they suggest not just that the animals were there, but that the animals, several of which are already known to be vulnerable to SARS-CoV-2, were there, in parts of the market where the virus was also found. That proximity, coupled with the virus’s inability to persist without a viable host, points to the possibility of an existing infection among animals, which could spark several more.

The Chinese researchers used this same logic of location—multiple types of genetic material pulled out of the same swab—to conclude that humans were carrying around the virus at Huanan. The reanalysis confirms that there probably were infected people at the market at some point before it closed. But they were unlikely to be the virus’s only chauffeurs: Across several samples, the amount of raccoon-dog genetic material dwarfs that of humans. At one stall in particular—located in the sector of the market where the most virus-positive swabs were found—the researchers discovered at least one sample that contained SARS-CoV-2 RNA, and was also overflowing with raccoon-dog genetic material, while containing very little DNA or RNA material matching the human genome. That same stall was photographically documented housing raccoon dogs in 2014. The case is not a slam dunk: No one has yet, for instance, identified a viral sample taken from a live animal that was swabbed at the market in 2019 before the venue was closed. Still, JHU’s Gronvall told me, the situation feels clearer than ever. “All of the science is pointed” in the direction of Huanan being the pandemic’s epicenter, she said.

To further untangle the significance of the sequences will require—you guessed it—the now-vanished genetic data. Some researchers are still withholding their judgment on the significance of the new analysis, because they haven’t gotten their hands on the genetic sequences themselves. Others are also wondering whether more data could yet emerge, given how long this particular set went unshared. “This is an indication to me in recent days that there is more data that exists,” Maria Van Kerkhove, the WHO’s COVID-19 technical lead, told me. Which means that she and her colleagues haven’t yet gotten the fullest picture of the pandemic’s early days that they could—and that they won’t be able to deliver much of a verdict until more information emerges. The new analysis does bolster the case for market animals acting as a conduit for the virus between bats (SARS-CoV-2’s likeliest original host, based on several studies on this coronavirus and others) and people; it doesn’t, however, “tell us that the other hypotheses didn’t happen. We can’t remove any of them,” Van Kerkhove told me.

More surveillance for the virus needs to be done in wild-animal populations, she said. Having the data from the market swabs could help with that, perhaps leading back to a population of mammals that might have caught the virus from bats or another intermediary in a particular part of China. At the same time, to further investigate the idea that SARS-CoV-2 first emerged out of a laboratory mishap, officials need to conduct intensive audits and investigations of virology laboratories in Wuhan and elsewhere. Last month, the U.S. Department of Energy ruled that such an accident was a likelier catalyst of the coronavirus outbreak than a natural spillover from wild animals to humans. The ruling echoed earlier judgments from the FBI and a Senate minority report. But it contrasted with the views of four other agencies, plus the National Intelligence Council, and it was made with “low confidence” and based on “new” evidence that has yet to be declassified.

[Read: The lab leak will haunt us forever]

The longer the investigation into the virus’s origins drags on, and the more distant the autumn of 2019 grows in our rearview, “the harder it becomes,” Van Kerkhove told me. Many in the research community were surprised that new information from market samples collected in early 2020 emerged at all, three years later. Settling the squabbles over SARS-CoV-2 will be especially tough because the Huanan market was so swiftly shut down after the outbreak began, and the traded animals at the venue rapidly culled, says Angela Rasmussen, a virologist at the University of Saskatchewan and one of the researchers behind the new analysis. Raccoon dogs, one of the most prominent potential hosts to have emerged from the new analysis, are not even known to have been sampled live at the market. “That evidence is gone now,” if it ever existed, Koblentz, of George Mason University, told me. For months, Chinese officials were even adamant that no mammals were being illegally sold at the region’s wet markets at all.

So researchers continue to work with what they have: swabs from surfaces that can, at the very least, point to a susceptible animal being in the right place, at the right time, with the virus potentially inside it. “Right now, to the best of my knowledge, this data is the only way that we can actually look,” Rasmussen told me. It may never be enough to fully settle this debate. But right now, the world doesn’t even know the extent of the evidence available—or what could, or should, still emerge.