The Lab Experiment That Already Happened
The experiment was over. The paper had been published. The samples had been stored away. And then Rhys Parry went looking.
In 2021, Parry, a virologist at the University of Queensland in Brisbane, was sifting through datasets he had no part in creating. What he found in another researcher's published results would eventually become the foundation of a national research grant, with both of them as co-investigators. That researcher was Alexander Khromykh, who studies how viruses manipulate tiny cellular packages called extracellular vesicles.
Parry had simply taken a second look at Khromykh's published transcriptomic data and noticed something odd: viral RNA seemed to be fragmenting in ways no one had described before. That reanalysis led to an introductory email, then a conversation, then a collaboration.
The Secondary Analysis Evangelist
This is the kind of story that rarely makes it into a paper's acknowledgments section, but Parry thinks it should be central to how early-career scientists think about building a research portfolio. Five years into a strategy built almost entirely on other people's data, he has become an unlikely evangelist for what the field calls secondary analysis.
A Mosquito Virus Discovery
The pivot began during his PhD, also at Queensland. While working with laboratory cell lines derived from Aedes aegypti mosquitoes — the same species that spreads dengue and Zika — Parry stumbled onto a virus that no one had characterized before. It couldn't infect mammalian cells, which made it unremarkable by the standards of insect virology.
But something unexpected jumped out: this unnamed virus seemed to modestly suppress dengue replication. For a researcher thinking about mosquito-borne disease, that interference pattern was intriguing.
His adviser, molecular virologist Sassan Asgari, encouraged him to widen the search. Transcriptomic datasets from Aedes aegypti cells were sitting in public repositories worldwide. Parry downloaded roughly 3,000 of them, traced the virus's evolutionary history across continents, and published the findings with coauthor M.E. James. The entire project — from initial discovery to publication — had been built on archived data that other researchers had uploaded and never touched again.
Serratus: Proof of Concept
Parry is not alone in this. In 2022, a project called Serratus demonstrated what secondary analysis could achieve at scale: researchers aligned billions of archived sequencing reads against viral reference genomes and uncovered more than 100,000 previously unknown RNA virus sequences.
The work, published in Nature by R.C. Edgar and colleagues, expanded the known diversity of RNA viruses by an order of magnitude. The underlying data had been deposited by others, largely forgotten, waiting for someone to ask a different question.
The repositories are vast. The Sequence Read Archive, maintained by the National Center for Biotechnology Information at the U.S. National Institutes of Health, now holds more than 50 petabytes of genomic data. Clinical trial datasets, ecological surveys, medical imaging archives — much of it sits underused despite being publicly available.
Funding agencies and journals require data deposition for reproducibility, but Parry argues that reproducibility is only part of what archived data is good for. "Every dataset contains associations beyond those found by the researchers who generated it," he wrote. Methods evolve. Hypotheses shift. Old data becomes new.
Practical Advice for Secondary Analysis
The practical barriers are lower than many assume. What secondary analysis requires, Parry argues, is not a grant or a lab, but a question, a laptop, and the willingness to look at someone else's results from a fresh angle. R or Python will handle the programming.
The trickier part is knowing which datasets are worth the effort:
- Start with systems you understand scientifically, but ask something the original authors didn't.
- Check the metadata carefully — if the experimental conditions, time points, and replicates aren't reconstructable without extensive detective work, the dataset is probably not worth pursuing.
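The metadata check in the list above can be sketched in a few lines of Python. This is only an illustration, not Parry's own workflow: the field names (condition, time_point, replicate), the screen_dataset helper, and the two-replicate threshold are all assumptions, and real repositories label these attributes inconsistently.

```python
from collections import Counter

# Hypothetical field names; real repository metadata uses varied labels.
REQUIRED_FIELDS = ("condition", "time_point", "replicate")


def is_blank(value):
    """Treat None and empty or whitespace-only strings as missing (0 is valid)."""
    return value is None or str(value).strip() == ""


def screen_dataset(samples, min_replicates=2):
    """Return (usable, problems) for a list of per-sample metadata dicts.

    A dataset is flagged if any sample lacks a required field, or if any
    (condition, time_point) group has fewer than min_replicates samples.
    """
    problems = []
    complete = []
    for index, sample in enumerate(samples):
        missing = [f for f in REQUIRED_FIELDS if is_blank(sample.get(f))]
        if missing:
            problems.append(f"sample {index}: missing {', '.join(missing)}")
        else:
            complete.append(sample)

    # Count replicates per experimental group among fully described samples.
    groups = Counter((s["condition"], s["time_point"]) for s in complete)
    for (condition, time_point), count in groups.items():
        if count < min_replicates:
            problems.append(
                f"group ({condition}, {time_point}): only {count} replicate(s)"
            )

    return (not problems, problems)
```

Calling screen_dataset on the sample records of a candidate dataset returns a yes/no verdict plus a list of reasons, which makes it quick to set aside datasets that would need detective work before committing any analysis time.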
Not every search yields something publishable. Parry estimates he has downloaded thousands of datasets that went nowhere. But null results have their own value, and the cost of searching is negligible compared to generating new data from scratch.
A well-executed secondary analysis can be published, cited, and used as preliminary data for a grant application, just like any other scientific output.
Building Collaborations
Most researchers, Parry has found, are surprisingly receptive when told their archived data sparked a new finding. Some collaborations begin with an email to the original authors; other exchanges turn up metadata that never made it into the published paper.
Occasionally, the original lab has the samples or reagents needed to verify a secondary finding with an actual experiment, turning a computational observation into funded research.
For scientists early in their careers facing the pressure to publish and compete for grants, Parry's message is simple: the data already exists. Someone just has to look.
Based on: "Secondary Analysis: How Scientists Are Mining Existing Datasets for New Discoveries"; Nature, 2024.