Turning DNA data into crucial evolutionary insights
Who Dr Minh Bui and Professor Robert Lanfear, Australian National University
What Dr Bui and Professor Lanfear combined their computer science and biology expertise to develop IQ-TREE2 – free, open-source software that turns DNA data into crucial evolutionary insights. Used to investigate everything from early life forms to the virus causing the COVID-19 pandemic, this user-friendly tool, first released in 2019, has become a staple for life scientists worldwide.
Winners of the 2023 Australian Research Data Commons Eureka Prize for Excellence in Research Software
What role does IQ-TREE2 play in the science community?
It allows researchers across the life sciences to infer evolutionary histories from DNA sequence data, often at dramatically different scales. For example, over the past few years IQ-TREE2 has been used to track the origin and spread of the coronavirus causing the COVID-19 pandemic, and it has also been used to make inferences about the origins of life on Earth – a process which occurred billions of years ago.
IQ-TREE2 is open-source software. Can you explain what this means and how this feature influences its evolution?
Open source means that the software is free to use, modify and redistribute, provided proper attribution is given to the original authors. In the case of IQ-TREE2, the source code is freely available on the hosting platform GitHub.
The fact that IQ-TREE2 is open source serves at least three purposes. First, it means others can easily build on and improve our work, facilitating faster scientific progress. Second, there is more trust in IQ-TREE2, because everything in the program is transparent to everyone who can understand the code. Finally, being open source means that anyone can contribute to the project. As IQ-TREE2 grows in popularity, the community of contributors also grows.
What have been some of the most challenging aspects of working on this software?
The number of things that can potentially go wrong! IQ-TREE has hundreds of options, which allow users to perform a myriad of different types of analyses. It’s almost impossible to test how all combinations of these options work with each other. Combine this with all the inventive ways that people try to use the software, and it may break unpredictably.
Moreover, it’s difficult to find funding to keep the software working well. A complex piece of software is like a large piece of infrastructure, for example, a telescope or other major facility. Just as a telescope needs someone to keep it running, complex software requires constant maintenance to keep it working, because things like computer chips and other bits of software that IQ-TREE relies on are constantly evolving. On top of that, the more users we have, the more people ask us questions or have requests for small – or large – updates and changes. It’s extremely difficult to find funding just to maintain software in Australia, let alone improve it.
What might readers find surprising about the field of research software?
That it’s completely free to use, and that anyone can see every line of code which makes it work – and can make their own copy and edit it as much as they like! The open-source community is a huge benefit to research software, although it does bring the challenge of how one funds the development and maintenance of such software.
What are some of the practical benefits that the broader population might experience due to IQ-TREE2?
IQ-TREE2 is used to power many inferences in public health and the management of diseases. For example, at the beginning of the COVID-19 pandemic, researchers around the world were using IQ-TREE2 both to figure out where the virus came from and to track its spread from country to country and person to person. IQ-TREE was pivotal in many early contact tracing efforts, because the genomes of the coronavirus often had enough information to figure out who gave the virus to whom. It’s exactly this kind of analysis that traced the source of the Melbourne outbreaks to leaks from quarantine hotels.
What does winning a Eureka Prize mean to you?
It’s great! It’s wonderful to be recognised, and on a personal level we hope it will help us to continue developing this software in the years to come. More importantly though, we don’t think it’s really the winning that counts here – there will only ever be a few finalists and one winner, and there’s far more amazing research software than can be recognised in a list that short.
The most important aspect of this Eureka Prize is that it raises the profile of research software. Most scientists rely on research software of one kind or another, but software can be surprisingly undervalued when it comes to things like funding and promotion. The establishment of the Eureka Prize for Excellence in Research Software really helps to raise the profile of software in Australia, and that helps underpin a crucial part of the scientific endeavour.
The Australian Museum Eureka Prizes are the country’s most comprehensive national science awards, honouring excellence across the areas of research & innovation, leadership, science engagement, and school science.