The models of biological processes that appear in scientific papers often contain serious errors that make it impossible to use them as is. And it's the system that is to blame.
Catherine Lloyd, who works in Peter Hunter's group at the University of Auckland, arrived at the BioSysBio conference in Cambridge today to argue that scientists should publish not just papers but also the executable models they used to create or explain their results.
The Auckland group has been working closely with Denis Noble's team at the University of Oxford for many years. Noble pioneered the use of computer models in biology with his work on the electrical signals that move around the heart. Recently that work has been assembled into animated models that can guide surgeons on where to operate on a diseased heart.
Although models are central to systems biology, the system for publishing research is not really set up to deal with them. Lloyd, who curates the models held by the Auckland team, said the current publishing process introduces problems. "To publish their research, [scientists] have to translate their model into text and equations for publication," she said.
One answer is to submit the model itself, or at least a version that behaves the same way. Right now, researchers use Matlab, Mathematica and a grab bag of other tools to write their models. The Auckland proposal is to use an XML-based language called, not surprisingly, CellML to hold the guts of the model.
Lloyd said one problem with tools such as Matlab is that a typical model carries a lot of procedural code needed just to get it running. What researchers really want is just the core of the model: the differential equations that replicate a biological system's behaviour.
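The split Lloyd describes, between the equations that define a model and the code that merely runs them, can be illustrated with a short Python sketch. This is not CellML and not one of the repository models: it uses the textbook FitzHugh-Nagumo equations, a simplified excitable-cell model chosen here purely as an example, and a hand-written forward-Euler loop standing in for the kind of solver scaffolding that a Matlab script wraps around a model. The function names are hypothetical.

```python
# Illustrative sketch only (not CellML, not a model from the cellml.org repository).
# The "core" of the model is just its differential equations, written as a
# right-hand-side function. The integrator below is generic procedural code
# that could run any model written in the same form.

def fitzhugh_nagumo(t, state, a=0.7, b=0.8, eps=0.08, current=0.5):
    """Right-hand side of the FitzHugh-Nagumo equations (hypothetical example model)."""
    v, w = state
    dv = v - v**3 / 3 - w + current   # fast, voltage-like variable
    dw = eps * (v + a - b * w)        # slow recovery variable
    return [dv, dw]

def euler(rhs, state, t_end, dt):
    """Generic forward-Euler integrator: the scaffolding, not the model."""
    n_steps = round(t_end / dt)
    trajectory = [(0.0, list(state))]
    for i in range(n_steps):
        t = i * dt
        derivs = rhs(t, state)
        state = [x + dt * d for x, d in zip(state, derivs)]
        trajectory.append(((i + 1) * dt, list(state)))
    return trajectory

traj = euler(fitzhugh_nagumo, [-1.0, 1.0], t_end=200.0, dt=0.01)
```

Only `fitzhugh_nagumo` would need to travel with a paper; swapping `euler` for a better solver would not change the model at all, which is the separation a format like CellML aims to capture.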
Publishing the models themselves would streamline a lot of the work for the team at Auckland. According to Lloyd, of the almost 400 models in the cellml.org repository, only four made it from the textual description to a working executable model in one go. The source papers for most of the others (the majority of the models, though not all, appeared in journal papers) contained typos and other mistakes that meant the model did not behave as expected. Albert Goldbeter gets the award for providing two of the error-free models.
"Sometimes we get errors where we have to contact the model author and for some models we will never be able to access the code," said Lloyd.
In some cases, the way universities license intellectual property can block access to the actual models, even when they are needed only to test a CellML version. And sometimes the model just isn't available, possibly because the original paper and the model don't quite agree.
"It is surprising how many researchers 'lose' their code. They just can't find it despite all the years they have worked on it," said Lloyd.
According to Lloyd, some journals are interested in the idea of publishing CellML models alongside papers. One possible incentive for scientists is to give the model its own citable reference, so that teams get two citations for the price of one. Alternatively, journals could simply refuse to publish papers based on models unless the model itself is submitted too.
Although publishing a model along with a paper means extra work, it could streamline things, because running the CellML version acts as a kind of proofreading for the underlying equations. Getting a model into CellML is another matter, but work is underway on a Matlab-to-CellML converter, and tools such as COR and PCEnv already exist for writing and running CellML models.