
On 03 May, Yuhan Douglas Rao, from the North Carolina Institute for Climate Studies, unpacked concepts and practices at the intersection of model sharing and sustainability. Yuhan identified three crucial elements during the presentation, and audience members posed challenging questions that were later discussed off-camera. Read on for highlights of that discussion.

Yuhan placed three important elements in context. Firstly, Yuhan proposed a spectrum along which we speak of "models". At one extreme are models that start from the features of physical systems. Many complex meteorological models work this way, as do much simpler ones, such as the water cycle model. At the other extreme are data-driven models, which result from collating and analyzing vast amounts of data. Modern machine learning models are prime examples of this latter kind. Much modeling, however, takes place between the two extremes. Models that represent physical systems may include data-driven elements that allow them not just to represent but to predict changes in those systems, as in weather forecasting. Meanwhile, data-driven models may be fine-tuned with parameters drawn from our understanding of the systems they describe.

Secondly, Yuhan distinguished two notions of sustainability. On the one hand, sustainability refers to environmental wellbeing. For computational modelers, this has recently become a point of tension. For example, whilst earth scientists do work that is necessary for us to better understand, and often protect, the natural environment, it is becoming clearer that the growing computational resources required to model complex systems have detrimental environmental impacts. On the other hand, in the context of open science, sustainability is often shorthand for the preservation of data, models and knowledge over time. This raises the question of what is needed to make model-sharing practices sustainable.

The third element Yuhan spoke of was the infrastructure the scientific community needs in order to share computational models. Yuhan identified two types of infrastructure: technical and social. Technical infrastructure is what the open science community usually has in mind when it thinks about infrastructure: repositories, data, computing power, and so on. Under social infrastructure, Yuhan listed incentives, culture, capacity building and standards.

The presentation set the stage for a rich discussion, not included in the recording, in which the audience grappled with three questions posed by Yuhan:

In the context of open science, how do we address and trace provenance in model sharing?
Is there a standard way of model documentation that can enable sustainable model sharing, improve transparency, and support model reuse and reproducibility?
Have you noticed an increasing demand for sharing data-driven models? Are they any different to other models?

The discussion took four interrelated directions:

Provenance is of great interest to large institutions, from tech companies to governments, both for cybersecurity reasons and for proper attribution;
Robust governance is key to tracking changes and the decision-making processes behind them, which in turn enables accountability;
Data-driven models are proliferating at an enormous pace because they are easy to fine-tune for specific applications, which raises the question: do we need to store and share every model?
Model sharing is such a complex topic that we should aim for progress rather than perfect answers to its myriad questions; developing good practices akin to a "minimum viable product" may be more fruitful than tackling everything at once.

Watch Yuhan's presentation below, join us on 24 May for the final ModelShare workshop, and sign up for our Google Group to learn how you can get involved with the ModelShare program!
