It’s official – data is now interesting! One of the striking things about the COVID-19 pandemic has been the central role that data has played in managing it, and how, as a nation, we have become fascinated by “the numbers”. How many times over the last year have you checked the news and turned first not to the sports news, the stock market report or the celebrity gossip column, but to the COVID dashboard tracking cases, hospitalisations and deaths? Through the COVID experience, we have come to recognise the value of high-quality data: it helps us understand the dynamics of the pandemic, adapt our behaviour to the level of risk, and make plans for the future.
David Olusoga and Steven Johnson’s fascinating BBC Four show Extra Life explores how data has contributed to enormous strides in life expectancy. But medicine is not the only field in which we rely on data to help manage real-world problems. Global expenditure on environmental monitoring is growing at nearly 10% per year and is forecast to reach £19.3 billion by 2025, as businesses respond to stricter regulations and public pressure to do more to tackle water, air, noise and soil pollution. Just as with COVID, high-quality, trusted data is vital for assessing the state of the environment, understanding whether things are getting better or worse, and devising strategies to mitigate environmental risks and impacts.
So how do we ensure that we’re collecting the right data, and extracting the maximum information from it?
Gathering data costs money. Sensors are getting cheaper, but it’s still essential to plan any monitoring activities properly to ensure you get the biggest bang for your buck. As a statistician, a question I get asked regularly is “How many samples do I need?”. Unfortunately, there isn’t a simple, one-size-fits-all answer. (Spoiler alert: it depends on what you’re measuring, how you’re measuring it, what questions you want to answer, and how much confidence you need in the results.) But by using historical, analogue or pilot data, it is possible to simulate and compare alternative monitoring strategies and find the sweet spot between gathering too much data and too little.
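To make the simulation idea more concrete, here is a minimal Python sketch of a Monte Carlo power analysis. It assumes the quantity being monitored is roughly normally distributed, that pilot data supply a mean and standard deviation, and that the question is whether a change of a given size between two equal-sized surveys would be detected. The function names, the example figures (mean 12, standard deviation 4, a change of 3 units) and the 80% power target are all hypothetical illustrations, not APEM’s actual method, and the test used is a simple two-sample z-test approximation rather than an exact t-test.

```python
import random
import statistics
from statistics import NormalDist


def simulated_power(n, baseline_mean, sd, change, alpha=0.05, n_sims=2000, seed=1):
    """Hypothetical helper: estimate how often a shift of `change` between two
    surveys of `n` samples each is detected, assuming normally distributed data."""
    rng = random.Random(seed)
    # Two-sided critical value for the z-test approximation.
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    detections = 0
    for _ in range(n_sims):
        # Simulate a "before" and an "after" survey from the assumed pilot parameters.
        before = [rng.gauss(baseline_mean, sd) for _ in range(n)]
        after = [rng.gauss(baseline_mean + change, sd) for _ in range(n)]
        # Standard error of the difference in means, from the sample variances.
        se = (statistics.variance(before) / n + statistics.variance(after) / n) ** 0.5
        z = (statistics.fmean(after) - statistics.fmean(before)) / se
        if abs(z) > z_crit:
            detections += 1
    return detections / n_sims


def smallest_adequate_n(candidate_ns, target_power=0.8, **kwargs):
    """Hypothetical helper: return the smallest candidate sample size whose
    simulated power reaches `target_power`, or None if none does."""
    for n in sorted(candidate_ns):
        if simulated_power(n, **kwargs) >= target_power:
            return n
    return None


if __name__ == "__main__":
    # Illustrative pilot values only (assumed, not real monitoring data).
    for n in (10, 20, 30, 40):
        power = simulated_power(n, baseline_mean=12, sd=4, change=3)
        print(f"n = {n:>2} per survey: estimated power {power:.2f}")
    print("Smallest adequate n:",
          smallest_adequate_n(range(5, 61, 5), baseline_mean=12, sd=4, change=3))
```

Running the script prints how the chance of detecting the change grows with survey size, which makes the trade-off between cost and confidence explicit. In a real design, the normal model would usually be replaced by one fitted to the historical data (for example, allowing for skewed counts or seasonal patterns), and the same loop could compare sampling frequencies or site layouts rather than just sample size.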
The past decade has witnessed something of a revolution in the world of data analysis, as exponentially increasing computing power and new online tools such as H2O and Azure have made machine learning and artificial intelligence methods faster and more accessible than ever before. As with many shiny new things, the benefits of these techniques are sometimes overhyped, but their potential is clear. In one study of Atlantic salmon in Wales, for example, APEM found that a machine learning algorithm produced predictions of fish density that were 26% more accurate than those from a more traditional multiple regression model.
This is an impressive result, but it’s important to be aware that machine learning and artificial intelligence techniques are not a panacea. They tend to be more data-hungry than many conventional statistical methods, and they can be difficult to interpret, making it harder to explain what is causing the patterns and trends observed in the data. Being aware of the pros and cons of alternative approaches can help you choose the right methods for your data.
But ultimately it’s about communication. If end users can’t access the data they need, or don’t understand or trust the outputs from a statistical model, then there’s a problem. A friend who works at the Office for National Statistics has the job of communicating complex survey results to a variety of audiences. To help her get inside the mind of her target audience, she has created three imaginary people: Doris (basically your nan, who needs simple, non-technical messages), Boris (the politician, who wants clear facts and figures) and Horace (the academic, who is interested in all the technical detail). Prior to publication, each output must pass the Doris-Boris-Horace test to ensure that it is accessible to all interested parties. Being mindful of who is going to use the data, and how, can help ensure that data is used to its full potential.
How APEM can help
APEM work with clients throughout the water, energy and environment industries to provide an end-to-end solution to capture, analyse, and utilise environmental monitoring data.
Our tried-and-tested approach has helped organisations save money, reduce risk, achieve compliance and make better decisions.