Smartphone-collected Big Data has the potential to transform the way we understand and predict weather systems. Five months ago, we at OpenSignal (a project to map global cell phone signal coverage) launched an app called WeatherSignal to collect atmospheric data from smartphones. WeatherSignal works by repurposing the sensors that already exist in Android devices to build a live map of atmospheric readings.

The most recent Galaxy phone, the S4, contains a barometer, hygrometer (humidity), ambient thermometer and light meter, all of which provide data that is important for meteorology. While the S4 is the most advanced phone in terms of sensors, valuable readings can be gathered from many other phones as well. The prospect of a granular network of millions of interconnected weather stations is an exciting one for meteorology.

We are often asked how we can trust the data, as mobile phones are often indoors or in pockets. The answer to this is twofold. First, we can combine sensor readings: if the light reading is below some threshold x, for instance, the phone is probably not outdoors. Second, given sufficient volume, we can arrive at valid averages, an answer that gets to the heart of what Big Data really means.
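
The two-step answer above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the WeatherSignal pipeline: the `Reading` type, its field names, the minimum sample count and the 1,000 lux cut-off are all hypothetical (the text leaves the threshold as x).

```python
from dataclasses import dataclass
from statistics import mean
from typing import Iterable, Optional

# Hypothetical stand-in for the unspecified threshold "x" in the text.
OUTDOOR_LUX_THRESHOLD = 1000.0


@dataclass
class Reading:
    """One report from a phone (field names are assumptions)."""
    light_lux: float      # ambient light sensor value
    pressure_hpa: float   # barometer value


def likely_outdoors(reading: Reading,
                    threshold: float = OUTDOOR_LUX_THRESHOLD) -> bool:
    """Step 1: use one sensor (light) to judge whether to trust another."""
    return reading.light_lux >= threshold


def averaged_pressure(readings: Iterable[Reading],
                      threshold: float = OUTDOOR_LUX_THRESHOLD,
                      min_count: int = 3) -> Optional[float]:
    """Step 2: average the readings that pass the filter.

    Returns None when too few readings survive, because an average
    over a handful of noisy phones is not worth reporting.
    """
    kept = [r.pressure_hpa for r in readings if likely_outdoors(r, threshold)]
    if len(kept) < min_count:
        return None
    return mean(kept)


if __name__ == "__main__":
    sample = [
        Reading(light_lux=5000.0, pressure_hpa=1010.0),
        Reading(light_lux=20.0, pressure_hpa=990.0),    # dim: dropped
        Reading(light_lux=8000.0, pressure_hpa=1012.0),
        Reading(light_lux=3000.0, pressure_hpa=1011.0),
    ]
    print(averaged_pressure(sample))
```

A light-only filter is deliberately simplistic (it would reject every reading at night, for example); a real system would presumably combine several signals in the same way.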

The philosophy of Big Data is that insights can be drawn from a large volume of ‘dirty’ (or ‘noisy’) data, rather than simply relying on a small number of precise observations, a subject covered in detail by Viktor Mayer-Schönberger and Kenneth Cukier in their recent book ‘Big Data’. One good example of the success of the ‘Big Data’ approach is Google’s Flu Trends, which uses Google searches to track the spread of flu outbreaks worldwide. Despite the inevitable noise, the sheer volume of Google search data meant that flu outbreaks could be identified and tracked in near real time. By comparison, relying on doctors to report flu cases as they were observed resulted in a lag of up to two weeks in the identification of outbreaks. The system is not perfect, however. Flu Trends recently overestimated an epidemic in the US by a wide margin, possibly because increased media coverage led to an increase in false-positive searches for flu symptoms. It is also important to remember that Big Data, when used on its own, can only provide probabilistic insights based on correlation.

The true benefit of Big Data is that it drives correlative insights, which are achieved through the comparison of independent datasets. It is this that buttresses the Big Data philosophy of ‘more data is better data’; you do not necessarily know what use the data you are collecting will have until you can investigate and compare it with other datasets.

One good example of this is the experiment which ultimately led to our creating WeatherSignal. We had been collecting readings of battery temperature from our connection-toolkit app called OpenSignal. On investigation, we identified a historical correlation between averaged smartphone battery temperatures and the ambient temperature readings made by dedicated weather stations.

Working from this starting point we published a paper in conjunction with the Royal Meteorological Society of the Netherlands that developed an algorithmic approach to converting battery temperature readings to ambient temperature. We use this approach to create averaged ambient temperature readings from phones that don’t contain an external thermometer, a result which would never have been possible if we hadn’t collected battery temperature readings and compared them to historical atmospheric data.
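
To give a feel for what converting battery temperature to ambient temperature can involve, here is a minimal Python sketch. It is not the algorithm from the paper: it simply assumes a linear relationship and fits it by ordinary least squares against weather-station readings. The function names and the numbers in the usage example are made up for illustration.

```python
from statistics import mean
from typing import Sequence, Tuple


def fit_linear(battery_avg_c: Sequence[float],
               station_ambient_c: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of: ambient ≈ slope * battery + intercept.

    battery_avg_c     -- averaged battery temperatures (one per period)
    station_ambient_c -- weather-station temperatures for the same periods
    """
    if len(battery_avg_c) != len(station_ambient_c) or len(battery_avg_c) < 2:
        raise ValueError("need two equally long series with at least 2 points")

    b_mean = mean(battery_avg_c)
    a_mean = mean(station_ambient_c)

    # Spread of the battery series, and how it co-varies with the station series.
    sxx = sum((b - b_mean) ** 2 for b in battery_avg_c)
    if sxx == 0:
        raise ValueError("battery temperatures are all identical; cannot fit")
    sxy = sum((b - b_mean) * (a - a_mean)
              for b, a in zip(battery_avg_c, station_ambient_c))

    slope = sxy / sxx
    intercept = a_mean - slope * b_mean
    return slope, intercept


def estimate_ambient(battery_avg_c: float, slope: float, intercept: float) -> float:
    """Apply the fitted line to a new averaged battery temperature."""
    return slope * battery_avg_c + intercept


if __name__ == "__main__":
    # Made-up illustrative values, not real measurements.
    battery = [30.0, 32.0, 34.0]
    ambient = [10.0, 14.0, 18.0]
    slope, intercept = fit_linear(battery, ambient)
    print(estimate_ambient(33.0, slope, intercept))
```

The fit is only meaningful on averaged readings: an individual battery temperature depends heavily on what that phone is doing, so the ambient signal emerges only once many phones are pooled.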

The ‘Big Data’ approach has already begun to be incorporated into weather nowcasting, and the Flu Trends example provides an excellent analogy for where it can initially prove most useful. The UK Met Office has started making use of various non-traditional sources to track the spread of snowfall, including geo-located tweets mentioning snow. It is instructive here to think of snowfall as an ‘outbreak’, a (relatively) unpredictable high-impact event that can be better managed through more immediate and granular data. Such is the nature of these high-impact events that having more data, however dirty, is especially useful for helping to limit consequences through more effective immediate decision-making. Initially we believe that the WeatherSignal data will be most useful for ‘outbreak’ type events, using pressure readings for short-term storm forecasting and surface temperature readings to determine the spread of snowfall.

The next step lies in proving empirically that smartphone data has an important role to play in the future of weather forecasting. We are currently looking for more academic partners to come forward and make use of our data and already have an exciting group of collaborators lined up. We are working with Birmingham University Climate Lab (BUCL) to prove that crowdsourced smartphone sensor readings can be useful in studying urban climate. BUCL have established a dense network of weather stations and temperature sensors in their city, which will be used to test the crowdsourced readings from the WeatherSignal network.

We have also begun to supply pressure readings to the University of Washington to help prove their use in atmospheric modelling, and have announced plans to share our data with the Met Office. The benefits that a crowdsourced approach can bring to the science of meteorology are only just becoming apparent, but the winds of change are blowing.