It seems to me that sensor readings tend to correlate between geographically separate sensors. I wonder whether or not it would be a good idea to calculate a variation value based on how closely readings match.
For example, let’s presume the following. Sensor 1 reports a one hour PM2.5 average of 10 and sensor 2 (located 500 meters away) reports a one hour PM2.5 average of 9. Take the maths with a pinch of salt as it’s only an example and it’s been a long time since I did statistics (or maths for that matter lol). I’m sure somebody cleverer than me could work out the specifics, but here’s my bash at it…
- The mean of these values is (9+10)/2 = 9.5
- Standard deviation can be worked out by subtracting the mean from each sensor reading, squaring the result, and taking the mean of the squared differences to get the variance (strictly this is the population variance; the sample variance would divide by n - 1 instead of n). Square rooting the variance then gives the standard deviation. Squaring removes the sign, so it doesn't matter whether a reading is above or below the mean, so…
- (9 - 9.5)² = 0.25
- (10 - 9.5)² = 0.25 (the same as we only have two sensors in the example)
- The mean of the squared differences is (0.25 + 0.25) / 2 = 0.25, and √0.25 = 0.5 (standard deviation)
- We could then use the coefficient of variation (standard deviation divided by mean), 0.5 / 9.5 = 0.053 or 5.3%, to show (in this example) that there is little variation amongst the readings.
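For anyone who wants to check the arithmetic, here's a minimal Python sketch of the same calculation. The function name `coefficient_of_variation` is just something I made up for illustration, and it assumes the population standard deviation (dividing by n), which is what the steps above work out.

```python
from statistics import mean, pstdev


def coefficient_of_variation(readings):
    """Population standard deviation divided by the mean (hypothetical helper)."""
    avg = mean(readings)
    if avg == 0:
        # Dividing by a zero mean is undefined, so refuse rather than guess.
        raise ValueError("mean is zero, coefficient of variation is undefined")
    return pstdev(readings) / avg


# One hour PM2.5 averages from sensor 1 and sensor 2 in the example.
readings = [10, 9]
cv = coefficient_of_variation(readings)
print(f"mean={mean(readings)}, std dev={pstdev(readings)}, CV={cv:.1%}")
```

Swapping `pstdev` for `stdev` would give the sample version (dividing by n - 1), which comes out larger when there are only a couple of sensors.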
If we did this, people building applications on top of the data (say, a system which alerts users to breached air quality in their area) could put rules in place so that the application doesn't alert unless the variation value is <10%, for example. This would help prevent temporary and highly localised sources of degraded AQ, like BBQs, from triggering air quality warnings on systems built on top of our data.
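As a rough sketch of what such a rule might look like in an application (the function `should_alert`, the `pm25_limit` parameter and the 35 used below are all placeholders I've invented, not anything from an existing system):

```python
from statistics import mean, pstdev


def should_alert(readings, pm25_limit, max_cv=0.10):
    """Alert only if nearby sensors agree AND their mean exceeds the limit.

    Hypothetical rule: assumes pm25_limit >= 0 and uses the population
    standard deviation for the coefficient of variation.
    """
    if len(readings) < 2:
        return False  # agreement can't be judged from a single sensor
    avg = mean(readings)
    if avg <= pm25_limit:
        return False  # air quality limit not breached
    # avg > pm25_limit >= 0 here, so the division is safe.
    return pstdev(readings) / avg < max_cv


print(should_alert([40, 38], pm25_limit=35))  # both sensors high and in agreement
print(should_alert([120, 9], pm25_limit=35))  # one sensor spiking, e.g. next to a BBQ
```

In the second call the mean is still over the limit, but the sensors disagree so much that the rule stays quiet, which is the behaviour I'm after.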
Now waiting for somebody to point out the critical error in my maths and logic