Jigsaw Applied
 

Spatial Data-Mining and Data Anomaly Detection

Opportunity

There is an ever-growing need for "watch-dog" programs that can process truly vast data volumes and autonomously identify, or shortlist, anomalous points or areas in the data, sometimes without needing to be told specifically what constitutes an anomaly. Customers for such capabilities include banks, insurance companies, tax offices, homeland security and other security organisations, agencies such as the INS, statistics bureaus, and scientific institutions.

Example using meteorological data.

Increasingly, such systems must operate in real time alongside existing information-processing systems, identifying patterns of behaviour that require deeper scrutiny for reasons of fraud, investment, defence, crime-fighting, or medical or scientific interest.

In some cases, a specific source of data may have no intrinsically anomalous qualities, but may show up as anomalous when compared with other sources of data. Making such comparisons normally leads to prohibitive computational requirements, particularly when the data may be updated several times per second. In other cases, anomalies may only show up as unusual behaviour patterns over time. Such cases are also difficult to identify with standard technologies.
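To make the first case concrete, the sketch below shows a reading that is plausible in isolation but stands out when compared with its peers. It is only an illustration using a generic robust z-score (median and median absolute deviation); it is not a description of how JIGSAW works. The station names, the pressure values, and the 3.5 threshold are all assumptions chosen for the example.

```python
from statistics import median


def robust_scores(readings):
    """Score each source by its distance from the median of all sources,
    scaled by the median absolute deviation (MAD)."""
    values = list(readings.values())
    centre = median(values)
    mad = median(abs(v - centre) for v in values)
    # 1.4826 makes the MAD comparable to a standard deviation for
    # normally distributed data. If every source agrees (MAD == 0),
    # fall back to an arbitrary scale of 1.0 to avoid dividing by zero.
    scale = 1.4826 * mad if mad > 0 else 1.0
    return {name: (v - centre) / scale for name, v in readings.items()}


def flag_outliers(readings, threshold=3.5):
    """Return the names of sources whose robust score exceeds the threshold."""
    scores = robust_scores(readings)
    return sorted(name for name, s in scores.items() if abs(s) > threshold)


# Hypothetical barometric pressure readings (hPa) taken at the same moment.
# 1018.9 hPa is an unremarkable value on its own; it only looks odd
# next to the other four sources.
readings = {"A": 1012.1, "B": 1011.8, "C": 1012.4, "D": 1018.9, "E": 1012.0}
flagged = flag_outliers(readings)
print(flagged)
```

Note that a simple scheme of this kind compares every source against a single shared centre. Comparing every source against every other source, at several updates per second, is where the computational cost grows quickly.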

The state of the art is quite limited. It is currently very difficult or impossible to:

automatically classify large numbers of data sources
automatically classify the data into coherent groupings in real time
automatically update these classifications as the nature of the data changes
identify which sources of information are actually sources of misinformation
visually highlight individual sources of information, or groups of related sources
visually highlight data that is anomalous and does not fit into any normal grouping.

This means that unusual behaviour, whether it be criminal, potentially of a terrorist nature, of scientific interest, statistically important, or otherwise instructive, can often go undetected.

JIGSAW benefits – short term

JIGSAW is unusual in several ways:

scales to very large data volumes
identifies unusual behaviour patterns, rather than just unusual data
requires no cueing or priming to search for preconceived anomalies
works in real time or batch mode
tolerates high levels of noise or incoherence
finds the nearest relations to any data point

When processing published meteorological data from Australia's weather stations, JIGSAW (without prior suspicion) remotely identified a poorly calibrated barometer at one of the outback stations, and also estimated the magnitude of error in the instrument.
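As a rough illustration of what estimating the magnitude of an instrument error can mean, the sketch below treats a miscalibration as a constant offset and estimates it as the median difference between one station and the median of the others over time. This is a generic technique chosen for the example, not JIGSAW's method. The station names and readings are invented, and the sketch assumes the stations would otherwise report comparable values, which real stations spread across a continent would not without further correction.

```python
from statistics import median


def estimate_offset(series, station):
    """Estimate a constant bias for `station`: at each time step take the
    difference from the median of all other stations, then take the
    median of those differences."""
    diffs = []
    for t in range(len(series[station])):
        peers = [vals[t] for name, vals in series.items() if name != station]
        diffs.append(series[station][t] - median(peers))
    return median(diffs)


# Hypothetical pressure readings (hPa): four stations, five time steps.
series = {
    "north":   [1010.0, 1011.0, 1012.0, 1011.0, 1010.0],
    "south":   [1010.2, 1011.1, 1012.1, 1010.9, 1010.1],
    "east":    [1009.9, 1010.8, 1011.9, 1011.2, 1009.8],
    "outback": [1014.1, 1015.0, 1016.0, 1015.1, 1013.9],
}

offsets = {name: estimate_offset(series, name) for name in series}
# The station with the largest absolute offset is the calibration suspect.
suspect = max(offsets, key=lambda name: abs(offsets[name]))
print(suspect, round(offsets[suspect], 1))
```

Using medians rather than means keeps a single badly behaved station from distorting the estimate for the others.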