Sponsored by BMBF


Astrometric Matching in StarGlobe

(This use case is adapted from the Master's thesis by Sebastian Huber [multiplestreams].)


Spatial matching is an important step in SED (spectral energy distribution) assembly and classification. PMRs (primary match results) from various catalogs are cross-combined, which can lead to exponential growth of the data. The challenge is to filter out the combinations that do not fit together and to select only "valid" match candidates for further processing and classification. Traditional approaches such as the GAVO crossmatcher calculate all possible combinations of PMRs en bloc in main memory and only afterwards select the good match candidates with the reduced χ² method. Such an approach eventually runs out of main memory when too many catalogs and/or sources are matched.
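The growth described above can be made concrete with a small sketch. It assumes, purely for illustration, that every catalog contributes the same number of counterparts near one sky position and that one counterpart is picked from each catalog; the names `count_combinations` and `catalogs` are hypothetical and not part of StarGlobe or the GAVO crossmatcher.

```python
from itertools import product
from math import prod


def count_combinations(candidates_per_catalog):
    """Number of cross-combinations when one counterpart is taken from every catalog."""
    return prod(len(c) for c in candidates_per_catalog)


# Hypothetical toy input: 3 counterparts per catalog around one sky position.
for n_catalogs in (2, 3, 5):
    catalogs = [[f"cat{c}_src{s}" for s in range(3)] for c in range(n_catalogs)]
    # "En bloc": every combination is materialized in memory before any filtering.
    materialized = list(product(*catalogs))
    assert len(materialized) == count_combinations(catalogs)
    print(n_catalogs, "catalogs ->", len(materialized), "combinations")
```

With a fixed number of counterparts per catalog, the number of combinations multiplies with every additional catalog, which is why materializing all of them before filtering does not scale.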

Because StarGlobe is a distributed system, a different and very promising approach can be taken. By distributing the combination phase of the spatial matching over multiple peers, many more PMRs can be joined at the same time, so more catalogs can be included to produce even better match candidates. In addition, the reduced χ² filtering, which in the SED assembly workflow is done after the deterministic matching, is split up and relocated to the join operators of the deterministic matching process (compare Figure 1).

This results in a tight integration of the deterministic and the statistical process. It improves selectivity and prevents bad match candidates from yielding unnecessary combinations at a very early stage. It also reduces network traffic and increases the throughput of valid match candidates at individual peers. However, one has to be careful not to filter out too early those match candidates whose reduced χ² lies only slightly above the threshold. At first glance these are bad match candidates, but as long as further counterparts can still join a match candidate, its reduced χ² value may drop below the threshold. This happens when an existing match candidate is joined by another counterpart whose coordinates lie near or (in the worst case) exactly on the optimal center of the counterparts that are interpreted as the "same" object (a so-called spider). Such a counterpart contributes to the "compactness" of the spider, so that its reduced χ² value decreases. Therefore, thresholds on filter operators placed at inner nodes have to be specified more generously: to make sure that no potentially good match candidate is dropped, the local χ² is divided by the maximum degree of freedom a match candidate can reach during the assembly process, which depends on the number of catalogs being spatially matched.
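The relaxed inner-node filter can be illustrated with a minimal sketch. The text does not spell out the χ² definition, so the following are assumptions made only for this example: counterparts are given as `(x, y, sigma)` in a flat tangent plane, the optimal center is the inverse-variance weighted mean, χ² is the sum of squared, sigma-scaled distances to that center, and a candidate with n counterparts has 2n − 2 degrees of freedom. The threshold value and all function names are hypothetical.

```python
def chi2(counterparts):
    """Chi-square of (x, y, sigma) counterparts around their weighted mean (assumed definition)."""
    weights = [1.0 / s**2 for _, _, s in counterparts]
    wsum = sum(weights)
    cx = sum(w * x for w, (x, _, _) in zip(weights, counterparts)) / wsum
    cy = sum(w * y for w, (_, y, _) in zip(weights, counterparts)) / wsum
    return sum(w * ((x - cx) ** 2 + (y - cy) ** 2)
               for w, (x, y, _) in zip(weights, counterparts))


def dof(n_counterparts):
    """Assumed degrees of freedom: two coordinates per counterpart minus the fitted center."""
    return 2 * n_counterparts - 2


def chi2_reduced(counterparts):
    """Reduced chi-square as applied to a complete match candidate."""
    return chi2(counterparts) / dof(len(counterparts))


def passes_inner_filter(counterparts, n_catalogs, threshold):
    """Relaxed test for inner join nodes: divide by the maximum reachable degree of freedom."""
    return chi2(counterparts) / dof(n_catalogs) <= threshold


THRESHOLD = 2.0    # hypothetical cut on the reduced chi-square
N_CATALOGS = 5     # number of catalogs being matched

pair = [(0.0, 0.0, 1.0), (3.0, 0.0, 1.0)]
spider = pair + [(1.5, 0.0, 1.0)]          # third counterpart sits on the optimal center
far_pair = [(0.0, 0.0, 1.0), (10.0, 0.0, 1.0)]

print(chi2_reduced(pair) <= THRESHOLD)                       # strict test rejects the pair
print(passes_inner_filter(pair, N_CATALOGS, THRESHOLD))      # relaxed test keeps it
print(chi2_reduced(spider) <= THRESHOLD)                     # the grown spider is valid
print(passes_inner_filter(far_pair, N_CATALOGS, THRESHOLD))  # hopeless pair is still dropped
```

Under these assumptions the relaxed test never discards a candidate that could still become valid: adding counterparts cannot lower the χ² around the refitted center, and the degree of freedom can never exceed its maximum, so the local χ² divided by the maximum degree of freedom is a lower bound on every reduced χ² the candidate can reach later.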

Another feature of the distributed approach is that first results are returned rather quickly, in contrast to the traditional approach, where no results are returned until the calculation has finished. This is because StarGlobe is a DSMS (data stream management system) and uses non-blocking query operators that process data streams on the fly. Consequently, the time the first tuple needs to travel through the network is much shorter than the time until the last tuple is returned. To realize real-life spatial matching scenarios for SED assembly, PMRs have been acquired from the different catalogs shown in the table below.

Catalogs used in spatial matching scenarios

| Catalog | Spectral Band | Number of Objects | Full Name |
|---|---|---|---|
| 2MASS | near-infrared | 470992970 | Two Micron All Sky Survey |
| FIRST | radio | 811117 | Faint Images of the Radio Sky at Twenty centimeters |
| GSC-2 | optical | 455851237 | The Guide Star Catalog Version 2.2 |
| NVSS | radio | 1773484 | 1.4 GHz National Radio Astronomy Observatory Very Large Array Sky Survey |
| USNO B1.0 | optical | 1045913669 | Whole-Sky United States Naval Observatory |
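The non-blocking behaviour described before the table can be sketched with a symmetric hash join written as a Python generator. This is not StarGlobe's operator implementation, which the text does not describe. As a simplifying assumption, tuples are joined on an equal key (a hypothetical "zone" identifier standing in for spatial proximity), and the catalog labels in the toy stream are illustrative only.

```python
from collections import defaultdict


def symmetric_hash_join(stream, key):
    """Non-blocking join over one interleaved stream of ('L' | 'R', tuple) items.

    Every arriving tuple is stored in its own side's hash table and probed
    against the other side at once, so a match is emitted as soon as both
    partners have arrived rather than after the input is exhausted.
    """
    tables = {"L": defaultdict(list), "R": defaultdict(list)}
    for side, item in stream:
        other = "R" if side == "L" else "L"
        k = key(item)
        tables[side][k].append(item)
        for partner in tables[other].get(k, []):
            yield (item, partner) if side == "L" else (partner, item)


consumed = []  # records how much input has been read so far


def source():
    """Hypothetical interleaved input from two catalogs."""
    items = [
        ("L", ("zone7", "2MASS-a")),
        ("R", ("zone7", "USNO-x")),
        ("L", ("zone9", "2MASS-b")),
        ("R", ("zone9", "USNO-y")),
    ]
    for it in items:
        consumed.append(it)
        yield it


results = symmetric_hash_join(source(), key=lambda t: t[0])
first = next(results)
print(first, "emitted after", len(consumed), "of 4 input tuples")
```

The first match is available after only two input tuples have been read. A blocking operator would have to consume the whole input before emitting anything.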