EEMCS EPrints Service
Panse, F. and van Keulen, M. and de Keijzer, A. and Ritter, N. (2010) Duplicate Detection in Probabilistic Data. In: Proceedings of the 2nd International Workshop on New Trends in Information Integration (NTII 2010), 5-6 Mar 2010, Long Beach, California, USA. pp. 179-182. IEEE Computer Society. ISBN 978-1-4244-6522-4
Official URL: http://dx.doi.org/10.1109/ICDEW.2010.5452759
Collected data often contains uncertainties. Probabilistic databases have been proposed to manage such uncertain data. To combine data from multiple autonomous probabilistic databases, the probabilistic data has to be integrated. Until now, however, data integration approaches have focused on certain (that is, deterministic) source data, whether relational or XML; the integration of uncertain source data has not yet been addressed. In this paper, we present a first step towards a concise consolidation of probabilistic data. We focus on duplicate detection as a representative and essential step in an integration process, and we present techniques for identifying multiple probabilistic representations of the same real-world entity.
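To make the idea of comparing probabilistic representations concrete, the following is a minimal sketch and not the method from the paper. It assumes that an uncertain attribute value is modelled as a mapping from alternative values to probabilities, and it scores two such values by their expected similarity over all pairs of alternatives. The names `expected_similarity`, `exact_match`, and `is_duplicate`, the averaging over shared attributes, and the threshold of 0.5 are all hypothetical choices made for illustration.

```python
from typing import Callable, Dict

# Assumption: an uncertain attribute value is a discrete distribution,
# i.e. a mapping from each alternative value to its probability.
ProbValue = Dict[str, float]
ProbRecord = Dict[str, ProbValue]


def exact_match(a: str, b: str) -> float:
    """Simplest possible similarity: 1.0 for equal strings, else 0.0."""
    return 1.0 if a == b else 0.0


def expected_similarity(
    x: ProbValue,
    y: ProbValue,
    sim: Callable[[str, str], float] = exact_match,
) -> float:
    """Expected similarity of two independent uncertain values.

    Sums sim(vx, vy) over all pairs of alternatives, weighted by the
    product of their probabilities.
    """
    return sum(
        px * py * sim(vx, vy)
        for vx, px in x.items()
        for vy, py in y.items()
    )


def is_duplicate(
    rec1: ProbRecord,
    rec2: ProbRecord,
    threshold: float = 0.5,  # hypothetical cut-off, not from the paper
    sim: Callable[[str, str], float] = exact_match,
) -> bool:
    """Flag two records as duplicates if the mean expected similarity
    over their shared attributes reaches the threshold."""
    shared = rec1.keys() & rec2.keys()
    if not shared:
        return False
    score = sum(
        expected_similarity(rec1[attr], rec2[attr], sim) for attr in shared
    ) / len(shared)
    return score >= threshold


# Usage: two uncertain representations of a person's name.
a = {"name": {"Tim": 0.7, "Tom": 0.3}}
b = {"name": {"Tim": 0.4, "Kim": 0.6}}
score = expected_similarity(a["name"], b["name"])
print(round(score, 2), is_duplicate(a, b))
```

With exact matching, only the pair ("Tim", "Tim") contributes, so the score is the product of the two probabilities. A graded string similarity could be passed as `sim` instead, which would let near-matches such as "Tim" and "Tom" contribute as well.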