EEMCS EPrints Service
van Keulen, M. (2009) Probabilistic Data Integration. (Invited) In: 08421 Abstracts Collection - Uncertainty Management in Information Systems, 12 - 17 Oct 2008, Dagstuhl, Germany. pp. 8-8. Dagstuhl Seminar Proceedings (08421). Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik. ISSN 1862-4405
Official URL: http://drops.dagstuhl.de/opus/volltexte/2009/1942
In data integration efforts, such as portal development, much development time is devoted to entity resolution. Advanced similarity measurement techniques are often used to remove semantic duplicates or resolve other semantic conflicts. It proves impossible, however, to eliminate all semantic problems automatically. An often-used rule of thumb states that about 90% of the development effort is devoted to semi-automatically resolving the remaining 10% of hard cases. In an attempt to significantly decrease human effort at data integration time, we have proposed an approach that strives for a 'good enough' initial integration, which stores any remaining semantic uncertainty and conflicts in a probabilistic XML database. The remaining cases are to be resolved during use, with user feedback.
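The idea of keeping unresolved cases as weighted alternatives, and settling them later through feedback, can be illustrated with a minimal sketch. This is not the implementation described in the abstract: it uses plain Python objects rather than a probabilistic XML database, and the names `Alternative`, `integrate`, and `apply_feedback`, the string similarity measure, the thresholds `low` and `high`, and the choice to keep the first record as the merged representative are all assumptions made for illustration.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Alternative:
    """One possible integration outcome and its probability (hypothetical type)."""
    records: tuple
    probability: float


def similarity(a: str, b: str) -> float:
    # Stand-in similarity measure; a real system would use something richer.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def integrate(a: str, b: str, low: float = 0.3, high: float = 0.9) -> list:
    """Decide clear cases automatically; keep hard cases as weighted alternatives.

    The thresholds are illustrative assumptions, not values from the source.
    """
    s = similarity(a, b)
    if s >= high:
        # Confident duplicate: keep one record (here simply the first).
        return [Alternative((a,), 1.0)]
    if s <= low:
        # Confident non-duplicate: keep both records.
        return [Alternative((a, b), 1.0)]
    # Hard case: store both possibilities instead of asking a human now.
    return [Alternative((a,), s), Alternative((a, b), 1.0 - s)]


def apply_feedback(alternatives: list, same_entity: bool) -> list:
    """Drop alternatives that contradict user feedback and renormalize the rest."""
    wanted = 1 if same_entity else 2
    kept = [alt for alt in alternatives if len(alt.records) == wanted]
    if not kept:
        # Feedback contradicts every stored alternative; leave the data unchanged.
        return alternatives
    total = sum(alt.probability for alt in kept)
    return [Alternative(alt.records, alt.probability / total) for alt in kept]


if __name__ == "__main__":
    uncertain = integrate("John Smith", "J. Smith")
    for alt in uncertain:
        print(alt)
    print(apply_feedback(uncertain, same_entity=True))
```

In this sketch the uncertain pair stays queryable as two weighted alternatives until a user confirms or rejects the match, at which point the contradicted alternative is discarded.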