EEMCS

Home > Publications
Home University of Twente
Education
Research
Prospective Students
Jobs
Publications
Intranet (internal)
 
 Nederlands
 Contact
 Search
 Organisation

EEMCS EPrints Service


17967 Indeterministic Handling of Uncertain Decisions in Duplicate Detection
Home Policy Brochure Browse Search User Area Contact Help

Panse, F. and van Keulen, M. and Ritter, N. (2010) Indeterministic Handling of Uncertain Decisions in Duplicate Detection. Technical Report TR-CTIT-10-21, Centre for Telematics and Information Technology University of Twente, Enschede. ISSN 1381-3625

Full text available as:

PDF

356 Kb
Open Access


Exported to Metis

Abstract

In current research, duplicate detection is usually considered as a deterministic approach in which tuples are either declared as duplicates or not. However, most often it is not completely clear whether two tuples represent the same real-world entity or not. In deterministic approaches, however, this uncertainty is ignored, which in turn can lead to false decisions. In this paper, we present an indeterministic approach for handling uncertain decisions in a duplicate detection process by using a probabilistic target schema. Thus, instead of deciding between multiple possible worlds, all these worlds can be modeled in the resulting data. This approach minimizes the negative impacts of false decisions. Furthermore, the duplicate detection process becomes almost fully automatic and human effort can be reduced to a large extent. Unfortunately, a full-indeterministic approach is by definition too expensive (in time as well as in storage) and hence impractical. For that reason, we additionally introduce several semi-indeterministic methods for heuristically reducing the set of indeterministic handled decisions in a meaningful way.

Item Type:Internal Report (Technical Report)
Research Group:EWI-DB: Databases
Research Program:CTIT-NICE: Natural Interaction in Computer-mediated Environments
ID Code:17967
Deposited On:08 June 2010
More Information:statisticsmetis

Export this item as:

To correct this item please ask your editor

Repository Staff Only: edit this item