Visengeriyeva, L and Akbik, A and Kaul, Manohar and Rabl, T and Markl, V
(2016)
Improving Data Quality by Leveraging Statistical Relational Learning.
In: International Conference on Information Quality, 22-23 June, 2016, Ciudad Real, Spain.
Abstract
Digitally collected data suffers from many data quality issues, such as duplicate, incorrect, or incomplete data. A common
approach for counteracting these issues is to formulate a set of data cleaning rules to identify and repair incorrect, duplicate and
missing data. Data cleaning systems must be able to treat data quality rules holistically, to incorporate heterogeneous constraints
within a single routine, and to automate data curation. We propose an approach to data cleaning based on statistical relational
learning (SRL). We argue that the Markov logic formalism is a natural fit for modeling data quality rules. Our approach
enables probabilistic joint inference over interleaved data cleaning rules to improve data quality. Furthermore, it
removes the need to specify the order of rule execution. We describe how data quality rules expressed as formulas in first-order
logic directly translate into the predictive model in our SRL framework.
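
As a rough illustration of the idea in the abstract, the sketch below treats two data quality rules as weighted first-order formulas in the style of Markov logic, where a candidate repair ("world") x has probability proportional to exp(sum_i w_i * n_i(x)) and n_i(x) counts the true groundings of rule i. It is not the authors' implementation: the toy records, the rule weights, and all function names are assumptions made up for this example, and brute-force enumeration stands in for a real inference engine.

```python
import math
from itertools import product

# Hypothetical toy table of (zip, city) records; the third city looks suspect.
records = [("10115", "Berlin"), ("10115", "Berlin"), ("10115", "Bern")]

# Hypothetical rule weights, chosen only for illustration.
W_FD = 2.0    # zip(a) = zip(b) => city(a) = city(b)
W_KEEP = 1.0  # prefer keeping the observed city value


def count_fd(zips, cities):
    """Count true groundings of the zip -> city rule over ordered pairs a != b."""
    n = len(zips)
    return sum(
        1
        for a in range(n)
        for b in range(n)
        if a != b and (zips[a] != zips[b] or cities[a] == cities[b])
    )


def count_keep(observed, cities):
    """Count records whose repaired city equals the observed city."""
    return sum(1 for o, c in zip(observed, cities) if o == c)


def score(zips, observed, cities):
    """Weighted sum of true groundings: the log of the unnormalized probability."""
    return W_FD * count_fd(zips, cities) + W_KEEP * count_keep(observed, cities)


def map_repair(rows):
    """Enumerate all candidate city assignments; return the best one and its probability."""
    zips = [z for z, _ in rows]
    observed = [c for _, c in rows]
    domain = sorted(set(observed))
    worlds = list(product(domain, repeat=len(rows)))
    scores = [score(zips, observed, w) for w in worlds]
    partition = sum(math.exp(s) for s in scores)  # normalization constant Z
    best = max(range(len(worlds)), key=lambda i: scores[i])
    return list(worlds[best]), math.exp(scores[best]) / partition


repaired, probability = map_repair(records)
print(repaired, round(probability, 3))
```

Both rules are scored jointly over every candidate repair, so no order of rule execution has to be specified. Enumeration is exponential in the number of records and is only workable for a toy example like this one.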