"Computing Fuzzy Rough Approximations in Large Scale Information System" by Hasan Asfoor, Rajagopalan Srinivasan et al.

School of Engineering and Technology Publications

Title

Computing Fuzzy Rough Approximations in Large Scale Information Systems

Authors

Hasan Asfoor
Rajagopalan Srinivasan
Gayathri Vasudevan
Nele Verbiest
Chris Cornelis
Matthew Tolentino, University of Washington Tacoma
Ankur Teredesai
Martine De Cock

Publication Date

10-1-2014

Document Type

Conference Proceeding

Abstract

Rough set theory is a popular and powerful machine learning tool. It is especially suitable for dealing with information systems that exhibit inconsistencies, i.e., objects that have the same values for the conditional attributes but a different value for the decision attribute. In line with the emerging granular computing paradigm, rough set theory groups objects together based on the indiscernibility of their attribute values. Fuzzy rough set theory extends rough set theory to data with continuous attributes, and detects degrees of inconsistency in the data. Key to this is turning the indiscernibility relation into a gradual relation, acknowledging that objects can be similar to a certain extent. In very large datasets with millions of objects, computing the gradual indiscernibility relation (in other words, the soft granules) is very demanding in terms of both runtime and memory. It is, however, required for computing the lower and upper approximations of concepts in the fuzzy rough set analysis pipeline. Current non-distributed implementations in R are limited by memory capacity. For example, we found that a state-of-the-art non-distributed implementation in R could not handle 30,000 rows and 10 attributes on a node with 62 GB of memory. This is clearly insufficient to scale fuzzy rough set analysis to massive datasets. In this paper we present a parallel and distributed solution based on the Message Passing Interface (MPI) to compute fuzzy rough approximations in very large information systems. Our results show that our parallel approach scales with problem size to information systems with millions of objects. To the best of our knowledge, no other parallel and distributed solutions have been proposed so far in the literature for this problem.
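
To make the quantities named in the abstract concrete, the following is a minimal single-machine sketch of a gradual indiscernibility relation and the resulting lower and upper approximations. It is not the MPI implementation from the paper. The range-normalised similarity, the minimum used to combine attributes, the Łukasiewicz implicator and t-norm, and the function names `similarity` and `approximations` are all assumptions chosen for illustration; the abstract does not state which operators the authors use.

```python
from typing import List, Sequence, Tuple


def similarity(x: Sequence[float], y: Sequence[float],
               ranges: Sequence[float]) -> float:
    """Degree in [0, 1] to which x and y are indiscernible.

    Assumed form: per attribute 1 - |difference| / range, combined
    over all attributes with the minimum.
    """
    return min(max(0.0, 1.0 - abs(a - b) / r)
               for a, b, r in zip(x, y, ranges))


def approximations(data: List[Sequence[float]],
                   membership: Sequence[float]
                   ) -> Tuple[List[float], List[float]]:
    """Fuzzy rough lower and upper approximation of one concept.

    membership[j] is the degree to which object j belongs to the concept
    (for example 1.0 or 0.0 for a crisp decision class).
    """
    # Attribute ranges; a constant attribute gets range 1.0 to avoid
    # division by zero.
    ranges = [(max(col) - min(col)) or 1.0 for col in zip(*data)]
    lower, upper = [], []
    for x in data:
        lo, up = 1.0, 0.0
        for y, m in zip(data, membership):
            r = similarity(x, y, ranges)
            lo = min(lo, min(1.0, 1.0 - r + m))   # Lukasiewicz implicator
            up = max(up, max(0.0, r + m - 1.0))   # Lukasiewicz t-norm
        lower.append(lo)
        upper.append(up)
    return lower, upper


# Objects 0 and 1 have identical attribute values but different decisions,
# so they are inconsistent; object 2 is far from both.
data = [[0.0, 0.0], [0.0, 0.0], [1.0, 1.0]]
concept = [1.0, 0.0, 1.0]
lower, upper = approximations(data, concept)
print(lower)  # [0.0, 0.0, 1.0]
print(upper)  # [1.0, 1.0, 1.0]
```

The sketch evaluates the relation one row at a time instead of storing it, because a dense relation over n objects has n² entries: for the 30,000 rows mentioned above that is 9 × 10^8 values, or about 7.2 GB per matrix at 8 bytes per value. The runtime remains quadratic in the number of objects, which is the cost the abstract describes as demanding for millions of objects.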

Publication Title

2014 IEEE International Conference on Big Data (Big Data)

DOI

10.1109/BigData.2014.7004350

Recommended Citation

Asfoor, Hasan; Srinivasan, Rajagopalan; Vasudevan, Gayathri; Verbiest, Nele; Cornelis, Chris; Tolentino, Matthew; Teredesai, Ankur; and De Cock, Martine, "Computing Fuzzy Rough Approximations in Large Scale Information Systems" (2014). School of Engineering and Technology Publications. 29.
https://digitalcommons.tacoma.uw.edu/tech_pub/29
