Benchmarking Large-Scale Data Management for Internet of Things
In the current era of the Internet of Things (IoT), a massive number of sensors are used in our daily lives. Sensors are everywhere around us: in our homes, workplaces, streets, and cars, and even on our bodies. Examples include home appliances, wearable devices, and medical sensors. These sensors generate huge amounts of dynamic, heterogeneous, and unstructured data that need special handling beyond the capabilities of conventional relational databases. Thus, identifying a suitable data management platform to store and query these data is necessary. Despite the popularity of NoSQL data stores and their efficiency in processing various types of big data, there is no single guiding study of how they behave with IoT datasets. IoT data have their own characteristics that make them special: they come from various sensors in a wide range of formats, arrive at high velocity, and require high-throughput processing with low latency. NoSQL data stores are commonly used to provide flexibility and availability for big data handling. However, there is a lack of comprehensive studies of which NoSQL data store performs best with respect to the two scalability aspects (scale-up and scale-out) in a distributed and parallel processing environment. This paper benchmarks the commonly used NoSQL data stores (MongoDB, Cassandra, and HBase) and compares their performance on a real industrial IoT dataset. In particular, we compare the throughput, latency, and run time of the evaluated NoSQL data stores.
The Journal of Supercomputing
pre print, post print
Teredesai, Ankur; Hendawi, Abdeltawab; Gupta, Jayant; Liu, Jiayi; Ramakrishnan, Naveen; Shah, Mohak; El-Sappagh, Shaker; Kwak, Kyung-Sup; and Ali, Mohamed, "Benchmarking Large-Scale Data Management for Internet of Things" (2019). School of Engineering and Technology Publications. 364.