Sensor Information Retrieval from Internet of Things: Representation and Indexing

 

ABSTRACT

 

Billions of devices are connected in Internet of Things (IoT) based sensor networks, and they continuously generate a large volume of data. Access to specific data is crucial for enabling a myriad of new intelligent applications, so efficient information retrieval has become a pressing need for IoT. However, sensor information in the physical world can be heterogeneous, high dimensional, and voluminous because of complex and dynamic environments. In this paper, we first investigate several IoT search scenarios and propose a uniform representation model for sensor information recordings. Four query models are designed to represent all possible information query styles. With these models, we develop an information retrieval architecture for IoT. Specifically, an indexing mechanism called efficiency maximization and cost minimization (EMCM) is proposed to solve the property selection problem in index construction and update. In addition, a novel real-time grid R-tree (RtGR-tree) structure is designed to support historical and real-time search for spatiotemporal observation data. Simulation results based on real-world IoT datasets show that storage space is considerably reduced with the sensor model. Furthermore, the proposed indexing mechanisms improve retrieval efficiency and accuracy while ensuring scalability for large-sized data.

EXISTING SYSTEM :

The vision of the Internet of Things (IoT) is to connect physical objects to the Internet and allow them to interact with the physical world, which can support many new intelligent applications, such as smart homes, online monitoring systems, and green transportation systems. According to a recent white paper from Cisco, more than 50 billion devices will connect to the Internet in 2020. As a result, a large volume of sensor data will be generated by these distributed sensors. To obtain valuable knowledge about physical objects and their surrounding environments, information retrieval from distinct sensors is essential and should be carefully addressed. Since IoT applications normally connect a very large number of sensors, sensor information may come from multiple sources with heterogeneous data formats. For example, in a green transportation system, sensor information can be generated by vehicles, passengers, roadside facilities, and the cloud center. The plug-in powered cameras installed in roadside facilities do not need to be marked with “battery life” information, while for energy-constrained sensors, such as a passenger’s mobile phone, “battery life” is an important property. Because there is no standard form of expression, which leads to conflicts and uncertainties in object searching, the representation of sensor information is a fundamental problem that must be solved for information retrieval. Given that sensors are distributed in dynamic environments at a large scale, vast amounts of observed data are generated with spatiotemporal characteristics. Therefore, efficient indexing design is essential in IoT-based information retrieval. However, it is extremely challenging to develop a sensor information representation model and design indexing methods for data retrieval under dynamic IoT environments.
First, to fully utilize sensor information, information retrieval should support both sensor-related data (sensor property data), such as quality of measurement and battery life, and sensor data (observation data), which is observed by sensors and devices and has spatiotemporal characteristics. To this end, a uniform representation model that is compatible with heterogeneous data formats has to be set up. Second, the selection of appropriate properties for index construction is not trivial because sensor-related data possesses many properties. The indexing method needs to maximize retrieval efficiency while minimizing the maintenance cost. Third, since indexes are easily affected by out-of-date information, they should be updated frequently as sensor data is continuously generated. Finally, efficient and scalable retrieval for point and range queries must be guaranteed as the scale of sensor data varies, especially for large-sized data.
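
The first challenge can be illustrated with a minimal sketch of a record that keeps optional sensor properties and spatiotemporal observations together. This is not the representation model of the paper; the class names, field names, units, and sample values below are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class Observation:
    """One spatiotemporal reading produced by a sensor (hypothetical layout)."""
    timestamp: float               # seconds since epoch (assumed unit)
    location: Tuple[float, float]  # (latitude, longitude)
    value: float


@dataclass
class SensorRecord:
    """Uniform record: optional properties plus observation data."""
    sensor_id: str
    properties: Dict[str, Any] = field(default_factory=dict)
    observations: List[Observation] = field(default_factory=list)

    def get_property(self, name: str) -> Optional[Any]:
        # A property the sensor does not carry is reported as None.
        return self.properties.get(name)


# A plug-in camera has no battery property; a phone does.
camera = SensorRecord("roadside-cam-1", {"type": "camera", "power": "plug-in"})
phone = SensorRecord("phone-7", {"type": "phone", "battery_life_h": 9.5})
phone.observations.append(Observation(1_600_000_000.0, (39.9, 116.4), 21.5))

records = [camera, phone]
with_battery = [r.sensor_id for r in records
                if r.get_property("battery_life_h") is not None]
```

Storing properties in a dictionary lets sensors with different property sets share one record type, at the price of no per-property type checking.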

PROPOSED SYSTEM :

In this work, we focus on retrieving sensor information from a distributed, diverse, and dynamic IoT environment. To record and store multi-source spatiotemporal data, we design a uniform sensor model that accommodates heterogeneous data formats. Through an overall investigation of search queries in different IoT applications, we summarize four basic query models that can be easily combined to compose all possible queries. Following these models, we propose an information retrieval architecture for IoT search service. The architecture involves a data storage unit, a query interface, and indexing schemes. In terms of indexing, for multi-source dynamic sensor-related data, we develop an indexing mechanism called efficiency maximization and cost minimization (EMCM), in which the most valuable properties for constructing indexes are selected based on their differential efficiency and maintenance cost. Index updating is executed with the adaptive pre-adjustment algorithm (APA), which helps improve retrieval efficacy and accuracy. For sensor data with a large scale and high dimensions, we devise a new index structure called real-time grid R-tree (RtGR-tree) that supports both point and range queries. On the basis of four real-world IoT datasets, we conduct extensive simulations to demonstrate the efficacy of the proposed approaches. The four datasets are as follows: set A consists of records from a public IoT platform (i.e., ThingSpeak); set B is the Intel Lab Dataset; set C is a multi-source sensor-related (MSD) dataset manually collected from sensor introduction pages on Amazon; and set D is T-Drive Taxi Trajectories. Specifically, the sensor model saves more than 50% of storage space, as verified on datasets A and B. Retrieval efficiency and accuracy are improved by the EMCM mechanism for sensor-related data in datasets C and D. RtGR-tree outperforms state-of-the-art indexing structures for large-scale and high-dimensional sensor observation data, such as those in dataset B.
The results show that the proposed techniques can realize efficient and accurate retrieval of various sensor information under the IoT environment.
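
The trade-off between retrieval efficiency and maintenance cost in property selection can be sketched as follows. This is a simple greedy illustration under assumed inputs, not the EMCM mechanism itself: the statistics, the scoring function `net_gain`, and the limit `max_indexes` are all hypothetical.

```python
from typing import Dict, List, NamedTuple


class PropertyStats(NamedTuple):
    query_rate: float   # queries per hour that filter on this property (assumed input)
    time_saved: float   # seconds saved per query if the property is indexed
    update_rate: float  # updates per hour to this property
    update_cost: float  # seconds of index maintenance per update


def net_gain(stats: PropertyStats) -> float:
    """Efficiency gain minus maintenance cost, in seconds per hour."""
    return stats.query_rate * stats.time_saved - stats.update_rate * stats.update_cost


def select_properties(stats_by_name: Dict[str, PropertyStats],
                      max_indexes: int) -> List[str]:
    """Pick up to max_indexes properties with the largest positive net gain."""
    ranked = sorted(stats_by_name,
                    key=lambda name: net_gain(stats_by_name[name]),
                    reverse=True)
    return [n for n in ranked if net_gain(stats_by_name[n]) > 0][:max_indexes]


# Illustrative numbers only: a frequently updated property such as battery
# life costs more to maintain than it saves at query time.
stats = {
    "type": PropertyStats(query_rate=120, time_saved=0.5, update_rate=1, update_cost=0.25),
    "battery_life": PropertyStats(query_rate=10, time_saved=0.5, update_rate=600, update_cost=0.25),
    "accuracy": PropertyStats(query_rate=40, time_saved=0.5, update_rate=2, update_cost=0.25),
}
chosen = select_properties(stats, max_indexes=2)
```

With these sample values, `chosen` holds `type` and `accuracy`, while `battery_life` is left unindexed because its maintenance cost exceeds its query benefit.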

CONCLUSION :

In this paper, we have implemented information retrieval by designing a representation model and developing indexing schemes. To provide a uniform representation model for multi-source and complex information, we have proposed a heterogeneity-enabled sensor model based on OGC standards. The model records sensor properties and observation data at the same time. Four basic query models, which can compose all possible queries, have been built from various queries in IoT. To support efficient and accurate data retrieval for all these queries, we have proposed the EMCM indexing mechanism and the RtGR-tree indexing structure to construct and maintain indexes for sensor properties and observation data, respectively. Extensive simulations were conducted to demonstrate the effectiveness of the proposed approaches.
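
The difference between point and range queries over spatiotemporal observation data can be illustrated with a plain uniform grid. This sketch is deliberately simpler than the RtGR-tree and is not the structure of the paper; the class `GridIndex`, its methods, and the sample points are assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]  # (x, y, timestamp)


class GridIndex:
    """Uniform grid over (x, y); each cell keeps its points in a list."""

    def __init__(self, cell_size: float) -> None:
        self.cell_size = cell_size
        self.cells: Dict[Tuple[int, int], List[Point]] = defaultdict(list)

    def _cell(self, x: float, y: float) -> Tuple[int, int]:
        return (int(x // self.cell_size), int(y // self.cell_size))

    def insert(self, point: Point) -> None:
        self.cells[self._cell(point[0], point[1])].append(point)

    def point_query(self, x: float, y: float) -> List[Point]:
        """Return observations recorded exactly at (x, y)."""
        bucket = self.cells.get(self._cell(x, y), [])
        return [p for p in bucket if p[0] == x and p[1] == y]

    def range_query(self, x_min: float, y_min: float, x_max: float,
                    y_max: float, t_min: float, t_max: float) -> List[Point]:
        """Return observations inside the rectangle and the time window."""
        cx_min, cy_min = self._cell(x_min, y_min)
        cx_max, cy_max = self._cell(x_max, y_max)
        hits: List[Point] = []
        # Only cells overlapping the query rectangle are scanned.
        for cx in range(cx_min, cx_max + 1):
            for cy in range(cy_min, cy_max + 1):
                for p in self.cells.get((cx, cy), []):
                    if (x_min <= p[0] <= x_max and y_min <= p[1] <= y_max
                            and t_min <= p[2] <= t_max):
                        hits.append(p)
        return hits


index = GridIndex(cell_size=1.0)
for sample in [(0.5, 0.5, 10.0), (1.5, 0.5, 20.0), (3.2, 3.9, 30.0)]:
    index.insert(sample)

exact = index.point_query(1.5, 0.5)
window = index.range_query(0.0, 0.0, 2.0, 1.0, 0.0, 15.0)
```

A point query touches a single cell, while a range query scans every cell that overlaps the rectangle and then filters by time.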

Our future work will focus on applying the sensor and query models to a real IoT-based sensor network to evaluate the efficacy of the proposed approaches.