Publication authored by Artjom Lind, Amnir Hadachi, and Oleg Batrashev. You can find our publication here: link.
Modern mobile devices carry numerous active radio transceivers and operate nearly everywhere. As a result, mobile operators gather large volumes of data and technical records, which gives mobile phones significant potential for sensing urban and rural mobility[1] and population shifts[2].
The proposed method uses a Kalman filter to determine the movement model (Move or Stay)[3] associated with the coverage areas to which the mobile device was connected, while simultaneously estimating the device’s position. The results are encouraging; nonetheless, the movement models leave room for improvement.
Enhancements could involve incorporating more sophisticated data-driven movement models and introducing a refined map matching by associating Stay locations with buildings and Move models with roads. These advancements could further elevate the accuracy and efficiency of the positioning process.
The study introduces a method that combines call detail record (CDR) data with an enhanced Kalman filter. This filter, known for its efficacy in processing random signals, is integrated with mobility models to estimate mobile subscriber locations within a network’s coverage area. What sets the method apart is that it relies solely on network coverage cell data, eliminating the need for GPS or other additional location data.
Pinpointing a mobile subscriber’s precise location within a coverage area using CDR data alone is challenging because the records are sparse[4] in time and space, as Fig. 1 illustrates. As a result, there is a dearth of references on using CDR data for exact location extraction in mobile networks.
Fig. 1: Example of CDR-based Trajectory in an Urban area.
Before outlining the methodology, it is necessary to describe the data used and its purposes. Three distinct data sources served three specific objectives. First, GPS data collected through a customized mobile application, which recorded CellID and GPS location simultaneously, showed where a phone was located while connected to a given CellID or coverage area. This data was used to optimize the coverage areas, addressing cases where phones were connected to a coverage area despite being located outside it. Second, CDR data was fed to an advanced Kalman filter algorithm to estimate phone locations. Third, GPS data gathered from phones by a mobile application served as ground truth for evaluating the algorithm’s performance.
The coverage areas recorded in mobile operators’ CDR data do not accurately represent the actual coverage zones. A subscriber may be connected to a base transceiver station (BTS) via a Cell ID while located outside the operator-declared coverage area for that BTS. To correct this discrepancy, the coverage representation is improved by linking GPS data with cell events. This entails defining a function, f(u), that penalizes large distances between the cell coverage circle (defined by its center and radius) and the GPS coordinates recorded during cell events. The coverage circle is then optimized by minimizing this penalty, with the constraint that coverage should not exceed the area defined by the GPS data.
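The text does not give the exact form of f(u), so the following is only a minimal sketch of what such a penalty could look like. It assumes planar coordinates in kilometers, a circle u = (cx, cy, r), a squared penalty for GPS fixes falling outside the circle, and a hypothetical `shrink` weight that keeps the radius from growing beyond what the GPS data requires. The function names and the plain gradient-descent minimizer are illustrative choices, not the authors' implementation.

```python
import math

def penalty(u, gps_points, shrink=0.05):
    """Hypothetical f(u) for a coverage circle u = (cx, cy, r), in km."""
    cx, cy, r = u
    # Squared distance by which each GPS fix falls outside the circle.
    outside = sum(max(0.0, math.hypot(x - cx, y - cy) - r) ** 2
                  for x, y in gps_points)
    # Assumed regulariser: discourages circles larger than the GPS data needs.
    return outside + shrink * r ** 2

def optimize_coverage(u0, gps_points, step=0.02, iters=3000, eps=1e-6):
    """Minimise the penalty with plain gradient descent (central differences)."""
    u = list(u0)
    for _ in range(iters):
        grad = []
        for i in range(3):
            hi, lo = u.copy(), u.copy()
            hi[i] += eps
            lo[i] -= eps
            grad.append((penalty(hi, gps_points) - penalty(lo, gps_points)) / (2 * eps))
        u = [v - step * g for v, g in zip(u, grad)]
        u[2] = max(u[2], 0.0)  # a radius cannot be negative
    return tuple(u)

# Made-up GPS fixes (km) observed while the phone was attached to one cell,
# and a made-up operator-declared circle that misses all of them.
gps_points = [(2.0, 1.0), (2.2, 1.1), (1.8, 0.9), (2.1, 0.8), (1.9, 1.2)]
declared = (0.0, 0.0, 0.5)
optimized = optimize_coverage(declared, gps_points)
```

In this toy setup the optimized circle moves toward the cluster of GPS fixes, which is the qualitative behavior the coverage correction is meant to achieve.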
Conventional Kalman filters allow only a single transition matrix per step, which limits their predictions to one behavior model (e.g., stable directed movement or staying stationary). To address this limitation and accommodate different behavior types and changes within a trajectory, an adaptive Kalman filter incorporating discrete random variables was introduced. It adds a discrete random variable, S_t, that selects the model used for the transition at each step. The filter then computes the probability of each model at a given time based on the evidence, along with a probability distribution of the hidden state variable associated with each model.
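As a rough illustration of how the probability of each model can be updated from evidence, the sketch below applies one discrete Bayes step over the model variable. It assumes a row-stochastic matrix of model-to-model transition probabilities and a per-model likelihood of the current observation; the function name and all numbers are made up for the example and do not come from the paper.

```python
def update_model_probs(prev_probs, transition, likelihoods):
    """One Bayes step over the discrete model variable S_t.

    prev_probs[i]   : P(S_{t-1} = i | evidence so far)
    transition[i][j]: P(S_t = j | S_{t-1} = i), rows sum to 1
    likelihoods[j]  : likelihood of the current observation under model j
    """
    n = len(prev_probs)
    # Predict: push the previous model probabilities through the transition matrix.
    predicted = [sum(prev_probs[i] * transition[i][j] for i in range(n))
                 for j in range(n)]
    # Correct: weight by how well each model explains the new evidence.
    unnormalised = [predicted[j] * likelihoods[j] for j in range(n)]
    total = sum(unnormalised)
    return [value / total for value in unnormalised]

# Index 0 = Stay, index 1 = Move (illustrative numbers only).
prev_probs = [0.5, 0.5]
transition = [[0.9, 0.1],
              [0.2, 0.8]]
likelihoods = [0.1, 0.6]  # the new observation fits Move better
probs = update_model_probs(prev_probs, transition, likelihoods)
```

In a full switching filter the likelihoods would come from each model's Gaussian prediction of the observation rather than being supplied by hand.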
The belief state of the hidden variable at any time is represented as a mixture of Gaussians, one per model, each scaled by its model probability. In addition, a model transition probability matrix defines the probability of switching from one model to another. Because the number of mixture components in the belief state would otherwise grow exponentially over time, filtering and smoothing methods are employed to keep it manageable, and both are computed using the predefined behavior models.
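The text does not say how the growth of the mixture is contained. One common approximation, shown here purely as an assumption and not as the paper's method, is moment matching: a weighted set of Gaussians is collapsed into a single Gaussian with the same mean and variance. The sketch is one-dimensional for brevity, and the function name is illustrative.

```python
def collapse_mixture(weights, means, variances):
    """Collapse a 1-D Gaussian mixture into one Gaussian by moment matching."""
    total = sum(weights)
    w = [value / total for value in weights]  # normalise the model probabilities
    mean = sum(wi * mi for wi, mi in zip(w, means))
    # Total variance = within-component variance + spread of the component means.
    variance = sum(wi * (vi + (mi - mean) ** 2)
                   for wi, mi, vi in zip(w, means, variances))
    return mean, variance

# Illustrative mixture: a confident "Stay" component and a weaker "Move" one.
mean, variance = collapse_mixture(weights=[0.8, 0.2],
                                  means=[0.0, 5.0],
                                  variances=[1.0, 4.0])
```

Collapsing after every step keeps the number of Gaussians equal to the number of models instead of letting it multiply at each transition.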
The algorithm treats location and velocity as hidden variables that evolve according to specified equations and noise characteristics. It applies Bayesian network principles to infer a moving user’s coordinates and velocities. Each behavior model has its own transition matrix; for instance, the transition matrix of the moving-user model differs from that of the stationary-user model. The observation model reflects the antenna to which the user is connected, with its coverage zone treated as the observation error.
Ultimately, the algorithm computes the probability of each model at any given time, together with the distribution of coordinates and velocity given the evidence (Fig. 2). This process is applied to all the evidence, which yields more accurate, smoothed results in testing and validation.
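To make the idea of model-specific transition matrices concrete, here is a minimal sketch using NumPy. The state layout [x, y, vx, vy], the constant-velocity form of the Move matrix, the Stay matrix that keeps the position and zeroes the velocity, and the noise values are all assumptions chosen for illustration; the paper's actual matrices are not given in this text.

```python
import numpy as np

def move_matrix(dt):
    """Assumed Move model: constant velocity over a time step dt."""
    F = np.eye(4)
    F[0, 2] = dt  # x += vx * dt
    F[1, 3] = dt  # y += vy * dt
    return F

def stay_matrix():
    """Assumed Stay model: position is kept, velocity is reset to zero."""
    return np.diag([1.0, 1.0, 0.0, 0.0])

def predict(x, P, F, Q):
    """Standard Kalman prediction step for one behavior model."""
    return F @ x, F @ P @ F.T + Q

# Illustrative state: position in km, velocity in km/min.
x0 = np.array([10.0, 20.0, 0.5, -0.25])
P0 = np.eye(4)         # initial uncertainty (made up)
Q = 0.01 * np.eye(4)   # process noise (made up)

x_move, P_move = predict(x0, P0, move_matrix(dt=4.0), Q)
x_stay, P_stay = predict(x0, P0, stay_matrix(), Q)
```

Running both predictions from the same prior produces the two candidate Gaussians whose weights are then set by the model probabilities.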
Fig. 2: Coordinates and probabilities for Stay and Move models
In the testing phase, the collected GPS data (described above) serves as the benchmark for assessing the algorithm, while the Kalman filter uses CDR data to estimate subscriber locations. The CDR data and the GPS data must cover the same time frame and the same users so that the two can be compared accurately when evaluating the algorithm’s performance.
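The comparison against ground truth can be sketched as follows. The text does not state which distance measure was used, so the haversine great-circle distance is an assumption here, as are the function names and the sample coordinates; the estimated and GPS positions are assumed to be already paired by user and time.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def error_summary(estimated, ground_truth):
    """Return (min, max, average) error in km over paired positions."""
    errors = [haversine_km(e_lat, e_lon, g_lat, g_lon)
              for (e_lat, e_lon), (g_lat, g_lon) in zip(estimated, ground_truth)]
    return min(errors), max(errors), sum(errors) / len(errors)

# Made-up paired positions (degrees), for illustration only.
estimated = [(58.380, 26.720), (58.390, 26.740), (58.400, 26.760)]
ground_truth = [(58.381, 26.722), (58.395, 26.735), (58.400, 26.760)]
min_err, max_err, avg_err = error_summary(estimated, ground_truth)
```

Grouping the paired errors by the inferred model (Stay or Move) before summarizing would give per-model minimum, maximum, and average figures.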
B. Results and Discussion
Our method was tested using GPS data as a reference alongside 271 CDR records from various users. We then compared the algorithm’s estimated positions against the actual GPS data. In one example (Fig. 3), triangles denote the user’s GPS positions during a train journey, while circles represent the positions estimated by our method. Each circle is connected by a segment to its ground-truth GPS point. The example shows that the algorithm can estimate positions accurately using only mobile network coverage areas.
The initial analysis estimated the errors by comparing the algorithm’s estimates with the GPS data. **Table 1** shows the output without coverage optimization: an average error of about 0.93 kilometers for the stay model and about 1.95 kilometers for the move model.
| Model | Estimation error range (km) | Average error (km) | Number of CDR records |
|---|---|---|---|
| Stay | Min: 0.002; Max: 13.543 | 0.931 | 147 |
| Move | Min: 0.027; Max: 13.061 | 1.946 | 124 |
| Total performance | Min: 0.002; Max: 13.543 | 1.438 | 271 |
Table 1: Performance of the proposed algorithm without coverage optimization.
With coverage optimization applied (**Table 2**), the algorithm improves: the average error drops to about 0.43 kilometers for the stay model and about 1.28 kilometers for the move model. Coverage optimization visibly enhances the algorithm’s estimation accuracy.
| Model | Estimation error range (km) | Average error (km) | Number of CDR records |
|---|---|---|---|
| Stay | Min: 0.001; Max: 4.958 | 0.432 | 147 |
| Move | Min: 0.008; Max: 10.749 | 1.279 | 124 |
| Total performance | Min: 0.001; Max: 10.749 | 0.892 | 271 |
Table 2: Performance of the proposed algorithm with coverage optimization.
Furthermore, the algorithm estimates locations more accurately when users remain stationary than when they are in motion. Overall, the coverage optimization step clearly improves the algorithm’s estimation performance.
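The size of the improvement can be read off the two tables with a little arithmetic. The snippet below takes the average errors reported in Table 1 and Table 2 and computes the relative reduction per model; the variable and function names are illustrative.

```python
# Average errors in km, copied from Table 1 and Table 2.
without_optimization = {"Stay": 0.931, "Move": 1.946, "Total": 1.438}
with_optimization = {"Stay": 0.432, "Move": 1.279, "Total": 0.892}

def relative_reduction(before, after):
    """Fraction by which the error shrank."""
    return (before - after) / before

reductions = {
    model: round(100 * relative_reduction(without_optimization[model],
                                          with_optimization[model]), 1)
    for model in without_optimization
}
# reductions == {"Stay": 53.6, "Move": 34.3, "Total": 38.0}  (percent)
```

By this measure the stay model benefits most from coverage optimization, with its average error roughly halved.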
Fig. 3: The algorithm’s output for the case of a user traveling by train. The polygons are the optimized mobile network coverage areas, the circles are the algorithm’s estimates (Estimated Locations), and the triangles are the GPS data (Real Positions).
The results of this approach are encouraging. With coverage optimization, the average location-estimation error fell markedly compared with the unoptimized run: to about 0.43 kilometers for stationary (“stay”) models and about 1.28 kilometers for mobile (“move”) models. These findings are promising for the field of mobile positioning and have practical applications in intelligent transportation systems and location-based services.
This research marks a significant step forward in the realm of mobile positioning. By leveraging CDR data and enhancing the Kalman filter, the study provides a more reliable and efficient way to estimate mobile user locations, especially in urban environments. As the world becomes increasingly mobile-centric, such methodologies will be pivotal in shaping the future of urban mobility and location-based services.
[1] Zhang, Sihai, et al. “Computing on base station behavior using Erlang measurement and call detail record.” IEEE Transactions on Emerging Topics in Computing 3.3 (2015): 444-453.
[2] Furletti, Barbara, et al. “Identifying users profiles from mobile calls habits.” Proceedings of the ACM SIGKDD International Workshop on Urban Computing. 2012.
[3] Batrashev, Oleg, et al. “Mobility episode detection from CDR’s data using switching Kalman filter.” Proceedings of the Fourth ACM SIGSPATIAL International Workshop on Mobile Geographic Information Systems. 2015, pp. 63-69.
[4] Ficek, Michal, and Lukas Kencl. “Inter-call mobility model: A spatio-temporal refinement of call data records using a Gaussian mixture model.” 2012 Proceedings IEEE INFOCOM. IEEE, 2012.