Technical tags

Research Project Data Mining Linear Programming Gurobi Solver Eclipse IDE Java Matlab Latex

Publication and Source code

Introduction

Nowadays, mobile APPs extensively use cloud storage as their fundamental services to provide data access. Mobile cloud storage (MCS) is pervasively built for sharing and accessing user data for seamless user experience. The usages of MCS are mainly from smartphones/tablets. Current development of MCS does not show any difference than traditional cloud storage. In many practical scenarios, the access patterns from mobile devices are quite different from legacy desktop computers. In a mobile environment, users usually show more differential features in terms of usage patterns. However, in recent MCS, these differences are not introduced in optimizing cloud storage for mobile users. In other words, the mobile systems and desktop systems are still using traditional cloud storage that does not consider these new features in mobile users. This causes noticeable deterioration of data access from mobile devices.

To solve the aforementioned problems, we propose to use distributed storage. In the system, a large number of lightweight storage servers are widely deployed, with more density than traditional data centers. As we know, it is not practical to deploy many data centers around a region. Usually, we just have a couple of big data centers scattered geographically. It is more feasible to just deploy lightweight storage servers which are cheaper and more flexible than a data center. The basic idea is to deploy data nearer to users. It is obviously that we can do some optimization to increase the performance of data access by assigning different users' data onto different storage servers in the distributed system. To get the optimal result, we propose a centralized method based on mixed-integer linear programming (MILP). To compromise the scalability of the system, we propose a distributed one based on stable matching. The solutions for both methods are studied. Simulation results for the methods are gotten from simulations. The result supports our discovers that the context information can help us to optimize storage efficiency in MCS.

Methods

We developed methods by considering the following challenges.
  1. Mobile users stay at different places with various time spans.
  2. Storing data on the node closest to the mobile user may not be the best choice due to network bottleneck.
  3. A user presents different data access preferences (intensity to read or write/update) at different places.
We solve the above problems by introducing these mechanisms.
  1. We investigate three common user context factors: network condition, user mobility pattern, and data access preference. Then we developed a context-aware analytical model based on these context information.
  2. We apply Reed-Solomon erasure code as basic storage model in the proposed distributed storage.
  3. A mixed integer linear programming (MILP) for the system model is established to get the optimal data access.
  4. To solve the problem that MILP is not scalable to the dynamic changing cases, a stable matching algorithm is introduced as a heuristic solution of the system.

By using Reed-Solomon erasure code, we can get benefits in dividing users' data to numbers of fragments. Based on the policy of the erasure code, the user can recover the original file by accessing only a pre-defined size of the fragments, which is smaller than the total size of fragments. At the very initial stage of storage, a user assigns fragments to a set of selected storage servers. Then the user can retrieve data from a subset of these servers. In our proposed algorithms, the servers in the subset have the most efficient performance for the particular user at the particular place.

Formally speaking, the objective function of the optimization is to minimize the expected global transmission time in data access of all users. Assume all users are in the set \(U\). We use \(X\) and \(Y\) as matrices of decision variables denote selected storage nodes to store data chunks, and checksum chunks, respectively. We set \(Z\) as a matrix of decision variable for storage nodes chosen to provide read service. We use \( T_h^{op} \) to denote the expected transmission time of user \(h\), where \( h \in U \). We obtain the objective function as: \( \min_{ (X, Y, Z) } \sum_{h \in U} T^{op}_h \).

The constraints of the objective function follow the 3 listed context information.
  1. Network conditions: it is measured as bandwidths between place of interests (POIs) and storage nodes
  2. User mobility pattern: we discover users' frequently visited POIs by data mining methods.
  3. Data accessing preferences: the probability of a user read/write files at each POI based on the long-term file operating history.

MILP is a centralized method that is running on a server as management controller of the whole system. Because the MILP method is extremely expensive in computation, we propose a distributed method, that is running on each node. In the method, we build a bipartite graph. In this graph, we divide nodes into two sets. One set includes all users. Another set includes all storage servers. The edges between the two sets define the connected server stored the data from the connected user. To get the edges, we can use a stable matching algorithm to reach a status that all entries are satisfied.

Performance Evaluation

We run simulations of proposed methods on networking layout as shown in the above figures. The two figures show two different types of layouts. One is a simple layout shows a possible assignment of data on storage nodes. Another one a more complicated layout, that is randomly generated in the simulation. Then we evaluate different algorithms and their running efficiency under the same scenario. The algorithms used in the simulations are list here:
  1. \( \textbf{CNT-0} \): we randomly choose storage nodes to store data from users.
  2. \( \textbf{CNT-1} \): we just consider the constraints of network conditions.
  3. \( \textbf{CNT-2} \): we consider network conditions and user mobility pattern.
  4. \( \textbf{CNT-OPT} \): we consider all 3 context information, network condition, user mobility pattern and data access preference.
The performance evaluation shows the performance of different algorithms with the varying of the average distance between users' POIs. Under all cases, the proposed method \( \textbf{CNT-OPT} \) outperforms the other methods.

Conclusion

In this research, we propose a context-aware distributed file storage strategy based on Reed-Solomon coding to improve user’s file access efficiency in mobile cloud computing. A mixed integer linear programming is applied to optimize the file access in both reading and updating from/to mobile devices by exploiting user context information. We minimize the overhead of the file accessing in the distributed storage. Erasure code gives redundancy to the system and accelerates the file access as well. The simulation studies reveal that based on user context-aware information such as mobility pattern, network condition, data access pattern, the file accessing efficiency can be improved significantly in distributed storage for users of mobile devices.