Mobile apps now rely extensively on cloud storage as a fundamental service for data access. Mobile cloud storage (MCS) is widely deployed to share and access user data with a seamless user experience, and it is accessed mainly from smartphones and tablets. However, current MCS systems are built no differently from traditional cloud storage. In many practical scenarios, the access patterns of mobile devices differ considerably from those of legacy desktop computers: in a mobile environment, users exhibit more distinctive usage patterns. Recent MCS designs do not exploit these differences to optimize cloud storage for mobile users; in other words, mobile and desktop systems alike still use traditional cloud storage that ignores these mobile-specific features. This causes a noticeable deterioration of data access from mobile devices.
To address these problems, we propose a distributed storage system in which a large number of lightweight storage servers are deployed at a higher density than traditional data centers. Deploying many data centers across a region is impractical; typically, only a few large data centers are scattered geographically. Lightweight storage servers, by contrast, are cheaper and more flexible to deploy, and the basic idea is to place data closer to users. The performance of data access can then be improved by assigning different users' data to different storage servers in the distributed system. To obtain the optimal assignment, we propose a centralized method based on mixed-integer linear programming (MILP); to improve the scalability of the system, we also propose a distributed method based on stable matching. We study the solutions of both methods and evaluate them through simulations. The results support our finding that context information can be used to improve storage efficiency in MCS.
By using a Reed-Solomon erasure code, we divide each user's data into a number of fragments. Under the erasure-coding policy, the user can recover the original file by retrieving only a predefined subset of the fragments, whose total size is smaller than that of all fragments combined. At the initial storage stage, a user distributes the fragments across a set of selected storage servers; afterwards, the user can retrieve the data from any sufficiently large subset of these servers. In our proposed algorithms, that subset consists of the servers offering the best performance for the particular user at the particular location.
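To illustrate the k-of-n recovery property, the following is a minimal sketch of a Reed-Solomon-style erasure code over the prime field GF(257), not the production code a real MCS deployment would use (which would typically operate over GF(2^8)): k data symbols define a polynomial, its evaluations at n points are the fragments, and any k fragments recover the data via Lagrange interpolation.

```python
# Toy Reed-Solomon-style (k, n) erasure code over GF(257).
# k symbols -> coefficients of a degree-(k-1) polynomial; the n fragments
# are evaluations of that polynomial. Any k fragments suffice to recover.
P = 257  # prime > 256, so each byte fits in one field element

def encode(data, n):
    """Split `data` (list of k ints < P) into n fragments (x, y)."""
    def poly(x):
        return sum(c * pow(x, i, P) for i, c in enumerate(data)) % P
    return [(x, poly(x)) for x in range(1, n + 1)]

def decode(fragments, k):
    """Recover the k original symbols from any k fragments by
    Lagrange interpolation of the polynomial's coefficients."""
    pts = fragments[:k]
    coeffs = [0] * k
    for j, (xj, yj) in enumerate(pts):
        basis = [1]   # running product prod_{m != j} (x - xm), low degree first
        denom = 1
        for m, (xm, _) in enumerate(pts):
            if m == j:
                continue
            denom = denom * (xj - xm) % P
            new = [0] * (len(basis) + 1)
            for d, b in enumerate(basis):   # multiply basis by (x - xm)
                new[d + 1] = (new[d + 1] + b) % P
                new[d] = (new[d] - xm * b) % P
            basis = new
        scale = yj * pow(denom, P - 2, P) % P   # Fermat modular inverse
        for d, b in enumerate(basis):
            coeffs[d] = (coeffs[d] + scale * b) % P
    return coeffs

data = [10, 20, 30]        # k = 3 symbols
frags = encode(data, 5)    # n = 5 fragments on 5 storage servers
assert decode(frags[2:], 3) == data   # any 3 of the 5 fragments suffice
```

In the system described above, the user would fetch fragments from the three fastest of the five servers; the remaining two serve only as redundancy.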
Formally, the objective of the optimization is to minimize the expected global transmission time of data access over all users. Let \(U\) denote the set of all users. We use matrices of decision variables \(X\) and \(Y\) to denote the storage nodes selected to store data chunks and checksum chunks, respectively, and a matrix of decision variables \(Z\) for the storage nodes chosen to provide read service. Let \( T_h^{op} \) denote the expected transmission time of user \(h\), where \( h \in U \). The objective function is: \( \min_{ (X, Y, Z) } \sum_{h \in U} T^{op}_h \).
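As a concrete illustration of this objective, the following toy sketch picks the read set (the \(Z\) decision for one user) by brute force on a tiny instance. The latency table, the threshold \(k\), and the assumption that \( T_h^{op} \) equals the slowest of the \(k\) parallel downloads are all illustrative choices, not the paper's exact model.

```python
from itertools import combinations

# Hypothetical instance: t[h][s] = expected transmission time of one
# chunk from server s to user h.
t = {
    "u1": {"s1": 5, "s2": 2, "s3": 9, "s4": 4},
    "u2": {"s1": 3, "s2": 7, "s3": 1, "s4": 6},
}
k = 2  # chunks needed to reconstruct a file (erasure-code threshold)

def best_read_set(times, k):
    """Choose the k servers (one user's Z decision) minimising that
    user's expected transmission time; with parallel downloads, T_h is
    dominated by the slowest chosen server (illustrative assumption)."""
    best = min(combinations(times, k), key=lambda S: max(times[s] for s in S))
    return set(best), max(times[s] for s in best)

total = 0
for h, times in t.items():
    servers, T_h = best_read_set(times, k)
    total += T_h
print(total)  # global objective: sum over users of T_h  -> 7
```

The MILP searches over \(X\), \(Y\), and \(Z\) jointly and subject to constraints; this sketch only shows how the objective value is assembled from per-user transmission times.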
The constraints of the optimization follow the three kinds of context information listed above. MILP is a centralized method that runs on a server acting as the management controller of the whole system. Because the MILP method is computationally expensive, we also propose a distributed method that runs on each node. This method builds a bipartite graph whose nodes are divided into two sets: one contains all users, and the other contains all storage servers. An edge between the two sets indicates that the connected server stores data from the connected user. To determine the edges, we apply a stable matching algorithm until a state is reached in which no user and server would both prefer each other over their current assignments.
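The matching step can be sketched with the classic Gale-Shapley (deferred acceptance) algorithm. This one-to-one version, with hypothetical preference lists, is a simplification: in the actual system each server would have a capacity and hold many users, and preferences would be derived from expected transmission times and server load.

```python
def stable_match(user_prefs, server_prefs):
    """One-to-one Gale-Shapley: users propose to storage servers in
    preference order; each server keeps its best proposer so far.
    Returns a stable user -> server assignment."""
    rank = {s: {u: i for i, u in enumerate(p)} for s, p in server_prefs.items()}
    free = list(user_prefs)              # users not yet matched
    next_choice = {u: 0 for u in user_prefs}
    holds = {}                           # server -> currently held user
    while free:
        u = free.pop()
        s = user_prefs[u][next_choice[u]]
        next_choice[u] += 1
        if s not in holds:
            holds[s] = u                 # server was free: accept
        elif rank[s][u] < rank[s][holds[s]]:
            free.append(holds[s])        # server trades up; old user freed
            holds[s] = u
        else:
            free.append(u)               # rejected; u will propose again
    return {u: s for s, u in holds.items()}

# Users rank servers (e.g. by expected transmission time); servers rank
# users (e.g. by added load). Hypothetical preferences:
user_prefs = {"u1": ["s1", "s2"], "u2": ["s1", "s2"]}
server_prefs = {"s1": ["u2", "u1"], "s2": ["u1", "u2"]}
print(stable_match(user_prefs, server_prefs))  # {'u2': 's1', 'u1': 's2'}
```

Both users prefer s1, but s1 prefers u2, so u1 settles for s2; no user-server pair would rather be matched to each other than to their assigned partners, which is exactly the stability condition the distributed method seeks.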