Towards General-Purpose Resource Management in Shared Cloud

HotDep'14: Proceedings of the 10th USENIX conference on Hot Topics in System Dependability

Published by USENIX Association

In distributed services shared by multiple tenants, managing resource allocation is an important pre-requisite to providing dependability and quality of service guarantees. Many systems deployed today experience contention, slowdown, and even system outages due to aggressive tenants and a lack of resource management. Improperly throttled background tasks, such as data replication, can overwhelm a system; conversely, high-priority background tasks, such as heartbeats, can be subject to resource starvation. In this paper, we outline five design principles necessary for effective and efficient resource management policies that could provide guaranteed performance, fairness, or isolation. We present Retro, a resource instrumentation framework that is guided by these principles. Retro instruments all system resources and exposes detailed, real-time statistics of per-tenant resource consumption, and could serve as a base for the implementation of such policies.