03
  • Introduction

    Scalability and its impact are fundamental issues in the cloud.

    Several related lectures:-

    1. Reliability and Scalability (this one)

    2. Architecture and Scaling Techniques

    3. Data Scalability

    4. Consistency

    Overview

    Reliability
    Faults, Failures and Availability
    The Bathtub Curve Model

    Scalability
    Vertical and Horizontal Scalability

    Measuring Scalability
    Amdahl’s Law
    Universal Scalability Law

    Reliability and Scalability

    In the cloud these are related.
    reliability is about avoiding failure in the presence of individual component faults
    scalability is about maintaining performance as demand grows

    Both have to be addressed by architecting applications correctly.

    Both mean that some of the same techniques, e.g. replication, have to be used.

  • Reliability

    The cloud is a large-scale distributed system made out of many commodity components – individual solution component faults (hardware and/or software) will be commonplace.

    AWS say:-
    “the reality is that most software systems will degrade over time. This is due in part to any or all of the following reasons:
    Software will leak memory and/or resources. This includes software that you have written, as well as software that you depend on (e.g., application frameworks, operating systems, and device drivers).
    File systems will fragment over time and impact performance.
    Hardware (particularly storage) devices will physically degrade over time.”

    🔗 Building Fault-Tolerant Applications on AWS

    Faults and Failures

    AWS go on to say:-

    “Disciplined software engineering can mitigate some of these problems, but ultimately even the most sophisticated software system is dependent on a number of components that are out of its control (e.g., operating system, firmware, and hardware). Eventually, some combination of hardware, system software, and your software will cause a failure and interrupt the availability of your application.”

    So we have a multitude of unavoidable faults happening all the time, and under these conditions we have to architect our cloud application to try to avoid failure.

    Fault tolerance is achieved by designing the system with a high degree of hardware redundancy.
    but the software has to be designed to use it
    basically by running replicated instances of services

    Note that some outages may be caused by upgrades in progress, or by hardware or software overloading, rather than by faults.

    eBay:

    “Design to fail”: implement automated fault detection and a degraded “limp mode”

    The system should be distributed into fault domains (or fault zones).
    a fault domain is a set of hardware components – computers, switches, and more – that share a single point of failure
    service instances should be separated into different fault domains, thereby improving reliability

    📷 The Bathtub Curve Model

    The above diagram is really modelling fault occurrence, i.e. independent faults of a particular hardware component type.

    Note the difference between the blue line and the commonly (naively) assumed green line.

    Measuring Reliability

    MTBF
    mean time between failures
    really meaning between component faults as defined previously...
    generally used to calculate the probability of failure for a single solution component
    the average time interval, usually expressed in thousands or tens of thousands of hours, that elapses before a component fails and requires service
    a basic measure of a system’s reliability, usually represented in units of hours

    MTTR
    mean time to repair
    the average (expected) time taken to repair a failed component
    again usually represented in units of hours

    Availability
    a measure of the reliability of a service
    $\rho$ is the probability that the service is unavailable either through component failure or network partition
    calculated as a fraction of the time an individual service instance is unavailable i.e. dimensionless
    $\rho = MTTR / (MTBF + MTTR)$
    availability of the service is defined as $1 - \rho$
    availability as a quantity is popularly given as the equivalent percentage of uptime i.e. 99.5% rather than 0.995
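
    As an illustrative calculation (the MTBF and MTTR figures here are assumed, not taken from any vendor): a component with an MTBF of 10,000 hours and an MTTR of 8 hours gives

    $$\rho = \frac{8}{10000 + 8} \approx 0.0008, \qquad 1 - \rho \approx 0.9992, \;\text{i.e. about}\; 99.92\%\;\text{availability}$$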

    Availability of replicated services
    the overall reliability of a system composed of set of replicated, identical and independent service instances in a parallel configuration
    availability of a system of $n$ replicated services is $1 - \rho ^n$
    where probability of all instances having failed at any instant is $\rho ^n$

    Availability of dependent services
    when two (or more) connected services are required to be available together in order for the system to be available
    i.e. if any of these services fail the system is considered to be unavailable
    aggregate availability of a system of connected dependent services is
    $(1 - \rho^i) \times (1 - \rho^j) \times (1 - \rho^k) \dots$
    where $i$, $j$ and $k$ etc. are the replication factors of the separate sub-systems of services as above
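
    The availability formulas above can be captured in a minimal Python sketch (the function names and the example figures are assumptions for illustration, not part of the lecture material):

    ```python
    # Availability of single, replicated and dependent services.
    # A minimal sketch of the formulas above; the MTBF/MTTR figures in the
    # example at the bottom are assumed, purely for illustration.

    def unavailability(mtbf_hours: float, mttr_hours: float) -> float:
        """rho = MTTR / (MTBF + MTTR): the fraction of time an instance is down."""
        return mttr_hours / (mtbf_hours + mttr_hours)

    def replicated_availability(rho: float, n: int) -> float:
        """Availability of n identical, independent replicas: 1 - rho**n."""
        return 1 - rho ** n

    def dependent_availability(rho: float, replication_factors: list[int]) -> float:
        """Aggregate availability of dependent sub-systems replicated i, j, k, ...
        times: (1 - rho**i) * (1 - rho**j) * (1 - rho**k) ..."""
        availability = 1.0
        for n in replication_factors:
            availability *= replicated_availability(rho, n)
        return availability

    if __name__ == "__main__":
        rho = unavailability(mtbf_hours=10_000, mttr_hours=8)   # ~0.0008
        print(f"single instance : {1 - rho:.4%}")
        print(f"3 replicas      : {replicated_availability(rho, 3):.7%}")
        # e.g. a 3-replica web tier that depends on a 2-replica database tier
        print(f"3-replica tier + 2-replica tier: {dependent_availability(rho, [3, 2]):.7%}")
    ```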

    Availability

    Availability is often quoted in “nines”.
    e.g. a solution with an availability level of "three 9s" is capable of supporting its intended function 99.9 percent of the time, equivalent to an annual downtime of 8.76 hours on a $24 \times 365$ basis.
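
    This figure follows directly from the definition: a year of continuous operation is $24 \times 365 = 8760$ hours, so

    $$\text{annual downtime} = (1 - 0.999) \times 8760\ \text{h} = 8.76\ \text{h}$$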

    Gartner suggest:-
    high availability = 99.9% availability – “three 9s”
    normal commercial availability = 99–99.5% availability, i.e. 87.6–43.8 hours/year downtime

    “AWS will use commercially reasonable efforts to make Amazon EC2 and Amazon EBS each available with a Monthly Uptime Percentage of at least 99.95%”...

    We should not underestimate the importance modern business puts on high application availability.

  • Scalability

    Definition: A system is said to be scalable if it can handle the addition of users and resources without suffering a noticeable loss of performance or increase in administrative complexity.

    Factors:
    number of users/objects/machines in the system (congestion, load, failure points and probability of failure, heterogeneity)
    distance between the parts of the system (hard to manage, latency)
    number of organisations that have administrative control over the system (heterogeneity, management issues)
    Scalability Diagram

    Scalability Factor

    Every component, whether processors, servers, storage drives or load-balancers, has some kind of management/operational overhead.

    When attempting to increase the scale of a system by introducing new components, or increasing the capacity of existing ones, it is important to understand what percentage of the newly configured system is actually usable.

    This measurement is called the scalability factor i.e. if only 95% of the expected total new processor power is available when adding a new CPU to the system, then the scalability factor is 0.95.
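
    To make the arithmetic explicit (throughput figures assumed purely for illustration): if one server handles 1,000 requests/s and adding a second identical server raises the measured total to only 1,900 requests/s rather than the expected 2,000, then

    $$\text{scalability factor} = \frac{1900}{2000} = 0.95$$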

    We can further quantify scalability factor as:-
    linear scalability: the performance improvement remains proportional to the upgrade as far as you scale i.e. a scalability factor of 1.0.
    sublinear scalability: a scalability factor below 1.0. In reality some components will not scale as well as others. In fact the scalability factor may itself tail off as more resources are progressively added.
    sub-linear scalability likely to be the reality
    see section on Amdahl/USL at end
    superlinear scalability: although rare, it is possible to get better than expected ( > 1.0) performance increases e.g. some Hadoop cloud applications
    however Gunther has shown that this behaviour is exhibited at small scale only, with a penalty as the system gets larger
    negative scalability: if the application is not designed for scalability, it is possible that performance can actually get worse as it scales.
    Scalability Factor Diagram

    Scalability

    There are two primary ways of scaling architectures:-

    Vertical Scalability: adding resource within the same logical unit to increase capacity. Examples include adding CPUs or memory to an existing server, or expanding storage by adding hard drives to an existing RAID/SAN array. Sometimes this is termed scaling up.

    Horizontal Scalability: adding multiple logical units of resources and making them work as a single unit. Most clustering solutions, distributed file systems and load-balancers aid horizontal scalability. Usually this is achieved using less powerful machines than the single one used in vertical scalability above. Sometimes this is termed scaling out.

    Cloud computing is a key example of horizontal scalability.

    Vertical Scalability + and –

    Often can be achieved without new architectural considerations and without changing application code +

    Gets non-linearly more expensive as more resources added –

    Better for building a solution for a future capped number of users

    In reality the ability to scale vertically is limited.

    Horizontal Scalability + and –

    Software has to be designed to be split horizontally –
    this can be hard!
    what this lecture’s really about...

    Replica consistency a big issue –

    More points of failure (hardware/network) –

    Hardware cheaper (not necessarily cheap though) +

    a driving factor

    Better for building a solution to cater for many future concurrent users.

    A multi-tier/layer system may in fact lend itself to being split horizontally.

    eBay:

    “If you can’t split it, you can’t scale it”: eBay realised early on that anything which cannot be split into smaller components can’t be scaled!

  • Quantifying Scalability

    We will look at how to measure the effectiveness of scaling, i.e. to determine a cost/benefit analysis of adding hardware:-

    Amdahl’s Law: really a better measure for supercomputers and clusters involved in HPC (high-performance computing).

    however an interesting insight into the limits of scalability

    Universal Scalability Law: a more realistic measure of scalability in distributed systems taking into account overheads of communication and synchronisation.

    Scaling by Adding More Cores

    The clock speeds of microprocessors are not going to improve much in the foreseeable future.

    The number of transistors in a microprocessor is still growing at a high rate.

    one of the main uses of transistors has been to increase the number of computing cores the processor has

    However as cores are added the performance of the system does not scale linearly.
    governed by Amdahl’s Law and by the Universal Scalability Law
    a good example of sub-linear scalability

    Here, for simplicity, think of 1-core VMs, so the unit of increasing parallel computation is a VM instance.

    but this argument can be adapted/extended to multi-core processors or multi-processor VMs etc.

    Quantifying Scalability

    Speed-up $S$ measures the effectiveness of parallelization:

    $$ S = \frac{T(1)}{T(N)}$$

    $T(1)$: execution time of serial computation.

    $T(N)$: execution time using $N$ parallel cores.

    and of course $T(N) < T(1)$, hence the arrangement of numerator and denominator
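
    For example (timings assumed for illustration): if a job takes $T(1) = 100$ s serially and $T(8) = 20$ s on 8 cores, then

    $$S = \frac{T(1)}{T(8)} = \frac{100}{20} = 5$$

    i.e. well short of the ideal speed-up of 8.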

    Amdahl's Law

    (1). In order to keep the ever-increasing number of cores busy, applications must be extremely parallel to exploit the benefits of increasing transistor counts.

    (2). To quantify this, Amdahl's law states that for a given application contention $\sigma$, i.e. the proportion of the code that cannot be parallelised (remains serial), the maximum speedup that can be achieved by using $N$ cores is:-

    $$ S = \frac{N}{1 + \sigma(N-1)} $$

    consequently $(1 - \sigma)$ is the proportion of a program that can be made parallel (i.e. benefit from parallelisation)
    notice how the contention limits $S$: for large $N$, $S$ approaches $1/\sigma$
    Amdahl's law Diagram
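
    A minimal numerical sketch of Amdahl's law (the value $\sigma = 0.05$ and the core counts below are assumed, chosen only to illustrate the diminishing returns):

    ```python
    # Amdahl's law: S = N / (1 + sigma * (N - 1))
    # sigma = 0.05 (5% of the work stays serial) is an assumed, illustrative value.

    def amdahl_speedup(n_cores: int, sigma: float) -> float:
        """Maximum speedup on n_cores when a fraction sigma of the work is serial."""
        return n_cores / (1 + sigma * (n_cores - 1))

    if __name__ == "__main__":
        sigma = 0.05
        for n in (1, 2, 8, 32, 128, 1024):
            print(f"N = {n:5d}   S = {amdahl_speedup(n, sigma):6.2f}")
        # As N grows, S approaches 1/sigma = 20 no matter how many cores are added.
    ```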

    Amdahl's Law - Cloud Context

    Amdahl's law is an interesting basis for measuring scalability
    but it is ultimately "a law of diminishing returns"
    it was originally defined for use with large standalone computers solving hard mathematical problems

    By itself it is only applicable to certain workloads in the cloud as it does not model other factors such as network latency between nodes or synchronisation factors.
    e.g. good for embarrassingly parallel applications
    little effort to distribute into many parallel tasks in separate VMs
    no (or little) dependency or communication between parallel task VMs

    Universal Scalability Law (USL)

    Gunther (2007/2015)

    (1). A more realistic extension of Amdahl’s Law for general cloud applications.

    (2). A new factor is included in the equation:-

    coherence $(\kappa)$: the cost of consistency, e.g. the time it takes to synchronise data across different nodes in a system, or the cost of introducing locking, i.e. the penalty incurred to make resources consistent

    In practice, coefficient values for contention and coherence are determined from controlled experiments and then used to predict how well the system will scale as more hardware is added.

    allows determination of the optimal level of scaling (the peak of the curve) and identification of system bottlenecks

    $$ S = \frac{N}{1 + \sigma(N - 1) + \kappa N(N - 1)}$$

    The USL better describes real systems, where performance degrades beyond an optimal point

    If $\kappa = 0$ then the USL reduces to Amdahl’s Law

    If $\sigma = 0$ and $\kappa = 0$ then the USL and Amdahl’s Law both reduce to modelling linear scalability ($S = N$)

    Universal Scalability Law Diagram
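
    A minimal sketch of the USL and its peak (the $\sigma$ and $\kappa$ values are assumed, chosen only to show the characteristic rise and fall):

    ```python
    # Universal Scalability Law: S(N) = N / (1 + sigma*(N - 1) + kappa*N*(N - 1))
    # sigma = 0.05 and kappa = 0.001 are assumed, illustrative coefficients.

    def usl_speedup(n: int, sigma: float, kappa: float) -> float:
        """Relative capacity of N nodes under contention sigma and coherence kappa."""
        return n / (1 + sigma * (n - 1) + kappa * n * (n - 1))

    if __name__ == "__main__":
        sigma, kappa = 0.05, 0.001
        for n in (1, 8, 16, 32, 64, 128, 256):
            print(f"N = {n:3d}   S = {usl_speedup(n, sigma, kappa):6.2f}")
        # Unlike Amdahl's law, this curve peaks and then declines: adding nodes
        # beyond the peak actually reduces capacity.
        best_n = max(range(1, 257), key=lambda n: usl_speedup(n, sigma, kappa))
        print(f"peak S = {usl_speedup(best_n, sigma, kappa):.2f} at N = {best_n}")
    ```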