Commands in VM-MAD

Author: Tyanko Aleksiev <tyanko.alexiev@gmail.com>
Date: 2012-04-29
Revision: $Revision$

Commands

This article explains the commands available in VM-MAD. For each command, it gives an initial overview, followed by a description of its possible interactions with other commands. References to the formats of the input and output files are also provided.

Simulation

VM-MAD has an integrated simulation suite for processing SGE accounting data. The main idea of this implementation is to answer the question: “How would my cluster’s queue evolve over time if I had X always-running servers at my disposal and the possibility to spawn Y virtual machines on demand?”, where X and Y are variables chosen by the user. The simulation process involves three steps:

  • the provided accounting data first has to be processed by the distil.py tool (for more information, see the Distill section),
  • once the accounting data is available, a simulation can be started using the simul.py tool,
  • finally, the plot_workload.R R script is used to represent the results graphically.

The output produced by the distil.py tool is needed before starting a new simulation. The Distill Output section describes in more detail what information the distil.py tool provides to the simulation suite.
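To make the X/Y question above concrete, here is a toy first-come-first-served model in Python. It is only an illustrative sketch, not the algorithm used by simul.py: the function name `simulate_fifo` and its parameters are hypothetical, each job needs exactly one slot, and on-demand VMs are treated as instantly available (no startup delay, no idle shutdown).

```python
import heapq


def simulate_fifo(jobs, always_on, max_vms):
    """Return the wait time of each job under a toy FIFO model.

    jobs      -- list of (submitted_at, run_duration), sorted by submission time
    always_on -- X, the number of always-running servers (assumed one job each)
    max_vms   -- Y, the number of on-demand VMs (assumed instantly available)
    """
    capacity = always_on + max_vms
    if capacity <= 0:
        raise ValueError("at least one server or VM is required")

    # Each heap entry is the time at which one slot becomes free.
    slots = [0] * capacity
    heapq.heapify(slots)

    waits = []
    for submitted_at, run_duration in jobs:
        free_at = heapq.heappop(slots)       # earliest available slot
        start = max(submitted_at, free_at)   # a job cannot start before submission
        waits.append(start - submitted_at)
        heapq.heappush(slots, start + run_duration)
    return waits


jobs = [(0, 10), (1, 10), (2, 10)]
print(simulate_fifo(jobs, always_on=1, max_vms=0))  # [0, 9, 18]
print(simulate_fifo(jobs, always_on=1, max_vms=1))  # [0, 0, 8]
```

Adding one VM removes the wait of the second job and shortens the wait of the third, which is the kind of effect the simulation suite is meant to quantify on real accounting data.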

A new simulation can be set up using the provided options. To see all of them, run:

(vm-mad)vm-user@test:~ ./simul.py --help
usage: simul.py [-h] [--max-vms N] [--max-delta N] [--max-idle NUM_SECS]
            [--startup-delay NUM_SECS] [--csv-file String]
            [--output-file String] [--cluster-size NUM_CPUS]
            [--start-time String] [--time-interval NUM_SECS] [--version]

Simulates a cloud orchestrator

optional arguments:
-h, --help            show this help message and exit
--max-vms N, -mv N    Maximum number of VMs to be started, default is 10
--max-delta N, -md N  Cap the number of VMs that can be started or stopped
                    in a single orchestration cycle. Default is 1.
--max-idle NUM_SECS, -mi NUM_SECS
                    Maximum idle time (in seconds) before swithing off a
                    VM, default is 7200
--startup-delay NUM_SECS, -s NUM_SECS
                    Time (in seconds) delay before a started VM is READY.
                    Default is 60
--csv-file String, -csvf String
                    File containing the CSV information, accounting.csv
--output-file String, -o String
                    File name where the output of the simulation will be
                    stored, main_sim.txt
--cluster-size NUM_CPUS, -cs NUM_CPUS
                    Number of VMs, used for the simulation of real
                    available cluster: 20
--start-time String, -stime String
                    Start time for the simulation, default: -1
--time-interval NUM_SECS, -timei NUM_SECS
                    UNIX interval in seconds used as parsing interval for
                    the jobs in the CSV file, default: 3600
--version, -V         show program's version number and exit

The --max-vms and --cluster-size options are probably the most important, as they let you simulate different configuration scenarios. The --max-vms option sets how far, in terms of VMs, your cluster can expand. The --cluster-size option fixes the simulated size of your locally available cluster.
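As an illustration, the following Python sketch builds the command lines for a small sweep over these two options. It uses only flags listed in the help output above; the helper name `simul_command`, the scenario values and the output file names are hypothetical, and the commands are printed rather than executed.

```python
def simul_command(max_vms, cluster_size,
                  csv_file="accounting.csv", output_file="main_sim.txt"):
    """Build the argument list for one simul.py run (nothing is executed)."""
    return [
        "./simul.py",
        "--max-vms", str(max_vms),
        "--cluster-size", str(cluster_size),
        "--csv-file", csv_file,
        "--output-file", output_file,
    ]


# Hypothetical scenarios: a fixed local cluster with a growing VM budget.
scenarios = [(0, 20), (10, 20), (50, 20)]

for max_vms, cluster_size in scenarios:
    # One output file per scenario, so that the results can be plotted separately.
    out = f"sim_vms{max_vms}_cs{cluster_size}.txt"
    print(" ".join(simul_command(max_vms, cluster_size, output_file=out)))
```

Each printed line can be run from the shell; if you prefer to launch the runs from Python, the argument list can be passed to `subprocess.run`.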

Once the simulation is completed, you can plot the results using the plot_workload.R script:

(vm-mad)vm-user@test:~ ./plot_workload.R simulation_output_file output_file

Two files are produced at the end: output_file.pdf and output_file.eps. They show graphically how your queue would evolve with the specified options.

Distill

The purpose of the distil.py tool is to process different kinds of scheduling information and produce output in CSV format that the simulation suite can read. The following input data formats are currently recognized by the tool:

  • accounting data provided by SGE,
  • the output given by querying the SGE scheduler with the qstat -xml command (work in progress).

You can see all the available options by running ./distil.py -h

Distill Output

The output produced by the distil.py tool is in CSV format and looks as follows:

JOBID, SUBMITTED_AT, RUNNING_AT, FINISHED_AT, WAIT_DURATION, RUN_DURATION
1,     1282733694,   1282733707, 1282733785,  13,            78
4,     1282736899,   1282736911, 1282737239,  12,            328
6,     1282738136,   1282738141, 1282738141,  5,             0
7,     1282738434,   1282738441, 1282738568,  7,             127
8,     1282739338,   1282739342, 1282740438,  4,             1096

The first row of the file is fairly self-explanatory: it names the information that each column contains.
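The sample above can be read with Python's standard csv module, as in the minimal sketch below. The helper names `read_jobs` and `durations_consistent` are hypothetical. The relations it checks (WAIT_DURATION = RUNNING_AT - SUBMITTED_AT and RUN_DURATION = FINISHED_AT - RUNNING_AT) are inferred from the sample rows rather than taken from the distil.py documentation.

```python
import csv
import io

# The sample rows shown above, embedded so that the example is self-contained.
SAMPLE = """\
JOBID, SUBMITTED_AT, RUNNING_AT, FINISHED_AT, WAIT_DURATION, RUN_DURATION
1,     1282733694,   1282733707, 1282733785,  13,            78
4,     1282736899,   1282736911, 1282737239,  12,            328
6,     1282738136,   1282738141, 1282738141,  5,             0
7,     1282738434,   1282738441, 1282738568,  7,             127
8,     1282739338,   1282739342, 1282740438,  4,             1096
"""


def read_jobs(stream):
    """Parse distil-style CSV into a list of dicts with integer values."""
    # skipinitialspace drops the padding that follows each comma.
    reader = csv.DictReader(stream, skipinitialspace=True)
    return [{key.strip(): int(value) for key, value in row.items()}
            for row in reader]


def durations_consistent(job):
    """Check the duration columns against the timestamps (inferred relation)."""
    return (job["WAIT_DURATION"] == job["RUNNING_AT"] - job["SUBMITTED_AT"]
            and job["RUN_DURATION"] == job["FINISHED_AT"] - job["RUNNING_AT"])


jobs = read_jobs(io.StringIO(SAMPLE))
for job in jobs:
    print(job["JOBID"], job["WAIT_DURATION"], job["RUN_DURATION"],
          durations_consistent(job))
```

To read a real file instead of the embedded sample, open it and pass the file object to `read_jobs` in place of the `io.StringIO` wrapper.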
