++++++++++++++++++++
 Commands in VM-MAD
++++++++++++++++++++

:Author:   Tyanko Aleksiev <tyanko.alexiev@gmail.com>
:Date:     2012-04-29
:Revision: $Revision$


.. This file follows reStructuredText markup syntax; see
   http://docutils.sf.net/rst.html for more information


Commands
========

This article explains the commands available inside VM-MAD. More precisely, it gives an
initial overview of the command, followed by a description of the possible interactions with other commands.
References to input/output files' format is also being provided.

Simulation
==========

VM-MAD has an integrated simulation suite which enables processing SGE accounting data. The main idea of this implemetation can be associated with the answer of the question: 
"What would be the evolution of my cluster's queue during the time if I had on my disposal X always running servers and the possibility to
spawn Y Virtual Machines on demand?". Where X and Y are variables that can be chosen by the final user. The simulation process involves three
different parts:

    * provided accounting data has to be first elaborated from the distil.py tool. For more information see the :ref:`distill` section,
 

    * once the accounting data is available a simulation can be started using the simul.py tool,


    * finally the plot_workload.R R script is used for graphically represent the results. 

The output produced by the distil.py tool is needed before starting a new simulation. The :ref:`distoutput` section
describes in more detail what kind of information the distill tool is providing to the simulator suite. 

A new simulation can be set-up by using the provided options, to see all of them::

    (vm-mad)vm-user@test:~ ./simul.py --help
    usage: simul.py [-h] [--max-vms N] [--max-delta N] [--max-idle NUM_SECS]
                [--startup-delay NUM_SECS] [--csv-file String]
                [--output-file String] [--cluster-size NUM_CPUS]
                [--start-time String] [--time-interval NUM_SECS] [--version]

    Simulates a cloud orchestrator

    optional arguments:
    -h, --help            show this help message and exit
    --max-vms N, -mv N    Maximum number of VMs to be started, default is 10
    --max-delta N, -md N  Cap the number of VMs that can be started or stopped
                        in a single orchestration cycle. Default is 1.
    --max-idle NUM_SECS, -mi NUM_SECS
                        Maximum idle time (in seconds) before swithing off a
                        VM, default is 7200
    --startup-delay NUM_SECS, -s NUM_SECS
                        Time (in seconds) delay before a started VM is READY.
                        Default is 60
    --csv-file String, -csvf String
                        File containing the CSV information, accounting.csv
    --output-file String, -o String
                        File name where the output of the simulation will be
                        stored, main_sim.txt
    --cluster-size NUM_CPUS, -cs NUM_CPUS
                        Number of VMs, used for the simulation of real
                        available cluster: 20
    --start-time String, -stime String
                        Start time for the simulation, default: -1
    --time-interval NUM_SECS, -timei NUM_SECS
                        UNIX interval in seconds used as parsing interval for
                        the jobs in the CSV file, default: 3600
    --version, -V         show program's version number and exit

The ``--max-vms`` and ``--cluster-size`` options are probably the most important as they permit you 
to simulate different configuration scenarios. The ``--max-vms`` allows you to set how expandable,
in terms of VMs, your cluster could be. The ``--cluster-size`` options permits you to fix the simulated 
dimension of your locally availbale cluster.

Once the simulation is completed you can compute the results using the plot_workload.R script::

    (vm-mad)vm-user@test:~ ./plot_workload.R simulation_output_file output_file 

Two files are produced at the end: output_file.pdf and output_file.eps. They represent what would be the graphical
evolution of your queue with the specified options. 


.. _distill:

Distill
-------
The purpose of the ``distil.py`` tool is to elaborate different kind of scheduling information and produce
an output in CSV format legible from the simulator suite. The following data input formats are currently
recognized by the tool:

    *  accounting data provided by SGE,
    *  the output given by querying the SGE scheduler with the ``qstat -xml`` command. (working in progress)

You can see all the provided options by simply doing ``./distil.py -h`` 

.. _distoutput:

^^^^^^^^^^^^^^
Distill Output
^^^^^^^^^^^^^^

The output produced by the distil.py is in the CSV format tool has the following aspect::

        JOBID, SUBMITTED_AT, RUNNING_AT, FINISHED_AT, WAIT_DURATION, RUN_DURATION
        1,     1282733694,   1282733707, 1282733785,  13,            78
        4,     1282736899,   1282736911, 1282737239,  12,            328
        6,     1282738136,   1282738141, 1282738141,  5,             0
        7,     1282738434,   1282738441, 1282738568,  7,             127
        8,     1282739338,   1282739342, 1282740438,  4,             1096

The first row of the file is quite self-explaining about what kind of information, each of the columns, is containing.
 
.. References

.. _subversion: http://subversion.tigris.org/
.. _sphinx: http://sphinx.pocoo.org/
.. _virtualenv: http://pypi.python.org/pypi/virtualenv/1.5.1


.. (for Emacs only)
..
  Local variables:
  mode: rst