++++++++++++++++++++
Commands in VM-MAD
++++++++++++++++++++

:Author: Tyanko Aleksiev
:Date: 2012-04-29
:Revision: $Revision$

.. This file follows reStructuredText markup syntax; see
   http://docutils.sf.net/rst.html for more information

Commands
========

This article explains the commands available in VM-MAD. For each
command it gives an overview, followed by a description of how the
command interacts with the others. The formats of the input and output
files are also described.

Simulation
==========

VM-MAD includes a simulation suite for processing SGE accounting data.
The idea behind it is to answer the question: "How would my cluster's
queue evolve over time if I had X always-running servers at my disposal
and could spawn up to Y virtual machines on demand?" Both X and Y are
chosen by the user.

The simulation process involves three steps:

* the accounting data must first be processed by the ``distil.py``
  tool (see the :ref:`distill` section for more information),
* once the processed accounting data is available, a simulation can be
  started with the ``simul.py`` tool,
* finally, the ``plot_workload.R`` R script is used to plot the
  results.

The output produced by ``distil.py`` is required before a new
simulation can be started. The :ref:`distoutput` section describes in
more detail the information that the distill tool provides to the
simulator.

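
The question posed above can be illustrated with a toy model. The
sketch below is not the algorithm implemented by ``simul.py``: it
assumes a plain first-come-first-served queue, treats the X fixed
servers and the Y on-demand VMs as one pool of identical slots, and
ignores VM startup delay and idle shutdown. The function name
``simulate_queue`` and its parameters are hypothetical.

```python
import heapq


def simulate_queue(jobs, fixed_slots, max_vms, sample_times):
    """Toy FIFO model, NOT the simul.py algorithm.

    jobs         -- list of (submitted_at, run_duration) pairs, times >= 0
    fixed_slots  -- X, the number of always-running servers
    max_vms      -- Y, the number of VMs that may be spawned on demand
    sample_times -- instants at which the queue length is reported

    Returns the number of waiting jobs at each sample time.
    Assumes fixed_slots + max_vms >= 1.
    """
    capacity = fixed_slots + max_vms
    # Min-heap holding the time at which each slot becomes free.
    free_at = [0] * capacity
    heapq.heapify(free_at)

    waiting_windows = []  # (submitted_at, started_at) for every job
    for submitted_at, run_duration in sorted(jobs):
        slot_free = heapq.heappop(free_at)
        started_at = max(submitted_at, slot_free)
        waiting_windows.append((submitted_at, started_at))
        heapq.heappush(free_at, started_at + run_duration)

    # A job is queued at time t if it was submitted but has not started.
    return [
        sum(1 for submitted, started in waiting_windows
            if submitted <= t < started)
        for t in sample_times
    ]


if __name__ == "__main__":
    burst = [(0, 10), (0, 10), (0, 10)]
    print(simulate_queue(burst, fixed_slots=1, max_vms=0, sample_times=[0, 10, 20]))
    print(simulate_queue(burst, fixed_slots=1, max_vms=2, sample_times=[0, 10, 20]))
```

With one fixed server and no VMs, the three jobs in the burst run one
after another; allowing two extra VMs lets all of them start
immediately, so nothing waits in the queue.
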
A new simulation is configured through command-line options. To list
all of them::

  (vm-mad)vm-user@test:~ ./simul.py --help
  usage: simul.py [-h] [--max-vms N] [--max-delta N] [--max-idle NUM_SECS]
                  [--startup-delay NUM_SECS] [--csv-file String]
                  [--output-file String] [--cluster-size NUM_CPUS]
                  [--start-time String] [--time-interval NUM_SECS]
                  [--version]

  Simulates a cloud orchestrator

  optional arguments:
    -h, --help            show this help message and exit
    --max-vms N, -mv N    Maximum number of VMs to be started, default is 10
    --max-delta N, -md N  Cap the number of VMs that can be started or stopped
                          in a single orchestration cycle. Default is 1.
    --max-idle NUM_SECS, -mi NUM_SECS
                          Maximum idle time (in seconds) before swithing off a
                          VM, default is 7200
    --startup-delay NUM_SECS, -s NUM_SECS
                          Time (in seconds) delay before a started VM is READY.
                          Default is 60
    --csv-file String, -csvf String
                          File containing the CSV information, accounting.csv
    --output-file String, -o String
                          File name where the output of the simulation will be
                          stored, main_sim.txt
    --cluster-size NUM_CPUS, -cs NUM_CPUS
                          Number of VMs, used for the simulation of real
                          available cluster: 20
    --start-time String, -stime String
                          Start time for the simulation, default: -1
    --time-interval NUM_SECS, -timei NUM_SECS
                          UNIX interval in seconds used as parsing interval for
                          the jobs in the CSV file, default: 3600
    --version, -V         show program's version number and exit

The ``--max-vms`` and ``--cluster-size`` options are probably the most
important, as they let you simulate different configuration scenarios.
The ``--max-vms`` option sets how far, in terms of VMs, your cluster
can expand. The ``--cluster-size`` option fixes the simulated size of
your locally available cluster.

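
To compare scenarios, the ``--max-vms`` and ``--cluster-size`` options
can be varied systematically. The following is a minimal sketch that
only builds the command lines, using the option names from the help
text above. The helper names and the output file naming scheme
(``sim_cs<N>_mv<N>.txt``) are assumptions made for this illustration,
not part of VM-MAD.

```python
def build_simul_command(csv_file, output_file, max_vms, cluster_size):
    """Return the argument list for one simul.py run.

    Only options listed in the simul.py help text are used.
    """
    return [
        "./simul.py",
        "--csv-file", csv_file,
        "--output-file", output_file,
        "--max-vms", str(max_vms),
        "--cluster-size", str(cluster_size),
    ]


def scenario_commands(csv_file, max_vms_values, cluster_sizes):
    """Build one command per (cluster size, max VMs) combination."""
    commands = []
    for cluster_size in cluster_sizes:
        for max_vms in max_vms_values:
            # Hypothetical naming scheme so that runs do not overwrite
            # each other's output.
            output_file = "sim_cs{}_mv{}.txt".format(cluster_size, max_vms)
            commands.append(
                build_simul_command(csv_file, output_file, max_vms, cluster_size)
            )
    return commands


if __name__ == "__main__":
    for command in scenario_commands("accounting.csv", [0, 10], [20]):
        # Each list could be passed to subprocess.run(command, check=True).
        print(" ".join(command))
```

Keeping the command construction separate from its execution makes it
easy to inspect the planned runs before starting them.
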
Once the simulation has completed, you can plot the results using the
``plot_workload.R`` script::

  (vm-mad)vm-user@test:~ ./plot_workload.R simulation_output_file output_file

Two files are produced: ``output_file.pdf`` and ``output_file.eps``.
They show graphically how your queue would evolve with the specified
options.

.. _distill:

Distill
-------

The purpose of the ``distil.py`` tool is to process different kinds of
scheduling information and produce CSV output that the simulator can
read. The tool currently recognizes the following input formats:

* accounting data provided by SGE,
* the output obtained by querying the SGE scheduler with the
  ``qstat -xml`` command (work in progress).

You can see all the available options by running ``./distil.py -h``.

.. _distoutput:

^^^^^^^^^^^^^^
Distill Output
^^^^^^^^^^^^^^

The output produced by ``distil.py`` is in CSV format and looks as
follows::

  JOBID, SUBMITTED_AT, RUNNING_AT, FINISHED_AT, WAIT_DURATION, RUN_DURATION
  1, 1282733694, 1282733707, 1282733785, 13, 78
  4, 1282736899, 1282736911, 1282737239, 12, 328
  6, 1282738136, 1282738141, 1282738141, 5, 0
  7, 1282738434, 1282738441, 1282738568, 7, 127
  8, 1282739338, 1282739342, 1282740438, 4, 1096

The first row of the file is self-explanatory: it names the information
contained in each column.

.. (for Emacs only)
..
  Local variables:
  mode: rst

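
The CSV layout of the distill output shown above can be read with a few
lines of Python. The following is a minimal sketch, not code taken from
VM-MAD: the ``Job`` record and the ``parse_distil_csv`` function are
names invented for this example, and all fields are assumed to be
integers, as they are in the sample rows.

```python
import csv
import io
from collections import namedtuple

# Hypothetical record type mirroring the six columns of the sample output.
Job = namedtuple(
    "Job",
    "jobid submitted_at running_at finished_at wait_duration run_duration",
)


def parse_distil_csv(stream):
    """Parse distil.py-style CSV text from a file-like object."""
    # skipinitialspace drops the blank that follows each comma.
    reader = csv.reader(stream, skipinitialspace=True)
    header = next(reader)
    if len(header) != len(Job._fields):
        raise ValueError("unexpected header: {!r}".format(header))
    jobs = []
    for row in reader:
        if not row:  # tolerate blank lines
            continue
        jobs.append(Job(*(int(field) for field in row)))
    return jobs


SAMPLE = """\
JOBID, SUBMITTED_AT, RUNNING_AT, FINISHED_AT, WAIT_DURATION, RUN_DURATION
1, 1282733694, 1282733707, 1282733785, 13, 78
4, 1282736899, 1282736911, 1282737239, 12, 328
6, 1282738136, 1282738141, 1282738141, 5, 0
7, 1282738434, 1282738441, 1282738568, 7, 127
8, 1282739338, 1282739342, 1282740438, 4, 1096
"""

jobs = parse_distil_csv(io.StringIO(SAMPLE))

# In the sample rows the two duration columns are the differences
# between consecutive timestamps.
for job in jobs:
    assert job.wait_duration == job.running_at - job.submitted_at
    assert job.run_duration == job.finished_at - job.running_at
```

The checks at the end hold for the five sample rows; whether they hold
for every file produced by ``distil.py`` is not stated in this
document.
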