Welcome to pyfarm.agent’s documentation!
This package contains PyFarm’s agent and job types which are responsible for the execution of tasks allocated to a host by the master.
Commands
Note
The default values provided are based on the configuration at the time this page was generated. They may not be the same defaults you will see.
Standard Commands
pyfarm-agent
usage: pyfarm-agent [status|start|stop]
positional arguments:
{start,stop,status} individual operations pyfarm-agent can run
start starts the agent
stop stops the agent
status query the 'running' state of the agent
optional arguments:
-h, --help show this help message and exit
Agent Network Service:
Main flags which control the network services running on the agent.
--port PORT The port number which the agent is either running on
or will run on when started. This port is also
reported to the master when an agent starts. [default:
None]
--host HOST The host to communicate with or hostname to present to
the master when starting. Defaults to the fully
qualified hostname.
--agent-api-username AGENT_API_USERNAME
The username required to access or manipulate the
agent using REST. [default: agent]
--agent-api-password AGENT_API_PASSWORD
The password required to access or manipulate the
agent using REST. [default: agent]
--systemid SYSTEMID The system identification value. This is used to help
identify the system itself to the master when the
agent connects. [default: auto]
--systemid-cache SYSTEMID_CACHE
The location to cache the value for --systemid.
[default: None]
Network Resources:
Resources which the agent will be communicating with.
--master MASTER This is a convenience flag which will allow you to set
the hostname for the master. By default this value
will be substituted in --master-api
--master-api MASTER_API
The location where the master's REST api is located.
[default: None]
--master-api-version MASTER_API_VERSION
Sets the version of the master's REST api the agent
should use. [default: None]
Process Control:
These settings apply to the parent process of the agent and contribute to
allowing the process to run as other users or remain isolated in an
environment. They also assist in maintaining the 'running state' via a
process id file.
--pidfile PIDFILE The file to store the process id in. [default: None]
-n, --no-daemon If provided then do not run the process in the
background.
--chdir CHDIR The working directory to change the agent into upon
launch
--uid UID The user id to run the agent as. *This setting is
ignored on Windows.*
--gid GID The group id to run the agent as. *This setting is
ignored on Windows.*
--pdb-on-unhandled When set pdb.set_trace() will be called if an
unhandled error is caught in the logger
pyfarm-agent is a command line client for working with a local agent. You can
use it to stop, start, and report the general status of a running agent
process.
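As a rough illustration of driving the client from a script, the sketch below builds the argument list for one of the three operations. The helper names `agent_command` and `run_agent` are hypothetical, and placing the top-level options (`--master`, `--port`) before the operation is an assumption based on how argparse usually handles sub-commands, not something this page states.

```python
import subprocess


def agent_command(operation, master=None, port=None, extra=()):
    """Build an argv list for the pyfarm-agent command line client."""
    if operation not in ("start", "stop", "status"):
        raise ValueError("unknown operation: %r" % (operation,))
    argv = ["pyfarm-agent"]
    # Assumption: options from the top-level usage go before the operation.
    if master is not None:
        argv += ["--master", master]
    if port is not None:
        argv += ["--port", str(port)]
    argv.append(operation)
    # Operation-specific flags (for example --state for "start") go last.
    argv.extend(extra)
    return argv


def run_agent(operation, **kwargs):
    """Run pyfarm-agent and return its exit code (needs it on PATH)."""
    return subprocess.call(agent_command(operation, **kwargs))


print(agent_command("start", master="farm-master.example.com", port=50000))
```

Only `agent_command` is exercised here; `run_agent` needs an installed agent to do anything useful.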
usage: pyfarm-agent [status|start|stop] status [-h]
optional arguments:
-h, --help show this help message and exit
usage: pyfarm-agent [status|start|stop] start [-h]
[--projects PROJECTS [PROJECTS ...]]
[--state STATE]
[--time-offset TIME_OFFSET]
[--ntp-server NTP_SERVER]
[--ntp-server-version NTP_SERVER_VERSION]
[--no-pretty-json]
[--shutdown-timeout SHUTDOWN_TIMEOUT]
[--updates-drop-dir UPDATES_DROP_DIR]
[--cpus CPUS] [--ram RAM]
[--ram-check-interval RAM_CHECK_INTERVAL]
[--ram-max-report-frequency RAM_MAX_REPORT_FREQUENCY]
[--ram-report-delta RAM_REPORT_DELTA]
[--master-reannounce MASTER_REANNOUNCE]
[--log LOG]
[--capture-process-output]
[--task-log-dir TASK_LOG_DIR]
[--ip-remote IP_REMOTE]
[--enable-manhole]
[--manhole-port MANHOLE_PORT]
[--manhole-username MANHOLE_USERNAME]
[--manhole-password MANHOLE_PASSWORD]
[--html-templates-reload]
[--static-files STATIC_FILES]
[--http-retry-delay HTTP_RETRY_DELAY]
[--jobtype-no-cache]
optional arguments:
-h, --help show this help message and exit
General Configuration:
These flags configure parts of the agent related to hardware, state, and
certain timing and scheduling attributes.
--projects PROJECTS [PROJECTS ...]
The project or projects this agent is dedicated to. By
default the agent will service any project however
specific projects may be specified. For example if you
wish this agent to service 'Foo Part I' and 'Foo Part
II' only just specify it as `--projects "Foo Part I"
"Foo Part II"`
--state STATE The current agent state, valid values are ['disabled',
'offline', 'running', 'online']. [default: online]
--time-offset TIME_OFFSET
If provided then don't talk to the NTP server at all
to calculate the time offset. If you know for a fact
that this host's time is always up to date then
setting this to 0 is probably a safe bet.
--ntp-server NTP_SERVER
The default network time server this agent should
query to retrieve the real time. This will be used to
help determine the agent's clock skew if any. Setting
this value to '' will effectively disable this query.
[default: None]
--ntp-server-version NTP_SERVER_VERSION
The version of the NTP server in case it's running an
older or newer version. [default: None]
--no-pretty-json If provided do not dump human readable json via the
agent's REST api
--shutdown-timeout SHUTDOWN_TIMEOUT
How many seconds the agent should spend attempting to
inform the master that it's shutting down.
--updates-drop-dir UPDATES_DROP_DIR
The directory to drop downloaded updates in. This
should be the same directory pyfarm-supervisor will
look for updates in. [default: None]
Physical Hardware:
Command line flags which describe the hardware of the agent.
--cpus CPUS The total amount of cpus installed on the system.
Defaults to the number of cpus installed on the
system.
--ram RAM The total amount of ram installed on the system in
megabytes. Defaults to the amount of ram the system
has installed.
Interval Controls:
Controls which dictate when certain internal intervals should occur.
--ram-check-interval RAM_CHECK_INTERVAL
How often ram resources should be checked for changes.
The amount of memory currently being consumed on the
system is checked after certain events occur such as a
process but this flag specifically controls how often
we should check when no such events are occurring.
[default: None]
--ram-max-report-frequency RAM_MAX_REPORT_FREQUENCY
This is a limiter that prevents the agent from
reporting memory changes to the master more often than
a specific time interval. This ensures that when
hundreds of events fire in a short period of time and
cause changes in ram usage, only one or two will be
reported to the master. [default: None]
--ram-report-delta RAM_REPORT_DELTA
Only report a change in ram if the value has changed
at least this many megabytes. [default: None]
--master-reannounce MASTER_REANNOUNCE
Controls how often the agent should reannounce itself
to the master. The agent may be in contact with the
master more often than this however during long periods
of inactivity this is how often the agent will
'inform' the master the agent is still online.
Logging Options:
Settings which control logging of the agent's parent process and/or any
subprocess it runs.
--log LOG If provided log all output from the agent to this
path. This will append to any existing log data.
[default: None]
--capture-process-output
If provided then all log output from each process
launched by the agent will be sent through agent's
loggers.
--task-log-dir TASK_LOG_DIR
The directory tasks should log to.
Network Service:
Controls how the agent is seen or interacted with by external services
such as the master.
--ip-remote IP_REMOTE
The remote IPv4 address to report. In situations where
the agent is behind a firewall this value will
typically be different.
Manhole Service:
Controls the manhole service which allows a telnet connection to be made
directly into the agent as it's running.
--enable-manhole When provided the manhole service will be started once
the reactor is running.
--manhole-port MANHOLE_PORT
The port the manhole service should run on if enabled.
--manhole-username MANHOLE_USERNAME
The telnet username that's allowed to connect to the
manhole service running on the agent.
--manhole-password MANHOLE_PASSWORD
The telnet password to use when connecting to the
manhole service running on the agent.
HTTP Configuration:
Options for how the agent will interact with the master's REST api and how
it should run its own REST api.
--html-templates-reload
If provided then force Jinja2, the html template
system, to check the file system for changes with
every request. This flag should not be used in
production but is useful for development and debugging
purposes.
--static-files STATIC_FILES
The default location where the agent's http server
should find static files to serve.
--http-retry-delay HTTP_RETRY_DELAY
If a http request to the master has failed, wait this
amount of time before trying again
Job Types:
--jobtype-no-cache If provided then do not cache job types, always
directly retrieve them. This is beneficial if you're
testing the agent or a new job type class.
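The interplay between the --ram-report-delta and --ram-max-report-frequency flags above can be sketched as a small gate object. This is only an illustration of the described behavior, not the agent's code: the class name `RamReportGate` is hypothetical, the defaults of 100 megabytes and 10 seconds are taken from the shipped agent configuration file, and always sending the first reading is an assumption.

```python
class RamReportGate(object):
    """Decides whether a ram reading should be reported to the master."""

    def __init__(self, report_delta=100, max_report_frequency=10):
        self.report_delta = report_delta                   # megabytes
        self.max_report_frequency = max_report_frequency   # seconds
        self.last_value = None
        self.last_time = None

    def should_report(self, ram_mb, now):
        if self.last_value is None:
            # Assumption: the very first reading is always reported.
            decision = True
        else:
            changed_enough = abs(ram_mb - self.last_value) >= self.report_delta
            waited_enough = now - self.last_time >= self.max_report_frequency
            decision = changed_enough and waited_enough
        if decision:
            # Remember what was reported and when.
            self.last_value = ram_mb
            self.last_time = now
        return decision


gate = RamReportGate()
print(gate.should_report(4000, now=0))    # first reading
print(gate.should_report(4050, now=20))   # change below the delta
print(gate.should_report(4200, now=5))    # inside the frequency window
print(gate.should_report(4200, now=12))   # large change, enough time passed
```

Passing `now` explicitly keeps the sketch deterministic; a real caller would likely use `time.time()`.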
usage: pyfarm-agent [status|start|stop] stop [-h] [--no-wait]
optional arguments:
-h, --help show this help message and exit
optional flags:
Flags that control how the agent is stopped
--no-wait If provided then don't wait on the agent to shut itself down. By
default we would want to wait on each task to stop so we can
catch any errors and then finally wait on the agent to shutdown
too. If you're in a hurry or stopping a bunch of agents at once
then setting this flag will let the agent continue to stop
itself without waiting for each agent
usage: pyfarm-supervisor [-h] [--updates-drop-dir UPDATES_DROP_DIR]
[--agent-package-dir AGENT_PACKAGE_DIR]
[--pidfile PIDFILE] [-n] [--chdir CHDIR] [--uid UID]
[--gid GID]
Start and monitor the agent process
optional arguments:
-h, --help show this help message and exit
--updates-drop-dir UPDATES_DROP_DIR
Where to look for agent updates
--agent-package-dir AGENT_PACKAGE_DIR
Path to the actual agent code
--pidfile PIDFILE The file to store the process id in. [default: None]
-n, --no-daemon If provided then do not run the process in the
background.
--chdir CHDIR The directory to chdir to upon launch.
--uid UID The user id to run the supervisor as. *This setting is
ignored on Windows.*
--gid GID The group id to run the supervisor as. *This setting
is ignored on Windows.*
Development Commands
pyfarm-dev-fakerender
usage: pyfarm-dev-fakerender [-h] [--ram RAM] [--duration DURATION]
[--return-code RETURN_CODE]
[--duration-jitter DURATION_JITTER]
[--ram-jitter RAM_JITTER] -s START [-e END]
[-b BY] [--spew] [--segfault]
Very basic command line tool which vaguely simulates a render.
optional arguments:
-h, --help show this help message and exit
--ram RAM How much ram in megabytes the fake command should
consume
--duration DURATION How many seconds it should take to run this command
--return-code RETURN_CODE
The return code to return, declaring this flag
multiple times will result in a random return code.
[default: [0]]
--duration-jitter DURATION_JITTER
Randomly add or subtract this amount to the total
duration
--ram-jitter RAM_JITTER
Randomly add or subtract this amount to the ram
-s START, --start START
The start frame. If no other flags are provided this
will also be the end frame.
-e END, --end END The end frame
-b BY, --by BY The by frame
--spew Spews lots of random output to stdout which is
generally a decent stress test for log processing
issues. Do note however that this will disable the
code which is consuming extra CPU cycles. Also, use
this option with care as it can generate several
gigabytes of data per frame.
--segfault If provided then there's a 25% chance of causing a
segmentation fault.
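The relationship between `--start`, `--end` and `--by` can be sketched as follows. The function name `frame_range` is hypothetical, and treating the end frame as inclusive with a default step of 1 is an assumption; only the rule that the start frame doubles as the end frame when nothing else is given comes from the help text above.

```python
def frame_range(start, end=None, by=1):
    """Return the list of frames implied by start/end/by."""
    if by <= 0:
        raise ValueError("by must be a positive number")
    if end is None:
        # With no end frame, the start frame is also the end frame.
        end = start
    frames = []
    frame = start
    while frame <= end:  # assumption: the end frame is inclusive
        frames.append(frame)
        frame += by
    return frames


print(frame_range(5))
print(frame_range(1, 10, 3))
```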
pyfarm-dev-fakework
usage: pyfarm-dev-fakework [-h] [--master-api MASTER_API]
[--agent-api AGENT_API] [--jobtype JOBTYPE]
[--job JOB]
Quick and dirty script to create a job type, a job, and some tasks which are
then posted directly to the agent. The primary purpose of this script is to
test the internals of the job types
optional arguments:
-h, --help show this help message and exit
--master-api MASTER_API
The url to the master's api [default:
http://127.0.0.1/api/v1]
--agent-api AGENT_API
The url to the agent's api [default:
http://127.0.0.1:50000/api/v1]
--jobtype JOBTYPE The job type to use [default: FakeRender]
--job JOB If provided then this will be the job we pull tasks
from and assign to the agent. Please note we'll only
be pulling tasks that aren't running or assigned.
Environment Variables
PyFarm’s agent has several environment variables which can be used to change the operation at runtime. For more information see the individual sections below.
- PYFARM_JOBTYPE_ALLOW_CODE_EXECUTION_IN_MODULE_ROOT
If True, then function calls in the root of a job type’s source code will result in an error when the work is assigned. By default, this value is set to True.
- PYFARM_JOBTYPE_SUBCLASSES_BASE_CLASS
If True then job types which do not subclass from pyfarm.jobtypes.core.jobtype.JobType will raise an exception when work is assigned. By default, this value is set to True.
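For illustration, a boolean environment variable like the two above could be read as sketched below. How the agent itself parses these values is not documented on this page, so the helper name `env_bool` and the set of strings treated as true are assumptions.

```python
import os

# Assumption: these spellings count as "true"; anything else is false.
TRUE_VALUES = {"1", "true", "yes", "on"}


def env_bool(name, default=True, environ=None):
    """Read a boolean flag from the environment, falling back to default."""
    environ = os.environ if environ is None else environ
    raw = environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in TRUE_VALUES


print(env_bool("PYFARM_JOBTYPE_SUBCLASSES_BASE_CLASS"))
```

The optional `environ` argument exists only so the helper can be checked against a plain dictionary instead of the real process environment.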
Configuration Files
Below are the configuration files for this subproject. These files are installed alongside the source code when the package is installed. These are only the defaults, however; you can always override these values in your own environment. See the Configuration object documentation for more detailed information.
Agent
The below is the current configuration file for the agent. This file lives at pyfarm/agent/etc/agent.yml in the source tree.
# The platform specific locations where the agent uuid
# file is stored by default. This can be overridden with the --agent-id-file flag.
agent_id_file_platform_defaults:
  linux: /etc/pyfarm/agent/uuid.dat
  mac: /Library/pyfarm/agent/uuid.dat
  bsd: /etc/pyfarm/agent/uuid.dat
  windows: $LOCALAPPDATA/pyfarm/agent/uuid.dat
# The default location to store data. $temp will expand to
# whatever pyfarm's data root is plus the application
# name (agent). For example on Linux this would expand to
# /tmp/pyfarm/agent
agent_data_root: $temp
# Defines the number of seconds between iterations of pyfarm-supervisor's
# agent status check.
supervisor_interval: 5
# The location where the agent should change directories
# into upon starting. If this value is not set then no
# changes will be made.
agent_chdir:
# The location where static web files should be served from. This
# will default to using PyFarm's installation root.
agent_static_root: auto
# The default location where lock files should be stored. By
# default these will be stored alongside other data
# inside the `agent_data_root` value above.
lock_file_root: $agent_data_root/lock
# Locations of specific lock files
agent_lock_file: $lock_file_root/agent.pid
supervisor_lock_file: $lock_file_root/supervisor.pid
# Where user data for the agent is stored. ~ will be expanded
# to the current user's home directory.
agent_user_data: ~/.pyfarm/agent
# The default location where the agent should save logs to. This
# includes both logs from processes and the agent log itself.
agent_logs_root: $agent_data_root/logs
# The location where agent updates should be stored.
agent_updates_dir: $agent_data_root/updates
# The default port which the agent should use to serve the
# REST api.
agent_api_port: 50000
# The location where the agent should save its own
# logging output to.
agent_log: $agent_logs_root/agent.log
# The user agent the master will use when connecting to the agent's
# REST api. This value should only be changed if the master's code
# is updated with a new user agent. Changing this value has no effect
# on the master.
master_user_agent: PyFarm/1.0 (master)
# Configuration values which control how the url
# for the master is constructed. If 'master' is not set
# the --master flag will be required to start the agent.
master:
master_api_version: 1
master_api: http://$master/api/v$master_api_version
# The user agent the master uses to talk to the agent's
# REST api. This value should not be modified unless
# there's a specific reason to do so.
master_user_agent: PyFarm/1.0 (master)
# Controls how often the agent should reannounce itself
# to the master. The agent may be in contact with the master
# more often than this however during long periods of
# inactivity this is how often the agent will 'inform' the
# master the agent is still online.
agent_master_reannounce: 120
# How many seconds the agent should spend attempting to inform
# the master that it's shutting down.
agent_shutdown_timeout: 15
# If an http request fails, use this as the base value
# to help determine how long we should wait before retrying
agent_http_retry_delay: 5
# Controls if the http client connection should be persistent or
# not. Generally this should always be True because the connection
# self-terminates after a short period of time anyway. For higher
# latency situations or with larger deployments this value should
# be False.
agent_http_persistent_connections: True
# If True then html templates will be reloaded with
# every request instead of cached.
agent_html_template_reload: False
# If True then reformat json output to be more human
# readable.
agent_pretty_json: True
# How often the agent should check for changes in ram. This value
# is used to ensure ram usage is checked at least this often though
# it may be checked more often due to other events (such as jobs
# running)
agent_ram_check_interval: 30
# If the ram has changed this many megabytes since the last
# check then report the change to the master.
agent_ram_report_delta: 100
# How much the agent should wait, in seconds, between
# each report about a change in ram.
agent_ram_max_report_frequency: 10
# The default network time server and version the agent
# should use to calculate its clock skew.
agent_ntp_server: pool.ntp.org
agent_ntp_server_version: 2
# The amount of time this agent is offset from what
# would be considered correct based on an atomic
# clock. If this value is set to auto the time will
# be calculated using NTP.
agent_time_offset: auto
# Physical and network information about the host the agent
# is running on. Setting these values to 'auto' will cause
# them to be initialized to the system's current
# configuration values.
agent_ram: auto
agent_cpus: auto
agent_hostname: auto
# When True this will enable a telnet connection
# to the agent which will present a Python interpreter
# upon connection. This is mainly used for debugging
# and direct manipulation of the agent. You can use
# the show() function once connected to see what
# objects are available.
agent_manhole: False
agent_manhole_port: 50001
agent_manhole_username: admin
agent_manhole_password: admin
# NOTE: The following values are used by the unittests and should be
# generally ignored for anything other than development.
agent_unittest:
  dns_test_hostname: example.com
  client_redirect_target: http://example.com
  client_api_test_url_https: https://httpbin.org
  client_api_test_url_http: http://httpbin.org
# A list of paths or names where the `lspci` command can
# be called from on Linux. This is used to retrieve information
# about graphics cards installed on the system in
# `pyfarm.agent.sysinfo.graphics.graphics_cards`.
# If you need to run the command with sudo you may also specify an entry
# like this:
# - sudo lspci
sysinfo_command_lspci:
- lspci
- /bin/lspci
- /sbin/lspci
- /usr/sbin/lspci
- /usr/bin/lspci
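Several values in the file above refer to other values with a `$name` placeholder, for example `master_api` and `lock_file_root`. The sketch below shows how that kind of substitution behaves using Python's `string.Template`. Whether the agent's configuration object uses `string.Template` internally is an assumption, and the `expand` helper and host name are hypothetical.

```python
from string import Template


def expand(value, variables):
    """Replace $name placeholders in value; unknown names are left as-is."""
    return Template(value).safe_substitute(variables)


settings = {
    "master": "farm-master.example.com",   # hypothetical host name
    "master_api_version": 1,
    "agent_data_root": "/tmp/pyfarm/agent",
}

master_api = expand("http://$master/api/v$master_api_version", settings)
lock_root = expand("$agent_data_root/lock", settings)
agent_lock = expand("$lock_file_root/agent.pid", {"lock_file_root": lock_root})

print(master_api)
print(agent_lock)
```

Values are expanded in dependency order here (`lock_file_root` before `agent_lock_file`); a real implementation would need its own strategy for resolving chained references.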
Job Types
The below is the current configuration file for job types. This file lives at pyfarm/jobtypes/etc/jobtypes.yml in the source tree.
# When set to True caching of job types will be enabled. When set to
# False caching is disabled and every job type will be retrieved from
# the master directly.
jobtype_enable_cache: True
# If True then output from all processes will be sent directly to
# the agent's logger(s) instead of to the log file associated
# with each process.
jobtype_capture_process_output: False
# The location where tasks should be logged
jobtype_task_logs: $agent_logs_root/tasks
# The filename to an individual log file. This filename supports several
# internal variables:
#
# $YEAR - The current year
# $MONTH - The current month
# $DAY - The current day
# $HOUR - The current hour
# $MINUTE - The current minute
# $JOB - The id of the job this log is for
# $PROCESS - The uuid of the process object responsible for creating the log
#
# In addition to the above you can, as with any configuration variable,
# also use environment variables in the filename.
# Path separators ("/" and "\") are not allowed.
jobtype_task_log_filename:
  $YEAR-$MONTH-$DAY_$HOUR-$MINUTE-$SECOND_$JOB_$PROCESS.csv
# The directory used to store cached source code from the master. Note
# that $temp will be expanded to the local system's
# temp directory. If this directory does not exist
# it will be created. Leaving this value blank will
# disable job type caching.
jobtype_cache_directory: $temp/jobtype_cache
# The root directory that the default implementation of JobType.tempdir()
# will create a path using tempfile.mkdtemp.
jobtype_tempdir_root: $temp/tempdir/$JOBTYPE_UUID
# If True then expand environment variables in file paths.
jobtype_expandvars: True
# If True, then ignore any errors produced when trying
# to map users and groups to IDs. This will cause the
# underlying methods in the job type to run
# as the job type's owner instead, ignoring what the
# incoming job requests.
# NOTE: This value is not used on Windows.
jobtype_ignore_id_mapping_errors: False
# Any additional key/value pairs to include
# in the environment of a process launched
# by a job type.
jobtype_default_environment: {}
# Configures the thread pool used by job types
# for logging.
jobtype_logging_threadpool:
  # Setting this value to something smaller than `1` will result
  # in an exception being raised. This value also cannot be larger
  # than `max_threads` below.
  min_threads: 3
  # This value must be greater than or equal to `min_threads`
  # above. You may also set this value to 'auto' meaning the
  # number of processors times 1.5 or 20 (whichever is lower).
  max_threads: auto
  # As log messages are sent from processes they are stored
  # in an in memory queue. When the number of messages is higher
  # than this number a thread will be spawned to consume the
  # data and flush it into a file object.
  max_queue_size: 10
  # Most often the operating system will control how often data
  # is written to disk from a file object. This value overrides
  # that behavior and forces the file object to flush to disk
  # after this many messages have been processed.
  flush_lines: 100
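The rules described for `min_threads` and `max_threads` in the logging thread pool settings above can be sketched like this. The function name `resolve_max_threads` is hypothetical, and truncating the "processors times 1.5" value to an integer is an assumption, since the configuration comments do not say how it is rounded.

```python
import multiprocessing


def resolve_max_threads(max_threads, min_threads, cpu_count=None):
    """Resolve max_threads, honoring 'auto' and the min/max constraints."""
    if min_threads < 1:
        raise ValueError("min_threads must be at least 1")
    if max_threads == "auto":
        if cpu_count is None:
            cpu_count = multiprocessing.cpu_count()
        # Number of processors times 1.5, or 20, whichever is lower.
        # Assumption: the product is truncated to an integer.
        max_threads = min(int(cpu_count * 1.5), 20)
    if max_threads < min_threads:
        raise ValueError("max_threads must be >= min_threads")
    return max_threads


print(resolve_max_threads("auto", min_threads=3, cpu_count=8))
print(resolve_max_threads("auto", min_threads=3, cpu_count=32))
```

`cpu_count` is passed explicitly in the usage lines so the result does not depend on the machine running the example.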
pyfarm.agent package
Subpackages
pyfarm.agent.entrypoints package
Submodules
pyfarm.agent.entrypoints.development module
pyfarm.agent.entrypoints.main module
pyfarm.agent.entrypoints.parser module
Module which forms the basis of a custom argparse based command line parser which handles setting configuration values automatically.
- pyfarm.agent.entrypoints.parser.assert_parser(func)[source]¶
ensures that the instance argument passed along to the validation function contains data we expect
- pyfarm.agent.entrypoints.parser.ip(*args, **kwargs)[source]¶
make sure the ip address provided is valid
- pyfarm.agent.entrypoints.parser.port(*args, **kwargs)[source]¶
convert and check to make sure the provided port is valid
- pyfarm.agent.entrypoints.parser.uuid_type(*args, **kwargs)[source]¶
validates that a string is a valid UUID type
- pyfarm.agent.entrypoints.parser.uidgid(*args, **kwargs)[source]¶
Retrieves and validates the user or group id for a command line flag
- pyfarm.agent.entrypoints.parser.direxists(*args, **kwargs)[source]¶
checks to make sure the directory exists
- pyfarm.agent.entrypoints.parser.fileexists(*args, **kwargs)[source]¶
checks to make sure the provided file exists
- pyfarm.agent.entrypoints.parser.number(*args, **kwargs)[source]¶
convert the given value to a number
- pyfarm.agent.entrypoints.parser.enum(*args, **kwargs)[source]¶
ensures that value is a valid entry in enum
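Validators such as `port` above are the kind of callable argparse accepts through its `type` keyword. The sketch below shows the general pattern with a standalone function. It is not the module's implementation: the name `port_type` is hypothetical, and the accepted range of 1 to 65535 is an assumption.

```python
import argparse


def port_type(value):
    """Convert value to an int and make sure it is a usable port number."""
    try:
        port = int(value)
    except (TypeError, ValueError):
        raise argparse.ArgumentTypeError("%r is not an integer" % (value,))
    if not 1 <= port <= 65535:  # assumption: full TCP port range is allowed
        raise argparse.ArgumentTypeError("port %d is out of range" % port)
    return port


parser = argparse.ArgumentParser(prog="example")
parser.add_argument("--port", type=port_type)
print(parser.parse_args(["--port", "50000"]).port)
```

Raising `argparse.ArgumentTypeError` lets argparse print a normal usage error instead of a traceback when the value is rejected.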
- class pyfarm.agent.entrypoints.parser.ActionMixin(*args, **kwargs)[source]¶
Bases: object
A mixin which overrides the __init__ and __call__ methods on an action so we can:
- Setup attributes to manipulate the config object when the arguments are parsed
- Ensure all required arguments are present
- Convert the type keyword into an internal representation so we don’t require as much work when we add arguments to the parser
- TYPE_MAPPING = {<function isdir at 0x7f098048a0c8>: <function direxists at 0x7f097a4006e0>, <function isfile at 0x7f098048a050>: <function fileexists at 0x7f097a4007d0>, <type 'int'>: <functools.partial object at 0x7f097a404260>}¶
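The idea of an action that updates a configuration object while arguments are parsed can be sketched with plain argparse. This is not ActionMixin itself: the class `ConfigStoreAction`, the `config_key` keyword and the module-level `config` dictionary are all hypothetical stand-ins used for illustration.

```python
import argparse

config = {}  # hypothetical stand-in for the agent's configuration object


class ConfigStoreAction(argparse.Action):
    """Store the parsed value on the namespace and in the config dict."""

    def __init__(self, *args, config_key=None, **kwargs):
        super().__init__(*args, **kwargs)
        self.config_key = config_key

    def __call__(self, parser, namespace, values, option_string=None):
        setattr(namespace, self.dest, values)
        if self.config_key is not None:
            # Mirror the value into the configuration as a side effect.
            config[self.config_key] = values


parser = argparse.ArgumentParser(prog="example")
parser.add_argument(
    "--port", type=int, action=ConfigStoreAction, config_key="agent_api_port")
args = parser.parse_args(["--port", "50000"])
print(args.port, config)
```

argparse forwards unknown keyword arguments of `add_argument` to the action's constructor, which is what makes the extra `config_key` keyword possible.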
- pyfarm.agent.entrypoints.parser.mix_action(class_)¶
- pyfarm.agent.entrypoints.parser.StoreAction¶
alias of _StoreAction
- pyfarm.agent.entrypoints.parser.SubParsersAction¶
alias of _SubParsersAction
- pyfarm.agent.entrypoints.parser.StoreConstAction¶
alias of _StoreConstAction
- pyfarm.agent.entrypoints.parser.StoreTrueAction¶
alias of _StoreTrueAction
- pyfarm.agent.entrypoints.parser.StoreFalseAction¶
alias of _StoreFalseAction
- pyfarm.agent.entrypoints.parser.AppendAction¶
alias of _AppendAction
- pyfarm.agent.entrypoints.parser.AppendConstAction¶
alias of _AppendConstAction
- class pyfarm.agent.entrypoints.parser.AgentArgumentParser(*args, **kwargs)[source]¶
Bases: argparse.ArgumentParser
A modified ArgumentParser which interfaces with the agent’s configuration.
pyfarm.agent.entrypoints.supervisor module¶
pyfarm.agent.entrypoints.utility module¶
Small objects and functions which facilitate operations on the main entry point class.
- pyfarm.agent.entrypoints.utility.start_daemon_posix(log, chdir, uid, gid)[source]¶
Runs the agent process via a double fork. This is basically a duplicate of Marechal’s original code with some adjustments:
http://www.jejik.com/articles/2007/02/a_simple_unix_linux_daemon_in_python/
Source files from his post are here:
- http://www.jejik.com/files/examples/daemon.py
- http://www.jejik.com/files/examples/daemon3x.py
pyfarm.agent.http package¶
Subpackages¶
pyfarm.agent.http.api package¶
Contains the base resources used for building up the root of the agent’s api.
- class pyfarm.agent.http.api.base.APIResource[source]¶
Bases: pyfarm.agent.http.core.resource.Resource
Base class for all api resources
- isLeaf = True¶
- CONTENT_TYPES = set(['application/json'])¶
- class pyfarm.agent.http.api.base.APIRoot[source]¶
Bases: pyfarm.agent.http.api.base.APIResource
- isLeaf = False¶
- class pyfarm.agent.http.api.base.Versions[source]¶
Bases: pyfarm.agent.http.api.base.APIResource
Returns a list of api versions which this agent will support
- GET /api/v1/versions/ HTTP/1.1¶
Request
GET /api/v1/versions/ HTTP/1.1
Accept: application/json
Response
HTTP/1.1 200 OK
Content-Type: application/json
{ "versions": [1] }
- isLeaf = True¶
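A client-side sketch of the versions request, using only the Python 3 standard library, might look like the following. The helper names are hypothetical, the default port of 50000 is taken from `agent_api_port` in the agent configuration file, and any authentication the agent may require is left out.

```python
import json
from urllib.request import Request


def versions_request(host="127.0.0.1", port=50000):
    """Build (but do not send) the GET request for the versions endpoint."""
    url = "http://%s:%d/api/v1/versions/" % (host, port)
    return Request(url, headers={"Accept": "application/json"}, method="GET")


def parse_versions(body):
    """Extract the list of supported api versions from a response body."""
    return json.loads(body)["versions"]


req = versions_request()
print(req.get_method(), req.full_url)
print(parse_versions('{"versions": [1]}'))
```

Sending the request is a matter of passing `req` to `urllib.request.urlopen` while an agent is running; that step is omitted so the sketch works without one.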
- class pyfarm.agent.http.api.state.Stop[source]¶
Bases: pyfarm.agent.http.api.base.APIResource
- isLeaf = False¶
- SCHEMAS = {'POST': <voluptuous.Schema object at 0x7f0979d5aad0>}¶
- class pyfarm.agent.http.api.tasks.Tasks[source]¶
Bases: pyfarm.agent.http.api.base.APIResource
- delete(**kwargs)[source]¶
HTTP endpoint for stopping and deleting an individual task from this agent.
Warning: If the specified task is part of a multi-task assignment, all tasks in this assignment will be stopped, not just the specified one.
This will try to asynchronously stop the assignment by killing all its child processes. If that isn’t successful, this will have no effect.
This endpoint is used to instruct the agent to download and apply an update.
- class pyfarm.agent.http.api.update.Update[source]¶
Bases: pyfarm.agent.http.api.base.APIResource
Requests the agent to download and apply the specified version of itself. Will make the agent restart at the next opportunity.
- POST /api/v1/update HTTP/1.1¶
Request
POST /api/v1/update HTTP/1.1
Accept: application/json
{ "version": "1.2.3" }
Response
HTTP/1.1 200 ACCEPTED
Content-Type: application/json
- SCHEMAS = {'POST': <voluptuous.Schema object at 0x7f0979ca9510>}¶
- isLeaf = False¶
pyfarm.agent.http.core package¶
The client library the manager uses to communicate with the master server.
- pyfarm.agent.http.core.client.build_url(url, params=None)[source]¶
Builds the full url when provided the base url and some url parameters:
>>> build_url("/foobar", {"first": "foo", "second": "bar"})
'/foobar?first=foo&second=bar'
>>> build_url("/foobar bar/")
'/foobar%20bar/'
Parameters: - url (str) – The url to build off of.
- params (dict) – A dictionary of parameters that should be added on to url. If this value is not provided url will be returned by itself. Arguments to a url are unordered by default however they will be sorted alphabetically so the results are repeatable from call to call.
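The behavior described for `build_url` (quoting the url and appending alphabetically sorted parameters) can be approximated with the Python 3 standard library. This is a sketch under that description, not the function's source; the name `build_url_sketch` is hypothetical.

```python
from urllib.parse import quote, urlencode


def build_url_sketch(url, params=None):
    """Quote url and append params as a sorted query string."""
    # Keep "/" and ":" unescaped so paths and scheme separators survive.
    url = quote(url, safe="/:")
    if params:
        # Sorting makes the result repeatable from call to call.
        url += "?" + urlencode(sorted(params.items()))
    return url


print(build_url_sketch("/foobar", {"second": "bar", "first": "foo"}))
print(build_url_sketch("/foobar bar/"))
```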
- pyfarm.agent.http.core.client.http_retry_delay(initial=None, uniform=False, get_delay=<built-in method random of Random object at 0x124fab0>, minimum=1)[source]¶
Returns a floating point value that can be used to delay an http request. The main purpose of this is to ensure that not all requests are run with the same interval between them. This helps to ensure that if the same request, such as agents coming online, is being run on multiple systems they should be staggered a little more than they would be without the non-uniform delay.
Parameters: - initial (int) – The initial delay value to start off with before any extra calculations are done. If this value is not provided the value provided to --http-retry-delay at startup will be used.
- uniform (bool) – If True then use the value produced by get_delay as a multiplier.
- get_delay (callable) – A function which should produce a number to multiply delay by. By default this uses random.random()
- minimum – Ensures that the value returned from this function is greater than or equal to a minimum value.
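The staggering idea can be illustrated with a simplified stand-in that multiplies a base delay by a random factor and then enforces the minimum. It deliberately leaves out the `uniform` flag and the fallback to `--http-retry-delay`, and the name `retry_delay_sketch` is hypothetical, so treat it as a sketch of the concept rather than the function's implementation.

```python
import random


def retry_delay_sketch(initial, get_delay=random.random, minimum=1):
    """Scale initial by a random factor, never going below minimum."""
    delay = initial * get_delay()
    return max(minimum, delay)


# A fixed multiplier makes the result predictable for demonstration.
print(retry_delay_sketch(10, get_delay=lambda: 0.5))
# With the default random multiplier the result varies between calls.
print(retry_delay_sketch(10))
```

Because `random.random()` returns a value in [0, 1), the result always falls between `minimum` and `initial` when `initial` is at least `minimum`.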
- class pyfarm.agent.http.core.client.Request[source]¶
Bases: pyfarm.agent.http.core.client.Request
Contains all the information used to perform a request such as the method, url, and original keyword arguments (kwargs). These values contain the basic information necessary in order to retry() a request.
- class pyfarm.agent.http.core.client.Response(deferred, response, request)[source]¶
Bases: twisted.internet.protocol.Protocol
This class receives the incoming response body from a request and constructs some convenience methods and attributes around the data.
Parameters: - deferred (Deferred) – The deferred object which contains the target callback and errback.
- response – The initial response object which will be passed along to the target deferred.
- request (Request) – Named tuple object containing the method name, url, headers, and data.
- data()[source]¶
Returns the data currently contained in the buffer.
Raises RuntimeError: Raised if this method is called before all data has been received.
- json(loader=<function loads at 0x7f097cdd4b18>)[source]¶
Returns the json data from the incoming request
Raises: - RuntimeError – Raised if this method is called before all data has been received.
- ValueError – Raised if the content type for this request is not application/json.
- pyfarm.agent.http.core.client.request(method, url, **kwargs)[source]¶
Wrapper around treq.request() with some added arguments and validation.
Parameters: - method (str) – The HTTP method to use when making the request.
- url (str) – The url this request will be made to.
- data (str, list, tuple, set, dict) – The data to send along with some types of requests such as POST or PUT
- headers (dict) – The headers to send along with the request to url. Currently only single values per header are supported.
- callback (function) – The function to deliver an instance of Response once we receive and unpack a response.
- errback (function) – The function to deliver an error message to. By default this will use log.err().
- response_class (class) – The class to use to unpack the internal response. This is mainly used by the unittests but could be used elsewhere to add some custom behavior to the unpack process for the incoming response.
- pyfarm.agent.http.core.client.random() → x in the interval [0, 1).¶
Base resources which can be used to build top level documents, pages, or other types of data for the web.
- class pyfarm.agent.http.core.resource.Resource[source]¶
Bases: twisted.web.resource.Resource
Basic subclass of _Resource for passing requests to specific methods. Unlike _Resource, however, this will also handle:
- rewriting of request objects
- templating
- content type discovery and validation
- unpacking of request data
- rerouting of request to specific internal methods
- TEMPLATE = NotImplemented¶
- CONTENT_TYPES = set(['application/json', 'text/html'])¶
- LOAD_DATA_FOR_METHODS = set(['PUT', 'POST'])¶
- SCHEMAS = {}¶
- putChild(path, child)[source]¶
Overrides the builtin putChild() so we can return the results for each call and use them externally
HTTP server responsible for serving requests that control or query the running agent. This file produces a service that the pyfarm.agent.manager.service.ManagerServiceMaker class can consume on start.
- class pyfarm.agent.http.core.server.RewriteRequest(*args, **kw)[source]¶
Bases: twisted.web.server.Request
A custom implementation of _Request that will allow us to modify an incoming request before it reaches the HTTP server.
- REPLACE_REPEATED_DELIMITER = <_sre.SRE_Pattern object at 0x7f097ab134b0>¶
- class pyfarm.agent.http.core.server.Site(resource, *args, **kwargs)[source]¶
Bases: twisted.web.server.Site
Site object similar to Twisted’s except it also carries along some of the internal agent data.
- displayTracebacks = True¶
- requestFactory¶
alias of RewriteRequest
- class pyfarm.agent.http.core.server.StaticPath(*args, **kwargs)[source]¶
Bases: twisted.web.static.File
More secure version of File that does not list directories. In addition this will also send along a response header asking clients to cache the data.
- EXPIRES = 604800¶
- ALLOW_DIRECTORY_LISTING = False¶
Interface methods for working with the Jinja template engine.
- class pyfarm.agent.http.core.template.InMemoryCache[source]¶
Bases: jinja2.bccache.BytecodeCache
Caches Jinja templates into memory after they have been loaded and compiled.
- cache = {}¶
- class pyfarm.agent.http.core.template.DeferredTemplate[source]¶
Bases: jinja2.environment.Template
Overrides the default PackageLoader so we can produce the rendered result as a deferred call.
- class pyfarm.agent.http.core.template.Environment(**kwargs)[source]¶
Bases: jinja2.environment.Environment
Implementation of Jinja’s _Environment class which reads from our configuration object and establishes the default functions we can use in a template.
- template_class¶
alias of DeferredTemplate
Submodules¶
pyfarm.agent.http.system module¶
- class pyfarm.agent.http.system.Index[source]¶
Bases: pyfarm.agent.http.core.resource.Resource
Serves requests for the root, ‘/’, target.
- TEMPLATE = 'index.html'¶
- class pyfarm.agent.http.system.Configuration[source]¶
Bases: pyfarm.agent.http.core.resource.Resource
- TEMPLATE = 'configuration.html'¶
- HIDDEN_FIELDS = ('agent', 'agent_pretty_json')¶
- EDITABLE_FIELDS = ('agent_cpus', 'agent_hostname', 'agent_http_retry_delay', 'master_api', 'master', 'agent_ram_check_interval', 'agent_ram', 'agent_ram_report_delta', 'agent_time_offset', 'state', 'agent_http_retry_delay')¶
pyfarm.agent.sysinfo package¶
Submodules¶
pyfarm.agent.sysinfo.cpu module¶
Contains information about the cpu and its relation to the operating system such as load, processing times, etc.
- pyfarm.agent.sysinfo.cpu.cpu_name()[source]¶
Returns the full name of the CPU installed in the system.
- pyfarm.agent.sysinfo.cpu.total_cpus(logical=True)[source]¶
Returns the total number of cpus installed on the system.
Parameters: logical (bool) – If True then return the number of cores the system has. Setting this value to False will instead return the number of physical cpus present on the system.
- pyfarm.agent.sysinfo.cpu.load(interval=1)[source]¶
Returns the load across all cpus as a value from zero to one. A value of 1.0 means the average load across all cpus is 100%.
- pyfarm.agent.sysinfo.cpu.user_time()[source]¶
Returns the amount of time spent by the cpu in user space
- pyfarm.agent.sysinfo.cpu.system_time()[source]¶
Returns the amount of time spent by the cpu in system space
pyfarm.agent.sysinfo.graphics module¶
pyfarm.agent.sysinfo.memory module¶
pyfarm.agent.sysinfo.network module¶
Returns information about the network including ip address, dns, data sent/received, and some error information.
- const IP_PRIVATE: set of private class A, B, and C network ranges
- const IP_NONNETWORK: set of non-network address ranges including all of the above constants except IP_PRIVATE
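As an illustration of what a set of private class A, B, and C ranges looks like in practice, the sketch below uses the standard library's ipaddress module with the RFC 1918 ranges. The names `PRIVATE_NETWORKS` and `is_private` are hypothetical, and the module documented here may represent its constants differently.

```python
import ipaddress

# Hypothetical constant: the RFC 1918 private ranges (former class A, B, and C).
PRIVATE_NETWORKS = (
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
)


def is_private(address):
    """Return True if address falls inside one of the private ranges."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in PRIVATE_NETWORKS)
```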
- pyfarm.agent.sysinfo.network.mac_addresses(long_addresses=False, as_integers=False)[source]¶
Returns a tuple of all mac addresses on the system.
Parameters:
- pyfarm.agent.sysinfo.network.hostname(trust_name_from_ips=True)[source]¶
Returns the hostname which the agent should send to the master.
Parameters: trust_resolved_name (bool) – If True and all addresses provided by addresses() resolve to a single hostname then just return that name as it’s the most likely hostname to be accessible by the rest of the network.
pyfarm.agent.sysinfo.system module¶
Information about the operating system including type, filesystem information, and other relevant information. This module may also contain os specific information such as the Linux distribution, Windows version, bitness, etc.
- pyfarm.agent.sysinfo.system.filesystem_is_case_sensitive()[source]¶
returns True if the file system is case sensitive
- pyfarm.agent.sysinfo.system.environment_is_case_sensitive()[source]¶
returns True if the environment is case sensitive
- pyfarm.agent.sysinfo.system.machine_architecture(arch='x86_64')[source]¶
returns the architecture of the host itself
- pyfarm.agent.sysinfo.system.interpreter_architecture()[source]¶
returns the architecture of the interpreter itself (32 or 64)
- pyfarm.agent.sysinfo.system.uptime()[source]¶
Returns the amount of time the system has been running in seconds.
- pyfarm.agent.sysinfo.system.operating_system(plat='linux2')[source]¶
Returns the operating system for the given platform. Please note that while you can call this function directly you’re likely better off using values in pyfarm.core.enums instead.
pyfarm.agent.sysinfo.user module¶
Returns information about the current user such as the user name, admin access, or other related information.
Module contents¶
Top level module which provides information about the operating system, system memory, network, and processor related information
Submodules¶
pyfarm.agent.config module¶
Configuration¶
Central module for storing and working with live configuration objects. This module instances ConfigurationWithCallbacks onto config. Attempting to reload this module will not reinstance the config object.
The config object should be directly imported from this module to be used:
>>> from pyfarm.agent.config import config
- class pyfarm.agent.config.LoggingConfiguration(data=None, environment=None, load=True)[source]¶
Bases: pyfarm.core.config.Configuration
Special configuration object which logs when a key is changed in a dictionary. If the reactor is not running then log messages will be queued until they can be emitted so they are not lost.
- _expandvars(value)¶
Performs variable expansion for value. This method is run when a string value is returned from get() or __getitem__(). The default behavior of this method is to recursively expand variables using sources in the following order:
- The environment, os.environ
- The environment (from the configuration), env
- Other values in the configuration
- ~ to the user’s home directory
For example, the following configuration:
    foo: foo
    bar: bar
    foobar: $foo/$bar
    path: ~/$foobar/$TEST
Would result in the following assuming $TEST is an environment variable set to somevalue and the current user’s name is user:
    {
        "foo": "foo",
        "bar": "bar",
        "foobar": "foo/bar",
        "path": "/home/user/foo/bar/somevalue"
    }
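The expansion order listed above can be sketched with string.Template. This is a minimal illustration rather than the method's actual code: the function name `expand_value`, its explicit `environ` and `home` arguments (used here instead of os.environ and the real home directory so the result is deterministic), and the `max_depth` guard are all assumptions.

```python
from string import Template


def expand_value(value, config, env, environ, home, max_depth=10):
    """Hypothetical sketch of recursive variable expansion."""
    for _ in range(max_depth):
        expanded = value
        # Sources are consulted in the documented order.
        for source in (environ, env, config):
            expanded = Template(expanded).safe_substitute(source)
        if expanded == value:
            break  # nothing left to expand
        value = expanded
    # Finally, expand a leading ~ to the user's home directory.
    if value == "~" or value.startswith("~/"):
        value = home + value[1:]
    return value
```

Applied to the example configuration with `environ={"TEST": "somevalue"}` and `home="/home/user"`, the `path` entry expands to `/home/user/foo/bar/somevalue`.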
- MODIFIED = 'modified'¶
- CREATED = 'created'¶
- DELETED = 'deleted'¶
- clear()[source]¶
Deletes all keys in this object and triggers a delete event using changed() for each one.
- update(data=None, **kwargs)[source]¶
Updates the data held within this object and triggers the appropriate events with changed().
- class pyfarm.agent.config.ConfigurationWithCallbacks(data=None, environment=None, load=True)[source]¶
Bases: pyfarm.agent.config.LoggingConfiguration
Subclass of LoggingConfiguration that provides the ability to run a function when a value is changed.
- callbacks = {}¶
- classmethod register_callback(key, callback, append=False)[source]¶
Register a function as a callback for key. When key is set the given callback will be run by changed()
Parameters:
- classmethod deregister_callback(key, callback)[source]¶
Removes any callback(s) that are registered with the provided key
- clear(callbacks=False)[source]¶
Performs the same operations as dict.clear() except this method can also clear any registered callbacks if requested.
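The idea of running a callback when a key changes can be sketched with a plain dictionary subclass. This is a simplified illustration under several assumptions: the class name `CallbackConfig` is hypothetical, callbacks are stored per instance rather than on the class, and `append=False` is assumed to replace any callbacks already registered for the key. The callback signature mirrors the `(change_type, key, new_value, old_value)` form used by callback_free_ram_changed().

```python
class CallbackConfig(dict):
    """Hypothetical, simplified sketch of a dictionary with change callbacks."""

    MODIFIED = "modified"
    CREATED = "created"
    DELETED = "deleted"

    def __init__(self, *args, **kwargs):
        super(CallbackConfig, self).__init__(*args, **kwargs)
        self.callbacks = {}

    def register_callback(self, key, callback, append=False):
        # Assumption: append=False replaces existing callbacks for key.
        if append:
            self.callbacks.setdefault(key, []).append(callback)
        else:
            self.callbacks[key] = [callback]

    def changed(self, change_type, key, new_value=None, old_value=None):
        # Run every callback registered for the key that changed.
        for callback in self.callbacks.get(key, []):
            callback(change_type, key, new_value, old_value)

    def __setitem__(self, key, value):
        if key in self:
            change_type, old_value = self.MODIFIED, self[key]
        else:
            change_type, old_value = self.CREATED, None
        super(CallbackConfig, self).__setitem__(key, value)
        # Only report a change when the key is new or its value differs.
        if change_type == self.CREATED or old_value != value:
            self.changed(change_type, key, value, old_value)

    def __delitem__(self, key):
        old_value = self[key]
        super(CallbackConfig, self).__delitem__(key)
        self.changed(self.DELETED, key, None, old_value)
```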
pyfarm.agent.manhole module¶
Manhole¶
Provides a way to access the internals of the agent via the telnet protocol.
- class pyfarm.agent.manhole.LoggingManhole(namespace=None)[source]¶
Bases: twisted.conch.manhole.ColoredManhole
A slightly modified implementation of ColoredManhole which logs information to the logger so we can track activity in the agent’s log.
- class pyfarm.agent.manhole.TransportProtocolFactory(portal)[source]¶
Bases: object
Glues together a portal along with the TelnetTransport and AuthenticatingTelnetProtocol objects. This class is instanced onto the protocol attribute of the ServerFactory class in build_manhole().
- class pyfarm.agent.manhole.TelnetRealm[source]¶
Bases: object
Wraps together ITelnetProtocol, TelnetBootstrapProtocol, ServerProtocol and ColoredManhole in requestAvatar() which will provide the interface to the manhole.
- NAMESPACE = None¶
pyfarm.agent.service module¶
Manager Service¶
Sends and receives information from the master and performs systems level tasks such as log reading, system information gathering, and management of processes.
- class pyfarm.agent.service.Agent[source]¶
Bases: object
Main class associated with getting the internals of the agent’s operations up and running including adding or updating itself with the master, starting the periodic task manager, and handling shutdown conditions.
- classmethod agent_api()[source]¶
Return the API url for this agent or None if agent_id has not been set
- classmethod agents_endpoint()[source]¶
Returns the API endpoint used for updating or creating agents on the master
- should_reannounce()[source]¶
Small method which acts as a trigger for reannounce()
- reannounce()[source]¶
Method which is used to periodically contact the master. This method is generally called as part of a scheduled task.
- system_data(requery_timeoffset=False)[source]¶
Returns a dictionary of data containing information about the agent. This is the information that is also passed along to the master.
- start(shutdown_events=True, http_server=True)[source]¶
Internal code which starts the agent, registers it with the master, and performs the other steps necessary to get things running.
Parameters:
- stop()[source]¶
Internal code which stops the agent. This will terminate any running processes, inform the master of the terminated tasks, and update the state of the agent on the master.
- post_shutdown_to_master(stop_reactor=True)[source]¶
This method is called before the reactor shuts down and lets the master know that the agent’s state is now offline
- errback_post_agent_to_master(failure)[source]¶
Called when there’s a failure trying to post the agent to the master. This is often because of some lower level issue but it may be recoverable so we retry the request.
- callback_post_agent_to_master(response)[source]¶
Called when we get a response after POSTing the agent to the master.
- post_agent_to_master()[source]¶
Runs the POST request to contact the master. Running this method multiple times should be considered safe but is generally something that should be avoided.
- callback_post_free_ram(response)[source]¶
Called when we get a response back from the master after POSTing a change for free_ram
- errback_post_free_ram(failure)[source]¶
Error handler which is called if we fail to post a ram update to the master for some reason
- callback_free_ram_changed(change_type, key, new_value, old_value)[source]¶
Callback used to decide and act on changes to the config['ram'] value.
- errback_post_cpu_count_change(failure)[source]¶
Error handler which is called if we fail to post a cpu count update to an existing agent for some reason.
- callback_post_cpu_count_change(response)[source]¶
Called when we received a response from the master after
- pyfarm.agent.service.random() → x in the interval [0, 1).¶
pyfarm.agent.tasks module¶
Tasks¶
Simple tasks which are run at a scheduled interval by ScheduledTaskManager
- class pyfarm.agent.tasks.ScheduledTaskManager[source]¶
Bases: object
Manages and keeps track of several scheduled tasks.
- test_clock = None¶
- register(function, interval, start=False, clock=None, func_args=None, func_kwargs=None)[source]¶
Register a callable function to run at a given interval. This function will do nothing if function has already been registered.
Parameters: - function – a callable function that should be run on an interval
- interval (int or float) – the interval at which function should be run
- start (bool) – if True, start the interval timer after it has been added
- clock – optional keyword that will replace the looping call’s clock
- func_args (tuple) – the positional arguments to pass into function
- func_kwargs (dict) – the keyword arguments to pass into function
Raises AssertionError: raised if function is not callable
- start(now=True)[source]¶
start all LoopingCall instances stored from register()
- stop()[source]¶
stop all LoopingCall instances stored from register()
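The two rules documented for register() (a duplicate registration does nothing, and a non-callable raises AssertionError) can be sketched without Twisted. The class `TaskRegistry` and its `run_all` method are hypothetical stand-ins; the real manager wraps each function in a LoopingCall instead of storing a tuple.

```python
class TaskRegistry(object):
    """Hypothetical sketch of the documented register() rules, without Twisted."""

    def __init__(self):
        self.tasks = {}

    def register(self, function, interval, func_args=None, func_kwargs=None):
        assert callable(function), "function must be callable"
        if function in self.tasks:
            return None  # already registered: do nothing
        entry = (interval, tuple(func_args or ()), dict(func_kwargs or {}))
        self.tasks[function] = entry
        return entry

    def run_all(self):
        # Stand-in for every registered looping call firing once.
        return [
            function(*args, **kwargs)
            for function, (_, args, kwargs) in self.tasks.items()
        ]
```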
pyfarm.agent.testutil module¶
- class pyfarm.agent.testutil.skipIf(should_skip, reason)[source]¶
Bases: object
Wrapping a test with this class will allow the test to be skipped if should_skip evals as True.
- pyfarm.agent.testutil.requires_master(function)[source]¶
Any test decorated with this function will fail if the master could not be contacted or returned a response other than 200 OK for “/”
- pyfarm.agent.testutil.create_jobtype(classname=None, sourcecode=None)[source]¶
Creates a job type on the master and fires a deferred when finished
- class pyfarm.agent.testutil.FakeRequest(test, method, uri, headers=None, data=None)[source]¶
Bases: object
- class pyfarm.agent.testutil.TestCase(methodName='runTest')[source]¶
Bases: twisted.trial._asynctest.TestCase
- POP_CONFIG_KEYS = []¶
- RAND_LENGTH = 8¶
- timeout = 15¶
- assertRaisesRegexp(expected_exception, expected_regexp, callable_obj=None, *args, **kwargs)[source]¶
- create_file(content=None, dir=None, suffix='')[source]¶
Creates a test file on disk using tempfile.mkstemp() and uses the lower level file interfaces to manage it. This is done to ensure we have more control of the file descriptor itself so on platforms such as Windows we don’t have to worry about running out of file handles.
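A minimal sketch of the approach described above: tempfile.mkstemp() returns an open file descriptor, which is written to and closed directly with the low level os functions so the handle is released right away. The function body is an illustration, not the test case's actual code, and encoding the content as UTF-8 is an assumption.

```python
import os
import tempfile


def create_file(content=None, dir=None, suffix=""):
    """Hypothetical sketch: write content via the raw descriptor, then close it."""
    fd, path = tempfile.mkstemp(suffix=suffix, dir=dir)
    try:
        if content is not None:
            os.write(fd, content.encode("utf-8"))
    finally:
        os.close(fd)  # release the file handle immediately
    return path
```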
- class pyfarm.agent.testutil.BaseRequestTestCase(methodName='runTest')[source]¶
Bases: pyfarm.agent.testutil.TestCase
- HTTP_SCHEME = 'http'¶
- DNS_HOSTNAME = 'example.com'¶
- TEST_URL = 'http://httpbin.org'¶
- REDIRECT_TARGET = 'http://example.com'¶
- RESOLVED_DNS_NAME = True¶
- HTTP_REQUEST_SUCCESS = True¶
- class pyfarm.agent.testutil.BaseHTTPTestCase(methodName='runTest')[source]¶
Bases: pyfarm.agent.testutil.TestCase
- URI = NotImplemented¶
- CLASS = NotImplemented¶
- CLASS_FACTORY = NotImplemented¶
- CONTENT_TYPES = NotImplemented¶
- class pyfarm.agent.testutil.BaseAPITestCase(methodName='runTest')[source]¶
Bases: pyfarm.agent.testutil.BaseHTTPTestCase
- CONTENT_TYPES = ['application/json']¶
pyfarm.agent.utility module¶
Utilities¶
Top level utilities for the agent to use internally. Many of these are copied over from the master (which we can’t import here).
- pyfarm.agent.utility.validate_environment(values)[source]¶
Ensures that values is a dictionary and that it only contains string keys and values.
- pyfarm.agent.utility.validate_uuid(value)[source]¶
Ensures that value can be converted to or is a UUID object.
- pyfarm.agent.utility.TASKS_SCHEMA(values)¶
- pyfarm.agent.utility.json_safe(source)[source]¶
Recursively converts source into something that should be safe for json.dumps() to handle. This is used in conjunction with default_json_encoder() to also convert keys to something the json encoder can understand.
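A rough sketch of what such a recursive conversion might look like. Which types are converted is an assumption made for illustration (UUID values become strings, tuples and sets become lists, and dictionary keys become strings); the name `to_json_safe` is hypothetical.

```python
import json
import uuid


def to_json_safe(source):
    """Hypothetical sketch; the set of converted types is an assumption."""
    if isinstance(source, dict):
        # Keys are converted to strings, values are converted recursively.
        return dict(
            (str(key), to_json_safe(value)) for key, value in source.items())
    if isinstance(source, (list, tuple, set)):
        return [to_json_safe(item) for item in source]
    if isinstance(source, uuid.UUID):
        return str(source)
    return source
```

The converted structure can then be handed to json.dumps() without a custom encoder.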
- pyfarm.agent.utility.quote_url(source_url)[source]¶
This function serves as a wrapper around urlsplit() and quote() and returns a url that has the path quoted.
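A minimal sketch of the same idea using Python 3's urllib.parse (the agent itself targets Python 2, where these functions live in different modules). The name `quote_path` is hypothetical, and only the path component is quoted here.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def quote_path(source_url):
    """Hypothetical sketch: percent-quote only the path component of a url."""
    parts = urlsplit(source_url)
    return urlunsplit(parts._replace(path=quote(parts.path)))
```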
- pyfarm.agent.utility.dumps(*args, **kwargs)[source]¶
Agent’s implementation of json.dumps() or pyfarm.master.utility.jsonify()
- pyfarm.agent.utility.request_from_master(request)[source]¶
Returns True if the request appears to be coming from the master
- class pyfarm.agent.utility.UTF8Recoder(f, encoding)[source]¶
Bases: object
Iterator that reads an encoded stream and reencodes the input to UTF-8
- class pyfarm.agent.utility.UnicodeCSVReader(f, dialect=<class csv.excel at 0x7f097a958598>, encoding='utf-8', **kwds)[source]¶
Bases: object
A CSV reader which will iterate over lines in the CSV file “f”, which is encoded in the given encoding.
- class pyfarm.agent.utility.UnicodeCSVWriter(f, dialect=<class csv.excel at 0x7f097a958598>, encoding='utf-8', **kwds)[source]¶
Bases: object
A CSV writer which will write rows to CSV file “f”, which is encoded in the given encoding.
- pyfarm.agent.utility.total_seconds(td)[source]¶
Returns the total number of seconds in the time delta object. This function is provided for backwards compatibility with Python 2.6.
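Python 2.6 lacks timedelta.total_seconds(), so the value has to be computed from the object's fields. The sketch below uses the equivalent arithmetic given in the Python documentation; the name `total_seconds_compat` is hypothetical, and the code assumes true division (Python 3, or `from __future__ import division`).

```python
from datetime import timedelta


def total_seconds_compat(td):
    """Hypothetical sketch of a Python 2.6 compatible total_seconds()."""
    # Combine days, seconds, and microseconds into a single float.
    return (
        td.microseconds + (td.seconds + td.days * 24 * 3600) * 10 ** 6
    ) / 10 ** 6
```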
- class pyfarm.agent.utility.AgentUUID[source]¶
Bases: object
This class wraps all the functionality required to load, cache and retrieve an Agent’s UUID.
- log = <pyfarm.agent.logger.python.Logger object at 0x7f097a6bc710>¶
- classmethod load(path)[source]¶
A classmethod to load a UUID object from a path. If the provided path does not exist or does not contain data which can be converted into a UUID object None will be returned.
- classmethod save(agent_uuid, path)[source]¶
Saves agent_uuid to path. This classmethod will also create the necessary parent directories and handle conversion from the input type uuid.UUID.
- classmethod generate()[source]¶
Generates a UUID object. This simply wraps uuid.uuid4() and logs a warning.
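The load and save behavior described above can be sketched with two small functions. These are illustrations only: the names `save_uuid` and `load_uuid` are hypothetical, logging is omitted, and storing the UUID as plain text is an assumption about the file format.

```python
import os
import uuid


def save_uuid(agent_uuid, path):
    """Write agent_uuid to path, creating parent directories if needed."""
    parent = os.path.dirname(path)
    if parent and not os.path.isdir(parent):
        os.makedirs(parent)
    with open(path, "w") as stream:
        stream.write(str(agent_uuid))


def load_uuid(path):
    """Return the UUID stored at path, or None if missing or invalid."""
    try:
        with open(path) as stream:
            return uuid.UUID(stream.read().strip())
    except (OSError, IOError, ValueError):
        return None
```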
pyfarm.jobtypes package¶
Subpackages¶
pyfarm.jobtypes.core package¶
Submodules¶
pyfarm.jobtypes.core.internals module¶
Contains classes which contain internal methods for the pyfarm.jobtypes.core.jobtype.JobType class.
- class pyfarm.jobtypes.core.internals.ProcessData¶
Bases: tuple
ProcessData(protocol, started, stopped, log_identifier)
- log_identifier¶
Alias for field number 3
- protocol¶
Alias for field number 0
- started¶
Alias for field number 1
- stopped¶
Alias for field number 2
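The "Alias for field number" entries above are what a named tuple produces: each attribute is simply a name for a tuple index. A minimal sketch, assuming the class is built with collections.namedtuple (the sample values are made up):

```python
from collections import namedtuple

# Field order matches the documented signature, so protocol is index 0
# and log_identifier is index 3.
ProcessData = namedtuple(
    "ProcessData", ("protocol", "started", "stopped", "log_identifier"))

# Made-up sample values for illustration only.
example = ProcessData(
    protocol="protocol", started=1.0, stopped=None, log_identifier="abc")
```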
- class pyfarm.jobtypes.core.internals.Cache[source]¶
Bases: object
Internal methods for caching job types
- cache = {}¶
- JOBTYPE_VERSION_URL = '%(master_api)s/jobtypes/%(name)s/versions/%(version)s'¶
- CACHE_DIRECTORY = '/tmp/pyfarm/agent/jobtype_cache'¶
- e = OSError(17, 'File exists')¶
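JOBTYPE_VERSION_URL is a template for Python's dictionary-based `%` string formatting. The sketch below shows how such a template is filled in; the helper name `jobtype_version_url` is hypothetical and any values passed to it are placeholders.

```python
JOBTYPE_VERSION_URL = "%(master_api)s/jobtypes/%(name)s/versions/%(version)s"


def jobtype_version_url(master_api, name, version):
    """Hypothetical helper that fills in the documented url template."""
    return JOBTYPE_VERSION_URL % {
        "master_api": master_api, "name": name, "version": version}
```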
pyfarm.jobtypes.core.jobtype module¶
This module contains the core job type from which all other job types are built. All other job types must inherit from the JobType class in this module.
- class pyfarm.jobtypes.core.jobtype.CommandData(command, *arguments, **kwargs)[source]¶
Bases: object
Stores data to be returned by JobType.get_command_data(). Instances of this class are also used by JobType.spawn_process_inputs() at execution time.
Note
This class does not perform any kind of path resolution by default. It is assumed this has already been done using something like JobType.map_path()
Parameters: - command (string) – The command that will be executed when the process runs.
- arguments – Any additional arguments to be passed along to the command being launched.
- env (dict) – If provided, this will be the environment to launch the command with. If this value is not provided then a default environment will be setup using set_default_environment() when JobType.start() is called. JobType.start() itself will use JobType.set_default_environment() to generate the default environment.
- cwd (string) – The working directory the process should execute in. If not provided the process will execute in whatever the directory the agent is running inside of.
- user (string or integer) – The username or user id that the process should run as. On Windows this keyword is ignored and on Linux this requires the agent to be executing as root. The value provided here will be run through JobType.get_uid_gid() to map the incoming value to an integer.
- group (string or integer) – Same as user above except this sets the group the process will execute as.
- id – An arbitrary id to associate with the resulting process protocol. This can help identify
- validate()[source]¶
Validates that the attributes on an instance of this class contain values we expect. This method is called externally by the job type in JobType.start() and may correct some instance attributes.
- class pyfarm.jobtypes.core.jobtype.JobType(assignment)[source]¶
Bases: pyfarm.jobtypes.core.internals.Cache, pyfarm.jobtypes.core.internals.System, pyfarm.jobtypes.core.internals.Process, pyfarm.jobtypes.core.internals.TypeChecks
Base class for all other job types. This class is intended to abstract away many of the asynchronous operations necessary to run a job type on an agent.
Variables: - PERSISTENT_JOB_DATA (dict) – A dictionary of job ids and data that prepare_for_job() has produced. This is used during __init__() to set persistent_job_data.
- COMMAND_DATA_CLASS (CommandData) – If you need to provide your own class to represent command data you should override this attribute. This attribute is used by methods within this class to do type checking.
- PROCESS_PROTOCOL (ProcessProtocol) – The protocol object used to communicate with each process spawned
- ASSIGNMENT_SCHEMA (voluptuous.Schema) – The schema of an assignment. This object helps to validate the incoming assignment to ensure it’s not missing any data.
- uuid (UUID) – This is the unique identifier for the job type instance and is automatically set when the class is instanced. This is used by the agent to track assignments and job type instances.
- finished_tasks (set) – A set of tasks that have had their state changed to finished through set_task_state(). At the start of the assignment, this set is empty.
- failed_tasks (set) – This is analogous to finished_tasks except it contains failed tasks only.
Parameters: assignment (dict) – This attribute is a dictionary with the keys “job”, “jobtype” and “tasks”. self.assignment[“job”] is itself a dict with keys “id”, “title”, “data”, “environ” and “by”. The most important of those is usually “data”, which is the dict specified when submitting the job and contains jobtype specific data. self.assignment[“tasks”] is a list of dicts representing the tasks in the current assignment. Each of these dicts has the keys “id” and “frame”. The list is ordered by frame number.
- PERSISTENT_JOB_DATA = {}¶
- COMMAND_DATA¶
alias of CommandData
- PROCESS_PROTOCOL¶
alias of ProcessProtocol
- ASSIGNMENT_SCHEMA = <voluptuous.Schema object at 0x7f0979f82310>¶
- classmethod load(assignment)[source]¶
Given an assignment this class method will load the job type either from cache or from the master.
Parameters: assignment (dict) – The dictionary containing the assignment. This will be passed into an instance of ASSIGNMENT_SCHEMA to validate that the internal data is correct.
- classmethod prepare_for_job(job)[source]¶
Note
This method is not yet implemented
Called before a job executes on the agent for the first time. Whatever this classmethod returns will be available as persistent_job_data on the job type instance.
Parameters: job (int) – The job id which prepare_for_job is being run for.
By default this method does nothing.
- classmethod cleanup_after_job(persistent_data)[source]¶
Note
This method is not yet implemented
This classmethod will be called after the last assignment from a given job has finished on this node.
Parameters: persistent_data – The persistent data that prepare_for_job() produced. The value for this data may be None if prepare_for_job() returned None or was not implemented.
- classmethod spawn_persistent_process(job, command_data)[source]¶
Note
This method is not yet implemented
Starts one child process using an instance of CommandData or similar input. This process is intended to keep running until the last task from this job has been processed, potentially spanning more than one assignment. If the spawned process is still running then we’ll clean up the process after cleanup_after_job()
- node()[source]¶
Returns live information about this host, the operating system, hardware, and several other pieces of global data which is useful inside of the job type. Currently data from this method includes:
- master_api - The base url the agent is using to communicate with the master.
- hostname - The hostname as reported to the master.
- agent_id - The unique identifier used to identify this agent to the master.
- id - The database id of the agent as given to us by the master on startup of the agent.
- cpus - The number of CPUs reported to the master
- ram - The amount of ram reported to the master.
- total_ram - The amount of ram, in megabytes, that’s installed on the system regardless of what was reported to the master.
- free_ram - How much ram, in megabytes, is free for the entire system.
- consumed_ram - How much ram, in megabytes, is being consumed by the agent and any processes it has launched.
- admin - Set to True if the current user is an administrator or ‘root’.
- user - The username of the current user.
- case_sensitive_files - True if the file system is case sensitive.
- case_sensitive_env - True if environment variables are case sensitive.
- machine_architecture - The architecture of the machine the agent is running on. This will return 32 or 64.
- operating_system - The operating system the agent is executing on. This value will be ‘linux’, ‘mac’ or ‘windows’. In rare circumstances this could also be ‘other’.
Raises KeyError: Raised if one or more keys are not present in the global configuration object. This should rarely if ever be a problem under normal circumstances. The exception to this rule is in unittests or standalone libraries where the global config object may not be populated.
- tempdir(new=False, remove_on_finish=True)[source]¶
Returns a temporary directory to be used within a job type. By default once called the directory will be created on disk and returned from this method.
Calling this method multiple times will return the same directory instead of creating a new directory unless new is set to True.
Parameters:
- get_uid_gid(user, group)[source]¶
Overridable. This method converts a named user and group into their respective user and group ids.
- get_environment()[source]¶
Constructs an environment dictionary that can be used when a process is spawned by a job type.
- get_command_list(cmdlist)[source]¶
Returns the list of commands to be used when running the process as a read-only tuple.
- get_csvlog_path(protocol_uuid, create_time=None)[source]¶
Returns the path to the comma separated value (csv) log file. The agent stores logs from processes in a csv format so we can store additional information such as a timestamp, line number, stdout/stderr identification and the log message itself.
Note
This method should not attempt to create the parent directories of the resulting path. This is already handled by the logger pool in a non-blocking fashion.
- get_command_data()[source]¶
Overridable. This method returns the arguments necessary for executing a command. For job types which execute a single process per assignment, this is the most important method to implement.
Warning
This method should not be used when this jobtype requires more than one process for one assignment and may not get called at all if start() was overridden.
The default implementation does nothing. When overriding this method you should return an instance of COMMAND_DATA_CLASS:
    return self.COMMAND_DATA(
        "/usr/bin/python", "-c", "print 'hello world'",
        env={"FOO": "bar"}, user="bob")
See CommandData’s class documentation for a full description of possible arguments.
Please note, however, that the default command data class, CommandData, does not perform path expansion. Instead you have to handle this yourself with map_path().
- map_path(path)[source]¶
Takes a string argument. Translates a given path for any OS to what it should be on this particular node. This does not communicate with the master.
- expandvars(value, environment=None, expand=None)[source]¶
Expands variables inside of a string using an environment. Exp
Parameters: - value (string) – The path to expand
- environment (dict) – The environment to use for expanding value. If this value is None (the default) then we’ll use get_environment() to build this value.
- expand (bool) – When not provided we use the jobtype_expandvars configuration value to set the default. When this value is True we’ll perform environment variable expansion otherwise we return value untouched.
- start()[source]¶
This method is called when the job type should start working. Depending on the job type’s implementation this will prepare and start one or more processes.
- stop(assignment_failed=False, error=None, signal='KILL')[source]¶
This method is called when the job type should stop running. This will terminate any processes associated with this job type and also inform the master of any state changes to an associated task or tasks.
Parameters: - assignment_failed (boolean) – Whether this means the assignment has genuinely failed. By default, we assume that stopping this assignment was the result of deliberate user action (like stopping the job or shutting down the agent), and won’t treat it as a failed assignment.
- error (string) – If the assignment has failed, this string is uploaded as last_error for the failed tasks.
- signal (string) – The signal to send to any running processes. Valid options are KILL, TERM or INT.
- format_error(error)[source]¶
Takes some kind of object, typically an instance of Exception or Failure, and produces a human readable string. If we don’t know how to format the request object an error will be logged and nothing will be returned.
- set_states(tasks, state, error=None)[source]¶
Wrapper around set_task_state() that allows you to set the state on the master for multiple tasks at once.
- set_task_state(task, state, error=None, dissociate_agent=False)[source]¶
Sets the state of the given task
Parameters: - task (dict) – The dictionary containing the task we’re changing the state for.
- state (string) – The state to change task to
- error (string, Exception) – If the state is changing to ‘error’ then also set the last_error column. Any exception instance that is passed to this keyword will be passed through format_exception() first to format it.
- get_local_task_state(task_id)[source]¶
Returns None if the state of this task has not been changed locally since this assignment has started. This method does not communicate with the master.
- is_successful(reason)[source]¶
Overridable. This method determines whether the process referred to by a protocol instance has exited successfully.
The default implementation returns True if the process’s return code was 0 and False in all other cases. If you need to modify this behavior please be aware that reason may be an integer or an instance of twisted.internet.error.ProcessTerminated if the process terminated without errors or an instance of twisted.python.failure.Failure if there were problems.
Raises NotImplementedError: Raised if we encounter a condition that the base implementation is unable to handle.
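Because reason can arrive in several shapes, an override usually has to normalize it first. The sketch below is one possible approach, not the base implementation: FakeProcessTerminated and FakeFailure are hypothetical stand-ins for the Twisted classes (so the example runs without Twisted), and the allowed_codes parameter is an invented convenience for tools that use nonzero codes for warnings.

```python
class FakeProcessTerminated(Exception):
    """Hypothetical stand-in exposing exitCode like ProcessTerminated."""

    def __init__(self, exit_code):
        super().__init__(exit_code)
        self.exitCode = exit_code


class FakeFailure:
    """Hypothetical stand-in wrapping an exception in .value like Failure."""

    def __init__(self, value):
        self.value = value


def is_successful(reason, allowed_codes=(0,)):
    """Sketch of an override; allowed_codes is an assumption."""
    if isinstance(reason, int):
        return reason in allowed_codes
    # Unwrap Failure-like objects; otherwise inspect the object itself.
    value = getattr(reason, "value", reason)
    exit_code = getattr(value, "exitCode", None)
    if exit_code is None:
        # No usable exit code (for example, killed by a signal).
        raise NotImplementedError("cannot determine success from %r" % (reason,))
    return exit_code in allowed_codes


plain = is_successful(0)
wrapped = is_successful(FakeFailure(FakeProcessTerminated(2)), allowed_codes=(0, 2))
```

Raising NotImplementedError for unrecognized input mirrors the documented behavior of the base method when it meets a condition it cannot handle.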
- before_start()[source]¶
Overridable. This method is called directly before start() itself is called.
The default implementation does nothing and values returned from this method are ignored.
- before_spawn_process(command, protocol)[source]¶
Overridable. This method is called directly before a process is spawned.
By default this method does nothing except log information about the command we’re about to launch, both to the agent’s log and to the log file on disk.
Parameters: - command (CommandData) – An instance of CommandData which contains the environment to use, command and arguments. Modifications to this object will be applied to the process being spawned.
- protocol (ProcessProtocol) – An instance of pyfarm.jobtypes.core.process.ProcessProtocol which contains the protocol used to communicate between the process and this job type.
- process_stopped(protocol, reason)[source]¶
Overridable. This method is called when a child process has stopped running.
The default implementation will mark all tasks in the current assignment as done, or as failed if there was at least one failed process.
- process_started(protocol)[source]¶
Overridable. This method is called when a child process started running.
The default implementation will mark all tasks in the current assignment as running.
- process_output(protocol, output, line_fragments, line_handler)[source]¶
This is a mid-level method which takes output from a process protocol then splits and processes it to ensure we pass complete output lines to the other methods.
Implementors who wish to process the output line by line should override preprocess_stdout_line(), preprocess_stderr_line(), process_stdout_line() or process_stderr_line() instead. This method is a glue method between other parts of the job type and should only be overridden if there’s a problem or you want to change how lines are split.
Parameters: - protocol (ProcessProtocol) – The protocol instance which produced output
- output (string) – The blob of text or line produced
- line_fragments (dict) – The line fragment dictionary containing individual line fragments. This will be either self._stdout_line_fragments or self._stderr_line_fragments.
- line_handler (callable) – The function to handle any lines produced. This will be either handle_stdout_line() or handle_stderr_line()
Returns: This method returns nothing by default and any return value produced by this method will not be consumed by other methods.
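The splitting step can be pictured as a small buffer that holds back any trailing partial line until the rest of it arrives. The following is a minimal sketch under stated assumptions, not the method's actual body: the function name split_output is hypothetical, and keying the fragment dictionary by protocol is a guess at how fragments are tracked.

```python
def split_output(protocol, output, line_fragments, line_handler):
    """Pass only complete lines to line_handler; buffer the remainder."""
    # Prepend whatever partial line was left over from the previous call.
    data = line_fragments.pop(protocol, "") + output
    lines = data.split("\n")
    # The last element is "" if output ended on a newline, else a partial line.
    remainder = lines.pop()
    if remainder:
        line_fragments[protocol] = remainder
    for line in lines:
        # Drop a trailing carriage return so Windows line endings also work.
        line_handler(protocol, line.rstrip("\r"))


received = []
fragments = {}


def handler(protocol, line):
    received.append(line)


# Output arrives in arbitrary chunks, not aligned to line boundaries.
split_output("proc-1", "frame 1 do", fragments, handler)
split_output("proc-1", "ne\nframe 2 done\nfra", fragments, handler)
```

After both calls the handler has seen two complete lines, and the unfinished text "fra" is still waiting in fragments for the next chunk.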
- handle_stdout_line(protocol, stdout)[source]¶
Takes a ProcessProtocol instance and stdout line produced by process_output() and runs it through all the steps necessary to preprocess, format, log and handle the line.
The default implementation will run stdout through several methods in order: preprocess_stdout_line(), format_stdout_line(), log_stdout_line() and process_stdout_line().
Warning
This method is not private; however, it’s advisable to override the methods above instead of this one. Unlike this method, which is more generalized and invokes several other methods, the above provide more targeted functionality.
Parameters: - protocol (ProcessProtocol) – The protocol instance which produced stdout
- stdout (string) – A complete line to stdout being emitted by the process
Returns: This method returns nothing by default and any return value produced by this method will not be consumed by other methods.
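The Returns descriptions of the individual hooks on this page suggest the order preprocess, format, log, then process. The sketch below chains them that way using a stand-in class; LinePipeline and RenderJob are hypothetical names, and treating a None return value as "leave the line unchanged" is an assumption about how the default hooks are handled.

```python
class LinePipeline:
    """Hypothetical stand-in for a job type; only stdout hooks are modelled."""

    def __init__(self):
        self.logged = []
        self.processed = []

    def preprocess_stdout_line(self, protocol, stdout):
        return None  # default: does nothing

    def format_stdout_line(self, protocol, stdout):
        return None  # default: does nothing

    def log_stdout_line(self, protocol, stdout):
        self.logged.append(stdout)  # a list stands in for the log file

    def process_stdout_line(self, protocol, stdout):
        self.processed.append(stdout)

    def handle_stdout_line(self, protocol, stdout):
        # Assumption: a hook returning None leaves the line unchanged.
        for hook in (self.preprocess_stdout_line, self.format_stdout_line):
            result = hook(protocol, stdout)
            if result is not None:
                stdout = result
        self.log_stdout_line(protocol, stdout)
        self.process_stdout_line(protocol, stdout)


class RenderJob(LinePipeline):
    def format_stdout_line(self, protocol, stdout):
        # Targeted override: only the formatting step changes.
        return "[render] " + stdout


job = RenderJob()
job.handle_stdout_line(None, "frame 12 complete")
```

Overriding a single hook, as RenderJob does, changes what gets logged and processed without touching the chaining logic, which is the reason the warning above recommends the targeted methods.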
- handle_stderr_line(protocol, stderr)[source]¶
Overridable. Takes a ProcessProtocol instance and stderr produced by process_output() and runs it through all the steps necessary to preprocess, format, log and handle the line.
The default implementation will run stderr through several methods in order: preprocess_stderr_line(), format_stderr_line(), log_stderr_line() and process_stderr_line().
Warning
This method is overridable; however, it’s advisable to override the methods above instead. Unlike this method, which is more generalized and invokes several other methods, the above provide more targeted functionality.
Parameters: - protocol (ProcessProtocol) – The protocol instance which produced stderr
- stderr (string) – A complete line to stderr being emitted by the process
Returns: This method returns nothing by default and any return value produced by this method will not be consumed by other methods.
- preprocess_stdout_line(protocol, stdout)[source]¶
Overridable. Provides the ability to manipulate stdout or protocol before it’s passed into any other line handling methods.
The default implementation does nothing.
Parameters: - protocol (ProcessProtocol) – The protocol instance which produced stdout
- stdout (string) – A complete line to stdout before any formatting or logging has occurred.
Return type: string
Returns: This method returns nothing by default but when overridden should return a string which will be used in line handling methods such as format_stdout_line(), log_stdout_line() and process_stdout_line().
- preprocess_stderr_line(protocol, stderr)[source]¶
Overridable. Provides the ability to manipulate stderr or protocol before it’s passed into any other line handling methods.
The default implementation does nothing.
Parameters: - protocol (ProcessProtocol) – The protocol instance which produced stderr
- stderr (string) – A complete line to stderr before any formatting or logging has occurred.
Return type: string
Returns: This method returns nothing by default but when overridden should return a string which will be used in line handling methods such as format_stderr_line(), log_stderr_line() and process_stderr_line().
- format_stdout_line(protocol, stdout)[source]¶
Overridable. Formats a line from stdout before it’s passed onto methods such as log_stdout_line() and process_stdout_line().
The default implementation does nothing.
Parameters: - protocol (ProcessProtocol) – The protocol instance which produced stdout
- stdout (string) – A complete line from the process to format and return.
Return type: string
Returns: This method returns nothing by default but when overridden should return a string which will be used in log_stdout_line() and process_stdout_line()
- format_stderr_line(protocol, stderr)[source]¶
Overridable. Formats a line from stderr before it’s passed onto methods such as log_stderr_line() and process_stderr_line().
The default implementation does nothing.
Parameters: - protocol (ProcessProtocol) – The protocol instance which produced stderr
- stderr (string) – A complete line from the process to format and return.
Return type: string
Returns: This method returns nothing by default but when overridden should return a string which will be used in log_stderr_line() and process_stderr_line()
- log_stdout_line(protocol, stdout)[source]¶
Overridable. Called when we receive a complete line on stdout from the process.
The default implementation will use the global logging pool to log stdout to a file.
Parameters: - protocol (ProcessProtocol) – The protocol instance which produced stdout
- stdout (string) – A complete line to stdout that has been formatted and is ready to log to a file.
Returns: This method returns nothing by default and any return value produced by this method will not be consumed by other methods.
- log_stderr_line(protocol, stderr)[source]¶
Overridable. Called when we receive a complete line on stderr from the process.
The default implementation will use the global logging pool to log stderr to a file.
Parameters: - protocol (ProcessProtocol) – The protocol instance which produced stderr
- stderr (string) – A complete line to stderr that has been formatted and is ready to log to a file.
Returns: This method returns nothing by default and any return value produced by this method will not be consumed by other methods.
- process_stderr_line(protocol, stderr)[source]¶
Overridable. This method is called when we receive a complete line to stderr. The line will be preformatted and will already have been sent for logging.
The default implementation sends stderr and protocol to process_stdout_line().
Parameters: - protocol (ProcessProtocol) – The protocol instance which produced stderr
- stderr (string) – A complete line to stderr after it has been formatted and logged.
Returns: This method returns nothing by default and any return value produced by this method will not be consumed by other methods.
- process_stdout_line(protocol, stdout)[source]¶
Overridable. This method is called when we receive a complete line to stdout. The line will be preformatted and will already have been sent for logging.
The default implementation does nothing.
Parameters: - protocol (ProcessProtocol) – The protocol instance which produced stdout
- stdout (string) – A complete line to stdout after it has been formatted and logged.
Returns: This method returns nothing by default and any return value produced by this method will not be consumed by other methods.
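A common reason to override process_stdout_line() is to pull structured information out of a tool's output. The sketch below extracts a percentage from each line; ProgressJobType, the progress attribute and the output format being matched are all hypothetical, and the class is a stand-in rather than a real job type subclass.

```python
import re

# Hypothetical output format: any line containing something like "42.5%".
PROGRESS = re.compile(r"(\d+(?:\.\d+)?)%")


class ProgressJobType:
    """Hypothetical stand-in for a job type subclass."""

    def __init__(self):
        self.progress = 0.0

    def process_stdout_line(self, protocol, stdout):
        match = PROGRESS.search(stdout)
        if match:
            # Store progress as a fraction between 0 and 1.
            self.progress = float(match.group(1)) / 100.0

    def process_stderr_line(self, protocol, stderr):
        # Mirrors the documented default: forward stderr to the stdout handler.
        self.process_stdout_line(protocol, stderr)


job = ProgressJobType()
job.process_stdout_line(None, "Rendering: 42.5% done")
job.process_stderr_line(None, "warning: texture missing")
```

Lines without a percentage, such as the stderr warning here, leave the stored progress unchanged.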
pyfarm.jobtypes.core.process module¶
Module responsible for connecting a Twisted process object and a job type. Additionally this module contains other classes which are useful in starting or managing a process.
- class pyfarm.jobtypes.core.process.ReplaceEnvironment(frozen_environment, environment=None)[source]¶
Bases: object
A context manager which will replace the contents of os.environ, or of a dictionary of your choosing, for a short period of time. After exiting the context manager the original environment will be restored.
This is useful if you have something like a process that’s using global environment and you want to ensure that global environment is always consistent.
Parameters: environment (dict) – If provided, use this as the environment dictionary instead of os.environ
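The swap-and-restore idea can be sketched with a small context manager. This is an illustrative re-creation, not pyfarm's class: SwapEnvironment is a hypothetical name, and the choice to clear and refill the target dictionary in place is an assumption. The example operates on a plain dictionary so that it does not disturb the real process environment.

```python
import os


class SwapEnvironment:
    """Hypothetical re-creation of the idea; not pyfarm's class."""

    def __init__(self, frozen_environment, environment=None):
        self.frozen_environment = frozen_environment
        # Fall back to os.environ when no dictionary is supplied.
        self.environment = os.environ if environment is None else environment
        self.original = None

    def __enter__(self):
        # Remember the current contents, then replace them in place.
        self.original = dict(self.environment)
        self.environment.clear()
        self.environment.update(self.frozen_environment)
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Restore the saved contents even if the body raised.
        self.environment.clear()
        self.environment.update(self.original)
        return False  # do not suppress exceptions


target = {"PATH": "/usr/bin", "TMP": "/tmp"}
with SwapEnvironment({"PATH": "/opt/render/bin"}, environment=target):
    inside = dict(target)
```

Mutating the dictionary in place, rather than rebinding a name, means any code holding a reference to the same dictionary sees the temporary values and then the restored ones.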
- class pyfarm.jobtypes.core.process.ProcessProtocol(jobtype)[source]¶
Bases: twisted.internet.protocol.ProcessProtocol
Subclass of Protocol which hooks into the various systems necessary to run and manage a process. More specifically, this helps to act as plumbing between the process being run and the job type.
- connectionMade()[source]¶
Called when the process first starts and the file descriptors have opened.
Module contents¶
Submodules¶
Module contents¶
Job Types¶
This package, pyfarm.jobtypes, contains the code which executes a task on an agent.