Diffstat (limited to 'taskcluster/docs')
-rw-r--r-- | taskcluster/docs/attributes.rst     | 124
-rw-r--r-- | taskcluster/docs/caches.rst         |  43
-rw-r--r-- | taskcluster/docs/docker-images.rst  |  42
-rw-r--r-- | taskcluster/docs/how-tos.rst        | 220
-rw-r--r-- | taskcluster/docs/index.rst          |  30
-rw-r--r-- | taskcluster/docs/kinds.rst          | 144
-rw-r--r-- | taskcluster/docs/loading.rst        |  31
-rw-r--r-- | taskcluster/docs/parameters.rst     |  97
-rw-r--r-- | taskcluster/docs/reference.rst      |  12
-rw-r--r-- | taskcluster/docs/taskgraph.rst      | 276
-rw-r--r-- | taskcluster/docs/transforms.rst     | 198
-rw-r--r-- | taskcluster/docs/yaml-templates.rst |  49
12 files changed, 1266 insertions, 0 deletions
diff --git a/taskcluster/docs/attributes.rst b/taskcluster/docs/attributes.rst new file mode 100644 index 0000000000..d93964d653 --- /dev/null +++ b/taskcluster/docs/attributes.rst @@ -0,0 +1,124 @@ +=============== +Task Attributes +=============== + +Tasks can be filtered, for example to support "try" pushes which only perform a +subset of the task graph or to link dependent tasks. This filtering is the +difference between a full task graph and a target task graph. + +Filtering takes place on the basis of attributes. Each task has a dictionary +of attributes, and filters over those attributes can be expressed in Python. A +task may not have a value for every attribute. + +The attributes, and acceptable values, are defined here. In general, attribute +names and values are the short, lower-case form, with underscores. + +kind +==== + +A task's ``kind`` attribute gives the name of the kind that generated it, e.g., +``build`` or ``spidermonkey``. + +run_on_projects +=============== + +The projects where this task should be in the target task set. This is how +requirements like "only run this on inbound" get implemented. These are +either project names or one of the aliases: + + * ``integration`` -- integration branches + * ``release`` -- release branches including mozilla-central + * ``all`` -- everywhere (the default) + +For try, this attribute applies only if ``-p all`` is specified. All jobs can +be specified by name regardless of ``run_on_projects``. + +If ``run_on_projects`` is set to an empty list, then the task will not run +anywhere, unless its build platform is specified explicitly in try syntax. + +task_duplicates +=============== + +This is used to indicate that we want multiple copies of the task created. +This feature is used to track down intermittent job failures. + +If this value is set to N, the task-creation machinery will create a total of N +copies of the task. 
Only the first copy will be included in the taskgraph +output artifacts, although all tasks will be contained in the same taskGroup. + +While most attributes are considered read-only, target task methods may alter +this attribute of tasks they include in the target set. + +build_platform +============== + +The build platform defines the platform for which the binary was built. It is +set for both build and test jobs, although test jobs may have a different +``test_platform``. + +build_type +========== + +The type of build being performed. This is a subdivision of ``build_platform``, +used for different kinds of builds that target the same platform. Values are + + * ``debug`` + * ``opt`` + +test_platform +============= + +The test platform defines the platform on which tests are run. It is only +defined for test jobs and may differ from ``build_platform`` when the same binary +is tested on several platforms (for example, on several versions of Windows). +This applies for both talos and unit tests. + +Unlike build_platform, the test platform is represented in a slash-separated +format, e.g., ``linux64/opt``. + +unittest_suite +============== + +This is the unit test suite being run in a unit test task. For example, +``mochitest`` or ``cppunittest``. + +unittest_flavor +=============== + +If a unittest suite has subdivisions, those are represented as flavors. Not +all suites have flavors, in which case this attribute should be set to match +the suite. Examples: ``mochitest-devtools-chrome-chunked`` or ``a11y``. + +unittest_try_name +================= + +This is the name used to refer to a unit test via try syntax. It +may not match either of ``unittest_suite`` or ``unittest_flavor``. + +talos_try_name +============== + +This is the name used to refer to a talos job via try syntax. + +test_chunk +========== + +This is the chunk number of a chunked test suite (talos or unittest). Note +that this is a string! 
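Since every task carries a plain dictionary of attributes, a filter over the attributes described above is just a small Python predicate. A minimal sketch (the helper and the sample attribute sets are illustrative, not the in-tree implementation):

```python
def matches(attributes, **conditions):
    """Return True if the task's attributes satisfy every condition.

    A condition value may be a single value or a list of acceptable
    values; a task with no value for an attribute never matches a
    condition on that attribute.
    """
    for name, expected in conditions.items():
        if name not in attributes:
            return False
        allowed = expected if isinstance(expected, list) else [expected]
        if attributes[name] not in allowed:
            return False
    return True

# Hypothetical attribute sets for three tasks:
tasks = [
    {"kind": "test", "unittest_suite": "mochitest", "test_chunk": "1"},
    {"kind": "test", "unittest_suite": "cppunittest", "test_chunk": "1"},
    {"kind": "build", "build_type": "debug"},
]

# Note that test_chunk would be compared as a string, per the above.
selected = [t for t in tasks
            if matches(t, kind="test", unittest_suite="mochitest")]
```

A target task method built on such a predicate keeps the filtering logic declarative: conditions are data, not code.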
+ +e10s +==== + +For test suites which distinguish whether they run with or without e10s, this +boolean value identifies this particular run. + +image_name +========== + +For the ``docker_image`` kind, this attribute contains the docker image name. + +nightly +======= + +Signals whether the task is part of a nightly graph. Useful when filtering +out nightly tasks from full task set at target stage. diff --git a/taskcluster/docs/caches.rst b/taskcluster/docs/caches.rst new file mode 100644 index 0000000000..9f19035d72 --- /dev/null +++ b/taskcluster/docs/caches.rst @@ -0,0 +1,43 @@ +.. taskcluster_caches: + +============= +Common Caches +============= + +There are various caches used by the in-tree tasks. This page attempts to +document them and their appropriate use. + +Version Control Caches +====================== + +``level-{{level}}-checkouts-{{version}}`` + This cache holds version control checkouts, each in a subdirectory named + after the repo (e.g., ``gecko``). + + Checkouts should be read-only. If a task needs to create new files from + content of a checkout, this content should be written in a separate + directory/cache (like a workspace). + + A ``version`` parameter appears in the cache name to allow + backwards-incompatible changes to the cache's behavior. + +``level-{{level}}-{{project}}-tc-vcs`` (deprecated) + This cache is used internally by ``tc-vcs``. This tool is deprecated and + should be replaced with ``hg robustcheckout``. + +Workspace Caches +================ + +``level-{{level}}-*-workspace`` + These caches (of various names typically ending with ``workspace``) + contain state to be shared between task invocations. Use cases are + dependent on the task. + +Other +===== + +``tooltool-cache`` + Tooltool invocations should use this cache. Tooltool will store files here + indexed by their hash, and will verify hashes before copying files from + this directory, so there is no concern with sharing the cache between jobs + of different levels. 
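The ``{{…}}`` placeholders in the cache names above expand in the obvious way. As a hypothetical sketch of how a checkout cache name is assembled (the version string ``v1`` is illustrative):

```python
def checkouts_cache_name(level, version):
    # Bumping `version` effectively abandons the old cache, which is how
    # backwards-incompatible changes to the cache's layout are rolled out.
    return "level-{}-checkouts-{}".format(level, version)

name = checkouts_cache_name(3, "v1")  # hypothetical level and version
```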
diff --git a/taskcluster/docs/docker-images.rst b/taskcluster/docs/docker-images.rst new file mode 100644 index 0000000000..22dea4dead --- /dev/null +++ b/taskcluster/docs/docker-images.rst @@ -0,0 +1,42 @@ +.. taskcluster_dockerimages: + +============= +Docker Images +============= + +TaskCluster Docker images are defined in the source directory under +``testing/docker``. Each directory therein contains the name of an +image used as part of the task graph. + +Adding Extra Files to Images +============================ + +Dockerfile syntax has been extended to allow *any* file from the +source checkout to be added to the image build *context*. (Traditionally +you can only ``ADD`` files from the same directory as the Dockerfile.) + +Simply add the following syntax as a comment in a Dockerfile:: + + # %include <path> + +e.g. + + # %include mach + # %include testing/mozharness + +The argument to ``# %include`` is a relative path from the root level of +the source directory. It can be a file or a directory. If a file, only that +file will be added. If a directory, every file under that directory will be +added (even files that are untracked or ignored by version control). + +Files added using ``# %include`` syntax are available inside the build +context under the ``topsrcdir/`` path. + +Files are added as they exist on disk. e.g. executable flags should be +preserved. However, the file owner/group is changed to ``root`` and the +``mtime`` of the file is normalized. + +Here is an example Dockerfile snippet:: + + # %include mach + ADD topsrcdir/mach /home/worker/mach diff --git a/taskcluster/docs/how-tos.rst b/taskcluster/docs/how-tos.rst new file mode 100644 index 0000000000..6b143dd427 --- /dev/null +++ b/taskcluster/docs/how-tos.rst @@ -0,0 +1,220 @@ +How Tos +======= + +All of this equipment is here to help you get your work done more efficiently. +However, learning how task-graphs are generated is probably not the work you +are interested in doing. 
This section should help you accomplish some of the +more common changes to the task graph with minimal fuss. + +.. important:: + + If you cannot accomplish what you need with the information provided here, + please consider whether you can achieve your goal in a different way. + Perhaps something simpler would cost a bit more in compute time, but save + the much more expensive resource of developers' mental bandwidth. + Task-graph generation is already complex enough! + + If you want to proceed, you may need to delve into the implementation of + task-graph generation. The documentation and code are designed to help, as + are the authors - ``hg blame`` may help track down helpful people. + + As you write your new transform or add a new kind, please consider the next + developer. Where possible, make your change data-driven and general, so + that others can make a much smaller change. Document the semantics of what + you are changing clearly, especially if it involves modifying a transform + schema. And if you are adding complexity temporarily while making a + gradual transition, please open a new bug to remind yourself to remove the + complexity when the transition is complete. + +Hacking Task Graphs +------------------- + +The recommended process for changing task graphs is this: + +1. Find a recent decision task on the project or branch you are working on, + and download its ``parameters.yml`` from the Task Inspector. This file + contains all of the inputs to the task-graph generation process. Its + contents are simple enough if you would like to modify it, and it is + documented in :doc:`parameters`. + +2. Run one of the ``mach taskgraph`` subcommands (see :doc:`taskgraph`) to + generate a baseline against which to measure your changes. For example: + + .. code-block:: none + + ./mach taskgraph tasks --json -p parameters.yml > old-tasks.json + +3. Make your modifications under ``taskcluster/``. + +4. 
Run the same ``mach taskgraph`` command, sending the output to a new file, + and use ``diff`` to compare the old and new files. Make sure your changes + have the desired effect and no undesirable side-effects. + +5. When you are satisfied with the changes, push them to try to ensure that the + modified tasks work as expected. + +Common Changes +-------------- + +Changing Test Characteristics +............................. + +First, find the test description. This will be in +``taskcluster/ci/*/tests.yml``, for the appropriate kind (consult +:doc:`kinds`). You will find a YAML stanza for each test suite, and each +stanza defines the test's characteristics. For example, the ``chunks`` +property gives the number of chunks to run. This can be specified as a simple +integer if all platforms have the same chunk count, or it can be keyed by test +platform. For example: + +.. code-block:: yaml + + chunks: + by-test-platform: + linux64/debug: 10 + default: 8 + +The full set of available properties is in +``taskcluster/taskgraph/transform/tests/test_description.py``. Some other +commonly-modified properties are ``max-run-time`` (useful if tests are being +killed for exceeding maxRunTime) and ``treeherder-symbol``. + +.. note:: + + Android tests are also chunked at the mozharness level, so you will need to + modify the relevant mozharness config, as well. + +Adding a Test Suite +................... + +To add a new test suite, you will need to know the proper mozharness invocation +for that suite, and which kind it fits into (consult :doc:`kinds`). + +Add a new stanza to ``taskcluster/ci/<kind>/tests.yml``, copying from the other +stanzas in that file. The meanings should be clear, but authoritative +documentation is in +``taskcluster/taskgraph/transform/tests/test_description.py`` should you need +it. The stanza name is the name by which the test will be referenced in try +syntax. + +Add your new test to a test set in ``test-sets.yml`` in the same directory. 
If +the test should only run on a limited set of platforms, you may need to define +a new test set and reference that from the appropriate platforms in +``test-platforms.yml``. If you do so, include some helpful comments in +``test-sets.yml`` for the next person. + +Greening Up a New Test +...................... + +When a test is not yet reliably green, configuration for that test should not +be landed on integration branches. Of course, you can control where the +configuration is landed! For many cases, it is easiest to green up a test in +try: push the configuration to run the test to try along with your work to fix +the remaining test failures. + +When working with a group, check out a "twig" repository to share among your +group, and land the test configuration in that repository. Once the test is +green, merge to an integration branch and the test will begin running there as +well. + +Adding a New Task +................. + +If you are adding a new task that is not a test suite, there are a number of +options. A few questions to consider: + + * Is this a new build platform or variant that will produce an artifact to + be run through the usual test suites? + + * Does this task depend on other tasks? Do other tasks depend on it? + + * Is this one of a few related tasks, or will you need to generate a large + set of tasks using some programmatic means (for example, chunking)? + + * How is the task actually executed? Mozharness? Mach? + + * What kind of environment does the task require? + +Armed with that information, you can choose among a few options for +implementing this new task. Try to choose the simplest solution that will +satisfy your near-term needs. Since this is all implemented in-tree, it +is not difficult to refactor later when you need more generality. + +Existing Kind +````````````` + +The simplest option is to add your task to an existing kind. 
This is most +practical when the task "makes sense" as part of that kind -- for example, if +your task is building an installer for a new platform using mozharness scripts +similar to the existing build tasks, it makes most sense to add your task to +the ``build`` kind. If you need some additional functionality in the kind, +it's OK to modify the implementation as necessary, as long as the modification +is complete and useful to the next developer to come along. + +New Kind +```````` + +The next option to consider is adding a new kind. A distinct kind gives you +some isolation from other task types, which can be nice if you are adding an +experimental kind of task. + +Kinds can range in complexity. The simplest sort of kind uses the +``TransformTask`` implementation to read a list of jobs from the ``jobs`` key, +and applies the standard ``job`` and ``task`` transforms: + +.. code-block:: yaml + + implementation: taskgraph.task.transform:TransformTask + transforms: + - taskgraph.transforms.job:transforms + - taskgraph.transforms.task:transforms + jobs: + - ..your job description here.. + +Custom Kind Implementation +`````````````````````````` + +If your task depends on other tasks, then the decision of which tasks to create +may require some code. For example, the ``upload-symbols`` kind iterates over +the builds in the graph, generating a task for each one. This specific +post-build behavior is implemented in the general +``taskgraph.task.post_build:PostBuildTask`` kind implementation. If your task +needs something more purpose-specific, then it may be time to write a new kind +implementation. + +Custom Transforms +````````````````` + +If your task needs to create many tasks from a single description, for example +to implement chunking, it is time to implement some custom transforms. Ideally +those transforms will produce job descriptions, so you can use the existing ``job`` +and ``task`` transforms: + +.. 
code-block:: yaml + + transforms: + - taskgraph.transforms.my_stuff:transforms + - taskgraph.transforms.job:transforms + - taskgraph.transforms.task:transforms + +Similarly, if you need to include dynamic task defaults -- perhaps some feature +is only available in level-3 repositories, or on specific projects -- then +custom transforms are the appropriate tool. Try to keep transforms simple, +single-purpose and well-documented! + +Custom Run-Using +```````````````` + +If the way your task is executed is unique (so, not a mach command or +mozharness invocation), you can add a new implementation of the job +description's "run" section. Before you do this, consider that it might be a +better investment to modify your task to support invocation via mozharness or +mach, instead. If this is not possible, then adding a new file in +``taskcluster/taskgraph/transforms/jobs`` with a structure similar to its peers +will make the new run-using option available for job descriptions. + +Something Else? +............... + +If you make another change not described here that turns out to be simple or +common, please include an update to this file in your patch. diff --git a/taskcluster/docs/index.rst b/taskcluster/docs/index.rst new file mode 100644 index 0000000000..d1a5c600b4 --- /dev/null +++ b/taskcluster/docs/index.rst @@ -0,0 +1,30 @@ +.. taskcluster_index: + +TaskCluster Task-Graph Generation +================================= + +The ``taskcluster`` directory contains support for defining the graph of tasks +that must be executed to build and test the Gecko tree. This is more complex +than you might suppose! 
This implementation supports: + + * A huge array of tasks + * Different behavior for different repositories + * "Try" pushes, with special means to select a subset of the graph for execution + * Optimization -- skipping tasks that have already been performed + * Extremely flexible generation of a variety of tasks using an approach of + incrementally transforming job descriptions into task definitions. + +This section of the documentation describes the process in some detail, +referring to the source where necessary. If you are reading this with a +particular goal in mind and would rather avoid becoming a task-graph expert, +check out the :doc:`how-to section <how-tos>`. + +.. toctree:: + + taskgraph + loading + transforms + yaml-templates + docker-images + how-tos + reference diff --git a/taskcluster/docs/kinds.rst b/taskcluster/docs/kinds.rst new file mode 100644 index 0000000000..44bddb360b --- /dev/null +++ b/taskcluster/docs/kinds.rst @@ -0,0 +1,144 @@ +Task Kinds +========== + +This section lists and documents the available task kinds. + +build +------ + +Builds are tasks that produce an installer or other output that can be run by +users or automated tests. This is more restrictive than most definitions of +"build" in a Mozilla context: it does not include tasks that run build-like +actions for static analysis or to produce instrumented artifacts. + +artifact-build +-------------- + +This kind performs an artifact build: one based on precompiled binaries +discovered via the TaskCluster index. This task verifies that such builds +continue to work correctly. + +hazard +------ + +Hazard builds are similar to "regular" builds, but use a compiler extension to +extract a bunch of data from the build and then analyze that data looking for +hazardous behaviors. + +l10n +---- + +TBD (Callek) + +source-check +------------ + +Source-checks are tasks that look at the Gecko source directly to check +correctness. 
This can include linting, Python unit tests, source-code +analysis, or measurement work -- basically anything that does not require a +build. + +upload-symbols +-------------- + +Upload-symbols tasks run after builds and upload the symbols files generated by +build tasks to Socorro for later use in crash analysis. + +valgrind +-------- + +Valgrind tasks produce builds instrumented by valgrind. + +static-analysis +--------------- + +Static analysis builds use the compiler to perform some detailed analysis of +the source code while building. The useful outputs from these tasks are their +build logs, and while they produce a binary, they do not upload it as an +artifact. + +toolchain +--------- + +Toolchain builds create the compiler toolchains used to build Firefox. These +will eventually be dependencies of the builds themselves, but for the moment +are run manually via try pushes and the results uploaded to tooltool. + +spidermonkey +------------ + +Spidermonkey tasks check out the full gecko source tree, then compile only the +spidermonkey portion. Each task runs specific tests after the build. + +marionette-harness +------------------ + +TBD (Maja) + +Tests +----- + +Test tasks for Gecko products are divided into several kinds, but share a +common implementation. The process goes like this, based on a set of YAML +files named in ``kind.yml``: + + * For each build task, determine the related test platforms based on the build + platform. For example, a Windows 2010 build might be tested on Windows 7 + and Windows 10. Each test platform specifies a "test set" indicating which + tests to run. This is configured in the file named + ``test-platforms.yml``. + + * Each test set is expanded to a list of tests to run. This is configured in + the file named by ``test-sets.yml``. + + * Each named test is looked up in the file named by ``tests.yml`` to find a + test description. 
This test description indicates what the test does, how + it is reported to treeherder, and how to perform the test, all in a + platform-independent fashion. + + * Each test description is converted into one or more tasks. This is + performed by a sequence of transforms defined in the ``transforms`` key in + ``kind.yml``. See :doc:`transforms` for more information on these + transforms. + + * The resulting tasks become a part of the task graph. + +.. important:: + + This process generates *all* test jobs, regardless of tree or try syntax. + It is up to a later stage of the task-graph generation (the target set) to + select the tests that will actually be performed. + +desktop-test +............ + +The ``desktop-test`` kind defines tests for Desktop builds. Its ``tests.yml`` +defines the full suite of desktop tests and their particulars, leaving it to +the transforms to determine how those particulars apply to Linux, OS X, and +Windows. + +android-test +............ + +The ``android-test`` kind defines tests for Android builds. + +It is very similar to ``desktop-test``, but the details of running the tests +differ substantially, so they are defined separately. + +docker-image +------------ + +Tasks of the ``docker-image`` kind build the Docker images in which other +Docker tasks run. + +The tasks to generate each docker image have predictable labels: +``build-docker-image-<name>``. + +Docker images are built from subdirectories of ``testing/docker``, using +``docker build``. There is currently no capability for one Docker image to +depend on another in-tree docker image, without uploading the latter to a +Docker repository. + +The task definition used to create the image-building tasks is given in +``image.yml`` in the kind directory, and is interpreted as a :doc:`YAML +Template <yaml-templates>`. 
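Because the labels are predictable, other parts of graph generation can compute the label of an image's build task directly instead of searching for it. A trivial sketch of the naming convention above (the image name is just an example):

```python
def docker_image_task_label(image_name):
    """Label of the task that builds the given in-tree Docker image."""
    return "build-docker-image-{}".format(image_name)

# e.g. looking up a hypothetical "desktop-test" image's build task in a
# label-keyed task graph:
label = docker_image_task_label("desktop-test")
```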
diff --git a/taskcluster/docs/loading.rst b/taskcluster/docs/loading.rst new file mode 100644 index 0000000000..1fa3c50f1e --- /dev/null +++ b/taskcluster/docs/loading.rst @@ -0,0 +1,31 @@ +Loading Tasks +============= + +The full task graph generation involves creating tasks for each kind. Kinds +are ordered to satisfy ``kind-dependencies``, and then the ``implementation`` +specified in ``kind.yml`` is used to load the tasks for that kind. + +Specifically, the class's ``load_tasks`` class method is called, and returns a +list of new ``Task`` instances. + +TransformTask +------------- + +Most kinds generate their tasks by starting with a set of items describing the +jobs that should be performed and transforming them into task definitions. +This is the familiar ``transforms`` key in ``kind.yml`` and is further +documented in :doc:`transforms`. + +Such kinds generally specify their tasks in a common format: either based on a +``jobs`` property in ``kind.yml``, or on YAML files listed in ``jobs-from``. +This is handled by the ``TransformTask`` class in +``taskcluster/taskgraph/task/transform.py``. + +For kinds producing tasks that depend on other tasks -- for example, signing +tasks depend on build tasks -- ``TransformTask`` has a ``get_inputs`` method +that can be overridden in subclasses and written to return a set of items based +on tasks that already exist. You can see a nice example of this behavior in +``taskcluster/taskgraph/task/post_build.py``. + +For more information on how all of this works, consult the docstrings and +comments in the source code itself. diff --git a/taskcluster/docs/parameters.rst b/taskcluster/docs/parameters.rst new file mode 100644 index 0000000000..8514259cef --- /dev/null +++ b/taskcluster/docs/parameters.rst @@ -0,0 +1,97 @@ +========== +Parameters +========== + +Task-graph generation takes a collection of parameters as input, in the form of +a JSON or YAML file. 
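Since the parameters file is plain JSON or YAML, loading one locally for experimentation is straightforward. A sketch using the JSON form (the required-key list here is an illustrative subset, not the authoritative schema):

```python
import json

# Illustrative subset of parameter names; see the descriptions below for
# the full set.
REQUIRED = {"project", "level", "head_repository", "head_rev"}

def load_parameters(path):
    """Load a parameters file and complain about missing keys (sketch)."""
    with open(path) as f:
        parameters = json.load(f)
    missing = REQUIRED - set(parameters)
    if missing:
        raise KeyError("missing parameters: {}".format(", ".join(sorted(missing))))
    return parameters
```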
+ +During decision-task processing, some of these parameters are supplied on the +command line or by environment variables. The decision task helpfully produces +a full parameters file as one of its output artifacts. The other ``mach +taskgraph`` commands can take this file as input. This can be very helpful +when working on a change to the task graph. + +When experimenting with local runs of the task-graph generation, it is always +best to find a recent decision task's ``parameters.yml`` file, and modify that +file if necessary, rather than starting from scratch. This ensures you have a +complete set of parameters. + +The properties of the parameters object are described here, divided roughly by +topic. + +Push Information +---------------- + +``triggered_by`` + The event that precipitated this decision task; one of ``"nightly"`` or + ``"push"``. + +``base_repository`` + The repository from which to do an initial clone, utilizing any available + caching. + +``head_repository`` + The repository containing the changeset to be built. This may differ from + ``base_repository`` in cases where ``base_repository`` is likely to be cached + and only a few additional commits are needed from ``head_repository``. + +``head_rev`` + The revision to check out; this can be a short revision string. + +``head_ref`` + For Mercurial repositories, this is the same as ``head_rev``. For + git repositories, which do not allow pulling explicit revisions, this gives + the symbolic ref containing ``head_rev`` that should be pulled from + ``head_repository``. + +``owner`` + Email address indicating the person who made the push. Note that this + value may be forged and *must not* be relied on for authentication. + +``message`` + The commit message. + +``pushlog_id`` + The ID from the ``hg.mozilla.org`` pushlog. + +``pushdate`` + The timestamp of the push to the repository that triggered this decision + task. Expressed as an integer number of seconds since the UNIX epoch. 
+ +``build_date`` + The timestamp of the build date. Defaults to ``pushdate`` and falls back to the + present time of the taskgraph invocation. Expressed as an integer number of + seconds since the UNIX epoch. + +``moz_build_date`` + A formatted timestamp of ``build_date``. Expressed as a string with the following + format: ``%Y%m%d%H%M%S``. + +Tree Information +---------------- + +``project`` + Another name for what may otherwise be called tree or branch or + repository. This is the unqualified name, such as ``mozilla-central`` or + ``cedar``. + +``level`` + The `SCM level + <https://www.mozilla.org/en-US/about/governance/policies/commit/access-policy/>`_ + associated with this tree. This dictates the names of resources used in the + generated tasks, and those tasks will fail if it is incorrect. + +Target Set +---------- + +The "target set" is the set of task labels which must be included in a task +graph. The task graph generation process will include any tasks required by +those in the target set, recursively. In a decision task, this set can be +specified programmatically using one of a variety of methods (e.g., parsing try +syntax or reading a project-specific configuration file). + +``target_tasks_method`` + The method to use to determine the target task set. This is the suffix of + one of the functions in ``taskcluster/taskgraph/target_tasks.py``. + +``optimize_target_tasks`` + If true, then target tasks are eligible for optimization. diff --git a/taskcluster/docs/reference.rst b/taskcluster/docs/reference.rst new file mode 100644 index 0000000000..813a3f630a --- /dev/null +++ b/taskcluster/docs/reference.rst @@ -0,0 +1,12 @@ +Reference +========= + +These sections contain some reference documentation for various aspects of +taskgraph generation. + +.. 
toctree:: + + kinds + parameters + attributes + caches diff --git a/taskcluster/docs/taskgraph.rst b/taskcluster/docs/taskgraph.rst new file mode 100644 index 0000000000..5d3e7c7d3f --- /dev/null +++ b/taskcluster/docs/taskgraph.rst @@ -0,0 +1,276 @@ +====================== +TaskGraph Mach Command +====================== + +The task graph is built by linking different kinds of tasks together, pruning +out tasks that are not required, then optimizing by replacing subgraphs with +links to already-completed tasks. + +Concepts +-------- + +* *Task Kind* - Tasks are grouped by kind, where tasks of the same kind do not + have interdependencies but have substantial similarities, and may depend on + tasks of other kinds. Kinds are the primary means of supporting diversity, + in that a developer can add a new kind to do just about anything without + impacting other kinds. + +* *Task Attributes* - Tasks have string attributes which can be used for + filtering. Attributes are documented in :doc:`attributes`. + +* *Task Labels* - Each task has a unique identifier within the graph that is + stable across runs of the graph generation algorithm. Labels are replaced + with TaskCluster TaskIds at the latest time possible, facilitating analysis + of graphs without distracting noise from randomly-generated taskIds. + +* *Optimization* - replacement of a task in a graph with an equivalent, + already-completed task, or a null task, avoiding repetition of work. + +Kinds +----- + +Kinds are the focal point of this system. They provide an interface between +the large-scale graph-generation process and the small-scale task-definition +needs of different kinds of tasks. Each kind may implement task generation +differently. Some kinds may generate task definitions entirely internally (for +example, symbol-upload tasks are all alike, and very simple), while other kinds +may do little more than parse a directory of YAML files. 
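The ``implementation`` key described below takes a ``<module-path>:<object-path>`` reference; resolving such a reference to a Python object is a small helper. A sketch (not the in-tree loader):

```python
import importlib

def resolve(spec):
    """Resolve a reference like 'taskgraph.task.transform:TransformTask'."""
    module_path, _, object_path = spec.partition(":")
    obj = importlib.import_module(module_path)
    for name in object_path.split("."):
        obj = getattr(obj, name)
    return obj

# Works for any importable module, e.g. the standard library:
join = resolve("os.path:join")
```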
+ +A ``kind.yml`` file contains data about the kind, as well as referring to a +Python class implementing the kind in its ``implementation`` key. That +implementation may rely on lots of code shared with other kinds, or contain a +completely unique implementation of some functionality. + +The full list of pre-defined keys in this file is: + +``implementation`` + Class implementing this kind, in the form ``<module-path>:<object-path>``. + This class should be a subclass of ``taskgraph.kind.base:Kind``. + +``kind-dependencies`` + Kinds which should be loaded before this one. This is useful when the kind + will use the list of already-created tasks to determine which tasks to + create, for example adding an upload-symbols task after every build task. + +Any other keys are subject to interpretation by the kind implementation. + +The result is a nice segmentation of implementation so that the more esoteric +in-tree projects can do their crazy stuff in an isolated kind without making +the bread-and-butter build and test configuration more complicated. + +Dependencies +------------ + +Dependencies between tasks are represented as labeled edges in the task graph. +For example, a test task must depend on the build task creating the artifact it +tests, and this dependency edge is named 'build'. The task graph generation +process later resolves these dependencies to specific taskIds. + +Decision Task +------------- + +The decision task is the first task created when a new graph begins. It is +responsible for creating the rest of the task graph. + +The decision task for pushes is defined in-tree, in ``.taskcluster.yml``. That +task description invokes ``mach taskcluster decision`` with some metadata about +the push. That mach command determines the optimized task graph, then calls +the TaskCluster API to create the tasks. + +Note that this mach command is *not* designed to be invoked directly by humans. 
+Instead, use the mach commands described below, supplying ``parameters.yml`` +from a recent decision task. These commands allow testing everything the +decision task does except the command-line processing and the +``queue.createTask`` calls. + +Graph Generation +---------------- + +Graph generation, as run via ``mach taskgraph decision``, proceeds as follows: + +#. For all kinds, generate all tasks. The result is the "full task set". +#. Create dependency links between tasks using kind-specific mechanisms. The + result is the "full task graph". +#. Select the target tasks (based on try syntax or a tree-specific + specification). The result is the "target task set". +#. Based on the full task graph, calculate the transitive closure of the target + task set. That is, the target tasks and all requirements of those tasks. + The result is the "target task graph". +#. Optimize the target task graph based on kind-specific optimization methods. + The result is the "optimized task graph" with fewer nodes than the target + task graph. +#. Create tasks for all tasks in the optimized task graph. + +Transitive Closure +.................. + +Transitive closure is a fancy name for this sort of operation: + + * start with a set of tasks + * add all tasks on which any of those tasks depend + * repeat until nothing changes + +The effect is this: imagine you start with a linux32 test job and a linux64 test job. +In the first round, each test task depends on the test docker image task, so add that image task. +Each test also depends on a build, so add the linux32 and linux64 build tasks. + +Then repeat: the test docker image task is already present, as are the build +tasks, but those build tasks depend on the build docker image task. So add +that build docker image task. Repeat again: this time, none of the tasks in +the set depend on a task not in the set, so nothing changes and the process is +complete.
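The loop just described can be sketched in a few lines of Python (a minimal illustration, not the real implementation):

.. code-block:: python

   def transitive_closure(target_tasks, dependencies):
       """Expand target_tasks to everything it transitively depends on.
       `dependencies` maps each label to the labels of its direct
       dependencies."""
       closure = set(target_tasks)
       while True:
           new = {dep for label in closure
                  for dep in dependencies.get(label, ())}
           if new <= closure:  # nothing changed; we are done
               return closure
           closure |= new

   # The linux32/linux64 example from the text:
   deps = {
       "test-linux32": {"build-linux32", "test-image"},
       "test-linux64": {"build-linux64", "test-image"},
       "build-linux32": {"build-image"},
       "build-linux64": {"build-image"},
   }
   closure = transitive_closure({"test-linux32", "test-linux64"}, deps)
   # closure holds the tests, both builds, and both docker image tasks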
+ +And as you can see, the graph we've built now includes everything we wanted +(the test jobs) plus everything required to do that (docker images, builds). + +Optimization +------------ + +The objective of optimization is to remove as many tasks from the graph as +possible, as efficiently as possible, thereby delivering useful results as +quickly as possible. For example, ideally if only a test script is modified in +a push, then the resulting graph contains only the corresponding test suite +task. + +A task is said to be "optimized" when it is either replaced with an equivalent, +already-existing task, or dropped from the graph entirely. + +A task can be optimized if all of its dependencies can be optimized and none of +its inputs have changed. For a task on which no other tasks depend (a "leaf +task"), the optimizer can determine what has changed by looking at the +version-control history of the push: if the relevant files are not modified in +the push, then it considers the inputs unchanged. For tasks on which other +tasks depend ("non-leaf tasks"), the optimizer must replace the task with +another, equivalent task, so it generates a hash of all of the inputs and uses +that to search for a matching, existing task. + +In some cases, such as try pushes, tasks in the target task set have been +explicitly requested and are thus excluded from optimization. In other cases, +the target task set is almost the entire task graph, so targeted tasks are +considered for optimization. This behavior is controlled with the +``optimize_target_tasks`` parameter. + +Action Tasks +------------ + +Action Tasks are tasks which help you to schedule new jobs via Treeherder's +"Add New Jobs" feature.
The Decision Task creates a YAML file named +``action.yml`` which can be used to schedule Action Tasks after suitably replacing +``{{decision_task_id}}`` and ``{{task_labels}}``, which correspond to the decision +task ID of the push and a comma-separated list of task labels which need to be +scheduled. + +This task invokes ``mach taskgraph action-task`` which builds up a task graph of +the requested tasks. This graph is optimized using the tasks running initially in +the same push, due to the decision task. + +So for instance, if you had already requested a build task in the ``try`` command, +and you wish to add a test which depends on this build, the original build task +is re-used. + +Action Tasks are currently scheduled by +`pulse_actions <https://github.com/mozilla/pulse_actions>`_. This feature is only +present on ``try`` pushes for now. + +Mach commands +------------- + +A number of mach subcommands are available aside from ``mach taskgraph +decision`` to make this complex system more accessible to those trying to +understand or modify it. They allow you to run portions of the +graph-generation process and output the results. + +``mach taskgraph tasks`` + Get the full task set + +``mach taskgraph full`` + Get the full task graph + +``mach taskgraph target`` + Get the target task set + +``mach taskgraph target-graph`` + Get the target task graph + +``mach taskgraph optimized`` + Get the optimized task graph + +Each of these commands takes a ``--parameters`` option giving a file with +parameters to guide the graph generation. The decision task helpfully produces +such a file on every run, and that is generally the easiest way to get a +parameter file. The parameter keys and values are described in +:doc:`parameters`; using that information, you may modify an existing +``parameters.yml`` or create your own.
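Filling in the ``action.yml`` placeholders described under "Action Tasks" above can be sketched as follows. The helper name is hypothetical; the real template is consumed by pulse_actions:

.. code-block:: python

   def render_action_task(action_yml, decision_task_id, task_labels):
       """Substitute the two placeholders in the action.yml template.
       `task_labels` is the list of labels to be scheduled."""
       return (action_yml
               .replace("{{decision_task_id}}", decision_task_id)
               .replace("{{task_labels}}", ",".join(task_labels)))

   rendered = render_action_task(
       "task-id: {{decision_task_id}}\nlabels: {{task_labels}}",
       "abc123",
       ["build-linux64", "test-linux64/opt-mochitest-1"])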
+ +Task Parameterization +--------------------- + +A few components of tasks are only known at the very end of the decision task +-- just before the ``queue.createTask`` call is made. These are specified +using simple parameterized values, as follows: + +``{"relative-datestamp": "certain number of seconds/hours/days/years"}`` + Objects of this form will be replaced with an offset from the current time + just before the ``queue.createTask`` call is made. For example, an + artifact expiration might be specified as ``{"relative-datestamp": "1 + year"}``. + +``{"task-reference": "string containing <dep-name>"}`` + The task definition may contain "task references" of this form. These will + be replaced during the optimization step, with the appropriate taskId for + the named dependency substituted for ``<dep-name>`` in the string. + Multiple labels may be substituted in a single string, and ``<<>`` can be + used to escape a literal ``<``. + +Taskgraph JSON Format +--------------------- + +Task graphs -- both the graph artifacts produced by the decision task and those +output by the ``--json`` option to the ``mach taskgraph`` commands -- are JSON +objects, keyed by label, or for optimized task graphs, by taskId. For +convenience, the decision task also writes out ``label-to-taskid.json`` +containing a mapping from label to taskId. Each task in the graph is +represented as a JSON object. + +Each task has the following properties: + +``task_id`` + The task's taskId (only for optimized task graphs) + +``label`` + The task's label + +``attributes`` + The task's attributes + +``dependencies`` + The task's in-graph dependencies, represented as an object mapping + dependency name to label (or to taskId for optimized task graphs) + +``task`` + The task's TaskCluster task definition. + +``kind_implementation`` + The module and the class name which was used to implement this particular task.
+ It is always of the form ``<module-path>:<object-path>``. + +The results from each command are in the same format, but with some differences +in the content: + +* The ``tasks`` and ``target`` subcommands both return graphs with no edges. + That is, just collections of tasks without any dependencies indicated. + +* The ``optimized`` subcommand returns tasks that have been assigned taskIds. + The dependencies array, too, contains taskIds instead of labels, with + dependencies on optimized tasks omitted. However, the ``task.dependencies`` + array is populated with the full list of dependency taskIds. All task + references are resolved in the optimized graph. + +The output of the ``mach taskgraph`` commands is suitable for processing with +the `jq <https://stedolan.github.io/jq/>`_ utility. For example, to extract all +tasks' labels and their dependencies: + +.. code-block:: shell + + jq 'to_entries | map({label: .value.label, dependencies: .value.dependencies})' + diff --git a/taskcluster/docs/transforms.rst b/taskcluster/docs/transforms.rst new file mode 100644 index 0000000000..1679c55894 --- /dev/null +++ b/taskcluster/docs/transforms.rst @@ -0,0 +1,198 @@ +Transforms +========== + +Many task kinds generate tasks by a process of transforming job descriptions +into task definitions. The basic operation is simple, although the sequence of +transforms applied for a particular kind may not be! + +Overview +-------- + +To begin, a kind implementation generates a collection of items; see +:doc:`loading`. The items are simply Python dictionaries, and describe +"semantically" what the resulting task or tasks should do. + +The kind also defines a sequence of transformations. These are applied, in +order, to each item. Early transforms might apply default values or break +items up into smaller items (for example, chunking a test suite). Later +transforms rewrite the items entirely, with the final result being a task +definition. + +Transform Functions +...................
+ +Each transformation looks like this: + +.. code-block:: python + + @transforms.add + def transform_an_item(config, items): + """This transform ...""" # always a docstring! + for item in items: + # .. + yield item + +The ``config`` argument is a Python object containing useful configuration for +the kind, and is a subclass of +:class:`taskgraph.transforms.base.TransformConfig`, which specifies a few of +its attributes. Kinds may subclass and add additional attributes if necessary. + +While most transforms yield one item for each item consumed, this is not always +true: items that are not yielded are effectively filtered out. Yielding +multiple items for each consumed item implements item duplication; this is how +test chunking is accomplished, for example. + +The ``transforms`` object is an instance of +:class:`taskgraph.transforms.base.TransformSequence`, which serves as a simple +mechanism to combine a sequence of transforms into one. + +Schemas +....... + +The items used in transforms are validated against some simple schemas at +various points in the transformation process. These schemas accomplish two +things: they provide a place to add comments about the meaning of each field, +and they enforce that the fields are actually used in the documented fashion. + +Keyed By +........ + +Several fields in the input items can be "keyed by" another value in the item. +For example, a test description's chunks may be keyed by ``test-platform``. +In the item, this looks like: + +.. code-block:: yaml + + chunks: + by-test-platform: + linux64/debug: 12 + linux64/opt: 8 + default: 10 + +This is a simple but powerful way to encode business rules in the items +provided as input to the transforms, rather than expressing those rules in the +transforms themselves. If you are implementing a new business rule, prefer +this mode where possible. The structure is easily resolved to a single value +using :func:`taskgraph.transforms.base.get_keyed_by`.
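Resolving a keyed-by structure might look roughly like this (a simplified sketch; the real helper's signature and error handling differ):

.. code-block:: python

   def get_keyed_by(item, field, key):
       """Resolve item[field], which may be a by-* keyed structure,
       to a single value for the given key."""
       value = item[field]
       if not isinstance(value, dict) or len(value) != 1:
           return value  # a plain value, not a keyed-by structure
       (by_key, alternatives), = value.items()
       assert by_key.startswith("by-"), "unexpected keyed-by form"
       return alternatives.get(key, alternatives.get("default"))

   item = {"chunks": {"by-test-platform": {
       "linux64/debug": 12, "linux64/opt": 8, "default": 10}}}
   get_keyed_by(item, "chunks", "linux64/debug")  # 12
   get_keyed_by(item, "chunks", "macosx64/opt")   # 10 (falls back to default)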
+ +Organization +------------- + +Task creation operates broadly in a few phases, with the interfaces of those +stages defined by schemas. The process begins with the raw data structures +parsed from the YAML files in the kind configuration. This data can be processed +by kind-specific transforms resulting, for test jobs, in a "test description". +For non-test jobs, the next step is a "job description". These transformations +may also "duplicate" tasks, for example to implement chunking or several +variations of the same task. + +In any case, shared transforms then convert this into a "task description", +which the task-generation transforms then convert into a task definition +suitable for ``queue.createTask``. + +Test Descriptions +----------------- + +The transforms configured for test kinds proceed as follows, based on +configuration in ``kind.yml``: + + * The test description is validated to conform to the schema in + ``taskcluster/taskgraph/transforms/tests/test_description.py``. This schema + is extensively documented and is the primary reference for anyone + modifying tests. + + * Kind-specific transformations are applied. These may apply default + settings, split tests (e.g., one to run with feature X enabled, one with it + disabled), or apply across-the-board business rules such as "all desktop + debug test platforms should have a max-run-time of 5400s". + + * Transformations generic to all tests are applied. These enforce policies + which apply to multiple kinds, e.g., for Treeherder tiers. This is also the + place where most values which differ based on platform are resolved, and + where chunked tests are split out into a test per chunk. + + * The test is again validated against the same schema. At this point it is + still a test description, just with defaults and policies applied, and + per-platform options resolved.
So transforms up to this point do not modify + the "shape" of the test description, and are still governed by the schema in + ``test_description.py``. + + * The ``taskgraph.transforms.tests.make_task_description:transforms`` then + take the test description and create a *task* description. This transform + embodies the specifics of how test runs work: invoking mozharness, various + worker options, and so on. + + * Finally, the ``taskgraph.transforms.task:transforms``, described below + under "Task Descriptions", are applied. + +Test dependencies are produced in the form of a dictionary mapping dependency +name to task label. + +Job Descriptions +---------------- + +A job description says what to run in the task. It is a combination of a +``run`` section and all of the fields from a task description. The run section +has a ``using`` property that defines how this task should be run; for example, +``mozharness`` to run a mozharness script, or ``mach`` to run a mach command. +The remainder of the run section is specific to the run-using implementation. + +The effect of a job description is to say "run this thing on this worker". The +job description must contain enough information about the worker to identify +the workerType and the implementation (docker-worker, generic-worker, etc.). +Any other task-description information is passed along verbatim, although it is +augmented by the run-using implementation. + +The run-using implementations are all located in +``taskcluster/taskgraph/transforms/job``, along with the schemas for their +implementations. Those well-commented source files are the canonical +documentation for what constitutes a job description, and should be considered +part of the documentation. + +Task Descriptions +----------------- + +Every kind needs to create tasks, and all of those tasks have some things in +common. They all run on one of a small set of worker implementations, each +with their own idiosyncrasies.
And they all report to Treeherder in a similar +way. + +The transforms in ``taskcluster/taskgraph/transforms/task.py`` implement +this common functionality. They expect a "task description", and produce a +task definition. The schema for a task description is defined at the top of +``task.py``, with copious comments. Go forth and read it now! + +In general, the task-description transforms handle functionality that is common +to all Gecko tasks. While the schema is the definitive reference, the +functionality includes: + +* Treeherder metadata + +* Build index routes + +* Information about the projects on which this task should run + +* Optimizations + +* Defaults for ``expires-after`` and ``deadline-after``, based on project + +* Worker configuration + +The parts of the task description that are specific to a worker implementation +are isolated in a ``task_description['worker']`` object which has an +``implementation`` property naming the worker implementation. Each worker +implementation has its own section of the schema describing the fields it +expects. Thus the transforms that produce a task description must be aware of +the worker implementation to be used, but need not be aware of the details of +its payload format. + +The ``task.py`` file also contains a dictionary mapping Treeherder group +symbols to group names, using an internal list of group names. Feel free to add +additional groups to this list as necessary. + +More Detail +----------- + +The source files provide lots of additional detail, both in the code itself and +in the comments and docstrings. For the next level of detail beyond this file, +consult the transform source under ``taskcluster/taskgraph/transforms``.
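To make the transform machinery described earlier concrete, here is a minimal sketch of a transform sequence with a chunk-splitting transform. This is a simplified stand-in; the real ``TransformSequence`` lives in ``taskcluster/taskgraph/transforms/base.py``:

.. code-block:: python

   class TransformSequence:
       """Collect transforms and apply them in order -- a minimal sketch."""

       def __init__(self):
           self._transforms = []

       def add(self, func):
           self._transforms.append(func)
           return func

       def __call__(self, config, items):
           for transform in self._transforms:
               items = transform(config, items)
           return items

   transforms = TransformSequence()

   @transforms.add
   def split_chunks(config, tests):
       """Yield one item per chunk: one item consumed, several yielded."""
       for test in tests:
           for this_chunk in range(1, test["chunks"] + 1):
               yield dict(test, this_chunk=this_chunk)

   result = list(transforms(None, [{"name": "mochitest", "chunks": 3}]))
   # result contains three items, with this_chunk set to 1, 2, and 3

A transform that yielded nothing for some items would instead act as a filter, as the "Transform Functions" section notes.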
diff --git a/taskcluster/docs/yaml-templates.rst b/taskcluster/docs/yaml-templates.rst new file mode 100644 index 0000000000..515999e608 --- /dev/null +++ b/taskcluster/docs/yaml-templates.rst @@ -0,0 +1,49 @@ +Task Definition YAML Templates +============================== + +A few kinds of tasks are described using templated YAML files. These files +allow some limited forms of inheritance and template substitution as well as +the usual YAML features, as described below. + +Please do not use these features in new kinds. If you are tempted to use +variable substitution over a YAML file to define tasks, please instead +implement a new kind-specific transform to accomplish your goal. For example, +if the current push-id must be included as an argument in +``task.payload.command``, write a transform function that makes that assignment +while building a job description, rather than parameterizing that value in the +input to the transforms. + +Inheritance +----------- + +One YAML file can "inherit" from another by including a top-level ``$inherits`` +key. That key specifies the parent file in ``from``, and optionally a +collection of variables in ``variables``. For example: + +.. code-block:: yaml + + $inherits: + from: 'tasks/builds/base_linux32.yml' + variables: + build_name: 'linux32' + build_type: 'dbg' + +Inheritance proceeds as follows: First, the child document has its template +substitutions performed and is parsed as YAML. Then, the parent document is +parsed, with substitutions specified by ``variables`` added to the template +substitutions. Finally, the child document is merged with the parent. + +To merge two JSON objects (dictionaries), each value is merged individually. +Lists are merged by concatenating the lists from the parent and child +documents. Atomic values (strings, numbers, etc.) are merged by preferring the +child document's value.
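The merge rules just described can be sketched as follows (a simplified illustration of the behavior, not the real implementation):

.. code-block:: python

   def merge(parent, child):
       """Merge a child document over its parent: dicts merge per key,
       lists concatenate, and atomic values prefer the child."""
       if isinstance(parent, dict) and isinstance(child, dict):
           merged = dict(parent)
           for key, value in child.items():
               merged[key] = merge(parent[key], value) if key in parent else value
           return merged
       if isinstance(parent, list) and isinstance(child, list):
           return parent + child
       return child  # atomic values: the child wins

   merged = merge(
       {"tags": ["a"], "env": {"X": "1"}, "name": "parent"},
       {"tags": ["b"], "env": {"Y": "2"}, "name": "child"})
   # {"tags": ["a", "b"], "env": {"X": "1", "Y": "2"}, "name": "child"}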
+ +Substitution +------------ + +Each document is expanded using the PyStache template engine before it is +parsed as YAML. The parameters for this expansion are specific to the task +kind. + +Simple value substitution looks like ``{{variable}}``. Function calls look +like ``{{#function}}argument{{/function}}``.
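A toy stand-in for this expansion, handling only the two forms shown, might look like this (the real templates use the pystache library, whose Mustache section semantics are richer):

.. code-block:: python

   import re

   def render(template, context):
       """Expand {{variable}} and {{#function}}argument{{/function}}
       forms -- a minimal sketch, not a full Mustache implementation."""
       # {{#function}}argument{{/function}} -> context["function"](argument)
       def call(match):
           name, argument = match.group(1), match.group(2)
           return str(context[name](argument))
       template = re.sub(r"\{\{#(\w+)\}\}(.*?)\{\{/\1\}\}", call, template)
       # {{variable}} -> context["variable"]
       return re.sub(r"\{\{(\w+)\}\}",
                     lambda m: str(context[m.group(1)]), template)

   render("build {{build_name}}: {{#upper}}dbg{{/upper}}",
          {"build_name": "linux32", "upper": str.upper})
   # "build linux32: DBG"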