apftask - task coordination service

The apftask service is used by discrete APF processes to declare themselves as “tasks”. The objective is to provide two-way feedback: status reports from the task process itself, and task control directives from a higher-level control mechanism. Note that starting a task is not presently supported by the apftask service, though that could be implemented with a modest extension of the task architecture.

Each defined task has a templated set of keywords established on its behalf; custom keywords can be added to support the needs of individual tasks. Establishing new tasks and adding keywords to existing tasks are both straightforward and require minimal time to push through.

What is an APF task?

A task is any discrete operation whose status is of interest to other software components. At the highest level, a single piece of software may be stepping through a sequence of observations and preparatory steps. If each subsystem used by that high-level software is a defined task, the software can use the same fundamental approach to monitor and interact with every one of these subsystems. That high-level operations sequencer should itself be established as a task, even if only to simplify monitoring its status.

Example tasks could include taking calibrations, running a focus cube, or performing an observation of a designated star. In each of these cases, the operation takes an extended time to complete, and is by no means atomic. Each of these examples also works through phases of operation, and in some cases, iterative steps within a discrete phase.

Simple operations that are reasonably atomic are not well suited to representation as a task; if an operation can be described by a single KTL keyword, or even a small set of related KTL keywords, it will be more straightforward to use those keywords directly. For example, it would make little sense to establish a task that reports the status of setting (or clearing) the emergency stop at the APF.

The full set of tasks known to the apftask service is published via the TASKS keyword.

Task keywords

There are two classes of keywords established for every task: keywords of both internal and external interest, and internal keywords used by the apftask dispatcher and its support applications to monitor tasks. Unless stated otherwise, all task-related keyword values are configured to be cached, and will persist not only across task restarts, but also across restarts of the apftask dispatcher itself.

In the descriptions below, the task prefix will be TASK, where normally it would be the prefix appropriate for that specific task; for example, the documentation below lists TASK_CONTROL, but the keyword for a specific task may be CALIBRATE_CONTROL, or SCRIPTOBS_CONTROL.
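As a minimal illustration of this naming convention, the Python sketch below builds full keyword names from a task prefix and a keyword suffix. It assumes the prefix is simply the task name in upper case. That matches the CALIBRATE and SCRIPTOBS examples but is not guaranteed for every task. task_keyword and template_keywords are hypothetical helpers, not part of the APFTask module.

```python
def task_keyword(task: str, suffix: str) -> str:
    """Return the full keyword name, e.g. CALIBRATE_CONTROL.

    Assumption (hypothetical helper): the task prefix is the task name in upper case.
    """
    return f"{task.upper()}_{suffix.upper()}"


# Templated keyword suffixes established for every task, as documented below:
# external keywords first, then internal keywords.
TEMPLATE_SUFFIXES = (
    "CONTROL", "STATUS", "MESSAGE", "PHASE", "STEP",
    "LAST_START", "LAST_SUCCESS",
    "PID", "RUNHOST", "PS_STATE", "SIGNAL", "TRIPWIRE", "INTERRUPT",
)


def template_keywords(task: str) -> list[str]:
    """Return the full names of all templated keywords for one task."""
    return [task_keyword(task, suffix) for suffix in TEMPLATE_SUFFIXES]


print(task_keyword("calibrate", "control"))   # CALIBRATE_CONTROL
print(template_keywords("scriptobs")[:2])     # ['SCRIPTOBS_CONTROL', 'SCRIPTOBS_STATUS']
```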

External keywords

  • TASK_CONTROL: the CONTROL keyword is set to one of a few discrete values; it is incumbent upon the task implementation to honor a given CONTROL request.

    Value    Action
    Proceed  The task should proceed normally.
    Pause    The task should pause operations, but not exit.
    Abort    The task should cease operations and exit cleanly.

    If a task is paused, the CONTROL keyword should be set to Proceed to signal the task to resume activities. If the task exits for any reason, the CONTROL keyword will automatically reset to Proceed. The CONTROL keyword cannot be set if a task is not running.

  • TASK_STATUS: the STATUS keyword reflects the current overall state of the task process. Typically, the only STATUS values set by a task implementation are the exit statuses. The remainder are generally set by the apftask dispatcher in response to other status changes, in particular changes to the internal PS_STATE keyword.

    Value           Meaning
    Running         The task is currently running. Running is set when a task establishes itself.
    Pausing         Pausing is set by the dispatcher after the CONTROL keyword is set to Pause. If the CONTROL modify request was blocking, it will block until the status changes to Paused.
    Paused          The task has successfully paused. Paused is either set directly by the task implementation, or implicitly by the apftask interface toolkit.
    Exited/Success  The task has successfully completed. Exited/Success is only set by the task implementation, and should be the last operation performed before the task exits.
    Exited/Failure  The task failed to complete successfully. Exited/Failure is only set by the task implementation, and should be the last operation performed as part of an error-handling routine before the task exits.
    Exited/Unknown  The task did not set an exit status before exiting. Exited/Unknown is only set by the dispatcher, when the dispatcher is notified that a task in a non-exited state is no longer running.
  • TASK_MESSAGE: the MESSAGE keyword provides descriptive feedback about the activities of the task. It can be set by the apftask dispatcher, by the apftask interface toolkit, or by the task implementation.
  • TASK_PHASE: the PHASE keyword is only set by the task implementation. It should be set to a descriptive string for each discrete phase of operations that the task enters. This information is useful for reporting. The task implementation could also use it to resume operations if its previous run was interrupted.
  • TASK_STEP: the STEP keyword will reset itself to zero every time the PHASE keyword changes. The STEP can be used to count off repetitive steps within a discrete phase.
  • TASK_LAST_START: the LAST_START keyword is a UNIX timestamp, and it will automatically set itself to the current time whenever a task successfully establishes itself. A task may need to query this value, perhaps to decide whether to start anew or resume from the currently set PHASE and STEP. If so, it should do so before establishing itself.
  • TASK_LAST_SUCCESS: the LAST_SUCCESS keyword is a UNIX timestamp, and it will automatically set itself to the current time whenever a task exits with STATUS ‘Exited/Success’.
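The Python sketch below shows one way a task implementation might honor CONTROL and report PHASE and STEP. It deliberately avoids the APFTask module and KTL, whose APIs are not described here.

- TaskState is a hypothetical in-memory stand-in for the task keywords. Changing its phase resets step to zero, mirroring the documented STEP behavior.
- read_control is a hypothetical callable that returns the current CONTROL value.
- Two behaviors are assumptions of this sketch: reporting Exited/Failure after an abort, and setting STATUS back to Running on resume.

```python
import time
from typing import Callable, Iterable, Tuple


class AbortRequested(Exception):
    """Raised when the task's CONTROL keyword reads Abort."""


class TaskState:
    """Hypothetical in-memory stand-in for a task's STATUS, MESSAGE, PHASE and STEP."""

    def __init__(self) -> None:
        self.status = "Running"
        self.message = ""
        self.step = 0
        self._phase = ""

    @property
    def phase(self) -> str:
        return self._phase

    @phase.setter
    def phase(self, value: str) -> None:
        # Mirror the documented STEP behavior: reset to zero whenever PHASE changes.
        if value != self._phase:
            self.step = 0
        self._phase = value


def check_control(state: TaskState, read_control: Callable[[], str],
                  poll: float = 0.01) -> None:
    """Honor CONTROL: wait while it reads Pause, raise if it reads Abort."""
    control = read_control()
    while control == "Pause":
        state.status = "Paused"
        time.sleep(poll)
        control = read_control()
    if control == "Abort":
        raise AbortRequested()
    state.status = "Running"  # assumption: resuming restores Running


def run_task(state: TaskState, phases: Iterable[Tuple[str, int]],
             read_control: Callable[[], str]) -> str:
    """Work through (phase, number_of_steps) pairs, checking CONTROL before each step."""
    try:
        for phase, n_steps in phases:
            state.phase = phase
            for _ in range(n_steps):
                check_control(state, read_control)
                # ... one unit of real work would go here ...
                state.step += 1
        state.status = "Exited/Success"
    except AbortRequested:
        state.message = "Aborted on CONTROL request"
        state.status = "Exited/Failure"  # assumption: an abort counts as failure
    return state.status
```

In a real task, read_control would query TASK_CONTROL, and each assignment to the state object would write the corresponding task keyword.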

In addition to the above, there can be arbitrary per-task keywords used to communicate specific information. As of the writing of this document (December 2013), there is a set of three arbitrary string keywords (TASK_VAR_1, TASK_VAR_2, and TASK_VAR_3) that are being phased out in favor of more specific keywords.

Internal keywords

  • TASK_PID: the process ID of the running task. A task establishes itself by setting its PID and RUNHOST keywords. The taskmon helper daemon on each host uses these keywords to identify all tasks on that host. When taskmon asserts that the task is no longer running (by clearing the TASK_PS_STATE value), the apftask dispatcher will reset the PID keyword to -1 and the RUNHOST keyword to the empty string.
  • TASK_RUNHOST: the hostname on which a given task is running. A task establishes itself by setting its PID and RUNHOST keywords. The taskmon helper daemon on each host uses these keywords to identify all tasks on that host. When taskmon reports that the task is no longer running, the apftask dispatcher will reset the PID keyword to -1 and the RUNHOST keyword to the empty string.
  • TASK_PS_STATE: the PS_STATE keyword is used by the taskmon helper application to communicate the process state of a running task back to the apftask dispatcher. In particular, when PS_STATE is set to the empty string, the apftask dispatcher interprets this to mean that the process is no longer running. PS_STATE should only ever be set by taskmon.
  • TASK_SIGNAL: if set to a non-None value, the SIGNAL keyword defines the signal that taskmon will send to the running task in the event that INTERRUPT transitions to True.
  • TASK_TRIPWIRE: if set to a non-None value, the TRIPWIRE keyword lists the conditions that will be used to set the INTERRUPT keyword.

    Condition   Interrupt when...
    OPEN_OK     ...the checkapf.OPEN_OK keyword is False. This indicates that the dome shutter and/or vents should not be opened.
    MOVE_PERM   ...the checkapf.MOVE_PERM keyword is False. This indicates that the telescope and any movable components on the telescope should not be moved, largely for personnel safety reasons.
    INSTR_PERM  ...the checkapf.INSTR_PERM keyword is False. This indicates that no components in the Levy spectrometer should be commanded to move, nor should lamps be asked to turn on. Stopping stages and turning off lamps are both permitted.
    TASK_ABORT  ...the task’s CONTROL keyword is set to Abort. This indicates that the task should exit as soon as reasonably possible.
    TASK_PAUSE  ...the task’s CONTROL keyword is set to Pause. This indicates that the task should pause as soon as reasonably possible.
  • TASK_INTERRUPT: will be set to True if any of the conditions described by TRIPWIRE are met. If SIGNAL is set to a non-None value, the taskmon helper application will send the requested signal to the running task when INTERRUPT transitions to True.
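The Python sketch below shows how the TRIPWIRE conditions in the table above could be combined into a single INTERRUPT value. It makes two assumptions:

- TRIPWIRE has already been parsed into a list of condition names. The keyword's actual encoding is not described here.
- The checkapf values are available as booleans.

tripwire_tripped is a hypothetical function, not taskmon's implementation.

```python
from typing import Iterable, Mapping, Optional

# TRIPWIRE conditions that mirror a checkapf boolean keyword.
CHECKAPF_CONDITIONS = ("OPEN_OK", "MOVE_PERM", "INSTR_PERM")
# TRIPWIRE conditions that watch the task's own CONTROL keyword.
CONTROL_CONDITIONS = {"TASK_ABORT": "Abort", "TASK_PAUSE": "Pause"}


def tripwire_tripped(tripwire: Optional[Iterable[str]],
                     checkapf: Mapping[str, bool],
                     control: str) -> bool:
    """Return the INTERRUPT value implied by TRIPWIRE (None means no tripwire)."""
    if tripwire is None:
        return False
    for condition in tripwire:
        if condition in CHECKAPF_CONDITIONS:
            if not checkapf[condition]:   # permission withdrawn by checkapf
                return True
        elif condition in CONTROL_CONDITIONS:
            if control == CONTROL_CONDITIONS[condition]:
                return True
        else:
            raise ValueError(f"unknown TRIPWIRE condition: {condition}")
    return False
```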

The taskmon helper application

taskmon runs on each of the APF Linux hosts, and monitors all tasks that are running on that host. It does this by monitoring all TASK_PID and TASK_RUNHOST keywords. If the RUNHOST matches the hostname where taskmon is running, it will poll the contents of /proc/<pid>/status and write the current status to the TASK_PS_STATE keyword. If taskmon sees that the process is no longer running, it will set PS_STATE to the empty string. This is a critical piece of feedback for the apftask dispatcher, and will trigger a cascade of automatic updates, including clearing the PID and RUNHOST keywords for that task.
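The polling and the resulting cascade can be sketched in Python as follows.

- read_ps_state reports the State: line of /proc/<pid>/status (for example S (sleeping)). It returns the empty string once the process is gone. Exactly which fields the real taskmon publishes to PS_STATE is an assumption.
- on_ps_state_change applies the documented reactions to an empty PS_STATE: PID to -1, RUNHOST cleared, CONTROL back to Proceed, and Exited/Unknown for a task that had not set an exit status.
- A plain dict serves as a hypothetical stand-in for the task's keywords.

```python
from pathlib import Path
from typing import Dict, Union

EXITED = ("Exited/Success", "Exited/Failure", "Exited/Unknown")


def read_ps_state(pid: int, proc_root: str = "/proc") -> str:
    """Return the State: field of /proc/<pid>/status, or '' if the process is gone."""
    try:
        text = Path(proc_root, str(pid), "status").read_text()
    except (FileNotFoundError, ProcessLookupError):
        return ""  # no such process: report it as no longer running
    for line in text.splitlines():
        if line.startswith("State:"):
            return line.split(":", 1)[1].strip()
    return ""


def on_ps_state_change(task: Dict[str, Union[int, str]], ps_state: str) -> None:
    """Apply the documented reactions to a new PS_STATE value (hypothetical sketch)."""
    task["PS_STATE"] = ps_state
    if ps_state:
        return  # process still running; nothing else changes
    task["PID"] = -1
    task["RUNHOST"] = ""
    task["CONTROL"] = "Proceed"
    if task["STATUS"] not in EXITED:
        task["STATUS"] = "Exited/Unknown"
```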

taskmon also monitors TASK_CONTROL keywords and a small set of keywords from the checkapf service as part of its handling for TASK_TRIPWIRE keywords. taskmon runs as root, and will signal the task if any of its TRIPWIRE conditions are met.
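SIGNAL is sent only when INTERRUPT transitions to True, so the signaling is edge-triggered rather than level-triggered. A minimal Python sketch follows. It assumes SIGNAL holds a signal name such as SIGINT; the keyword's real encoding is not documented here. InterruptWatcher is hypothetical, and its kill callable defaults to os.kill but can be replaced for testing.

```python
import os
import signal
from typing import Callable, Optional


class InterruptWatcher:
    """Send a task's SIGNAL only when its INTERRUPT goes from False to True."""

    def __init__(self, kill: Callable[[int, int], None] = os.kill) -> None:
        self._kill = kill
        self._last = False  # previous INTERRUPT value for this task

    def update(self, interrupt: bool, pid: int, signal_name: Optional[str]) -> bool:
        """Record a new INTERRUPT value; return True if a signal was sent."""
        rising_edge = interrupt and not self._last
        self._last = interrupt
        if rising_edge and signal_name is not None and pid > 0:
            # Assumption: SIGNAL is a name like "SIGINT"; look up its number.
            self._kill(pid, signal.Signals[signal_name])
            return True
        return False
```

One watcher would be kept per task, so that a task whose INTERRUPT stays True is signaled once rather than on every poll.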

The interface toolkit

An interface toolkit was created to simplify task implementations’ interactions with the apftask service. Tasks implemented in Python can make direct use of the APFTask module. Tasks written in other languages can use the apftask command-line tool, which is a script-friendly wrapper around the core functions of the APFTask module.

An example task implementation was written to demonstrate the intended use of the apftask command-line interface; this example can be found here:

cvs/lroot/apf/apftask/interface/example.sin

The command-line interface has the following options available:

Usage: apftask [operation] [options]

Optional flags:

        -h,-?                   Print verbose help (this output)
        --help

        -v                      When retrieving keyword values, print
        --verbose               the keyword values on individual lines
                                with additional formatting for human
                                readability.

        --no-auto               Do not auto-pause when performing a
                                'do' operation and the task's control
                                mode changes from 'Proceed'. Otherwise,
                                `apftask` will pause when the task's
                                control mode is set to 'Pause', and will
                                only resume processing when the control
                                mode changes; with --no-auto set, an
                                exception will instead be raised, and
                                `apftask` will exit with a non-zero status
                                code. 'Abort' always triggers an exception
                                and a non-zero exit.


Important environment variables:

        APFTASK                 Task name. If set by the parent, it is
                                not necessary to specify the task name
                                for any of the operations below that
                                use or require a taskname to be specified.

        APFTASK_task_PID        Established PID for the task, which is
                                embedded in the environment variable name
                                _and_ capitalized. Setting this per-task
                                environment variable upon establishing a
                                task would allow for any sub-processes to
                                invoke `apftask` as if they were the established
                                PID. This can be helpful when re-using
                                shell scripts between tasks, or when working
                                with shell scripts that mysteriously invoke
                                sub-shells when you weren't expecting it.


Available operations:

        tasks                   Print out a list of all available tasks.

        status [taskname]       Display the status of a specific task or
                                tasks; if no task is specified, the status
                                of all available tasks will be reported.

        taskname establish      Establish the invoker of `apftask` as
                                the one true implementation of 'taskname'.
                                It is essential that the invoker of the
                                'establish' operation inspect the return status
                                of `apftask` and abort if it is non-zero.
                                This routine will block until the taskmon
                                helper application confirms the task is
                                running.

        taskname abort          Request that the 'taskname' task stop running.

        taskname do             Perform any shell command while simultaneously
                                watching the task's CONTROL keyword for values
                                other than 'Proceed'. If `apftask` exits
                                for any reason before the command completes
                                execution, the command will be sent a SIGTERM
                                signal.

        taskname pause          Request that the 'taskname' task pause.

        taskname proceed        Request that the 'taskname' task proceed.
                                Issuing a proceed request is typically
                                necessary after successfully pausing a
                                task.

        taskname status         Same as "status [taskname]", will report
                                the current status of 'taskname'.

        taskname step++         Increment the step for the 'taskname' task.

        taskname keyword        Retrieve the current value of a task keyword.
                                Any valid per-task keyword may be specified;
                                for example, if there is a keyword TASK_VAR1,
                                one could invoke `apftask task var1` and it
                                would return the value. This is equivalent
                                to doing `show -terse -s apftask task_var1`.
                                An arbitrary number of keywords may be
                                specified in a single invocation. You cannot
                                mix `show` with `modify` behavior.

        taskname keyword=value  Set a task keyword to a specific value.
                                Any valid per-task keyword may be specified;
                                for example, if there is a keyword TASK_VAR1,
                                the operation would be `var1=some value`.
                                Whitespace around the = sign will be ignored;
                                if there is significant whitespace in 'value',
                                it will need to be quoted in order to get past
                                the user's shell. This is equivalent to doing
                                `modify -s apftask task_var1="some value"`.
                                An arbitrary number of keyword/value pairs may
                                be specified in a single invocation. You cannot
                                mix `show` with `modify` behavior.
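To make the keyword operations concrete, the Python sketch below does three things:

- It translates the arguments of the `taskname keyword` and `taskname keyword=value` operations into the equivalent show and modify invocations quoted above.
- It falls back to the APFTASK variable when no task name is given.
- It derives the per-task PID environment variable name.

It assumes each keyword or keyword=value pair arrives as a single argument. It also assumes keyword names are lower-cased, as in the quoted show/modify examples. keyword_command, resolve_task, and pid_env_var are hypothetical helpers, not the apftask tool's own code.

```python
import os
from typing import List, Mapping, Optional, Sequence


def resolve_task(explicit: Optional[str], env: Mapping[str, str] = os.environ) -> str:
    """Use the task name given on the command line, else the APFTASK variable."""
    task = explicit or env.get("APFTASK")
    if not task:
        raise ValueError("no task name given and APFTASK is not set")
    return task


def pid_env_var(task: str) -> str:
    """Name of the per-task PID variable: the task name is capitalized."""
    return f"APFTASK_{task.upper()}_PID"


def keyword_command(task: str, args: Sequence[str]) -> List[str]:
    """Translate keyword or keyword=value arguments into a show or modify command."""
    if not args:
        raise ValueError("no keywords given")
    pairs = [arg.partition("=") for arg in args]
    modes = {sep == "=" for _, sep, _ in pairs}
    if len(modes) != 1:
        raise ValueError("cannot mix show and modify behavior")
    prefix = task.lower()
    if modes == {False}:
        names = [f"{prefix}_{key.strip().lower()}" for key, _, _ in pairs]
        return ["show", "-terse", "-s", "apftask", *names]
    # Whitespace around '=' is ignored; whitespace inside the value is kept.
    settings = [f"{prefix}_{key.strip().lower()}={value.lstrip()}"
                for key, _, value in pairs]
    return ["modify", "-s", "apftask", *settings]
```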