For Your Consideration... Event Plugins

Version: Deadline 6.0 onwards

What are Event Plugins?

Event plugins in Deadline are amazing! Ok, I've stuck my neck out here; I guess I'm going to have to prove it now, right?! Event plugins can be created to execute specific tasks in response to specific events in Deadline (like when a job is submitted or when it finishes). For example, event plugins can be used to communicate with in-house pipeline tools to update the state of shots or tasks, or to submit a post-processing job when another job finishes. All of Deadline's event plugins are written in Python, which means that it's easy to create your own plugins or customize the existing ones. Don't forget, as well as Deadline's Scripting API you can also take advantage of the .NET framework libraries and the standard CPython modules. The best of both worlds, eh?

An event plugin can respond to one or more DeadlineEventListener events, which fire continually inside the Deadline eco-system. The available event categories are:

Job

Slave

Power Management

Miscellaneous

For the remainder of this blog, we will assume you've read the Event Plugins section in the Deadline docs and are already familiar with the basic Event plugin code framework.

Guns for Show, Knives for a Pro

Before we start, let's share some pro-tips to make life easier for a prospective Deadline event plugin developer.

  • Our plugin docs have an Event Callback Details section which explains exactly when an event takes place in Deadline and importantly, which Deadline application is responsible for executing each event.

  • Make sure your event plugin is enabled via the State field in "Configure Events..." under "Monitor" -> "Tools". If execution order matters, you can control it by clicking the up or down arrows at the bottom of the UI. The State field also offers an Opt-In option: an Opt-In event only executes for jobs that have explicitly opted in to it, whereas a "Global Enabled" event executes for every job. Depending on what you're trying to achieve, Opt-In may be the better approach for certain jobs.

  • Your event *.param file must always contain these lines, so don't forget!

    [State]
    Type=Enum
    Items=Global Enabled;Opt-In;Disabled
    Label=State
    Default=Disabled
    
  • In the Monitor, you can set the Suppress Events property under the Advanced tab in the Job Properties dialog. If you have a custom submission tool or script, you can specify the following in the job info file:

    SuppressEvents=True
    
  • The Python scripts for event plugins are executed non-interactively, so it is important that your scripts do not contain any blocking operations such as infinite loops or interfaces that require user input.

  • Always place your event plugin in the custom directory of your repository: <your_repo>/custom/events. An event plugin under the <your_repo>/custom/events/ directory always takes precedence over a same-named plugin under the default <your_repo>/events/ directory shipped with Deadline.

  • Give Deadline a chance to re-sync any event plugin file changes before re-testing. You can speed this up by clicking on "Monitor" -> "Tools" -> "Synchronize Scripts and Plugins". Watch the "Console" panel in Monitor for feedback.

  • To test "job" centric event callbacks, ensure you have a job in the queue and then manually right-click and change its state to force a particular event to happen such as "Mark Job As Complete" to force the OnJobFinishedCallback to execute. Make sure you have the "Console" panel open to review the StdOut/StdErr.

  • Use self.LogInfo("string"), self.LogWarning("string") or a plain print "string" to provide feedback during event plugin execution. Adding print statements ensures the log report contains some useful debug info to go on. If in doubt, log it!

  • Event plugins generate Job or Slave log reports. Review the applicable "Reports" panel to gain valuable feedback. When an event is executed the log will show where the script is being loaded from.
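A footnote on the Opt-In tip above: a job declares its opt-ins at submission time in the job info file. Assuming the standard EventOptIns key (double-check the job submission section of the manual for your Deadline version), it takes a comma-separated list of event plugin names, so a job info file might contain something like this (plugin names here are hypothetical):

```
EventOptIns=MyEventPlugin,MyOtherEventPlugin
```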

Initial Setup

Before we start working on our event plugin, we first want to be able to see any log or error messages that our script might produce. To do this, we create a new instance of the Console panel by selecting "View" -> "New Panel" -> "Console". Make sure you also have the Job/Slave log reports panel visible as well. Remember, during testing these panels can be very revealing about the underlying problem in your event plugin.

Examples

Two of the most popular event callbacks are OnJobSubmittedCallback and OnJobFinishedCallback. Through this blog, we will go through the creation of an example event plugin and corresponding files for each of these callbacks. We will use real-world VFX studio problems and show how a TD/developer solved them using an event plugin inside of Deadline. Our User Manual provides an excellent basic example of how to create an event plugin, so we won't spend time repeating that here, but rather concentrate on some real-world problems.

AutoChunker

A Maya based studio wants to optimize the size of each task (one or more frames per task) in a Maya animation job on the Deadline farm. Why? By optimizing the number of tasks (where 1 task could be 1 or more frames to render), the job should process across the farm in an optimal fashion, all other job scheduling factors being equal. So, what do we need to do to make this happen? Let's take a walk through the code to see the approach taken here, which is also available via our GitHub Event Plugins example. Let's get the basics of the event plugin constructed first as per below.

#############################################################################################
#Imports
#############################################################################################
# Usual .NET imports that will be used later in the py script
from System.Diagnostics import *
from System.IO import *

# Standard import of Deadline namespaces so we can use Events and the Scripting API
from Deadline.Events import *
from Deadline.Scripting import *

#############################################################################################
#This is the function called by Deadline to get an instance of AutoChunker.
#############################################################################################
# Function to get an instance of event listener
def GetDeadlineEventListener():
    return AutoChunker()

# Cleanup function to ensure no memory leak
def CleanupDeadlineEventListener( eventListener ):
    eventListener.Cleanup()

#############################################################################################
#AutoChunker generic event listener class.
#############################################################################################
# Our defined class
class AutoChunker( DeadlineEventListener ):

    # Setup which callbacks we will be using here
    def __init__( self ):
        self.OnJobSubmittedCallback += self.OnJobSubmitted
    
    # Cleanup function to ensure no memory leak
    def Cleanup( self ):
        del self.OnJobSubmittedCallback
    
    # This is called when a job is submitted.
    def OnJobSubmitted( self, job ):
        
        # Only execute for Maya jobs.
        if job.JobPlugin not in ["MayaBatch", "MayaCmd"]:
            return

To expose configurable UI items to the "Configure Events..." interface via Monitor, we will define some UI elements in the AutoChunker.param file. Use the syntax as defined in the user manual to configure the UI controls:

[State]
Type=Enum
Items=Global Enabled;Opt-In;Disabled
Category=Options
CategoryOrder=0
CategoryIndex=0
Label=State
Default=Global Enabled
Description=How this event plug-in should respond to events. If Global, all jobs and slaves will trigger the events for this plugin. If Opt-In, jobs and slaves can choose to trigger the events for this plugin. If Disabled, no events are triggered for this plugin.

[Verbose]
Type=Boolean
Label=Verbose
Default=False
Category=Logging
CategoryOrder=1
CategoryIndex=0
Description=If enabled, verbose logging is enabled.

[UsePool]
Type=Boolean
Label=Use Pool
Default=True
Category=Job Properties
CategoryOrder=2
CategoryIndex=0
Description=Take into account the pool the job is being submitted to.

[UseGroup]
Type=Boolean
Label=Use Group
Default=True
Category=Job Properties
CategoryOrder=2
CategoryIndex=1
Description=Take into account the group the job is being submitted to.

[UseLimits]
Type=Boolean
Label=Use Limits
Default=True
Category=Job Properties
CategoryOrder=2
CategoryIndex=2
Description=Take into account ALL the 'limits' the job is being submitted against.

[MinFrameCount]
Type=integer
Label=Min Frame Count
Default=10
Category=Settings
CategoryOrder=3
CategoryIndex=0
Description=Min frame count of the job for this event plugin to kick into effect.

[MultiplyFactor]
Type=integer
Label=Multiply Factor
Default=2
Category=Settings
CategoryOrder=3
CategoryIndex=1
Description=A simpler multiplier on the pool slot count. Experimental. As most studios use 'pools' for job scheduling.

[MaxChunkSize]
Type=integer
Label=Max Chunk Size
Default=50
Category=Settings
CategoryOrder=3
CategoryIndex=2
Description=A max cap limit on the chunk size to limit crazy users doing crazy things.

The above param file configuration provides us with this UI in Monitor for authorized users (you must be Super-User to access the "Configure Events..." UI) to interact with.

Diving back into the AutoChunker.py event plugin, we need to create various functions to collect all the applicable Slaves which meet the criteria as submitted by the user for this particular job. Essentially, we need to build up a clear picture of the current Deadline farm configuration. So, we will collect the pools, groups, limits, whitelisted and blacklisted Slaves that might be applicable to process this particular job. Here are those functions:

def GetLimitBasedSlaves( self, limitGroups ):
    '''
    Given any limits for the job, return list of slaves which will actually be
    applicable for the job at this exact moment during job submission.
    '''
    slaves=[]
    whitelistedSlaves=[]
    blacklistedSlaves=[]
    
    for limit in limitGroups:
        
        limitGroup = RepositoryUtils.GetLimitGroup( limit, True )
        listedSlaves = limitGroup.LimitGroupListedSlaves
        
        for slave in listedSlaves:
            if limitGroup.LimitGroupWhitelistFlag:
                whitelistedSlaves.append(slave)
            else:
                blacklistedSlaves.append(slave)
    
    slaves = list(set(whitelistedSlaves)-set(blacklistedSlaves))
    
    if self.Verbose:
        print "GetLimitBasedSlaves: %s" % slaves
    
    return slaves

def GetSlaveNamesInPool( self, poolName ):
    '''
    Given the pool for the job, return list of slaves which are enabled and
    members of this pool, correct at moment of job submission.
    '''
    slaves=[]
    slaveSettings = RepositoryUtils.GetSlaveSettingsList( True )
    for slave in slaveSettings:
        if slave.SlaveEnabled and poolName in slave.SlavePools:
            slaves.append(slave.SlaveName)
    
    if self.Verbose:
        print "GetSlaveNamesInPool: %s" % slaves
    
    return slaves

def GetSlaveNamesInGroup( self, groupName ):
    '''
    Given the group for the job, return list of slaves which are enabled and
    members of this group, correct at moment of job submission.
    '''
    slaves=[]
    slaveSettings = RepositoryUtils.GetSlaveSettingsList( True )
    for slave in slaveSettings:
        if slave.SlaveEnabled and groupName in slave.SlaveGroups:
            slaves.append(slave.SlaveName)
    
    if self.Verbose:
        print "GetSlaveNamesInGroup: %s" % slaves
    
    return slaves

Now, armed with all this information, we can try to calculate the best chunk size for this job currently being submitted. Again, let's walk through this code via additional code comments:

print "AutoChunker running as this is a Maya job"

# Always useful to add a verbose flag for later debugging in your event plugin
self.Verbose = self.GetBooleanConfigEntryWithDefault( "Verbose", False )
print "Verbose Logging: %s" % self.Verbose

# This is how to retrieve the value of a UI setting - "MinFrameCount"
# It is best to return a default value if we fail to query the given KEY name, so default to 10
minFrameCount = self.GetIntegerConfigEntryWithDefault( "MinFrameCount", 10 )
print "MinFrameCount: %s" % minFrameCount

# Obtain the job's current frame count
jobFramesList = job.JobFramesList
frameCount = len(jobFramesList)

if self.Verbose:
    print "Frame Count: %s" % frameCount

# Exit this event plugin if the job's frame count is less than the declared min frame count
if minFrameCount > frameCount:
    print "AutoChunker exiting as job frame count is less than min frame count"
    return

# Let's print out to the log report the BEFORE value so we can compare later
print "BEFORE [ChunkSize]: %s" % job.JobFramesPerTask

# Retrieve the job's limits
limitGroups = list(job.JobLimitGroups)

if self.Verbose:
    print "jobPool: %s" % job.JobPool
    print "jobGroup: %s" % job.JobGroup
    print "jobLimits: %s" % limitGroups

# Should we use each of these values in our chunk size calculation?
usePool = self.GetBooleanConfigEntryWithDefault( "UsePool", True )
useGroup = self.GetBooleanConfigEntryWithDefault( "UseGroup", True )
useLimits = self.GetBooleanConfigEntryWithDefault( "UseLimits", True )

slavePool=None
slaveGroup=None
slaveLimits=None

# If True, then take 'pools' into account in our equation
if usePool:
    slavePool = len(self.GetSlaveNamesInPool( job.JobPool ))

# If True, then take 'groups' into account in our equation
if useGroup:
    slaveGroup = len(self.GetSlaveNamesInGroup( job.JobGroup ))

# If True, then take 'limits' and 'white/blacklisting' into account in our equation
if useLimits:
    slaveLimits = len(self.GetLimitBasedSlaves( limitGroups ))

if self.Verbose:
    print "slavePool: %s" % slavePool
    print "slaveGroup: %s" % slaveGroup
    print "slaveLimits: %s" % slaveLimits

# Exposed an extra 'multiplier' so studios can dial up/down the effect on pools
multiplyFactor = self.GetIntegerConfigEntryWithDefault( "MultiplyFactor", 2 )
print "MultiplyFactor: %s" % multiplyFactor

# Calculate number of potential slots, skipping any values we chose not to gather above
# (multiplying a disabled (None) pool count would raise a TypeError, so guard each entry)
l = []
if slavePool is not None:
    l.append( multiplyFactor * slavePool )
if slaveGroup is not None:
    l.append( slaveGroup )
if slaveLimits is not None and slaveLimits > 0:
    l.append( slaveLimits )

# For each entry in our list, return the lowest number
slots = min(l) if l else 0

# If the calculation above returns 0, then default back to the normal chunk size of 1
if slots == 0:
    print "There are no Slaves which satisfy the job requirements. Submitting anyway with the default chunk size of 1"
    slots = 1

if self.Verbose:
    print "slots: %s" % slots

# Fail-safe here and never apply an excessively high chunk size, which users can dial down/up if required
maxChunkSize = self.GetIntegerConfigEntryWithDefault( "MaxChunkSize", 50 )
print "MaxChunkSize: %s" % maxChunkSize

# Again, always return the lowest number here
chunkSize = min(slots, maxChunkSize)

# Finally! Apply the new chunk size to the job being submitted
RepositoryUtils.SetJobFrameRange( job, job.JobFrames, chunkSize )

# Print the result, so we can compare with the previous 'BEFORE' print statement in the log report in Monitor
print "AFTER [ChunkSize]: %s" % job.JobFramesPerTask

Alrighty, there you have it. An AutoChunker event plugin that attempts to calculate an optimal ChunkSize for all Maya jobs submitted to Deadline, based on the slave count in the pool, the slave count in the group, the limits applied to the job, a user controlled multiplying factor, and finally the actual frame range. Various control knobs are exposed in the event plugin to let users tinker with what works best for the majority of jobs in their studio. It is currently limited to Maya jobs, but could easily be tweaked to work against other job plugins. What could you do to improve this event plugin?
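Before moving on, the slot arithmetic above can be distilled into a small pure-Python function, handy for exercising the logic outside of Deadline. This helper is illustrative only and not part of the shipped plugin:

```python
def calculate_chunk_size(slave_pool, slave_group, slave_limits,
                         multiply_factor=2, max_chunk_size=50):
    """Mirror the AutoChunker slot calculation.

    Each of the first three arguments is a slave count, or None if that
    criterion was disabled in the event plugin's configuration.
    """
    candidates = []
    if slave_pool is not None:
        candidates.append(multiply_factor * slave_pool)
    if slave_group is not None:
        candidates.append(slave_group)
    if slave_limits:  # only counts when at least one limit-based slave exists
        candidates.append(slave_limits)

    # The smallest candidate wins; zero means no slave satisfied the job
    slots = min(candidates) if candidates else 0
    if slots == 0:
        slots = 1  # fall back to the default chunk size

    # Never exceed the configured cap
    return min(slots, max_chunk_size)
```

For example, a pool of 10 slaves (doubled to 20 by the multiplier), a group of 30 and 15 limit-based slaves would yield a chunk size of 15.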

VRayPrePassVerify

In this example, we will consider a potential V-Ray issue when submitting 2 jobs to Deadline. The first job is the Animation Pre-Pass calculating job whilst the second job is the main render job which has been submitted dependent on the first job completing correctly. However, there is the odd occasion where one of the animation frame *.vrmap files may fail to save back to your network file server, perhaps due to an infrastructure issue. This can cause a serious level of wastage, as the second render job will be released by Deadline and the renders will fail due to missing *.vrmap irradiance files for some of the frames. A nightmare if this happens during the night!

You can see in the images below, a perfectly normal looking V-Ray pre-pass animation job has completed in Deadline. Uh oh...looking at the second image, we can see that the number of *.vrmap files doesn't match the number of frames completed in Deadline.

So, what can a pipeline TD do here? How about an OnJobFinished event plugin which checks that each frame calculated by V-Ray has a corresponding *.vrmap file on your network file server and, if any are missing, automatically requeues the affected task(s) for you. Ok, here's the code; let's again walk through it together.

First, we create the usual event plugin template code; then we need to add a series of checks to ensure we only execute against an applicable type of 3dsMax job in our queue. All other types of 3dsMax job should be skipped.

from System.Diagnostics import *
from System.IO import *

from Deadline.Events import *
from Deadline.Scripting import *

###############################################################
## This is the function called by Deadline to get an instance
## of the VRay Pre Pass Verify event listener.
###############################################################
def GetDeadlineEventListener():
    return VRayPrePassEventListener()

def CleanupDeadlineEventListener( eventListener ):
    eventListener.Cleanup()

###############################################################
## The VRay event listener class.
###############################################################
class VRayPrePassEventListener( DeadlineEventListener ):
    '''
    Verify V-Ray animation pre-pass *.vrmap files have been successfully saved back to the network file server.
    If the *.vrmap file is missing, then requeue the task(s) automatically.
    '''
    def __init__( self ):
        self.OnJobFinishedCallback += self.OnJobFinished
    
    def Cleanup( self ):
        del self.OnJobFinishedCallback
    
    ## This is called when the job finishes rendering.
    def OnJobFinished( self, job ):
        
        # Check job is a 3dsMax job
        if job.JobPlugin != "3dsmax":
            return
        
        # Check job is a V-Ray job
        vrayFilterExists = self.GetPluginInfoEntryWithDefault( "vray_filter_on", "" )
        if vrayFilterExists == "":
            return
        
        # Check GI is enabled
        vray_gi_on = self.GetPluginInfoEntryWithDefault( "vray_GI_on", "" )
        if vray_gi_on != "true":
            return
        
        # Check job is an animation pre-pass job
        vray_adv_irradmap_mode = self.GetPluginInfoEntryWithDefault( "vray_adv_irradmap_mode", "" )
        if vray_adv_irradmap_mode != "6":
            return

Ok, and now the main event (pun intended). We take the file path of the *.vrmap sequence and, looping through all the job's tasks, substitute the frame-number padding with each task's actual frame number. A simple file test then checks whether that *.vrmap exists; if not, we add the taskId to a Python list. We later remove any duplicates from this list (a Deadline task could contain more than 1 frame per chunk) and finally execute a Scripting API function to requeue all the identified tasks. As this event fires the moment the pre-pass job completes, the troublesome tasks get requeued BEFORE the main render job ever gets released. Considerable wasted render farm time is avoided and you're a hero!

self.LogInfo( "Event Plugin: V-Ray PrePass Verify Started" )

# Calculate the vrmap file path from the job properties
vray_adv_irradmap_autoSaveFileName = self.GetPluginInfoEntryWithDefault( "vray_adv_irradmap_autoSaveFileName", "" )
filePath = Path.GetDirectoryName( vray_adv_irradmap_autoSaveFileName )
fileName = Path.GetFileNameWithoutExtension( vray_adv_irradmap_autoSaveFileName )
vrmapFile = str( fileName ) + "####.vrmap"
vrmapFile = Path.Combine( filePath, vrmapFile )
padding = "####"
taskIdsToRequeue = []

# Grab an instance of the TaskCollection of the job
tasks = RepositoryUtils.GetJobTasks( job, True )

# Loop through the tasks, swapping padding for current frame number, check if the vrmap exists
# and if not, add the taskId to a list
for task in tasks:
    for frame in task.TaskFrameList:
        frameNumber = StringUtils.ToZeroPaddedString( frame, 4 )
        currFile = vrmapFile.replace( padding, frameNumber )
        
        if not File.Exists( currFile ):
            
            self.LogWarning( "Missing: %s" % currFile )
            
            if task.TaskId not in taskIdsToRequeue:
                taskIdsToRequeue.append( task.TaskId )

# Check we have at least 1 taskId that needs requeuing
if len(taskIdsToRequeue) > 0:
    
    # Using Python's 'set' command, we can remove duplicates and put it back into a list
    taskIdsToRequeue = list(set(taskIdsToRequeue))
    
    # Loop through the tasks and if we find a match between a task in the job and the previously
    # compiled list of taskId's that need requeuing, then execute the RepositoryUtils.RequeueTasks() function
    for task in tasks.Tasks:
        for i in taskIdsToRequeue:
            if i == task.TaskId:
                self.LogInfo( "Requeuing Task: %s" % task.TaskId )
                RepositoryUtils.RequeueTasks( job, [task,] )

# Always wise to inform when an event plugin has actually finished
self.LogInfo( "Event Plugin: V-Ray PrePass Verify Finished" )
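The padding substitution and existence test at the core of this plugin can likewise be reproduced in plain Python for testing away from the farm. The function name and path layout here are illustrative, not part of the plugin itself:

```python
import os

def find_missing_frames(vrmap_template, frames, padding="####"):
    """Return the frames whose *.vrmap file is absent on disk.

    vrmap_template is a path containing a padding token,
    e.g. "/renders/shot010/irmap.####.vrmap".
    """
    missing = []
    for frame in frames:
        # Swap the padding token for the zero-padded frame number
        frame_str = str(frame).zfill(len(padding))
        path = vrmap_template.replace(padding, frame_str)
        if not os.path.isfile(path):
            missing.append(frame)
    return missing
```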

Here's the result of the event plugin running and detecting 4 missing *.vrmap files and automatically requeuing them.

Where can you go from here? Well, taking a cue from Spinal Tap's amplifier that goes up to eleven, what can you do to turn the proposed solution above up a notch? Here are some ideas, which I will leave you to consider:

  • Email notification, informing the job's user or a collection of users that a requeue was required? Luckily, DeadlineCommand has a handy function for this:

    deadlinecommand -sendemail -to jsmith@mycompany.com -cc cjones@mycompany.com -subject "the subject" -message "C:/MyMessage.html"
    
  • Should we check that the *.vrmap file exists OR should we also check for a min file size?

  • What if the requeued frames fail a second or third time? How will you handle that situation?

  • How could this event plugin be re-used to check other app plugins?
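The second idea above, verifying a minimum file size as well as mere existence, is a small extension. A sketch, with an arbitrary threshold you would tune to your own scenes:

```python
import os

def vrmap_ok(path, min_bytes=1024):
    """Treat a vrmap file as valid only if it exists and is above a size
    floor, catching truncated files that a bare existence check would miss."""
    return os.path.isfile(path) and os.path.getsize(path) >= min_bytes
```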

Aladdin's Cave

Make sure to review the official Thinkbox Event Plugins documentation and Scripting API reference manual. No need to reinvent the wheel - we also have many example event plugins on our public GitHub site, which could serve as a perfect starting point for your own coding project. Here's a quick overview of some of the current GitHub event plugins we have online which have all been updated to work in Deadline 8:

  • 3dsMaxVersionInstalled: Regularly inject the 3dsmax.exe version installed on a Slave into the ExtraInfo0 column.
  • AutoChunker: Auto calculate chunk size for submitted (Maya) jobs.
  • CustomEnvironmentCopy: Inject certain environment variables whilst submitting into your job.
  • EventScript: Abstracted event script for all event callbacks to a studio's pipeline script system.
  • ForcePoolChoice: Force certain users to submit to certain pools in Deadline.
  • HouseCleaning: Bootstrap example of how to execute a script whenever Deadline carries out its house cleaning process.
  • JobCleanup: Slave started event callback script which deletes jobs older than a configurable number of days. Be careful!
  • JobStats: An event plugin script which executes on job completion for any job in your queue, calculating and returning the job statistics object, allowing you to retrieve information.
  • MachineLimitClamp: An event plugin that can prevent everyone but blessed users from a specific user group from submitting jobs with too high a machine limit. Big brother is watching you.
  • OverrideJobCleanup: This example event plugin script hooks into the "OnJobSubmitted" callback and allows a studio to globally configure different job cleanup settings per application plugin type.
  • OverrideJobName: This example event plugin script hooks into the "OnJobSubmitted" callback and allows a studio to globally manipulate the job name of "3dsmax" jobs after they have been submitted by a user. In our example, we are going to inject the long Deadline JobId 'string' into the JobName as well as an optional PREFIX & SUFFIX, if configured.
  • PriorityClamp: What everyone's been waiting for, a way to limit users from hogging the farm. The idea is that you can prevent everyone but blessed users from a specific user group from submitting jobs higher than a certain priority.
  • Sample: Behold! A sample event plugin! This is for those like me who are too lazy to read documentation!
  • SetJobInterruptible: Ensure ALL jobs that have been submitted to a particular pool are forced to have the "Job Is Interruptible" setting enabled.
  • SetJobLimit: Ensure ALL jobs have a certain Deadline "limit" applied to them, so users don't go over budget on their floating license server and get in a situation where render nodes are pulling more licenses than owned.
  • SetJobTimeout: This event plugin is useful for when you want to make sure jobs on the farm cannot run for too long, or want to make sure a task is considered failed if it didn't render long enough. These can be set on individual jobs, but this event will ensure those settings are applied to any newly submitted jobs. Useful for shared farms such as schools.
  • SlaveAutoconf: Ensure new Slaves automatically have configured a set of pools each time they start up.
  • SlaveCleanup: A user asked how can they cleanup all the files & directories in one or more locations on a local or indeed, network location on a semi-regular basis.
  • SlaveExtraInfo: A user asked how can they query on a semi-regular basis each of their Slaves for one or more system environment variables entered on the individual machine and somehow send that information back so it can be visible via Deadline Monitor?
  • SoftwareAudit: Show what copy of a particular piece of software is installed on your Windows nodes in the 'Extra Info 0' column in Deadline Monitor.
  • VRayPrePassVerify: Check 'V-Ray for 3ds Max' pre-pass animation job has created *.vrmap files and if not, requeue certain tasks.
  • Zabbix: Automatically collect and push Deadline farm statistics to a Zabbix system.

Future Development

Studio pipeline evolution, just like software development, is an iterative process. I'm sure you can think of more ways to improve these example event plugins, and I hope this blog entry encourages you to jump in and write some event plugins of your own! If you get a bit stuck, don't forget to RTM (Read The Manual), but we can also help via the usual Thinkbox Forums and Thinkbox Support channels! Please make sure you provide snippets of your code, so we can answer you most effectively. Happy scripting!