Monte Carlo production tools

Status of requests in computing

On this page, we collect useful information for Monte Carlo contacts about how to act in McM.


Last updated 6 years ago


Computing offers many tools to monitor the status of a request in central production. To understand which stage of production a request is in, two main things need to be monitored:

  • Status in RequestManager page

  • Status in Unified

For additional information, one can consult the Log of a request or the ErrorReport. All of these can be reached through the camera button in McM, which opens the monitoring page of the request (an example is described at the end of this page).

Status of requests in RequestManager page

The statuses a request can have in RequestManager (ReqMgr) are explained below:

assignment-approved: will be moved to assigned in the next Unified cycle if the campaign is enabled.

acquired: the request has been split by the global WorkQueue into work elements, but no work element has been injected into the local WorkQueue yet.

running-open: at least 1 work element has been injected into the local WorkQueue; jobs are created and running.

running-closed: all work elements are in running state.

force-completed: set by users/operators to kill all remaining work and move to completed status.

completed: all work elements are done.

rejected: the request is invalid at assignment, or the produced output is not satisfactory.

failed: a failure occurred in one of the work elements.

closed-out: output is ready to be announced.

aborted: set by users/operators to kill all current jobs, run only auxiliary tasks and move to aborted-completed status.

announced: the request has been announced to the requestors.

IMPORTANT PARAMETERS FOR PRODUCTION

Unified and ReqMgr decide how many events go into each job based on TimePerEvent and SizePerEvent, under the following constraints:

  • Maximize the number of events per job to maintain CPU efficiency.

  • Each job finishes within 8 hours.

  • Total output size for each job is less than (20 GB * Ncores).

If TimePerEvent and SizePerEvent are not close to the true values, jobs will exceed these limits and fail.

The same holds for Memory: jobs that use more memory than requested will fail, and a wrong Memory requirement will make jobs run at unsuitable sites and fail.

Status of requests in Unified

The statuses a request can have in Unified are explained below:

considered: workflow obtained from request manager, will move to the next state if the campaign is enabled.

staging: waiting for input data to be transferred to disk.

staged: workflow ready to be assigned.

away: workflow set to assigned in ReqMgr.

close: synchronising with closed-out, all output satisfactory.

assistance: all jobs have finished, but the produced output is below threshold due to job failures. The "assistance" status is usually accompanied by one or more keywords (see the next page, "Errors in production: explanation").

MOST COMMON REASONS FOR WORKFLOWS NOT (YET) RUNNING / GETTING ANNOUNCED

  • Waiting for the input dataset to be transferred to disk.

  • Low priority.

  • Workflows have errors and need to be handled by operators.

  • Workflows have long running tails.
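
The job-size limits quoted under IMPORTANT PARAMETERS FOR PRODUCTION (each job within 8 hours, output below 20 GB * Ncores) can be illustrated with a short calculation. The sketch below is not the actual Unified or ReqMgr splitting code. The function names are hypothetical, and it assumes that TimePerEvent is given in seconds, SizePerEvent in kB, and that 1 GB = 10^6 kB. It only shows how the two limits bound the number of events per job, and why an underestimated TimePerEvent leads to jobs that overrun.

```python
import math

# Limits quoted in the text; the unit conversions are assumptions.
MAX_JOB_SECONDS = 8 * 3600             # each job should finish within 8 hours
MAX_OUTPUT_KB_PER_CORE = 20 * 10**6    # 20 GB per core, taking 1 GB = 10^6 kB


def max_events_per_job(time_per_event_s, size_per_event_kb, n_cores=1):
    """Largest number of events that respects both limits (hypothetical helper).

    Assumes TimePerEvent is given in seconds and SizePerEvent in kB.
    """
    if time_per_event_s <= 0 or size_per_event_kb <= 0 or n_cores < 1:
        raise ValueError("time, size and core count must be positive")
    # Time limit: n * time_per_event must not exceed 8 hours.
    by_time = math.floor(MAX_JOB_SECONDS / time_per_event_s)
    # Size limit: n * size_per_event must stay strictly below 20 GB * Ncores.
    by_size = math.ceil(MAX_OUTPUT_KB_PER_CORE * n_cores / size_per_event_kb) - 1
    return max(0, min(by_time, by_size))


def wall_time_hours(n_events, true_time_per_event_s):
    """Wall-clock hours a job really needs, given the true time per event."""
    return n_events * true_time_per_event_s / 3600


# A request declares 10 s/event and 500 kB/event on a single core.
n_events = max_events_per_job(10, 500)
print(n_events)                       # 2880, limited by the 8-hour rule

# If the true time per event is 15 s, the same job overruns the limit.
print(wall_time_hours(n_events, 15))  # 12.0 hours, above the 8-hour limit
```

In this example the declared values allow 2880 events per job, but with a true cost of 15 s per event such a job would need 12 hours, which is the kind of mismatch that makes jobs fail.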

Useful links:

Link to computing tutorial
Unified assistance page
JIRA page
wmStats (live status of a workflow)
Example of monitoring page. One can see the Log button, the ErrorReport, and the status in ReqMgr page and Unified.