FAQ

This is a developing page that collects some frequently asked questions.


Where can I find the slides of the McM tutorials?

The slides can be found at the bottom of this TWiki page: https://twiki.cern.ch/twiki/bin/viewauth/CMS/PdmVMcMProdManagerActions

What is the difference between premix and classical mixing?

There are two ways to produce samples with pile-up (PU) simulation: premix and classical mixing.

Classical mixing implies a prior production of a minimum-bias (MB) sample (with datatier GEN-SIM), which contains the events at generator level and the interaction of the particles with the detector material. The sample with PU is produced in the DIGI-RECO (DR) step: a root wmLHEGS/GS request is digitized (i.e. the interactions of the particles with the detector material are used to simulate the signals in the detector cells) together with the PU sample. The DR step needs the PU input dataset and the pile-up scenario, i.e. the distribution according to which the pile-up should be simulated and added to the root request. Since the DIGI step is quite CPU-intensive and is run for both the root request and the MB sample, classical mixing is generally more time- and CPU-consuming.

Premix is different because the PU sample is digitized separately, at the time of the production of the premix library. A MB sample (datatier GEN-SIM) is produced in the same way as before, but here it is used for the production of a SingleNeutrino sample (essentially nothing in the final state), which is interfaced with the simulation of the PU according to a certain scenario, using the MB sample. The output of the SingleNeutrino sample is a GEN-SIM-DIGI sample (already digitized). For root requests using premix PU simulation, the DIGI step is run only on the request itself, and the premixed PU is added after this step. Since the DIGI step is run only once, premix requests are much faster and less CPU-consuming than classical-mixing requests.
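To make the difference concrete, here is a sketch of how the two modes typically appear in the cmsDriver.py arguments of the DIGI step. The commands are simplified and hypothetical: the conditions, era, HLT menu, pile-up scenario and dataset names are illustrative placeholders (not taken from this page), since real campaigns fix these options centrally.

# Classical mixing (illustrative): the PU input is a GEN-SIM MinBias dataset,
# which gets digitized together with the signal events
cmsDriver.py --step DIGI,L1,DIGI2RAW,HLT:@relval2016 \
  --conditions auto:run2_mc --era Run2_2016 \
  --pileup 2016_25ns_Moriond17MC_PoissonOOTPU \
  --pileup_input "dbs:/MinBias_TuneCUETP8M1_13TeV-pythia8/RunIISummer15GS-MCRUN2_71_V1-v2/GEN-SIM" \
  --filein file:gensim.root --fileout file:digi.root \
  --eventcontent RAWSIM --datatier GEN-SIM-RAW -n 10

# Premixing (illustrative): the PU input is an already digitized
# GEN-SIM-DIGI-RAW premix library, overlaid via the DATAMIX step
cmsDriver.py --step DIGI,DATAMIX,L1,DIGI2RAW,HLT:@relval2017 \
  --conditions auto:phase1_2017_realistic --era Run2_2017 --datamix PreMix \
  --pileup_input "dbs:/Neutrino_E-10_gun/RunIISummer17PrePremix-MCv2_correctPU_94X_mc2017_realistic_v9-v1/GEN-SIM-DIGI-RAW" \
  --filein file:gensim.root --fileout file:digi.root \
  --eventcontent PREMIXRAW --datatier GEN-SIM-RAW -n 10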

How can I estimate the memory consumption of a request?

Executing the request locally produces an XML framework job report. One can run the get_test script for that particular request with a number of events close to 1000; the script is available from each request's McM page by clicking the "tick" button, or directly from the public REST API, for example: https://cms-pdmv.cern.ch/mcm/public/restapi/requests/get_test/SUS-RunIIFall18wmLHEGS-00113. The following command then prints the peak and average memory consumption for the produced events:

grep "Peak" myfile.xml

This gives an estimate of how much memory one needs to set up for this request.
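As a minimal end-to-end sketch (using the example request above; the exact name of the report file written by the script may vary):

# Download the get_test script for the request from the McM public REST API
curl -s -o test_request.sh \
  "https://cms-pdmv.cern.ch/mcm/public/restapi/requests/get_test/SUS-RunIIFall18wmLHEGS-00113"

# Run it in a CMSSW-capable environment; the cmsRun steps in the script
# typically write an XML framework job report per step
bash test_request.sh

# Look for the peak and average memory metrics in the report(s)
grep "Peak" *.xml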

When will my sample be ready?

This is a very difficult question to answer, since it requires an overview of the whole system. If there are many high-priority requests in the system, a standard request in block 2-3 will take more time than if there are none. Normally, if the requested sample is not targeting an imminent conference, the ETA might be 4-6 months (depending on the number of events), but even a small increase in priority might get it completed in 1-2 months.

Is it convenient to first run a GEN-SIM request standalone and run the DR step afterwards?

The general answer is NO. An already produced GS sample is generally transferred to tape immediately; there it cannot be accessed directly, and if one wants to run the DR step on top of it, the sample needs to be transferred back to disk first. This operation takes time, and in general it is more convenient to restart production from scratch.

Issue with 2017 MiniAOD/NanoAOD samples

Some MiniAODv2 samples in the Fall17 campaign have inconsistent pileup (PU) information in their dataset names. These samples [1] are produced from the 'original-buggy-PU-AODSIM' samples, which are in fact fine to use for physics analyses. More information about the PU issue and its solution is given in T. Boccali's talk [2] at the PPD General meeting. In any case, analysts are recommended to always reweight the MC samples to match the data pileup, and for that you may want to know which pileup library has been used for a certain dataset. Here is a brief overview of the issue.

* More than one AODSIM sample exists for some Fall17 GEN-SIMs.

* This happened due to a problem in the old premix library [2], so the corresponding AODSIM samples have a buggy PU profile, which can be corrected by PU reweighting at the analysis level.

* A new premix library was then created with the 'corrected-PU' information. The string 'PU2017' was added to the dataset names of the AODSIM and MiniAODv2 samples that were submitted/produced centrally with this 'corrected-PU' library.

* So 'PU2017' in the dataset names can be used to differentiate the 'original-buggy-PU-AODSIM' from the 'corrected-PU-AODSIM' samples.

* Then another central migration was done to produce MiniAODv2+NanoAOD samples from all the existing AODSIMs (original buggy PU and new corrected PU), but mistakenly the same setup was used to create all the MiniAODv2s (for both original-buggy-PU and corrected-PU inputs).

* As a result, 'PU2017' in the dataset name is not enough to differentiate the Fall17 MiniAODv2 samples of the 'original-buggy-PU-AODSIM' from those of the 'corrected-PU-AODSIM': 1054 MiniAODv2 requests [2] incorrectly have 'PU2017' in the dataset name although they were produced from 'original-buggy-PU-AODSIM'.

* Solution: you can always check the 'Parent' dataset of these samples in DAS to confirm the PU information (see the sketch after this list). If the AODSIM parent of your MiniAODv2 has 'PU2017' in its dataset name, it is the 'corrected-PU' one; otherwise it is the 'original-buggy-PU' one.
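As a minimal sketch, assuming a CMSSW environment where dasgoclient is available, the parent check can also be done from the command line (the dataset name below is a hypothetical placeholder):

# Query DAS for the parent (AODSIM) of a MiniAODv2 dataset
dasgoclient --query="parent dataset=/SomeSample_TuneCP5_13TeV/RunIIFall17MiniAODv2-PU2017_12Apr2018_94X_mc2017_realistic_v14-v1/MINIAODSIM"
# If the returned AODSIM name contains 'PU2017', the corrected-PU library
# was used; otherwise it is the original buggy PU.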

[1] /afs/cern.ch/user/g/gurpreet/public/Fall17_oldPU.txt

[2] T. Boccali (February 1, 2018) https://indico.cern.ch/event/695872/contributions/2877123/attachments/1593469/2522749/pileup_ppd_feb_2018.pdf
