# Rules for Run3 2024 dataset names

The following dataset name rules are enforced for all MC samples injected in the `RunIII2024Summer24` MC campaign.\
The same rules also apply to the 2022 and 2023 Run 3 re-reco campaigns (`RunIII2022Summer24`, `RunIII2023Summer24`), as well as to all future MC campaigns.

## General Conventions

### **Naming : PROCESS\_\[BINNING]\_\[FILTER]\_\[PARAMETERS]\_TUNE\_BEAME\_ME-PS**

* **PROCESS** : DY, Z, W, TT, T, WW, WZ, ZZ, QCD, ...
* **BINNING** : Bin-MLL-XtoX, Bin-HT-XtoX, Bin-XJ, ...
* **FILTER** : Fil-EMEnriched, Fil-MuEnriched, Fil-BEnriched, ...
* **PARAMETERS** : Par-MH-125, ...
* **TUNE** : TuneCPX
* **BEAME** : 13p6TeV
* **ME-PS** : madgraphMLM-pythia8, powheg-pythia8, amcatnloFXFX-pythia8 (generator names in lower case, merging schemes in upper case)

The dataset name structure must respect the following rules:

* The blocks PROCESS, TUNE, BEAME and ME-PS are mandatory. Every dataset name must contain these blocks.
* The blocks BINNING, FILTER, and PARAMETERS are optional. The name can contain none, one, two, or all of them, depending on the physics process.
* The `_` (underscore) must be used **ONLY** to separate the main blocks of the dataset name.
* The `-` (dash) can be used to separate strings within a given block.
* The `Bin-` header must be present at the beginning of the BINNING block.
* The `Fil-` header must be present at the beginning of the FILTER block.
* The `Par-` header must be present at the beginning of the PARAMETERS block.
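
The structural rules above can be sketched as a small checker. This is only an illustration: the function name `split_blocks` is hypothetical, and the checks that TUNE starts with `Tune`, that BEAME looks like `<number>TeV`, and that the optional blocks appear in the order Bin, Fil, Par are assumptions read off the naming template, not an official validator.

```python
import re

# Optional blocks and their mandatory headers, in the order given by the template.
OPTIONAL = [("BINNING", "Bin-"), ("FILTER", "Fil-"), ("PARAMETERS", "Par-")]


def split_blocks(name):
    """Split a dataset name on '_' and return its blocks as a dict.

    Raises ValueError if a mandatory block is missing or an optional
    block is unknown, duplicated, or out of order.
    """
    parts = name.split("_")
    if len(parts) < 4:
        raise ValueError("PROCESS, TUNE, BEAME and ME-PS are mandatory")
    process, middle = parts[0], parts[1:-3]
    tune, beame, meps = parts[-3:]
    if not tune.startswith("Tune"):
        raise ValueError(f"unexpected TUNE block: {tune!r}")
    if not re.fullmatch(r"\d+(p\d+)?TeV", beame):
        raise ValueError(f"unexpected BEAME block: {beame!r}")

    blocks = {"PROCESS": process}
    position = 0  # index into OPTIONAL; only moves forward, so order is enforced
    for part in middle:
        while position < len(OPTIONAL) and not part.startswith(OPTIONAL[position][1]):
            position += 1
        if position == len(OPTIONAL):
            raise ValueError(f"unexpected or misordered block: {part!r}")
        blocks[OPTIONAL[position][0]] = part
        position += 1
    blocks.update({"TUNE": tune, "BEAME": beame, "ME-PS": meps})
    return blocks
```

Because the underscore is reserved for block separation, a plain `split("_")` is enough to recover the blocks; any stray underscore inside a block shows up as an unexpected extra block.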

### **Particle** Acronyms

| Particle     | Acronyms | Additional information (Only when needed) |
| ------------ | -------- | ----------------------------------------- |
| lepton       | L        | Lplus, Lminus                             |
| electron     | E        | Eplus, Eminus                             |
| muon         | Mu       |                                           |
| tau          | Tau      |                                           |
| neutrino     | Nu       |                                           |
| quark        | Q        |                                           |
| quark+gluon  | J        |                                           |
| top quark    | T        | Tbar                                      |
| bottom quark | B        | Bbar                                      |
| higgs        | H        |                                           |
| photon       | G        |                                           |
| gluon        | Glu      |                                           |
| W boson      | W        |                                           |
| Z boson      | Z        |                                           |

### **PROCESS**

Specify the process we are producing.

To unify this part, we suggest the following conventions:

* all 'particles' start with a capital letter, followed by lowercase letters, e.g. `W, Z, Mu, Tau, E, Nu, Wplus, H, Jets, Tbar, B, Bbar`
* if a specific decay is simulated, this is specified using the keyword `to`, e.g. `WtoENu, HtoWWto2L2Nu`
* initial state particles are only specified if needed to distinguish between other processes, e.g. `GluGluToWW` with respect to `WW`
* the charge of a particle is only specified if relevant, i.e. use `Wplus` if only W+ is in the sample, but **don't use** `WplusWminus` for W-pair production
* the same holds for anti-particles: use `Tbar` if only anti-top is in the sample, but **don't use** `TTbar`; use `TT` instead
* if, in one part of the process name, there is a) more than one particle of the same kind **and** b) more than two particles in total, use `2E2Nu` rather than `EENuNu`

**Always use a number if the same particle appears more than once, and arrange the particles in alphabetical order**

* Use **DYto2L** instead of DYtoLL
* Use **WtoLNu and WtoQQ** to distinguish the decays of the W boson
* Use **TT** instead of TTbar
* Use **TtoLNu and TbartoLNu** to distinguish top and anti-top
* Use **WWto2L2Nu** instead of WWtoLLNuNu (or WWto2Nu2L)
* Use **WWtoLNu2Q** instead of WWtoLNuQQ

**Merge information if it is not confusing**

* Use **WZto3LNu** instead of WZtoLNu2L
* Use **WWto4Q** instead of WWto2Q2Q
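
As a rough illustration of the counting, ordering and merging conventions, the final-state part of a process name can be derived from a flat list of particle acronyms. The helper `decay_products` is hypothetical; it assumes the acronyms from the particle table (all starting with a capital letter, so a plain string sort gives alphabetical order) and always writes a multiplicity when a particle appears more than once.

```python
from collections import Counter


def decay_products(particles):
    """Collapse a list of particle acronyms into the compact naming form.

    Repeated particles are merged into '<count><acronym>' and the
    result is ordered alphabetically by acronym.
    """
    counts = Counter(particles)
    return "".join(
        (str(n) if n > 1 else "") + acronym
        for acronym, n in sorted(counts.items())
    )


# W(->L Nu) Z(->L L): the three leptons are merged, giving "WZto3LNu".
name = "WZ" + "to" + decay_products(["L", "Nu", "L", "L"])
```

Merging all identical particles regardless of which boson they come from is what turns `WWto2Q2Q` into `WWto4Q`; cases where merging would be confusing still need a human decision.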

**IMPORTANT**: Use the `Jet` keyword with caution. We're operating a hadron collider, so there are jets all over the place. We propose to use it only if the generation includes matrix elements with explicit higher QCD multiplicity diagrams (i.e. MadGraph, Alpgen and Sherpa) and some matching procedure had/has to be applied. If there are cuts on pthat, this should be indicated using the dedicated keywords. As an example, `ZJet` in Pythia6, which is nothing but Z production with a cut on pthat of the hard interaction, should become `Z`, and `ZmumuJet` simply `ZToMuMu`. In case of decay products, e.g. RS Gravitons decaying into quarks and gluons, use the keyword `J`, e.g. `RSGravToJJ`.

### **BINNING**

The format is: **Bin-VAR1-X1toY1-VAR2-X2toY2**

When producing binned samples, e.g. a DY process with at most 4 jets at LHE level:

* The inclusive process name is: **DYto2L-4Jets**
* The corresponding jet-binned sample, e.g. with 1 jet at LHE level, is: **DYto2L-4Jets\_Bin-1J**

N.B.: Check the hyphens and underscores carefully in this case!

Other binning cases are trivial, e.g., DYto2L-4Jets\_Bin-MLL-60to90, DYto2L-4Jets\_Bin-HT-100to200 ...

For bins without an upper boundary, there is no need to add `Inf`: e.g. `600toInf` should be written simply as `600`.

If a sample is binned in multiple variables, separate the various parts with `-` and list the bins. The bins should be ordered first by number of jets (if the sample is binned in number of jets), and the rest should then be listed in alphabetical order, e.g. `Bin-HT-100to400-MLL-50to120`, `Bin-1J-MLL-50to120`.

N.B.: The only exception to this format is for jet bins (for historical reasons). In this case you should use the format `Bin-0J`, `Bin-1J`, ...
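
A minimal sketch of how a BINNING block could be assembled from these rules follows. The function `bin_block` and its arguments are hypothetical; an open-ended bin is represented here by `None` as the upper edge, and the alphabetical ordering is taken to be case-insensitive, which is an assumption.

```python
def bin_block(ranges=None, n_jets=None):
    """Build a BINNING block such as 'Bin-1J-MLL-50to120'.

    ranges: dict mapping a variable name to (low, high); high=None
            means no upper boundary, so only the lower edge is written.
    n_jets: optional jet multiplicity, written first as '<n>J'.
    """
    pieces = []
    if n_jets is not None:
        pieces.append(f"{n_jets}J")  # jet bins come first and use the NJ form
    for var in sorted(ranges or {}, key=str.lower):
        low, high = ranges[var]
        span = f"{low}" if high is None else f"{low}to{high}"
        pieces.append(f"{var}-{span}")
    if not pieces:
        raise ValueError("a BINNING block needs at least one bin")
    return "Bin-" + "-".join(pieces)


# Underscore between blocks, hyphens inside the block.
full = "DYto2L-4Jets" + "_" + bin_block({"MLL": (50, 120)}, n_jets=1)
```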

### **FILTER**

The format is: **Fil-FILTER1-FILTER2**

If more than one filter is used, separate them with `-` and list filters in alphabetical order, e.g. `Fil-K0s-Mu`.
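
The ordering rule can be sketched as follows. The helper `filter_block` is hypothetical. Stripping a trailing `Filter` from old-style names is an assumption based on the full examples in this document (e.g. `MuFilter` and `K0sFilter` becoming `Fil-K0s-Mu`), and the sort is taken to be case-insensitive.

```python
def filter_block(filters):
    """Build a FILTER block from filter names, accepting legacy 'XFilter' spellings."""
    cleaned = []
    for name in filters:
        if name.endswith("Filter"):
            name = name[: -len("Filter")]  # drop the legacy suffix
        cleaned.append(name)
    if not cleaned:
        raise ValueError("a FILTER block needs at least one filter")
    # One header for the whole block, filters joined with '-' in alphabetical order.
    return "Fil-" + "-".join(sorted(cleaned, key=str.lower))
```

For instance, `filter_block(["MuFilter", "K0sFilter"])` yields `Fil-K0s-Mu`.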

Some complicated cases:

* `DYto2L-4Jets_Fil-BEnriched`: GEN filter requiring b quarks from the parton shower (the maximum jet multiplicity is 4 at LHE level)

### **PARAMETERS**

The format is: **Par-PARAMETER1-VALUE1-PARAMETER2-VALUE2**

This is used to identify the values (NB: not the ranges, for which we use BINNING) of some relevant parameters in the physics process, such as the mass of the Higgs boson, Z', ...

If more than one parameter is used, separate them with `-` and list parameters in alphabetical order, e.g. `Par-ctau-100cm-M-1000GeV`.
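
A PARAMETERS block could be generated along these lines. The function `par_block` is hypothetical. Two assumptions are made: the alphabetical ordering is case-insensitive (which is what places `ctau` before `M` in the example above), and a decimal point in a value is written as `p`, following the `13p6TeV` style used elsewhere in the names.

```python
def par_block(params):
    """Build a PARAMETERS block from a {name: value} mapping.

    Values may be numbers or strings with units (e.g. '100cm');
    any decimal point is replaced by 'p'.
    """
    if not params:
        raise ValueError("a PARAMETERS block needs at least one parameter")
    items = sorted(params.items(), key=lambda item: item[0].lower())
    return "Par-" + "-".join(
        f"{name}-{str(value).replace('.', 'p')}" for name, value in items
    )
```

With this sketch, `par_block({"M": "1000GeV", "ctau": "100cm"})` gives `Par-ctau-100cm-M-1000GeV`.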

### **TUNE**

When using Pythia8, the format is: **TuneCPX** with X between 1 and 5.

* Tunes CP1 and CP2 are LO tunes and go along with LO PDF sets (NNPDF3.1 LO - \alpha\_s = 0.130)
* Tunes CP3, CP4, CP5 are NLO tunes and go along with NLO PDF sets (NNPDF3.1 N(N)LO - \alpha\_s = 0.118)

When using Herwig7, the tune is CH3 and the format is **TuneCH3**.

When using Sherpa, we currently use the default tune from the Sherpa authors and the format is **TuneSherpaDef**.
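
The correspondence between shower program and TUNE block can be checked with a short sketch like the one below. The table `TUNE_BY_SHOWER` and the function `tune_fits_generator` are hypothetical. The match is deliberately a prefix match, so that tune variations such as `TuneCP5Down` (which appears in the examples in this document) are accepted; this is an assumption, not a stated rule.

```python
import re

# Expected start of the TUNE block for each shower program named in this section.
TUNE_BY_SHOWER = {
    "pythia8": r"TuneCP[1-5]",
    "herwig7": r"TuneCH3",
    "sherpa": r"TuneSherpaDef",
}


def tune_fits_generator(tune, meps):
    """Return True if the TUNE block matches the shower found in the ME-PS block."""
    for token in meps.split("-"):
        for shower, pattern in TUNE_BY_SHOWER.items():
            if token.startswith(shower):  # also catches e.g. 'sherpaMEPS'
                return re.match(pattern, tune) is not None
    raise ValueError(f"no known shower program in {meps!r}")
```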

### BEAME

The format is: **13p6TeV**

This is fixed and must not be changed for Run3 pp collisions.

### ME-PS

The generators to be used are:

| GENERATOR                                                  | NAME               |
| ---------------------------------------------------------- | ------------------ |
| Pythia8                                                    | `pythia8`          |
| Pythia6                                                    | `pythia6`          |
| Herwig6                                                    | `herwig6`          |
| Herwig++                                                   | `herwigpp`         |
| Herwig7                                                    | `herwig7`          |
| Sherpa                                                     | `sherpa`           |
| MadGraph/MG5\_aMC\@NLO (LO)                                | `madgraph`         |
| MadGraph/MG5\_aMC\@NLO (LO) **e.g. showered with Pythia8** | `madgraph-pythia8` |
| MadGraph/MG5\_aMC\@NLO (NLO)                               | `amcatnlo`         |
| Alpgen                                                     | `alpgen`           |
| MC\@NLO                                                    | `mcatnlo`          |
| POWHEG                                                     | `powheg`           |
| POWHEG **e.g. showered with Pythia8**                      | `powheg-pythia8`   |
| JHUGen                                                     | `jhugen`           |
| POWHEG+JHUGen                                              | `powheg-jhugen`    |
| HARDCOL                                                    | `hardcol`          |
| BCVEGPY 2                                                  | `bcvegpy2`         |
| ...                                                        | ...                |

If specialized decay tools are used, please append them to the name, e.g. if EvtGen was used after Pythia8, use `...pythia8-evtgen`; likewise `...pythia8-tauola`, `...pythia8-photos`.

When MadSpin is used, please include it in the name, e.g.: `powheg-madspin-pythia8`, `madgraph-madspin-pythia8`.

When merging/matching methods are used in MadGraph/MG5\_aMC\@NLO, POWHEG or Sherpa, please refer to the following table:

| GENERATOR                                | NAME           |
| ---------------------------------------- | -------------- |
| MadGraph5\_aMC\@NLO (LO) + MLM merging   | `madgraphMLM`  |
| MadGraph5\_aMC\@NLO (NLO) + FxFx merging | `amcatnloFXFX` |
| POWHEG + MiNLO method                    | `powhegMINLO`  |
| POWHEG + MiNNLO method                   | `powhegMINNLO` |
| Sherpa + MEPS merging                    | `sherpaMEPS`   |
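
Putting the ME-PS conventions together, the block could be composed as in the sketch below. The function `meps_block`, its argument names and the keys of `MERGED_NAMES` are hypothetical; only the resulting strings come from the tables above. The ordering (matrix element, then `madspin`, then shower, then decay tools) is inferred from the examples given in this section.

```python
# (generator name, merging scheme) -> name from the merging/matching table.
MERGED_NAMES = {
    ("madgraph", "MLM"): "madgraphMLM",
    ("amcatnlo", "FxFx"): "amcatnloFXFX",
    ("powheg", "MiNLO"): "powhegMINLO",
    ("powheg", "MiNNLO"): "powhegMINNLO",
    ("sherpa", "MEPS"): "sherpaMEPS",
}


def meps_block(matrix_element=None, shower=None, merging=None,
               madspin=False, decay_tools=()):
    """Compose the ME-PS block from lower-case generator names."""
    parts = []
    if matrix_element:
        if merging:
            # Raises KeyError for a combination not listed in the table.
            parts.append(MERGED_NAMES[(matrix_element, merging)])
        else:
            parts.append(matrix_element)
    if madspin:
        parts.append("madspin")
    if shower:
        parts.append(shower)
    parts.extend(decay_tools)  # e.g. evtgen, tauola, photos
    if not parts:
        raise ValueError("at least one generator is required")
    return "-".join(parts)
```

For example, `meps_block("madgraph", "pythia8", merging="MLM")` returns `madgraphMLM-pythia8`, and `meps_block(shower="pythia8", decay_tools=["evtgen"])` returns `pythia8-evtgen`.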

### Some full examples

This is a list of examples, comparing OLD (not ok) and NEW names (following the current rules).

* OLD: ADDGravTo2G\_NegInt-0\_LambdaT-10000\_M-1000To2000\_TuneCP5\_13p6TeV\_pythia8
* NEW: ADDGravTo2G\_Bin-M-1000to2000\_Par-NegInt-0-LambdaT-10000\_TuneCP5\_13p6TeV\_pythia8
* OLD: AMSB\_Higgsino\_M1000GeV\_ctau100cm\_TuneCP5\_13p6TeV\_madgraph-pythia8
* NEW: AMSB-Higgsino\_Par-ctau-100cm-M-1000GeV\_TuneCP5\_13p6TeV\_madgraph-pythia8
* OLD: B0ToJpsiK0s\_JMM\_BMuFilter\_DGamma0\_SoftQCDnonD\_TuneCP5\_13p6TeV-pythia8-evtgen
* NEW: B0ToJpsiK0s-JMM\_Fil-BMu\_Par-DGamma-0\_SoftQCDnonD\_TuneCP5\_13p6TeV\_pythia8-evtgen
* OLD: bbH\_Hto2Zto4L\_M-125\_TuneCP5\_13p6TeV\_JHUGenV752-pythia8
* NEW: BBH-Hto2Zto4L\_Par-M-125\_TuneCP5\_13p6TeV\_jhugen-pythia8
* OLD: B0ToK0sMuMu\_MuFilter\_K0sFilter\_TuneCP5\_13p6TeV\_pythia8-evtgen
* NEW: B0ToK0sMuMu\_Fil-K0s-Mu\_TuneCP5\_13p6TeV\_pythia8-evtgen
* OLD: DYBto2LB-4Jets\_MLL-120\_HT-100to400\_TuneCP5\_13p6TeV\_madgraphMLM-pythia8
* NEW: DYBto2LB-4Jets\_Bin-HT-100to400-MLL-120\_TuneCP5\_13p6TeV\_madgraphMLM-pythia8
* OLD: DYto2L-2Jets\_MLL-50\_0J\_TuneCP5Down\_13p6TeV\_amcatnloFXFX-pythia8
* NEW: DYto2L-2Jets\_Bin-0J-MLL-50\_TuneCP5Down\_13p6TeV\_amcatnloFXFX-pythia8
* OLD: DYto2L-4Jets\_MLL-50to120\_HT-100to400\_TuneCP5\_13p6TeV\_madgraphMLM-pythia8
* NEW: DYto2L-4Jets\_Bin-HT-100to400-MLL-50to120\_TuneCP5\_13p6TeV\_madgraphMLM-pythia8
* OLD: RPVStopStopToJets\_UDD323\_M-700\_TuneCP5\_13p6TeV-madgraphMLM-pythia8
* NEW: RPVStopStoptoJets\_Par-M-700\_UDD323\_TuneCP5\_13p6TeV\_madgraphMLM-pythia8
* OLD: SUEP\_mMed-125\_mDark-2\_temp-0p5\_decay-generic\_14TeV-pythia8
* NEW: SUEP\_Par-mDark-2-mMed-125-temp-0p5\_decayGeneric\_14TeV-pythia8
* OLD: WminusH\_Wto2Q\_Hto2G\_M-125\_TuneCP5\_13p6TeV\_powheg-minlo-HWJ-pythia8
* NEW: WminusH-Wto2Q-Hto2G\_Par-M-125\_TuneCP5\_13p6TeV\_powhegMINLO-pythia8
* OLD: TtoLNu-2Jets\_s-channel\_TuneCP5\_13p6TeV\_amcatnloFXFX-pythia8
* NEW: TtoLNu-2Jets-schannel\_TuneCP5\_13p6TeV\_amcatnloFXFX-pythia8
* OLD: GluGluSpin0To2G\_W-5p6\_M-1750\_TuneCP5\_13p6TeV\_pythia8
* NEW: GluGluSpin0To2G\_Par-M-1750-W-5p6\_TuneCP5\_13p6TeV\_pythia8
