Project: Discover the Higgs!

Project: Discover the Higgs!#

CREDIT: This notebook was developed by Jim Pivarski and Ioana Ifrim for the columnar data analysis tutorial, presented at CoDaS-HEP on July 20, 2023. For full material, please visit please visit this repository.

Gather into groups of 2 or 3 to work on the problems posed in the rest of this notebook.

Setup#

Libraries#

import hist, vector
import uproot
import awkward as ak
import matplotlib.pyplot as plt
import numpy as np

vector.register_awkward()

Data Setup#

# Run to download the dataset
!wget https://cernbox.cern.ch/files/spaces/eos/user/r/rcruzcan/public_downloads/SMHiggsToZZTo4L.root

# Get a handle to the ROOT file and show the available branches in the tree
file = uproot.open("./SMHiggsToZZTo4L.root")
tree = file["Events"]
tree.show()

# Load the data into an awkward array
events = tree.arrays()

For future convenience, and to be able to use vector more easily, we will reformat the loaded data into a more manageable format. Note that this is not neccesary, but is just to make life a bit simpler.

events = ak.zip({
    "PV": ak.zip({
        "x": events["PV_x"],
        "y": events["PV_y"],
        "z": events["PV_z"],
    }, with_name="Vector3D"),
    "muon": ak.zip({
        "pt": events["Muon_pt"],
        "phi": events["Muon_phi"],
        "eta": events["Muon_eta"],
        "mass": events["Muon_mass"],
        "charge": events["Muon_charge"],
        "pfRelIso03": events["Muon_pfRelIso03_all"],
        "pfRelIso04": events["Muon_pfRelIso04_all"],
    }, with_name="Momentum4D"),
    "electron": ak.zip({
        "pt": events["Electron_pt"],
        "phi": events["Electron_phi"],
        "eta": events["Electron_eta"],
        "mass": events["Electron_mass"],
        "charge": events["Electron_charge"],
        "pfRelIso03": events["Electron_pfRelIso03_all"],
    }, with_name="Momentum4D"),
    "MET": ak.zip({
        "pt": events["MET_pt"],
        "phi": events["MET_phi"],
    }, with_name="Momentum2D"),
}, depth_limit=1)

# Example of accesing some data through dict-like syntax
events["muon"]["pt"]

# Example of accessing data through the awkward array syntax
events.muon.pt

Higgs mass peak: 4 leptons of the same flavor#

Exercise

Instead of a Z mass peak, formed with 2 muons (or 2 electrons), draw a Higgs mass peak with 4 muons (or 4 electrons). No need for any cuts, yet. Focus only on the combinatorics.

Hint: Look at the ak.combinations documentation to find the argument you need to change.

Hint: Do it in small steps! That’s what an interactive environment is for.

Should look like:

Charge-based collections#

Because of the way that particles are measured and reconstructed, different particle types (electron versus muon) are in different collections, but different charges are not. But physically, charge is a quantum number just like flavor (particle type), and it can be convenient to put different charges into different collections.

muons_plus  = events.muons[events.muons.charge > 0]
muons_minus = events.muons[events.muons.charge < 0]
electrons_plus  = events.electrons[events.electrons.charge > 0]
electrons_minus = events.electrons[events.electrons.charge < 0]

Now we can make opposite-sign Z peaks without applying a cut to the combinations. Also, the ak.combinations problem (picking \(n\) items from a single collection) has become an ak.cartesian problem (finding all pairs of items drawn from different collections).

mu1, mu2 = ak.unzip(ak.cartesian((muons_plus, muons_minus)))
hist.Hist.new.Regular(100, 0, 150).Double().fill(ak.ravel((mu1 + mu2).mass)).plot()

(This section doesn’t have any questions; it’s to set things up for the next section.)

Higgs mass peak: the H → ZZ → 2μ2e final state#

Exercise

Now that you have muons_plus, muons_minus, electrons_plus, and electrons_minus, how would you make a Higgs mass peak for decays into 2 muons and 2 electrons?

Hint: ak.cartesian can take more than two input collections.

Hint: Still no need for cuts, thanks to the input collections already having the charge-cut applied.

Should look like:

Select on-shell Z in the 2μ2e final state#

In \(H \to ZZ\), one of the two \(Z\) bosons will usually be close to its “on-shell” mass of 91 GeV.

import particle
import hepunits

ZMASS = particle.Particle.findall("Z0")[0].mass / hepunits.GeV
ZMASS

With a pair of muons,

mu1, mu2 = ak.unzip(ak.cartesian((muons_plus, muons_minus)))
e1, e2 = ak.unzip(ak.cartesian((electrons_plus, electrons_minus)))

hist.Hist.new.Regular(100, 0, 150).Double().fill(ak.ravel((mu1 + mu2).mass)).plot();

We can construct a dimuon mass and compute the absolute distance between that and the on-shell mass of 91 GeV.

abs((mu1 + mu2).mass - ZMASS)

ak.ravel(abs((mu1 + mu2).mass - ZMASS))

When this is close enough, maybe let’s say 20 GeV, we can call a given dimuon pair to be “on shell.”

onshell_mumu = abs((mu1 + mu2).mass - ZMASS) < 20

The effect of this is a sharp cut-off in the Z mass distribution.

hist.Hist.new.Regular(100, 0, 150).Double().fill(ak.ravel((mu1 + mu2)[onshell_mumu].mass)).plot();

Exercise

In this section, make another 2μ2e mass peak in which either the 2μ is on-shell or the 2e is on-shell.

(That’s an “inclusive or”: having both be close to 91 GeV is allowed.)

Hint: You’ll have to compute the Z mass constraint from muons and electrons in a 4-way Cartesian product, not just the 2 muons as in my example above.

Hint: To make logical combinations of cuts, use | for “or” and & for “and”. Also, put parentheses around any comparisons: e.g. is_good & ((1 < x) | (x < 2)).

Should look like:

Select on-shell Z in the 4μ final state#

Addressing Z boson properties in the 2\(\mu\)2\(e\) case is easier than in the 4\(\mu\) or 4\(e\) cases because with each (\(\mu^+\), \(\mu^-\), \(e^+\), \(e^-\)) quad-tuple, there is only one way to identify each of the two \(Z\) bosons: \(Z_{\mu\mu}\to\mu^+\mu^-\) and \(Z_{ee}\to e^+e^-\). In a same-flavor final state, you have quad-tuples like (\(\mu^+_1\), \(\mu^-_1\), \(\mu^+_2\), \(\mu^-_2\)).

The possible decays are

\(Z_{11} \to \mu^+_1\mu^-_1\) and \(Z_{22}\to \mu^+_2,\mu^-_2\)
\(Z_{12} \to \mu^+_1\mu^-_2\) and \(Z_{21}\to \mu^+_2,\mu^-_1\)

and within each of the two possibilities, only one of the Z bosons will be on-shell. (The Higgs doesn’t have enough mass for both to be on-shell.)

To apply Z cuts in this case, we need to apply a combinatoric primitive to the result of a combinatoric primitive: nested combinatorics. First, from the muons_plus collection, we draw two distinct muons and call them muplus1 and muplus2. Then, from the muons_minus collection, we draw two distinct muons and call them muminus1 and muminus2.

muplus1, muplus2 = ak.unzip(ak.combinations(muons_plus, 2))
muminus1, muminus2 = ak.unzip(ak.combinations(muons_minus, 2))

Next, we need to find pairwise Cartesian products of each opposite-charge combination:

pairs of muplus1 ⊗ muminus1, which can be labeled “11”
pairs of muplus1 ⊗ muminus2, which can be labeled “12”
pairs of muplus2 ⊗ muminus1, which can be labeled “21”
pairs of muplus2 ⊗ muminus2, which can be labeled “22”

Note that every real combination will either be a “11” and “22” or it will be a “12” and “21”.

muplus11, muminus11 = ak.unzip(ak.cartesian((muplus1, muminus1)))
muplus12, muminus12 = ak.unzip(ak.cartesian((muplus1, muminus2)))
muplus21, muminus21 = ak.unzip(ak.cartesian((muplus2, muminus1)))
muplus22, muminus22 = ak.unzip(ak.cartesian((muplus2, muminus2)))

By construction each of these four collections has the same number of items in each event.

ak.num(muplus11), ak.num(muplus12), ak.num(muplus21), ak.num(muplus22)

ak.all((ak.num(muplus11) == ak.num(muplus12)) & (ak.num(muplus21) == ak.num(muplus22)) & (ak.num(muplus11) == ak.num(muplus22)))

We can look at the four possible Z bosons individually.

First the “11” and “22”:

hist.Hist.new.Regular(100, 0, 150).Double().fill(ak.ravel((muplus11 + muminus11).mass)).plot();

hist.Hist.new.Regular(100, 0, 150).Double().fill(ak.ravel((muplus22 + muminus22).mass)).plot();

Now the “12” and “21”:

hist.Hist.new.Regular(100, 0, 150).Double().fill(ak.ravel((muplus12 + muminus12).mass)).plot();

hist.Hist.new.Regular(100, 0, 150).Double().fill(ak.ravel((muplus21 + muminus21).mass)).plot();

They all have large backgrounds because they’re often not the right combination.

Now we’ll make distances as we did in the previous section.

dist11 = abs((muplus11 + muminus11).mass - ZMASS)
dist12 = abs((muplus12 + muminus12).mass - ZMASS)
dist21 = abs((muplus21 + muminus21).mass - ZMASS)
dist22 = abs((muplus22 + muminus22).mass - ZMASS)

Because we have so many possibilities, let’s define cuts like onshell_11 to mean “11 is the closest to being on-shell.”

That doesn’t mean it’s within 20 GeV of 91 GeV; it just means that it’s closer than the others.

(The organization of the cuts is as follows: for each “11”, “12”, “21”, “22”, the relevant dist is to the left of < in all comparisons, and the three other dist values it’s being compared to are every other “11”, “12”, “21”, “22” combination.)

onshell_11 = (dist11 < dist12) & (dist11 < dist21) & (dist11 < dist22)
onshell_12 = (dist12 < dist11) & (dist12 < dist21) & (dist12 < dist22)
onshell_21 = (dist21 < dist11) & (dist21 < dist12) & (dist21 < dist22)
onshell_22 = (dist22 < dist11) & (dist22 < dist12) & (dist22 < dist21)

Now let’s look at a \(Z\to 2\mu\) mass plot of “11” in which “11” is the closest to being on-shell.

Naturally, there’s a more pronounced peak.

h = hist.Hist.new.Regular(100, 0, 150).Double()

h.fill(ak.ravel((muplus11 + muminus11)[onshell_11].mass))

h.plot();

Now let’s look at \(H \to ZZ\to 4\mu\) for

“11” and “22” if “11” is on-shell or “22” is on-shell
“12” and “21” if “12” is on-shell or “21” is on-shell

h = hist.Hist.new.Regular(100, 0, 150).Double()

h.fill(ak.ravel((muplus11 + muminus11 + muplus22 + muminus22)[onshell_11 | onshell_22].mass))
h.fill(ak.ravel((muplus12 + muminus21 + muplus12 + muminus21)[onshell_12 | onshell_21].mass))

h.plot();

To go further and require “11” to be within 20 GeV of 91 GeV when it’s already the closest to it would be a cut like the following:

onshell_11 & (dist11 < 20)

Exercise

Now re-make the above plot, but require whichever muon pair is closest to being on-shell to be within 20 GeV of it.

Should look like:

(And compare that to the 4 muon mass without any Z mass constraints!)

Commentary#

Dealing with combinatorics is complex, but by reducing the single μ collection into a μ⁺ collection and a μ⁻ collection, it becomes easier (and more memory efficient) to consider all combinations without having to apply charge constraints after the fact.

Objects like muplus11, representing the μ⁺ in all “11” combinations, can be dealt with like scalars, like the single “11” μ⁺ inside a nested loop over combinations in imperative programming. (No array-length changing operations were performed after the two ak.combinations, ak.cartesian steps in the last section.)

But because the muplus11 object represents the μ⁺ in all “11” combinations, we can plot distributions of what we’re computing at every step, rather than just once at the end of a script, which is great for debugging.

That’s the value of array-oriented programming—the ability to look at distributions of each step while you develop the script—if you take advantage of that opportunity.

Project: Discover the Higgs!

Contents

Project: Discover the Higgs!#

Setup#

Libraries#

Data Setup#

Higgs mass peak: 4 leptons of the same flavor#

Charge-based collections#

Higgs mass peak: the H → ZZ → 2μ2e final state#

Select on-shell Z in the 2μ2e final state#

Select on-shell Z in the 4μ final state#

Commentary#