Project: Discover the Higgs!#
CREDIT: This notebook was developed by Jim Pivarski and Ioana Ifrim for the columnar data analysis tutorial, presented at CoDaS-HEP on July 20, 2023. For full material, please visit please visit this repository.
Gather into groups of 2 or 3 to work on the problems posed in the rest of this notebook.
Setup#
Libraries#
import hist, vector
import uproot
import awkward as ak
import matplotlib.pyplot as plt
import numpy as np
vector.register_awkward()
Data Setup#
# Run to download the dataset
!wget https://cernbox.cern.ch/files/spaces/eos/user/r/rcruzcan/public_downloads/SMHiggsToZZTo4L.root
# Get a handle to the ROOT file and show the available branches in the tree
file = uproot.open("./SMHiggsToZZTo4L.root")
tree = file["Events"]
tree.show()
# Load the data into an awkward array
events = tree.arrays()
For future convenience, and to be able to use vector
more easily, we will reformat the loaded data into a more manageable format. Note that this is not neccesary, but is just to make life a bit simpler.
events = ak.zip({
"PV": ak.zip({
"x": events["PV_x"],
"y": events["PV_y"],
"z": events["PV_z"],
}, with_name="Vector3D"),
"muon": ak.zip({
"pt": events["Muon_pt"],
"phi": events["Muon_phi"],
"eta": events["Muon_eta"],
"mass": events["Muon_mass"],
"charge": events["Muon_charge"],
"pfRelIso03": events["Muon_pfRelIso03_all"],
"pfRelIso04": events["Muon_pfRelIso04_all"],
}, with_name="Momentum4D"),
"electron": ak.zip({
"pt": events["Electron_pt"],
"phi": events["Electron_phi"],
"eta": events["Electron_eta"],
"mass": events["Electron_mass"],
"charge": events["Electron_charge"],
"pfRelIso03": events["Electron_pfRelIso03_all"],
}, with_name="Momentum4D"),
"MET": ak.zip({
"pt": events["MET_pt"],
"phi": events["MET_phi"],
}, with_name="Momentum2D"),
}, depth_limit=1)
# Example of accesing some data through dict-like syntax
events["muon"]["pt"]
# Example of accessing data through the awkward array syntax
events.muon.pt
Higgs mass peak: 4 leptons of the same flavor#
Exercise
Instead of a Z mass peak, formed with 2 muons (or 2 electrons), draw a Higgs mass peak with 4 muons (or 4 electrons). No need for any cuts, yet. Focus only on the combinatorics.
Hint: Look at the ak.combinations documentation to find the argument you need to change.
Hint: Do it in small steps! That’s what an interactive environment is for.
Should look like:

Charge-based collections#
Because of the way that particles are measured and reconstructed, different particle types (electron versus muon) are in different collections, but different charges are not. But physically, charge is a quantum number just like flavor (particle type), and it can be convenient to put different charges into different collections.
muons_plus = events.muons[events.muons.charge > 0]
muons_minus = events.muons[events.muons.charge < 0]
electrons_plus = events.electrons[events.electrons.charge > 0]
electrons_minus = events.electrons[events.electrons.charge < 0]
Now we can make opposite-sign Z peaks without applying a cut to the combinations. Also, the ak.combinations problem (picking \(n\) items from a single collection) has become an ak.cartesian problem (finding all pairs of items drawn from different collections).
mu1, mu2 = ak.unzip(ak.cartesian((muons_plus, muons_minus)))
hist.Hist.new.Regular(100, 0, 150).Double().fill(ak.ravel((mu1 + mu2).mass)).plot()
(This section doesn’t have any questions; it’s to set things up for the next section.)
Higgs mass peak: the H → ZZ → 2μ2e final state#
Exercise
Now that you have muons_plus
, muons_minus
, electrons_plus
, and electrons_minus
, how would you make a Higgs mass peak for decays into 2 muons and 2 electrons?
Hint: ak.cartesian can take more than two input collections.
Hint: Still no need for cuts, thanks to the input collections already having the charge-cut applied.
Should look like:

Select on-shell Z in the 2μ2e final state#
In \(H \to ZZ\), one of the two \(Z\) bosons will usually be close to its “on-shell” mass of 91 GeV.
import particle
import hepunits
ZMASS = particle.Particle.findall("Z0")[0].mass / hepunits.GeV
ZMASS
With a pair of muons,
mu1, mu2 = ak.unzip(ak.cartesian((muons_plus, muons_minus)))
e1, e2 = ak.unzip(ak.cartesian((electrons_plus, electrons_minus)))
hist.Hist.new.Regular(100, 0, 150).Double().fill(ak.ravel((mu1 + mu2).mass)).plot();
We can construct a dimuon mass and compute the absolute distance between that and the on-shell mass of 91 GeV.
abs((mu1 + mu2).mass - ZMASS)
ak.ravel(abs((mu1 + mu2).mass - ZMASS))
When this is close enough, maybe let’s say 20 GeV, we can call a given dimuon pair to be “on shell.”
onshell_mumu = abs((mu1 + mu2).mass - ZMASS) < 20
The effect of this is a sharp cut-off in the Z mass distribution.
hist.Hist.new.Regular(100, 0, 150).Double().fill(ak.ravel((mu1 + mu2)[onshell_mumu].mass)).plot();
Exercise
In this section, make another 2μ2e mass peak in which either the 2μ is on-shell or the 2e is on-shell.
(That’s an “inclusive or”: having both be close to 91 GeV is allowed.)
Hint: You’ll have to compute the Z mass constraint from muons and electrons in a 4-way Cartesian product, not just the 2 muons as in my example above.
Hint: To make logical combinations of cuts, use |
for “or” and &
for “and”. Also, put parentheses around any comparisons: e.g. is_good & ((1 < x) | (x < 2))
.
Should look like:

Select on-shell Z in the 4μ final state#
Addressing Z boson properties in the 2\(\mu\)2\(e\) case is easier than in the 4\(\mu\) or 4\(e\) cases because with each (\(\mu^+\), \(\mu^-\), \(e^+\), \(e^-\)) quad-tuple, there is only one way to identify each of the two \(Z\) bosons: \(Z_{\mu\mu}\to\mu^+\mu^-\) and \(Z_{ee}\to e^+e^-\). In a same-flavor final state, you have quad-tuples like (\(\mu^+_1\), \(\mu^-_1\), \(\mu^+_2\), \(\mu^-_2\)).
The possible decays are
\(Z_{11} \to \mu^+_1\mu^-_1\) and \(Z_{22}\to \mu^+_2,\mu^-_2\)
\(Z_{12} \to \mu^+_1\mu^-_2\) and \(Z_{21}\to \mu^+_2,\mu^-_1\)
and within each of the two possibilities, only one of the Z bosons will be on-shell. (The Higgs doesn’t have enough mass for both to be on-shell.)
To apply Z cuts in this case, we need to apply a combinatoric primitive to the result of a combinatoric primitive: nested combinatorics. First, from the muons_plus
collection, we draw two distinct muons and call them muplus1
and muplus2
. Then, from the muons_minus
collection, we draw two distinct muons and call them muminus1
and muminus2
.
muplus1, muplus2 = ak.unzip(ak.combinations(muons_plus, 2))
muminus1, muminus2 = ak.unzip(ak.combinations(muons_minus, 2))
Next, we need to find pairwise Cartesian products of each opposite-charge combination:
pairs of
muplus1 ⊗ muminus1
, which can be labeled “11”pairs of
muplus1 ⊗ muminus2
, which can be labeled “12”pairs of
muplus2 ⊗ muminus1
, which can be labeled “21”pairs of
muplus2 ⊗ muminus2
, which can be labeled “22”
Note that every real combination will either be a “11” and “22” or it will be a “12” and “21”.
muplus11, muminus11 = ak.unzip(ak.cartesian((muplus1, muminus1)))
muplus12, muminus12 = ak.unzip(ak.cartesian((muplus1, muminus2)))
muplus21, muminus21 = ak.unzip(ak.cartesian((muplus2, muminus1)))
muplus22, muminus22 = ak.unzip(ak.cartesian((muplus2, muminus2)))
By construction each of these four collections has the same number of items in each event.
ak.num(muplus11), ak.num(muplus12), ak.num(muplus21), ak.num(muplus22)
ak.all((ak.num(muplus11) == ak.num(muplus12)) & (ak.num(muplus21) == ak.num(muplus22)) & (ak.num(muplus11) == ak.num(muplus22)))
We can look at the four possible Z bosons individually.
First the “11” and “22”:
hist.Hist.new.Regular(100, 0, 150).Double().fill(ak.ravel((muplus11 + muminus11).mass)).plot();
hist.Hist.new.Regular(100, 0, 150).Double().fill(ak.ravel((muplus22 + muminus22).mass)).plot();
Now the “12” and “21”:
hist.Hist.new.Regular(100, 0, 150).Double().fill(ak.ravel((muplus12 + muminus12).mass)).plot();
hist.Hist.new.Regular(100, 0, 150).Double().fill(ak.ravel((muplus21 + muminus21).mass)).plot();
They all have large backgrounds because they’re often not the right combination.
Now we’ll make distances as we did in the previous section.
dist11 = abs((muplus11 + muminus11).mass - ZMASS)
dist12 = abs((muplus12 + muminus12).mass - ZMASS)
dist21 = abs((muplus21 + muminus21).mass - ZMASS)
dist22 = abs((muplus22 + muminus22).mass - ZMASS)
Because we have so many possibilities, let’s define cuts like onshell_11
to mean “11 is the closest to being on-shell.”
That doesn’t mean it’s within 20 GeV of 91 GeV; it just means that it’s closer than the others.
(The organization of the cuts is as follows: for each “11”, “12”, “21”, “22”, the relevant dist
is to the left of <
in all comparisons, and the three other dist
values it’s being compared to are every other “11”, “12”, “21”, “22” combination.)
onshell_11 = (dist11 < dist12) & (dist11 < dist21) & (dist11 < dist22)
onshell_12 = (dist12 < dist11) & (dist12 < dist21) & (dist12 < dist22)
onshell_21 = (dist21 < dist11) & (dist21 < dist12) & (dist21 < dist22)
onshell_22 = (dist22 < dist11) & (dist22 < dist12) & (dist22 < dist21)
Now let’s look at a \(Z\to 2\mu\) mass plot of “11” in which “11” is the closest to being on-shell.
Naturally, there’s a more pronounced peak.
h = hist.Hist.new.Regular(100, 0, 150).Double()
h.fill(ak.ravel((muplus11 + muminus11)[onshell_11].mass))
h.plot();
Now let’s look at \(H \to ZZ\to 4\mu\) for
“11” and “22” if “11” is on-shell or “22” is on-shell
“12” and “21” if “12” is on-shell or “21” is on-shell
h = hist.Hist.new.Regular(100, 0, 150).Double()
h.fill(ak.ravel((muplus11 + muminus11 + muplus22 + muminus22)[onshell_11 | onshell_22].mass))
h.fill(ak.ravel((muplus12 + muminus21 + muplus12 + muminus21)[onshell_12 | onshell_21].mass))
h.plot();
To go further and require “11” to be within 20 GeV of 91 GeV when it’s already the closest to it would be a cut like the following:
onshell_11 & (dist11 < 20)
Exercise
Now re-make the above plot, but require whichever muon pair is closest to being on-shell to be within 20 GeV of it.
Should look like:

(And compare that to the 4 muon mass without any Z mass constraints!)
Commentary#
Dealing with combinatorics is complex, but by reducing the single μ collection into a μ⁺ collection and a μ⁻ collection, it becomes easier (and more memory efficient) to consider all combinations without having to apply charge constraints after the fact.
Objects like muplus11
, representing the μ⁺ in all “11” combinations, can be dealt with like scalars, like the single “11” μ⁺ inside a nested loop over combinations in imperative programming. (No array-length changing operations were performed after the two ak.combinations, ak.cartesian steps in the last section.)
But because the muplus11
object represents the μ⁺ in all “11” combinations, we can plot distributions of what we’re computing at every step, rather than just once at the end of a script, which is great for debugging.
That’s the value of array-oriented programming—the ability to look at distributions of each step while you develop the script—if you take advantage of that opportunity.