Vectors and Multi-dimensional Histograms

Vectors and Multi-dimensional Histograms#

Vectors#

So far, to compute the mass for the dimuon spectrum, we have been using the equation

\[m_{\mu\mu} = \sqrt{ 2p_{T,0} p_{T,1} * \left(\cosh(\eta_0 - \eta_1) - \cos(\phi_0- \phi_1)\right) }\]

However, having to implement things like this and other equations that are common in HEP involving vector quantities can get annoying quickly. Luckily, Scikit-HEP has a solution for this: the vector library! This library can handle different types of vectors such as 4-vectors, 3D spatial vectors, and more, and it can seamlessly compute derived quantities. Moreover, these vectors can be represented in a variety of coordinate systems: cartesian, cylindrical, spherical, and any combination of these with time and proper time for Lorentz vectors.

As an example, lets define two 3-momentum vectors. and add them.

import vector

one = vector.obj(px=1, py=0, pz=0)
two = vector.obj(px=0, py=1, pz=1)

one + two

We can also get the magnitude of the “distance” between these two vectors.

one.deltaR(two)

And we can very easily change the coordinate system.

one.to_rhophieta()

(one + two).to_rhophieta()

Notice that we can get derived quantities not explicitly given to the object.

one.to_Vector4D()

This library includes the classes MomentumNumpy2D, MomentumNumpy3D and MomentumNumpy4D which are NumPy array sub-types, so NumPy arrays can be cast to these types and get all the vector function. As an example, lets use the skhep_testdata library to load some test data which contains simulated events of \(Z \to \mu\mu\).

import uproot
import skhep_testdata
import awkward as ak
tree = uproot.open(skhep_testdata.data_path("uproot-Zmumu.root"))["events"]

zmumu = tree.arrays()
zmumu

# Loading data into `ndarrays`
# Each column in the ndarray has a name, which is the name of the branch in the tree.
mu1 = ak.to_numpy(tree.arrays(filter_name=["E1", "p[xyz]1"]))
mu2 = ak.to_numpy(tree.arrays(filter_name=["E1", "p[xyz]2"]))
mu1, mu2

# Changing name of columns
mu1.dtype.names = ("E", "px", "py", "pz")
mu2.dtype.names = ("E", "px", "py", "pz")

# Casting to vector.MomentumNumpy4D
mu1_vector = mu1.view(vector.MomentumNumpy4D)
mu2_vector = mu2.view(vector.MomentumNumpy4D)

Now that we have access to a lot of useful methods for 4-vectors (4-momentum in this case). Let’s see some examples:

mu1_vector.t: gives you the array of energy (\(E\)) values. In the context of a 4-momentum vector \((E, p_x, p_y, p_z)\), .t refers to the time-like (energy) component.

mu1_vector.t

mu1_vector.beta gives you the speed of each muon in the lab frame as a fraction of the speed of light, \(c\). In relativistic kinematics, \(\beta\) is defined as:

\[ \beta = \frac{|\vec{p}|}{E} = \frac{v}{c} \]

where \(|\vec{p}|\) is the magnitude of the muon’s momentum and \(E\) is its energy. The value of \(\beta\) ranges from 0 (at rest) to nearly 1 (ultra-relativistic).

# Getting the speed of the muon in the lab frame
mu1_vector.beta

mu1_vector.boostX() applies a Lorentz boost to each muon 4-vector in the x direction. A Lorentz boost changes the reference frame, simulating what the particle’s energy and momentum would look like if observed from a frame moving with velocity \(\beta\) along the x-axis.

The boost matrix in the x direction is:

\[\begin{split} \begin{pmatrix} \gamma & -\beta\gamma & 0 & 0 \\ -\beta\gamma & \gamma & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \end{split}\]

where \(\beta = v/c\) is the velocity as a fraction of the speed of light, and \(\gamma = 1/\sqrt{1-\beta^2}\).

Applying this matrix to a 4-vector \((E, p_x, p_y, p_z)\) transforms the energy and \(p_x\) components, leaving \(p_y\) and \(p_z\) unchanged. In practice, mu1_vector.boostX() returns a new array of 4-vectors, each boosted in the x direction by their own velocity. This is useful for transforming to the rest frame of a particle or for simulating different reference frames in relativistic kinematics.

mu1_vector.boostX()

mu1_vector.Et gives you the transverse energy (\(E_T\)) of a particle which is defined as the component of its energy perpendicular to the beam (z) axis. It is given by:

\[ E_T = E \cdot \sin\theta \]

where \(E\) is the total energy of the particle and \(\theta\) is the angle between the particle’s momentum and the beam axis. In collider experiments, \(E_T\) is important because it is less sensitive to the unknown longitudinal motion of the initial state and is often used in event selection and analysis.

# We can now use vector to get derived quantities
#$E_T = E \cdot \sin\theta$
mu1_vector.Et

mu1_vector.mt gives you the transverse mass (\(m_T\)) which is a quantity used to describe the mass of a particle or system as projected onto the plane perpendicular to the beam (the transverse plane). It is especially useful in collider experiments where the longitudinal momentum (along the beam axis) is not fully known or is less relevant. For a single particle, the transverse mass is defined as:

\[ m_T = \sqrt{E^2 - p_z^2} \]

where \(E\) is the total energy of the particle, and \(p_z\) is the component of the particle’s momentum along the beam (z) axis. The transverse mass represents the minimum possible invariant mass the particle could have, given its energy and longitudinal momentum. It is often used in analyses involving missing energy, such as searches for new particles or studies of neutrinos, because it is less sensitive to unknown or unmeasured longitudinal motion.

# Another derived quantity
#$m_T = \sqrt{E^2 - p_z^2}$
mu1_vector.mt

mu1_vector.deltaR(), given another 4-momentum or array of 4-momentums, computes the quantity \(\Delta R\) which a measure of the separation between two particles in the detector, combining their differences in pseudorapidity (\(\eta\)) and azimuthal angle (\(\phi\)):

\[ \Delta R = \sqrt{(\Delta \eta)^2 + (\Delta \phi)^2} \]

\(\Delta R\) is used quantify how close two particles are in the detector’s angular coordinates. It is especially important for jet clustering algorithms and for defining isolation criteria.

#$$\Delta R = \sqrt{\Delta\phi^2 + \Delta\eta^2}$$
mu1_vector.deltaR(mu2_vector)

mu1_vector.tau returns the proper time (or “invariant interval”) associated with the vector. Mathematically, vector defines it as:

\[ \tau = \text{copysign}\left(\sqrt{\left| t^2 - |\vec{p}|^2 \right|},\ t^2 - |\vec{p}|^2 \right) \]

where \(t\) is the time-like (energy) component, and \(|\vec{p}|\) is the magnitude of the spatial momentum. For a particle’s 4-momentum, \(\tau\) is related to the particle’s rest mass (for massive particles, \(\tau = m\)). If \(t^2 - |\vec{p}|^2 > 0\), the interval is “time-like” (as for massive particles); if \(t^2 - |\vec{p}|^2 < 0\), it is “space-like” (as for virtual particles or spacelike separations). In summary, .tau gives you the invariant mass (or proper time) associated with the 4-momentum, taking into account the sign of the interval.

mu1_vector.tau

mu1_vector.to_rhophieta() converts each 4-momentum vector in mu1_vector to a cylindrical coordinate representation, returning a new array of vectors with the components \(\rho\), \(\phi\), \(\eta\), and \(E\).

mu1_vector.to_rhophieta()

Awkward arrays and Vector#

If you do vector.register_awkward(), if you name Awkward records “Momentum2D”, “Momentum3D”, or “Momentum4D”, Awkward will recognize this and you will be able to use vector methods.

vector.register_awkward()

tree = uproot.open(skhep_testdata.data_path("uproot-HZZ.root"))["events"]
array = tree.arrays(filter_name=["Muon_E", "Muon_P[xyz]"])

muons = ak.zip(
    {
        "px": array.Muon_Px, 
        "py": array.Muon_Py, 
        "pz": array.Muon_Pz, 
        "E": array.Muon_E
    }, with_name="Momentum4D",
)

mu1, mu2 = ak.unzip(ak.combinations(muons, 2))

mu1 + mu2

mu1.deltaR(mu2)

muons.to_rhophieta()

mu1.p

Exercise

Now that you know how to use vector, try making the dimuon mass spectrum again, but now only include the mass of muon pairs which are close in \(\eta\)-\(\phi\) space. You may use the thresholds \(\Delta R < 0.4\) and \(\Delta R < 0.8\) and see if it makes a significant difference.

Hist#

Universal Histogram Indexing (UHI)#

This indexing system is being developed within Scikit-HEP in order to standardize what it means to do array-like slices for histograms. Let’s first load data to use in our examples.

data = uproot.open(skhep_testdata.data_path("uproot-Zmumu.root"))["events"].arrays()
data

Lets now make a 1D histogram of the mass.

import hist

h = hist.Hist(hist.axis.Regular(120, 60, 120, name="mass"))
h.fill(data["M"])
h.plot()

Firstly, integer slices select a range of bins.

# Selecting from bin 10 to bin 110
h[10:110].plot()

If we want to slice by coordinate value, we can just include a j.

# Slicing the histogram to get from coordinate 90 to the end
h[90j:].plot()

If we want to rebin by a certain factor, it can be done also through slicing syntax.

# Selecting all bins and then rebinning by a factor of 5
h[::20j].plot()

And you can sum over a range by doing the following.

# Slicing from coordinate 90 to 100 and the summing
h[90j:100j:sum]

Multi-dimensional histograms#

A powerful feature of hist is that it allows you to handle histograms with many dimensions (even more than the amount you could plot!).

import uproot
import hist
import awkward as ak

picodst = uproot.open(
    "https://pivarski-princeton.s3.amazonaws.com/pythia_ppZee_run17emb.picoDst.root:PicoDst"
)

vertexhist = hist.Hist(
    hist.axis.Regular(600, -1, 1, label="x"),
    hist.axis.Regular(600, -1, 1, label="y"),
    hist.axis.Regular(40, -200, 200, label="z"),
)

vertex_data = picodst.arrays(filter_name="*mPrimaryVertex[XYZ]")

vertexhist.fill(
    ak.flatten(vertex_data["Event.mPrimaryVertexX"]),
    ak.flatten(vertex_data["Event.mPrimaryVertexY"]),
    ak.flatten(vertex_data["Event.mPrimaryVertexZ"]),
)

vertexhist[:, :, sum].plot2d_full()
vertexhist[-0.25j:0.25j, -0.25j:0.25j, sum].plot2d_full()
vertexhist[sum, sum, :].plot()
vertexhist[-0.25j:0.25j:sum, -0.25j:0.25j:sum, :].plot()

Exercise

Using the \(Z \to \mu\mu\) data we downoaded before, plot a 2d histogram of eta vs phi of the muons. Explain what you are looking at in the histogram. (Hint: Refer back to the Figure of the CMS coordinate system)