The CERN PAW Program

and "ntuples"

ntuples?  
Sample ntuples:  Omega and Cascade
Brief Tutorial  
Exercise 1  
Exercise 2
Fitting  
Exercise 3
Use Results of Fit 
Exercise 4
Plotting Functions  
Exercise 5
This tutorial is relatively brief.  Another useful tutorial is available from CERN.

What is an "ntuple"?

Events from the experiment are reconstructed, filtered and eventually written to an "ntuple" file.  This file contains an entry for each interpretation of an event (there may be several for a single event!)  The entry contains a few words real variables) containing pertinent quantities for this interpretation.  This is called an "ntuple."

Different ntuples could be written for each event, each containing a different set of quantities.  An ID number is associated with each ntuple.
 

Example:  The Omega ntuple

The following quantities are stored for each interpretation of an event as an Omega.  (NOTE that before writing this file, a vast number of events with no conceivable interpretation as Omega had already been rejected.)  The ntuple records are in a file on the UNIX network:  
 

The 17 Most Important Quantities in Omega Ntuple with ID number 2000

Variable #

Name

What it is.

1 NUMOM Number of interpretations for this topology for this event.
2 CHGO Charge of Omega.  (-1. for Omega, +1. for anti-Omega)
3 CatK Category for "K" track.  If >1000 track does not appear in the SMD's
6 Qual dtheta2 + dphi2. (radians 2
Angular agreement between Lambda+K momentum and direction of Omega- track in the SMD's
7 KCpi Cherenkov probability for "K" track to be a pion
8 KCK Cherenkov probability for "K" track to be a K
9 Lmz z position of Lambda decay vertex.
10 Prmz z position of primary (production) vertex.
13 Omz z position of Omega- decay vertex.
15 LM Proton(1)+pi(2) effective mass - sqrt of (E1+E2) 2-|p1+p2|2
16 XM Lambda(1)+pi(2) effective mass - sqrt of (E1+E2) 2-|p1+p2|2
17 OM Lambda(1)+K(2) effective mass - sqrt of (E1+E2) 2-|p1+p2|2
19 POM |p| of Omega- (GeV/c)
26 DIPO Distance of closest approach of SMD Omega - track to production vertex.
27 DIPK Distance of closest approach of K- track to production vertex.
14 ISMD Positive if Omega decays downstream from SMD's; Negative if Omega decays within SMD's.
28 Nprm Number of tracks at primary vertex

Another Example - the Cascade ntuple with ID=3000

A further ntuple available for this experiment contains entries of event interpretations as Cascades.  These events differ from the Omega's in that Omega- decays to Lambda and K- while Cascade - decays to Lambda and pi-.  This ntuple is in the area
           To access this enter the following unix command:

           ln -s /home/hepdisk3/briandata/omega.paw   xi_data

  The table of contents is similar to that for Omegas except that the names of the variables often contain an "X" rather than an "O".
 


Tutorial Using the Omega paw file


NOTE - In the following examples of PAW commands:
  • PAW does not distinguish upper and lower case
  • Commands may exceed the screen's line width.  They automatically wrap onto next line.
  • In addition to the Exercises below, devise some of your own.
  • HELP <command> will display a file on your screen in an editor.  To read the file, these commands are used:
  •  
    It is recommended you leave the CURSOR on the bottom (command) line of the HELP screen.  "Enter" will place it there.
    To EXIT the HELP screen F3, <Esc> 3, or qq<Return>
    To go to next page F7 or <Esc> 7
    To go to next page F8 or <Esc> 8

    Login to the UNIX network as advlab1 using the password provided and go to the directory where the ntuple is:

    Run the paw program:

    Open the ntuple file for input:

    Print out the contents of the ntuple (you will see all that is in the table above and more):

    Plot the Lambda K- effective mass for Omega- candidates:

    Plot anti-Lambda K+ effective mass for anti-Omega candidates.

    Find out more about "n/pl"

    Make a histogram with 50 bins starting at 1.65 and ending at 1.70

    Now plot the Lambda K effective mass in this histogram:

    Plot the anti-Omega (dashed) superimposed on this histogram.

    Make a printout of these plots:

    Change the appearance of this plot:

    Now plot these two histograms side by side:

    Repeat, but make them appear one above the other:


    Exercise 1:

    Attempt to make a set of cuts on the following quantities which will reduce the background without significantly reducing the signal.  (Some signal loss is inevitable, but hold it to less than 10% if you can while reducing the background by at least a factor two.)
  • CATK
  • DIPO
  • KCK
  • NUMOM

  • Exercise 2:

    Now look at the Cascade ntuple.

  • Open the cascade file and close the omega one:
  • PAW > close 1                       ! Close the Omega file
  • PAW > h/file 1 xi_data            ! and open the cascade one.
  • Make a histogram ID 300 with 90 bins starting at 1.300 ending at 1.345 and plot Lambda+pi effective mass ("XM") for the two disjoint samples:
  • ISMD>0  (ie cascade decays downstream from the SMD's) and
  • ISMD<0  (ie cascade decays before end of SMD's).

  • Make these plots side by side on the screen.
    Q: What do you notice?  Does this make sense?
  • Repeat the above, but in each histogram superimpose also the subsamples with CATK>1000.
  • Q: What do you notice?  Does this make sense?

    Fitting

    What a fit is:

    PAW is able to fit histograms with any function f(x;a) where x is the abscissa of the histogram and a is a set of parameters which are required in the definition of f.

    If the quantity x is histogrammed and f(x;a) is the form it should take   [ie the number of entries y(x) in the bin from x to x+delta_x should be F(x; a) = Integral f(x;a)dx from x to x+delta_x ]  then a "fit" will adjust all the a's so as to minimize the quantity:

    where sigma_y is the uncertainty on y(x).   (PAW will take this to be the square root of y(x) - the uncertainty expected for this number of entries from Poisson statistics.)

    A variation of this ("minimum Chi squared") method involves maximizing the quantity:

    This is the "Log-Likelihood" method.  This works if the function is defined in such a way that its integral over the x range of the histogram does not depend upon the parameters a.

    What kinds of fits PAW is capable of:

    PAW is equipped to use either minimum chi squared or maximum likelihood methods to make fits (see the HELP file on h/fit).  The form for f(x;a ) can be taken from a choice of three standard forms:
  • polynomial in x;
  • exponential in x or
  • Gaussian in x

  • with parameters a (see the example below) to define the precise form.  In this example, a further feature of PAW - its ability to combine the first two of the above forms - is exploited.

    PAW can also fit a user defined function.  This requires that you write a FORTRAN function and save it as a file.  An example of this can be seen in the file in

    ~brian/E791/Cascade/xeval.kumac

    where a number of PAW commands are stored as well as the FORTRAN function to define a Breit-Wigner peak shape upon a polynomial background.  This is a more advanced application of PAW.
     

    Look for further HELP on this subject in PAW as follows:

    If you have any problem, type help and follow "Histogram" (4) and "Fit" (9)

    Make a fit to a Polynomial background of order 2 plus a Gaussian:

    Clearly this requires 6 parameters and here is how you define them: This creates a REAL array "a" dimension 6 with data 1.0, 1.0, etc. (i.e . a(1)=a(2)=a(3)=a(4)=1.0, a(5)=1.672, a(6)=0.004).  The first 4 values are not important to PAW as it will easily fit the polynomial parameters and the fraction of Gaussian on top.  The last two numbers are the Mass and Resolution for the Omega peak respectively, and it is necessary to give PAW a good start on these values.
     

    Now do the FIT:

    That is all there is to it!
     


    Exercise 3:

  • Find the Mass and resolution for Cascade minus and compare them for anti-Cascade.
  • Repeat for cleaner Omega and anti Omega samples using the "cuts":
  • Try the "disjoint sample" having
  • Q: What do you see?  Explain it in terms of the spectrometer layout and where the SMD's are.


    Using the Results of a Fit

    In the above example, the results of the fit are stored in the PAW vector "a".  Suppose you want to compute the area under the peak (as opposed to those in the background part.)  From the context of equation (1) above we see that a(5) is the mean (or peak) value for the Gaussian and a(6) is the standard deviation.  The area under the Gaussian is a(4)*a(6)*sqrt(2*pi).  To obtain this number:
  • PAW > mess $sigma(a(6)*a(4)*sqrt(2*pi))
  • In this command the "mess" instruction asks PAW to print a message.  The $sigma() part tells a subsystem (named "sigma") to evaluate the expression within the parentheses.  To obtain more information on "sigma" type
  • PAW > help sigma      !  (or look at the CERN tutorial.)
  • When you do as above, you will be surprised to find that the result is much smaller than you would expect from looking at your plot.  The "mess" command will result in a number like 7 while it appears from the plot as if there are thousands of events in the peak.  The problem is that to obtain the number of events, the curve needs to be divided by the bin size of the histogram.  (Why is this?)

    So - you will need the binsize for histogram ID 100.  PAW is capable of providing information about any plot.  Various functions "$HINFO" give this information and you will find these with

  • PAW > help k/functions
  • You will see a variety of functions, but the one you really need - binsize - is not directly available.  You can however compute it from:
  • PAW > mess $sigma(($hinfo(100,'xmax')-$hinfo(100,'xmin'))/$hinfo(100,'xbins'))
  • (Note that PAW is not case sensitive like unix.  Also note the balanced parentheses.)
    Now see what the fit you made says of the number of events in the peak:
  • PAW > mess $sigma(a(4)*a(6)*sqrt(2*pi))*$hinfo(100,'xbins')/($hinfo(100,'xmax')-$hinfo(100,'xmin'))
  • (This command should all be on one line.)  You will probably obtain a result like
       7.072*50/(1.65-1.70)
    and this is not quite what you want!  So try
  • PAW > mess $eval($sigma(a(4)*a(6)*sqrt(2*pi))*$hinfo(100,'xbins')/($hinfo(100,'xmax')-$hinfo(100,'xmin')))
  • Finally, the result will now be the number of events in the peak!

    More Complicated Example:

    Suppose you want to compute the amount of background in the plot.  Clearly this is the integral from low to high abscissae of the second order polynomial.

    PAW is a bit clumsy, so it will be convenient for various reasons to make use of another vector "c" as follows:

  • PAW > v/cre c(3) r
  • PAW > v/input c(1) $hinfo(100,'xmin')
  • PAW > v/input c(2) $hinfo(100,'xmax')
  • PAW > v/input c(3) $hinfo(100,'xbins')
  • Make sure you have set up the plot and made the fit:
  • PAW > n/pl 2000.om chgo<0 -100
  • PAW > v/cre b(6) r 0 0 0 0 1.672 0.004
  • PAW > h/fit 100 $KUMAC/bwfit.for ! 6 b
  • Now do the background calculation:
  • PAW > v/input c(4) $sigma(b(1) + 1/2*b(2)*(c(2)+c(1)) + 1/3*b(3)*(c(2)**2+3*c(1)*c(2)+c(1)**2))

  • PAW > mess $eval(c(3)*c(4))

    Exercise 4:  (Write this up)

  • Find the asymmetry between production of Omega and anti Omega.
  • Find the asymmetry between production of Cascade and anti Cascade:
  • Comment on the two results.

  •  
  • Use the errors from the fit to estimate the uncertainties in these asymmetries.

  •  
  • Investigate the effects of the cuts on the uncertainties.  Define three sets of cuts:

  • More Complicated Fits and examples - FORTRAN.

    Sometimes it is necessary to make a complex calculation.  Perhaps a fit requires a function different from those provided by the "h/fit" command.  For example we may wish to fit a Breit-Wigner peak rather than a Gaussian to the Omega- in histogram ID 100 of the example above.

    This can be done using a FORTRAN subroutine "BWFIT(M)" defined in a file "$KUMAC/bwfit.for" on your disk.  To use this to make the required fit:
     

  • First define the parameters with starting values.  These must be REAL and correspond in order defined to those in the function to be fit (see the file below.)
  • PAW > v/cre b(6) r 0 0 0 0 1.672 0.004
  • Now make the fit
  • PAW > h/fit 100 $KUMAC/bwfit.for ! 6 b
  • The vector "b" will now contain optimised values for the parameters.  As you will now see, b(5) is the Breit-Wigner peak mass and b(6) is the "natural width" - the full width at half maximum of the peak shape.

     The file contains the FORTRAN code
     

          FUNCTION BWFIT(M)
    C
    C  Functional form of linear background and Breit-Wigner peak.
    C
          REAL M
          COMMON/PAWPAR/ B(6)       ! Parameters from PAW
    C                                                       ! (created by "v/cre B(6) r <values>")
    C
    C  Quadratic Polynomial Background.
            BACKGROUND  = B(1) + B(2)*M + B(3)*M**2
    C
    C  Breit-Wigner Peak times A(4).
            BREITWIGNER = B(4)*B(5)*B(6)/((M**2-B(5)**2)**2+(B(5)*B(6))**2)
    C
    C  The combination.
            BWFIT = BACKGROUND + BREITWIGNER
    C
          RETURN
          END
    This uses parameters in vector B (dimension 5) and evaluates the second order polynomial background and the Breit-Wigner peak and forms a linear combination of them in BWFIT at an arbitrary value of Lambda K- mass M.  This function is called during the fitting process probably thousands of times while the parameters in B are being optimised.

    Example Using a FORTRAN Routine:

    Perhaps we wanted the background within an arbitrary range "S" of the peak.
  • PAW > call backg.for(0.01)
  • This will use the code in file $KUMAC/backg.for to compute background within 0.01 GeV from the peak.  The file contains the code:
          SUBROUTINE BACKG(S)
    C
    C  Compute the background under the peak.
    C
          REAL S, LOW, HIGH, DX
          VECTOR B          ! Special decaration of vector created in PAW by
    C                         "PAW > v/cre A(5) ...etc."
    C                       ! (now optimised by the FIT)
          VECTOR C(3)       ! Declaration for the vector containing the
    C                         low x, high x and number of bins respectively.
    C
    C  Quadratic Background is B(1) + B(2)*M + B(3)*M**2
    C  Integrate from peak-S to peak+S
          LOW = B(5)-S
          HIGH = B(5)+S
    C  Limit the limits to be within the plot boundary:
          IF(LOW.lt.C(1) .or. S.eq.0.0) LOW = C(1)
          IF(HIGH.gt.C(2) .or. S.eq.0.0) HIGH = C(2)
          DX = (C(2)-C(1))/C(3)
          BACKG = (B(1)*(HIGH-LOW)
         A      + 0.5*B(2)*(HIGH**2 - LOW**2)
         A      + 1/3*B(3)*(HIGH**3 - LOW**3)) / DX
          PRINT *, 'Background has', BACKG,' events from',LOW,' to',HIGH
    C
          RETURN
          END

    Plotting Functions

    PAW will draw arbitrary functions which may be defined on the command line:
  • PAW > fun/pl (5*cos(x)+3*sin(x))*exp(-.5*x) 0 10
  • to plot the indicated function in the range from x=0 to x=10.
  • PAW > fun/pl $eval(b(1))+$eval(b(2))*x+$eval(b(3))*x**2 $hinfo(100,'xmin') $hinfo(100,'xmax') s
  • will plot the background superimposed on the last plot drawn from low to high abscissa.
    [Note that PAW is clumsy and does not allow use of vector elements in formulae unless they are evaluated by a $eval()]

    Functions can also be drawn using fortran routines to define them:

  • PAW > fun/pl $KUMAC/backg.for(x) 0.001  0.1
  • which should plot the computed background within limits from 1 MeV up to 100 MeV (actually the whole plot) from the peak value..

    Exercise 5:

  • Use the Cascade sample and plot "XM" for the selection "CHGO<0" in a histogram with 50 bins from 1.3 GeV to 1.35 GeV.
  • Make a fit with Gaussian peak on linear background.
  • Use the fit to determine how many background events there are.  Print this number with the number of signal events printed on the same line.

  • [The number of events in the signal can be taken to be the $hinfo(id,'SUM') minus the background.]
  • Now repeat the sequence above making the selection CATK<1000.  (Use the up arrow - perhaps the <CTRL P> key!)
  • Try other selections to find how effective they are in reducing the background while preserving the signal.



  •  
    Prof Brian T. Meadows 
    Physics Department, ML 11 
    Universtiy of Cincinnati 
    Cincinnati, OH 45221-0011 
    tel:    513 556-0531 
    fax:   513 556-3425 
    E-mail: Brian.Meadows@UC.Edu