Share some knowledge, post study notes or talk nonsense.


Further Properties of Probability Generating Function and its Applications

Abstract: Probability generating Function is the simplest one among all the gernerating functions, yet it is compact and useful in dealing with discrete random variables, especially the sum of independent random variables. This makes it a powerful tool in theory like stochastic process, where there is a need to deal with a collection of independent random variables. Therefore, it is still necessary to explore further properties of p.g.f. Two applications related to simple random walk and
Galton Watson process will also be introduced in this paper.

Keywords: probability generating function; random walk; Galton Watson process;


Assume X is a discrete integer-valued random variable with probability mass function {pk=P(X=k)}k=0\{p_k = P(X=k)\}_{k=0}^{\infty}, its probability generating function (p.g.f) is defined by the following power series: $$P(s):=E\left(s{X}\right)=\sum_{k=0}{\infty} p_{k} s^{k} \quad|s| \leq 1$$. We know that in the discrete case, probability distribution is determined by all the possible values of the random variable and the probability corresponding to those values, which is characterized by the so-called probability mass function. Probability generating function is another way to represent and characterize a probability distribution. From the following properties, we will see that it actually “encodes” everything about the distribution. Meanwhile, it makes some probability calculation easier. This is also the reason why it is a useful tool in the study of branching process or stochastic process. Note that there are other ways to transform a distribution: characteristic function, moment generating function and cumulant generating function. Many properties of probability generating function can also be applied to these functions. But they are not in the scope of this article.


Probability Generating Function to Probability Distribution

If we take k-th derivative of p.g.f PX(s)=k=0pkskP_X(s) = \sum_{k=0}^{\infty} p_{k} s^{k}, we would get:

PX(k)(s)=d(k)PX(s)dsk=k!pks0+c1s1+...+cnsn,ci=constP_X^{(k)}(s) = \frac{d^{(k)}P_{X}(s)}{ds^k} = k!p_ks^0 + c_1s^1 + ... + c_ns^n, c_i = const

PX(k)(0)=d(k)PX(s)dsks=0=k!pk\Rightarrow P_X^{(k)}(0) = \left.\frac{d^{(k)}P_{X}(s)}{d s^{k}}\right|_{s=0} = k!p_k

pk=P(X=k)=1k!PX(k)(0)\Rightarrow p_k = P(X = k) = \frac{1}{k!} P_X^{(k)}(0)

Therefore, we can recover the probability distribution from the probability generating function. The p.g.f alone contains all the information about the original distribution. Since we can already derive all the probabilities of the distribution from p.g.f, it’s reasonable that we can extract properties like variance, expectation or moments of the distribution from p.g.f. . Through simple calculation, we would get (proof is omitted):

  1. E(X)=PX(1)(1)E(X) = P_X^{(1)}(1) (techniquely, it is left derivative)

  2. Var(X)=PX(2)(1)[PX(1)(1)]2+PX(1)(1)Var(X) = P_X^{(2)}(1) - [P_X^{(1)}(1)]^2 + P_X^{(1)}(1)

  3. E{X(X1)(X2)(Xk+1)}=PX(k)(1)\mathbb{E}\{X(X-1)(X-2) \ldots(X-k+1)\}=P_{X}^{(k)}(1)
    (k-th derivative of p.g.f is k-th factorial moment of X. In principle you can recover all the moments from factorial moments.)

Uniqueness Theorem for Probability Generating Function

From section 2.1, we know that p.g.f. determines {pn}\{p_n\}, which is the distribution. Uniqueness Theorem for p.g.f. tells us additionally that it determines a unique distribution.

Uniqueness Theorem: Let X and Y be discrete random variables with probability generating functions PXP_X and PYP_Y, respectively. Then PX(s)=PY(s)P_X(s) = P_Y(s) iff P(X=k) = P(Y=k) for all integers k \geq 0, i.e. the probability generating functions are the same iff the distributions of X and Y are the same.
Proof: If the distributions are the same, by the definition of p.g.f., PX(s)=PY(s)P_X(s) = P_Y(s). If PX(s)=PY(s)P_X(s) = P_Y(s), expand them into power series, PX(s)=k=0P(X=k)sk=PY(s)=k=0P(Y=k)skP_X(s) = \sum_{k=0}^{\infty} P(X=k)s^k = P_Y(s) = \sum_{k=0}^{\infty} P(Y=k)s^k.
It follows that the coefficients of two power series must be the same, thus P(X=k) = P(Y=k) for all intergers k0k \geq 0, which means p.g.f. uniquelly determines a distribution.

Probability Generating Function of the Sum of Independent R.V.s

The main reason why p.g.f is powerful is that it makes things easier in dealing with the sum of independent random variables. It turns convolution into product:
Theorem 1: Assume X1,...,XnX_1, . . . , X_n are independent random variables. Let Y=X1+...+XnY = X_1 + . . . + X_n. Then $$P_{Y}(s)=\prod_{i=1}^{n} P_{X_{i}}(s)$$ .

PY(s)=E(s(X1++Xn))=E(sX1sX2sXn)=E(sX1)E(sX2)E(sXn)  (since  X1,...,Xn  are  independent)\begin{aligned} P_{Y}(s) &=\mathbb{E}\left(s^{\left(X_{1}+\ldots+X_{n}\right)}\right) \\ &=\mathbb{E}\left(s^{X_{1}} s^{X_{2}} \ldots s^{X_{n}}\right) \\ &=\mathbb{E}\left(s^{X_{1}}\right) \mathbb{E}\left(s^{X_{2}}\right) \ldots \mathbb{E}\left(s^{X_{n}}\right) \; (since \; X_1, ..., X_n \; are \; independent) \end{aligned}
PY(s)=i=1nPXi(s).\Rightarrow P_Y(s) =\prod_{i=1}^{n} P_{X_{i}}(s) . \quad

If we don’t know how many random variable are in the sum, i.e. n is not fixed, instead, it is itself a random variable (say N), then we have the following theorem:
Theorem 2: Let X1,X2,...X_1, X_2, . . . be a sequence of i.i.d random variables with common p.g.f. PX(s)P_X(s). Let N be a random variable, independent of XiX_i, with p.g.f PN(s)P_N(s), and let SN=X1+...+XNS_N = X_1+. . .+X_N. Then $$P_{S_N}(s) = P_N(P_X(s))$$

PSN(s)=E(sSN)=E(sX1++XN)=EN{E(sX1++XNN)}( law of total probability for expectations )=EN{E(sX1sXNN)}=EN{E(sX1sXN)}(Xi are independent of N)=EN{E(sX1)E(sXN)}(Xi are independent of each other)=EN{(PX(s))N}(by definition of PX)=PN(PX(s))(by definition of PN)\begin{aligned} P_{S_{N}}(s) &=\mathbb{E}\left(s^{S_{N}}\right)=\mathbb{E}\left(s^{X_{1}+\ldots+X_{N}}\right) \\ &=\mathbb{E}_{N}\left\{\mathbb{E}\left(s^{X_{1}+\ldots+X_{N}} \mid N\right)\right\} \quad(\text { law of total probability for expectations }) \\ &=\mathbb{E}_{N}\left\{\mathbb{E}\left(s^{X_{1}} \ldots s^{X_{N}} \mid N\right)\right\} \\ &=\mathbb{E}_{N}\left\{\mathbb{E}\left(s^{X_{1}} \ldots s^{X_{N}}\right)\right\} \quad\left(X_{i} \text { are independent of } N\right) \\ &=\mathbb{E}_{N}\left\{\mathbb{E}\left(s^{X_{1}}\right) \ldots \mathbb{E}\left(s^{X_{N}}\right)\right\} \quad\left(X_{i}\right.\text{ are independent of each other})\\ &=\mathbb{E}_{N}\left\{\left(P_{X}(s)\right)^{N}\right\} \quad\text{(by definition of }\left.P_{X}\right)\\ &=P_{N}\left(P_{X}(s)\right) \quad\text{(by definition of }\left.P_{N}\right) \end{aligned}

This theorem helps us to compute the p.g.f. (equivalently, finding the distribution) of the sum of i.i.d. R.V. when the number of random variables is random.
Note that we could get the basic version of Wald’s Identity using the above result: $$\begin{aligned}
P_{S_N} &= P_N \circ P_X\
P_{S_N}^{’}(s) &= P_X^{’}(s) {P_N^{’}(P_X(s))}\
P_{S_N}^{’}(1) &= P_X^{’}(1) {P_N^{’}(P_X(1))}\
E(S_N) &= E(N)E(X) = E(N)E(X_1)\end{aligned}$$

Convergence of Probability Generating Function

Note that in the definition of probability generating function, it simply assumes that s1|s| \leq 1 (the radius of convergence = 1) to ensure the power series will converge. Though it is a sufficient condition for the convergence of p.g.f, it is not necessary. In many cases, the radius of convergence can > 1. We now explore the convergence property of p.g.f. starting from some simple facts.

Theorem 1: p.g.f. must converge (point-wise) for at least$ s \in [-1,1]$.
Proof: Suppose p.g.f PX(s)=k=0pkskP_X(s) = \sum_{k=0}^{\infty} p_{k} s^{k}.
1s1\forall -1 \leq s \leq 1, series {sk}\{s^k\} is monotonous and bounded,
and k=0pk\sum_{k=0}^{\infty} p_k converges, so by Albel’s test, series
k=0pksk\sum_{k=0}^{\infty} p_{k} s^{k} converges.
Remark: The radius of convergence for p.g.f is at least 1.

Theorem 2: Suppose p.g.f. converges on (-R, R), i.e., the radius of convergence is R, then p.g.f. converges uniformly on any closed set [-r,r] where 0 < r < R.
Proof: 0<r<R\forall 0 < r < R, k=0pkrk\sum_{k=0}^{\infty} |p_{k} r^{k}| converges. So pkxkpkrk|p_kx^k| \leq |p_kr^k| when xr|x| \leq r. By Weierstrass’s Theorem, k=0pkxk\sum_{k=0}^{\infty} p_{k} x^{k} converges uniformly on [-r,r].
Remark: If p.g.f. converges on a closed set [-R, R], then it must converges uniformly on [-R, R]. Let R = 1, then we can get that p.g.f. must converge uniformly on [-1, 1].

Theorem 3: Suppose p.g.f. converges on (-R, R), then p.g.f. is continuous on (-R, R).
Proof: For any x0(R,R)x_0 \in (-R, R), there must exist r>0r > 0, s.t. x0[r,r](R,R)x_0 \in [-r,r] \subset (-R,R). Since p.g.f. converges uniformly on [-r,r], so p.g.f. is continuous at x0x_0. Since x0x_0 can be chosen arbitrarily in (-R,R), so p.g.f. is continuos at every point in (-R,R).

Theorem 4: Suppose p.g.f converges at the right (left) end of interval (-R,R), then p.g.f. is left (right) continuos at this end.
Proof: Without loss of generality, assume p.g.f. converges at x = R.
We only need to prove k=0pksk\sum_{k=0}^{\infty} p_ks^k converges uniformly on [0,R].
k=0pksk=k=0pkRk(sR)k\sum_{k=0}^{\infty} p_ks^k = \sum_{k=0}^{\infty} p_kR^k(\frac{s}{R})^k,
because k=0pkRk\sum_{k=0}^{\infty} p_kR^k converges, {(sR)n}\{(\frac{s}{R})^n\} monotonically decreasing and uniformly bounded, so by Abel’s test, k=0pksk\sum_{k=0}^{\infty} p_ks^k converges uniformly on [0, R].
Remark: Let R = 1, by the theorem we can get that PX(1)=PX(1)P_X(1) = P_X(1^-), and PX(1)=PX(1+)P_X(-1) = P_X(-1^+), i.e., p.g.f. is continuous at both ends of (-1, 1). If R > 1, p.g.f. may not be continuos on its ends.

Theoretical Applications

The first reaching time in simple random walk

Model construction:

Random walk is a Stochastic Process that describes a path that consists a succession of random steps on some mathematical space. Probability generating function is especially useful in this area due to its good property in computing independent sum of random variables. Here we only consider one dimensional random walk on the real line and the length of each step is a constant (assume is 1). Let SnS_n denotes the position of the object at the n-th step. X1,...,XnX_1, ..., X_n denotes each step’s behavior, i.e., Xi=1X_i = 1 if at i-th step the object moves right on the axis and Xi=1X_i = -1 if moves left. The probability that the object will move right at each step is p.

The p.g.f. (distribution) of the first reaching time:

In this section, we aim to use probability generating funciton to find the distribution of reaching times of the object. The reaching time is defined as Ti,j:=T_{i,j} := “the first time to reach position j from position i”. We will concentrate on the situation when i = 0, j = 1, and assmue its distribution is characterized by p.m.f pn=P(T0,1=n)p_n = P(T_{0,1} = n), then the goal is to find {pn}\{p_n\}, which is the distribution of the random varible T0,1T_{0,1}.

Firstly, p1=1p_1 = 1 is obvious. When n > 1, the object must have to go left to position -1 first and then reach 1 through the following (n-1) steps. Additionally, we can decompose those (n-1) steps into two phases: moving from -1 to 0, then moving from 0 to 1. So by the law of total probability we can get: $$\begin{aligned}
p_n &= \sum_{j=1}^{n-2} qP(\text{j steps to first reach 0 from -1" andn-j-1 steps to first reach 1 from 0")}\
&= q\sum_{j=1}^{n-2} P(T_{-1,0}=j \quad and \quad T_{0,1} = n-1-j) \end{aligned}$$
Intuitively, we can sense that the event “j steps to first reach 0 from -1” won’t have any effect on the event “n-i-j steps to first reach 1 from 0”, which means for any given j P(T0,1T_{0,1}=n-1-j) = P(T0,1T_{0,1}=n-1-j | T1,0T_{-1,0}=j). So it’s reasonable to regard two events as independent. This is the regeneration property of random walk.
With independence, we can get: $$\begin{aligned}
p_{n} &= q\sum_{j=1}^{n-2} P(T_{-1,0}=j)P(T_{0,1} = n-1-j)\
&= q\sum_{j=1}^{n-2} P(T_{0,1}=j)P(T_{0,1} = n-1-j)\
&= q \sum_{j=1}^{n-2} p_{j} p_{n-j-1}, \quad n \geq 3, \quad p_{0}=0, \quad p_{1}=p, \quad p_{2} = 0\end{aligned}$$
It is hard to solve distribution pnp_n from the above equation since it contains a form of convolution. But we can use gernerating function method to convert it into product to reduce complexity of computation:

by definition of p.g.f. , PX(s)=k=0pksk(PX(s))2=k=0(i=0kpipki)ski=0kpipki=p0pk+pkp0+i=1(k+1)2pip(k+1)i1,k2=pk+1q,k2q(PX(s))2=k=2pk+1skqs(PX(s))2=k=2pk+1sk+1=k=0pksksp=PX(s)spsolving the quadratic equation, PX(s)=114pqs22qs,s12pq\begin{aligned} \text{by definition of p.g.f. , } P_X(s) &= \sum_{k=0}^{\infty} p_ks^k\\ \Rightarrow (P_X(s))^2 &= \sum_{k=0}^{\infty}\left(\sum_{i=0}^{k} p_{i} p_{k-i}\right) s^{k}\\ \sum_{i=0}^{k} p_ip_{k-i} &= p_0p_k + p_kp_0 + \sum_{i=1}^{(k+1)-2} p_ip_{(k+1)-i-1}, \quad k \geq 2\\ &= \frac{p_{k+1}}{q}, \quad k \geq 2\\ \Rightarrow q(P_X(s))^2 &= \sum_{k=2}^{\infty} p_{k+1} s^k\\ qs(P_X(s))^2 &= \sum_{k=2}^{\infty} p_{k+1}s^{k+1} = \sum_{k=0}^{\infty} p_ks^k - sp = P_X(s) - sp\\ \text{solving the quadratic equation, }\Rightarrow P_X(s) &= \frac{1-\sqrt{1-4 p q s^{2}}}{2 q s}, \quad |s| \leq \frac{1}{2 \sqrt{p q}}\end{aligned}

By now we successfully get the p.g.f. of T0,1T_{0,1}. From section 2 above, we know that p.g.f. characterizes a unique distribution of a random variable, and we can extract useful information only through p.g.f. without calculating its “real distribution” {pn}\{p_n\}. There are two major information concerned in this random walk example.

The probability that the object will never reach 1 from 0:

Since p.g.f. is defined as PX(s)=k=0pksk  (s1)P_X(s) = \sum_{k=0}^{\infty} p_ks^k \; (|s| \leq 1), PX(1)=k=0pk=P(X<)P_X(1) = \sum_{k=0}^{\infty} p_k = P(X < \infty), thus if random variable X can take value \infty, i.e., P(X=)>0P(X=\infty) > 0, then PX(1)=1P(X=)<1P_X(1) = 1 - P(X = \infty) < 1, and vice versa. So to determine if a random variable can take value \infty, one way is to check if PX(1)P_X(1)
is < 1.

In this case, event “the object may never reach 1 from 0” is equivalent with P(T0,1=)>0P(T_{0,1}=\infty) > 0, so we only need to examine the value of PX(1)P_X(1). If PX(1)=1P_X(1) = 1, P(T0,1=)=0P(T_{0,1}=\infty) = 0; Else,
P(T0,1=)>0P(T_{0,1}=\infty) > 0:
PX(1)=114pq2q=1pq2q={1,p12pq<1,p<12P_X(1)=\frac{1-\sqrt{1-4 p q}}{2 q}=\frac{1-|p-q|}{2 q}=\left\{\begin{array}{ll} 1, & p \geq \frac{1}{2} \\ \frac{p}{q} < 1, & p<\frac{1}{2} \end{array}\right..
So the conclution is: the object will definitely reach position 1 from position 0 when p0.5pqp \geq 0.5 \Leftrightarrow p \geq q, no matter how long it takes; but when p<0.5p<qp < 0.5 \Leftrightarrow p < q, it is possible that the object will never reach 1 from 0, with probability 1pq1-\frac{p}{q}.

Expected time (number of steps) to first reach 1 from 0:

Since we can easily extract expectation (first-order moment) of a random variable from its p.g.f., so we again apply generating function method to find expected time to reach 1 from 0, i.e., E(T0,1)E(T_{0,1}). It should be emphasized that in this case E(T0,1)E(T_{0,1}) may not equal to PT0(1)P_{T_0}^{\prime}(1). This is because T0,1T_{0,1} may take value of infinity but p.g.f. only defines for finite values.

  1. If p<0.5p < 0.5, then P(T0,1=)>0P(T_{0,1} = \infty) > 0, so by definition of
    expection we have E(T0,1)=E(T_{0, 1}) = \infty.

  2. If p=0.5p = 0.5,
    PX(s)=2p14pqs2114pqs22qs2P_X^{\prime}(s)=\frac{2 p}{\sqrt{1-4 p q s^{2}}}-\frac{1-\sqrt{1-4 p q s^{2}}}{2 q s^{2}},
    E(T0,1)=lims1PX(s)=lims1(11s211s2s2)=+E(T_{0,1}) = \lim _{s \rightarrow 1^-} P_X^{\prime}(s)=\lim _{s \rightarrow 1^-}\left(\frac{1}{\sqrt{1-s^{2}}}-\frac{1-\sqrt{1-s^{2}}}{s^{2}}\right)=+\infty.
    This means even if the object will definitely reach 1, the expected
    reaching time, however, is \infty.

  3. If p>0.5p > 0.5,
    E(T0,1)=lims1PX(s)=PX(1)=1pqE(T_{0,1}) = \lim _{s \rightarrow 1^-} P_X^{\prime}(s) = P_X^{\prime}(1) = \frac{1}{p-q}.

The population size in Galton Watson process

Model Construction:

At 0-th generation, there is one individual. It reproduces offsprings and then dies, these offsprings are then belong to 1-th generation. Afterwards, each of these offsprings can continue the same reproductionprocess, producing more and more generations.

Now define ZnZ_n as the population size at n-th generation (i.e. the number of individuals at n), clealy Z0=1Z_0 = 1. Y is the number of offsprings of an individual. Each individual’s family size (i.e. number of offsprings) Y1,Y2,...  i.i.d.YY_1, Y_2, ... \; i.i.d. \sim Y.

The p.g.f. (distribution) of ZnZ_n:

To know the information of n-th generation, first consider (n-1)-th generation. Assume each individual in (n-1)-th generation is labelled 1,...,Zn11, ..., Z_{n-1}, and their number of offsprings is given by Y1,...,YZn1Y_1, ..., Y_{Z_{n-1}}. Then we can get a formula: Zn=i=1Zn1YiZ_n = \sum_{i=1}^{Z_{n-1}} Y_i. This means ZnZ_n is the sum of i.i.d. random variables with random quantity. Section 2.3 shows that probability generating function will be a useful tool in dealing with this kind of problem.

Let PY(s)P_Y(s), PZn(s)P_{Z_n}(s), PZn1(s)P_{Z_{n-1}}(s) be the p.g.f. of random
variables Y,Zn,Zn1Y, Z_{n}, Z_{n-1}.
Since Zn=i=1Zn1YiZ_n = \sum_{i=1}^{Z_{n-1}} Y_i, by theorem 2 from section 2.3,
we can find a recursion formula for the p.g.f. of ZnZ_n:
PZn(s)=PZn1(PY(s))P_{Z_n}(s) = P_{Z_{n-1}}(P_Y(s)).
Continue iterating, we will get p.g.f. for ZnZ_n:
PZn(s)=PY(PY(PY(PY(s)))n times )P_{Z_n}(s)=\underbrace{P_Y(P_Y(P_Y(\ldots P_Y(s) \ldots))}_{n \text { times }}).
Note that Z1=YZ_1 = Y, then we have the following important recursion furmula.
Branching process recursion formula:

PZn(s)=PZ1(PZ1(PZ1(PZ1(s)))n times )=PZn1(PZ1(s))=PZ1(PZn1(s))P_{Z_n}(s)=\underbrace{P_{Z_1}(P_{Z_1}(P_{Z_1}(\ldots P_{Z_1}(s) \ldots))}_{n \text { times }}) = P_{Z_{n-1}}(P_{Z_1}(s)) = P_{Z_1}(P_{Z_{n-1}}(s))

Remark: In many cases it’s not easy to solve the explicit expression of p.g.f. from the above formula when n is large, but for any distribution Y we can use it to work out the mean and variance of ZnZ_n.

Mean and variance of ZnZ_n:

Now we proceed to extract information about the distribution of ZnZ_n using p.g.f. and some of the consequences obtained in previous sections.

Theorem 1 (Mean): Let E(Y)=μE(Y) = \mu, if μ<\mu < \infty, then E(Zn)=μnE(Z_n) = \mu^n.
Proof: Since PZn(s)=PZn1(PY(s))P_{Z_n}(s) = P_{Z_{n-1}}(P_Y(s)), derivative on both sides (or directly use Wald’s Identity introduced previously) will get: E(Zn)=E(Y)E(Zn1)=μE(Zn1)=μ2E(Zn2)=...=μnE(Z_n) = E(Y)E(Z_{n-1}) = \mu E(Z_{n-1}) = \mu^2 E(Z_{n-2}) = ... = \mu^n.
Remark: In natural language, if each individual is expected to reproduce μ\mu offsprings, then it is expected to have μn\mu^n individuals at n-th generation. This implicates a fact that if μ>1\mu > 1, the population size will grow exponentially by generation; if μ<1\mu < 1, the population size will decrease exponentially by

Theorem 2 (Variance): Let E(Y)=μE(Y) = \mu, Var(Y)=σ2Var(Y) = \sigma^2, if u<\ u < \infty and σ2<\sigma^2 < \infty, then
Vn=Var(Zn)={σ2n if μ=1σ2μn1(1μn1μ) if μ1V_n = \operatorname{Var}\left(Z_{n}\right)=\left\{\begin{array}{cl} \sigma^{2} n & \text { if } \mu=1 \\ \sigma^{2} \mu^{n-1}\left(\frac{1-\mu^{n}}{1-\mu}\right) & \text { if } \mu \neq 1 \end{array}\right.

Zn=i=1Zn1YiVar(Zn)={E(Y)}2×Var(Zn1)+Var(Y)×E(Zn1)Vn=μ2Vn1+σ2E(Zn1)Vn=μ2Vn1+σ2μn1\begin{aligned} Z_{n} &=\sum_{i=1}^{Z_{n-1}} Y_{i} \\ \Rightarrow \operatorname{Var}\left(Z_{n}\right) &=\left\{\mathbb{E}\left(Y\right)\right\}^{2} \times \operatorname{Var}\left(Z_{n-1}\right)+\operatorname{Var}\left(Y\right) \times \mathbb{E}\left(Z_{n-1}\right)\\ \Rightarrow \quad V_{n} &=\mu^{2} V_{n-1}+\sigma^{2} \mathbb{E}\left(Z_{n-1}\right) \\ \Rightarrow \quad V_{n} &=\mu^{2} V_{n-1}+\sigma^{2} \mu^{n-1} \end{aligned}
It is then easy to get the result.


This article mainly covers all the important properties of probability generating function and uses two small examples to illustrate how p.g.f. works and why it is useful. The two examples introduced are just two basic and fundamental topics in the theory of random walk and branching process. If dig further, we can still use generation function method to analyze some classic problems like the Gambler’s Ruin or the Extinction Problem. These theoretical results can be applied in many aspects of the real world like in Demography or Econometrics.

References:\_function\_process\#Extinction\_problem\_for\_a\_Galton\_Watson\_process\_random\_walks\_color.pdf\sim fewster/325/notes/ch6.pdf\sim fewster/325/notes/ch4.pdf\sim chance/teaching\_aids/books\_articles/probability\_book/Chapter10.pdf