# The Chinese Restaurant Process

Posted on 25 Feb 2014

The Chinese Restaurant Process is a prior for the likelihoods of clustered data. Let $$\mathbf{z}$$ be the cluster assignments of users. Let $$K$$ be the number of clusters and let $$\boldsymbol{\pi}$$ be a vector of size $$K$$, indicating the probability of being assigned to cluster $$k$$. If we put a Dirichlet prior on $$\boldsymbol{\pi}$$ , we have:

\begin{align} \mathbf{z} &\sim \text{Multinomial}(\boldsymbol{\pi})\notag\\ \boldsymbol{\pi} &\sim \text{Dirichlet}(\alpha) \end{align}

Assuming that $$\alpha$$ is a fixed parameter, the joint probability of $$\mathbf{z}$$ and $$\boldsymbol{\pi}$$ is:

\begin{align} p(\mathbf{z}, \boldsymbol{\pi}) = p(\mathbf{z} | \boldsymbol{\pi})p(\boldsymbol{\pi} | \alpha) \end{align}

If we integrate out $$\mathbf{\pi}$$, we get the marginal probability of $$\mathbf{z}$$:

\begin{align} p(\mathbf{z}) &= \int_{\boldsymbol{\pi}} p(\mathbf{z} | \boldsymbol{\pi}) p (\boldsymbol{\pi} | \alpha) = \int_{\boldsymbol{\pi}} \prod_{i=1}^K \pi_i^{n_i} \frac{1}{B(\alpha)} \prod_{j=1}^K \pi_j^{\alpha-1} \end{align}

Renaming $$\alpha+n_i$$ as $$\alpha_i'$$, multiplying and dividing by $$B(\boldsymbol{\alpha'})$$, and rearranging the elements so that it is visually clear, we get:

\begin{align} p(\mathbf{z})= \frac{ B(\boldsymbol{\alpha'}) }{ B(\alpha) } \int_{\boldsymbol{\pi}} \frac{1}{B(\boldsymbol{\alpha'})} \prod_{i=1}^K \pi_i^{\alpha_i' - 1} \end{align}

where the value of the integral is 1, since the factor inside is a Dirichlet density function. Expanding the definition of the Beta function, we get:

\begin{align} p(\mathbf{z})= \frac{ \prod_{i=1}^K \Gamma(\alpha_i + n_i) } { \Gamma \left(\sum_{i=1}^K \alpha_i + n_i \right) } \frac{ \Gamma \left(\sum_{i=1}^K \alpha_i \right) } { \prod_{i=1}^K \Gamma(\alpha_i) } \end{align}

Note that the presence of the counts $$n_i$$ makes it impossible to factorize $$p(\mathbf{z})$$ into the product of the individual $$z$$. However, we can derive the conditional probability of one $$z_u$$ given all the other assignments:

\begin{align} p(z_u = j| \mathbf{z_{-u}}) = \frac{p(\mathbf{z})} {p(\mathbf{z_{-u}})} \end{align}

To compute the denominator we assume cluster assignments are exchangeable, that is, the joint distribution $$p(\mathbf{z})$$ is the same regardless the order in which clusters are assigned. This allows us to assume that $$z_u$$ is the last assignment, therefore obtaining $$p(\mathbf{z_{-u}})$$ by considering how was the equation $$p(\mathbf{z})$$ before $$z_u$$ was assigned to cluster $$j$$.

\begin{align} p(\mathbf{z_{-u}}) = \frac { \Gamma(\alpha_j + n_j-1) \prod_{i\neq j}^K \Gamma(\alpha_i + n_i) } {\Gamma \left( \alpha_j + n_j -1 + \sum_{i\neq j}^K \alpha_i + n_i \right) } \frac{ \Gamma \left(\sum_{i=1}^K \alpha_i \right) } { \prod_{i=1}^K \Gamma(\alpha_i) } \end{align}

Plugging $$p(\mathbf{z}_{-u})$$ and $$p(\mathbf{z})$$ into $$p(z_u \vert \mathbf{z}_{-u})$$, we get:

\begin{align} p(z_i = j| \mathbf{z_{-u}}) &\propto \frac{ \prod_{i=1}^K \Gamma(\alpha_i + n_i) } { \Gamma \left(\sum_{i=1}^K \alpha_i + n_i \right) } \frac {\Gamma \left( \alpha_j + n_j -1 + \sum_{i\neq j}^K \alpha_i + n_i \right) } { \Gamma(\alpha_j + n_j-1) \prod_{i\neq j}^K \Gamma(\alpha_i + n_i) }\\ &= \frac{ \Gamma(\alpha_j + n_j) \prod_{i \neq j }^K \Gamma(\alpha_i + n_i) } { \Gamma \left(\sum_{i=1}^K \alpha_i + n_i \right) } \frac {\Gamma \left( \alpha_j + n_j -1 + \sum_{i\neq j}^K \alpha_i + n_i \right) } { \Gamma(\alpha_j + n_j-1) \prod_{i\neq j}^K \Gamma(\alpha_i + n_i) }\\ &= \frac{ \Gamma(\alpha_j + n_j) } { \Gamma \left(\sum_{i=1}^K \alpha_i + n_i \right) } \frac {\Gamma \left( \alpha_j + n_j -1 + \sum_{i\neq j}^K \alpha_i + n_i \right) } { \Gamma(\alpha_j + n_j-1) } \end{align}

Assuming that all $$\alpha_i$$ have the same value $$\alpha$$, and since the sum of all $$n_i$$ is $$U$$:

\begin{align} p(z_i = j| \mathbf{z_{-u}}) &\propto \frac{ \Gamma(\alpha + n_j) } { \Gamma \left(K \alpha + U \right) } \frac {\Gamma \left( K\alpha_j +U -1 \right) } { \Gamma(\alpha_j + n_j -1) } \end{align}

and finally exploiting the fact that $$a \Gamma(a) = \Gamma(a+1)$$:

\begin{align} p(z_i = j| \mathbf{z_{-u}}) &\propto \frac {\alpha + n_j} {K \alpha + U -1} \end{align}

we can reparametrize $$\alpha$$ as $$\alpha/K$$. Then we have the standard result:

\begin{align} p(z_i = j| \mathbf{z_{-u}}) &\propto \frac {\alpha/K + n_j} {\alpha + U -1} \end{align}

where $$n_j$$ is the number of users in cluster $$j$$ before the assignment of $$z_u$$.

The Chinese Restaurant Process is the consequence of considering $$K \rightarrow \infty$$. For clusters where $$n_j>0$$, we have:

\begin{align} p(z_i = j| \mathbf{z_{-u}}) &\propto \frac {n_j} {\alpha + U -1} \end{align}

and the probability of assigning $$z_u$$ to any of the (infinite) empty clusters is:

\begin{align} p(z_i = j| \mathbf{z_{-u}}) &\propto K\frac {\alpha/K} {\alpha + U -1} = \frac {\alpha} {\alpha + U -1} \end{align}

which written in a compact equation is:

\begin{align} p(z_u = j | \mathbf{z_{-u}}, \alpha) = \begin{cases} \frac{n_{-u,j}}{U - 1 + \alpha}& \text{if } n_{-u,j}>0 \\ \frac{\alpha}{U-1 + \alpha}& \text{if } n_{-u,j}=0 \end{cases} \end{align}