# The Chinese Restaurant Process

Posted on 25 Feb 2014

The Chinese Restaurant Process is a prior for the likelihoods of clustered data. Let $\mathbf{z}$ be the cluster assignments of users. Let $K$ be the number of clusters and let $\boldsymbol{\pi}$ be a vector of size $K$, indicating the probability of being assigned to cluster $k$. If we put a Dirichlet prior on $\boldsymbol{\pi}$ , we have:

Assuming that $\alpha$ is a fixed parameter, the joint probability of $\mathbf{z}$ and $\boldsymbol{\pi}$ is:

If we integrate out $\mathbf{\pi}$, we get the marginal probability of $\mathbf{z}$:

Renaming $\alpha+n_i$ as $\alpha_i'$, multiplying and dividing by $B(\boldsymbol{\alpha'})$, and rearranging the elements so that it is visually clear, we get:

where the value of the integral is 1, since the factor inside is a Dirichlet density function. Expanding the definition of the Beta function, we get:

Note that the presence of the counts $n_i$ makes it impossible to factorize $p(\mathbf{z})$ into the product of the individual $z$. However, we can derive the conditional probability of one $z_u$ given all the other assignments:

To compute the denominator we assume cluster assignments are exchangeable, that is, the joint distribution $p(\mathbf{z})$ is the same regardless the order in which clusters are assigned. This allows us to assume that $z_u$ is the last assignment, therefore obtaining $p(\mathbf{z_{-u}})$ by considering how was the equation $p(\mathbf{z})$ before $z_u$ was assigned to cluster $j$.

Plugging $p(\mathbf{z}_{-u})$ and $p(\mathbf{z})$ into $p(z_u \vert \mathbf{z}_{-u})$, we get:

Assuming that all $\alpha_i$ have the same value $\alpha$, and since the sum of all $n_i$ is $U$:

and finally exploiting the fact that $a \Gamma(a) = \Gamma(a+1)$:

we can reparametrize $\alpha$ as $\alpha/K$. Then we have the standard result:

where $n_j$ is the number of users in cluster $j$ before the assignment of $z_u$.

The Chinese Restaurant Process is the consequence of considering $K \rightarrow \infty$. For clusters where $n_j>0$, we have:

and the probability of assigning $z_u$ to any of the (infinite) empty clusters is:

which written in a compact equation is: