next up previous
Next: About this document ...




CheiRank versus PageRank
(quantware popular article quwap1)



D.L. Shepelyansky
Laboratoire de Physique Théorique du CNRS, IRSAMC, Université de Toulouse, UPS, F-31062 Toulouse, France


The CheiRank is an eigenvector with a maximal real eigenvalue of the Google matrix ${\mathbf G^*}$ constructed for a directed network with the inverted directions of links. It is similar to the PageRank vector, which ranks the network nodes in average proportionally to a number of incoming links being the maximal eigenvector of the Google matrix G with a given initial direction of links. Due to inversion of link directions the CheiRank ranks the network nodes in average proportionally to a number of outgoing links. Since each node belongs both to CheiRank and PageRank vectors the ranking of information flow on a directed network becomes two-dimensional.


1. Definition

For a given directed network the Google matrix is constructed in the way described in the article Google matrix. The PageRank vector is the eigenvector with the maximal real eigenvalue $\lambda = 1$. It was introduce in [1] and is discussed in the article PageRank. In a similar way the CheiRank is the eigenvector with the maximal real eigenvalue of the matrix ${\mathbf G^*}$ built in the same way as G but using inverted direction of links in the initially given adjacency matrix. Both matrices ${\mathbf G}$ and ${\mathbf G^*}$ belong to the class of Perron-Frobenius operators and according to the Perron-Frobenius theorem the CheiRank $P^*_i$ and PageRank $P_i$ eigenvectors have nonnegative components which can be interpreted as probabilities [2,3]. Thus all $N$ nodes $i$ of the network can be ordered in a decreasing probability $P^*_i$, $P_i$ order with ranks $K^*_i$, $K_i$ for CheiRank and PageRank respectively. In average the PageRank probability $P_i$ is proportional to the number of ingoing links with $P_i \propto 1/{K_i}^\beta$ [4,5,6]. For the World Wide Web (WWW) network the exponent $\beta =1/(\nu-1) \approx 0.9$ where $\nu \approx 2.1$ is the exponent for ingoing links distribution [4,5]. In a similar way the CheiRank probability is in average proportional to the the number of outgoing links with $P^*_i \propto 1/{K^*_i}^{\beta^*}$ with $\beta^* =1/(\nu^*-1) \approx 0.6$ where $\nu^* \approx 2.7$ is the exponent for outgoing links distribution of the WWW [4,5]. The CheiRank was invented for the procedure call network of Linux Kernel software in [6], the term itself was introduced and used in [7]. While the PageRank highlights very well known and popular nodes, the CheiRank highlights very communicative nodes. Top PageRank and CheiRank nodes have certain analogy to authorities and hubs appearing in the HITS algorithm [9] but the HITS is query dependent while the rank probabilities $P_i$ and $P^*_i$ classify all nodes of the network. Since each node belongs both to CheiRank and PageRank we obtain a two-dimensional ranking of network nodes.


2. Examples

An example of nodes distribution in the plane of PageRank and CheiRank is shown in Fig.1 for the procedure call network of Linux Kernel software taken from [7].

Figure: Distribution of procedure calls of Linux Kernel network in the plane of PageRank probability $x = \log_{10} P_i$ and CheiRank probability $y = \log_{10} {P}^*_i$ for Linux version 2.6.32 with matrix size $N = 285509$ at $\alpha = 0.85$, color shows the density of nodes with white for maximum and blue for minimum, black space has no nodes (from [7])
\includegraphics[width=10cm,angle=0, clip]{CheiRank1.eps}

The dependence of $P$, $P^*$ on $K$, $K^*$ for the network of hyperlink network of Wikipedia English articles is shown in Fig.2 from [8]. The distribution of these articles in the plane of PageRank and CheiRank is shown in Fig.3 from [8]. The difference between PageRank and CheiRank is clearly seen from the names of Wikipedia articles (2009) with highest rank. At the top of PageRank we have 1.United States, 2. United Kingdom, 3. France while for CheiRank we find 1. Portal:Contents/Outline of knowledge/Geography and places, 2. List of state leaders by year, 3. Portal:Contents/Index/Geography and places. Clearly PageRank selects first articles on a broadly known subject with a large number of ingoing links while CheiRank selects first highly communicative articles with many outgoing links. Since the articles are distributed in 2D they can be ranked in various ways corresponding to projection of 2D set on a line. The horizontal and vertical lines correspond to PageRank and CheiRank, 2DRank combines properties of CheiRank and PageRank as it is discussed in [8]. It gives top Wikipedia articles 1. India, 2. Singapore, 3. Pakistan.

Figure 2: Dependence of probability of PageRank P (red curve) and CheiRank P * (blue curve) on the corresponding rank indexes $K$ and $K^*$ . The straight dashed lines show the power law dependence with the slope $\beta = 0.92; 0.57$ respectively, corresponding to $\beta = 1/(\nu - 1)$ (from [8]).
\includegraphics[width=10cm,angle=0, clip]{CheiRank2.eps}


Figure: Density distribution of Wikipedia English articles (2009) in the plane of PageRank and CheiRank indexes $0 \leq \ln K , \ln K^* \leq \ln N$ shown by color with blue for minimum and white for maximum (black for zero); green/red points show top 100 personalities from PageRank/CheiRank, yellow pluses show top 100 personalities from Hart's book, number of articles $N = 3282257$ (from [8]).
\includegraphics[width=10cm,angle=0, clip]{CheiRank3.eps}

The 2D ranking highlights the properties of Wikipedia articles in a new rich and fruitful manner. According to the PageRank the top 100 personalities described in Wikipedia articles have in 5 main category activities: 58 (politics), 10 (religion),17 (arts), 15 (science), 0 (sport) and thus the importance of politicians is strongly overestimated. The CheiRank gives respectively 15, 1, 52, 16, 16 while for 2DRank one finds 24, 5, 62, 7, 2. Such type of 2D ranking can find useful applications for various complex directed networks including the WWW. Applications to a brain neural network and a business process management network are analysed in [10,11].


3. Links

Here are few links to related subjects: Wikipedia articles
* Google matrix, PageRank, HITS algorithm;
* Markov chain, Transfer operator, Perron-Frobenius theorem;
* Information retrieval;
* Web search engines and
http://www.quantware.ups-tlse.fr/QWLIB/2drankwikipedia/ .


4. References

[1] Brin S.; Page L. (1998). The anatomy of a large-scale hypertextual Web search engine. Computer Networks and ISDN Systems v.30, p.107

[2] Langville, Amity N; Carl Meyer (2006). Google's PageRank and Beyond. Princeton University Press. ISBN 0-691-12202-4.

[3] Austin, David (2008). How Google Finds Your Needle in the Web's Haystack. AMS Feature Columns. http://www.ams.org/samplings/feature-column/fcarc-pagerank

[4] Donato D.; Laura L., Leonardi S., Millozzi S. (2004). Large scale properties of the Webgraph. Eur. Phys. J. B v.38, p.239

[5] Pandurangan G.; Ranghavan P., Upfal E. (2005). Using PageRank to Characterize Web Structure. Internet Math. v.3, p. 1

[6] Litvak N.; Scheinhardt W.R.W, Volkovich Y. (2008). Probabilistic Relation between In-Degree and PageRank. Lecture Notes in Computer Science, V.4936 p.72 (DOI: 10.1007/978-3-540-78808-9)

[7] Chepelianskii, Alexei D. (2010). Towards physical laws for software architecture. arXiv:1003.5455. http://arxiv.org/abs/1003.5455

[8] Zhirov A.O.; Zhirov O.V., Shepelyansky D.L. (2010). Two-dimensional ranking of Wikipedia articles. Eur. Phys. J. B v.77, p.523

[9] Kleinberg, Jon (1999). Authoritative sources in a hyperlinked environment. Journal of the ACM v.46(5), p.604

[10] Shepelyansky D.L.; Zhirov O.V. (2010). Towards Google matrix of brain. Phys. Lett. A v.374, p.3206

[11] Abel M.; Shepelyansky D.L. (2011). Google matrix of business process management. Eur. Phys. J. B (to appear) arxiv:1009.2631[cs.CY]




next up previous
Next: About this document ...
SHEPELYANSKY Dima 2011-01-14