The Statbot

TechCrunch Statistics A-W

May 27th, 2008 · 26 Comments

Why A-W? Because I’m pretty sure I missed some ;) Do point them out so that I can do a “TechCrunch W-Z Statistics” post too ;)

Quick Stats

  • Total of 7007 posts….
  • …spread over 1079 days, or just under 3 years
  • …with a total of 1,977,710 words
  • …at an average of 6.5 posts a day
  • …with 282.2 words a post
  • …receiving 228,449 comments
  • …from 56,292 unique commentators
  • …with 18,440 outbound links…
  • to 4641 sites…
  • …at an average of 4 links to every site
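The averages above follow directly from the totals. Here's a quick sanity check in Python; the variable names are my own, and the numbers are copied from the list above:

```python
# Totals copied from the Quick Stats list above.
posts = 7007
days = 1079
words = 1_977_710
outbound_links = 18_440
linked_sites = 4641

# Derived averages.
posts_per_day = posts / days
words_per_post = words / posts
links_per_site = outbound_links / linked_sites
years = days / 365

print(f"{posts_per_day:.1f} posts a day")      # 6.5
print(f"{words_per_post:.1f} words a post")    # 282.2
print(f"{links_per_site:.1f} links per site")  # 4.0
print(f"{years:.2f} years")                    # 2.96
```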

Growth

Looking at TechCrunch’s growth patterns…

Posting Frequency

Posting frequency almost doubled after May 2007, and has been increasing ever since. I believe that hiring more bloggers to be part of TechCrunch resulted in the, uh, “explosion”. Gabe Rivera explained that this (i.e. going pro) was one of the reasons TC was given more weight on Techmeme.

Post Size

Although the frequency of posting has gone up significantly, the length of the posts remains almost constant (notice the almost flat black trendline).

Commenting Growth

The comments increased by quite a bit after Jan 06, but seem pretty constant after that.

Top Links

Here is the list of the most linked to domains from TechCrunch:

Rank  Site                     Links  Presence (%)
1     http://crunchbase.com    1574   8.54
2     http://techcrunch.com    670    3.63
3     http://google.com        429    2.33
4     http://crunchgear.com    343    1.86
5     http://yahoo.com         298    1.62
6     http://blogspot.com      258    1.40
7     http://flickr.com        255    1.38
8     http://facebook.com      220    1.19
9     http://wikipedia.org     162    0.88
10    http://technorati.com    155    0.84
11    http://typepad.com       124    0.67
12    http://nytimes.com       115    0.62
13    http://gigaom.com        110    0.60
14    http://youtube.com       108    0.59
15    http://digg.com          96     0.52
16    http://myspace.com       92     0.50
17    http://twitter.com       79     0.43
18    http://amazon.com        75     0.41
19    http://wsj.com           75     0.41
20    http://blogs.com         75     0.41
21    http://techmeme.com      73     0.40
22    http://live.com          71     0.39
23    http://talkcrunch.com    70     0.38
24    http://microsoft.com     70     0.38
25    http://crunchboard.com   70     0.38

Here’s the chart showing the percentage of links to TechCrunch properties (Crunchboard, CrunchBase, TechCrunch, CrunchGear, etc) and others:

Pretty good :) I’ll leave you to draw your own conclusions… ;)
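For the curious, the “Presence” column is just a site’s links as a percentage of all 18,440 outbound links. The sketch below recomputes it and adds up the TechCrunch properties that made the Top 25. Two assumptions to flag: the `own_properties` selection is my reading of which domains count as TechCrunch’s own, and since it covers only the Top 25, the resulting share is a lower bound rather than the figure behind the chart.

```python
# Total outbound links, from Quick Stats.
TOTAL_OUTBOUND_LINKS = 18_440

# TechCrunch-owned domains that appear in the Top 25 table
# (assumption: this selection is my reading, not an official list).
own_properties = {
    "crunchbase.com": 1574,
    "techcrunch.com": 670,
    "crunchgear.com": 343,
    "talkcrunch.com": 70,
    "crunchboard.com": 70,
}

def presence(links, total=TOTAL_OUTBOUND_LINKS):
    """Links to a site as a percentage of all outbound links."""
    return 100 * links / total

own_links = sum(own_properties.values())

print(f"crunchbase.com presence: {presence(1574):.2f}%")     # 8.54%
print(f"own properties (Top 25 only): {own_links} links, "
      f"{presence(own_links):.1f}% of all outbound links")   # 2727 links, 14.8%
```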

The distribution of links to sites seems to follow a power law (as most stuff online today seems to). [Note: a power-law distribution, in simplified terms, means that a few sites get a lot of links, and a lot of sites get few links.]

Note that the real tail is much longer than what’s shown, since I’ve included only the Top 100 sites here…
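One rough way to check the power-law hunch: on log-log axes a power law shows up as a straight line, so you can fit a line to log(links) against log(rank) and see how well it fits. Here is a minimal sketch using only the Top 25 counts from the table above. The function name `loglog_fit` is my own, and 25 points make this a hint rather than a proof.

```python
import math

# Link counts for the Top 25 domains, copied from the table above (rank 1 first).
links = [1574, 670, 429, 343, 298, 258, 255, 220, 162, 155, 124, 115, 110,
         108, 96, 92, 79, 75, 75, 75, 73, 71, 70, 70, 70]

def loglog_fit(counts):
    """Least-squares fit of log(count) = intercept + slope * log(rank).

    Returns (slope, intercept, r_squared). A power law count ~ rank**slope
    appears as a straight line with that slope on log-log axes.
    """
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(count) for count in counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared

slope, intercept, r_squared = loglog_fit(links)
print(f"slope = {slope:.2f}, R^2 = {r_squared:.3f}")
```

A negative slope with an R^2 close to 1 is consistent with a power law, though other long-tailed distributions can look similar over such a short range.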

Posting Frequency Distribution

The posting frequency is bunched up on the left, with a long tail to the right (i.e. it is right-skewed):

Most days have 1-5 posts, with 2 posts being the most common. 36 days had no posts, and 29 days had more than 20 posts.

Commenting Frequency Distribution

Comments seem to follow a normal-ish distribution, bunched up on the left with a long tail to the right (btw, question to better stats buffs than me – is it really normal if it is skewed?)

Most posts have 5-40 comments. The big bar at the end is not for posts with exactly 100 comments; it’s there because the 252 posts with more than 100 comments are all lumped together. Large number of outliers, eh?
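That last bar is an “overflow” bin: everything above a cap goes into one bucket so the long tail doesn’t stretch the chart. Here is a minimal sketch of the idea. The function name, the bin width of 5, and the sample counts are made up for illustration; they are not from the data set and not necessarily how the chart was built.

```python
from collections import Counter

def bin_comment_counts(counts, bin_width=5, cap=100):
    """Group per-post comment counts into fixed-width bins.

    Bins are keyed by their starting value (0, 5, 10, ...). Every count
    above `cap` lands in a single "overflow" bin instead of its own bin.
    """
    bins = Counter()
    for count in counts:
        if count > cap:
            bins["overflow"] += 1
        else:
            bins[(count // bin_width) * bin_width] += 1
    return bins

# Made-up comment counts, purely illustrative.
sample = [0, 3, 7, 12, 12, 38, 101, 250]
bins = bin_comment_counts(sample)
print(dict(bins))  # {0: 2, 5: 1, 10: 2, 35: 1, 'overflow': 2}
```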

Links Distribution

The graph looks pretty weird, but it’s how blogs work if you think ’bout it:

Most posts have 1-3 links, while there are more posts with 0 links than with 10. I think this is pretty normal for most blogs.

Top Commentators

Here’s the list of the Top 25 commentators on TechCrunch, by number of comments left:

Rank  Commentator            Comments  Presence
1     Michael Arrington      3195      1.40%
2     Chris                  1649      0.72%
3     Andrew                 1131      0.50%
4     Duncan Riley           1110      0.49%
5     David Mackey           1004      0.44%
6     John                   923       0.40%
7     Mike                   900       0.39%
8     Steve Ballmer          869       0.38%
9     Alex                   834       0.37%
10    Matt                   802       0.35%
11    Jon                    777       0.34%
12    Jason                  669       0.29%
13    David                  659       0.29%
14    Paul                   641       0.28%
15    Dave                   614       0.27%
16    Steve                  604       0.26%
17    Allen Stern            604       0.26%
18    Josh                   594       0.26%
19    Peter                  581       0.25%
20    Tom                    557       0.24%
21    Mark                   541       0.24%
22    Sean                   494       0.22%
23    James                  488       0.21%
24    Alaska Miller          485       0.21%
25    Marshall Kirkpatrick   476       0.21%

(Sorry for the lack of URLs – I put in everyone who was found via Google. If you’re here, and want your url, just leave a comment)

Just as TechCrunch links back to itself the most, Mike is its most frequent commentator ;)

p.s. which is the correct term to call a person who leaves comments – commentor or commentator?

Trivia

My favorite section! :) However, this has been cut short since TechCrunch doesn’t provide the “time” of posting and has no categories or tags (why is that?)

As usual, Saturday is really the “day off”, with fewer posts than even Sunday. Friday, as usual, is the “slow news day”.

Saturday’s posts are longest, and Wednesday’s are shortest. Let’s wait till Steve Gillmor’s ramblings make Sunday’s posts longer ;) (On a slightly pseudo-serious note, has anyone been able to understand anything Steve Gillmor says?)

Conclusion

So folks, that’s the end of it. I would release the data set (a 180 MB XML file with all the content and comments) for download, but am not sure how Mike and co. would feel about it. If I get the go-ahead from them, I’ll make a post releasing the files. Till then: Any queries? Leave a comment!

SideNote: College Admissions

Things will be relatively quiet here for a week ‘coz I’m looking into college admissions. You can help too! The college I’m hoping to get into has a history of accepting students with a record of good extra-curricular activities. Now, if I get lotsa links here, my Technorati Rank is gonna improve and boost my chances of sneaking into the college so that I can feel at home with the rest of the geeks there ;)

Tags: Blogs · Uncategorized

26 responses so far ↓

  • 1 Dennis G // May 27, 2008 at 2:08 pm

    Wow, impressive stats.
    I hope you have used some automated tools for pulling these… OR do you have a lot of time

    Keep them coming, love this stuff…

  • 2 Yuvi // May 27, 2008 at 2:13 pm

    @DennisG: I do have a quasi-automated tool for pulling these up :) Will someday make a post about them….

    Thanks for the encouragement ;)

  • 3 Dennis G // May 27, 2008 at 2:15 pm

    I hope you will land in a good place, college I mean.
    Added your blog to my reader, and will follow you.

    Who knows in 2/3 years time I have some open positions.
    This is good S***

  • 4 Joseph Hunkins // May 27, 2008 at 2:22 pm

    Brilliant Yuvi - very well organized info - keep up the clever work.

  • 5 Andy Beard // May 27, 2008 at 2:24 pm

    Hi Yuvi

    Great stats - I know a few you missed though which are quite insightful.
    I have one data source which is around 400MB of data to play with.

  • 6 Yuvi // May 27, 2008 at 2:58 pm

    @Joseph: Thanks!

    @Andy: I know I did miss a few. What’s that 400 Megs of data you’re talking about? I’m quite interested…

  • 7 notes, thoughts, ideas and responses » Three Power Law Relationships (Techmeme, Twitter, TechCrunch) // May 27, 2008 at 3:03 pm

    [...] TechCrunch Distribution of Links to Hosts (scroll halfway down the post) CrunchBase Information TechCrunch Information provided by CrunchBase [...]

  • 8 kamla bhatt // May 27, 2008 at 3:41 pm

    Great job with that number crunching and the pie charts and graphs.

    What did you come away learning from this exercise? How to run an effective blog? ….I am curious to know.

    Thanks for sharing this.

    Kamla Bhatt

  • 9 TechCrunch Under A Microscope | Jeffro2pt0 // May 27, 2008 at 4:31 pm

    [...] has recently published an awesome statistical analysis of TechCrunch.com Some immediate findings based on the number crunching produced by the [...]

  • 10 JD Rucker // May 27, 2008 at 5:04 pm

    Amazing compilation. Wonder how much time it took to count all of those stats.

  • 11 Sandeep Balaji // May 27, 2008 at 6:00 pm

    Great stats…I was amazed that 86% of the links were to their own properties……

    Cheers
    Sandeep

  • 12 Cyndy Aleo-Carreira // May 27, 2008 at 7:06 pm

    Yuvi, I don’t give a rat’s patootie about the stats, but the analysis is hilarious. I come just for the giggles. ;)

  • 13 Wendy Piersall // May 27, 2008 at 10:21 pm

    Fascinating that not one of those top commentator people are women - but not entirely surprising either. They live and write for a male dominated world.

    While I completely admire and respect the empire Michael Arrington has built, I’ve had to explain who he is to more than one (prominent) female blogger out there.

    His stats are impressive, but don’t forget he blogs in somewhat of a bubble. :)

  • 14 Ouriel Ohayon // May 28, 2008 at 12:22 am

    Interesting piece of work. Would be curious to know if you could run an aggregated analysis with other Techcrunch blogs (specially techcrunch france)

  • 15 The StatBot analyzes TechCrunch from A-W : The Blog Herald // May 28, 2008 at 4:11 am

    [...] StatBot, an upcoming and coming blog focused on statistical analysis of the blogosphere, has posted an in-depth analysis of TechCrunch - and what they find is pretty interesting [...]

  • 16 Darren Herman - Marketing, Advertising, Media and Technology Blog » Blog Archive » Digital Wednesdays: Back in Action // May 28, 2008 at 7:26 am

    [...] TechCrunch Statistics - I’m becoming more and more of a numbers person and find this post all about TechCrunch to be extremely interesting.  If I was Alley Insider, PaidContent, Center Networks, ReadWrite Web, or any other digital media/technology site, these stats may be extremely relevant to me as I can then benchmark my site to see how it’s performing.  One area that was interesting to me was that of the  posts by weekday & words per post.  I’ve highlighted one of their graphics below so you can see: [...]

  • 17 Yuvi // May 28, 2008 at 9:31 am

    @Sandeep: Err, you’ve got it in the reverse!

  • 18 Yuvi // May 28, 2008 at 9:43 am

    @Ouriel: I’d love to do that, but don’t think I’d be as effective because of language issues…

  • 19 Yuvi // May 28, 2008 at 9:44 am

    @Wendy: Yep ;) I’m actually trying to Statbot some Girl bloggers, but can’t find any good/popular (since popular=more data) ones!

  • 20 Cyndy Aleo-Carreira // May 28, 2008 at 9:46 am

    @Yuvi I’m just going to pretend I didn’t see that.

  • 21 Yuvi // May 28, 2008 at 10:04 am

    @Cyndy: Wait - You’re a girl! I never knew! :P Names just totally confuse me!

    I’m sorry *sniff* :)

  • 22 Benton // May 30, 2008 at 5:04 pm

    Nice work Yuvi

    Looks like TC is a volume game like McDonalds.

  • 23 First Indian Blogger to break into Techmeme LeaderBoard | Oodami // May 30, 2008 at 9:39 pm

    [...] has interesting graphs on FriendFeeds of Bloggers, Techmeme and TechCrunch. Congratulations Yuvi! So, If a blogger has something interesting to say other bloggers will link [...]

  • 24 Saad // May 30, 2008 at 9:46 pm

    Good work Yuvi. Glad to see such beautiful work. Wish you good luck for your higher studies. I am waiting for your post on the system you use to develop such labor intensive graphs. ;-)

  • 25 Michael Arrington’s Love for Technorati | Vinay is In! - Weblog/Blog of Vinay // Jun 1, 2008 at 4:28 am

    [...] covered couple of article on TechCrunch which was pretty much interesting to know some facts about Mike & other authors. [...]

  • 26 Jon Peltier // Jun 16, 2008 at 5:15 am

    Some suggestions:

    1. The “Growth” plots are time series, and would be better suited to a line chart than a bar chart. It’s hard to tell what the numbers even are, because the bars give an unintentional shading from light near the top to dark below.

    They would also be improved by using a moving average, perhaps 7 day, to smooth out the severe variability of the raw numbers.

    2. The power law (Zipf) distribution is better shown with an XY chart, but keep the line without markers subtype. Change both scales to a logarithmic scale: a straight line is a power law relationship. Jakob Nielsen discusses this in Zipf Curves and Website Popularity.
