Google Acquires Urchin

In other news, Google announced that they acquired Urchin, a web analytics vendor and service. This makes a lot of sense for Google, but not for some of the reasons I’ve seen speculated on.

One speculation is that it gives Google web analytics capabilities to analyze their site. Actually, no, it doesn’t. Google has too much traffic, and their analysis needs are too complex.

Another is that Google can now offer this as an additional capability to their AdWords / AdSense customers. I don’t buy this. Google’s already got enough reporting capabilities in the SEM (search engine marketing) area, and Urchin isn’t going to add any value here that couldn’t have been done cheaper in-house.

It’s also not because Google is just a bunch of Nice People and they want to have another tool in their portfolio of cool stuff.

So if Google doesn’t need this for their own analytics, or to offer to AdWords customers, why bother? After all, Urchin isn’t a game-changing technology. There are better solutions available, no matter which axis you measure on.

Simple. Google did this because they want more ability to get off-network surfing data. They want to know: for people not using any Google services, what are they using? That information is partially available through AdSense, because AdSense lives on third-party sites. That’s a rich source of data. A nice way to get even more off-network data is to supply folks with a hosted analytics service that most small and medium-sized web sites can use. Simply put a web bug / beacon in your page, and we’ll track your visitors for you. And for us.
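
For the mechanically curious, here’s roughly how such a page tag works – a minimal sketch in TypeScript, not Urchin’s actual code; the stats.example.com endpoint and the parameter names are invented for illustration:

    // Minimal page-tag sketch (TypeScript). The endpoint and parameter
    // names are invented; a real hosted service defines its own.
    function trackPageView(accountId: string): void {
      const params = new URLSearchParams({
        acct: accountId,            // which customer site this is
        url: location.href,         // the page being viewed
        ref: document.referrer,     // where the visitor came from
        t: Date.now().toString(),   // cache-buster / timestamp
      });
      // Requesting a 1x1 image is the whole trick: the analytics host
      // logs the request, along with cookies, user agent, and IP.
      new Image(1, 1).src = "https://stats.example.com/beacon.gif?" + params.toString();
    }

    trackPageView("example-site-123"); // fire once per page view

The visitor never sees a thing; the analytics host gets a log line for every page view. And, per the above, so does whoever owns the analytics host.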

(Before you get all cynical on me: yes, Overture bought Keylime many years ago, for SEM reporting, and perhaps for off-network information, I don’t know. The difference between the Overture/Keylime and Google/Urchin deals is that Yahoo! and Overture are different legal entities, and have different privacy policies. As a result, Yahoo and Overture cannot share third-party information about web surfers. Whether or not that makes business sense is beside the point – Yahoo’s pretty rabid about privacy.)

One final element of this announcement. If there’s no privacy backlash, and web sites brush off the concept of Google as big brother, the low-end market for web analytics is effectively dead. Omniture, WebSideStory and (perhaps) CoreMetrics will survive, but it’s going to be tough for anyone else, which is going to give the newly independent WebTrends second thoughts about resurrecting WebTrends Live / WebTrends OnDemand.

Shaking up the Analytics Landscape

In case you missed it, NetIQ is spinning out WebTrends. I won’t speculate why – oh hell, of course I will. I thought (and still think) that WebTrends fit with NetIQ only slightly better than Andromedia fit with Macromedia – which is to say, not very well. The two companies have different lines of business, and web analytics ended up being a sideshow. WebTrends and NetIQ sell to different people in the organization – simple as that.

I’ve heard rumor of a somewhat similar web analytics deal coming down soon, as the vendor sells off its analytics business to focus on a different line of products.

Consolidation, or diversification? Apparently some people think they can’t make enough in the web analytics business, while others think they can. Interesting times.

Mojo et al

A bit of a buzz today around Om Malik’s How Yahoo Got Its Mojo Back with the attendant lovers and haters commenting along. As is the norm, a lot of the haters (of both Yahoo and Google) don’t know what they are talking about.

I still find it surprising how seemingly intelligent people can march up and down about how one service is amazing and the other is absolute rubbish. If it works for you, great. I know people who swear by the gmail interface, and others who swear at it. Some people want My Yahoo, and others prefer Google News. So be it.

Regardless, I think what makes both wonderful is the competition. Microsoft is coming? Hey, jump in – the water’s great.

Flickrizing Yahoo!

Not a lot of blogging lately – not because there’s nothing to talk about, but because I’m up to my eyeballs in resumes and recruiting. (If you can code, and you understand web data, get in touch!)

Regarding Yahoo!’s purchase of Flickr – some random thoughts:

  1. I suspect Flickr will influence Y! more than the other way ’round.
  2. Tagging (aka folksonomies) will show up in other places on Y!.
  3. We (SDS) need a strategy for figuring out how to analyze/report on tags (perhaps with technology similar to what powers the buzz index).
  4. Tags are going to give Overture and Google a whole new set of opportunities and headaches for contextual advertising. On the surface, they look like they could be used like search terms, but in so many ways, they’re a lot different.
  5. I’m glad I had the foresight (or lack of imagination) to create a Flickr ID that’s the same as my Yahoo! ID.

Firefox? Yes Please.

I’m one of those guys who runs around thinking people should use the Firefox browser. Many people inside Y! do use it, but they are generally the early adopters. A week or two ago, posters went up around campus announcing an internal test of a new service (no, it’s not Yahoo! 360°). It looks pretty useful, so I went to check it out.

Windows and IE only. I couldn’t believe it.

So forgive me for snickering when I read Yahoo vows to open all services to Firefox users. It said Yahoo would not launch any new services without Firefox support. Cool.

Maybe the product in internal test will get modified to support other browsers, or operating systems, before it’s released. That would be great. And then maybe Launch could support non-Windows machines too…

Update: Sigh.

The web data pipelines

I wanted to address another observation made in the article Things That Throw Your Stats. The author states:

Web analysis is statistics, not accounting.

While I think his overall message is a disservice to the people trying hard to increase accuracy and accountability on the web, I won’t go on about that here. Instead, I want to point out that his view of web analysis is too narrow.

Actually, there are three different components to web analysis. At Yahoo! we have many sources of data, but fundamentally three data pipelines:

  • Operational
  • Financial
  • Analytical

Each may start from a central place, such as the web server log files, but they move through the infrastructure at different speeds, and in different ways, because they are used for different things.

The operational data pipeline is largely concerned with availability, quality of service, consistency, correctness, etc. Some of the analysis needs to be available in real-time, and some of it much less so. A lot of the analysis is accounting, but there are statistics involved for things like failure prediction.
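
A toy sketch of what I mean, with an invented log-record shape: the accounting half is tallying error rates, and the statistics half is flagging a window that looks out of line (a crude stand-in for real failure prediction):

    // Toy operational check (TypeScript); the log-record shape is invented.
    interface Hit { ts: number; status: number }  // epoch seconds, HTTP status

    // Accounting: fraction of requests in a window that failed.
    function errorRate(hits: Hit[]): number {
      if (hits.length === 0) return 0;
      return hits.filter(h => h.status >= 500).length / hits.length;
    }

    // Statistics: does the latest window sit far outside the recent norm?
    function looksAnomalous(history: number[], latest: number, sigmas = 3): boolean {
      const mean = history.reduce((a, b) => a + b, 0) / history.length;
      const variance = history.reduce((a, r) => a + (r - mean) ** 2, 0) / history.length;
      return latest > mean + sigmas * Math.sqrt(variance);
    }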

The financial data pipeline is all about the money. If you can’t account for it, you can’t charge for it. Since Y! is largely ad-driven, it’s important to get this aspect right. A 10% “fudge” won’t sit right with advertisers, nor with shareholders, nor with the fine folks who brought you Sarbanes-Oxley. Not everything needs to be collected (e.g. click paths aren’t very interesting), just metrics like ad views and clickthroughs. It’s not real-time, but it needs to be available relatively soon after a campaign ends, or at the end of an accounting quarter. This is largely straight accounting, yet there are statistics involved, for things like detecting click fraud.
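
Again a toy sketch, with an invented event shape: straight accounting of views and clicks, plus one statistics-flavored screen – don’t bill repeated clicks on the same ad from the same IP. Real click-fraud detection is far more sophisticated, naturally:

    // Toy billing tally (TypeScript); the event shape is invented.
    interface AdEvent { adId: string; ip: string; kind: "view" | "click" }

    function tally(events: AdEvent[]): Map<string, { views: number; clicks: number }> {
      const counts = new Map<string, { views: number; clicks: number }>();
      const billedClicks = new Set<string>();  // one billable click per ad+IP
      for (const e of events) {
        const c = counts.get(e.adId) ?? { views: 0, clicks: 0 };
        if (e.kind === "view") {
          c.views++;
        } else if (!billedClicks.has(e.adId + "|" + e.ip)) {
          billedClicks.add(e.adId + "|" + e.ip);  // crude fraud screen
          c.clicks++;
        }
        counts.set(e.adId, c);
      }
      return counts;  // views and (deduplicated) clicks, ready for the invoice
    }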

The analytics data pipeline largely parallels the financial pipeline, but doesn’t have to be SOX-compliant. Also, much more data is collected (e.g. browser string), and even more data is algorithmically computed (e.g. visit duration). The intention, of course, is to use analytics to impact the other two systems. The tricky part is that the way to positively impact the operational and financial systems is by improving the user experience (better response times, more engaging content, etc.), which largely must be inferred through observed behavior. There’s some accounting here, but largely statistics, advanced metrics, and data research/mining, with a heavy dose of human-based synthesis. Some of the results of the analytics systems feed the operational pipeline, for things like providing targeted advertising based on observed interest.
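
“Algorithmically computed” deserves an example. Visit duration appears in no log line; it has to be inferred by grouping a visitor’s page views into sessions. A toy sketch, using the conventional 30-minute inactivity timeout (my choice for illustration, not anything official):

    // Toy sessionizer (TypeScript): visit durations from one visitor's
    // page-view timestamps. The 30-minute timeout is an assumption.
    const VISIT_TIMEOUT = 30 * 60;  // seconds of inactivity that ends a visit

    function visitDurations(timestamps: number[]): number[] {
      if (timestamps.length === 0) return [];
      const ts = [...timestamps].sort((a, b) => a - b);
      const durations: number[] = [];
      let start = ts[0];
      let last = ts[0];
      for (const t of ts.slice(1)) {
        if (t - last > VISIT_TIMEOUT) {  // long gap: a new visit begins
          durations.push(last - start);
          start = t;
        }
        last = t;
      }
      durations.push(last - start);
      return durations;  // seconds per inferred visit
    }

Note the built-in fuzziness: the final page of each visit contributes zero time, because we never observe when the visitor actually left. That’s exactly the sort of thing that keeps this pipeline on the statistics side of the house.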

While the group I’m in largely focuses on strategic uses of the web data – the analytics pipeline – it’s never done in a vacuum; we’re always cognizant of the other two pipelines. All three groups – operational, financial, and analytical – are doing analysis, all with the same source data, all towards the same overall goals. The data we keep, the tools we use, and the methods we employ can be very different, but it’s always a combination of accounting and statistics – never just one or the other.

Why standards are hard

In a good primer on why web analytics is hard, we see this statement:

Every single person inside the Ford corporation has the same IP address.

So, some questions:

  • What about married people? All have the same IP address? If so, are they all the same as each other but different from the single people?
  • Do they have different IP addresses outside of Ford?
  • How can they talk to each other if they all have the same IP address?
  • Do people really have IP addresses? Where do they keep them?

Silly questions? Sure. In this case, if we’re familiar with how the web works at a high level (and we must be, if we’re going to do any analysis), we can divine intent from context as well as from our experience. However, human languages are not precision instruments, and thus everything is open to interpretation.
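
What the author presumably meant is that everyone at Ford surfs from behind the same corporate proxy, so the outside world sees a single egress IP. A toy sketch of why that matters for counting visitors (addresses and labels invented):

    // Toy illustration (TypeScript): IP-based vs. cookie-based counting.
    // All data invented; 192.0.2.1 plays the shared corporate proxy.
    const pageViews = [
      { ip: "192.0.2.1", cookie: "visitor-a" },     // Ford employee #1
      { ip: "192.0.2.1", cookie: "visitor-b" },     // Ford employee #2, same proxy
      { ip: "198.51.100.7", cookie: "visitor-c" },  // someone at home
    ];

    const uniqueIps = new Set(pageViews.map(v => v.ip)).size;          // 2
    const uniqueCookies = new Set(pageViews.map(v => v.cookie)).size;  // 3
    console.log({ uniqueIps, uniqueCookies });  // IP counting undercounts people

Cookies have their own problems, but the point stands: an IP address identifies a network endpoint, not a person – and the quoted sentence quietly relies on the reader to know that.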

This is why standards, and legal contracts (including laws), try to be so precise. Precise, unambiguous wording is very hard to achieve. This is one reason why people invent new terms during new endeavors – to make sure the meaning is as precise as possible. It’s also what makes standards (and laws) so hard to understand.

This is from experience: many moons ago, I was the technical chair and spec editor for the DMTF DMI 1.0 standard. The fruits of that effort – an 18-month process – are now embedded in many devices, including the BIOS of every x86-based computer.

Back to our topic. The other thing about standards is that they are produced for a specific purpose. Knowing that purpose is the key to how the standard is applied. Often, different groups have different goals, so they need different standards. When standards overlap, or contradict, for very similar applications, there’s confusion. A common joke is “the good thing about standards is there are so many to choose from.” Until this article, I’d never heard of JICWEBS, and I’ve been doing web analytics for almost ten years now. Part of that is because it’s a UK & Ireland effort. But they do exist, and they have a standard. Their standards aren’t the same as the ones the Interactive Advertising Bureau has produced, but that’s because the IAB focuses on ads. Yet there is overlap. Which do you choose?

I have high hopes the WAA will be able to help in this regard.

PR in 2005

So I was IMing with a friend today. His company is doing some very cool stuff and got some good press recently. I told him he should have a “CEO’s blog” so I can find out about it:

him: i have my own pr team...
me: but if you had a ceo blog, you could link to those news items,
         and they'd show up in my RSS reader.  it would be an easy way
         for me to keep tabs on what the company is up to.
         i can't be bothered to remember to go to your web site.
him: aren't you in the opt in list
him: do you get periodic emails from us?
me: no, i don't want it in my mailbox. email is so 90s

and then he was off to do whatever CEOs do.

Dude, I know you read this and understand “connectors”.

Let people talk to you, and link to you.
