An Industry on the Move

Some happenings in the web analytics industry that caught my eye today:

Omniture announced a new line of executive chairs that have built-in LCDs. This could be just the thing for busy folks to keep tabs on their site stats. The new division, Omni Furniture, has promised additional pieces in the future.

Urchin Software announced a special vertical release for banking and financial services firms. The software, Street Urchin, looks a little rough around the edges by design – apparently that “grimy feel” is playing well with the pinstripe set.

I just heard Apple has been grumbling about Coremetrics. Apparently Apple doesn’t like the use of the term “core”. The Beatles’ record company, Apple Corps, is not amused by Apple’s attitude. Can’t we all just get along?

Speaking of music, Visual Sciences is rumored to be changing its name to AV Sciences as it adds what it calls “symphonic analytics” to its lineup. Sounds interesting – ha ha!

Finally, I’m pleased to announce that Omniture CTO Brett Error and I are joining forces on a new part-time consultancy, Error Page Analytics.

You Say Tomato…

Over at Coffee, Sun & Analytics, Xavier has a couple of posts on session length. Some good thoughts there, but I was surprised at this statement:

Session length = number of pages users viewed during their session on the site.

Call me old school, but I thought session length was the amount of time a user spent during the session.

I can’t find any definitive phrase for what Xavier is talking about. At Accrue we called it Session Depth or Visit Depth (we said a session was user-centric: a user may visit many sites during a single session). At Yahoo there’s no special term; it’s just “pageviews per session.”
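
To make the distinction concrete, here’s a quick sketch in Python, with made-up timestamps, that computes both metrics from one visitor’s pageviews:

    from datetime import datetime

    # One visitor's session: timestamped pageviews (fabricated data).
    pageviews = [
        ("2005-04-12 09:00:00", "/home"),
        ("2005-04-12 09:02:30", "/products"),
        ("2005-04-12 09:05:10", "/checkout"),
    ]

    times = [datetime.strptime(ts, "%Y-%m-%d %H:%M:%S") for ts, _ in pageviews]

    # Session length, as I'd use the term: elapsed time in the session.
    session_length = (max(times) - min(times)).total_seconds()

    # Session depth (a.k.a. visit depth, or pageviews per session):
    # the thing Xavier is calling "session length".
    session_depth = len(pageviews)

    print(f"session length: {session_length:.0f} seconds")  # 310 seconds
    print(f"session depth:  {session_depth} pageviews")     # 3 pageviews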

Argh. Here we have two people with a lot of experience in web analytics and we’re not even speaking the same language. What a mess!

Google Acquires Urchin

In other news, Google announced that they acquired Urchin, a web analytics vendor and service. This makes a lot of sense for Google, but not for some of the reasons I’ve seen speculated on.

One speculation is that it gives Google web analytics capabilities to analyze their own site. Actually, no, it doesn’t. Google has far too much traffic, and its analysis needs are far too complex, for an off-the-shelf package to handle.

Another is that Google can now offer this as an additional capability to their AdWords / AdSense customers. I don’t buy this. Google’s already got enough reporting capabilities in the SEM (search engine marketing) area, and Urchin isn’t going to add any value here that couldn’t have been done cheaper in-house.

It’s also not because Google is just a bunch of Nice People and they want to have another tool in their portfolio of cool stuff.

So if Google doesn’t need this for their own analytics, or to offer to AdWords customers, why bother? After all, Urchin isn’t a game-changing technology. There are better solutions available, no matter which axis you measure on.

Simple. Google did this because they want more ability to get off-network surfing data. They want to know what people who aren’t using any Google services are up to. That information is partially available through AdSense, because AdSense lives on third-party sites. That’s a rich source of data. A nice way to get even more off-network data is to supply folks with a hosted analytics service that most small and medium-sized web sites can use. Simply put a web bug / beacon in your page, and we’ll track your visitors for you. And for us.
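
To be concrete about the mechanics: a “web bug” is just a tiny image request that carries page and visitor details back to the vendor’s servers. Here’s a minimal sketch of the collecting end in Python – purely illustrative, and certainly not Urchin’s actual implementation:

    import base64
    from http.server import BaseHTTPRequestHandler, HTTPServer
    from urllib.parse import urlparse, parse_qs

    # A transparent 1x1 GIF: the "web bug" the browser actually fetches.
    PIXEL = base64.b64decode(
        "R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7")

    class BeaconHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            query = parse_qs(urlparse(self.path).query)
            # Record whatever the page reported: URL, referrer, and the
            # visitor cookie that ties hits together into sessions.
            print("hit:", query.get("url"), query.get("ref"),
                  self.headers.get("Cookie"))
            self.send_response(200)
            self.send_header("Content-Type", "image/gif")
            self.end_headers()
            self.wfile.write(PIXEL)

    if __name__ == "__main__":
        # A customer's pages would embed something like:
        #   <img src="http://stats.example.com/beacon.gif?url=...&ref=...">
        HTTPServer(("", 8000), BeaconHandler).serve_forever()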

(Before you get all cynical on me: yes, Overture bought Keylime many years ago, for SEM reporting, and perhaps for off-network information – I don’t know. The difference between the Overture/Keylime and Google/Urchin deals is that Yahoo! and Overture are different legal entities, with different privacy policies. As a result, Yahoo and Overture cannot share third-party information about web surfers. Whether or not that makes business sense is beside the point – Yahoo’s pretty rabid about privacy.)

One final element of this announcement: if there’s no privacy backlash, and web sites brush off the concept of Google as Big Brother, the low-end market for web analytics is effectively dead. Omniture, WebSideStory, and (perhaps) Coremetrics will survive, but it’s going to be tough for anyone else – which is going to give the newly independent WebTrends second thoughts about resurrecting WebTrends Live / WebTrends OnDemand.

Shaking up the Analytics Landscape

In case you missed it, NetIQ is spinning out WebTrends. I won’t speculate why – oh hell, of course I will. I thought (and still think) that WebTrends fit with NetIQ only slightly better than Andromedia fit with Macromedia – which is to say, not very well. The two companies have different lines of business, and web analytics ended up being a sideshow. WebTrends and NetIQ sell to different people in the organization – simple as that.

I’ve heard rumors of a somewhat similar web analytics deal coming down soon, as another vendor sells off its analytics business to focus on a different line of products.

Consolidation, or diversification? Apparently some people think they can’t make enough in the web analytics business, while others think they can. Interesting times.

Flickrizing Yahoo!

Not a lot of blogging lately – not because there’s nothing to talk about, but because I’m up to my eyeballs in resumes and recruiting. (If you can code, and you understand web data, get in touch!)

Regarding Yahoo!’s purchase of Flickr – some random thoughts:

  1. I suspect Flickr will influence Y! more than the other way ’round.
  2. Tagging (aka folksonomies) will show up in other places on Y!.
  3. We (SDS) need a strategy for figuring out how to analyze/report on tags (perhaps with technology similar to what powers the buzz index).
  4. Tags are going to give Overture and Google a whole new set of opportunities and headaches for context advertising. On the surface, they look like they could be used like search terms, but in so many ways, they’re a lot different.
  5. I’m glad I had the foresight (or lack of imagination) to create a Flickr ID that’s the same as my Yahoo! ID.

The web data pipelines

I wanted to address another observation given in the article Things That Throw Your Stats. The author makes the statement:

Web analysis is statistics, not accounting.

While I think his overall message is a disservice to the people trying hard to increase accuracy and accountability on the web, I won’t go on about that here. Instead, I want to point out that his view of web analysis is too narrow.

Actually, there are three different components to web analysis. At Yahoo! we have many sources of data, but fundamentally three data pipelines:

  • Operational
  • Financial
  • Analytical

Each may start from a central place, such as the web server log files, but they move through the infrastructure at different speeds, and in different ways, because they are used for different things.

The operational data pipeline is largely concerned with availability, quality of service, consistency, correctness, etc. Some of the analysis needs to be available in real time, and some of it much less so. A lot of the analysis is accounting, but there are statistics involved for things like failure prediction.

The financial data pipeline is all about the money. If you can’t account for it, you can’t charge for it. Since Y! is largely ad-driven, it’s important to get this aspect right. A 10% “fudge” won’t sit right with advertisers, nor with shareholders, nor with the fine folks who brought you Sarbanes-Oxley. Not everything needs to be collected (e.g. click paths aren’t very interesting), just metrics like ad views and clickthroughs. It’s not real-time, but needs to be available relatively soon after a campaign ends, or at the end of an accounting quarter. This is largely straight accounting, yet there are statistics involved, for things like detecting click fraud.
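
As a toy illustration of how statistics sneaks into an otherwise accounting-driven pipeline, here’s a crude click-fraud check in Python: flag any ad whose clickthrough rate is several times the typical rate. (A made-up heuristic with fabricated numbers, not how any real system works.)

    from statistics import median

    # (ad_id, views, clicks) -- fabricated numbers.
    ads = [("a1", 10000, 120), ("a2", 8000, 95), ("a3", 12000, 150),
           ("a4", 9000, 110), ("a5", 7000, 700)]  # a5 looks suspicious

    ctrs = {ad_id: clicks / views for ad_id, views, clicks in ads}
    typical = median(ctrs.values())

    for ad_id, ctr in ctrs.items():
        # A crude, robust outlier test: flag anything well above typical.
        if ctr > 3 * typical:
            print(f"flag {ad_id}: CTR {ctr:.1%} vs typical {typical:.1%}")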

The analytics data pipeline largely parallels the financial pipeline, but doesn’t have to be SOX-compliant. Also, much more data is collected (e.g. browser string), and even more data is algorithmically computed (e.g. visit duration). The intention, of course, is to use analytics to impact the other two systems. The tricky part is that the way to positively impact the operational and financial systems is by improving the user experience (better response times, more engaging content, etc.), which largely must be inferred through observed behavior. There’s some accounting here, but largely statistics, advanced metrics, and data research/mining, with a heavy dose of human-based synthesis. Some of the results of the analytics systems feed the operational pipeline, for things like providing targeted advertising based on observed interest.
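
To put the three side by side, here’s a toy sketch of the fan-out – one source record, three pipelines, each keeping only what it needs. (All field names are invented for illustration.)

    # One parsed log record (invented fields).
    record = {
        "time": "2005-04-18T10:31:07", "status": 200, "latency_ms": 83,
        "url": "/news", "ad_id": "ad-1234", "event": "ad_view",
        "cookie": "visitor-42", "user_agent": "Mozilla/4.0 (compatible)",
    }

    # Operational: availability and quality of service, often near real time.
    operational = {k: record[k] for k in ("time", "status", "latency_ms")}

    # Financial: just the billable events -- ad views and clickthroughs.
    financial = {k: record[k] for k in ("time", "ad_id", "event")}

    # Analytical: the widest cut, plus fields computed later
    # (visit duration, paths, and so on).
    analytical = dict(record)

    for name, rec in [("operational", operational),
                      ("financial", financial), ("analytical", analytical)]:
        print(name, "->", sorted(rec))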

While the group I’m in largely focuses on strategic uses of the web data – the analytics pipeline – it’s never done in a vacuum; we’re always cognizant of the other two pipelines. All three groups – operational, financial, and analytical – are doing analysis, all with the same source data, all toward the same overall goals. The data we keep, the tools we use, and the methods we employ can be very different, but it’s always a combination of accounting and statistics – never just one or the other.

Why standards are hard

In a good primer on why web analytics is hard, we see this statement:

Every single person inside the Ford corporation has the same IP address.

So, some questions:

  • What about married people? All have the same IP address? If so, are they all the same as each other but different from the single people?
  • Do they have different IP addresses outside of Ford?
  • How can they talk to each other if they all have the same IP address?
  • Do people really have IP addresses? Where do they keep them?

Silly questions? Sure. In this case, if we’re familiar with how the web works at a high level (and we must be if we’re going to do any analysis), we can divine intent from context as well as from our experience. However, human languages are not precision instruments, and thus everything is open to interpretation.
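
The point behind the statement is real, though: many users behind one corporate proxy present a single outward-facing IP address, so counting “unique visitors” by IP undercounts badly. A toy illustration with made-up data:

    # Toy hit log: (ip, cookie). Everyone behind the corporate proxy
    # shows up with one IP, but each browser carries its own cookie.
    hits = [
        ("19.12.0.1", "u-alice"), ("19.12.0.1", "u-bob"),
        ("19.12.0.1", "u-carol"), ("19.12.0.1", "u-alice"),
        ("64.58.76.3", "u-dave"),
    ]

    by_ip = {ip for ip, _ in hits}
    by_cookie = {cookie for _, cookie in hits}

    print("unique visitors by IP:    ", len(by_ip))      # 2 -- undercounts
    print("unique visitors by cookie:", len(by_cookie))  # 4 -- closer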

All that imprecision is why standards and legal contracts (including laws) try to be so precise. Precise, unambiguous wording is very hard. This is one reason why people invent new terms during any new endeavor – to make sure the meaning is as precise as possible. It’s also what makes standards (and laws) so hard to understand.

This is from experience: many moons ago, I was the technical chair and spec editor for the DMTF DMI 1.0 standard. The fruits of that effort – an 18-month process – are now embedded in many devices, including the BIOS of every x86-based computer.

Back to our topic. The other thing about standards is that they are produced for a specific purpose. Knowing that purpose is the key to how the standard is applied. Often, different groups have different goals, so they need different standards. When standards overlap or contradict each other for very similar applications, there’s confusion. A common joke is “the good thing about standards is there are so many to choose from.” Until this article, I’d never heard of JICWEBS, and I’ve been doing web analytics for almost ten years now. Part of that is because it’s a UK & Ireland effort. But they exist, and they have a standard. Their standards aren’t the same ones the Interactive Advertising Bureau has produced, but that’s because the IAB focuses on ads. Yet there is overlap. Which do you choose?

I have high hopes the WAA (Web Analytics Association) will be able to help in this regard.
