X1 Search Engine Takes $10MM

I don’t see it on their web site yet, but search engine company X1 today announced an investment of $10 million, led by USVP. X1 is the search engine that Yahoo! uses as the base for its desktop search product.

I wonder about the future of desktop search. Between Google and Yahoo providing free integrated desktop-web search functionality (on Wintel, anyway), and Microsoft and Apple providing instant metadata indexing at the file system level (in Longhorn and Tiger), is there a market for anyone else?

Apparently X1 wonders as well. They say the $10MM will go to their yet-to-launch enterprise search product, which I guess aims them squarely at players like Autonomy and Verity.


Yahoo! Shopping – Gift Finder

Today Y! Shopping rolls out a recommendation engine, licensed from ChoiceStream. I checked it out at Yahoo! Shopping – Gift Finder, and randomly tried to find a housewarming present. I got nothing.

Then I looked at the source code for the resulting page, which had lines like:

xmlhttp = new ActiveXObject("Msxml2.XMLHTTP");
xmlhttp = new ActiveXObject("Microsoft.XMLHTTP");

Bleah. Maybe once it’s out of beta it’ll work on a Mac.
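For what it’s worth, the usual way around this is to try the browser’s native XMLHttpRequest first and only fall back to the ActiveX controls on IE. Here’s a minimal sketch of that idea. It is not what the Gift Finder page does; the createRequest name and the env parameter are my own invention, there only so the function can be handed a stand-in for window:

```javascript
// Hypothetical helper (not from the Yahoo! page): returns a request object,
// preferring the native XMLHttpRequest and falling back to the IE-only
// ActiveX controls. `env` is assumed to be `window` in a browser.
function createRequest(env) {
  if (typeof env.XMLHttpRequest !== "undefined") {
    return new env.XMLHttpRequest();
  }
  if (typeof env.ActiveXObject !== "undefined") {
    // The same two ProgIDs that appear in the page source above.
    const progIds = ["Msxml2.XMLHTTP", "Microsoft.XMLHTTP"];
    for (const id of progIds) {
      try {
        return new env.ActiveXObject(id);
      } catch (e) {
        // This ProgID isn't available; try the next one.
      }
    }
  }
  return null; // No way to make a request in this environment.
}
```

In a page you’d call createRequest(window) and check for null before using the result.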


Yahoo! News Beta

Have you seen the beta of Yahoo! News? Right away you can see it’s a redesign. While it was in alpha, company employees got to test drive it. I looked at it briefly and switched back, but eventually I got used to it and now I love it. It looks a lot more modern.

Think the redesign matters? Oh yeah — very much so. According to Nielsen/NetRatings and ComScore, Yahoo! News gets more visitors than any other news site, including CNN.

Not only is it a redesign, but it’s got some great new features. I like that I can toggle between headlines and summaries. I love the tabs so I can see headlines from different sources quickly without a page refresh. But I really like being able to add news sources — and the “My Sources” tab remembers which sources I had open and which I had closed. For years I’ve used My Yahoo as my start page, but primarily as a custom news front page. With the new “My Sources” feature, and ability to change the layout (somewhat), I can honestly see myself switching my home page to Yahoo News once it gets out of beta.

But even if I didn’t switch my home page, I love the “Related Search Results” feature, as indicated by the larger words and purple chevron icon in the story text. This is the Y!Q search technology, well-integrated, but not obnoxiously so. Seriously cool stuff.

All the “+ My Yahoo” and “XML” badges are a bit much, though. It’s also definitely beta – some things are still kinda wonky. But I’ve already switched over to it. The old site looks so … old.


Mojo et al

A bit of a buzz today around Om Malik’s How Yahoo Got Its Mojo Back with the attendant lovers and haters commenting along. As is the norm, a lot of the haters (of both Yahoo and Google) don’t know what they are talking about.

I still find it surprising how seemingly intelligent people can march up and down about how one service is amazing and the other is absolute rubbish. If it works for you, great. I know people who swear by the gmail interface, and others who swear at it. Some people want My Yahoo, and others prefer Google News. So be it.

Regardless, I think what makes both wonderful is the competition. Microsoft is coming? Hey, jump in – the water’s great.


Flickrizing Yahoo!

Not a lot of blogging lately – not because there’s nothing to talk about, but because I’m up to my eyeballs in resumes and recruiting. (If you can code, and you understand web data, get in touch!)

Regarding Yahoo!’s purchase of Flickr – some random thoughts:

  1. I suspect Flickr will influence Y! more than the other way ’round.
  2. Tagging (aka folksonomies) will show up in other places on Y!.
  3. We (SDS) need a strategy for figuring out how to analyze/report on tags (perhaps with technology similar to what powers the buzz index).
  4. Tags are going to give Overture and Google a whole new set of opportunities and headaches for context advertising. On the surface, they look like they could be used like search terms, but in so many ways, they’re a lot different.
  5. I’m glad I had the foresight (or lack of imagination) to create a Flickr ID that’s the same as my Yahoo! ID.

Firefox? Yes Please.

I’m one of those guys that runs around thinking people should use the Firefox browser. Many people inside Y! do use it, but they are generally the early adopters. A week or two ago, posters went up around campus announcing an internal test of a new service (no, it’s not Yahoo! 360°). It looks pretty useful, so I went to check it out.

Windows and IE only. I couldn’t believe it.

So forgive me for snickering when I read Yahoo vows to open all services to Firefox users. It said Yahoo would not launch any new services without Firefox support. Cool.

Maybe the product in internal test will get modified to support other browsers, or operating systems, before it’s released. That would be great. And then maybe Launch could support non-Windows machines too…

Update: Sigh.


The web data pipelines

I wanted to address another observation made in the article Things That Throw Your Stats. The author makes the statement:

Web analysis is statistics, not accounting.

While I think his overall message is a disservice to the people trying hard to increase accuracy and accountability on the web, I won’t go on about that here. Instead, I want to point out that his view of web analysis is too narrow.

Actually there are three different components to web analysis. At Yahoo! we have many sources of data, but fundamentally three data pipelines:

  • Operational
  • Financial
  • Analytical

Each may start from a central place, such as the web server log files, but they move through the infrastructure at different speeds, and in different ways, because they are used for different things.

The operational data pipeline is largely concerned with availability, quality of service, consistency, correctness, etc. Some of the analysis needs to be available in real-time, and some of it much less so. A lot of the analysis is accounting, but there’s statistics involved for things like failure prediction.

The financial data pipeline is all about the money. If you can’t account for it, you can’t charge for it. Since Y! is largely ad-driven, it’s important to get this aspect right. A 10% “fudge” won’t sit right with advertisers, nor with shareholders, nor with the fine folks who brought you Sarbanes-Oxley. Not everything needs to be collected (e.g. click paths aren’t very interesting), just metrics like ad views and clickthroughs. It’s not real-time, but needs to be available relatively soon after a campaign ends, or at the end of an accounting quarter. This is largely straight accounting, yet there are statistics involved, for things like detecting click fraud.
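To make the accounting/statistics split concrete, here’s a rough sketch. Everything in it is an assumption for illustration, not how Y! actually does it: the event shape ({ campaign, type, ip }), the function names, and the very crude z-score test on clicks per IP. The first function is pure accounting (exact tallies); the second is statistics (flagging outliers).

```javascript
// Accounting: exact counts of ad views and clickthroughs per campaign.
// Assumed event shape: { campaign: string, type: "view" | "click", ip: string }
function tallyCampaigns(events) {
  const totals = {};
  for (const e of events) {
    if (!totals[e.campaign]) {
      totals[e.campaign] = { views: 0, clicks: 0 };
    }
    if (e.type === "view") totals[e.campaign].views += 1;
    else if (e.type === "click") totals[e.campaign].clicks += 1;
  }
  return totals;
}

// Statistics: flag IPs whose click count sits more than `threshold`
// standard deviations above the mean clicks per IP. A toy heuristic only.
function suspiciousIps(events, threshold) {
  const perIp = {};
  for (const e of events) {
    if (e.type === "click") perIp[e.ip] = (perIp[e.ip] || 0) + 1;
  }
  const ips = Object.keys(perIp);
  if (ips.length < 2) return [];
  const mean = ips.reduce((sum, ip) => sum + perIp[ip], 0) / ips.length;
  const variance =
    ips.reduce((sum, ip) => sum + Math.pow(perIp[ip] - mean, 2), 0) / ips.length;
  const sd = Math.sqrt(variance);
  if (sd === 0) return []; // Everyone clicked equally often; nothing stands out.
  return ips.filter((ip) => (perIp[ip] - mean) / sd > threshold);
}
```

The tallies have to be exact because they end up on an invoice; the outlier check only has to be good enough to tell someone where to look.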

The analytics data pipeline largely parallels the financial pipeline, but doesn’t have to be SOX-compliant. Also much more data is collected (e.g. browser string), and even more data is algorithmically computed (e.g. visit duration). The intention, of course, is to use analytics to impact the other two systems. The tricky part is that the way to positively impact the operational and financial systems is by improving the user experience (better response times, more engaging content, etc.) which largely must be inferred through observed behavior. There’s some accounting here, but largely statistics, advanced metrics, and data research/mining, with a heavy dose of human-based synthesis. Some of the results of the analytics systems feed the operational pipeline, for things like providing targeted advertising based on observed interest.
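As an example of “algorithmically computed” data, here’s a minimal sketch of deriving visit duration from raw hits. The 30-minute inactivity cutoff, the hit shape ({ user, time }), and the function name are all assumptions on my part, not a description of our production systems:

```javascript
// Assumption: a gap of more than 30 minutes between hits starts a new visit.
const VISIT_TIMEOUT_MS = 30 * 60 * 1000;

// hits: array of { user: string, time: number } with time in epoch milliseconds.
// Returns one record per visit: { user, start, durationMs }.
function visitDurations(hits, timeoutMs = VISIT_TIMEOUT_MS) {
  // Group hit timestamps by user.
  const byUser = new Map();
  for (const hit of hits) {
    if (!byUser.has(hit.user)) byUser.set(hit.user, []);
    byUser.get(hit.user).push(hit.time);
  }

  const visits = [];
  for (const [user, times] of byUser) {
    times.sort((a, b) => a - b);
    let start = times[0];
    let last = times[0];
    for (const t of times.slice(1)) {
      if (t - last > timeoutMs) {
        // Gap too long: close the current visit and open a new one.
        visits.push({ user, start, durationMs: last - start });
        start = t;
      }
      last = t;
    }
    visits.push({ user, start, durationMs: last - start });
  }
  return visits;
}
```

Note that a one-hit visit comes out with a duration of zero, since the log never says when the person actually left. That’s a good illustration of why this side of the house is inference rather than accounting.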

While the group I’m in largely focuses on strategic uses of the web data – the analytics pipeline – it’s never done in a vacuum; we’re always cognizant of the other two pipelines. All three groups – operational, financial, and analytical – are doing analysis, all with the same source data, all towards the same overall goals. The data we keep, the tools we use, and the methods we employ can be very different, but it’s always a combination of accounting and statistics – never just one or the other.
