Cookies, IP Addresses and Unique Users

I’ve been thinking about how to track unique users today. These are my so-far-unorganized thoughts. Please comment!

You can’t track users by cookie alone for a couple of reasons: 1) they might use multiple computers, 2) they might delete cookies, 3) multiple users might share the same computer (same account=same cookie)

You also can’t track users by IP address alone for some more reasons: 1) they might be using a mobile device or portable computer that moves from IP to IP, 2) there could be multiple machines passing through a single gateway IP (i.e. LAN NAT).

However, if you combine the cookie/IP information together, you can start to address some of these issues. Let’s assume you have some webserver logs that minimally contain <IP address "A">, <cookie ID "C">, <timestamp T> triples.

            === time ===>
PATTERN 1:
C1-A1 ---      ---
C1-A2    ---         ---
C1-A3       ---   ---

PATTERN 2:
C1-A1 ------
C2-A1       ------
C1-A2             ------

PATTERN 3:
C1-A1 -----!
C2-A1       -----!
C3-A1             ------

PATTERN 4:
C1-A1 ------      ------
C2-A1     ----------

PATTERN 5:
C1-A1 -----     -----
C2-A1      -----     ---

This matrix indicates the compatibility of each of the patterns (P1-P5)
with several different classes of cookie/IP address combination that we
might want to detect.

                                     patterns
                               P1  P2  P3  P4  P5
profiles                      --------------------
multiple users per IP        | -   +   -   +   +
multiple users per cookie    | -   -   -   -   -
multiple IPs per user        | +   +   -   -   -
multiple cookies per user    | -   -   +   -   +
cookie deletion              | -   -   +   -   -
"permanent" IP change        | -   +   -   -   -

Note that none of these patterns gives any indication for the “multiple
users per cookie” profile. To assess if there is more than one
user/cookie, you might want to look at the context in which you’re
observing the cookie. Consider attributes like (timezone corrected)
time-of-day, day-of-week, type of content being viewed.