Analytics Uses and Limitations Overview
Website analytics is a set of tools that allows us to track and analyze how the A2JViewer is used. It collects and organizes the user's location, language, system information, browser information, pages visited, time spent on each page, and links clicked in the site or app. There are many different systems available for collecting this information. The most popular, and one you may have heard of, is Google Analytics. Though Google Analytics is very capable and well supported, there are concerns that the nature of Google's business model, which involves mining individual users' data and selling it to third parties, poses a large material risk to the end clients of the viewer. Because of the sensitive nature of the data collected within the viewer, CALI does not use Google Analytics. Instead, CALI has chosen the open source tool Matomo (formerly Piwik). Not only is the Matomo code auditable by CALI or anyone else who obtains the source, but all settings can also be managed by our system administrators.
The most basic use case of analytics is to count visits to the site and its pages. These are the easiest numbers to understand and present when trying to understand service usage. In an ideal world, this number would be accurate, in that it counts every visit, and it would map exactly one person to one profile. Our wonderful real world unfortunately presents several challenges to meeting these properties.
Challenges to getting accurate numbers:
Non-unique profiles:
Matomo generates a unique profile using an IP address and a tracking cookie. An IP (Internet Protocol) address is a unique identifier for a specific network interface on a network. IPs must be unique among the devices online in the network at a given time, but they are not necessarily tied to a single device, even if the device never disconnects from the network. This is like a ticket number at a deli, where every customer gets a unique ticket until their order is completed. A cookie is a small piece of information accessible only to the browser (and any user with rights to the browser's files) and the site that owns the cookie. It is possible for tracking profiles to be shared. This can occur through a shared computer, like a kiosk, or through IP reuse. Non-unique profiles in A2J's Matomo data arising from IP reuse are fairly unlikely, as the IPs would have to be reused fairly close together in time and the new user would also have to access the viewer.
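As a rough illustration of how shared profiles lead to undercounting, the sketch below models a profile as the pair of IP address and cookie described above. This is a simplified model for illustration only, not Matomo's actual matching logic; the visit records, field names, and addresses are made up.

```python
# Hypothetical visit records. "person" is the ground truth that analytics
# never sees; "ip" and "cookie" are what the profile is built from.
visits = [
    {"person": "A", "ip": "198.51.100.7", "cookie": "c1"},
    {"person": "B", "ip": "198.51.100.7", "cookie": "c1"},  # same kiosk as A
    {"person": "C", "ip": "203.0.113.9", "cookie": "c2"},
]

# A profile here is the (IP address, cookie) pair, as in the text above.
profiles = {(v["ip"], v["cookie"]) for v in visits}
people = {v["person"] for v in visits}

print(len(people))    # 3 real people
print(len(profiles))  # 2 profiles, because A and B share the kiosk
```

Two people using the same kiosk browser collapse into one profile, so the profile count is lower than the number of real people.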
Blockers:
A large share of internet users use tracking blockers. These visits can still be counted by using server logs instead of browser-based tracking, but doing so may violate the user's expressed desire not to be tracked. Do Not Track preferences are typically set in the browser, and Matomo is set up to respect them.
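For readers curious what respecting a Do Not Track preference looks like in practice, here is a minimal sketch. Browsers with the setting enabled send the request header `DNT: 1`. The function name `should_track` and the plain dictionary of headers are assumptions made for illustration; this is not Matomo's code.

```python
def should_track(headers: dict) -> bool:
    """Return False when the request carries a Do Not Track opt-out."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    normalized = {name.lower(): str(value).strip() for name, value in headers.items()}
    # "1" means the user has asked not to be tracked.
    return normalized.get("dnt") != "1"


print(should_track({"DNT": "1"}))             # False
print(should_track({"User-Agent": "Demo"}))   # True
```

A request with no `DNT` header, or with a value other than `1`, is treated here as not having opted out.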
Bots:
Not all runs represent real users. Bots, programs that automate navigation and actions on a site, will appear as runs. Fortunately, bots are far fewer than real users, usually do not navigate more than one or two pages in the viewer, and can usually be identified by the fact that they tend to originate far from the Guided Interview's target jurisdiction.
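The two signals mentioned above (very short runs and an origin outside the target jurisdiction) can be combined into a simple heuristic filter. The sketch below is only one possible approach under stated assumptions: the `Run` record, the region codes, and the two-page threshold are hypothetical and would need tuning against real data.

```python
from dataclasses import dataclass


@dataclass
class Run:
    region: str        # region reported by geolocation (hypothetical codes)
    pages_viewed: int  # number of viewer pages reached during the run


def looks_like_bot(run: Run, target_regions: set, max_pages: int = 2) -> bool:
    """Flag runs that are both short and from outside the target jurisdiction."""
    return run.pages_viewed <= max_pages and run.region not in target_regions


runs = [Run("US-IL", 14), Run("US-IL", 1), Run("XX", 2), Run("XX", 9)]
likely_human = [r for r in runs if not looks_like_bot(r, {"US-IL"})]
print(len(likely_human))  # 3
```

Requiring both signals keeps short runs from inside the jurisdiction, which may be real users who left early, and long runs from outside it, which may be real users who are traveling.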
Geolocation:
Geolocation is currently obtained via IP address. IP addresses are at best correlated with a geographic location, and the size of that region is extremely variable. This is especially true for mobile-based IP location, since IP addresses may be reused for devices all over the country, depending on the carrier. In general, a larger sample of visitors will better reflect where users are accessing the service from. Matomo uses IPv4 addresses, which are sets of four numbers called octets. To protect the data and privacy of self-represented litigants (SRLs), CALI only records the first two octets in Matomo. This is usually sufficient to identify a general region. In terms of anonymization, in a worst-case scenario where the first two octets perfectly identify a locality and all four numbers represent a unique device, recording only the first two leaves 2^16, or 65,536, possible devices per locality.
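The arithmetic above can be checked with a short sketch. It zeroes out the last two octets of an IPv4 address and counts how many full addresses share the same shortened value. The function names are made up for illustration, the sample address comes from a range reserved for documentation, and this is not Matomo's own anonymization code.

```python
import ipaddress


def anonymize_ipv4(address: str, kept_octets: int = 2) -> str:
    """Zero out every octet after the first `kept_octets`."""
    # IPv4Address validates the input and raises ValueError if it is malformed.
    octets = str(ipaddress.IPv4Address(address)).split(".")
    masked = octets[:kept_octets] + ["0"] * (4 - kept_octets)
    return ".".join(masked)


def hidden_address_count(kept_octets: int = 2) -> int:
    """Number of distinct full addresses that share one anonymized value."""
    # Each dropped octet is 8 bits, so it can take 2**8 = 256 values.
    return 2 ** (8 * (4 - kept_octets))


print(anonymize_ipv4("203.0.113.42"))  # 203.0.0.0
print(hidden_address_count())          # 65536
```

Keeping three octets instead of two would shrink the group to 256 addresses, which shows why the number of recorded octets is the main privacy setting here.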
Languages:
Matomo will list the browser's current language preference. This may not necessarily be the user's language preference, though the two are expected to be strongly correlated since most users will be using a device they control.
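Browsers commonly advertise their language preference in the `Accept-Language` request header, where each language may carry a weight such as `q=0.8`. The sketch below picks the highest-weighted language from such a header. It assumes the header is the source of the preference and is not Matomo's implementation; the function name is hypothetical.

```python
from typing import Optional


def preferred_language(accept_language: str) -> Optional[str]:
    """Return the language tag with the highest weight, or None if empty."""
    best_tag, best_q = None, -1.0
    for part in accept_language.split(","):
        piece = part.strip()
        if not piece:
            continue
        tag, _, params = piece.partition(";")
        q = 1.0  # a tag without an explicit weight counts as 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                continue  # skip entries with a malformed weight
        if q > best_q:
            best_tag, best_q = tag.strip(), q
    return best_tag


print(preferred_language("es-MX,es;q=0.9,en;q=0.8"))  # es-MX
```

The result describes the browser's configuration, so on a shared or borrowed device it may not match the language of the person actually using it.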