Lies, D*mn Lies, and Statistics Canada II: Internet Privacy & Security

With Statistics Canada having been criticized in the news recently, it’s good to see some of the real applications that affect Canadian businesses and lives, such as the Canadian Internet Use Survey. But I think practitioners, and the general public, still aren’t quite exercising “due diligence,” either in citing Statistics Canada information or in how they perceive and interpret it. Even when Statistics Canada’s own perfectly correct guidelines are followed (about whom the results do and do not represent, or whether a significant correlation can or cannot imply causation), the data may still not be giving the answers we think they are.

Statistics Canada’s Canadian Internet Use Survey is often cited by public interest groups, not-for-profit organizations, and marketers to support all manner of opinions. What concerns me most this time is the survey’s section on Internet Privacy and Security.

A mere five questions with only three possible levels of concern (Not at all concerned, Concerned, or Very concerned) may once have been sufficient to establish that Privacy and Security is one of Canadians’ leading concerns. Now that we consider Privacy and Security a top concern, five questions with only three levels of concern are no longer adequate to be meaningful. (I am mostly facetious when I propose that the number of Canadians actually concerned was severely overstated because anyone who wasn’t oblivious or reckless was counted as at least “Concerned” in the first place.) Knowing how important Privacy and Security is, and how often those statistics are cited, I think the Stats Can survey is doing a disservice to Canadians, to their concerns, and to the businesses that benefit from it.

For example, if people take the time to examine the actual survey questions pertaining to Privacy and Security (http://www.statcan.gc.ca/imdb-bmdi/instrument/4432_Q1_V8-eng.htm#a10):

Section: Privacy and security (PS)

PS_BEG
Beginning of Section

PS_R01
The next set of questions relate to privacy and security concerns on the Internet.

PS_Q01
In general, how concerned (are you/would you be) about privacy on the Internet? For example, people finding out what websites you have visited, others reading your e-mail?

Interviewer: Read categories to respondent.

  1. Not at all concerned
  2. Concerned
  3. Very concerned
    DK, RF

Coverage: All respondents

PS_Q02
How concerned (are you/would you be) about conducting banking transactions over the Internet?

Interviewer: Read categories to respondent.

  1. Not at all concerned
  2. Concerned
  3. Very concerned
    DK, RF

Coverage: All respondents

PS_Q03
How concerned (are you/would you be) about using your credit card over the Internet?

Interviewer: Read categories to respondent.

  1. Not at all concerned
  2. Concerned
  3. Very concerned
    DK, RF

Coverage: All respondents

PS_Q04
How concerned (are you/would you be) about providing personal financial information to government departments over the Internet? (e.g., applying for employment insurance or a student loan?)

Interviewer: Read categories to respondent.

  1. Not at all concerned
  2. Concerned
  3. Very concerned
    DK, RF

Coverage:  All respondents

PS_Q05
How concerned (are you/would you be) about giving personal, non financial information to a government official in Canada over the Internet?

Interviewer: Read categories to respondent.

  1. Not at all concerned
  2. Concerned
  3. Very concerned
    DK, RF

Coverage: All respondents

PS_END
End of Section

they will note that there are five questions in total. Those who have taken statistics will recognize that the response options “Not at all concerned,” “Concerned,” and “Very concerned” yield ordinal data: the categories have a consistent order, but the distances between them are not known to be equal.
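Because the categories are ordered but not evenly spaced, ordinal responses are usually summarized with frequencies, cumulative proportions, and a median category rather than a mean. The Python sketch below shows that kind of summary. The response counts are hypothetical (not CIUS results), and the function name `summarize_ordinal` is illustrative; it assumes DK/RF answers were removed beforehand.

```python
from collections import Counter

# Response options for a PS question, listed in their natural order.
CATEGORIES = ["Not at all concerned", "Concerned", "Very concerned"]

def summarize_ordinal(responses):
    """Summarize ordinal answers without treating the categories as numbers.

    Returns per-category proportions, cumulative proportions, and the median
    category. Assumes DK/RF answers have already been removed.
    """
    counts = Counter(responses)
    n = sum(counts[c] for c in CATEGORIES)
    proportions = {c: counts[c] / n for c in CATEGORIES}
    cumulative, running, median = {}, 0.0, None
    for c in CATEGORIES:
        running += proportions[c]
        cumulative[c] = running
        # The median is the first category whose cumulative share reaches 50%.
        if median is None and running >= 0.5:
            median = c
    return proportions, cumulative, median

# Hypothetical answers to one question (not CIUS data).
answers = (["Not at all concerned"] * 20 + ["Concerned"] * 50
           + ["Very concerned"] * 30)
props, cum, med = summarize_ordinal(answers)
print(props)  # share in each category
print(cum)    # share at or below each level of concern
print(med)    # Concerned
```

Reporting the median category and the cumulative shares keeps the analysis honest about what an ordinal scale can and cannot say.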

Those of you who have taken some survey and research design might be concerned, however, that the “centre” choice does not at all imply middle of the road (questionnaire designers sometimes purposely offer an even number of choices to avoid a dead-centre option). In fact, if a respondent is not absolutely free of concern about privacy (i.e., reckless), then either of the other two choices counts them among the concerned. Many of us exercise “appropriate” caution when we conduct business online (i.e., would not describe ourselves as either apathetic or reckless) but also would not consciously be concerned about privacy and security under normal conditions (i.e., would not describe ourselves as neurotic or paranoid).
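To make the asymmetry concrete, here is a small, purely hypothetical Python sketch. It assumes respondents hold one of five latent attitudes, from unconcerned to very concerned, and that only someone with no concern at all would pick “Not at all concerned” on the three-point scale. The counts, the latent levels, and the alternative five-point scale (with a “Cautious, not concerned” option) are invented for illustration; they are not Statistics Canada data.

```python
# Hypothetical latent attitudes (NOT survey data):
# 0 = unconcerned/reckless, 1 = appropriately cautious, 2 = mildly concerned,
# 3 = concerned, 4 = very concerned.
HYPOTHETICAL_COUNTS = {0: 10, 1: 45, 2: 20, 3: 15, 4: 10}

def three_point(level):
    """Map a latent level onto the three-point scale used in the survey.

    Assumption: only a respondent with no concern at all chooses the bottom
    category, so cautious respondents land in "Concerned" or above.
    """
    if level == 0:
        return "Not at all concerned"
    if level == 4:
        return "Very concerned"
    return "Concerned"

def five_point(level):
    """Hypothetical balanced alternative that gives cautious people an option."""
    labels = ["Not at all concerned", "Cautious, not concerned",
              "Slightly concerned", "Concerned", "Very concerned"]
    return labels[level]

def share_concerned(counts, scale):
    """Fraction of respondents labelled "Concerned" or "Very concerned"."""
    total = sum(counts.values())
    hits = sum(n for level, n in counts.items()
               if scale(level) in ("Concerned", "Very concerned"))
    return hits / total

print(share_concerned(HYPOTHETICAL_COUNTS, three_point))  # 0.9
print(share_concerned(HYPOTHETICAL_COUNTS, five_point))   # 0.25
```

With the same hypothetical respondents, the share labelled “Concerned” or “Very concerned” drops from 90% to 25% only because the cautious-but-unworried have somewhere else to go; the scale, not the people, changes the headline number.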

Vote for Robin in the 2010 CIRA Board Elections!

The 2008-2009 CIRA Annual Report demonstrates how significantly these data have influenced CIRA’s initiatives, ranging from DNSSEC to BIND10 to WHOIS privacy (http://www.cira.ca/annual-reports/2009/en/c_dns_03_en.html). But the primary survey cited employs only five questions, and its peculiar scale will inherently bias responses towards overestimating the amount and degree of concern Canadians have.

Highly qualified statisticians and researchers at Statistics Canada take great pains to apply accepted theory in questionnaire, survey, and sampling design according to traditional principles: maximizing face validity, content validity, and criterion validity; following Likert scale best practices; using stratified random sampling; and making sure that reports reflect accurate interpretation under the correct circumstances and in the proper contexts.

But when the results are used out of context, or with lower degrees of external validity (generalizability), all that effort can be wasted; worse, it can reinforce the popular notion that statistics are somehow worse than both lies and d*mn lies (http://robincheung.info/mbalog/2010/07/21/lies-dmn-lies-and-statistics-statistics-is-actually-your-friend-when-not-misused/).

This time, I’m not blaming people for using statistics out of context to support their arguments; I’m suggesting that Statistics Canada should amend the survey.

There is, by the way, a mechanism for interested businesses, individuals, and Statistics Canada to understand each other and develop more meaningful and accurate surveys. From October 26 to 29, 2010, Statistics Canada is hosting the 2010 International Methodology Symposium in Ottawa, ON. If you can’t make it to that event, Statistics Canada maintains a web site about its training, conferences, and research events: http://www.statcan.gc.ca/services/workshop-atelier-eng.htm

Lies, D*mn Lies, and Statistics: Statistics is actually your friend, when not misused.

This post is a response to an article posted on her F*cebook Wall by a friend, Stacey Burkett: http://timesofindia.indiatimes.com/Life/Relationships/Man-Woman/Women-are-most-attractive-at-31/articleshow/6187549.cms. When my Wall comment exceeded the length of even some of my longer blog posts, I decided to make it an actual blog post.

Figure: F-test (Fisher-Snedecor distribution), which compares elements of variation in data: the basis of ANOVA, one of the most common statistical tests.

This post is also relevant to Statistics Canada’s recent move to eliminate the long-form census (issued to 20% of the population every five years and mandatory under penalty of law) and replace it with an entirely voluntary survey. The same principles apply, although I will dedicate another blog post to the Statistics Canada census issue, both to illustrate the principles in another application and to respond more precisely to specific issues.

Surveys and statistics are used to describe all of us, all the time. Marketing researchers use them to define and characterize target markets, and psychologists use them to determine the impact of certain personality traits on job performance. Surveys can characterize a target population of interest, with known precision, without requiring a census. (“Census” is a formal survey design term for measuring every member of a population, as opposed to a “sample,” a smaller subset of the population selected so that its results can be generalized to the entire population when a census is impractical or impossible. Population size is not the only thing that limits our ability to conduct a meaningful census; the simple fact that not every individual relevant to your survey will be alive at the same time can make a true census impossible.)

LIES, DAMN LIES, AND STATISTICS

“Lies, damn lies, and statistics” refers to the abuse of statistics to support a position. I feel that this cliché has itself been abused, resulting in the unnecessary maligning of statistics, which is actually an extremely powerful tool not only for characterizing populations or phenomena, but also for predicting events with known confidence. (For example, probit models and logistic regression take categorical or numeric predictor variables that describe a customer segment, such as age, income level, and preferences, and calculate the probability that customers will purchase a new product.) I am even more dismayed at the cynicism that has come to surround statistics. Whilst the cliché describes the intentional and unintentional abuse of statistics, out of context or inappropriately, with intent to influence rather than inform, most people (even those who have taken an introductory statistics course in university, perhaps especially those, since most people are thoroughly confused and intimidated by the subject after an introductory course) do not understand the theoretical bases of statistical techniques well enough to see the power in them. Our world is not a deterministic place; even the most reliable process will occasionally yield unexpected results. Thus, it is vitally important that we can quantify the likelihood that an observation is truly characteristic of a phenomenon, and state how confident we are that a given observation was not the result of random chance.
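As a rough illustration of the prediction idea above, the Python sketch below turns a linear combination of customer characteristics into a purchase probability using both the logistic (logit) link and the probit link (the standard normal CDF). The coefficients and the predictor names are hypothetical placeholders; in practice the coefficients would be estimated from survey data. Fitted logit and probit coefficients also differ in scale (logit estimates typically run roughly 1.6 to 1.8 times larger), so feeding the same score to both links here only shows the shape of each.

```python
import math

# Hypothetical coefficients for a customer segment model (illustration only).
INTERCEPT = -4.0
COEF_AGE = 0.03        # per year of age
COEF_INCOME = 0.02     # per $1,000 of annual income
COEF_PREFERS = 1.2     # 1 if the customer prefers this product category, else 0

def linear_predictor(age, income_k, prefers):
    """Combine the predictor variables into a single score (eta)."""
    return (INTERCEPT + COEF_AGE * age + COEF_INCOME * income_k
            + COEF_PREFERS * prefers)

def logit_probability(eta):
    """Logistic regression link: P(purchase) = 1 / (1 + exp(-eta))."""
    return 1.0 / (1.0 + math.exp(-eta))

def probit_probability(eta):
    """Probit link: P(purchase) = standard normal CDF evaluated at eta."""
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))

eta = linear_predictor(age=40, income_k=60, prefers=1)
print(f"eta = {eta:.2f}")
print(f"logit  P(purchase) = {logit_probability(eta):.3f}")
print(f"probit P(purchase) = {probit_probability(eta):.3f}")
```

Both links map any score onto a probability between 0 and 1 and agree at eta = 0 (probability 0.5); they differ mainly in how quickly the probability approaches 0 or 1 in the tails.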

Figure: Binary outcomes can be modeled with the probit model.

The rest of this post pertains to the article that my friend, Stacey, posted to Facebook: http://timesofindia.indiatimes.com/Life/Relationships/Man-Woman/Women-are-most-attractive-at-31/articleshow/6187549.cms. The article claims that a survey of 2,000 men and women administered by QVC, an American shopping channel, established that “females in their early thirties are seen as more attractive than younger girls as they are more confident and stylish.” Although the article is clearly intended as a lighthearted attempt to console aging customers by presenting findings that run contrary to what most people would expect, published statistics have a way of turning up in support of opinions to which they do not legitimately apply. With respect to the QVC survey, I believe that the results should always be accompanied by a disclaimer outlining the specific population for which the findings can be considered valid; otherwise, such a finding would surely eventually be used to justify discrimination or to disadvantage individuals unfairly.

Having just completed RSCH 8200, my third straight quarter of research design coursework, I have survey design (specifically, internal and external validity, and reliability) fresh in my mind. Especially now that the “Long-form Census” is so prominent in the news, I thought a quick run-down on sampling and survey design would help us all put the discussions we hear in the news into perspective. I found that many of the arguments presented by “experts” in the news are not compelling to someone trained in statistics, because they often take an extreme position that is not necessarily relevant, such as implying to the public that there are no methods to quantify how relevant certain findings are to the general public or how consistently people will give a response.

Critical thinkers who think ahead but lack discipline in their problem solving would, by now, want to ask: “How do you quantify attractiveness? What is considered ‘attractive’ versus ‘unattractive’? Is a sample of 2,000 adequate to establish this?”

Neglecting the following considerations is what gives rise to the saying “There are lies, d*mn lies, and statistics,” which much maligns what is actually the best quantitative tool we have to evaluate results and estimate confidence in them. For statistics to have any meaning at all, it is important to keep these considerations in mind (all other aspects of the design being done “by the book”):

EXTERNAL VALIDITY / GENERALIZABILITY:
External validity quantifies how generalizable the findings are. Do they apply only to native residents of Toronto? To all of Ontario? To Eastern Canada? To evaluate this in a social sciences setting, we use our understanding of the underlying theory to inform our design (how attractive a person is judged to be may be affected by any or all of physiology, culture, how cosmopolitan someone’s hometown was, any specific traumatic or pleasurable experiences a person has had, etc.).

SAMPLING STRATEGY
The sampling strategy (random sampling, stratified sampling, purposeful non-probability sampling, etc.) must be chosen to match the population the sample is meant to represent. Most people intuitively know that a sample must be random in order to be representative. But randomness is not the only important consideration: if the population QVC wishes to characterize comprises 80% women and 20% men, then the 2,000-subject sample should comprise 1,600 women and 400 men. A 50/50 mix would not be representative of their customer base if gender contributes to aesthetic preference. Similarly, if men and women of different ages tend to have different preferences, the random sample should also reflect the age proportions of the population of interest.
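Proportional stratified allocation, as in the 1,600/400 example, can be sketched in a few lines of Python. The function name `proportional_allocation` and the gender-by-age shares in the second call are hypothetical; the code assumes the shares sum to 1 and uses largest-remainder rounding (one common choice, not the only one) so the stratum sizes always add up to the total.

```python
def proportional_allocation(total_n, population_shares):
    """Split total_n across strata in proportion to their population shares.

    Assumes the shares sum to 1. Floors each stratum's quota, then gives any
    leftover units to the strata with the largest fractional remainders.
    """
    raw = {stratum: total_n * share for stratum, share in population_shares.items()}
    alloc = {stratum: int(quota) for stratum, quota in raw.items()}
    leftover = total_n - sum(alloc.values())
    by_remainder = sorted(raw, key=lambda s: raw[s] - alloc[s], reverse=True)
    for stratum in by_remainder[:leftover]:
        alloc[stratum] += 1
    return alloc

# Example from the text: a customer base of 80% women and 20% men.
print(proportional_allocation(2000, {"women": 0.8, "men": 0.2}))
# -> {'women': 1600, 'men': 400}

# Hypothetical gender-by-age shares, to preserve age proportions as well.
shares = {
    "women 18-34": 0.24, "women 35+": 0.56,
    "men 18-34": 0.06, "men 35+": 0.14,
}
print(proportional_allocation(2000, shares))
```

Within each stratum, respondents would still be drawn at random; stratifying only guarantees that the mix of groups matches the population of interest.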

SAMPLE SIZE
Whenever poll results are presented in the news, particularly during elections, we are used to hearing “Poll is accurate to within 3.1%, 19 times out of 20.” This means that the sample was large enough to be 95% confident that the results from the poll’s sample are within 3.1 percentage points of the true results for the entire population. To determine the required sample size, we need to consider what kind of statistical analyses will be done (this determines which measures are relevant), the size of the effect being studied (phenomena with stronger effects generally need smaller samples to be detected with confidence), and the desired confidence level (in most cases, 95%). Software such as G*Power 3 (http://www.psycho.uni-duesseldorf.de/abteilungen/aap/gpower3/) can be used to calculate the sample size required to attain a given statistical power for a given effect size and statistical test. In doctoral research, dissertation committees and Institutional Review Boards will generally require justification of the proposed sample size in order to assess and minimize the burden on test subjects.
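The arithmetic behind “3.1%, 19 times out of 20” can be sketched in Python under simplifying assumptions: a simple random sample, a proportion near 50% (the worst case), and the normal approximation. The 2,000 figure is included only to show what QVC’s sample size would give if it were a simple random sample, which the article does not say. The last function is a rough normal-approximation version of the kind of calculation G*Power performs when comparing two group means (Cohen’s d effect size, 80% power); it is not G*Power’s algorithm, and G*Power’s exact t-test calculation returns slightly larger numbers. All function names are illustrative.

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

def margin_of_error(n, p=0.5, confidence=0.95):
    """Half-width of a normal-approximation confidence interval for a proportion."""
    z = Z.inv_cdf(1 - (1 - confidence) / 2)
    return z * math.sqrt(p * (1 - p) / n)

def sample_size_for_margin(margin, p=0.5, confidence=0.95):
    """Smallest simple-random-sample size whose margin of error is <= margin."""
    z = Z.inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)

def n_per_group_two_means(d, alpha=0.05, power=0.80):
    """Approximate n per group to detect standardized effect d (two-sided test)."""
    z_alpha = Z.inv_cdf(1 - alpha / 2)
    z_beta = Z.inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / d ** 2)

print(round(100 * margin_of_error(1000), 1))  # 3.1
print(round(100 * margin_of_error(2000), 1))  # 2.2
print(sample_size_for_margin(0.031))          # 1000
for d in (0.2, 0.5, 0.8):                     # small, medium, large effects
    print(d, n_per_group_two_means(d))
```

The loop illustrates the point made above: the stronger the effect, the fewer subjects are needed to detect it with the same confidence and power.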

INTERNAL VALIDITY:
There are several facets to internal validity, but in general they all pertain to the question “How well does this survey actually measure the underlying construct that I intend to measure?” For example, face validity assesses how good a question is as a proxy for what you actually want to know: “What was your last grade completed?” is less valid than “How many years of school did you complete?” if you want to compare amount of education to job performance without regard to the level of schooling, because people who have skipped a grade would have completed a higher grade level for the same number of years of schooling. Content validity describes how completely the survey captures the factors contributing to a phenomenon; in the above example, a survey that records only the number of years of schooling, without regard to level, would lack content validity in describing the relationship between job performance and education completed, because 8 years of education from Grade 1 to Grade 8 would not have the same effect as 8 years of education from Grade 9 through a Bachelor’s degree.

RECOMMENDATION: ALWAYS ACCOMPANY RESULTS WITH EXPLICIT DISCLAIMER

While it is clear that QVC intended this survey as a lighthearted consolation to its aging customers by presenting results that seem to run contrary to what most people might guess, statistics have a way of turning up in support of controversial opinions for which the sample used is not valid. In order to minimize harm to individuals and mitigate the maligning of statistics, QVC should accompany its results with an explicit explanation of the population to which its findings can legitimately be applied.