National Public Radio (NPR) is sounding the alarm about big data. In "Health Insurers Are Vacuuming Up Details About You — And It Could Raise Your Rates," it reported on a gathering of health insurance professionals in San Diego last June. I recently mentioned the story and its implications to a few people, trying to gauge their impressions. Reactions ranged from "That's alarming" to "NPR is always sounding an alarm about something." Some take NPR as undeniable truth, and others simply do not find it credible. Only you know where you are on that spectrum.
But for the sake of argument, its reporting may at least provide a foundation for discussing the implications of data harvesting and privacy. The news has been full of privacy stories lately. Revelations about who has access to our information, and to whom they are selling it, have become a commonplace staple. In response, the Europeans have tightened privacy protections, as has California. Privacy, it seems, is a serious concern to many.
According to NPR, health insurers could potentially make decisions based on "social determinants of health." It contends that it is not only possible, but practical, for everything you buy, eat, or do to be monitored in the modern world. NPR suggests that information such as education level, marital status, gender, and race might all be known to your health insurer. Facts such as gun ownership, gym membership, what magazines you subscribe to, and how often you change jobs might also be of interest. And, it asserts that all of that data might be used to "help determine how much you pay for health insurance."
NPR says that the industry (data brokers who collect and then sell your details) is "tracking your race, education level, TV habits, marital status, (and) net worth." In our digital age, all of this information is apparently reasonably easy to accumulate, aggregate, and dissect. We voluntarily use loyalty programs at retailers, trading some anonymity for discounts on some products. We shop online and allow those retailers to learn much about us from what we purchase and how often.
Those are perhaps reasonably obvious. But NPR contends that more subtle things may also be used. For instance, a woman's name change might signal a new marriage, which some database could equate to a probability of "a pricey pregnancy," or alternatively "maybe you're stressed and anxious from a recent divorce." Is our weight changing (are we buying larger clothes)? Is our income level increasing or decreasing (which might implicate our food or fitness choices)? Are we a minority? What are the characteristics of our neighborhood? NPR contends that these factors and more might be stereotyped and used by supercomputers.
Potentially, all of these and more have implications for our health. Or they may simply be linked with healthiness in some computerized analysis. They may predict what health issues we are likely to face. Of course, from a separate set of data, insurance companies are already familiar with the costs of, and variations in, treating such potential health issues. Thus, the cost of an outcome and the likelihood of that outcome are two important variables in an equation predicting cost. That prediction might determine whether we can buy coverage at all, and it could affect the price.
NPR's perspective is that there is no denying that insurance companies are accumulating and using data about us today. It explains that some companies admit they are doing so and describe their use as benevolent. They say that by studying the person, they may "spot health issues," and thus be prepared to assist their clients "so they get services they need." Some companies reportedly deny that such data is currently used in making pricing decisions about health insurance. However, some believe that such use is at least possible.
Our immediate reaction may be that our health data is protected. There are federal and state laws that protect our medical records. Anyone who has been to the doctor is familiar with the raft of forms that must be signed regarding those records. I even met a patient once who had attempted to read all of those forms before signing (sarcasm). Truly an ambitious undertaking. However, NPR warns that the data being employed does not come from our health records. It contends that the data is instead being harvested from public information available on the Internet.
There are multiple concerns in the pricing debate. First, there is the concern, discussed above, that personal information would be used to set prices based on predictions about how the way we live might affect our health. Second, there is concern that some information gleaned about us from the Internet might be inaccurate as to us individually (a funny parody ad once featured people concluding that you can't put anything on the Internet that isn't true). Third, there is concern that anecdotal relationships between a fact and a predicted outcome may prove to be inaccurate presumptions when applied to larger populations. And, finally, there are those who see the potential for discrimination based on this information.
Well, in case you did not know, car insurance companies have discriminated among people for years. Gender has been a factor they consider in setting rates. As a result, a female is likely to have lower automobile insurance rates than a male. In fact, CBS News Miami recently reported that a Canadian changed gender, from male to female, to enjoy the insurance savings that come from the assumptions insurance companies make about gender and driving. Might someone alter other aspects of their identity in pursuit of savings?
NPR explains that there are various companies involved in this "data mining" business already. Some are large and others just beginning. Reportedly, one already has data sets encompassing "150 million Americans going back to 1993." And, they are purportedly monitoring your social media, gathering data on you from your online interactions, searches, and interests.
There are those who believe insurers will use this landslide of data to sift through applicants and select those they will insure. NPR calls this "cherry picking," and asserts that insurance companies have practiced it "historically." It suggests that this practice will continue, but that the greater breadth and depth of available data will refine how the selections are made, and perhaps how successfully.
I have written about the implications of AI in "Ross, AI and the new Paradigm Coming" (March 2016). Artificial intelligence is intriguing. Once relegated to the back bench of science fiction, AI is rapidly becoming science fact. Ross is a legal research tool built on the IBM Watson foundation. Remember when Watson beat humans on Jeopardy some years ago? NPR reports that IBM is using the same learning, evolving artificial intelligence to assess socioeconomic factors for insurance companies.
The implications of such tools may give us pause. They might be used to identify individuals who present a significant risk of loss. However, the analysis might instead focus on demographic or geographic groups that are seen as injury- or disease-prone. NPR quotes one source suggesting that living in the wrong place could cost you money. That thought reminded me of Elaine (Seinfeld) struggling to have food delivered from a particular restaurant. Remember when she asked a man if she could use his apartment to fool the restaurant into delivering to her? Might one conceal their address to similarly affect insurance pricing?
Some of this data connectivity may seem pretty obvious. For example, someone who purchases cigarettes might find it more challenging to buy health insurance. But as these databases grow in both the number of people tracked and the health outcomes observed, less obvious connections might emerge. For example, people who subscribe to the New York Times might be observed over time to be more or less likely to require various medical supplies, visit a particular type of specialist, or undergo some procedure. That relationship between reading material and health consumption may be at once entirely accurate and entirely coincidental. However, some algorithms may nonetheless conclude that reading the Times is good (or bad) for your health, and adjust pricing accordingly.
NPR says that it will spend the coming months addressing various aspects of this developing story. It will be interesting to watch, regardless of your perspective on NPR and its cohort ProPublica. Whether you are inclined to trust them or not, the subjects of privacy and data harvesting are real. There is indisputable, admitted evidence that factors like gender have influenced insurance decisions in the past. Thus, the real question is likely not "Will information about me be used to determine coverage and cost?" The real questions are more likely: (1) which data, (2) how much data, (3) how accurately, and (4) how appropriately information about me will be used.
Some may conclude that this is all innocuous and mundane. For my part, I am retreating with the Luddites. I am reverting to cash-only purchases, in person, in small stores. I am eliminating my Internet footprint, closing my online shopping accounts, and canceling all of my subscriptions. And maybe I can figure out a way to know who is reading this blog, and sell that information to some supercomputer data broker somewhere?