Last week the New York Times carried an article entitled The Age of Big Data, written by its technology correspondent Steve Lohr. The article makes for very interesting reading, as it addresses technological advances that allow the manipulation and analysis of the enormous amounts of data generated through Internet use, data storage and even transport monitoring, to name just a few sources.
To give a couple of very simple examples, the author states that a peak in searches for ‘flu symptoms’ in an area correlates with an increase in hospital admissions for influenza about two weeks later. Similarly, an analysis of the rate of search terms such as ‘house for sale in…’ can give a more accurate prediction of sales rates and house prices the following year than asking estate agents what they think the state of the market will be.
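To make the flu example a little more concrete, here is a minimal Python sketch of how such a delayed relationship might be looked for. It is only an illustration: the weekly figures are invented for the purpose (the admissions are simply half the search count from two weeks earlier), the function names are my own, and nothing here reflects the method actually used in the research the article describes.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def lagged_correlation(searches, admissions, lag):
    """Correlate searches in week t with admissions in week t + lag."""
    if lag == 0:
        return pearson(searches, admissions)
    # Drop the last `lag` weeks of searches and the first `lag` weeks of admissions
    return pearson(searches[:-lag], admissions[lag:])


def best_lag(searches, admissions, max_lag):
    """Return the lag (in weeks) with the highest correlation."""
    return max(
        range(max_lag + 1),
        key=lambda lag: lagged_correlation(searches, admissions, lag),
    )


# Invented weekly counts: admissions follow searches by two weeks (assumption)
searches = [10, 12, 16, 30, 56, 80, 60, 36, 20, 14, 12, 10]
admissions = [5, 5, 5, 6, 8, 15, 28, 40, 30, 18, 10, 7]

print(best_lag(searches, admissions, max_lag=4))  # prints 2
```

With real data the relationship would of course be far noisier, and a high correlation at some lag says nothing on its own about cause.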
Software known as ‘sentiment analysis’ is being developed that can measure the mood of writers by looking at their choice of words in messages and blog posts. This information may be useful in detecting negative economic situations and the associated job insecurity in the areas involved, allowing for early intervention to avoid drops in living standards.
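The simplest form of this idea can be sketched in a few lines of Python by counting words from a positive list and a negative list. This is a toy illustration under my own assumptions: the two word lists and the function name are hypothetical, and real sentiment analysis software is considerably more sophisticated than a word count.

```python
import re

# Hypothetical word lists, chosen only for illustration
POSITIVE = {"good", "great", "happy", "hopeful", "secure", "hired"}
NEGATIVE = {"bad", "worried", "afraid", "layoffs", "unemployed", "debt"}


def sentiment_score(text):
    """Return a score from -1.0 (all negative words) to 1.0 (all positive).

    Texts containing no word from either list score 0.0.
    """
    words = re.findall(r"[a-z']+", text.lower())
    positive = sum(word in POSITIVE for word in words)
    negative = sum(word in NEGATIVE for word in words)
    total = positive + negative
    if total == 0:
        return 0.0
    return (positive - negative) / total


print(sentiment_score("Worried about layoffs and debt"))  # prints -1.0
print(sentiment_score("Happy and hopeful, just hired!"))  # prints 1.0
```

Averaging such scores over many posts from one area is the kind of aggregate signal the paragraph above has in mind, although a word count like this cannot handle negation or sarcasm.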
The author also refers to problems of privacy infringement: as the cross-referencing of personal data becomes easier to manage on a much larger scale, anonymous data becomes traceable. See this article about AOL data release and traceability for an example.
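How cross-referencing can undo anonymity is easy to show with a small Python sketch. Note that this is not a reconstruction of the AOL case, which involved search queries. It is a simpler, hypothetical illustration in which an ‘anonymous’ dataset and a public list happen to share a few ordinary fields. All of the records, field names and people below are invented.

```python
def reidentify(anonymous_records, public_records, keys):
    """Match anonymous records to names via shared fields.

    A record counts as re-identified only when exactly one person in the
    public list has the same values for every field in `keys`.
    """
    index = {}
    for person in public_records:
        signature = tuple(person[key] for key in keys)
        index.setdefault(signature, []).append(person["name"])

    matches = {}
    for record in anonymous_records:
        signature = tuple(record[key] for key in keys)
        candidates = index.get(signature, [])
        if len(candidates) == 1:
            matches[record["record_id"]] = candidates[0]
    return matches


# Invented data: names removed from the first list, but ordinary fields remain
anonymous = [
    {"record_id": "A1", "postcode": "AB1", "birth_year": 1970, "gender": "F"},
    {"record_id": "A2", "postcode": "AB1", "birth_year": 1985, "gender": "M"},
    {"record_id": "A3", "postcode": "CD2", "birth_year": 1990, "gender": "F"},
]
public = [
    {"name": "Alice Example", "postcode": "AB1", "birth_year": 1970, "gender": "F"},
    {"name": "Bob Example", "postcode": "AB1", "birth_year": 1985, "gender": "M"},
    {"name": "Carl Example", "postcode": "AB1", "birth_year": 1985, "gender": "M"},
]

print(reidentify(anonymous, public, keys=("postcode", "birth_year", "gender")))
# prints {'A1': 'Alice Example'}
```

Record A2 stays hidden only because two people share its details, and A3 only because nobody in the public list matches. The more fields the two datasets have in common, the fewer such coincidences remain.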
An article published in the same week on the Health Care Blog entitled Privacy in the age of Big Data addresses some of these issues. Authors Omer Tene and Jules Polonetsky claim that ‘data create enormous value for the global economy, driving innovation, productivity, efficiency, and growth. At the same time, the “data deluge” presents privacy concerns that could stir a regulatory backlash, dampening the data economy and stifling innovation’. They go on to state that ‘in order to craft a balance between beneficial uses of data and the protection of individual privacy, policymakers must address some of the most fundamental concepts of privacy law, including the definition of “personally identifiable information,” the role of consent, and the principles of purpose limitation and data minimization’.
The problem raised is that the use of this data can be of great advantage to society in terms of health and (possibly less so) improved services, yet it carries the privacy risks described above. Without using the actual word, the authors seem to advocate responsibility in the collection of this data. They propose the use of a “risk matrix”, taking into account the value of different uses of data against the potential risks to individual autonomy and privacy. They argue that where the benefits of prospective data use clearly outweigh privacy risks, the legitimacy of processing should be assumed even if individuals decline to consent. I would argue, however, that the creation of such a matrix would suffer from all of the issues associated with data collection in the first place, namely our inability to see into the future to determine how the data may be used and how it may be applied with technology that has not yet been developed.
They also address the problem of individual consent, raising the point that an individual may not be able to make responsible decisions about their personal data because they do not know enough about either the present state of play or future developments to do so (and I would again argue that neither does anybody else).
The article also cites Alessandro Acquisti’s work on privacy and informed consent issues. In a related article on this website I reviewed an as yet unpublished article he co-authored about the development of a face recognition mobile phone app. The app aims to identify a person in real time, just by taking a photo of them, using information freely available on the web and the cross-referencing capabilities discussed above. Read the article here.
While I have only touched here on the ethical implications, which seem far from resolved, one thing seems certain: in the near future there will be high demand in the job market for people with joint mathematics and computing degrees.
—————-
(photo: Data Represented in an Interactive 3-D Form by Idaho National Laboratory from Flickr)