Ten years ago, the editor of Wired Magazine published an article claiming the end of theory. "With enough data, the numbers speak for themselves."
The idea that data (or facts) speak for themselves, with no need for interpretation or analysis, is a common trope. It is sometimes associated with a legal doctrine known as Res Ipsa Loquitur - the thing speaks for itself. However, this legal doctrine isn't about truth but about responsibility: if a surgeon leaves a scalpel inside the patient, this fact alone is enough to establish the surgeon's negligence.
Or even the world speaks for itself. The world, someone once asserted, is all that is the case, the totality of facts not of things. Paradoxically, big data often means very large quantities of very small (atomic) data.
But data, however big, does not provide a reliable source of objective truth. This is one of the six myths of big data identified by Kate Crawford, who points out, "data and data sets are not objective; they are creations of human design". In other words, we don't just build models from data, we also use models to obtain data. This is linked to Piaget's account of how children learn to make sense of the world in terms of assimilation and accommodation. (Piaget called this Genetic Epistemology.)
Data also cannot provide explanation or understanding. Data can reveal correlation but not causation, which is one of the reasons why we need models. As Kate Crawford also observes, "we get a much richer sense of the world when we ask people the why and the how not just the how many".
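To make the correlation point concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the variable names, the sample size and the simulated hidden common cause `z` are invented, not taken from any of the sources cited here. Two series that never influence each other end up correlated simply because both depend on `z`.

```python
import random

random.seed(0)  # fixed seed so the illustration is repeatable


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


n = 2000
# Hypothetical hidden common cause, invisible in the collected data.
z = [random.gauss(0, 1) for _ in range(n)]
# x and y each depend on z plus independent noise; neither affects the other.
x = [zi + random.gauss(0, 1) for zi in z]
y = [zi + random.gauss(0, 1) for zi in z]

# Correlation as it appears in the raw data (0.5 in expectation here).
raw = pearson(x, y)
# Correlation once the common cause is removed (0 in expectation here).
adjusted = pearson(
    [xi - zi for xi, zi in zip(x, z)],
    [yi - zi for yi, zi in zip(y, z)],
)

print(f"raw correlation:      {raw:.2f}")
print(f"adjusted correlation: {adjusted:.2f}")
```

The data alone shows the first number. Only a model that includes `z` explains why the second number is close to zero, and nothing in the data tells you that `z` needs to be there.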
In the traditional world of data management, there is much emphasis on the single source of truth. Michael Brodie (who knows a thing or two about databases), while acknowledging the importance of this doctrine for transaction systems such as banking, argues that it is not appropriate everywhere. "In science, as in life, understanding of a phenomenon may be enriched by observing the phenomenon from multiple perspectives (models). ... Database products do not support multiple models, i.e., the reality of science and life in general." One approach Brodie talks about to address this difficulty is ensemble modelling: running several different analytical models and comparing or aggregating the results. (I referred to this idea in my post on the Shelf-Life of Algorithms.)
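As a rough illustration of the ensemble idea, the sketch below runs three deliberately different toy models over the same observations and then both compares and aggregates what they say. It is a minimal sketch under stated assumptions, not Brodie's method: the data points, the three models and the choice of the median as aggregate are all hypothetical.

```python
from statistics import mean, median

# Hypothetical observations: (x, y) pairs invented purely for illustration.
observations = [(1, 2.1), (2, 3.9), (3, 6.2), (4, 7.8), (5, 10.1)]


def model_mean(data, x):
    """Ignore x and predict the average of all observed y values."""
    return mean(y for _, y in data)


def model_last_value(data, x):
    """Predict that y stays at the most recently observed value."""
    return data[-1][1]


def model_linear(data, x):
    """Fit y = a + b*x by ordinary least squares and extrapolate to x."""
    xs = [px for px, _ in data]
    ys = [py for _, py in data]
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((px - x_bar) * (py - y_bar) for px, py in data) / sum(
        (px - x_bar) ** 2 for px in xs
    )
    return y_bar + slope * (x - x_bar)


# Three perspectives on the same phenomenon.
MODELS = {
    "mean": model_mean,
    "last_value": model_last_value,
    "linear": model_linear,
}


def ensemble(data, x):
    """Run every model, then report each view, an aggregate and the spread."""
    predictions = {name: model(data, x) for name, model in MODELS.items()}
    values = list(predictions.values())
    return {
        "predictions": predictions,           # compare: what each model says
        "aggregate": median(values),          # aggregate: a single summary
        "spread": max(values) - min(values),  # how far the models disagree
    }


result = ensemble(observations, 6)
for name, value in result["predictions"].items():
    print(f"{name:>10}: {value:.2f}")
print(f" aggregate: {result['aggregate']:.2f}")
print(f"    spread: {result['spread']:.2f}")
```

The spread is arguably the more interesting output. Where the models disagree widely, the aggregate is only a convenient summary, and no single model is the source of truth.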
Along with the illusion that what the data tells you is true, we can identify two further illusions: that what the data tells you is important, and that what the data doesn't tell you is not important. These are not just illusions of big data of course - any monitoring system or dashboard can foster them. The panopticon affects not only the watched but also the watcher.
From the perspective of organizational intelligence, the important point is that data collection, sensemaking, decision-making, learning and memory form a recursive loop - each inextricably based on the others. An organization only perceives what it wants to perceive, and this depends on the conceptual models it already has - whether these are explicitly articulated or unconsciously embedded in the culture. Which is why real diversity - in other words, genuine difference of perspective, not just bureaucratic profiling - is so important, because it provides the organizational counterpart to the ensemble modelling mentioned above.
Chris Anderson, The End of Theory: The Data Deluge Makes the Scientific Method Obsolete
(Wired, 23 June 2008)
Michael L Brodie, Why understanding of truth is important in Data Science?
(KD Nuggets, January 2018)
Kate Crawford, The Hidden Biases in Big Data
(HBR, 1 April 2013)
Kate Crawford, The Anxiety of Big Data
(New Inquiry, 30 May 2014)
Bruno Gransche, The Oracle of Big Data – Prophecies without Prophets
(International Review of Information Ethics, Vol. 24, May 2016)
Thomas McMullan, What does the panopticon mean in the age of digital surveillance?
(Guardian, 23 July 2015)
Evelyn Ruppert, Engin Isin and Didier Bigo, Data politics
(Big Data and Society, July–December 2017: 1–7)
Ian Steadman, Big Data and the Death of the Theorist
(Wired, 25 January 2013)
Ludwig Wittgenstein, Tractatus Logico-Philosophicus (1922)
Related posts: Information Algebra (March 2008), How Dashboards Work (November 2009), Co-Production of Data and Knowledge (November 2012), Real Criticism - The Subject Supposed to Know (January 2013), The Purpose of Diversity (December 2014), The Shelf-Life of Algorithms
Wikipedia: Ensemble Learning, Genetic Epistemology, Res ipsa loquitur (the thing speaks for itself)
Stanford Encyclopedia of Philosophy: Kant and Hume on Causality
For more on Organizational Intelligence, please read my eBook. https://leanpub.com/orgintelligence/
With great power, as they say, comes great responsibility.
In London this week for Microsoft's Future Decoded event, according to reporter @richard_speed of @TheRegister, Satya Nadella asserted that an AI trained for one purpose being used for another was "an unethical use".
If Microsoft really believes this, it would certainly be a radical move. In April this year Mark Russinovich, Azure CTO, gave a presentation at the RSA Conference on Transfer Learning: Repurposing ML Algorithms from Different Domains to Cloud Defense.
Repurposing data and intelligence - using AI for a different purpose to its original intent - may certainly have ethical consequences. This doesn't necessarily mean it's wrong, simply that the ethics must be reexamined. Responsibility by design (like privacy by design, from which it inherits some critical ideas) considers a design project in relation to a specific purpose and use-context. So if the purpose and context change, it is necessary to reiterate the responsibility-by-design process.
A good analogy would be the off-label use of medical drugs. There is considerable discussion on the ethical implications of this very common practice. For example, Furey and Wilkins argue that off-label prescribing imposes additional responsibilities on a medical practitioner, including weighing the available evidence and proper disclosure to the patient.
There are often strong arguments in favour of off-label prescribing (in medicine) or transfer learning (in AI). Where a technology provides some benefit to some group of people, there may be good reasons for extending these benefits. For example, Rachel Silver argues that transfer learning has democratized machine learning and lowered the barriers to entry, thus promoting innovation. Interestingly, there seem to be some good examples of transfer learning in AI for medical purposes.
However, transfer learning in AI raises some ethical concerns. These include not only the potential consequences for people affected by the repurposed algorithms, but also potential sources of error. For example, Wang and others identify a potential vulnerability to misclassification attacks.
There are also some questions of knowledge ownership and privacy that were relevant to older modes of knowledge transfer (see for example Baskerville and Dulipovici).
By the way, if you thought the opening quote was a reference to Spiderman, Quote Investigator has traced a version of it to the French Revolution. Other versions come from various statesmen including Churchill and Roosevelt.
Richard Baskerville and Alina Dulipovici, The Ethics of Knowledge Transfers and Conversions: Property or Privacy Rights?
(HICSS'06: Proceedings of the 39th Annual Hawaii International Conference on System Sciences, 2006)
Katrina Furey and Kirsten Wilkins, Prescribing “Off-Label”: What Should a Physician Disclose?
(AMA Journal of Ethics, June 2016)
Marian McHugh, Microsoft makes things personal at this year's Future Decoded
(Channel Web, 2 November 2018)
Rachel Silver, The Secret Behind the New AI Spring: Transfer Learning
(TDWI, 24 August 2018)
Richard Speed, 'Privacy is a human right': Big cheese Sat-Nad lays out Microsoft's stall at Future Decoded
(The Register, 1 November 2018)
Bolun Wang et al, With Great Training Comes Great Vulnerability: Practical Attacks against Transfer Learning
(Proceedings of the 27th USENIX Security Symposium, August 2018)
See also Off-Label
Excellent article by @riptari, providing some context for Gartner's current position on ethics and privacy.
Gartner has been talking about digital ethics for a while now - for example, it got a brief mention on the Gartner website last year. But now digital ethics and privacy has been elevated to the Top Ten Strategic Trends, along with (surprise, surprise) Blockchain.
Progress of a sort, says @riptari, as people are increasingly concerned about privacy.
The key point is really the strategic obfuscation of issues that people do in fact care an awful lot about, via the selective and non-transparent application of various behind-the-scenes technologies up to now — as engineers have gone about collecting and using people’s data without telling them how, why and what they’re actually doing with it.
Therefore, the key issue is about the abuse of trust that has been an inherent and seemingly foundational principle of the application of far too much cutting edge technology up to now. Especially, of course, in the adtech sphere.
And which, as Gartner now notes, is coming home to roost for the industry — via people’s “growing concern” about what’s being done to them via their data. (For “individuals, organisations and governments” you can really just substitute ‘society’ in general.)
Technology development done in a vacuum with little or no consideration for societal impacts is therefore itself the catalyst for the accelerated concern about digital ethics and privacy that Gartner is here identifying rising into strategic view.
Over the past year or two, some of the major players have declared ethics policies for data and intelligence, including IBM (January 2017), Microsoft (January 2018) and Google (June 2018). @EricNewcomer reckons we're in a "golden age for hollow corporate statements sold as high-minded ethical treatises".
According to the Magic Sorting Hat, high-minded vision can get organizations into the Ravenclaw or Slytherin quadrants (depending on the sincerity of the intention behind the vision). But to get into the Hufflepuff or Gryffindor quadrants, organizations need the ability to execute. So it's not enough for Gartner simply to lecture organizations on the importance of building trust.
Here we go round the prickly pear
Prickly pear prickly pear
Here we go round the prickly pear
At five o'clock in the morning.
Natasha Lomas (@riptari), Gartner picks digital ethics and privacy as a strategic trend for 2019
(TechCrunch, 16 October 2018)
Sony Shetty, Getting Digital Ethics Right
(Gartner, 6 June 2017)
Related posts (with further links): Data and Intelligence Principles from Major Players (June 2018), Practical Ethics (June 2018), Responsibility by Design (June 2018)
What is Responsibility by Design
Responsibility by design (RbD) represents a logical extension of Security by Design and Privacy by Design, as I stated in my previous post. But what does that actually mean?
X by design is essentially a form of governance that addresses a specific concern or set of concerns - security, privacy, responsibility or whatever.
- What. A set of concerns that we want to pay attention to, supported by principles, guidelines, best practices, patterns and anti-patterns.
- Why. A set of positive outcomes that we want to attain and/or a set of negative outcomes that we want to avoid.
- When. What triggers this governance activity? Does it occur at a fixed point in a standard process or only when specific concerns are raised? Is it embedded in a standard operational or delivery model?
- For Whom. How are the interests of stakeholders and expert opinions properly considered? To whom should this governance process be visible?
- Who. Does this governance require specialist input or independent review, or can it usually be done by the designers themselves?
- How. Does this governance include some degree of formal verification, independent audit or external certification, or is an informal review acceptable? How much documentation is needed?
- How Much. Design typically involves a trade-off between different requirements, so this is about the weight given to X relative to anything else.
Check out @katecrawford talking at the Royal Society in London this summer. Just an Engineer
Related posts: Practical Ethics (June 2018), Responsibility by Design
Lidl is looking to press ahead with standardizing processes worldwide and chose SAP ERP Retail powered by SAP HANA to do the job (PressBox 2, September 2015)
November 2016
Lidl rolls out SAP for Retail powered by SAP HANA with KPS (Retail Times, 9 November 2016)
July 2018
Lidl stops million-dollar SAP project for inventory management (CIO, in German, 18 July 2018)
Lidl cancels SAP introduction after spending 500M Euro and seven years (An Oracle Executive, via LinkedIn, 20 July 2018)
Lidl software disaster another example of Germany’s digital failure (Handelsblatt Global, 30 July 2018)
I don't have any inside information about this project, but I have seen other large programmes fail because of the challenges of process standardization. When you are spending so much money on the technology, people across the organization may start to think of this as primarily a technology project. Sometimes it is as if the knowledge of how to run the business is no longer grounded in the organization and its culture but (by some form of transference) is located in the software. To be clear, I don't know if this is what happened in this case.
Also to be clear, some organizations have been very successful at process standardization. This is probably more to do with management style and organizational culture than technology choices alone.
Writing in Handelsblatt Global, Florian Kolf and Christof Kerkmann suggest that Lidl's core mentality was "but this is how we always do it". Alexander Posselt refers to Schicksalsgemeinschaften (literally "communities of fate"), which can be roughly translated as collective wilful blindness. Kolf and Kerkmann also make a point related to the notion of shearing layers.
Altering existing software is like changing a prefab house, IT experts say — you can put the kitchen cupboards in a different place, but when you start moving the walls, there’s no stability.
But at least with a prefab house, it is reasonably clear what counts as Cupboard and what counts as Wall. Whereas with COTS software, people may have widely different perceptions about which elements are flexible and which elements need to be stable. So the IT experts may imagine it's cheaper to change the business process than the software, while the business imagines it's easier and quicker to change the software than the business process.
What will Lidl do now? Apparently it plans to fall back on its old ERP system, at least in the short term. It's hard to imagine that Lidl is going to be in a hurry to burn that amount of cash on another solution straightaway. (Sorry Oracle!) But the frustrations with the old system are surely going to get greater over time, and Lidl can't afford to spend another seven years tinkering around the edges. So what's the answer? Organic planning perhaps?
Thanks to @EnterprisingA for drawing this story to my attention.
Slideshare: Organic Planning (September 2008), Next Generation Enterprise Architecture
Related Posts: SOA and Holism (January 2009), Differentiation and Integration (May 2010), EA Effectiveness and Process Standardization (August 2012), Agile and Wilful Blindness (April 2015).
Updated 31 August 2018