Thursday, 22nd March 2012
Mathematics doesn't seem personal - numbers tend to obscure intimacy. It's hard to be passionate about 92, 5.2, or 3.14. Well, maybe not 3.14 - that's a pretty cool number.
As data scientists (or, as a colleague of mine prefers, numerical ninjas), we deal with numbers every day; they're our livelihood. However, it's surprisingly easy to forget that those numbers really do represent something. Someone's mortgage, someone's trip to the convenience store on the weekend, or even someone's life-threatening illness.
This is pretty heavy when you get down to it - there's an elephant in the room that we're still coming to terms with. As analysts, we want as much information as possible. This information, unfortunately, comes with an associated loss of privacy. The New York Times recently highlighted the power of analytics by writing about how it's possible to identify whether someone is pregnant purely from their purchasing patterns.
That in itself doesn't surprise me - to be honest, I've seen far more impressive uses of predictive modelling and statistically based inference. However, it does flag an important point: as analysts, what role does ethics play in how we generate insights?
To my mind, this is a hard question. It's also one that needs to be asked - as the amount of data that's available increases, so does the potential for abuse. Medical associations have their own codes of ethics. So do accountants.
Do data scientists also need a code of ethics? And if so, what would one look like?
What are your thoughts?