Monday, March 18, 2013

Big Data Scientists as Renaissance Men and Women

My good friend, John Byington, recently forwarded me a fascinating article that takes a novel look at the emerging field of Big Data. The article appears in Data Science Central - The Online Resource for Big Data Scientists (see link below).

In his article, Vincent Granville differentiates between Vertical Data Scientists and Horizontal Data Scientists. While he does not explicitly use the Renaissance Man/Woman language, he clearly has this kind of broadly educated, non-siloed individual in mind when he describes the Horizontal Data Scientist:

"There are two types of data scientists:
  • Vertical data scientists have very deep knowledge in some narrow field. They might be computer scientists very familiar with computational complexity of all sorting algorithms. Or a statistician who knows everything about eigenvalues, singular value decomposition and its numerical stability, and asymptotic convergence of maximum pseudo-likelihood estimators. Or a software engineer with years of experience writing Python code (including graphic libraries) applied to API development and web crawling technology. Or a database guy with strong data modeling, data warehousing, graph databases, Hadoop and NoSQL expertise. Or a predictive modeler expert in Bayesian networks, SAS and SVM.

  • Horizontal data scientists are a blend of business analysts, statisticians, computer scientists and domain experts. They combine vision with technical knowledge. They might not be expert in eigenvalues, generalized linear models and other semi-obsolete statistical techniques, but they know about more modern, data-driven techniques applicable to unstructured, streaming, and big data, such as (for example) the very simple and applied Analyticbridge theorem to build confidence intervals. They can design robust, efficient, simple, replicable and scalable code and algorithms.

Horizontal data scientists also come with the following features:
  • They have some familiarity with six sigma concepts, even if they don't know the word. In essence, speed is more important than perfection, for these analytic practitioners.
  • They have experience in producing success stories out of large, complicated, messy data sets - including in measuring the success.
  • Experience in identifying the real problem to be solved, the data sets (external and internal) they need, the data base structures they need, the metrics they need, rather than being passive consumers of data sets produced or gathered by third parties lacking the skills to collect / create the right data.
  • They know rules of thumb and pitfalls to avoid, more than theoretical concepts. However they have a bit more than just basic knowledge of computational complexity, good sampling and design of experiment, robust statistics and cross-validation, modern data base design and programming languages (R, scripting languages, Map Reduce concepts, SQL)
  • Advanced Excel and visualization skills.
  • They can help produce useful dashboards (the ones that people really use on a daily basis to make decisions) or alternate tools to communicate insights found in data (orally, by email or automatically - and sometimes in real time machine-to-machine mode).
  • They think outside the box. For instance, when they create a recommendation engine, they know that it will be gamed by spammers and competing users, thus they put an efficient mechanism in place to detect fake reviews. 
  • They are innovators who create truly useful stuff. Ironically, this can scare away potential employers, who, despite claims to the contrary and for obvious reasons, prefer the good soldier to the disruptive creator."

The author then goes on to discuss the ramifications of this distinction for recruiters who are trying to source Data Scientists and place them with their client companies:

"This is also one of the reasons why recruiters can't find data scientists: they find and recruit mostly vertical data scientists. Companies are not yet used to identifying horizontal data scientists - the true money makers and ROI generators among analytic professionals. The reasons are two-fold:
  • Untrained recruiters quickly notice that horizontal data scientists lack some of the traditional knowledge that a true computer scientist, or statistician, or MBA must have - eliminating horizontal data scientists from the pool of applicants. You need a recruiter familiar both with software engineering, business analysts, statisticians and computer scientists, and able to identify qualities not summarized by typical resume keywords, and identify which (lack of) skills are critical from the ones that can be overlooked, to detect these pure gems. 
  • Horizontal data scientists, faced with the prospect of few job opportunities, and having the real knowledge to generate significant ROI, end up creating their own start-up, working independently, sometimes competing directly against the very companies that are in need of real (supposedly rare) data scientists. After having failed more than once getting a job interview with Microsoft, eBay, Amazon or Google, they never apply again, further reducing the pool of qualified talent."

I am blessed with a network that includes a number of these Horizontal Data Scientists. If your company needs to hire someone who brings this kind of value to your Big Data projects, I look forward to working with you.



Data Science Central article: Vertical vs. Horizontal Data Scientists
