The most challenging positions to recruit for are often those in the newest hot profession. It must have been fun to find a silicon semiconductor specialist in 1954 -- who wanted to study sand before the silicon transistor existed? An HTML expert in 1996? SGML experts all had cushy government jobs or worked for publishers. In the last two years, a new creature known as the "data scientist" has emerged as one of the must-have hires for many firms. Here at Bright we have assembled an outstanding Data Science group that built our Bright Score, provides interesting data to the media and general public via Bright Labs, and causes endless grief for the engineering team that needs to scale our ideas to our users.
Let’s take a look at our science team on paper. The team is an eclectic mix, consisting of one former nuclear physicist, one neuroscientist, one geophysicist, one astrophysicist, and a mechanical engineer. At a glance, you may think we are trying to invent Warp Drive. On top of that, not ONE of them had Data Scientist as their last job title. However, each and every one of them has had years of extensive training from some of the brightest minds in the world, has spent countless hours researching, experimenting, analyzing, and documenting solutions to real-world problems, and has had their work critiqued by their peers and published in academic publications. As the old adage goes, “things are not always what they appear to be,” a statement very true when it comes to finding Data Scientists.
How does one go about hunting these camouflaged “purple squirrel” scientists?
To begin, what is a Data Scientist? It depends who you ask. Common (incorrect) definitions are:
- A Hadoop expert. To those hiring managers who are certain they need a Hadoop expert, I submit @DevOps_Borat.
- A Machine Learning expert. Not every construction project needs a hammer.
- Kagglers
I define a Data Scientist as someone who knows just enough programming, system administration, and statistics to transform a large, possibly heterogeneous set of unstructured data into actionable intelligence or an actual product. The Data Scientist must also have sufficient visualization and communication skills to be able to convince someone that they did it correctly.
We’ve found that one of the least effective methods for finding a Data Scientist is to log into LinkedIn and search for "Data Scientist." There aren't that many, as Data Science is an emergent field. There are few, if any, Data Scientists with five years of experience, because the job simply did not exist (at least not in its current form).
Where, then, does one find the elusive Data Scientist? Famed bank robber Willie Sutton reputedly said that he robbed banks “because that's where the money is.” If you want to find a Data Scientist, find yourself a disgruntled postdoc toiling away on brilliant scientific research, but failing to land a professorship because ... all the professor jobs are taken! (For those of you not familiar with academia, after earning your Ph.D., you typically work for 2-6 years as a postdoctoral research fellow. You are semi-autonomous, but typically work under a professor who was fortunate enough to get their Ph.D. in the good old days when there were actually professorships to be had.)
Private companies that are detached from the world of academia sometimes give candidates such as these postdocs a hard time, assuming that they must not be hard workers and won’t be able to keep up in fast-paced environments because they haven’t had a “real” job. In many cases the opposite is true: people in academia often have to work twice as hard. The grant funding they receive is insufficient to pay for tools that many take for granted ("You don't need that $1000 software license! Write that code yourself!"). Yes, as in any other profession, there are a few slackers in academia. There are some questions you can ask to identify and eliminate them early in your selection process:
- "Tell me about some peer-reviewed papers that you published as first author." I want people who can finish long, complicated tasks. Nothing takes longer, or is more complicated, than publishing a peer-reviewed paper. To give you an idea of what that entails, imagine all of the backstabbing people competing with you at work and in your profession, and put them behind a wall of anonymity where they critique and criticize every little detail of the project you have been slaving over -- that is a peer reviewer.
- "Tell me about some code you've written that other people use." Academics tend to be "good enough" programmers. I don't need it to be elegant, but I do need it to work. The best test of whether code is "good enough" is whether at least two other people use it.
- "Explain to me the statistical analysis you used in your thesis." Statistics is like music. Some people play notes, some people make music. People who really understand statistical concepts at a fundamental level usually make the best Data Scientists. Anyone can run an Analysis of Variance in Excel, but is that really the best approach? Ultimately, the worst thing your Data Scientist can do is get fooled by the data.
LinkedIn can still be helpful, and is particularly useful for finding a Data Scientist you respect. Once you identify one, find their connections who are still toiling away in academia, look up their email addresses on the university website (yes, they make it that easy for you), and send them an email.
What is true in sports is also true in hiring -- it is better to find a superstar in the draft than to sign one as a free agent. Draft picks are cheaper, and you get them during their most productive years.
David Hardtke, Ph.D., Chief Scientist
Josh Barger, PHR, Director of People Operations