In keeping with my goal of making this blog a good resource for people switching careers into data science, I'll also be posting interviews with people currently working in data science. We'll discuss topics like:
- motivations for a career change
- different journeys to getting into data science
- what employers are looking for in a data scientist (and how that may vary between companies)
- opinions regarding self-learners and bootcamp graduates
- backgrounds of data scientists
- cool projects going on
- general career advice
- other topics (open to suggestions!)
In this post, we'll kick off the interviews with Eric Koziol. Eric began as a structural engineer (like me!), successfully made the crossover into data science over a year ago, and hasn't looked back. I think his story is particularly awesome because he had the determination to make it happen as a self-learner, while I decided to go the bootcamp route after a year of self-learning for reasons in this post.
Tell us about your background and how you got into data science.
I studied Structural Engineering at the University of Illinois at Urbana-Champaign (UIUC) and left with a Master's degree in 2010. Afterwards, I started working at Wiss, Janney, Elstner Associates, Inc. (WJE) in Northbrook, IL, where I spent 5 years in structural forensics.
During my time there, I was involved in a lot of material testing, vibrational analysis and instrumentation, all of which required a lot of data analysis. I also developed internal software for our work iPads to mark up and annotate data for cracks we observed in our field investigations, which ultimately made for faster turnaround times and removed the need for paper notes. I worked a lot on UI and back-end development, like creating modules to import our iPad data into CAD and building automated analysis from the XML data. It was built primarily in C#, which reminded me of the time I was a TA for a startup-focused engineering entrepreneurship class (ENG 298) at UIUC and learned Ruby on Rails. I liked learning, so I continued taking online classes, like computational finance classes on iTunes U and Coursera. Eventually, I read about Kaggle in Wired and started playing around with linear models in Excel. I placed in the bottom third of my first Kaggle competition and knew I needed to learn more.
It was around this time that my girlfriend (now fiancée) got me a book on complexity theory, which got me started on learning genetic algorithms. I tried to use a genetic algorithm for stock trading; it took 30 days to run in MATLAB and didn't turn out to be very good either. I picked up Python and started using scikit-learn to improve my results, which I supplemented with Andrew Ng's Machine Learning course on Coursera. At the time, WJE had me performing vibration monitoring for pile driving in NW Indiana, which ended up being 2-4 hours of work a day thanks to our cell modems that alerted us when readings went past a certain threshold, so I actually took Andrew Ng's class mostly in my car. I had previously participated in a BattleFin hedge fund competition on Kaggle and placed in the top 25%, so I looked further into what they did and realized that they actually hosted hedge fund competitions where the winners received funding for their systems. So I buckled down and created a model based on neural nets that was ultimately accepted into the competition. I was one of 18 competitors accepted out of around 200. I ended up breaking even (I realized at the end that my models were overfit) and finished in the bottom third.
I wanted to improve, so I continued with more Kaggle competitions, reading papers, and more Coursera courses. Eventually, I found myself in the top 0.5% of Kaggle. At this point, I had been at my job for a couple of years and was starting to realize Kaggle was more fun than my actual job. My fiancée had been transferred to Colorado, so we were doing long distance. My life became work, Coursera and Kaggle, then seeing my fiancée or friends on the weekends. We got engaged in June of 2014, then chose Washington DC to close the distance, where I ended up getting a job as a data science consultant at KPMG.
As a self-learner, how did you get a job and grow your data science network?
Getting a job required a business mindset: I looked at everything I had done and accomplished, and tried to market myself in the best way possible. I beefed up my LinkedIn and started listing my hedge fund competitions at the start of my profile (Red Thunder Capital), and worked hard to build up my portfolio and GitHub.
As for building a network, I signed up for a data science meetup group in DC once I moved there. I responded to calls for more speakers and offered to give talks on machine learning & R, which people thought was great. When you give talks, people always come up to chat with you afterwards. So I've made a lot of great connections that way.
Benefits/drawbacks to your career change approach - is there anything you would have done differently?
I try not to think about the past too much since I can't change it; all that matters is moving forward. It's a slippery slope of wondering: if I hadn't gone to UIUC or done Civil Engineering, maybe I wouldn't have picked up the same line of reasoning. Or if I had gone into Computer Science or Chemistry instead, maybe I would have wanted to switch into Structural Engineering later instead of Data Science. Looking back, it's interesting to see how what you like or dislike changes over time. I hated Statistics in school, and it wasn't until I used it later in my job (how many cores do I need to sample to determine the strength I'm actually seeing?) that I actually appreciated it. It's much easier to learn things if you have a goal to accomplish. Now, I learn things to know what's out there, but I don't dive as deeply into a specific topic unless I need it to work on something.
The biggest thing to realize is that if there's something that you don't like where you are headed, then change it. I think I've shown in my life that it's definitely possible. If there's something that you would have done differently, then ask yourself how you can make those adjustments in the future. If you really thought you should have gotten a CS degree, then what can you do to learn those skills now and apply them?
Have you perceived any particular bias towards/against self-learners in data science?
Not particularly. Our team of ~175 data scientists at KPMG (hopefully 350-400 by next year) includes self-learners like me, bootcamp graduates, people switching from other industries, and people straight out of school, from a variety of backgrounds (Economics, Computer Science, Physics, Engineering, Statistics, etc.), all with an MS or PhD. I think people just want to hire someone who is capable of learning and applying methods. It helps to have prior work experience, or to demonstrate in some way that you have experience applying what you've learned to real-world situations. In consulting, you need to be able to act professionally, which people tend to develop more after a first job than fresh out of school. That said, PhDs can be attractive fresh out of school since a lot of them have a strong stats background from running experiments for their theses.
Tell us about an interesting problem you're working on.
Anomaly detection for payments is a pretty difficult problem we are working on. Not only is there some subjectivity in how you define an anomaly, but you also have various costs associated with finding those anomalies. We have to interact with the client a lot to determine their accepted level of false positives vs false negatives. Some clients want all potential anomalies to be found (higher false positive rate) while others do not have the bandwidth to investigate every instance picked up (higher false negative rate). It's also very hard to have a ground truth in these problems, since domain knowledge may not be able to tell you the right patterns for finding anomalies. There is a severe class imbalance so supervised methods are harder to apply. We end up using a mix of unsupervised and supervised methods.
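To make the false positive vs false negative trade-off concrete, here is a minimal sketch in Python. It is only an illustration, not the method Eric's team uses: the scores, labels, cost values, and the helper name `pick_threshold` are all made up for this example. It assumes each payment already has an anomaly score (higher means more unusual) and that a small set of payments has been reviewed by hand, then picks the flagging threshold with the lowest total cost for a given client.

```python
from typing import Sequence, Tuple


def pick_threshold(
    scores: Sequence[float],
    labels: Sequence[int],
    cost_fp: float,
    cost_fn: float,
) -> Tuple[float, int, int]:
    """Return (threshold, false_positives, false_negatives) with the lowest total cost.

    A payment is flagged when its score is >= threshold.
    cost_fp and cost_fn are hypothetical per-case costs agreed with the client.
    """
    best = None
    # Candidate thresholds: every observed score, plus one above the max (flag nothing).
    candidates = sorted(set(scores)) + [max(scores) + 1.0]
    for t in candidates:
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        cost = cost_fp * fp + cost_fn * fn
        if best is None or cost < best[0]:
            best = (cost, t, fp, fn)
    _, t, fp, fn = best
    return t, fp, fn


# Made-up anomaly scores and made-up review outcomes (1 = confirmed anomaly).
scores = [0.05, 0.10, 0.20, 0.30, 0.35, 0.40, 0.55, 0.60, 0.80, 0.95]
labels = [0, 0, 0, 0, 1, 0, 0, 1, 0, 1]

# Client A: a missed anomaly is very costly, so more false positives are tolerated.
catch_all = pick_threshold(scores, labels, cost_fp=1.0, cost_fn=20.0)
# Client B: limited bandwidth for investigations, so each false alarm is costly.
low_noise = pick_threshold(scores, labels, cost_fp=5.0, cost_fn=1.0)

print("Client A (threshold, FP, FN):", catch_all)
print("Client B (threshold, FP, FN):", low_noise)
```

With these toy numbers, the first client ends up with a lower threshold (more payments flagged, nothing missed) and the second with a higher one (no false alarms, some anomalies missed). In practice the scores might come from an unsupervised model and the labels from a small reviewed sample, which is one way a mix of unsupervised and supervised methods can fit together.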
Most useful technical & non-technical skill a data scientist can have?
Curiosity and asking questions are a must. Technically, data visualization and communicating results to people who don't understand the analysis are important. People won't trust your results if they don't understand your analysis or why your findings are meaningful.
Websites/forums you frequent, or people you follow on Twitter?
- Cross Validated (Stack Overflow equivalent for statistics)
- Twitter: entrepreneurs, VCs, professors in emerging areas, thought leaders
General advice for someone considering a switch into data science?
Figure out your priorities: make it a priority to learn and actually do things. Switching into data science becomes your life and is all-consuming, so if you're serious about the switch, don't dilly-dally or else you won't be successful. Have a sense of urgency.