In the previous article (“4 things every data scientist should know”) we reviewed, at a high level, the major topics that every aspiring data scientist should be aware of. Now we’ll delve into some of the specific skills expected of any data scientist.
Statistics:
It shouldn’t be a surprise that this is the first item on the list. As mentioned in the previous article, data science can almost be considered a natural evolution of statistics. Thus, high proficiency in this field is a mandatory skill for any data scientist.
Algorithms:
Dealing with huge, unlabeled data sets is no easy challenge, especially when you are trying to derive insights from them! A good understanding of algorithmic complexity and of general problem-solving techniques will prove invaluable for the task ahead.
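To make the point about algorithmic complexity concrete, here is a minimal sketch (the function names are hypothetical, chosen for illustration) of two ways to check a column of values for duplicates. Both return the same answer, but the first compares every pair of elements, O(n²), while the second relies on a hash set for an expected O(n) pass. On a few thousand rows the difference is negligible; on hundreds of millions it decides whether the job finishes at all.

```python
def has_duplicates_quadratic(items):
    """Compare every pair of elements: O(n^2) comparisons."""
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if items[i] == items[j]:
                return True
    return False


def has_duplicates_linear(items):
    """Single pass with a set: expected O(n), since set membership is O(1) on average."""
    seen = set()
    for item in items:
        if item in seen:
            return True
        seen.add(item)
    return False


if __name__ == "__main__":
    sample = [3, 1, 4, 1, 5]
    print(has_duplicates_quadratic(sample), has_duplicates_linear(sample))
```

The trade-off is memory: the linear version stores up to n values in the set, which is usually a price worth paying.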
SQL:
The Structured Query Language (SQL) has become the standard for interacting with relational databases. What’s more, several NoSQL databases have implemented similar languages (e.g. Couchbase N1QL or Cassandra CQL), so anyone highly proficient in SQL will find it relatively easy to work with a broad range of data sources.
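As a small illustration of everyday SQL, the sketch below uses Python’s built-in `sqlite3` module with an in-memory database. The `tweets` table and its rows are a hypothetical toy example; the `GROUP BY` aggregation is the kind of query you would write against any relational source.

```python
import sqlite3

# In-memory database with a hypothetical toy table (illustration only)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tweets (id INTEGER PRIMARY KEY, sentiment TEXT, text TEXT)")
conn.executemany(
    "INSERT INTO tweets (sentiment, text) VALUES (?, ?)",
    [
        ("positive", "Great game last night"),
        ("neutral", "Going to the store"),
        ("positive", "Love this weather"),
        ("negative", "Missed my bus again"),
    ],
)

# Count tweets per sentiment label, most frequent first (ties broken alphabetically)
rows = conn.execute(
    "SELECT sentiment, COUNT(*) AS n FROM tweets "
    "GROUP BY sentiment ORDER BY n DESC, sentiment"
).fetchall()

for sentiment, n in rows:
    print(sentiment, n)
```

The same `SELECT ... GROUP BY` pattern carries over, with minor dialect differences, to SQL-like languages such as N1QL or CQL, although CQL in particular restricts which columns you can group by.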
Python / R:
These languages are the bread and butter of any data scientist. Whether you are analyzing data or running an ML model in a notebook (like Google Colab), you are almost certainly working in one of these two languages. Both offer plenty of tools for working with data seamlessly and for interacting with the latest AI models (e.g. through Keras). If you come from a mathematics/statistics background, we recommend you start by learning R, where you will feel right at home. If you come from a CS background, we recommend you start with Python instead.
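For a taste of typical Python data wrangling, here is a minimal sketch that groups rows of a CSV and averages a numeric column. To keep it self-contained it sticks to the standard library (`csv`, `statistics`) instead of pandas; the toy data and the helper `mean_by_group` are hypothetical.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Hypothetical toy data set, embedded as CSV text so the example is self-contained
RAW = """city,temperature
Lisbon,21.5
Lisbon,23.0
Oslo,12.0
Oslo,14.0
"""


def mean_by_group(text, key, value):
    """Group the rows of a CSV string by `key` and average the numeric `value` column."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        groups[row[key]].append(float(row[value]))
    return {k: mean(v) for k, v in groups.items()}


if __name__ == "__main__":
    print(mean_by_group(RAW, "city", "temperature"))
```

In practice you would usually reach for pandas, where the same aggregation is written as `df.groupby("city")["temperature"].mean()`; in R, `aggregate()` or dplyr’s `group_by()` plus `summarise()` play the same role.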
Exploratory data analysis (EDA):
Exploratory data analysis is perhaps one of the most important steps when trying to understand or gain insight from a data set, and one that is often overlooked. A proper EDA can detect (or hint at) the presence of bias in the data and yield useful insights. As an interesting example, during the “Tweet Sentiment Extraction” competition held by Kaggle (ref. https://www.kaggle.com/c/tweet-sentiment-extraction), it was possible to infer that the training data had been gathered around Mother’s Day!
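One simple EDA technique that can surface this kind of bias is a token frequency count: if an unexpected topic dominates the most common words, the sampling period or source may be skewed. The sketch below is a hypothetical illustration on a made-up four-tweet sample with an ad hoc stopword list; it is not a reconstruction of how the Kaggle observation was actually made.

```python
import re
from collections import Counter

# Hypothetical toy sample; in practice this would be the full training set
tweets = [
    "Happy mothers day to all the moms out there",
    "Can't wait for the weekend",
    "Bought flowers for mothers day",
    "My mom is the best, happy mothers day!",
]

# Small ad hoc stopword list, for illustration only
STOPWORDS = {"to", "all", "the", "out", "there", "for", "my", "is", "can't"}


def top_tokens(texts, k=3):
    """Return the k most frequent lowercase tokens, ignoring stopwords."""
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z']+", text.lower())
        counts.update(t for t in tokens if t not in STOPWORDS)
    return counts.most_common(k)


if __name__ == "__main__":
    print(top_tokens(tweets))
```

On real data you would also look at timestamps, label distributions and text lengths, but even a crude count like this can hint that a data set is not a neutral sample.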
Data visualization:
Representing data in a useful and compact manner is by no means an easy task. Proficiency with data visualization tools (e.g. ggplot2) and a good understanding of each chart type’s strengths and weaknesses will prove invaluable for communicating any result effectively.
So far we’ve been discussing technical skills related to data science. However, given the nature of the work, we cannot overlook the importance of soft skills such as effective communication and good presentation, which are key to delivering results to the right audience.
Do you think there’s something missing from the list? Let us know!
Hope you’ve found this article useful. See you next time.