The first work experience is special for anybody. Having dreamt for years of developing software applications, I realized during college that development was not something I would enjoy doing. I enjoyed giving instructions to the computer to perform a task, but going through the software development lifecycle felt tedious and boring.
My interest in giving instructions to the computer was renewed by an interdisciplinary field called “Data Science”. It combined pretty much everything I wanted to work on: I got to put to use my strengths in Math and Statistics, Computer Science, Analytics, Artificial Intelligence (AI) and Machine Learning (ML).
Data Science is the use of scientific methods, algorithms and systems to analyze noisy structured or unstructured data and extract insights and information from it. Data is everywhere and is involved in every aspect of one’s life. Education, finance, health care and IT are some of the prominent industries that use data at a large scale. The skills, tools and methods needed to handle data effectively make up data science.
The pipeline of data science includes:
- Data Extraction
Gathering structured or unstructured data from the source. The data is said to be in raw form, i.e., it cannot be used immediately and needs to be processed. It is stored in databases, in other data storage sites or on the cloud.
- Data Engineering
The data is transformed into a form that can be easily understood by data scientists. The performance of processing this data can be improved significantly through data engineering.
- Data Visualization
Data science is predominantly used by organizations to obtain information or insights from data. This information must be presented to a larger audience. Data visualization helps in achieving this.
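To make the three stages above a little more concrete, here is a minimal sketch in Python using only the standard library. Everything in it is an assumption made for illustration: the `RAW_CSV` sample data, the column names `region` and `sales`, and the function names `extract`, `engineer` and `visualize` are all hypothetical, and a real pipeline would read from a database or cloud store and use a proper charting tool.

```python
import csv
import io

# Hypothetical raw export; a real source would be a database or cloud store.
RAW_CSV = """region,sales
North, 120
South,
East,95
north,30
"""


def extract(raw_text):
    """Data extraction: read the raw rows exactly as they arrive."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def engineer(rows):
    """Data engineering: normalise labels, skip unusable rows, total per region."""
    totals = {}
    for row in rows:
        region = row["region"].strip().title()
        value = (row["sales"] or "").strip()
        if not value:
            continue  # skip rows with a missing sales figure
        totals[region] = totals.get(region, 0) + int(value)
    return totals


def visualize(totals, unit=10):
    """Data visualization: plain-text bar chart, one '#' per `unit` sales."""
    lines = []
    for region in sorted(totals):
        bar = "#" * (totals[region] // unit)
        lines.append(f"{region:<6} {bar} {totals[region]}")
    return "\n".join(lines)


rows = extract(RAW_CSV)
totals = engineer(rows)
print(visualize(totals))
```

The raw rows contain inconsistent casing, stray spaces and a missing value, which is the sense in which raw data "cannot be used immediately". The engineering step is where those problems are dealt with before anything is shown to an audience.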
My role as a data scientist
As a data scientist, my responsibilities include understanding the needs of the client, cleaning and processing the data, and identifying and dealing with outliers. I also detect patterns, trends and relationships in the data using various analytics and reporting tools, offer insights and information based on what the data shows, and provide visualization services.
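As an illustration of what identifying outliers can look like, here is a small Python sketch that uses the interquartile range (IQR) rule. The IQR rule is only one common choice and is an assumption here, not necessarily the method used on any particular project; the function name `iqr_outliers`, the multiplier `k` and the sample readings are all hypothetical.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical sample readings with one obviously suspicious value.
readings = [12, 14, 15, 13, 14, 16, 15, 98]
print(iqr_outliers(readings))  # prints [98]
```

What to do with a flagged value (drop it, cap it, or investigate it) depends on the data and on what the client needs, so the sketch only covers detection.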
The tools and technologies I use in day-to-day tasks are:
- Scala: Used for functional programming, with its collections helping to improve performance. Spark itself is largely written in Scala.
- Spark: Apache Spark is the engine used for large-scale data processing on clusters.
- Power BI: Microsoft Power BI is used to visually represent the data analysis report.
- MySQL: Database engine for querying raw data.
- Microsoft Excel: Spreadsheet software.
Challenges I faced
I faced numerous challenges in my professional experience. The quality of work expected of us is high and the timeline is to be strictly adhered to. Spark and Scala are the predominant tech frameworks used in the industry, and the learning curve of Spark's distributed data system was steep and challenging at times. Adapting to the team and working together to deliver top-quality results was also important.
I would like to thank the team at Deloitte for welcoming me, making me feel at home, answering all my questions and helping me gain experience.