My First Internship

Data Science

Data science is the use of scientific methods, algorithms and systems to analyze noisy, structured or unstructured data and extract insights and information from it. Data is everywhere and touches nearly every aspect of our lives. Education, finance, health care and IT are some of the prominent industries that use data at a large scale. Data science comprises the skills, tools and methods needed to handle data effectively. Its main areas include:

  • Data extraction
  • Data engineering
  • Data visualization

My Role as a Data Scientist

As a data scientist, my responsibilities include understanding the client's needs; cleaning and processing the data; identifying and handling outliers; offering insights based on the data; providing visualizations; and detecting patterns, trends and relationships in the data using various analytics and reporting tools. The main tools I work with are:

  1. Scala: a functional programming language whose collections library makes data transformations concise and efficient. Spark's core is also written in Scala.
  2. Spark: Apache Spark is a distributed engine for large-scale data processing on clusters.
  3. Power BI: Microsoft Power BI is used to present the results of data analysis as visual reports.
  4. MySQL: a relational database used for storing and querying raw data.
  5. Microsoft Excel: spreadsheet software.
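Two of the responsibilities above, cleaning data and identifying outliers, can be sketched with plain Scala collections. This is a minimal illustration, not the code used during the internship: it assumes missing values arrive as `None` and are simply dropped, and it flags outliers with Tukey's 1.5 × IQR rule using a linear-interpolation percentile. The names `OutlierSketch`, `percentile` and `cleanAndSplit` are hypothetical. On a real dataset the same logic would more likely run on Spark than on in-memory collections.

```scala
// Hypothetical sketch: data cleaning and IQR-based outlier detection
// using only the Scala standard library collections.
object OutlierSketch {

  // Percentile with linear interpolation between neighbouring ranks
  // (one of several common definitions). `sorted` must be sorted ascending.
  def percentile(sorted: Vector[Double], p: Double): Double = {
    require(sorted.nonEmpty, "percentile of empty data")
    val rank = p * (sorted.size - 1)
    val lo = math.floor(rank).toInt
    val hi = math.ceil(rank).toInt
    sorted(lo) + (rank - lo) * (sorted(hi) - sorted(lo))
  }

  // Cleaning step (assumed): drop missing values (None).
  // Outlier step (assumed): Tukey's fences at Q1 - 1.5*IQR and Q3 + 1.5*IQR.
  // Returns (inliers, outliers), both in their original order.
  def cleanAndSplit(raw: Seq[Option[Double]]): (Vector[Double], Vector[Double]) = {
    val values = raw.flatten.toVector
    val sorted = values.sorted
    val q1 = percentile(sorted, 0.25)
    val q3 = percentile(sorted, 0.75)
    val iqr = q3 - q1
    val lower = q1 - 1.5 * iqr
    val upper = q3 + 1.5 * iqr
    values.partition(v => v >= lower && v <= upper)
  }

  def main(args: Array[String]): Unit = {
    // Made-up sample readings with one missing value and one extreme value.
    val raw = Seq(Some(10.0), Some(12.0), None, Some(11.0), Some(13.0), Some(95.0), Some(12.5))
    val (inliers, outliers) = cleanAndSplit(raw)
    println(s"inliers = $inliers")
    println(s"outliers = $outliers")
  }
}
```

Running `main` on the made-up sample keeps the five ordinary readings and flags `95.0` as the only outlier. The IQR rule is used here because it is simple and not distorted by the extreme values it is trying to detect; other approaches, such as z-scores, would also work.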

Challenges I Faced

I faced numerous challenges during my internship. The quality of work expected of us was high, and deadlines had to be strictly met. Spark and Scala are the predominant frameworks used in the industry, and the learning curve of Spark's distributed data processing model was steep and at times challenging. Adapting to the team and working together to deliver top-quality results was also important.



