Profession: data scientist

Data scientists find and interpret rich data sources, manage large amounts of data, merge data sources, ensure consistency of datasets, and create visualisations to aid in understanding data. They build mathematical models using data, present and communicate data insights and findings to specialists and scientists on their team and, if required, to a non-expert audience, and recommend ways to apply the data. They utilise recommendation engines, spam classifiers, sentiment analysers and classifiers for unstructured and semi-structured data.


Personality Type

  • Investigative / Realistic

Knowledge

  • Resource description framework query language

Query languages, such as SPARQL, which are used to retrieve and manipulate data stored in Resource Description Framework (RDF) format.

  • Query languages

The field of standardised computer languages used to retrieve information from a database and from documents containing the needed information.

  • Statistics

    The study of statistical theory, methods and practices such as collection, organisation, analysis, interpretation and presentation of data. It deals with all aspects of data including the planning of data collection in terms of the design of surveys and experiments in order to forecast and plan work-related activities.
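Descriptive statistics such as these are readily computed with Python's standard `statistics` module; a minimal sketch with hypothetical sample data:

```python
import statistics

# Hypothetical sample: daily counts of processed records.
sample = [12, 15, 11, 19, 14, 15, 18]

mean = statistics.mean(sample)      # central tendency
median = statistics.median(sample)  # robust to outliers
stdev = statistics.stdev(sample)    # sample standard deviation (n - 1 denominator)

print(mean, median, stdev)
```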

  • Visual presentation techniques

    The visual representation and interaction techniques, such as histograms, scatter plots, surface plots, tree maps and parallel coordinate plots, that can be used to present abstract numerical and non-numerical data, in order to reinforce the human understanding of this information.
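Even without a plotting library, the idea behind a histogram can be sketched in plain Python by counting categories and printing proportional bars (the survey data below is hypothetical):

```python
from collections import Counter

# Hypothetical data: response categories from a survey.
responses = ["yes", "no", "yes", "yes", "maybe", "no", "yes"]

counts = Counter(responses)
for category, n in counts.most_common():
    # One row per category; bar length is proportional to the count.
    print(f"{category:>5} | {'#' * n} ({n})")
```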

  • Data mining

    The methods of artificial intelligence, machine learning, statistics and databases used to extract content from a dataset.
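One classic data-mining task is finding items that frequently co-occur (association analysis). A toy sketch over hypothetical shopping baskets, counting item pairs:

```python
from collections import Counter
from itertools import combinations

# Hypothetical transaction data: sets of items bought together.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "milk"},
    {"bread", "butter"},
]

# Count every co-occurring item pair across all baskets.
pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# The most frequent pair is a candidate for an association rule.
top_pair, support = pair_counts.most_common(1)[0]
print(top_pair, support)
```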

  • Information extraction

    The techniques and methods used for eliciting and extracting information from unstructured or semi-structured digital documents and sources.
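A common first technique for extracting structured fields from unstructured text is pattern matching with regular expressions; a minimal sketch over a hypothetical document snippet:

```python
import re

# Hypothetical unstructured text, e.g. the body of a scraped document.
text = "Contact alice@example.org or bob@example.com before 2024-06-30."

# Pull out e-mail addresses and ISO dates with regular expressions.
emails = re.findall(r"[\w.+-]+@[\w-]+\.[\w.]+", text)
dates = re.findall(r"\d{4}-\d{2}-\d{2}", text)
print(emails, dates)
```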

  • Online analytical processing

    The online tools which analyse, aggregate and present multi-dimensional data enabling users to interactively and selectively extract and view data from specific points of view.
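The core OLAP operations (building a cube of cells, then rolling up along a dimension) can be sketched in-memory with dictionaries; the fact table below is hypothetical:

```python
from collections import defaultdict

# Hypothetical fact table: (region, quarter, revenue) rows.
facts = [
    ("EU", "Q1", 100), ("EU", "Q2", 150),
    ("US", "Q1", 200), ("US", "Q2", 250),
]

# Build a tiny in-memory "cube": one total per (region, quarter) cell.
cube = defaultdict(int)
for region, quarter, revenue in facts:
    cube[(region, quarter)] += revenue

# Roll up along the quarter dimension: total revenue per region.
per_region = defaultdict(int)
for (region, _quarter), revenue in cube.items():
    per_region[region] += revenue
print(dict(per_region))
```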

  • Information categorisation

    The process of classifying the information into categories and showing relationships between the data for some clearly defined purposes.

  • Data models

    The techniques and existing systems used for structuring data elements and showing relationships between them, as well as methods for interpreting the data structures and relationships.

Skills

  • Handle data samples

    Collect and select a set of data from a population by a statistical or other defined procedure.
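Simple random sampling without replacement is one such defined procedure; a sketch using Python's `random` module over a hypothetical population of respondent identifiers:

```python
import random

# Hypothetical population: identifiers of survey respondents.
population = list(range(1, 101))

# Simple random sampling without replacement; seeding makes the
# draw reproducible across runs.
random.seed(42)
sample = random.sample(population, k=10)
print(sample)
```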

  • Execute analytical mathematical calculations

    Apply mathematical methods and make use of calculation technologies in order to perform analyses and devise solutions to specific problems.

  • Deliver visual presentation of data

    Create visual representations of data such as charts or diagrams for easier understanding.

  • Design database scheme

    Draft a database scheme by following the Relational Database Management System (RDBMS) rules in order to create a logically arranged group of objects such as tables, columns and processes.
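A minimal sketch of such a scheme, using Python's built-in `sqlite3` module (the table and column names are illustrative): two tables linked by a foreign key, queried with a join.

```python
import sqlite3

# Illustrative relational schema: customers and their orders,
# linked by a foreign key.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        total       REAL NOT NULL
    );
""")
conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Alice')")
conn.execute("INSERT INTO orders (customer_id, total) VALUES (1, 19.99)")
rows = conn.execute(
    "SELECT c.name, o.total FROM customers c JOIN orders o ON o.customer_id = c.id"
).fetchall()
print(rows)
```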

  • Establish data processes

    Use ICT tools to apply mathematical, algorithmic or other data manipulation processes in order to create information.

  • Perform data cleansing

Detect and correct corrupt records in data sets, and ensure that the data become and remain structured according to guidelines.
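A minimal cleansing sketch over hypothetical raw records with typical defects: stray whitespace, inconsistent casing, and an unparseable value flagged rather than silently kept.

```python
# Hypothetical raw records with common defects.
raw = [
    {"name": " Alice ", "age": "34"},
    {"name": "BOB", "age": "N/A"},
    {"name": "carol", "age": "29"},
]

def clean(record):
    # Normalise the name and convert age to int, marking bad values as None.
    name = record["name"].strip().title()
    try:
        age = int(record["age"])
    except ValueError:
        age = None
    return {"name": name, "age": age}

cleaned = [clean(r) for r in raw]
print(cleaned)
```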

  • Collect ICT data

    Gather data by designing and applying search and sampling methods.

  • Report analysis results

    Produce research documents or give presentations to report the results of a conducted research and analysis project, indicating the analysis procedures and methods which led to the results, as well as potential interpretations of the results.

  • Implement data quality processes

Apply quality analysis, validation and verification techniques to data in order to check its quality and integrity.
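Validation can be expressed as a set of named rules applied to every record; a sketch with hypothetical records and two illustrative rules (completeness and range):

```python
# Hypothetical quality rules: required fields must be present,
# and ages must fall in a plausible range.
records = [
    {"id": 1, "age": 34},
    {"id": 2, "age": -5},  # violates the range rule
    {"id": 3},             # violates the completeness rule
]

rules = {
    "complete": lambda r: "age" in r,
    "in_range": lambda r: 0 <= r.get("age", -1) <= 120,
}

# Collect (record id, rule name) for every violated rule.
failures = [
    (r["id"], name)
    for r in records
    for name, rule in rules.items()
    if not rule(r)
]
print(failures)
```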

  • Develop data processing applications

Create customised software for processing data by selecting and using the appropriate computer programming language, so that an ICT system produces the demanded output based on the expected input.

  • Build recommender systems

Construct recommendation systems based on large data sets, using programming languages or computer tools to create a subclass of information filtering system that seeks to predict the rating or preference a user would give to an item.
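A toy collaborative-filtering sketch in plain Python (the users, items and ratings are hypothetical): score each item a user has not yet rated by the similarity-weighted ratings of other users, using cosine similarity.

```python
from math import sqrt

# Hypothetical user -> {item: rating} matrix.
ratings = {
    "ann": {"a": 5, "b": 3, "c": 4},
    "ben": {"a": 4, "b": 3, "c": 5, "d": 4},
    "cat": {"b": 2, "d": 5},
}

def cosine(u, v):
    # Cosine similarity over the items both users rated.
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    return dot / (sqrt(sum(x * x for x in u.values())) *
                  sqrt(sum(x * x for x in v.values())))

def recommend(user):
    # Score unseen items by similarity-weighted ratings of other users.
    scores = {}
    for other, theirs in ratings.items():
        if other == user:
            continue
        sim = cosine(ratings[user], theirs)
        for item, rating in theirs.items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + sim * rating
    return max(scores, key=scores.get)

print(recommend("ann"))
```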

  • Normalise data

Reduce data to their accurate core form (normal forms) in order to achieve results such as minimised dependency, eliminated redundancy and increased consistency.
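The idea behind normalisation can be sketched by decomposing a denormalised table: the repeated department name moves into its own relation keyed by `dept_id` (the data below is hypothetical):

```python
# A denormalised table: the department name repeats on every row.
flat = [
    {"emp": "Alice", "dept_id": 1, "dept_name": "Sales"},
    {"emp": "Bob",   "dept_id": 1, "dept_name": "Sales"},
    {"emp": "Carol", "dept_id": 2, "dept_name": "R&D"},
]

# Decompose towards third normal form: department attributes get their
# own relation, and employees keep only the foreign key.
departments = {row["dept_id"]: row["dept_name"] for row in flat}
employees = [{"emp": row["emp"], "dept_id": row["dept_id"]} for row in flat]

print(departments)
print(employees)
```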

  • Interpret current data

    Analyse data gathered from sources such as market data, scientific papers, customer requirements and questionnaires which are current and up-to-date in order to assess development and innovation in areas of expertise.

  • Manage data collection systems

    Develop and manage methods and strategies used to maximise data quality and statistical efficiency in the collection of data, in order to ensure the gathered data are optimised for further processing.

Optional knowledge and skills

LDAP, manage data, perform data mining, unstructured data, LINQ, integrate ICT data, create data models, define data quality criteria, manage ICT data architecture, manage ICT data classification, N1QL, business intelligence, data quality assessment, MDX, XQuery, SPARQL

Common job titles

  • Data scientist
  • Analyst/R programmer/data scientist
  • Data scientist / entry level
  • Data scientist/quantitative analyst, engineering
  • Product analyst, data science
  • Research scientist, Google Brain (United States)
  • Senior data scientist, predictive analytics
  • Freelance data scientist