Data Science (in the Real World)
Target group:
People interested in data science (researchers, practitioners, …)
Abstract:
Data is the new oil. Like crude oil straight from the well, raw data cannot be used directly to fuel machine learning algorithms. It first needs to be carefully refined, separating the useful information from irrelevant noise. In this installment of the workshop series, we shed light on the importance of data quality assessment, data preprocessing, and knowledge transfer between domain experts and data scientists. In addition, we will discuss a selection of pitfalls and even paradoxical data science results. Beyond acknowledging that they exist, we aim to provide practical advice on handling them, for example skewed datasets that contain only a few examples of a potentially undesired phenomenon.
After the event you will know about:
- Data quality concerns & KPIs
- Code & data books
- Paradoxical data science results
- Data cleaning & preprocessing
- Data validation
- Model debugging
- Non-linear correlation & correlation does not imply causation
- Feature selection & outlier detection
- Machine learning with skewed & imbalanced data
Speaker
Research Area Manager Knowledge Discovery
Oliver Pimas
Big Data Lab
Research Area Manager Data Management
03:00 - 05:00