Société Générale – How to tackle Data Quality issues

In my talk, I will start by showing the extent to which the world increasingly depends on data: societies, families, individuals, and organizations of all types. I will give a brief introduction to machine learning, the impact of poor-quality data on machine learning predictions, and the risk of missing the promise and potential of the technology if we do not address data quality. Then I will introduce what data quality is, drawing a parallel with health problems, and present real issues that companies have experienced because of bad data quality. I will present each of the six data quality dimensions: accuracy, uniqueness, completeness, validity, consistency, and timeliness. Whenever a data quality problem is found, gaining support for data quality work and investments requires showing the business impact, so that management can understand the value of information quality. Then I will present the data quality process: profiling the data, defining the data quality rules, conducting the assessment, resolving the data quality issues, and finally monitoring and controlling. Each step will be detailed in the presentation. Finally, I will explain how to prevent future data errors. Too often, the tendency is to skip prevention and start immediately correcting current errors. Preventing future data errors means that a business has processes that produce quality data, instead of facing the time and cost of future data-cleansing activities.

Speaker

Dr. Sahar Changuel, Senior Data Manager | Société Générale

I have a PhD in machine learning and Natural Language Processing (NLP), and over the last 10 years I have worked continuously with data in its different formats, structured and unstructured: text extracted from documents or from social media, figures, extracts from databases, etc. I have handled data in different use cases and for different sectors: education, media, audit, finance… The data was stored in local databases, in a data lake, in the Cloud, or spread across different spreadsheets. My mandate was almost the same each time: making the data usable and exploitable in the way that best fits the business need. To make the best use of data there is one fundamental requirement: the data must be of good quality, which can be a real challenge in some situations. Today I am a Senior Data Manager and a Data Quality Manager, and my objective is to promote data quality in my department and to put in place a framework to remediate data quality issues.