Data Cleansing Services
We offer a range of Data Health services alongside our Data Health products to help you highlight & understand areas for improvement in the current state of your SAP data.
Data Health Services
Our data cleansing services will help you highlight & understand areas for improvement in the current state of your SAP data. We analyse both the data and, importantly, the processes associated with it (e.g. how customers or vendors are onboarded, or how your data is managed and by whom).
Once our analysis is complete, you will better understand what your data priorities are across multiple data domains. This process ultimately supports your data strategy and the ongoing management of the data lifecycle. We typically focus on supply chain objects first, covering Material (Product), Customer and Vendor, but any other object can also be analysed.
Utilise our data cleanse automation tools, such as Ditto and Align, to quickly and effectively remediate data or remove duplicates. Accelerate your data cleanse and start leveraging your data today.
Tools to discover, cleanse, and transform data
Data Maturity Index
The Bluestonex Data Maturity Index is a framework to measure how organisations use data. This framework is an attempt to understand data strategy, data governance, data quality, data velocity, the processes affected by data, and finally the user experience for those who maintain data in the organisation. The Data Maturity Index is a number between 1 and 5 and is visualised by a gauge chart.
How is it measured?
We send out a Data Maturity survey to the users who work most closely with data. The survey is designed to give us insight into how the organisation uses data, covering data strategy, data governance, data quality, data velocity, the processes affected by data, and the user experience for those who maintain data in the organisation. It takes about 10 minutes to complete, and all responses are confidential.
How is it analysed?
The Bluestonex Data Maturity Index is analysed using a dashboard that gives an overall score based on the data collected by the survey. In addition, we analyse each area (data strategy, data governance, data quality, data velocity, processes, and user experience) and give it its own score. All of these scores are out of 5 and give a better understanding of the current state of data in the organisation. The dashboard also gives insight into the survey audience and analyses a selection of the responses in each category.
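As a rough illustration of how survey answers could roll up into the six area scores and an overall index, here is a minimal Python sketch. It assumes each response rates statements on a 1 to 5 scale, that an area score is the mean of its ratings, and that the overall index is the unweighted mean of the area scores. The actual Bluestonex scoring method may weight areas or questions differently, and the function and data names are hypothetical.

```python
from statistics import mean

# The six areas named in the Data Maturity survey.
AREAS = ("data strategy", "data governance", "data quality",
         "data velocity", "processes", "user experience")


def maturity_scores(responses):
    """Return (area_scores, overall_index) on a 1-5 scale.

    Hypothetical scoring: each response maps an area name to a list of 1-5
    ratings; an area score is the mean of all ratings for that area, and the
    overall index is the unweighted mean of the area scores.
    """
    area_scores = {}
    for area in AREAS:
        ratings = [r for resp in responses for r in resp.get(area, [])]
        if any(not 1 <= r <= 5 for r in ratings):
            raise ValueError(f"ratings for {area!r} must be between 1 and 5")
        if ratings:  # areas nobody answered are left out rather than scored 0
            area_scores[area] = round(float(mean(ratings)), 2)
    overall = round(mean(list(area_scores.values())), 2) if area_scores else None
    return area_scores, overall


# Example: two made-up respondents answering some of the areas.
responses = [
    {"data strategy": [3, 4], "data quality": [2]},
    {"data strategy": [2], "data quality": [3, 3], "user experience": [4]},
]
scores, index = maturity_scores(responses)
print(scores, index)
```

Leaving unanswered areas out of the index, rather than counting them as zero, keeps a partially completed survey from dragging the gauge down artificially; a weighted mean would be a simple change if some areas matter more.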
Ditto Duplicate Checks
Ditto is your all-in-one data duplicate and consistency check tool. Powerful algorithms drive this simple application, which gives you complete control over your search criteria. Search on multiple values and define a match weighting, allowing searches for either exact matches or similar records.
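To show what a match weighting can mean in practice, the sketch below scores pairs of records field by field with Python's standard `difflib.SequenceMatcher` and combines the field scores using user-supplied weights. This is not Ditto's algorithm, which is not described here; the record fields, weights and the 0.85 threshold are illustrative assumptions. Setting the threshold to 1.0 limits the search to exact matches (after ignoring case and extra whitespace).

```python
from difflib import SequenceMatcher
from itertools import combinations


def field_similarity(a, b):
    """Similarity of two field values in [0, 1], ignoring case and extra spaces."""
    a = " ".join(str(a).lower().split())
    b = " ".join(str(b).lower().split())
    if not a and not b:
        return 1.0  # two empty values are treated as agreeing
    return SequenceMatcher(None, a, b).ratio()


def weighted_score(rec1, rec2, weights):
    """Weighted average of per-field similarities (weights need not sum to 1)."""
    total = sum(weights.values())
    return sum(w * field_similarity(rec1.get(f, ""), rec2.get(f, ""))
               for f, w in weights.items()) / total


def find_duplicates(records, weights, threshold=0.85):
    """Return (i, j, score) for every record pair scoring at or above threshold."""
    pairs = []
    for (i, r1), (j, r2) in combinations(enumerate(records), 2):
        score = weighted_score(r1, r2, weights)
        if score >= threshold:
            pairs.append((i, j, round(score, 3)))
    return pairs


vendors = [
    {"name": "Acme Ltd", "city": "Leeds", "postcode": "LS1 4AP"},
    {"name": "ACME Limited", "city": "Leeds", "postcode": "LS1 4AP"},
    {"name": "Bolt Supplies", "city": "York", "postcode": "YO1 7HH"},
]
weights = {"name": 0.5, "city": 0.2, "postcode": 0.3}  # illustrative weighting
print(find_duplicates(vendors, weights))                 # similar records
print(find_duplicates(vendors, weights, threshold=1.0))  # exact matches only
```

Comparing every pair scales quadratically with the number of records, so a larger check would typically narrow the candidates first (for example, only comparing vendors that share a postcode) before scoring.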
Align functionality allows you to quickly and efficiently correct missing or incorrect data in your SAP system. Identify misaligned data records en masse, either manually or as a background process, using thorough yet simple selection criteria. Corrections can then be made to user-specified ranges or groups of records.
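The idea of selecting a range or group of records and correcting a field across all of them at once can be sketched in a few lines of Python. Everything here is hypothetical: the `Material` record, the `base_unit` field, the plant filter and the target value stand in for whatever fields and selection criteria a real SAP correction would use, and this is not how Align itself is implemented.

```python
from dataclasses import dataclass


@dataclass
class Material:
    number: str     # zero-padded, so string comparison follows numeric order
    plant: str
    base_unit: str  # hypothetical field to align; "" means missing


def select(records, number_from, number_to, plant=None):
    """Selection criteria: an inclusive material-number range and an optional plant."""
    return [r for r in records
            if number_from <= r.number <= number_to
            and (plant is None or r.plant == plant)]


def align(records, field, target, dry_run=False, **criteria):
    """Set `field` to `target` on every selected record where it differs.

    Returns the numbers of the records that were (or, with dry_run=True,
    would be) changed, so the list can be reviewed before committing.
    """
    changed = []
    for record in select(records, **criteria):
        if getattr(record, field) != target:
            if not dry_run:
                setattr(record, field, target)
            changed.append(record.number)
    return changed


materials = [
    Material("000100", "1000", ""),    # missing unit, in range: corrected
    Material("000101", "1000", "EA"),  # already aligned
    Material("000102", "2000", "KG"),  # different plant: not selected
    Material("000200", "1000", ""),    # outside the number range
]
print(align(materials, "base_unit", "EA", dry_run=True,
            number_from="000100", number_to="000199", plant="1000"))
print(align(materials, "base_unit", "EA",
            number_from="000100", number_to="000199", plant="1000"))
```

The dry run mirrors the choice between reviewing a correction manually and letting it run as a background process: the same selection produces the same list of records either way.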
A Quick Guide to Data Cleansing
Data Surveys are one of the first steps we take when determining the health and quality of a customer's data.
Here are answers to some of the most common questions we get asked about Data Health and Quality.
Data is gold dust for business decision-making, but like gold, it must be refined, processed and looked after to take its shiny, valuable form. This ‘solid data’ can save a business hundreds of hours of work and launch it ahead of its competitors. However, getting data to a gold standard requires clean data from the outset and throughout its life.
Data cleansing, also referred to as data cleaning or data scrubbing, is a broad term for the process of refining data. It includes finding and removing incorrect, corrupted, poorly formatted, duplicate, or incomplete data within a dataset. The end goal is to make data as accurate as possible and to maintain that quality so it can be used for better strategic decision-making.
Cleaning data is a key part of the data management process and is normally undertaken in the data preparation phase by data quality analysts, engineers or other data management professionals. However, manual data cleansing is often tedious and time-consuming. Modern automation tools and techniques make data cleansing simpler and much more effective.
If you put garbage data in, you get garbage analytics out. It’s a classic saying, but with increasing dependence on data for accurate decision-making and continuous business processes, it’s never been more important. Bad data produces misleading information, which leads to flawed business decisions, misdirected strategies, lost opportunities and operational complications. The knock-on effect ends in increased internal costs and a potential loss of revenue. And this is not a small loss: IBM estimated that data quality issues cost organisations in the U.S. a total of $3.1 trillion in 2016.
There is no ‘one size fits all’ approach to cleansing data. It varies based on the type of information the organisation uses and stores, and on the framework the organisation operates within. That said, the following essential steps can be used as a rough guide to cleaning data:
1. Data survey and inspection: Mass audits are undertaken to identify the current condition and quality level of the data landscape. This is useful for spotting trends in errors and building a picture of what is causing the issues.
2. Remove duplicates: Duplicate data can be just as misleading in decision-making as erroneous data, and it can lead managers to overestimate results. Duplicates commonly occur when multiple sources work on copies of the same data.
3. Fix structural issues: Structural errors can be caused by informal naming conventions, typos, or incorrect capitalisation. They result in mislabelling or fragmentation of datasets. An example of this would be two sets for the same topic, like “Master Data” and “master_data”.
4. Filter unwanted outliers: Improper data entry can produce erroneous values that clearly don’t fit when the data is analysed. Outliers aren’t always incorrect and don’t always need removing, but it’s always worth checking in case human error is to blame.
5. Handle missing data: Missing data is hard to spot; how can you know what’s missing if it isn’t there? Even so, it can’t be written off, because missing data can lead to incorrect results, or block results altogether if the data is rejected by processing software. To resolve this, it can be more effective to remove the affected data altogether, or to fill in the gaps based on research or existing data. Neither approach is 100% reliable, but both are better than leaving the data with missing elements.
6. Validate the work: Once the bulk of the cleaning is done, validation should be undertaken to answer the following questions:
• Does the data make sense?
• Does the data follow the appropriate rules for its field?
• Does it prove or disprove your working theory, or bring any insight to light?
• Can you find trends in the data to help you form your next theory?
• If not, is that because of a data quality issue?
7. Standardise and cement in policy: Once data is clean, it’s essential to keep it that way. New governance policies must be introduced and successfully adopted for this to happen. Adoption can be helped by training colleagues and explaining the importance of proper data entry, and this should continue until proper data entry becomes an organisational habit.
8. Ongoing monitoring: Even when cleaning is done and dusted, errors can still crop up occasionally. It’s important to stay one step ahead of these and resolve them as and when they occur. Keeping on top of this on a regular or ad-hoc basis reduces the need for time-consuming large cleans in the future.
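The core cleansing steps (inspection, de-duplication, structural fixes, outlier checks, missing data and validation) can be sketched as a handful of small Python functions, shown here on a toy list of records. It is a minimal illustration under stated assumptions rather than a recommended implementation: the field names, the label normalisation rule, the fence of 1.5 times the interquartile range, the default unit and the validation rules are all hypothetical choices. In line with step 4, outliers are only flagged for review, not deleted.

```python
import re
from statistics import quantiles

MISSING = (None, "")


def profile(rows, fields):
    """Step 1: count missing values per field to see where problems cluster."""
    return {f: sum(1 for r in rows if r.get(f) in MISSING) for f in fields}


def standardise_label(value):
    """Step 3: map variants such as 'Master Data' and 'master_data' to one form."""
    return re.sub(r"[\s_-]+", "_", str(value).strip()).lower()


def deduplicate(rows, key_fields):
    """Step 2: keep the first row for each normalised key."""
    seen, unique = set(), []
    for r in rows:
        key = tuple(standardise_label(r.get(f, "")) for f in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique


def flag_outliers(rows, field, k=1.5):
    """Step 4: return rows whose value lies outside the IQR fences, for review."""
    values = [r[field] for r in rows if isinstance(r.get(field), (int, float))]
    if len(values) < 4:
        return []  # too few values to estimate quartiles meaningfully
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [r for r in rows
            if isinstance(r.get(field), (int, float)) and not low <= r[field] <= high]


def handle_missing(rows, required, defaults):
    """Step 5: drop rows missing a required field; fill optional gaps from defaults."""
    kept = []
    for r in rows:
        if any(r.get(f) in MISSING for f in required):
            continue
        filled = {f: v for f, v in defaults.items() if r.get(f) in MISSING}
        kept.append({**r, **filled})
    return kept


def validate(rows, rules):
    """Step 6: return (row index, field) pairs that break a field rule."""
    return [(i, f) for i, r in enumerate(rows)
            for f, rule in rules.items() if not rule(r.get(f))]


rows = [
    {"id": "M-1", "category": "Master Data", "weight": 10.0, "unit": "KG"},
    {"id": "m 1", "category": "master_data", "weight": 10.0, "unit": "KG"},
    {"id": "M-2", "category": "master_data", "weight": 12.0, "unit": ""},
    {"id": "M-3", "category": "Transactional", "weight": 11.0, "unit": "KG"},
    {"id": "M-4", "category": "master data", "weight": 9.5, "unit": "KG"},
    {"id": "M-5", "category": "Master Data", "weight": 950.0, "unit": "KG"},
    {"id": "", "category": "Master Data", "weight": 10.5, "unit": "KG"},
]
print("missing:", profile(rows, ["id", "unit"]))
rows = deduplicate(rows, ["id"])
for r in rows:
    r["category"] = standardise_label(r["category"])
print("outliers to review:", [r["id"] for r in flag_outliers(rows, "weight")])
# Hypothetical default, e.g. a unit confirmed from existing records.
rows = handle_missing(rows, required=["id"], defaults={"unit": "KG"})
rules = {"unit": lambda v: v in {"KG", "EA"},
         "weight": lambda v: isinstance(v, (int, float)) and 0 < v < 100}
print("rule breaks:", validate(rows, rules))
```

Keeping each step as its own function makes it easy to rerun individual checks later, which is essentially what ongoing monitoring amounts to once the initial clean is done.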
These steps can all be undertaken manually, but they are time-consuming and, frankly, tedious. The repetitive nature of seeking out errors manually can lead to some being missed, preventing the data from being properly cleaned. To address both concerns at once, utilise data cleaning tools such as Ditto and Align, which automate and speed up the data cleaning process.
When completed fully and properly, data cleansing brings numerous benefits, both immediate and feeding through to many different layers of a business. These benefits include:
Better decision-making: High-quality data produces more accurate analytics, and better analytics lead to better decisions. This gives businesses the opportunity to scale and gain a competitive advantage at speed, which is essential in an ever-changing market.
Better operational performance: When high-quality data flows freely around your organisation, organisational processes can run unhindered and optimised. This applies to supply chain issues, stock requests, customer queries and financial documents. In short, tasks get done faster, as soon as they are needed.
Increased use of data: Cleaning data remediates or removes data that would otherwise go unused. Unused data wastes storage and represents a lost virtual asset. When data can be trusted, more of it can be used more often, leading to better overall leverage of data as an asset.
Reduced data costs: When your data landscape is clean, erroneous and redundant data is eliminated, which reduces total storage space, itself a potentially significant cost. Furthermore, IT and data management teams aren’t left having to fix issues as they occur, saving time and money in the long run.
Less-frustrated employees: When data is accurate and flows well, employees can get on with the tasks that matter to them, rather than working around issues or making assumptions that lead to a lower-quality result. Good data quality has a role to play in leaving employees feeling empowered and satisfied in the work they do.
Data cleaning is not an easy job. There are many considerations at every step, with repercussions across the organisation's data landscape, and it has to take into account process complexities that vary from organisation to organisation. The main risks of data cleaning are:
Time-consuming: Depending on the amount of data an organisation uses, data cleaning can take a long time to manually complete.
Remediating all errors: When data is cleaned manually, identifying and remediating every error or issue can be difficult. Human error means that a single error can slip through the net, leaving the potential for unreliable data after the clean.
Deciding how to resolve errors without affecting analytics: Once errors or irrelevant data are identified, the next step is to work out how to remediate them. Should they be removed from the system? Can the data be fixed and kept? Each choice will have an impact on the analytics or systems that the data is attached to.
Sufficient resources and organisational support: Data cleanses take time and effort, and data managers must get the resources they need to undertake a full clean. Plus, because a clean can change and disrupt the data landscape, support and understanding from colleagues are essential.
Remediating inconsistencies across various business units: The same data may be used by various departments, such as sales and marketing, supply chain and manufacturing. This can lead to inconsistencies between departments over the same data. Bridging these to create a master document can be challenging, especially when all user parties need to be satisfied.
The frequency of data cleaning depends on the size of an organisation, how much data it holds and how it uses that data. However, a good rule of thumb is to undertake a routine data clean every three to six months for large organisations and every twelve months for smaller ones.
That said, once an initial deep data clean has been performed, data cleans should become fairly rare if monitoring and compliance are carried out properly. Furthermore, data cleaning automation tools can be deployed to carry out incremental cleans or take care of larger cleans with greater efficiency.
If data isn’t cleaned, it can lead to various negative consequences for businesses. These include:
• Inaccurate, duplicate, or outdated data can result in incorrect insights and decisions, impacting business strategies.
• Poor data quality affects customer satisfaction, as incorrect information leads to communication errors and delays.
• Inadequate data can hamper marketing efforts, targeting the wrong audience and wasting resources.
• Incomplete data affects analytics, hindering accurate performance measurement and forecasting.
• Additionally, data compliance issues may arise, leading to legal consequences and reputational damage.
Overall, not cleaning data reduces efficiency, increases operational costs, and jeopardises business growth, making data cleansing essential for successful and reliable operations.
Automation is fast becoming a common way to increase efficiency and reduce the time spent on data cleanses. It is advisable to keep the approval side of the cleanse manual, as otherwise data that is best kept could be removed. However, automation is a much more reliable way to quickly and accurately identify errors, duplicates and irrelevant data, taking away much of the manual pain of checking data one file at a time.
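One way to combine automated detection with a manual approval step is to have the automation only propose changes and apply nothing until a reviewer signs each one off. The Python sketch below assumes records keyed by ID and two made-up rules (trimming whitespace from names and upper-casing country codes); the `Proposal` class and rule functions are hypothetical and do not describe how Ditto or Align work.

```python
from dataclasses import dataclass


@dataclass
class Proposal:
    record_id: str
    field: str
    old: object
    new: object
    reason: str
    approved: bool = False  # set by a reviewer, never by the automation


def trim_name(rec):
    """Hypothetical rule: names should not carry leading or trailing spaces."""
    name = rec.get("name", "")
    if name != name.strip():
        return "name", name.strip(), "leading/trailing whitespace"
    return None


def upper_country(rec):
    """Hypothetical rule: country codes should be upper case."""
    country = rec.get("country", "")
    if country != country.upper():
        return "country", country.upper(), "country code not upper case"
    return None


def propose_fixes(records, rules):
    """Automated step: run every rule and collect proposals without changing data."""
    proposals = []
    for record_id, rec in records.items():
        for rule in rules:
            fix = rule(rec)
            if fix is not None:
                field, new, reason = fix
                proposals.append(Proposal(record_id, field, rec.get(field), new, reason))
    return proposals


def apply_approved(records, proposals):
    """Manual gate: write back only the proposals a reviewer has approved."""
    applied = 0
    for p in proposals:
        if p.approved:
            records[p.record_id][p.field] = p.new
            applied += 1
    return applied


vendors = {
    "V1": {"name": " Acme Ltd ", "country": "gb"},
    "V2": {"name": "Bolt Supplies", "country": "GB"},
}
proposals = propose_fixes(vendors, [trim_name, upper_country])
for p in proposals:
    print(p.record_id, p.field, repr(p.old), "->", repr(p.new), f"({p.reason})")
proposals[0].approved = True  # the reviewer accepts the name fix only
print("applied:", apply_approved(vendors, proposals))
print(vendors["V1"])
```

Because detection and application are separate functions, the automated part can run as often as needed without risk, while the decision to change or remove data stays with a person.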