Deequ

Use Cases

Validating Customer Data

Ensuring customer data loaded into a CRM system is complete, accurate, and consistent.

Execution steps:

Load customer data into a Spark DataFrame.

Define checks for completeness of required fields (e.g., name, email, address).

Define checks for uniqueness of customer IDs.

Define checks for format validation (e.g., email format, phone number format).

Run the VerificationSuite to identify any violations.

Quarantine the records that violate the checks and fix them before loading the data into the CRM.
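The checks in these steps can be illustrated without a cluster. The following is a minimal plain-Python sketch of what completeness, uniqueness, and format checks compute, not Deequ's own API; in Deequ the corresponding constraints would be declared on a Check (for example with isComplete, isUnique, and hasPattern) and evaluated against a Spark DataFrame. The column names, sample rows, and the simplified email pattern are assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical sample rows standing in for a customer DataFrame.
customers = [
    {"customer_id": 1, "name": "Ada", "email": "ada@example.com"},
    {"customer_id": 2, "name": None, "email": "bob@example.com"},
    {"customer_id": 2, "name": "Cy", "email": "not-an-email"},
]

REQUIRED = ("name", "email")
# Deliberately simple pattern; real email validation is stricter.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def completeness(rows, column):
    """Fraction of rows in which `column` is not None."""
    return sum(r[column] is not None for r in rows) / len(rows)


def duplicated_values(rows, column):
    """Set of values that appear more than once in `column`."""
    counts = Counter(r[column] for r in rows)
    return {value for value, n in counts.items() if n > 1}


def is_valid(row, duplicate_ids):
    """True if the row passes completeness, format, and uniqueness checks."""
    return (
        all(row[c] is not None for c in REQUIRED)
        and EMAIL_RE.match(row["email"]) is not None
        and row["customer_id"] not in duplicate_ids
    )


dupes = duplicated_values(customers, "customer_id")
clean = [r for r in customers if is_valid(r, dupes)]
quarantined = [r for r in customers if not is_valid(r, dupes)]
```

Here only the first row ends up in clean; the other two are quarantined, one for a missing name and a duplicated ID, the other for a malformed email and the same duplicated ID.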

Monitoring Log Data Quality

Detecting anomalies and errors in log data ingested into a central logging system.

Execution steps:

Load log data into a Spark DataFrame.

Define checks for expected log message formats.

Define checks for the presence of required fields (e.g., timestamp, log level, message).

Define checks for detecting unusual log patterns or error rates.

Run the VerificationSuite to identify any anomalies.

Trigger alerts or automated remediation based on the detected issues.
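As a rough illustration of the steps above, the sketch below parses log lines against an expected format, counts malformed lines, and flags a batch whose error rate is far above a historical baseline. It is plain Python rather than Deequ's anomaly detection API, and the log format, the sample lines, the history values, and the max_ratio threshold are all assumptions.

```python
import re

# Assumed log format: ISO-like timestamp, level, free-text message.
LOG_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z) "
    r"(?P<level>DEBUG|INFO|WARN|ERROR) "
    r"(?P<msg>.+)$"
)


def summarize(lines):
    """Return (error_rate, malformed_count) for a batch of log lines."""
    matches = [LOG_RE.match(line) for line in lines]
    malformed = sum(m is None for m in matches)
    errors = sum(m is not None and m.group("level") == "ERROR" for m in matches)
    return errors / len(lines), malformed


def is_anomalous(current_rate, history, max_ratio=2.0):
    """Flag the batch if its error rate exceeds max_ratio times the mean of past rates."""
    if not history:
        return False  # no baseline yet, so nothing to compare against
    baseline = sum(history) / len(history)
    return current_rate > baseline * max_ratio


# Hypothetical batch and hypothetical error rates from earlier batches.
batch = [
    "2024-01-01T00:00:00Z INFO service started",
    "2024-01-01T00:00:01Z ERROR disk full",
    "2024-01-01T00:00:02Z ERROR disk full",
    "garbage line without structure",
]
history = [0.05, 0.10, 0.06]

rate, malformed = summarize(batch)
alert = malformed > 0 or is_anomalous(rate, history)
```

A fixed ratio against the mean is the simplest possible baseline; a production setup would more likely store metrics per run and compare against a rate-of-change or seasonal strategy.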

Validating Financial Transaction Data

Ensuring financial transaction data used for reporting and analysis is accurate and reliable.

Execution steps:

Load financial transaction data into a Spark DataFrame.

Define checks for data accuracy (e.g., ensuring amount values are within expected ranges).

Define checks for completeness of required fields (e.g., transaction ID, account number, date).

Define checks for consistency between related data elements.

Run the VerificationSuite to identify any data errors.

Investigate and correct any detected errors before generating financial reports.
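The three kinds of checks listed above (range, completeness, cross-field consistency) can be sketched in plain Python as follows. This is an illustration of the logic only, not Deequ code; the field names, the amount limits, and the rule that net plus tax must equal gross are assumptions chosen for the example. Decimal is used so that the equality check is not affected by floating-point rounding.

```python
from decimal import Decimal

REQUIRED = ("transaction_id", "account", "date")

# Hypothetical transactions; the second and third are deliberately faulty.
transactions = [
    {"transaction_id": "t1", "account": "A-1", "date": "2024-03-01",
     "net": Decimal("100.00"), "tax": Decimal("20.00"), "gross": Decimal("120.00")},
    {"transaction_id": "t2", "account": "A-2", "date": "2024-03-01",
     "net": Decimal("50.00"), "tax": Decimal("10.00"), "gross": Decimal("65.00")},
    {"transaction_id": "t3", "account": None, "date": "2024-03-02",
     "net": Decimal("-5.00"), "tax": Decimal("0.00"), "gross": Decimal("-5.00")},
]


def violations(row, min_amount=Decimal("0"), max_amount=Decimal("1000000")):
    """Return a list of human-readable problems found in one transaction."""
    problems = []
    for field in REQUIRED:
        if row.get(field) is None:
            problems.append(f"missing {field}")
    if not (min_amount <= row["gross"] <= max_amount):
        problems.append("gross out of range")
    if row["net"] + row["tax"] != row["gross"]:
        problems.append("net + tax != gross")
    return problems


report = {row["transaction_id"]: violations(row) for row in transactions}
needs_review = [tid for tid, problems in report.items() if problems]
```

Transactions listed in needs_review would be held back for investigation before any report is generated.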

Data Quality Monitoring for Machine Learning Pipelines

Preventing data quality issues from negatively impacting machine learning model performance.

Execution steps:

Load training data into a Spark DataFrame.

Define checks for data completeness, accuracy, and consistency.

Define checks for feature distributions and statistical properties.

Run the VerificationSuite before training the model.

Quarantine and fix any data quality issues.

Train (or retrain) the model on the validated data.
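A check on feature distributions can be as simple as profiling a feature and comparing it with a reference profile from an earlier, trusted dataset. The sketch below does this in plain Python; it is not Deequ's profiling API, and the sample values, the choice of mean and standard deviation as the profile, and the three-standard-deviation threshold are assumptions.

```python
from statistics import mean, pstdev


def profile(values):
    """Completeness, mean, and population standard deviation of one feature."""
    present = [v for v in values if v is not None]
    return {
        "completeness": len(present) / len(values),
        "mean": mean(present),
        "stddev": pstdev(present),
    }


def mean_shift(reference, current):
    """Shift of the current mean, measured in reference standard deviations."""
    diff = abs(current["mean"] - reference["mean"])
    if reference["stddev"] == 0:
        return 0.0 if diff == 0 else float("inf")
    return diff / reference["stddev"]


# Hypothetical feature values: a trusted reference set and a new training batch.
reference_values = [10, 12, 11, 13, 14]
new_values = [20, 22, None, 21, 23]

reference = profile(reference_values)
current = profile(new_values)

# Assumed gate: block training if the feature is incomplete or has drifted.
blocked = current["completeness"] < 1.0 or mean_shift(reference, current) > 3.0
```

In this example the new batch is blocked on both counts: one value is missing and the mean has moved well beyond three reference standard deviations.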

Data Migration Validation

Ensuring data is migrated correctly and completely from an old system to a new system.

Execution steps:

Load data from the source and target systems into separate Spark DataFrames.

Define checks for data completeness (e.g., record counts).

Define checks for data consistency between corresponding fields in the source and target systems.

Run the VerificationSuite to identify any discrepancies.

Investigate and reconcile any data migration issues.
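The reconciliation described in these steps can be sketched as a keyed comparison of the two datasets. The following plain-Python example is an illustration only, not Deequ code; the key column, the field names, and the sample records are assumptions. It also shows why a record-count check alone is not enough: both sides contain three records, yet the migration is not correct.

```python
def reconcile(source, target, key):
    """Compare two lists of records by `key` and report discrepancies."""
    src = {row[key]: row for row in source}
    tgt = {row[key]: row for row in target}
    shared = src.keys() & tgt.keys()
    return {
        "source_count": len(source),
        "target_count": len(target),
        "missing_in_target": sorted(src.keys() - tgt.keys()),
        "unexpected_in_target": sorted(tgt.keys() - src.keys()),
        "mismatched": sorted(k for k in shared if src[k] != tgt[k]),
    }


# Hypothetical extracts from the old and the new system.
source_rows = [
    {"id": 1, "email": "ada@example.com"},
    {"id": 2, "email": "bob@example.com"},
    {"id": 3, "email": "cy@example.com"},
]
target_rows = [
    {"id": 1, "email": "ada@example.com"},
    {"id": 2, "email": "bob@example.org"},  # value changed during migration
    {"id": 4, "email": "dee@example.com"},  # record not present in the source
]

result = reconcile(source_rows, target_rows, key="id")
```

The sketch assumes the key is unique on each side; duplicate keys would silently overwrite one another in the dictionaries, so a uniqueness check should run first.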


Key Features

Metrics Repository

Data Profiling

Anomaly Detection

Automatic Constraint Suggestion

Incremental Metrics Computation

