Data validation as a service – accelerating client data onboarding

Duration: ~1 month

Customer location: US

Client: B2B SaaS platform

Industry: Fintech

Services: AI/ML ● Data Services

Tech stack: AWS ● Django ● Python

Dedicated team behind the project

Daniel

Tech Lead

Ali

Senior Software Data Engineer

Alex

Senior Software Engineer

Yuliia

Customer Success Manager

The client

WayThru is a US-based multi-tenant fintech platform that helps debt management companies operate more efficiently by scheduling debt payments through various payment providers. The company’s ethos is rooted in humanity, promoting a modern and empathetic approach to debt recovery.

As a mid-sized SaaS platform processing sensitive financial data for multiple customers, WayThru needed to replace manual client onboarding with an automated pipeline capable of handling diverse file formats, large data volumes, and strict compliance requirements.

The challenge

The client’s onboarding process relied heavily on manual data imports and validations. Each new customer provided files in different formats, often with inconsistent column naming, missing values, or schema mismatches.

This led to:

Onboarding delays of weeks due to manual validation, scripting, and continuous back-and-forth communication with clients
High error risk when dealing with 100k–400k+ row CSV/Excel files
Inconsistent data quality, including incorrect types, missing mandatory fields, and encoding issues
Difficulty supporting complex data formats (XML, JSON, CSV, Excel) and meeting security or compliance requirements

What was done

Ralabs developed and delivered a data and structure validation solution tailored to the client’s needs. The system automated file ingestion, schema mapping, data validation, and reporting.

Key steps included:

Designing a data validation pipeline with configurable schema mapping per client

Building an admin and user portal to support role-based permissions for data uploads and validations

Integrating AI-powered semantic column matching using OpenAI to resolve inconsistent header naming

Implementing data type validation and lightweight type conversion (e.g., parsing dates, coercing text to numeric values)

Adding automated reporting and an audit trail with exportable validation reports

Testing performance on large files (50–70MB, hundreds of thousands of rows) to ensure scalability

Preparing deployment for isolated server environments, with minimal infrastructure requirements

Implemented features:

CSV/Excel upload interface

Admins and end users can upload structured data files directly through a simple interface, eliminating manual file handoffs.

Schema management per tenant

Each client’s dataset is validated against its own configurable schema, ensuring onboarding stays consistent across tenants.
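As a rough illustration of what a per-tenant schema could look like, here is a minimal Python sketch. The names `ColumnSpec`, `TenantSchema`, `SCHEMAS`, and `schema_for`, as well as the tenant IDs and column names, are hypothetical and are not taken from the delivered system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnSpec:
    """One expected column: its canonical name, type label, and whether it is mandatory."""
    name: str
    dtype: str  # illustrative labels: "str", "int", "decimal", "date"
    required: bool = True


@dataclass(frozen=True)
class TenantSchema:
    """The set of columns a given tenant's files are validated against."""
    tenant_id: str
    columns: tuple

    def required_names(self):
        return {c.name for c in self.columns if c.required}


# Hypothetical registry; a real deployment would more likely load this from a database.
SCHEMAS = {
    "tenant_a": TenantSchema("tenant_a", (
        ColumnSpec("account_id", "int"),
        ColumnSpec("balance", "decimal"),
        ColumnSpec("due_date", "date"),
        ColumnSpec("notes", "str", required=False),
    )),
    "tenant_b": TenantSchema("tenant_b", (
        ColumnSpec("debtor_ref", "str"),
        ColumnSpec("amount_due", "decimal"),
    )),
}


def schema_for(tenant_id):
    """Look up a tenant's schema, failing loudly for unknown tenants."""
    try:
        return SCHEMAS[tenant_id]
    except KeyError:
        raise LookupError(f"no schema configured for tenant {tenant_id!r}") from None
```

Keeping each tenant's schema as data rather than code means a new client can be onboarded by adding a configuration entry instead of writing a new import script.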

Column header and content matching with AI

The system combines exact matching with OpenAI-powered semantic matching to resolve inconsistent column names.
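The two-stage idea (exact match first, a smarter fallback second) can be sketched as follows. This is an assumption-laden illustration: the standard-library `difflib` stands in for the OpenAI-powered semantic step, and `normalize`, `fuzzy_fallback`, and `match_headers` are hypothetical names, not the project's actual functions.

```python
import difflib


def normalize(header):
    """Lowercase and drop everything except letters and digits."""
    return "".join(ch for ch in header.lower() if ch.isalnum())


def fuzzy_fallback(header, candidates):
    """Stand-in for the semantic matcher: closest string match, or None."""
    by_norm = {normalize(c): c for c in candidates}
    hits = difflib.get_close_matches(normalize(header), list(by_norm), n=1, cutoff=0.8)
    return by_norm[hits[0]] if hits else None


def match_headers(file_headers, schema_columns, fallback=fuzzy_fallback):
    """Map each file header to a schema column, or to None if nothing fits."""
    exact = {normalize(c): c for c in schema_columns}
    mapping = {}
    for header in file_headers:
        target = exact.get(normalize(header))
        if target is None:
            # Only offer the fallback columns that are not yet taken.
            remaining = [c for c in schema_columns if c not in mapping.values()]
            target = fallback(header, remaining)
        mapping[header] = target
    return mapping


mapping = match_headers(
    ["Account ID", "Ballance", "Notes"],
    ["account_id", "balance", "due_date"],
)
# mapping == {"Account ID": "account_id", "Ballance": "balance", "Notes": None}
```

Passing the fallback in as a parameter keeps the exact-match path cheap and deterministic, and lets a model-backed matcher be swapped in only for the headers that need it.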

Missing and extra column detection

Automatically identifies required fields that are missing and flags surplus columns that can be ignored or mapped.
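Once headers are mapped, missing and surplus columns reduce to a set comparison. A minimal sketch, assuming a header mapping shaped like `{file_header: schema_column_or_None}` (the function name `diff_columns` is hypothetical):

```python
def diff_columns(mapping, required):
    """Return (missing required schema columns, unmapped file headers)."""
    present = {target for target in mapping.values() if target is not None}
    missing = sorted(set(required) - present)
    extra = sorted(header for header, target in mapping.items() if target is None)
    return missing, extra


missing, extra = diff_columns(
    {"Account ID": "account_id", "Balance": "balance", "Internal Memo": None},
    required={"account_id", "balance", "due_date"},
)
# missing == ["due_date"], extra == ["Internal Memo"]
```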

Data quality checks

Validates column values against expected types and ranges, and can enforce referential rules to catch logical inconsistencies.
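The type-checking part of such a step might look like the sketch below, which covers lightweight coercion only (range and referential rules are left out). The accepted date formats, the type labels, and the names `CONVERTERS` and `validate_row` are assumptions made for illustration.

```python
from datetime import date, datetime
from decimal import Decimal, InvalidOperation

DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")  # assumed input formats


def to_date(text):
    """Try each accepted format in turn; raise ValueError if none fits."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {text!r}")


def to_decimal(text):
    """Strip thousands separators and a leading dollar sign, then parse."""
    cleaned = text.replace(",", "").lstrip("$")
    try:
        return Decimal(cleaned)
    except InvalidOperation:
        raise ValueError(f"not a number: {text!r}") from None


CONVERTERS = {"int": int, "decimal": to_decimal, "date": to_date, "str": str}


def validate_row(row, types, row_number):
    """Coerce one row; return (clean values, list of (row, column, message) errors)."""
    clean, errors = {}, []
    for column, dtype in types.items():
        raw = (row.get(column) or "").strip()
        if not raw:
            errors.append((row_number, column, "missing value"))
            continue
        try:
            clean[column] = CONVERTERS[dtype](raw)
        except ValueError as exc:
            errors.append((row_number, column, str(exc)))
    return clean, errors


TYPES = {"account_id": "int", "balance": "decimal", "due_date": "date"}
good, good_errors = validate_row(
    {"account_id": "1001", "balance": "$1,250.50", "due_date": "03/15/2025"}, TYPES, 2
)
bad, bad_errors = validate_row(
    {"account_id": "x1", "balance": "", "due_date": "2025-13-01"}, TYPES, 3
)
```

Collecting errors per row instead of raising on the first failure lets one pass over the file produce a complete report.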

Validation thresholds and error dashboard

Configurable controls define how strict the validation should be, with reporting dashboards that show mismatched fields and error rates.
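One plausible way to express a strictness threshold is as a maximum share of rows with errors. The sketch below assumes errors arrive as `(row_number, column, message)` tuples; the `summarize` function and the 2% default are illustrative, not the project's actual settings.

```python
from collections import Counter


def summarize(errors, total_rows, max_error_rate=0.02):
    """Aggregate row-level errors and decide whether the file passes."""
    bad_rows = {row for row, _column, _message in errors}
    error_rate = len(bad_rows) / total_rows if total_rows else 0.0
    return {
        "error_rate": error_rate,
        "errors_by_column": dict(Counter(column for _row, column, _message in errors)),
        "passed": error_rate <= max_error_rate,
    }


errors = [
    (2, "balance", "not a number"),
    (2, "due_date", "unrecognized date"),
    (5, "balance", "missing value"),
]
lenient = summarize(errors, total_rows=100, max_error_rate=0.05)
strict = summarize(errors, total_rows=100, max_error_rate=0.01)
```

Here two of 100 rows contain errors, so the file passes under the 5% threshold and fails under the 1% one, while the per-column counts feed the kind of breakdown a dashboard would display.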

Role-based access

Admins can manage schemas and imports, while regular users upload files and view reports in a limited scope.
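A role-to-permission map is one simple way to model this split. The roles follow the description above, but the action names and the `can` helper are hypothetical; in a Django project the same rules would more likely be expressed with the framework's built-in groups and permissions.

```python
from enum import Enum


class Role(Enum):
    ADMIN = "admin"
    USER = "user"


# Hypothetical action names, chosen only to mirror the description above.
PERMISSIONS = {
    Role.ADMIN: {"manage_schemas", "manage_imports", "upload_file", "view_all_reports"},
    Role.USER: {"upload_file", "view_own_reports"},
}


def can(role, action):
    """Return True if the role is allowed to perform the action."""
    return action in PERMISSIONS.get(role, set())
```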

Exportable validation reports

Validation results can be exported in Markdown or PDF for audit, QA, or client communication.
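For the Markdown variant, a report can be assembled with plain string formatting. This is a minimal sketch assuming errors are `(row_number, column, message)` tuples; the layout and the name `render_markdown_report` are illustrative, and PDF export is not shown.

```python
def render_markdown_report(file_name, total_rows, errors):
    """Build a Markdown validation report as a single string."""
    lines = [
        f"# Validation report: {file_name}",
        "",
        f"Rows checked: {total_rows}",
        f"Issues found: {len(errors)}",
        "",
    ]
    if errors:
        lines += ["| Row | Column | Issue |", "| --- | --- | --- |"]
        for row, column, message in errors:
            safe = message.replace("|", "\\|")  # keep table cells intact
            lines.append(f"| {row} | {column} | {safe} |")
    else:
        lines.append("No issues found.")
    return "\n".join(lines) + "\n"


report = render_markdown_report(
    "accounts.csv", 3, [(2, "balance", "not a number: 'abc'")]
)
```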

Results

Built a working prototype in 2–3 weeks

A fully working prototype was created in just 2–3 weeks, enabling early testing and faster client feedback.

Validated CSV files up to 400k rows (50–70MB)

The prototype processed CSV files up to 400k rows (50–70MB) while maintaining accuracy in column mapping and validation.
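Files of this size are usually handled by streaming rows rather than loading the whole file into memory. The sketch below shows the general approach with the standard `csv` module; it is not a description of the delivered implementation, and `count_blank_required` is a hypothetical helper.

```python
import csv
import io


def count_blank_required(stream, required):
    """Read a CSV row by row, counting rows and blank values in required columns."""
    reader = csv.DictReader(stream)
    total = 0
    blanks = {column: 0 for column in required}
    for row in reader:  # only one row is held in memory at a time
        total += 1
        for column in required:
            if not (row.get(column) or "").strip():
                blanks[column] += 1
    return total, blanks


# In practice the stream would be an open file, e.g. open(path, newline="", encoding="utf-8").
sample = io.StringIO("account_id,balance\n1,10\n2,\n,5\n")
total, blanks = count_blank_required(sample, ["account_id", "balance"])
# total == 3, blanks == {"account_id": 1, "balance": 1}
```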

Cut onboarding time from weeks to hours

Automated validation reduced manual file checks and cut onboarding time from weeks of scripting and corrections to hours.

Improved data quality with automatic error detection

The system flagged corrupted values, missing mandatory fields, and schema mismatches automatically, improving consistency.

Self-service onboarding

Clients can upload files and receive immediate validation feedback without waiting for engineering support.

Scalable foundation

The delivered pipeline supports multi-tenant onboarding and leaves room for future AI-driven anomaly detection and enrichment.

Previously, onboarding a new client’s data file required days of manual review — validating fields, catching formatting errors, and going back and forth with the client on adjustments. With the ETL tool, a preliminary file review now takes 30 minutes or less. It flags missing headers, surfaces data quality issues before ingestion, and speeds up every re-submission cycle.

Thomas Kersting
Technical Director at WayThru Innovations