Data validation as a service – accelerating client data onboarding

Duration: ~1 month

Customer location: The US

Client: B2B SaaS platform
Industry: Fintech
Services: AI/ML, Data Services
Tech stack: AWS, Django, Python

The client

WayThru is a US-based multi-tenant fintech platform that helps debt management companies operate more efficiently by scheduling debts through various payment providers. The company’s ethos is rooted in humanity, promoting a modern and empathetic approach to debt recovery.

As a mid-sized SaaS platform processing sensitive financial data for multiple customers, WayThru needed to replace manual client onboarding with an automated pipeline capable of handling diverse file formats, large data volumes, and strict compliance requirements.

The challenge

The client’s onboarding process relied heavily on manual data imports and validations. Each new customer provided files in different formats, often with inconsistent column naming, missing values, or schema mismatches.

This led to:

  • Onboarding delays of weeks due to manual validation, scripting, and continuous back-and-forth communication with clients
  • High error risk when dealing with 100k–400k+ row CSV/Excel files
  • Inconsistent data quality, including incorrect types, missing mandatory fields, and encoding issues
  • Difficulty supporting complex data formats (XML, JSON, CSV, Excel) and meeting security or compliance requirements

What was done

Ralabs developed and delivered a data and structure validation solution tailored to the client’s needs. The system automated file ingestion, schema mapping, data validation, and reporting.

Key steps included:

  • Designing a data validation pipeline with configurable schema mapping per client
  • Building an admin and user portal to support role-based permissions for data uploads and validations
  • Integrating AI-powered semantic column matching using OpenAI to resolve inconsistent header naming
  • Implementing data type validation and lightweight type conversion (e.g., parsing dates, coercing text to numeric values)
  • Adding automated reporting and an audit trail with exportable validation reports
  • Testing performance on large files (50–70MB, up to millions of rows) to ensure scalability
  • Preparing deployment for isolated server environments, with minimal infrastructure requirements
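As an illustration of the lightweight type conversion step, here is a minimal sketch in Python. It is not the delivered implementation: the function names, the list of accepted date formats, and the handling of thousands separators are all assumptions made for the example.

```python
from datetime import date, datetime
from typing import Optional

# Assumed set of accepted date formats; a real deployment would configure this per client.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d.%m.%Y")


def parse_date(text: str) -> Optional[date]:
    """Try each known format in order; return None when none of them matches."""
    cleaned = text.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date()
        except ValueError:
            continue
    return None


def to_number(text: str) -> Optional[float]:
    """Coerce text such as '1,250.50' or ' 42 ' to a float; return None on failure."""
    cleaned = text.strip().replace(",", "")  # assumes ',' is a thousands separator
    if not cleaned:
        return None
    try:
        return float(cleaned)
    except ValueError:
        return None
```

Returning None instead of raising lets the caller count failed conversions per column and report them, rather than stopping at the first bad cell.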

Implemented features:

01

CSV/Excel upload interface

Admins and end users can upload structured data files directly through a simple interface, eliminating manual file handoffs.

02

Schema management per tenant

Each client’s dataset is validated against its own configurable schema, ensuring onboarding stays consistent across tenants.
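A per-tenant schema could be modelled roughly as follows. This is a sketch under assumptions: the class names, the tenant identifiers, the column names, and the in-memory registry are hypothetical, and the real system may store schemas differently (for example in a database).

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ColumnSpec:
    name: str
    dtype: str  # assumed vocabulary: "string", "number", or "date"
    required: bool = True


@dataclass
class TenantSchema:
    tenant_id: str
    columns: list[ColumnSpec] = field(default_factory=list)

    def required_columns(self) -> set[str]:
        """Names of the columns a file must contain for this tenant."""
        return {c.name for c in self.columns if c.required}


# Hypothetical registry keyed by tenant; stands in for whatever storage is actually used.
SCHEMAS = {
    "tenant_a": TenantSchema("tenant_a", [
        ColumnSpec("account_id", "string"),
        ColumnSpec("balance", "number"),
        ColumnSpec("due_date", "date", required=False),
    ]),
    "tenant_b": TenantSchema("tenant_b", [
        ColumnSpec("debtor_ref", "string"),
        ColumnSpec("amount", "number"),
    ]),
}


def schema_for(tenant_id: str) -> TenantSchema:
    """Look up the schema for a tenant, failing loudly if none is configured."""
    try:
        return SCHEMAS[tenant_id]
    except KeyError:
        raise LookupError(f"No schema configured for tenant {tenant_id!r}") from None
```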

03

Column header and content matching with AI

The system combines exact matching with OpenAI-powered semantic matching to resolve inconsistent column names.
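The two-stage matching idea can be sketched as below. The delivered system uses OpenAI for the semantic stage; to keep this example self-contained and runnable offline, the fallback here is a simple string-similarity stand-in built on Python's difflib, which is an assumption and not the project's method. All function names are hypothetical, and any callable with the same signature (including one that calls a language model) could be passed as the fallback.

```python
import difflib
import re
from typing import Callable, Optional


def normalize(header: str) -> str:
    """Lowercase and collapse runs of non-alphanumerics to '_' ('Account ID' -> 'account_id')."""
    return re.sub(r"[^a-z0-9]+", "_", header.strip().lower()).strip("_")


def fuzzy_fallback(header: str, candidates: list[str]) -> Optional[str]:
    """Stand-in for the semantic stage: closest candidate by string similarity, or None."""
    hits = difflib.get_close_matches(normalize(header), candidates, n=1, cutoff=0.8)
    return hits[0] if hits else None


def match_headers(
    file_headers: list[str],
    schema_columns: list[str],
    fallback: Callable[[str, list[str]], Optional[str]] = fuzzy_fallback,
) -> dict[str, Optional[str]]:
    """Map each file header to a schema column: exact match first, fallback second."""
    by_norm = {normalize(c): c for c in schema_columns}
    mapping: dict[str, Optional[str]] = {}
    for header in file_headers:
        exact = by_norm.get(normalize(header))
        if exact is not None:
            mapping[header] = exact
            continue
        # Only offer schema columns that have not been claimed by an earlier header.
        remaining = [c for c in schema_columns if c not in mapping.values()]
        mapping[header] = fallback(header, remaining)
    return mapping
```

Headers that map to None are the ones a reviewer would need to resolve by hand or mark as ignorable.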

04

Missing and extra column detection

Automatically identifies required fields that are missing and flags surplus columns that can be ignored or mapped.
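Missing and extra column detection reduces to comparing the mapped headers with the schema. A minimal sketch, assuming headers have already been mapped to schema names; the function and type names are hypothetical:

```python
from typing import NamedTuple, Sequence


class ColumnDiff(NamedTuple):
    missing: list[str]  # required by the schema but absent from the file
    extra: list[str]    # present in the file but unknown to the schema


def diff_columns(
    file_headers: Sequence[str],
    required: Sequence[str],
    optional: Sequence[str] = (),
) -> ColumnDiff:
    """Compare file headers with the schema, preserving the original ordering."""
    present = set(file_headers)
    known = set(required) | set(optional)
    missing = [c for c in required if c not in present]
    extra = [h for h in file_headers if h not in known]
    return ColumnDiff(missing, extra)
```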

05

Data quality checks

Validates column values against expected types and ranges, and can enforce referential rules to catch logical inconsistencies.
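A type and range check of this kind might look like the following sketch. It covers only numeric columns and does not attempt the referential rules mentioned above; the rule class, the function name, and the error message wording are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NumberRule:
    column: str
    minimum: Optional[float] = None
    maximum: Optional[float] = None


def check_rows(rows: list[dict], rules: list[NumberRule]) -> list[str]:
    """Return one human-readable message per violated rule; row numbers start at 1."""
    errors: list[str] = []
    for index, row in enumerate(rows, start=1):
        for rule in rules:
            raw = row.get(rule.column, "")
            try:
                value = float(raw)
            except (TypeError, ValueError):
                errors.append(f"row {index}: {rule.column}={raw!r} is not a number")
                continue
            if rule.minimum is not None and value < rule.minimum:
                errors.append(f"row {index}: {rule.column}={value} is below {rule.minimum}")
            if rule.maximum is not None and value > rule.maximum:
                errors.append(f"row {index}: {rule.column}={value} is above {rule.maximum}")
    return errors
```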

06

Validation thresholds and error dashboard

Configurable controls define how strict the validation should be, with reporting dashboards that show mismatched fields and error rates.
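One plausible way to express a configurable strictness threshold is a maximum error rate per column. The sketch below assumes that interpretation; the function name, the 2% default, and the per-column granularity are hypothetical and not taken from the delivered system.

```python
def evaluate(
    total_rows: int,
    errors_by_column: dict[str, int],
    max_error_rate: float = 0.02,  # assumed default: reject above 2% errors in any column
) -> tuple[bool, dict[str, float]]:
    """Return (accepted, per-column error rates) for one uploaded file."""
    if total_rows <= 0:
        raise ValueError("total_rows must be positive")
    rates = {column: count / total_rows for column, count in errors_by_column.items()}
    accepted = all(rate <= max_error_rate for rate in rates.values())
    return accepted, rates
```

The returned rates are the kind of figure an error dashboard could display next to each column, with the boolean deciding whether the import proceeds.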

07

Role-based access

Admins can manage schemas and imports, while regular users upload files and view reports in a limited scope.

08

Exportable validation reports

Validation results can be exported in Markdown or PDF for audit, QA, or client communication.
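As a rough idea of what a Markdown export could contain, the sketch below renders per-column error counts as a Markdown table. The report layout, the function name, and the sample file name in the usage note are assumptions; PDF export is not covered.

```python
def render_markdown_report(
    file_name: str,
    total_rows: int,
    errors_by_column: dict[str, int],
) -> str:
    """Build a small Markdown report with one table row per column, sorted by column name."""
    lines = [
        f"# Validation report: {file_name}",
        "",
        f"Rows checked: {total_rows}",
        "",
        "| Column | Errors | Error rate |",
        "| --- | ---: | ---: |",
    ]
    for column, count in sorted(errors_by_column.items()):
        rate = count / total_rows if total_rows else 0.0
        lines.append(f"| {column} | {count} | {rate:.2%} |")
    return "\n".join(lines) + "\n"
```

Calling render_markdown_report("debts.csv", 1000, {"due_date": 50, "balance": 5}) produces a table listing balance before due_date, each with its count and percentage.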

Results

01

Built a working prototype in 2–3 weeks

The prototype was ready within 2–3 weeks, enabling early testing and faster client feedback.

02

Validated CSV files up to 400k rows (50–70MB)

The prototype processed CSV files up to 400k rows (50–70MB) while maintaining accuracy in column mapping and validation.

03

Cut onboarding time from weeks to hours

Automated validation reduced manual file checks and cut onboarding time from weeks of scripting and corrections to hours.

04

Improved data quality with automatic error detection

The system flagged corrupted values, missing mandatory fields, and schema mismatches automatically, improving consistency.

05

Self-service onboarding

Clients can upload files and receive immediate validation feedback without waiting for engineering support.

06

Scalable foundation

The delivered pipeline supports multi-tenant onboarding and leaves room for future AI-driven anomaly detection and enrichment.
