Data validation as a service – accelerating client data onboarding
Customer location: The US
Dedicated team behind the project
Tech Lead
Senior Software Data Engineer
Senior Software Engineer
Customer Success Manager
The client
WayThru is a US-based, multi-tenant fintech platform that helps debt management companies operate more efficiently by scheduling debts through various payment providers. The company’s ethos is rooted in humanity, promoting a modern and empathetic approach to debt recovery.
As a mid-sized SaaS platform processing sensitive financial data for multiple customers, WayThru needed to replace manual client onboarding with an automated pipeline capable of handling diverse file formats, large data volumes, and strict compliance requirements.
The challenge
The client’s onboarding process relied heavily on manual data imports and validations. Each new customer provided files in different formats, often with inconsistent column naming, missing values, or schema mismatches.
This led to:
- Onboarding delays of weeks due to manual validation, scripting, and continuous back-and-forth communication with clients
- High error risk when dealing with 100k–400k+ row CSV/Excel files
- Inconsistent data quality, including incorrect types, missing mandatory fields, and encoding issues
- Difficulty supporting complex data formats (XML, JSON, CSV, Excel) and meeting security or compliance requirements
What was done
Ralabs developed and delivered a data and structure validation solution tailored to the client’s needs. The system automated file ingestion, schema mapping, data validation, and reporting.
The work included:
- Designing a data validation pipeline with configurable schema mapping per client
- Building an admin and user portal to support role-based permissions for data uploads and validations
- Integrating AI-powered semantic column matching using OpenAI to resolve inconsistent header naming
- Implementing data type validation and lightweight type conversion (e.g., parsing dates, coercing text to numeric values)
- Adding automated reporting and audit trail with exportable validation reports
- Testing performance on large files (50–70MB, up to millions of rows) to ensure scalability
- Preparing deployment for isolated server environments, with minimal infrastructure requirements
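To illustrate the lightweight type conversion mentioned in the list above, here is a minimal Python sketch. It is not the delivered implementation: the function names `parse_date` and `coerce_number` and the list of accepted date layouts are assumptions made for this example.

```python
from datetime import date, datetime
from typing import Optional

# Hypothetical set of accepted date layouts; a real deployment would
# likely configure these per client.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d.%m.%Y")


def parse_date(text: str) -> Optional[date]:
    """Try each known layout in turn; return None when nothing matches."""
    cleaned = text.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date()
        except ValueError:
            continue
    return None


def coerce_number(text: str) -> Optional[float]:
    """Strip common currency decoration and convert to float; None on failure."""
    cleaned = text.strip().replace("$", "").replace(",", "")
    if not cleaned:
        return None
    try:
        return float(cleaned)
    except ValueError:
        return None
```

Returning `None` instead of raising lets a caller record the failed cell as a validation error and keep processing the rest of the file.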
Implemented features:
CSV/Excel upload interface
Admins and end users can upload structured data files directly through a simple interface, eliminating manual file handoffs.
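Uploaded CSV files often arrive in different encodings, which is one of the data quality issues described earlier. The sketch below shows one possible way to decode an upload and read its rows with the Python standard library. The function names and the encoding order are assumptions for this example, and Excel files would need a separate reader.

```python
import csv
import io


def decode_upload(data: bytes, encodings=("utf-8-sig", "cp1252")) -> str:
    """Decode raw upload bytes, trying each configured encoding in order."""
    for encoding in encodings:
        try:
            return data.decode(encoding)
        except UnicodeDecodeError:
            continue
    raise ValueError("could not decode upload with any configured encoding")


def read_rows(data: bytes):
    """Return (headers, rows) where each row is a dict keyed by header."""
    text = decode_upload(data)
    reader = csv.DictReader(io.StringIO(text, newline=""))
    headers = reader.fieldnames or []
    return headers, list(reader)
```

For files in the hundreds of thousands of rows, iterating over the reader instead of calling `list(reader)` would keep memory use flat.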
Schema management per tenant
Each client’s dataset is validated against its own configurable schema, ensuring onboarding stays consistent across tenants.
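A per-tenant schema can be pictured as a small configuration object looked up by tenant. The following sketch assumes hypothetical names (`ColumnSpec`, `TenantSchema`, `register_schema`, `schema_for`) and an in-memory registry; the actual storage and field set used in the project are not described in this case study.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ColumnSpec:
    name: str
    dtype: str              # e.g. "string", "number", "date"
    required: bool = True


@dataclass
class TenantSchema:
    tenant_id: str
    columns: list[ColumnSpec] = field(default_factory=list)

    def required_names(self) -> set[str]:
        """Names of the columns a file must contain for this tenant."""
        return {c.name for c in self.columns if c.required}


# In-memory registry used only for this illustration.
SCHEMAS: dict[str, TenantSchema] = {}


def register_schema(schema: TenantSchema) -> None:
    SCHEMAS[schema.tenant_id] = schema


def schema_for(tenant_id: str) -> TenantSchema:
    """Look up a tenant's schema, failing loudly when none is configured."""
    try:
        return SCHEMAS[tenant_id]
    except KeyError:
        raise LookupError(f"no schema configured for tenant {tenant_id!r}") from None
```

Keeping the schema as data rather than code means a new tenant can be onboarded by adding configuration, without changing the validation logic.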
Column header and content matching with AI
The system combines exact matching with OpenAI-powered semantic matching to resolve inconsistent column names.
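The two-stage idea (exact match first, semantic match as a fallback) can be sketched as follows. This example does not call OpenAI: the fallback here is a plain string-similarity stand-in built on Python's `difflib`, and a semantic matcher would be plugged in through the `fallback` parameter. All names and the 0.8 cutoff are assumptions for illustration.

```python
import difflib
import re
from typing import Callable, Optional


def normalize(header: str) -> str:
    """Lowercase and collapse punctuation/whitespace into single underscores."""
    return re.sub(r"[^a-z0-9]+", "_", header.strip().lower()).strip("_")


def fuzzy_fallback(header: str, candidates: list[str]) -> Optional[str]:
    """Stand-in for a semantic matcher: closest candidate by string similarity."""
    by_norm = {normalize(c): c for c in candidates}
    hits = difflib.get_close_matches(normalize(header), list(by_norm), n=1, cutoff=0.8)
    return by_norm[hits[0]] if hits else None


def match_headers(
    file_headers: list[str],
    schema_columns: list[str],
    fallback: Callable[[str, list[str]], Optional[str]] = fuzzy_fallback,
) -> dict[str, Optional[str]]:
    """Map each file header to a schema column, or None when unresolved."""
    by_norm = {normalize(c): c for c in schema_columns}
    mapping: dict[str, Optional[str]] = {}
    for header in file_headers:
        exact = by_norm.get(normalize(header))
        # Only consult the (potentially slow or costly) fallback when the
        # normalized exact match fails.
        mapping[header] = exact if exact is not None else fallback(header, schema_columns)
    return mapping
```

Trying the exact match first keeps the expensive matcher off the common path, so only genuinely ambiguous headers would be sent to it.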
Missing and extra column detection
Automatically identifies required fields that are missing and flags surplus columns that can be ignored or mapped.
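Once headers are mapped to schema names, missing and surplus columns reduce to set differences. A minimal sketch, with a hypothetical function name and signature:

```python
def diff_columns(mapped_columns, required, optional=frozenset()):
    """Return (missing, extra) column names, each sorted for stable reporting."""
    present = set(mapped_columns)
    missing = sorted(set(required) - present)
    extra = sorted(present - set(required) - set(optional))
    return missing, extra
```

For example, a file containing `account_id`, `balance`, and `notes` checked against required columns `account_id`, `balance`, and `due_date` would report `due_date` as missing and `notes` as extra.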
Data quality checks
Validates column values against expected types and ranges, and can enforce referential rules to catch logical inconsistencies.
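As an illustration of type, range, and referential checks on a single row, consider the sketch below. The rule class, the error message wording, and the `account_id` reference check are assumptions made for this example rather than the project's actual rules.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NumberRule:
    column: str
    minimum: Optional[float] = None
    maximum: Optional[float] = None


def check_row(row: dict, rules: list, known_account_ids: set) -> list:
    """Return a list of human-readable errors for one row (empty if clean)."""
    errors = []
    for rule in rules:
        raw = row.get(rule.column)
        try:
            value = float(raw)  # type check: the cell must parse as a number
        except (TypeError, ValueError):
            errors.append(f"{rule.column}: expected a number, got {raw!r}")
            continue
        if rule.minimum is not None and value < rule.minimum:
            errors.append(f"{rule.column}: {value} is below {rule.minimum}")
        if rule.maximum is not None and value > rule.maximum:
            errors.append(f"{rule.column}: {value} is above {rule.maximum}")
    # Referential rule (hypothetical): the row must point at a known account.
    if row.get("account_id") not in known_account_ids:
        errors.append(f"account_id: {row.get('account_id')!r} is not a known account")
    return errors
```

Collecting all errors for a row, instead of stopping at the first one, gives the uploader a complete picture in a single pass.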
Validation thresholds and error dashboard
Configurable controls define how strict the validation should be, with reporting dashboards that show mismatched fields and error rates.
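One way to think about a validation threshold is as a maximum tolerated error rate per column. The sketch below assumes that interpretation; the function names and the 2% default are hypothetical and not taken from the project.

```python
def error_rates(total_rows: int, errors_by_column: dict) -> dict:
    """Fraction of rows with an error, per column."""
    if total_rows <= 0:
        raise ValueError("total_rows must be positive")
    return {column: count / total_rows for column, count in errors_by_column.items()}


def evaluate(total_rows: int, errors_by_column: dict, max_error_rate: float = 0.02):
    """Return (status, failing) where failing maps columns over the threshold to their rate."""
    rates = error_rates(total_rows, errors_by_column)
    failing = {column: rate for column, rate in rates.items() if rate > max_error_rate}
    status = "rejected" if failing else "accepted"
    return status, failing
```

With 1,000 rows, 5 errors in `balance` (0.5%) would pass a 2% threshold, while 50 errors in `due_date` (5%) would cause the file to be rejected.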
Role-based access
Admins can manage schemas and imports, while regular users upload files and view reports in a limited scope.
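The admin/user split can be expressed as a simple role-to-permission table. The role and action names in this sketch are assumptions chosen to mirror the description above, not the portal's real permission model.

```python
from enum import Enum


class Role(Enum):
    ADMIN = "admin"
    USER = "user"


# Hypothetical action names mirroring the description above.
PERMISSIONS = {
    Role.ADMIN: {"manage_schemas", "manage_imports", "upload_file", "view_reports"},
    Role.USER: {"upload_file", "view_reports"},
}


def is_allowed(role: Role, action: str) -> bool:
    """True when the role's permission set contains the action."""
    return action in PERMISSIONS.get(role, set())


def require(role: Role, action: str) -> None:
    """Raise PermissionError when the role may not perform the action."""
    if not is_allowed(role, action):
        raise PermissionError(f"{role.value} may not {action}")
```

A handler would call `require(current_role, "manage_schemas")` before touching schema configuration, so the check lives in one place.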
Exportable validation reports
Validation results can be exported in Markdown or PDF for audit, QA, or client communication.
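A Markdown export can be as simple as rendering per-column error counts into a table. The layout and function name below are assumptions for illustration; PDF export would need an additional rendering step that is not shown.

```python
def render_markdown_report(file_name: str, total_rows: int, errors_by_column: dict) -> str:
    """Render a small Markdown validation report with one table row per column."""
    lines = [
        f"# Validation report: {file_name}",
        "",
        f"Rows checked: {total_rows}",
        "",
        "| Column | Errors | Error rate |",
        "| --- | --- | --- |",
    ]
    for column, count in sorted(errors_by_column.items()):
        rate = count / total_rows if total_rows else 0.0
        lines.append(f"| {column} | {count} | {rate:.2%} |")
    return "\n".join(lines) + "\n"
```

Sorting the columns keeps the output deterministic, which makes reports easier to compare between uploads of the same file.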
Results
Built a working prototype in 2–3 weeks
The prototype was fully functional within 2–3 weeks, enabling early testing and faster client feedback.
Validated CSV files up to 400k rows (50–70MB)
The prototype processed CSV files up to 400k rows (50–70MB) while maintaining accuracy in column mapping and validation.
Cut onboarding time from weeks to hours
Automated validation reduced manual file checks and cut onboarding time from weeks of scripting and corrections to hours.
Improved data quality with automatic error detection
The system flagged corrupted values, missing mandatory fields, and schema mismatches automatically, improving consistency.
Self-service onboarding
Clients can upload files and receive immediate validation feedback without waiting for engineering support.
Scalable foundation
The delivered pipeline supports multi-tenant onboarding and leaves room for future AI-driven anomaly detection and enrichment.
