Problem

Data Localization & Control Challenges in AI

As India advances its digital transformation, large volumes of voice, image, and health-related data are being gathered to train AI models—often containing sensitive personal information that must comply with strict data localization norms. Existing tools frequently rely on third-party servers or lack the flexibility to enforce privacy and regulatory mandates. This creates an urgent need for solutions that enable project-level control, real-time monitoring, and secure, scalable cloud deployment within Indian jurisdiction.

Our Solution

We’ve built a modular, privacy-conscious data collection platform that gives you full control from form creation to cloud storage. The platform is ideal for teams working with healthcare, voice, or image data in regulated environments.

You can deploy the solution on any cloud, whether AWS, GCP, or your own on-prem setup, while keeping data within Indian territory. It also includes project management capabilities that let you define forms, assign collectors, and monitor collection progress.

The Bigger Picture

This isn’t just a data collection tool—it’s a foundation for sovereign AI development. Open-sourcing this platform means anyone building privacy-respecting, India-first AI solutions can adopt it, extend it, and deploy it confidently.

Who can use it

Government

Large-scale surveys or building health/voice datasets

AI Research labs

Domain-specific multimedia data while meeting ethical and regulatory standards

NGOs

Operating in remote or offline regions

Health Tech

Collecting regulated patient data

Why Is This Important Now?

  • Data is the new oil, but we must extract it responsibly.

  • AI model training depends on large-scale, high-quality data, which often contains sensitive information.

  • India’s data governance priorities require that this data be stored and processed within national boundaries.

  • By enabling secure, compliant, and scalable data collection, this platform ensures you're future-ready.

Key Features & Functionality

Data Ingestion Layer

The primary interface for users to collect and manage field data efficiently.

Offline-First

Collect data in low/no connectivity zones and sync when online
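
As a rough illustration of the offline-first pattern described above, the sketch below stores submissions in a local SQLite file and uploads them once a connection is available. It is written in Python for readability only; the actual collector is a PWA, so this is not the project's code. The `OfflineQueue` class, its table layout, and the `upload` callback are all hypothetical names introduced for this example.

```python
import json
import sqlite3
from typing import Callable


class OfflineQueue:
    """Hypothetical local queue: store submissions now, upload them later."""

    def __init__(self, path: str = ":memory:") -> None:
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS pending ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT NOT NULL)"
        )
        self.db.commit()

    def save(self, submission: dict) -> None:
        """Persist a submission locally, regardless of connectivity."""
        self.db.execute(
            "INSERT INTO pending (payload) VALUES (?)", (json.dumps(submission),)
        )
        self.db.commit()

    def pending_count(self) -> int:
        """Number of submissions still waiting to be uploaded."""
        return self.db.execute("SELECT COUNT(*) FROM pending").fetchone()[0]

    def sync(self, upload: Callable[[dict], bool]) -> int:
        """Upload pending rows in insertion order; stop at the first failure.

        `upload` is an assumed callback that returns True on success.
        Returns the number of submissions uploaded in this call.
        """
        rows = self.db.execute(
            "SELECT id, payload FROM pending ORDER BY id"
        ).fetchall()
        sent = 0
        for row_id, payload in rows:
            if not upload(json.loads(payload)):
                break  # still offline or server error: keep the rest queued
            # Delete only after a confirmed upload so nothing is lost.
            self.db.execute("DELETE FROM pending WHERE id = ?", (row_id,))
            self.db.commit()
            sent += 1
        return sent
```

Deleting a row only after its upload succeeds means a dropped connection leaves the remaining submissions queued for the next attempt, at the cost of a possible duplicate upload if the failure happens between the upload and the delete.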

Role-Based Login

Secure access based on user roles; collectors only see relevant forms

Latest Dataset Sync

Automatically fetches updated forms for offline use

Multimodal Data Collection

Capture text, images, videos, and audio

Informed Consent

Built-in prompts to collect user consent ethically

Better Outcomes

View pending forms and monitor sync status per project

Storage Status Monitoring

Alerts users on device storage capacity to prevent data loss

Draft Support

Save partial forms and resume later

Magic URL

Generate shareable links to forms that work across devices.
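
One common way to build shareable form links is to append an HMAC signature so the server can tell whether a link was issued by it. The sketch below shows that idea using only the Python standard library. How the platform actually generates Magic URLs is not documented here, so `make_magic_url`, `verify_magic_url`, the `form` and `sig` query parameters, and the `SECRET_KEY` placeholder are assumptions made for illustration.

```python
import hashlib
import hmac
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse

SECRET_KEY = b"replace-with-a-real-secret"  # placeholder, not a real key


def _sign(form_id: str) -> str:
    """Hex HMAC-SHA256 of the form id under the server-side secret."""
    return hmac.new(SECRET_KEY, form_id.encode(), hashlib.sha256).hexdigest()


def make_magic_url(base_url: str, form_id: str) -> str:
    """Build a link carrying the form id and its signature (hypothetical scheme)."""
    query = urlencode({"form": form_id, "sig": _sign(form_id)})
    return f"{base_url}?{query}"


def verify_magic_url(url: str) -> Optional[str]:
    """Return the form id if the signature is valid, otherwise None."""
    params = parse_qs(urlparse(url).query)
    form_id = params.get("form", [""])[0]
    sig = params.get("sig", [""])[0]
    expected = _sign(form_id)
    # compare_digest avoids leaking information through timing differences.
    if form_id and hmac.compare_digest(sig.encode(), expected.encode()):
        return form_id
    return None
```

A link produced by `make_magic_url("https://example.org/forms", "health-survey-01")` verifies back to its form id, while a link whose form id has been edited fails verification. A production scheme would likely also add an expiry time to the signed data.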

Project Management Dashboard

A centralized control panel for admins and researchers to manage forms, monitor field activity, and securely export datasets.

Form Builder

Design and customize data collection forms

User & Project Assignment

Assign datasets and tasks to field users

Live Activity Tracking

Visualize real-time field data submission status

Sync & Storage Monitor

Track sync success, device storage, and usage patterns

Secure Data Export

Export clean, encrypted datasets to use elsewhere

Performance Indicators

Manual Review Time

Measures grievance resolution speed with automation

Query Resolution

Tracks quality and usefulness of automated responses

Citizen Satisfaction

Based on user feedback and trust-building outcomes

Usage Trends

Includes grievance volume, repeat users, and adoption rates

Error Rate

Monitors incorrect routing and resolution outcomes

Efficiency Gains

Tracks time and cost savings for government teams

System Health

Assesses department-wise responsiveness and service quality

Model Accuracy

WER, BLEU, and classification metrics for NLP/ML models
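
For readers unfamiliar with WER (word error rate), it is the number of word substitutions, deletions, and insertions needed to turn the model's transcript into the reference, divided by the number of words in the reference. The sketch below computes it with a standard word-level edit distance. The function name `word_error_rate` is chosen for this example and is not taken from the project.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (row-by-row dynamic programming).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)
```

For example, scoring the hypothesis "the cat sit on mat" against the reference "the cat sat on the mat" counts one substitution and one deletion over six reference words. Note that WER can exceed 1.0 when the hypothesis contains many extra words.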

Technical Architecture

Key Components

  • Django Backend

  • Data Collector PWA

  • Admin Portal UI app

  • Postgres Database

  • S3 Storage Backend

Dependencies

The platform currently has a hard dependency on S3 for storage; the rest of the tool is platform-independent.
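
Because storage goes through S3 and the stated goal is to keep data inside India, a deployment might guard its configuration by checking the configured cloud region against an allow-list before starting. The sketch below is one possible shape for such a check, not something the platform is known to ship. The region codes listed are the India regions of AWS and GCP; the `INDIA_REGIONS` table and `is_india_region` function are hypothetical names, and an on-prem setup would need a different check since it has no region code.

```python
# India regions per provider (assumed allow-list for this example).
INDIA_REGIONS = {
    "aws": {"ap-south-1", "ap-south-2"},    # Mumbai, Hyderabad
    "gcp": {"asia-south1", "asia-south2"},  # Mumbai, Delhi
}


def is_india_region(provider: str, region: str) -> bool:
    """Return True only if the region is a known India region for the provider."""
    allowed = INDIA_REGIONS.get(provider.lower(), set())
    return region in allowed


def require_india_region(provider: str, region: str) -> None:
    """Fail fast at startup if storage is configured outside India."""
    if not is_india_region(provider, region):
        raise ValueError(
            f"storage region {region!r} for provider {provider!r} "
            "is not in the India allow-list"
        )
```

Calling `require_india_region("aws", "us-east-1")` raises a `ValueError`, while `require_india_region("aws", "ap-south-1")` passes silently. Unknown providers are rejected rather than allowed, so a typo in the configuration fails closed.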

How to Use

Prerequisites

(Languages, libraries, system requirements)

Python

Version 3.8+

Storage

10 GB+
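
The two requirements above can be checked before installation with a short script. This is a minimal sketch using only the standard library; it assumes "10 GB+" refers to free disk space at the install location and reads it as 10 GiB, and the `check_prerequisites` name is made up for this example.

```python
import shutil
import sys
from typing import List

MIN_PYTHON = (3, 8)
MIN_FREE_BYTES = 10 * 1024**3  # "10 GB+" interpreted here as 10 GiB free


def check_prerequisites(path: str = ".") -> List[str]:
    """Return a list of human-readable problems; an empty list means all good."""
    problems = []
    if sys.version_info < MIN_PYTHON:
        found = ".".join(str(part) for part in sys.version_info[:3])
        problems.append(f"Python 3.8+ is required, found {found}")
    free = shutil.disk_usage(path).free
    if free < MIN_FREE_BYTES:
        problems.append(
            f"at least 10 GB of free storage is required, "
            f"found {free / 1024**3:.1f} GB at {path!r}"
        )
    return problems
```

Running `print(check_prerequisites())` in the target directory prints an empty list when both requirements are met, or one message per unmet requirement.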

Usage Guide

Follow these steps to use the system

Opportunities for collaboration

We are looking for:

  • Developers to improve the PWA, offline syncing, and backend integrations

  • Designers to enhance usability for field users and admins

  • NGOs or research teams to pilot the tool in real-world settings

  • Contributors to help with documentation, testing, and deployment support

Inner-Source Info

This project is licensed under the Apache License 2.0, a permissive open-source license that allows commercial use, modification, distribution, and private use. It requires preserving copyright and license notices, includes an express patent grant from contributors, and permits derivative works to be distributed under different terms without requiring source code disclosure.

Interested in Forking?

Reach out to us for more information on the source code, repository links and detailed usage guides—we'll email them to you!

Contact Us

Contributors

Aakash Pant

Engineering Manager

Vipin Samaria

Software Development Engineer II

Nehal Singhal

Product Manager

Contact Persons

Nehal Singhal

Product Manager

Email ID:

community.kiran@wadhwaniai.org

About

Challenges

Developer Library

Resources

Contact

Wadhwani AI @ 2025. All rights reserved.