India is prioritizing the use of Nvidia chips to build "sovereign AI" infrastructure that processes data within its borders for a range of applications, particularly those involving India's diverse languages and public services. Population-wide data is not explicitly named among the first datasets to be processed; instead, government and private initiatives will draw on non-personal datasets from various sectors to build foundational AI models.
Key areas of focus for initial data processing and AI model development include:
- Indian Languages: A primary objective is to develop large language models (LLMs) and small language models (SLMs) trained on India's 300+ distinct languages and dialects to serve the large population. Tech Mahindra is using Nvidia's Hindi-language AI model to develop a custom model called "Indus 2.0," focused on Hindi and its dialects.
- Government Services (AI4Bharat): The AI systems are intended to enhance governance and public service access through initiatives like "2047: Citizen Connect" and "AI4Pragati".
- Sector-Specific Data: Core datasets for a national AI platform (AIKosha) are being contributed by ministries responsible for agriculture and weather forecasting, and by Bhashini (the national language technology mission).
- Healthcare and Legal Tech: Generative AI is expected to have a profound impact on industries like healthcare for personalized medicine and legal tech for document analysis.
- Enterprise and Research: The new AI factories are intended to support large businesses, startups, and research centers running AI workloads in the cloud and on-premises for applications like digital content creation and financial services.
The emphasis on data sovereignty is meant to ensure that AI models are trained on local data under Indian rules, which prevents foreign entities from exploiting Indian datasets and keeps the value within the country. While processing public data is part of the long-term plan for governance, the immediate focus is on developing foundational language models and sector-specific applications using non-personal data where available.



