Data (24)
-
Dictionary FOSS Alar Kannada - English Corpus
Authoritative open data Kannada - English dictionary corpus by V. Krishna. 150,000 Kannada entries and 240,000 English definitions.
-
FOSS Bengali News Corpus
Bengali news article corpus from indicnlp.org.
-
Dictionary FOSS Datuk Corpus
Authoritative Malayalam - Malayalam dictionary corpus with 83,000 entries and 106,000 definitions.
-
FOSS DNLP Telugu Corpus
WaC-style Telugu text corpus with 23+ million sentences and 280+ million tokens.
-
Dictionary FOSS English-Odia word pairs
This dataset contains 2,15,000+ English-Odia word/phrase pairs.
-
FOSS Grandham
Grandham is a project intended to make available reliable bibliographic information on all Malayalam books published in Kerala and elsewhere.
-
FOSS HindEnCorp
Parallel Hindi - English sentence-aligned translation text corpus with 132,000+ entries.
-
Audio Proprietary Hindi Speech Recognition Corpus
Hindi speech audio recordings from 200+ speakers, spanning 308 hours, with transcription annotations.
-
FOSS Hindi WikiData Translation Corpus
87,000+ Hindi - English translation word pairs extracted from Wikipedia.
-
FOSS IIT-Bombay English - Hindi corpus
English - Hindi translation corpus.
-
FOSS Indic Tweet Corpus
Text scraped from Twitter with 7.9+ million entries for Telugu and 17.6+ million entries for Hindi.
-
FOSS Indica Languages Parallel Corpus
Parallel translation corpus of multiple Indian languages.
-
Audio Proprietary IndicTTS
A speech corpus covering 13 major languages of India, with 10,000+ spoken sentences each in the native language (monolingual) and in English, recorded by both male and female native speakers.
-
FOSS Malayalam News Corpus
Malayalam news corpus scraped from multiple Malayalam news portals, from indicnlp.org.
-
FOSS Malayalam Speech Corpus
Speech corpus curated by Swathanthra Malayalam Computing.
-
FOSS Malayalam Text Corpus
A collection of assorted Malayalam text with 800,000+ lines and 9.8+ million words curated by Swathanthra Malayalam Computing.
-
Audio FOSS Microsoft Speech Corpus
Conversational and phrasal speech training and test data for Telugu, Tamil and Gujarati, along with transcripts. 125,000+ entries.
-
FOSS Odia monolingual corpus
This dataset contains 5,50,000+ Odia news articles.
-
Dictionary FOSS Odia-Odia structured dictionary dataset
This dataset is a dictionary containing 1,21,658 Odia words and their meaning.
-
Dictionary FOSS Olam English - Malayalam Corpus
Open data crowdsourced English - Malayalam dictionary corpus with 60,000 entries and 125,000 definitions.
-
Dictionary FOSS Shabdatharavali
Shabdatharavali 1917 edition.
-
FOSS Tamil Corpus
Tamil corpus and Tamil dataset from various sources.
-
FOSS Tamil News Corpus
Tamil news article corpus from indicnlp.org.
-
FOSS Urdu Monolingual Corpus
5.4+ million automatically part-of-speech-tagged Urdu text entries.