| Abstract: | This work addresses the importance of conserving languages, any language in general, using natural language processing (NLP). This can be done by building corpora that computer systems can access, which makes it feasible to build linguistic tools that help the language grow. The objective of the work is to demonstrate that, in the present era, creating corpora and tools for any low-resource language, and enriching existing ones, is a fundamental requirement for conserving the language as well as a key facilitator. NLP resources for many Indian languages are not keeping pace with the growing digitization of the economy and of public life. The potential for NLP to cater to the digital markets and social media flourishing in the Indian hinterland is growing rapidly, but the availability of corpora and NLP tools is lagging behind. The list of deprived languages includes not only the 18 officially recognized regional languages in India, but also many more languages spoken by small communities or tribal populations. Automatic NLP for the Indian languages of small communities has been limited by the scarcity of language-specific digital linguistic resources. Raw textual data, such as newspaper articles, literary works and poems, are available in ample quantities, but they are not in a form suited to NLP. Such data need to be collected into corpora and then annotated in different ways according to the intended application. Applying NLP techniques to low-resource languages is challenging because the techniques require linguistic knowledge that can be developed only by experts, and in some cases only by speakers of the language, and because they require large amounts of labelled data, which are in turn expensive to generate. |