Word Sense Disambiguation Using Semantic Web for Tamil to English Statistical Machine Translation

Santosh Kumar T.S.    


        Machine Translation (MT) has been an area of linguistic research for more than two decades, yet devising an automated system that delivers accurate translations of natural languages remains a very challenging task. Nevertheless, great strides have been made in this field owing to the development of web technologies, and of late there has been renewed interest in this area of research.

        Technological advancements in the preceding two decades have influenced Machine Translation considerably. Several MT approaches, including Statistical Machine Translation (SMT), have benefited greatly from these advancements, chiefly by making use of the extensive corpora now available. Web 3.0 uses semantic web technology, which represents any object or resource on the web both syntactically and semantically. This type of representation helps computing systems search content on the internet in a manner similar to lexical search and improve internet-based translation, making it more effective and efficient.

        In this paper we propose a technique to improve existing Statistical Machine Translation methods by making use of semantic web technology. Our focus is on Tamil and on Tamil-to-English MT. The proposed method integrates a semantic web technique into the Word Sense Disambiguation (WSD) process that forms part of the MT system. The integration is accomplished by bringing the capabilities of RDFS and OWL into the WSD component of the MT model. The contribution of this work lies in showing that integrating a semantic web technique into the WSD system significantly improves the performance of a statistical MT system translating from Tamil to English.

        In this paper we assume the availability of large corpora in the Tamil language and of domain-specific ontologies built with Tamil semantic web technology using Web 3.0.
We are optimistic about the expansion and development of the Tamil semantic web, and we infer that Tamil-to-English MT will consequently benefit from greatly improved disambiguation, apart from other related benefits. The method could enhance translation quality by improving the word sense disambiguation process when text is translated from Tamil to English. It can also be extended to other Indian languages, such as Hindi.
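
To make the idea of ontology-backed WSD concrete, the following is a minimal sketch and not the system described in this paper. It assumes a toy ontology stored as plain (subject, predicate, object) triples with only rdfs:subClassOf links, and a hypothetical lexicon that maps each Tamil word to candidate ontology classes with English glosses. All class names (ex:River, ex:Six, and so on), the lexicon entries, and the function names are illustrative assumptions. The ambiguous word ஆறு ("river" or "six") is resolved by choosing the sense whose class hierarchy overlaps most with the classes of the surrounding words.

```python
from typing import Dict, List, Set, Tuple

SUBCLASS_OF = "rdfs:subClassOf"

# Toy ontology (assumption): a handful of rdfs:subClassOf triples.
TRIPLES: List[Tuple[str, str, str]] = [
    ("ex:River", SUBCLASS_OF, "ex:WaterBody"),
    ("ex:Lake", SUBCLASS_OF, "ex:WaterBody"),
    ("ex:Six", SUBCLASS_OF, "ex:Number"),
    ("ex:Five", SUBCLASS_OF, "ex:Number"),
]

# Hypothetical lexicon: Tamil word -> candidate (ontology class, English gloss).
LEXICON: Dict[str, List[Tuple[str, str]]] = {
    "ஆறு": [("ex:River", "river"), ("ex:Six", "six")],  # ambiguous word
    "ஏரி": [("ex:Lake", "lake")],
    "ஐந்து": [("ex:Five", "five")],
}


def ancestors(cls: str) -> Set[str]:
    """Return cls together with every class reachable via rdfs:subClassOf."""
    seen = {cls}
    stack = [cls]
    while stack:
        current = stack.pop()
        for subj, pred, obj in TRIPLES:
            if subj == current and pred == SUBCLASS_OF and obj not in seen:
                seen.add(obj)
                stack.append(obj)
    return seen


def disambiguate(word: str, context: List[str]) -> str:
    """Pick the English gloss whose class hierarchy best matches the context."""
    # Collect the classes (and superclasses) of every known context word.
    context_classes: Set[str] = set()
    for ctx_word in context:
        for cls, _gloss in LEXICON.get(ctx_word, []):
            context_classes |= ancestors(cls)

    # Score each sense by the number of classes it shares with the context.
    # On a tie, max() keeps the first sense listed in the lexicon.
    best_cls, best_gloss = max(
        LEXICON[word],
        key=lambda sense: len(ancestors(sense[0]) & context_classes),
    )
    return best_gloss


if __name__ == "__main__":
    print(disambiguate("ஆறு", ["ஏரி"]))
    print(disambiguate("ஆறு", ["ஐந்து"]))
```

In a full pipeline, the selected gloss would constrain lexical choice in the statistical translation step. A realistic system would replace the hand-written triples with an RDFS/OWL ontology and a reasoner, and would need a far richer relatedness measure than shared superclasses.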