Published in Findings of the Association for Computational Linguistics: ACL-IJCNLP, 2022
Authors: Kapoor, A., Dhawan, M., Goel, A., Arjun, T.H., Bhatnagar, A., Agrawal, V., Agrawal, A., Bhattacharya, A., Kumaraguru, P., Modi, A.
Keywords: AI for Social Good, Judicial AI, Multi-Task Learning, Low-Resource Language, Hierarchical Transformers
Abstract
Populous countries (e.g., India) are burdened with a considerable backlog of legal cases. This calls for the development of automated systems that can process legal documents and augment legal practitioners. However, there is a dearth of the high-quality corpora needed to develop such data-driven systems. The problem is even more pronounced for low-resource languages (e.g., Hindi). In this resource paper, we introduce the Hindi Legal Documents Corpus (HLDC), a corpus of 900K legal documents in Hindi. The documents are cleaned and structured to enable the development of downstream applications. Further, as a use case for the corpus, we introduce the task of bail prediction. We experiment with a battery of models and propose a multi-task learning (MTL) based model that uses summarization as an auxiliary task alongside bail prediction as the main task. Results across the different models indicate the need for further research in this area.