Building an Elasticsearch-Powered Question Answering System: Setup and Pipeline Construction

Explore the construction of an open-source Question Answering System leveraging Elasticsearch and Haystack, utilizing the 'Error Analysis of Pretrained Language Models (PLMs) in English-to-Arabic Machine Translation' paper as the open-source file.

open-domain question answering system

In this notebook, we'll configure the initial segment of our open-domain question answering system, focusing on setting up the database component using Elasticsearch. We'll utilize pypdf2 to import the "Error Analysis of Pretrained Language Models (PLMs) in English‑to‑Arabic Machine Translation" PDF file. Then, we'll employ a BM25 retriever with the deepset/bert-base-cased-squad2 PLM model to generate queries related to the content of the PDF.

Steps

Elasticsearch Setup

For this project, we will be building an open-domain question answering system. There are three major components to such a system:

  • Database

  • Retriever

  • Reader

In this notebook we will setup the first part, the database - where we will be using Elasticsearch.

Haystack pipeline here will build the haystack pipeline that will employ a BM25 retriever with the deepset/bert-base-cased-squad2 PLM model to generate queries related to the content of the PDF.

check the article : Building an Elasticsearch-Powered Question Answering System: Setup and Pipeline Construction