PDF Topic Extractor
- Role: NLP Engineer
- Client: M&G Investment
- Technology: NLP - Phrase Extraction
- Demo URL: Click Here
Project Description:
The objective of this project is to extract text from PDF documents and clean it successfully. An unsupervised phrase extraction algorithm is then run to group the text into topics. This lets a user break any document into well-defined sections and query a topic, for which the tool returns the most relevant paragraphs.
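The topic query step can be illustrated with a minimal sketch. It assumes paragraphs are ranked by cosine similarity over simple word counts; the project's actual scoring method is not stated here, and the names `tokenize`, `cosine` and `rank_paragraphs` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_paragraphs(query, paragraphs, top_k=3):
    """Return up to top_k (score, paragraph) pairs with a nonzero score, best first."""
    query_counts = Counter(tokenize(query))
    scored = [(cosine(query_counts, Counter(tokenize(p))), p) for p in paragraphs]
    scored = [pair for pair in scored if pair[0] > 0]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:top_k]


if __name__ == "__main__":
    # Made-up sample paragraphs, for illustration only.
    sample = [
        "Interest rate risk affects bond prices.",
        "The fund invests in European equities.",
        "Bond duration measures interest rate sensitivity.",
    ]
    for score, paragraph in rank_paragraphs("bond duration", sample):
        print(f"{score:.3f}  {paragraph}")
```

Word-count cosine similarity keeps the sketch dependency-free; a TF-IDF or embedding-based score could be swapped in without changing the ranking function's shape.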
Responsibilities:
- PDF scraping, text extraction and preprocessing.
- Text cleaning – font identification.
- Extracting important phrases from short paragraphs of text.
- Matching user input text against the extracted phrases.
- Building a Streamlit-based UI for the tool.
- Packaging the model in a Docker image and deploying it to Heroku.
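
As an illustration of the phrase extraction step listed above, here is a minimal sketch of unsupervised key-phrase scoring in the style of RAKE (Rapid Automatic Keyword Extraction). The project's actual algorithm is not named, so this is an assumed stand-in; the stopword list and the names `candidate_phrases` and `extract_phrases` are hypothetical.

```python
import re
from collections import defaultdict

# Tiny illustrative stopword list (an assumption; a real list would be much larger).
STOPWORDS = {
    "a", "an", "and", "are", "as", "at", "by", "for", "from", "in",
    "is", "it", "of", "on", "or", "that", "the", "to", "with",
}


def candidate_phrases(text):
    """Split text into runs of consecutive non-stopword tokens."""
    phrases = []
    # Punctuation ends a phrase, so split on it before looking at words.
    for fragment in re.split(r"[^\w\s-]+", text.lower()):
        current = []
        for word in fragment.split():
            if word in STOPWORDS:
                if current:
                    phrases.append(tuple(current))
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(tuple(current))
    return phrases


def extract_phrases(text, top_k=5):
    """Score each candidate phrase as the sum of degree/frequency over its words."""
    phrases = candidate_phrases(text)
    freq = defaultdict(int)    # how many times each word occurs in candidates
    degree = defaultdict(int)  # total length of the phrases each word occurs in
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            degree[word] += len(phrase)
    scores = {
        " ".join(phrase): sum(degree[w] / freq[w] for w in phrase)
        for phrase in set(phrases)
    }
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


if __name__ == "__main__":
    # Made-up sample text, for illustration only.
    text = "Text extraction from PDF files is the first step of the pipeline."
    for phrase, score in extract_phrases(text):
        print(f"{score:.1f}  {phrase}")
```

This scoring favours longer multi-word phrases, which suits topic labels; the extracted phrases could then be matched against user input text.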