Improve model prediction time in Hugging Face transformer models without a GPU

I am using Hugging Face transformers models for quite a few tasks. They work well, but the one problem is the response time: it takes around 6-7 seconds to generate a result, and sometimes even 15-20 seconds. I tried Google Colab with a GPU, and the performance there is very fast; it processes the result within a second. Since GPU availability is limited on my current server, is there any way to reduce the response time of these models using CPU only?

Currently I am doing text summarization with the Google Pegasus model:
https://huggingface.co/google/pegasus-xsum
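For context, this is roughly how I'm running it (a minimal sketch; the input text and generation parameters are placeholders, not my exact values, and device=-1 is the pipeline's way of forcing CPU):

```python
from transformers import pipeline

# Load the Pegasus summarization model on CPU only.
summarizer = pipeline(
    "summarization",
    model="google/pegasus-xsum",
    device=-1,  # -1 = CPU; this is where the 6-7s latency shows up
)

# Placeholder input; real inputs are full articles.
article = "Long input text to summarize goes here ..."
result = summarizer(article, max_length=64, truncation=True)
print(result[0]["summary_text"])
```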

And Parrot paraphrasing, which internally uses a T5 model from transformers:
https://huggingface.co/prithivida/parrot_paraphraser_on_T5
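And roughly how the Parrot side is set up (again a sketch, assuming the library's documented constructor and augment call; the input phrase is a placeholder, and use_gpu=False keeps it on CPU):

```python
from parrot import Parrot

# Load the T5-based paraphraser on CPU only, matching my server setup.
parrot = Parrot(
    model_tag="prithivida/parrot_paraphraser_on_T5",
    use_gpu=False,
)

# Placeholder phrase; augment may return None if nothing is found.
phrases = parrot.augment(input_phrase="Can you recommend a good restaurant nearby?")
for p in phrases or []:
    print(p)  # each entry is typically a (paraphrase, score) pair
```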

Even a slight improvement would help!

