I am using GPT-J as a text summarizer for Hindi. I gave it a passage as a prompt and asked it to summarize, but it raises the error "ValueError: index can't contain negative values". The same code works fine with an English passage. def infer(context, top_p=0.9, temp=1.0, gen_len=512): tokens ..
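A common cause of this error (a guess, since the full `infer` body is truncated) is that Hindi text tokenizes to far more tokens than the equivalent English, so a padding or generation budget computed as `gen_len - len(tokens)` goes negative. A minimal sketch, where `fake_tokenize` is a hypothetical stand-in for the real GPT-J tokenizer:

```python
# Sketch: a tokenizer with poor Devanagari coverage can emit one token per
# character, so Hindi prompts blow past budgets that English prompts fit in.

def fake_tokenize(text):
    # Hypothetical stand-in: one token per character.
    return list(text)

def pad_budget(tokens, gen_len=512):
    # Clamp at zero instead of letting the pad amount go negative,
    # which is what triggers "index can't contain negative values".
    return max(gen_len - len(tokens), 0)
```

Printing `len(fake_tokenize(prompt))` for the Hindi and English inputs is a quick way to check whether this is what is happening.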
I am running this in JupyterLab, which has a pre-installed tf2.3_py3.6 kernel and 2 GPUs. PyTorch Lightning version: 1.4.6. PyTorch version: 1.6.0+cu101. Python version: 3.6. OS: Linux. CUDA/cuDNN version: 11.2. How I installed PyTorch: pip. I am saving the best model ..
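The "save the best model" logic that PyTorch Lightning's `ModelCheckpoint(monitor="val_loss", mode="min")` callback implements can be sketched in plain Python; `save_fn` here is a hypothetical callback standing in for the actual checkpoint write:

```python
# Minimal sketch of best-model checkpointing: save only when the monitored
# validation metric improves on the best value seen so far.

class BestModelTracker:
    def __init__(self, save_fn):
        self.best = float("inf")
        self.save_fn = save_fn

    def update(self, val_loss, state):
        # Lower is better (mode="min"); skip the save otherwise.
        if val_loss < self.best:
            self.best = val_loss
            self.save_fn(state)
            return True
        return False
```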
I have a batch size of 1, and the number of transformer layers is 1. My images are very large, so I have created embeddings with ResNet18 as an intermediate representation for tiles of the images. Because my images don't all contain the same number of tiles, I have also used some form of masking/zero-padding ..
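The padding/masking step described above can be sketched as follows, assuming each image is a list of tile-embedding vectors; the mask convention (`True` = padding) matches what e.g. `torch.nn.TransformerEncoder` expects for `src_key_padding_mask`, and `d_model=4` is purely illustrative:

```python
# Pad per-image tile lists to a common length and build a padding mask.

def pad_tiles(batches, d_model=4):
    max_tiles = max(len(tiles) for tiles in batches)
    padded, mask = [], []
    for tiles in batches:
        n_pad = max_tiles - len(tiles)
        padded.append(tiles + [[0.0] * d_model] * n_pad)  # zero-fill
        mask.append([False] * len(tiles) + [True] * n_pad)  # True = ignore
    return padded, mask
```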
Consider the following training corpora: dataset1, composed of French instances; dataset2, dataset1 + Arabic instances; and test_dataset (used in both scenarios), composed of French instances (the same annotation guidelines were used for both languages). After analyzing the results of our preliminary experimental setup, we chose BERT as our baseline system. Given the different languages involved, we experimented ..
When decoding/translating a test dataset after training the base Transformer model (Vaswani et al.), I sometimes see the token "unk" in the output. "unk" refers to an unknown token, but my question is: what is the reasoning behind it? Based on https://nlp.stanford.edu/pubs/acl15_nmt.pdf, does it mean that the vocab I built for ..
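The usual reason `<unk>` appears is that NMT vocabularies are built from the training corpus with a size or frequency cutoff, and every token outside that closed vocabulary is mapped to `<unk>` at both encoding and decoding time. A minimal sketch (the `min_freq=2` cutoff is illustrative):

```python
from collections import Counter

# Build a closed vocabulary from training tokens; rare tokens fall out
# of it, and encode() maps anything out-of-vocabulary to <unk>.

def build_vocab(corpus_tokens, min_freq=2):
    counts = Counter(corpus_tokens)
    vocab = {"<unk>": 0}
    for tok, c in counts.items():
        if c >= min_freq:
            vocab[tok] = len(vocab)
    return vocab

def encode(tokens, vocab):
    return [vocab.get(t, vocab["<unk>"]) for t in tokens]
```

Since the model can also *emit* the `<unk>` id, it shows up in translations whenever the most probable target token is the unknown placeholder, which is exactly the problem the linked Luong et al. paper addresses.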
I want to train a transformer model for time-series prediction without relying on teacher forcing. Here's my training loop: # src shape is [n_batch, src_length, d_model] -> [N, S, D] # tgt and gt (ground truth) shape is [n_batch, tgt_length, d_model] -> [N, T, D] l = 0. for src, tgt, gt in loader: src, ..
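Since the loop body is truncated, here is a hedged sketch of the core idea of training without teacher forcing: instead of feeding the ground-truth sequence `tgt` to the decoder, feed the model's own previous outputs one step at a time. `model` and `start_token` are hypothetical stand-ins; a real version would operate on torch tensors and backpropagate through the loss against `gt`:

```python
# One decoding pass without teacher forcing: the decoder only ever sees
# its own past predictions, never the ground-truth prefix.

def decode_no_teacher_forcing(model, src, start_token, tgt_length):
    decoded = [start_token]
    for _ in range(tgt_length):
        # model(src, decoded) returns the prediction for the next step.
        next_step = model(src, decoded)
        decoded.append(next_step)
    return decoded[1:]  # drop the start token; compare against gt for the loss
```

Note this is much slower than teacher forcing because the T decoder steps are sequential rather than one parallel pass.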
I have been reading the official guide here (https://www.tensorflow.org/text/tutorials/transformer) to try to recreate the vanilla Transformer in TensorFlow. I noticed that the dataset used is quite specific, and at the end of the guide it says to try a different dataset. But that is where I have been stuck for a long time! I am ..
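The tutorial's pipeline ultimately consumes aligned (source, target) string pairs, so any parallel corpus can be adapted to it. A minimal sketch, assuming two line-aligned text files (the file names in the comment are hypothetical):

```python
# Turn two line-aligned corpora into (source, target) pairs and drop
# overly long sentences, mirroring the filtering the tutorial applies.

def load_pairs(src_lines, tgt_lines):
    # In practice these lists would come from two aligned files, e.g.
    # open("train.src").readlines() and open("train.tgt").readlines().
    assert len(src_lines) == len(tgt_lines), "corpora must be line-aligned"
    return list(zip(src_lines, tgt_lines))

def filter_by_length(pairs, max_tokens=40):
    return [(s, t) for s, t in pairs
            if len(s.split()) <= max_tokens and len(t.split()) <= max_tokens]
```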
I want to update the XLSR wav2vec2 weights using unlabeled training data (.wav audio) from my domain. In other words, I want to continue pretraining it so that it is exposed to my data before I fine-tune it on labeled data. This is the model from Hugging Face: model = ..
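wav2vec 2.0 pretraining is self-supervised: spans of latent audio frames are masked and the model learns to identify the correct quantized target for each masked frame, so no labels are needed. The span-masking step (analogous in spirit to the mask computation in the `transformers` pretraining example) can be sketched as follows; `mask_prob` and `span` values are illustrative:

```python
import random

# Choose random start frames and mask a fixed-length span after each,
# producing the boolean frame mask used during wav2vec 2.0 pretraining.

def span_mask(seq_len, mask_prob=0.065, span=10, rng=None):
    rng = rng or random.Random(0)
    n_starts = max(1, int(seq_len * mask_prob))
    mask = [False] * seq_len
    for start in rng.sample(range(max(seq_len - span, 1)), n_starts):
        for i in range(start, min(start + span, seq_len)):
            mask[i] = True
    return mask
```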
I am working on a problem where I need to reconstruct an image in a Pix2Pix-like manner. The data has some attributes that, in my opinion, would make a Transformer/Perceiver IO favourable (e.g. the y-axis contains information about location but is not necessarily adjacent to the row above, whereas convolutions assume a structure ..
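This is indeed a point in the Transformer's favour: it assumes no spatial structure at all, and location information enters only through the positional encodings you choose to attach, so a row index can be encoded without implying adjacency. A sketch of the standard sinusoidal encoding for a row index (`d_model=8` is illustrative):

```python
import math

# Sinusoidal positional encoding for a single position index, as in
# "Attention Is All You Need"; pairs of sin/cos at geometric frequencies.

def pos_encoding(position, d_model=8):
    enc = []
    for i in range(0, d_model, 2):
        angle = position / (10000 ** (i / d_model))
        enc.extend([math.sin(angle), math.cos(angle)])
    return enc
```

If rows carry meaningful but non-neighbouring location labels, a learned embedding per row label is an equally valid choice.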
Can I extract a similar word from the model? Also, is it possible to assess the similarity between two words (not sentences, just individual words)? I hope you can help.
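Both are possible once each word has an embedding vector: word-to-word similarity is typically cosine similarity, and "most similar word" is just a ranking by that score (this is what e.g. gensim's `most_similar`/`similarity` do). A minimal sketch; the toy vectors in the test are made up for illustration, and in practice each vector would come from your model:

```python
import math

# Cosine similarity between two word vectors, and a naive nearest-word
# lookup over a {word: vector} dictionary.

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def most_similar(word, embeddings, topn=1):
    # Rank every other word in the vocabulary by cosine similarity.
    others = [(w, cosine(embeddings[word], v))
              for w, v in embeddings.items() if w != word]
    return sorted(others, key=lambda x: -x[1])[:topn]
```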