Project 2 FAQ
This page was last updated: Friday, 24-Jul-2020 09:43:32 AEST
by blair@cse.unsw.edu.au
1. Are the return values for post in the provided preprocessing class incorrect?
Yes. An earlier version of the assignment returned two values here (batch and vocab); however, this function should return only batch.
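A minimal sketch of a corrected hook, assuming post is called the way torchtext's Field postprocessing pipeline calls it (with the numericalized batch as a list and the field's vocab). The body is a placeholder no-op; the point is only what gets returned.

```python
from typing import Any, List


def post(batch: List[List[int]], vocab: Any) -> List[List[int]]:
    """Hypothetical post-processing hook.

    Assumed to receive the numericalized batch and the field's vocab.
    Any modification should be applied to the batch, and only the
    batch is returned (not a (batch, vocab) tuple).
    """
    # Placeholder: leave the batch unchanged.
    return batch


print(post([[4, 7, 2], [5, 1]], vocab=None))  # [[4, 7, 2], [5, 1]]
```

Assuming torchtext's usual flow, the return value is converted straight into a tensor, so returning (batch, vocab) would hand that conversion a tuple instead of the batch.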
2. Can we use additional libraries in our preprocessing?
No. An earlier version of the assignment stated that you could request additional libraries to be installed; however, this is no longer the case, and only torch, torchtext, and the Python standard library may be used.
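Typical text cleanup needs nothing beyond the standard library. The sketch below assumes a hypothetical pre hook that receives one tokenized example as a list of strings (as a plain function passed to torchtext's Field preprocessing argument would); the stopword set is a made-up placeholder, not something the assignment specifies.

```python
import string
from typing import List

# Hypothetical stopword set, for illustration only.
STOP_WORDS = {"a", "an", "the"}

# Translation table that deletes every ASCII punctuation character.
_STRIP_PUNCT = str.maketrans("", "", string.punctuation)


def pre(tokens: List[str]) -> List[str]:
    """Lowercase, strip punctuation, and drop empty tokens and stopwords."""
    cleaned = (tok.lower().translate(_STRIP_PUNCT) for tok in tokens)
    return [tok for tok in cleaned if tok and tok not in STOP_WORDS]


print(pre(["The", "movie", "was", "GREAT!!", "..."]))  # ['movie', 'was', 'great']
```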
3. Why are the lengths of the sequences provided in part 2?
If you examine the input tensors you will see that they are
*padded*. This is because sequences generally have different lengths, so
to allow PyTorch to work in the batch setting, padding tokens (in
this case 0-vectors) are added to either the beginning or the end of the
individual sequences. After doing this, you can stack the individual
sequences to form your batch (another neat thing the dataloader can do
is group sentences of similar length together, so that padding is
minimized and computation is faster).
In any case, this means that the *output* of the LSTM will contain
hidden and cell states generated from these padded tokens. It's
possible that the LSTM's gates learn to ignore these, but it's not
guaranteed, so the final output (both in the first returned value and
the second) will not be the hidden state at the end of the sequence,
but the hidden state at the end of the padding. To get around this, we
need to tell PyTorch where each sequence actually ends. Conveniently,
PyTorch has a built-in object that an LSTM already knows how to process
in this way: torch.nn.utils.rnn.PackedSequence, which stores the
sequence data with the padding removed, together with the information
needed to recover each sequence's length. If an object of this type is
passed into torch.nn.LSTM, the final hidden state returned (which is
what you want) is the hidden state calculated at the last real token of
each sequence. Note that torch.nn.LSTM returns its results in the form
"output, (h_n, c_n)"; h_n is what we are referring to here. If you want
to use the output as well, note that it is also returned as a
PackedSequence, so you will need to unpack it into a padded tensor (for
example with torch.nn.utils.rnn.pad_packed_sequence).
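A sketch comparing the two cases described above, on made-up data with arbitrary sizes. It builds the PackedSequence with torch.nn.utils.rnn.pack_sequence (which packs a list of unpadded tensors) and treats running each sequence through the LSTM on its own as the "true" final hidden state.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_sequence, pad_packed_sequence, pad_sequence

torch.manual_seed(0)
EMBED_DIM, HIDDEN_DIM = 4, 8  # arbitrary sizes for illustration

lstm = nn.LSTM(EMBED_DIM, HIDDEN_DIM, batch_first=True)

# Three embedded sequences of different lengths, each (length, embed_dim).
seqs = [torch.randn(5, EMBED_DIM), torch.randn(2, EMBED_DIM), torch.randn(3, EMBED_DIM)]

with torch.no_grad():
    # Reference: run each sequence alone (batch of 1, no padding involved)
    # and keep its final hidden state h_n[-1, 0].
    h_true = torch.stack([lstm(s.unsqueeze(0))[1][0][-1, 0] for s in seqs])

    # 1) Padded input: for the shorter sequences, h_n is the state after
    #    the padding has also been fed through the LSTM.
    padded = pad_sequence(seqs, batch_first=True)
    _, (h_padded, _) = lstm(padded)

    # 2) PackedSequence input: h_n is the state at each sequence's last real step.
    packed = pack_sequence(seqs, enforce_sorted=False)
    packed_out, (h_packed, _) = lstm(packed)

    # The per-step output is also a PackedSequence; unpack it into a padded
    # (batch, max_len, hidden_dim) tensor plus the original lengths.
    out, out_lengths = pad_packed_sequence(packed_out, batch_first=True)

print(torch.allclose(h_padded[-1], h_true))              # False
print(torch.allclose(h_packed[-1], h_true, atol=1e-6))   # True
print(out.shape, out_lengths)  # torch.Size([3, 5, 8]) tensor([5, 2, 3])
```

With enforce_sorted=False, PyTorch sorts by length internally and restores the original batch order in both h_n and the unpacked output, so the rows still line up with seqs.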
Finally, note that PyTorch has a built-in function for creating a
PackedSequence from a padded tensor and a list of lengths:
torch.nn.utils.rnn.pack_padded_sequence.
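A sketch of how a model's forward pass might use it, assuming a batch_first padded tensor and a lengths tensor like those provided in part 2. The helper name final_hidden is hypothetical, and it assumes a unidirectional LSTM (for a bidirectional one, h_n[-1] would be only the backward direction of the last layer).

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence


def final_hidden(lstm: nn.LSTM, padded: torch.Tensor, lengths: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: last real hidden state of each sequence.

    padded:  (batch, max_len, embed_dim), padded at the end
    lengths: (batch,) true lengths; pack_padded_sequence expects them on the CPU
    """
    packed = pack_padded_sequence(padded, lengths.cpu(), batch_first=True,
                                  enforce_sorted=False)
    _, (h_n, _) = lstm(packed)
    return h_n[-1]  # last layer, shape (batch, hidden_dim)


torch.manual_seed(0)
lstm = nn.LSTM(input_size=4, hidden_size=8, batch_first=True)

# Made-up batch: three sequences padded with 0-vectors to length 5.
lengths = torch.tensor([2, 5, 3])
padded = torch.zeros(3, 5, 4)
for i, n in enumerate(lengths.tolist()):
    padded[i, :n] = torch.randn(n, 4)

with torch.no_grad():
    h = final_hidden(lstm, padded, lengths)
    # Appending even more padding should not change the result.
    h_more = final_hidden(lstm, torch.cat([padded, torch.zeros(3, 4, 4)], dim=1), lengths)

print(h.shape)                    # torch.Size([3, 8])
print(torch.allclose(h, h_more))  # True
```

Because the packed data contains only the real time steps, the amount of padding in the input tensor has no effect on h_n.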