Compared with the commonly used decoder-only Transformer models, the seq2seq (encoder-decoder) architecture is better suited to training generative LLMs because its encoder provides bidirectional awareness of the context. For this reason, the architectural specifics follow the baselines. In addition, the optimization configurations for the several LLMs are listed in Table VI.
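The sketch below is not from the paper; it only illustrates, under a hypothetical PyTorch setup with an assumed sequence length, the attention-mask difference behind the bidirectional context awareness mentioned above: a seq2seq encoder lets every token attend to the full context, whereas a decoder-only model restricts attention with a causal mask.

```python
# Minimal sketch (illustrative, not the paper's implementation): contrast the
# causal mask of a decoder-only Transformer with the fully bidirectional mask
# used by a seq2seq encoder over the source context.
import torch

seq_len = 5  # hypothetical source length, chosen only for illustration

# Decoder-only: causal (lower-triangular) mask -- position i attends only to positions <= i.
causal_mask = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))

# Seq2seq encoder: full mask -- every position attends to every other position (bidirectional).
bidirectional_mask = torch.ones(seq_len, seq_len, dtype=torch.bool)

print(causal_mask.int())         # lower-triangular pattern
print(bidirectional_mask.int())  # all ones
```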