HIERARCHICAL ENCODING AND CONDITIONAL ATTENTION IN NEURAL MACHINE TRANSLATION
Natalia Trankova, MSc - Skolkovo Institute of Science and Technology, New York, 10280, USA
Dmitrii Rykunov, BSc - National Research University Higher School of Economics, New York, 10013, USA
Ivan Serov - McKinsey & Company, Data Science division, New York, 10007, USA
Ivan Giganov, MSc - Northwestern University, Chicago, IL, 60654, USA
Yaroslav Starukhin - QuantumBlack, AI by McKinsey, Boston, MA, 02110, USA

Abstract
The advent of Transformer models has significantly advanced Neural Machine Translation (NMT), particularly in sequence-to-sequence tasks, yet challenges remain in maintaining coherence and meaning across longer texts due to the model's traditional focus on independent phrase translation. This study addresses these limitations by proposing an enhanced NMT framework that integrates cross-sentence context through redesigned positional encoding, hierarchical encoding, and conditional attention mechanisms. The research critiques the shortcomings of existing positional encoding methods in capturing discourse-level context, introducing a novel hierarchical strategy that preserves structural and semantic relationships between sentences within a document. By employing a source2token self-attention mechanism to encode sentences and a conditional attention mechanism to selectively aggregate the most relevant context, the proposed model aims to improve translation accuracy and consistency while reducing computational complexity. The findings demonstrate that this approach not only enhances the quality of translations but also mitigates the computational costs typically associated with processing longer sequences. However, the model's effectiveness is contingent on the presence of clear document structure, which may limit its applicability in more irregular texts. The study's contributions offer significant implications for the development of more contextually aware and computationally efficient NMT systems, with potential applications in domains requiring high fidelity in translation, such as legal and academic fields. The proposed methods pave the way for future research into further optimization of context integration in NMT and exploring its application in multilingual and specialized domain contexts. Limitations include the additional computational overhead introduced by the hierarchical and conditional attention mechanisms, which may affect performance in low-resource environments. Nonetheless, this work represents a substantial step forward in addressing the complexities of document-level translation.
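As a rough, non-authoritative illustration of the mechanisms named above, the sketch below shows (i) a source2token self-attention encoder that compresses a sentence into a single vector, (ii) a conditional attention step that aggregates neighbouring sentence vectors conditioned on the current sentence, and (iii) one plausible reading of hierarchical positional encoding as the sum of a sentence-level and a token-level sinusoidal encoding. All class names, dimensions, and scoring functions are assumptions chosen for illustration, not the authors' implementation; in particular, the soft weighting over context sentences stands in for the selective aggregation described in the abstract.

import torch
import torch.nn as nn
import torch.nn.functional as F

class Source2TokenEncoder(nn.Module):
    # Source2token self-attention: every token receives a learned relevance
    # score; the softmax-weighted sum of token embeddings gives one sentence vector.
    def __init__(self, d_model):
        super().__init__()
        self.score = nn.Sequential(
            nn.Linear(d_model, d_model), nn.Tanh(), nn.Linear(d_model, 1))

    def forward(self, tokens, mask=None):
        # tokens: (batch, seq_len, d_model); mask: (batch, seq_len), True = real token
        logits = self.score(tokens).squeeze(-1)                  # (batch, seq_len)
        if mask is not None:
            logits = logits.masked_fill(~mask, float("-inf"))
        weights = F.softmax(logits, dim=-1)                      # attention over tokens
        return torch.einsum("bs,bsd->bd", weights, tokens)       # sentence vector

class ConditionalContextAttention(nn.Module):
    # Conditional attention: weight the surrounding sentence vectors by their
    # relevance to the current sentence and return a single context vector.
    def __init__(self, d_model):
        super().__init__()
        self.query = nn.Linear(d_model, d_model)
        self.key = nn.Linear(d_model, d_model)

    def forward(self, current, context):
        # current: (batch, d_model); context: (batch, n_ctx, d_model)
        q = self.query(current).unsqueeze(1)                     # (batch, 1, d_model)
        k = self.key(context)                                    # (batch, n_ctx, d_model)
        scores = (q * k).sum(-1) / k.size(-1) ** 0.5             # scaled dot product
        weights = F.softmax(scores, dim=-1)                      # (batch, n_ctx)
        return torch.einsum("bn,bnd->bd", weights, context)      # aggregated context

def hierarchical_positional_encoding(n_sents, max_len, d_model):
    # Assumption: a token's position vector is the sum of its within-sentence
    # sinusoidal encoding and the sinusoidal encoding of its sentence index.
    def sinusoid(positions):
        dims = torch.arange(d_model, dtype=torch.float32)
        angles = positions[:, None] / torch.pow(10000.0, (2 * (dims // 2)) / d_model)
        enc = torch.empty(positions.size(0), d_model)
        enc[:, 0::2] = torch.sin(angles[:, 0::2])
        enc[:, 1::2] = torch.cos(angles[:, 1::2])
        return enc
    tok = sinusoid(torch.arange(max_len, dtype=torch.float32))   # (max_len, d_model)
    sent = sinusoid(torch.arange(n_sents, dtype=torch.float32))  # (n_sents, d_model)
    return sent[:, None, :] + tok[None, :, :]                    # (n_sents, max_len, d_model)

# Example shapes (hypothetical): d_model = 512, a handful of context sentences
# sent_vec = Source2TokenEncoder(512)(token_embeddings)              # (batch, 512)
# ctx_vec  = ConditionalContextAttention(512)(sent_vec, ctx_vectors) # (batch, 512)

Attending over a few pre-encoded sentence vectors, rather than over every token in the surrounding document, is what keeps the added cross-sentence context from inflating the quadratic cost of token-level self-attention, consistent with the reduced computational complexity claimed in the abstract.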
Keywords
Neural Machine Translation (NMT), Transformer Model, Cross-Sentence Context
Copyright License
Copyright (c) 2024 Natalia Trankova, Dmitrii Rykunov, Ivan Serov, Ivan Giganov, Yaroslav Starukhin
This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors retain the copyright of their manuscripts, and all Open Access articles are disseminated under the terms of the Creative Commons Attribution 4.0 License (CC-BY), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The use of general descriptive names, trade names, trademarks, and so forth in this publication, even if not specifically identified, does not imply that these names are not protected by the relevant laws and regulations.