Articles | Open Access | DOI: https://doi.org/10.37547/tajet/Volume06Issue09-07

HIERARCHICAL ENCODING AND CONDITIONAL ATTENTION IN NEURAL MACHINE TRANSLATION

Natalia Trankova, MSc – Skolkovo Institute of Science and Technology, New York, 10280, USA
Dmitrii Rykunov, BSc – National Research University Higher School of Economics, New York, 10013, USA
Ivan Serov – McKinsey & Company, Data Science division, New York, 10007, USA
Ivan Giganov, MSc – Northwestern University, Chicago, IL, 60654, USA
Yaroslav Starukhin – QuantumBlack, AI by McKinsey, Boston, MA 02110, USA

Abstract

The advent of Transformer models has significantly advanced Neural Machine Translation (NMT), particularly in sequence-to-sequence tasks, yet challenges remain in maintaining coherence and meaning across longer texts because these models traditionally translate phrases independently of one another. This study addresses these limitations by proposing an enhanced NMT framework that integrates cross-sentence context through redesigned positional encoding, hierarchical encoding, and conditional attention mechanisms. The research critiques the shortcomings of existing positional encoding methods in capturing discourse-level context and introduces a novel hierarchical strategy that preserves structural and semantic relationships between sentences within a document. By employing a source2token self-attention mechanism to encode sentences and a conditional attention mechanism to selectively aggregate the most relevant context, the proposed model aims to improve translation accuracy and consistency while reducing computational complexity. The findings demonstrate that this approach not only enhances translation quality but also mitigates the computational costs typically associated with processing longer sequences. However, the model's effectiveness is contingent on the presence of clear document structure, which may limit its applicability to more irregular texts. The study's contributions have significant implications for the development of more contextually aware and computationally efficient NMT systems, with potential applications in domains requiring high translation fidelity, such as legal and academic fields. The proposed methods pave the way for future research into further optimizing context integration in NMT and applying it in multilingual and specialized-domain settings. Limitations include the additional computational overhead introduced by the hierarchical and conditional attention mechanisms, which may affect performance in low-resource environments. Nonetheless, this work represents a substantial step forward in addressing the complexities of document-level translation.
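To make the two mechanisms named above concrete, the following is a minimal, hypothetical PyTorch sketch of (a) a source2token self-attention module that compresses a sentence's token embeddings into a single sentence vector, and (b) a conditional attention module that scores the other sentence embeddings in a document against the current sentence and aggregates only the top-k most relevant ones. All module and variable names, dimensions, and the top-k selection rule are illustrative assumptions, not the authors' reference implementation.

# Minimal sketch (assumed details): source2token sentence encoding plus
# conditional (top-k) attention over the other sentences in a document.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Source2TokenAttention(nn.Module):
    """Compress a sentence (seq_len, d_model) into one vector by learning a
    per-token, per-dimension score and taking the softmax-weighted sum."""
    def __init__(self, d_model: int):
        super().__init__()
        self.scorer = nn.Sequential(
            nn.Linear(d_model, d_model),
            nn.Tanh(),
            nn.Linear(d_model, d_model),  # one score per feature dimension
        )

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (seq_len, d_model)
        scores = self.scorer(tokens)          # (seq_len, d_model)
        weights = F.softmax(scores, dim=0)    # normalize over tokens
        return (weights * tokens).sum(dim=0)  # (d_model,) sentence embedding

class ConditionalContextAttention(nn.Module):
    """Attend over the other sentence embeddings in the document, but keep
    only the top-k most relevant ones when forming the context vector."""
    def __init__(self, d_model: int, top_k: int = 2):
        super().__init__()
        self.query_proj = nn.Linear(d_model, d_model)
        self.key_proj = nn.Linear(d_model, d_model)
        self.top_k = top_k

    def forward(self, current: torch.Tensor, context: torch.Tensor) -> torch.Tensor:
        # current: (d_model,), context: (n_sentences, d_model)
        q = self.query_proj(current)                   # (d_model,)
        k = self.key_proj(context)                     # (n, d_model)
        logits = k @ q / (q.shape[-1] ** 0.5)          # (n,) relevance scores
        k_eff = min(self.top_k, logits.shape[0])
        top_vals, top_idx = logits.topk(k_eff)         # select most relevant sentences
        weights = F.softmax(top_vals, dim=0)           # (k_eff,)
        return (weights.unsqueeze(-1) * context[top_idx]).sum(dim=0)  # (d_model,)

if __name__ == "__main__":
    d_model = 16
    sent_encoder = Source2TokenAttention(d_model)
    ctx_attention = ConditionalContextAttention(d_model, top_k=2)

    # Toy document: four sentences of varying length, already token-embedded.
    document = [torch.randn(n_tokens, d_model) for n_tokens in (5, 8, 3, 6)]
    sentence_embs = torch.stack([sent_encoder(s) for s in document])  # (4, d_model)

    # Context vector for sentence 2, built only from the other sentences.
    current = sentence_embs[2]
    others = torch.cat([sentence_embs[:2], sentence_embs[3:]])
    context_vec = ctx_attention(current, others)
    print(context_vec.shape)  # torch.Size([16])

In this sketch the conditional selection is a hard top-k over sentence-level relevance scores; the abstract's claim of reduced computational cost corresponds to the fact that only the selected context sentences contribute to the aggregated vector.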

Keywords

Neural Machine Translation (NMT), Transformer Model, Cross-Sentence Context

