Explain how the Transformer architecture differs from traditional sequence-to-sequence models. Then discuss how the attention mechanisms in Transformers contribute to their performance, and give an example of a real-world application of Transformers.