Get To The Point: Summarization with Pointer-Generator Networks


Abstract  

Neural sequence-to-sequence models have provided a viable new approach for abstractive text summarization (meaning they are not restricted to simply selecting and rearranging passages from the original text). However, these models have two shortcomings: they are liable to reproduce factual details inaccurately, and they tend to repeat themselves. In this work we propose a novel architecture that augments the standard sequence-to-sequence attentional model in two orthogonal ways. First, we use a hybrid pointer-generator network that can copy words from the source text via pointing, which aids accurate reproduction of information, while retaining the ability to produce novel words through the generator. Second, we use coverage to keep track of what has been summarized, which discourages repetition. We apply our model to the CNN/Daily Mail summarization task, outperforming the current abstractive state-of-the-art by at least 2 ROUGE points.
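The coverage idea mentioned above can be made concrete with a short sketch: coverage is the running sum of attention distributions over previous decoder steps, and a coverage loss penalizes attending again to already-covered source positions, which discourages repetition. The NumPy sketch below is a minimal illustration under that description; the function and variable names (coverage_loss, attention_history) are ours, not taken from the paper's implementation.

```python
import numpy as np

def coverage_loss(attention_history):
    """attention_history: (decoder_steps, source_len) array, each row an
    attention distribution over the source tokens at one decoder step.

    Coverage at step t is the sum of attention over earlier steps; the loss
    at step t is sum_i min(a_i^t, c_i^t), penalizing repeated attention."""
    steps, src_len = attention_history.shape
    coverage = np.zeros(src_len)
    total = 0.0
    for t in range(steps):
        a_t = attention_history[t]
        total += np.minimum(a_t, coverage).sum()  # overlap with past attention
        coverage += a_t                           # update running coverage
    return total

# Attending twice to the same source position incurs a penalty,
# signalling the kind of repetition the model is trained to avoid.
attn = np.array([[0.7, 0.2, 0.1],
                 [0.7, 0.2, 0.1]])
print(coverage_loss(attn))  # 1.0 for this fully repeated attention pattern
```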

Existing System 

The extractive approach is easier, because copying large chunks of text from the source document ensures baseline levels of grammaticality and accuracy. On the other hand, sophisticated abilities that are crucial to high-quality summarization, such as paraphrasing, generalization, or the incorporation of real-world knowledge, are possible only in an abstractive framework. Due to the difficulty of abstractive summarization, the great majority of past work has been extractive (Kupiec et al., 1995; Paice, 1990; Saggion and Poibeau, 2013). However, the recent success of sequence-to-sequence models (Sutskever et al., 2014), in which recurrent neural networks (RNNs) both read and freely generate text, has made abstractive summarization viable (Chopra et al., 2016; Nallapati et al., 2016; Rush et al., 2015; Zeng et al., 2016). Though these systems are promising, they exhibit undesirable behavior such as inaccurately reproducing factual details, an inability to deal with out-of-vocabulary (OOV) words, and repeating themselves.
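For context on the baseline these criticisms target, the following is a minimal NumPy sketch of one attention step in a standard attentional sequence-to-sequence decoder, in the style of Bahdanau et al. (2015). The shapes, parameter names, and random values are illustrative assumptions, not the paper's actual model.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_context(encoder_states, decoder_state, W_h, W_s, v):
    """Compute attention weights over encoder states and the context vector.

    encoder_states: (src_len, hidden)  encoder hidden states h_i
    decoder_state:  (hidden,)          current decoder state s_t
    W_h, W_s:       (attn, hidden)     learned projections (random here)
    v:              (attn,)            learned scoring vector (random here)
    """
    scores = np.tanh(encoder_states @ W_h.T + decoder_state @ W_s.T) @ v
    attn = softmax(scores)            # distribution over source positions
    context = attn @ encoder_states   # attention-weighted sum of h_i
    return attn, context

# Toy example with random parameters: 4 source tokens, hidden size 8.
rng = np.random.default_rng(0)
enc = rng.normal(size=(4, 8))
dec = rng.normal(size=8)
W_h, W_s, v = rng.normal(size=(16, 8)), rng.normal(size=(16, 8)), rng.normal(size=16)
attn, ctx = attention_context(enc, dec, W_h, W_s, v)
print(attn, ctx.shape)
```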

Proposed System

In this paper we present an architecture that addresses these three issues in the context of multi-sentence summaries. While most recent abstractive work has focused on headline generation tasks (reducing one or two sentences to a single headline), we believe that longer-text summarization is both more challenging (requiring higher levels of abstraction while avoiding repetition) and ultimately more useful. Therefore we apply our model to the recently introduced CNN/Daily Mail dataset (Hermann et al., 2015; Nallapati et al., 2016), which contains news articles (39 sentences on average) paired with multi-sentence summaries, and show that we outperform the state-of-the-art abstractive system by at least 2 ROUGE points. Our hybrid pointer-generator network facilitates copying words from the source text via pointing (Vinyals et al., 2015), which improves accuracy and handling of OOV words, while retaining the ability to generate new words.
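A minimal sketch of the pointer-generator mixture described above: the final output distribution blends the generator's vocabulary distribution with the copy distribution induced by attention over the source, weighted by a generation probability. The names, the toy vocabulary, and the extended-vocabulary handling of OOV source words below are illustrative assumptions rather than the paper's code.

```python
import numpy as np

def final_distribution(p_vocab, attention, src_ids, p_gen, extended_vocab_size):
    """Blend the generator's vocabulary distribution with the copy (pointer)
    distribution induced by attention over source tokens.

    p_vocab:   (vocab_size,) softmax over the fixed output vocabulary
    attention: (src_len,)    attention weights over source positions
    src_ids:   (src_len,)    ids of source tokens in the extended vocabulary
                             (OOV source words get temporary ids >= vocab_size)
    p_gen:     scalar in [0, 1], probability of generating vs. copying
    """
    dist = np.zeros(extended_vocab_size)
    dist[:len(p_vocab)] = p_gen * p_vocab                  # generation part
    for pos, token_id in enumerate(src_ids):
        dist[token_id] += (1.0 - p_gen) * attention[pos]   # copying part
    return dist

# Example: vocabulary of 5 words plus one OOV source word (temporary id 5).
p_vocab = np.array([0.1, 0.4, 0.2, 0.2, 0.1])
attention = np.array([0.6, 0.3, 0.1])
src_ids = np.array([2, 5, 0])   # the middle source token is out-of-vocabulary
print(final_distribution(p_vocab, attention, src_ids, p_gen=0.7,
                         extended_vocab_size=6))
```

Because the copy distribution assigns probability to OOV source tokens through their temporary ids, the model can reproduce names and rare words that the fixed vocabulary alone could never emit.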

Conclusion 

In this work we presented a hybrid pointer-generator architecture with coverage, and showed that it reduces inaccuracies and repetition. We applied our model to a new and challenging long-text dataset, and significantly outperformed the abstractive state-of-the-art result. Our model exhibits many abstractive abilities, but attaining higher levels of abstraction remains an open research question.

References 

  1. Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural machine translation by jointly learning to align and translate. In International Conference on Learning Representations.
  2. Qian Chen, Xiaodan Zhu, Zhenhua Ling, Si Wei, and Hui Jiang. 2016. Distraction-based neural networks for modeling documents. In International Joint Conference on Artificial Intelligence.
  3. Jackie Chi Kit Cheung and Gerald Penn. 2014. Unsupervised sentence enhancement for automatic summarization. In Empirical Methods in Natural Language Processing.
  4. Sumit Chopra, Michael Auli, and Alexander M. Rush. 2016. Abstractive sentence summarization with attentive recurrent neural networks. In North American Chapter of the Association for Computational Linguistics.

  5. Michael Denkowski and Alon Lavie. 2014. Meteor universal: Language specific translation evaluation for any target language. In EACL 2014 Workshop on Statistical Machine Translation.
  6. John Duchi, Elad Hazan, and Yoram Singer. 2011. Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research 12:2121–2159.

  7. Jiatao Gu, Zhengdong Lu, Hang Li, and Victor O.K. Li. 2016. Incorporating copying mechanism in sequence-to-sequence learning. In Association for Computational Linguistics.
  8. Caglar Gulcehre, Sungjin Ahn, Ramesh Nallapati, Bowen Zhou, and Yoshua Bengio. 2016. Pointing the unknown words. In Association for Computational Linguistics.
  9. Karl Moritz Hermann, Tomas Kocisky, Edward Grefenstette, Lasse Espeholt, Will Kay, Mustafa Suleyman, and Phil Blunsom. 2015. Teaching machines to read and comprehend. In Neural Information Processing Systems.
  10. Hongyan Jing. 2000. Sentence reduction for automatic text summarization. In Applied Natural Language Processing.
  11. Philipp Koehn. 2009. Statistical machine translation. Cambridge University Press.
  12. Julian Kupiec, Jan Pedersen, and Francine Chen. 1995. A trainable document summarizer. In International ACM SIGIR Conference on Research and Development in Information Retrieval.