Posted by Mohammad Saleh, Software Engineer, Google Research, Brain Team, and Anjuli Kannan, Software Engineer, Google Docs

For many of us, it can be challenging to keep up with the volume of documents that arrive in our inboxes every day: reports, reviews, briefs, policies, and the list goes on. When a new document arrives, readers often wish it included a brief summary of the main points so they could prioritize it effectively. However, composing a document summary can be cognitively challenging and time-consuming, especially when a document writer is starting from scratch.

To help with this, we recently announced that Google Docs now automatically generates suggestions, when available, to aid document writers in creating content summaries. Today we describe how this was enabled using a machine learning (ML) model that comprehends document text and, when confident, generates a one- to two-sentence natural language description of the document content. The document writer maintains full control: they can accept the suggestion as-is, edit it to better capture the document summary, or ignore the suggestion altogether. Readers can also use this section, along with the outline, to understand and navigate the document at a high level. While all users can add summaries, auto-generated suggestions are currently only available to Google Workspace business customers. Building on grammar suggestions, Smart Compose, and autocorrect, we see this as another valuable step toward improving written communication in the workplace.

A blue summary icon appears in the top left corner when a document summary suggestion is available. Document writers can then view, edit, or ignore the suggested summary.

Automatically generated summaries would not be possible without the tremendous advances in ML for natural language understanding (NLU) and natural language generation (NLG) over the past five years, especially the introduction of Transformers and Pegasus.

Abstractive text summarization, which combines the individually challenging tasks of long document language understanding and generation, has been a long-standing problem in NLU and NLG research. A popular method for combining NLU and NLG is training an ML model using sequence-to-sequence learning, where the inputs are the document words and the outputs are the summary words; a neural network then learns to map input tokens to output tokens. Early applications of the sequence-to-sequence paradigm used recurrent neural networks (RNNs) for both the encoder and decoder.

The introduction of Transformers provided a promising alternative to RNNs because Transformers use self-attention to better model long input and output dependencies, which is critical in document summarization. Still, these models require large amounts of manually labeled data to train sufficiently, so the advent of Transformers alone was not enough to significantly advance the state of the art in document summarization.

The combination of Transformers with self-supervised pre-training (e.g., BERT, GPT, T5) led to a major breakthrough in many NLU tasks for which limited labeled data is available. In self-supervised pre-training, a model uses large amounts of unlabeled text to learn general language understanding and generation capabilities. Then, in a subsequent fine-tuning stage, the model learns to apply these abilities to a specific task, such as summarization or question answering.

The Pegasus work took this idea one step further by introducing a pre-training objective customized to abstractive summarization. In Pegasus pre-training, also called Gap Sentence Prediction (GSP), full sentences from unlabeled news articles and web documents are masked from the input, and the model is required to reconstruct them conditioned on the remaining unmasked sentences. In particular, GSP uses different heuristics to select sentences that are considered essential to the document; the intuition is to make the pre-training as close as possible to the summarization task. Pegasus achieved state-of-the-art results on a varied set of summarization datasets. However, a number of challenges remained in applying this research advance to a product.

Applying Recent Research Advances to Google Docs

Data
Self-supervised pre-training produces an ML model with general language understanding and generation capabilities, but a subsequent fine-tuning stage is critical for the model to adapt to the application domain. We fine-tuned early versions of our model on a corpus of documents with manually generated summaries that were consistent with typical use cases.
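The sequence-to-sequence framing described above can be sketched with a toy greedy decoder: the summary is generated one token at a time, each step conditioned on the full input document and on the summary produced so far. Everything here is illustrative, not the production system; `next_token_scores` is a hypothetical stand-in for a trained decoder network, and the stub below simply copies the opening words of the document.

```python
def greedy_decode(document_tokens, next_token_scores, eos="<eos>", max_len=10):
    """Generate summary tokens one at a time; each step is conditioned
    on the input document and the summary prefix generated so far."""
    summary = []
    while len(summary) < max_len:
        scores = next_token_scores(document_tokens, summary)
        token = max(scores, key=scores.get)  # greedy: pick the top-scoring token
        if token == eos:
            break
        summary.append(token)
    return summary

# Stub "model": copies the first three words of the document, then stops.
# A real summarizer would score the whole vocabulary with a neural network.
def stub_scores(doc, prefix):
    if len(prefix) < 3:
        return {doc[len(prefix)]: 1.0, "<eos>": 0.5}
    return {"<eos>": 1.0}

tokens = "quarterly revenue grew ten percent year over year".split()
print(greedy_decode(tokens, stub_scores))  # ['quarterly', 'revenue', 'grew']
```

Real systems typically replace greedy selection with beam search, but the input-tokens-to-output-tokens mapping is the same.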
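Self-attention, the mechanism that lets Transformers model the long-range dependencies mentioned above, can be illustrated in a few lines of NumPy: every output position is a weighted average over value vectors from all input positions, so distant tokens can influence each other directly. This is a minimal single-head sketch under simplifying assumptions; the weight matrices are random stand-ins for learned parameters.

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention.
    x: (seq_len, d_model) token embeddings."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)                 # (seq_len, seq_len) pairwise scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over each row
    return weights @ v, weights                     # mix values by attention weights

rng = np.random.default_rng(0)
seq_len, d_model = 6, 8
x = rng.normal(size=(seq_len, d_model))
w_q, w_k, w_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))
out, attn = self_attention(x, w_q, w_k, w_v)
```

Unlike an RNN, no information has to survive a step-by-step recurrence: token 1 attends to token 6 in a single matrix multiply, which is why attention scales to long documents and summaries.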
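The gap-sentence masking behind Pegasus pre-training can be sketched as follows. This is a toy version: it scores sentence importance with simple unigram overlap against the rest of the document, whereas the Pegasus paper uses ROUGE-based and other heuristics, and the mask token name here is illustrative.

```python
def overlap_score(sentence, others):
    """Unigram overlap between a sentence and the remaining text:
    a crude proxy for how 'essential' the sentence is to the document."""
    words = set(sentence.lower().split())
    rest = set(" ".join(others).lower().split())
    return len(words & rest) / max(len(words), 1)

def make_gsp_example(sentences, num_masked=1, mask_token="<mask_1>"):
    """Mask the highest-scoring sentences; the pre-training target is to
    reconstruct them from the remaining unmasked input."""
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: overlap_score(sentences[i], sentences[:i] + sentences[i + 1:]),
        reverse=True,
    )
    masked = set(ranked[:num_masked])
    inputs = [mask_token if i in masked else s for i, s in enumerate(sentences)]
    targets = [sentences[i] for i in sorted(masked)]
    return " ".join(inputs), " ".join(targets)

doc = [
    "Pegasus pre-trains by predicting missing sentences.",
    "The weather was pleasant that afternoon.",
    "Masked sentences are reconstructed from the sentences around them.",
]
inp, tgt = make_gsp_example(doc)
```

Because the model must produce whole missing sentences from the surrounding context, the pre-training objective already resembles abstractive summarization before any labeled data is seen.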