Showing posts with label Data Science Application. Show all posts

Data Science Applications - Page Rank Text Summarization


What Is Text Summarization?


Text summarization is one of the most significant applications of natural language processing.

It reduces the volume of the original text while keeping just the relevant information.

For this reason, text summarization is sometimes described as a form of data reduction.

It entails generating a condensed version of the original text that allows the user to get the key information from that text in a much shorter amount of time.

Types of Text Summarization Processes

Text summarization can be categorized in several ways. The classification of the text summarization process is shown in the figure.

Each of these categories can be further subdivided.

Based on the Number of Documents

Text summarization is further divided into categories depending on the number of source documents:

• Single:

A single-document summary condenses one source document into a short, clear, and concise outline, which is what makes it valuable.

That single document may itself be assembled from several subdocuments.

These subdocuments may emphasize different viewpoints even though they all cover the same topic.

• Multiple:

A multi-document summary is a technique for managing a large amount of data spread across multiple related source documents by including just the most important information or main concepts in a small amount of space.

Multi-document summarization has recently become a hot topic in automatic summarization.

A. Based on Summary Usage

Text summarization may be further subdivided into the following categories depending on how the summary is used:

• Generic Summaries: Generic summaries do not target any specific group since they are written for a broad audience.

• Query-based: Query-based, or topic-focused, summaries are tailored to the needs of an individual or a group and address a single question.


The goal of query-based text summarization is to extract the essential information from the original text that answers the query.

The answer is presented within a small, predetermined number of words.
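To make the idea concrete, here is a minimal sketch of a query-based extractive summarizer. It assumes a very simple relevance measure (the number of distinct query words a sentence contains) and a hypothetical word budget `max_words`; the function names are illustrative only and real systems use far richer scoring.

```python
import re

def tokenize(text):
    """Lowercase a string and return its word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def query_summary(text, query, max_words=30):
    """Pick the sentences sharing the most words with the query, within a word budget."""
    # Naive sentence splitting on end-of-sentence punctuation (an assumption).
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    query_terms = set(tokenize(query))
    # Score each sentence by how many distinct query terms it contains.
    scored = [(len(query_terms & set(tokenize(s))), i, s) for i, s in enumerate(sentences)]
    chosen, used = [], 0
    # Visit sentences from most to least relevant, keeping those that fit the budget.
    for score, i, s in sorted(scored, key=lambda t: (-t[0], t[1])):
        n = len(s.split())
        if score == 0 or used + n > max_words:
            continue
        chosen.append((i, s))
        used += n
    # Restore the original order so the summary reads naturally.
    return " ".join(s for _, s in sorted(chosen))
```

Calling `query_summary(document, "How does PageRank rank pages?", max_words=30)` returns only the sentences that mention the query terms, trimmed to the word budget.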

B. Based on Technique

Text summarization may be further divided into subcategories based on the following techniques:

• Supervised: 

  • Supervised text summarization is similar to supervised keyphrase extraction in that both learn from labeled examples.
  • Essentially, if you have a collection of documents and human-generated summaries for them, you can learn the features of sentences that make them good candidates for inclusion in the summary.

• Unsupervised: 

  • Unsupervised keyphrase extraction eliminates the need for training data.
  • It approaches the problem from a different perspective.
  • Rather than trying to learn explicit features that characterize important phrases, the TextRank algorithm uses the structure of the text itself to choose key phrases that appear "central" to it, similar to how PageRank selects important web pages.
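The TextRank idea described above can be sketched in a few lines. This is a simplified illustration, not the reference implementation: it assumes words are linked when they occur within `window` positions of each other, uses an unweighted graph, and skips part-of-speech filtering. The function name, the parameters, and the optional `stopwords` set are all hypothetical.

```python
import re
from collections import defaultdict

def textrank_keywords(text, window=2, damping=0.85, iterations=50,
                      top_n=3, stopwords=frozenset()):
    """Rank words by a PageRank-style score on a word co-occurrence graph."""
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in stopwords]
    # Build an undirected graph: words within `window` positions are neighbors.
    neighbors = defaultdict(set)
    for i, w in enumerate(words):
        for other in words[i + 1 : i + 1 + window]:
            if other != w:
                neighbors[w].add(other)
                neighbors[other].add(w)
    # Each word repeatedly receives an equal share of its neighbors' scores.
    scores = {w: 1.0 for w in neighbors}
    for _ in range(iterations):
        scores = {
            w: (1 - damping)
            + damping * sum(scores[n] / len(neighbors[n]) for n in neighbors[w])
            for w in neighbors
        }
    # Highest score first; ties are broken alphabetically.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_n]
```

No training data is involved: a word ranks highly simply because it co-occurs with many other words, which is the sense in which it is "central" to the text.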

C. Based on the Textual Characteristics of the Summary

Text summarization may be classified into several groups depending on the features of the summary text, such as:

• Abstractive Summarization: 

  • Abstractive summarization methods rework the material by adding new phrases, rephrasing, or inserting terms not found in the original text.
  • To produce a good abstractive summary, the model must first understand the text and then express it with new words and phrases.
  • This involves complex capabilities such as generalization, paraphrasing, and integrating real-world knowledge.

• Extractive Summarization:

  • Extractive summarization creates summaries by combining sentences or phrases taken directly from the source material.
  • In this approach, ranking the importance of the different sentences or phrases is the key step.
  • A selection of the most important ones is extracted and then reassembled to form the summary.
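The extract, rank, and reassemble steps can be illustrated with a small sketch. It assumes a deliberately simple importance measure, the average document-wide frequency of a sentence's words, as a stand-in for whatever ranking method a real system would use. The function name and `num_sentences` parameter are hypothetical.

```python
import re
from collections import Counter

def extractive_summary(text, num_sentences=2):
    """Keep the highest-scoring sentences and return them in their original order."""
    # Naive sentence splitting on end-of-sentence punctuation (an assumption).
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    # Word frequencies over the whole document.
    freq = Counter(re.findall(r"[a-z]+", text.lower()))

    def score(sentence):
        # Average frequency of the sentence's words.
        tokens = re.findall(r"[a-z]+", sentence.lower())
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    # Rank sentence indices by score, breaking ties by position.
    ranked = sorted(range(len(sentences)), key=lambda i: (-score(sentences[i]), i))
    # Reassemble the selected sentences in document order.
    keep = sorted(ranked[:num_sentences])
    return " ".join(sentences[i] for i in keep)
```

Because the output consists only of sentences copied from the source, nothing is rephrased, which is the defining difference from abstractive summarization.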


The PageRank Algorithm

Around 1998, Page and Brin collaborated to create and refine the PageRank algorithm. It was primarily used in the prototype of Google's search engine.

The purpose of the algorithm is to determine the popularity, or importance, of a web page based on how pages on the web link to one another.

According to the theory, a web page with more incoming hyperlinks is more important than a web page with fewer incoming hyperlinks.

  • A web page that receives a hyperlink from a page considered highly important is itself significant.
  • PageRank is one of the most widely used ranking algorithms, and it was created as a method for analyzing web links.
  • The PageRank algorithm is used to calculate the weight of web pages, and it is the same concept that Google uses to rank web pages in search results.
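The two intuitions above (more incoming links mean more importance, and links from important pages count for more) can be sketched as an iterative computation. This is a minimal illustration on a toy link graph, not Google's production method. It assumes a damping factor of 0.85, a commonly used value, and spreads the rank of pages with no outgoing links evenly across all pages. The function name and graph format are hypothetical.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Compute PageRank scores.

    links: dict mapping each page to the list of pages it links to.
    Returns a dict mapping each page to its score; scores sum to 1.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline share.
        new_rank = {p: (1 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its rank equally among the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                for t in pages:
                    new_rank[t] += damping * rank[page] / n
        rank = new_rank
    return rank
```

For the toy graph `{"A": ["C"], "B": ["C"], "C": ["A"]}`, page C receives links from both A and B and ends up with the highest score, while B, which nothing links to, ends up with the lowest.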