
Data Science Applications - PageRank Text Summarization

 



What Is Text Summarization?

 


Text summarization is one of the most significant applications of natural language processing.


It assists with reducing the quantity of original text and extracting just the relevant information.

The technique of text summarization is also known as data reduction.

It entails generating an outline of the original text that allows the user to get key bits of information from that text in a much shorter amount of time.




Types of Text Summarization Processes



Text summarization processes can be categorized in several ways, and each category can be further subdivided, as described below.



Based on the number of documents, text summarization is divided into the following categories: 



• Single-document: 


A single-document summary condenses one document into an outline that is short, clear, and concise, which makes it particularly valuable.

Several subdocuments may also be combined to form a single document.

Such subdocuments may emphasize different viewpoints, even though they all cover the same topic.


• Multi-document: 


A multi-document summary manages a large amount of data spread across multiple related source documents by including just the most important information or main concepts in a small amount of space.

Multi-document summarization has recently become a hot topic in automatic summarization.



A. Based on Summary Usage: Text summarization may be further subdivided into the following categories depending on how the summary is used: 


• Generic Summaries: Generic summaries do not target any specific group, since they are written for a broad audience.

• Query-based: Query-based or topic-focused summaries are tailored to the specific requirements of an individual or a group and address a single issue.

 

The goal of query-based text summarization is to extract fundamental information from the original text that answers the question.

The answer is presented in a small, predetermined number of words.
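
To make the idea concrete, here is a minimal, hedged sketch of query-based extraction in Python: it simply scores each sentence by its overlap with the query terms and returns the best match. The sentences, the query, and the query_summary helper are all invented for illustration, not taken from any particular system.

    import re

    def tokens(text):
        # Lowercase and split into word tokens, ignoring punctuation.
        return set(re.findall(r"[a-z]+", text.lower()))

    def query_summary(sentences, query, num_sentences=1):
        # Rank sentences by how many query terms they share with the query.
        query_terms = tokens(query)
        ranked = sorted(sentences, key=lambda s: len(query_terms & tokens(s)), reverse=True)
        return " ".join(ranked[:num_sentences])

    sentences = [
        "PageRank measures the importance of web pages.",
        "Text summarization shortens documents while keeping key information.",
        "Summaries can be generic or tailored to a user query.",
    ]
    print(query_summary(sentences, "how does text summarization keep key information"))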




B. Techniques-based: Text summarization may be further divided into subcategories based on the following techniques: 




• Supervised: 


  • Supervised text summarization is similar to supervised keyphrase extraction in that it relies on labeled training data.
  • Essentially, if you have a collection of documents and human-generated summaries for them, you can learn the characteristics of phrases that make them a good fit for inclusion in the summary.


• Unsupervised: 


  • Unsupervised key extraction eliminates the need for training data.
  • It approaches the problem from a different perspective.
  • Rather than trying to learn explicit characteristics that identify important words, the TextRank algorithm exploits the structure of the text itself to select key phrases that appear "central" to it, much as PageRank selects major websites (a minimal sketch follows this list).
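
The following is a minimal sketch of this idea, not a production implementation: it builds a sentence-similarity graph and runs PageRank over it. It assumes the scikit-learn and networkx libraries are available, and the sample sentences are invented.

    import networkx as nx
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    def textrank_summary(sentences, num_sentences=2):
        # Represent each sentence as a TF-IDF vector.
        tfidf = TfidfVectorizer().fit_transform(sentences)
        # Build a graph whose edge weights are pairwise sentence similarities.
        graph = nx.from_numpy_array(cosine_similarity(tfidf))
        # PageRank scores mark the sentences that are most "central" to the text.
        scores = nx.pagerank(graph)
        ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
        # Return the top-scoring sentences in their original order.
        return " ".join(sentences[i] for i in sorted(ranked[:num_sentences]))

    sentences = [
        "Text summarization reduces a document to its key points.",
        "Extractive methods select important sentences from the source.",
        "The weather is pleasant today.",
        "TextRank scores sentences by their centrality in a similarity graph.",
    ]
    print(textrank_summary(sentences))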




C. Based on the Textual Characteristics of the Summary: Text summarization may be classified into a variety of groups depending on the features of the summary text, such as:




• Abstractive Summarization: 


  • Abstractive summarization methods transform the material by generating new phrases, rephrasing, or introducing terms not found in the original text.
  • To produce a good abstractive summary, the model must first understand the text and then express it with new words and phrases.
  • This involves complex operations such as generalization, paraphrasing, and integrating real-world knowledge (see the sketch after this list).
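
As a hedged illustration only, an abstractive summary can be produced in a few lines of Python. This assumes the Hugging Face transformers library is installed and that a pretrained summarization model can be downloaded on first use; the article text is invented.

    from transformers import pipeline

    # Load a default pretrained sequence-to-sequence summarization model.
    summarizer = pipeline("summarization")

    article = (
        "Text summarization condenses a document into a shorter version while "
        "preserving its key information. Abstractive methods generate new "
        "sentences instead of copying passages from the source text."
    )
    # The model writes new sentences rather than extracting existing ones.
    print(summarizer(article, max_length=40, min_length=10, do_sample=False))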



• Extractive Summarization:


  • Extractive summarization creates summaries by combining portions of sentences taken directly from the source material.
  • In this setting, rating the importance of individual sentences is often the key step.
  • The most essential content is extracted and then reassembled to produce the summary (a simple sketch follows).
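
A very small sketch of this extractive idea, using nothing more than word frequencies; the scoring scheme and the example sentences are illustrative assumptions, not a standard implementation.

    import re
    from collections import Counter

    def frequency_summary(sentences, num_sentences=2):
        # Count how often each word appears across the whole text.
        freq = Counter(re.findall(r"[a-z]+", " ".join(sentences).lower()))

        def score(sentence):
            terms = re.findall(r"[a-z]+", sentence.lower())
            # Average word frequency is the importance rating of the sentence.
            return sum(freq[t] for t in terms) / max(len(terms), 1)

        ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
        # Reassemble the selected sentences in their original order.
        return " ".join(sentences[i] for i in sorted(ranked[:num_sentences]))

    sentences = [
        "Data science uncovers patterns in data.",
        "Patterns in data help with predictions.",
        "Lunch is at noon.",
    ]
    print(frequency_summary(sentences))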

 




The PageRank Algorithm



Around 1998, Page and Brin collaborated to create and refine the PageRank algorithm, which was primarily used in the prototype of Google's search engine.


The purpose of the algorithm is to determine the popularity, or importance, of a web page based on the interconnectivity of the web.



According to this idea, a web page with more incoming hyperlinks plays a larger role than a web page with fewer incoming hyperlinks.


  • A web page that receives a hyperlink from a page considered highly important is itself significant.
  • PageRank is one of the most widely used ranking algorithms, and it was created as a method for analyzing web links.
  • The PageRank algorithm is used to calculate the weight of web pages, and it is the same concept Google uses to rank a web page in its search results (a minimal sketch appears below).
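
To make the idea concrete, here is a minimal PageRank sketch using power iteration. The toy link structure and the damping factor of 0.85 are illustrative choices, not a description of any particular production system.

    import numpy as np

    def pagerank(links, damping=0.85, iterations=100):
        # links[i] lists the pages that page i links to.
        n = len(links)
        ranks = np.full(n, 1.0 / n)                  # start from a uniform distribution
        for _ in range(iterations):
            new_ranks = np.full(n, (1.0 - damping) / n)
            for page, outgoing in enumerate(links):
                if not outgoing:                     # dangling page: spread its rank evenly
                    new_ranks += damping * ranks[page] / n
                else:
                    for target in outgoing:          # pass rank along each outgoing link
                        new_ranks[target] += damping * ranks[page] / len(outgoing)
            ranks = new_ranks
        return ranks

    # Page 0 links to 1 and 2, page 1 links to 2, page 2 links to 0, page 3 links to 2.
    toy_web = [[1, 2], [2], [0], [2]]
    print(pagerank(toy_web))  # page 2, with the most incoming links, gets the highest score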






 

Who or What is a Data Scientist?




If you look up the term "data scientist" on the internet, you'll probably find a lot of different definitions. A data scientist uses data science to address various business challenges and problems. 

When people understood that a data scientist uses data, different mathematical or statistical functions and operations, and other scientific areas and applications to make sense of the data in a database, the name "data scientist" was coined. 


Data Scientists' Responsibilities 


A data scientist is a person who uses their knowledge of specialized scientific subjects to solve various data challenges. 

He uses a variety of mathematical, statistical, and computer science components in his work. He doesn't have to be an expert in any of these disciplines. 

He would, however, employ some technologies and solutions in order to come up with the best answers and reach critical conclusions for the organization's development and progress. 

A data scientist discovers a way to present the data available in a data set in a usable format. They deal with data that is both structured and unstructured. Let's take a closer look at business intelligence and how it differs from data science. 

You've probably heard of business intelligence, and most people mix up data science and business intelligence. We'll look at some of the distinctions between the two to help you understand.


Disparities: Data Science and Business Intelligence are two terms that are often used interchangeably. 


Let's have a better understanding of these words before we look at the distinctions between data science and business intelligence. 


Business Intelligence:


  1. Using business intelligence (BI), an enterprise can gain insight and hindsight into an existing data collection and explain various trends within it. 
  2. Businesses may use BI to gather data from both internal and external sources, prepare it, and execute queries on it to get the information they need. 
  3. They may then develop the necessary dashboards in order to answer various queries or find answers to various business challenges. Businesses can also use BI to assess specific future events. 


Data science:


  1. Data science, on the other hand, takes a unique approach to data analysis. You can explain any knowledge or insight in the data set using a forward-looking method. 
  2. You may use data science to evaluate current or historical data to forecast results. 
  3. This is one method most businesses try to make well-informed judgments. They may respond to a variety of open-ended queries. 


The following characteristics distinguish data science from business intelligence:








Why Should You Use Data Science?





Organizations used to deal with limited amounts of data before collecting data from every device they utilized. Using business intelligence tools, it was simple to evaluate and comprehend the facts and relationships within the data set. 

Traditional business intelligence solutions were designed to operate with structured data sets; however, today's data is mostly semi-structured or unstructured. 

It is critical to recognize that the majority of data collected nowadays is semi-structured or unstructured. 

Simple business intelligence systems are incapable of processing this sort of data, especially when enormous amounts of data are acquired from many sources. 

As a result, powerful and complicated analytical techniques and tools are required to process, evaluate, and derive some insights from the data. 

Data science has grown in popularity for other reasons as well. Let's have a look at how data science is applied in various fields. 


Customer Service 

What a wonderful thing it would be to know exactly what your consumers desire. 


Do you believe you can leverage existing data, such as purchase history, browsing history, income, and age, to learn more about your customers? 


This information may have been available to you in the past, but by employing various mathematical and statistical models you can now efficiently deal with vast quantities of data and discover the right goods to suggest to your consumers. This is a fantastic strategy to increase your company's revenue. 
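
As a rough sketch of the kind of model involved, here is a nearest-neighbour recommendation in Python; the customer features, the tiny data set, and the choice of scikit-learn's NearestNeighbors are all illustrative assumptions (in practice the features would also be normalised first).

    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    # Toy customer profiles: [age, income, purchases of product A, purchases of product B]
    customers = np.array([
        [25, 40000, 5, 0],
        [32, 52000, 4, 1],
        [47, 90000, 0, 6],
    ])

    # Find the existing customer most similar to a new shopper and
    # suggest the products that customer bought.
    model = NearestNeighbors(n_neighbors=1).fit(customers)
    new_shopper = np.array([[28, 45000, 3, 0]])
    _, index = model.kneighbors(new_shopper)
    print("Most similar existing customer:", index[0][0])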


Autonomous Vehicles 

How would you feel if you could drive yourself home in your car? Several businesses are aiming to create and enhance self-driving automobile technology. To generate a map of the surrounding area, the automobiles acquire live data from numerous sensors such as lasers, radars, and cameras. This information is used by the car's algorithm to decide whether to accelerate, slow down, park, stop, overtake, and so on. Machine learning algorithms are often used in these methods. 


Predictions

Let's look at how data science can be used to predictive analytics. Take the case of weather forecasting. The algorithms gather and evaluate data from planes, satellites, radars, ships, and other sources. This aids in the creation of the essential models. These models may be used to forecast the occurrence of any natural disaster. You can use this knowledge to take the required precautions to save lives.






What is Data Science?

 


Data has replaced oil as the new commodity, and every business, regardless of sector, is seeking innovative methods to handle and store massive amounts of data. Until 2010, most businesses found this a difficult task. 

The goal for each organization was to create a framework or solution that would allow them to store massive amounts of data. Because Hadoop and other platforms have made it simpler for enterprises to store vast amounts of data, they are also focusing on techniques and solutions for processing data. Data science is the only way to do this. 

It's crucial to remember that data science is the way of the future. It's critical to understand what data science is, especially if you want to contribute value to your company. 


Data Science: An Overview 


Data science is a collection of methods, techniques, philosophies, and languages used to uncover hidden patterns within a data set's variables. 

This may prompt you to ask how this differs from the data analysis that has been done for years. The reason is that previously, we could only utilize tools and algorithms to describe the variables in a data set; however, data science makes it simpler to anticipate outcomes. 

A data analyst solely analyses previous data sets to describe what is happening in the present. 

A data scientist, on the other hand, does not merely look at the data to see whether there are insights to be gained from it. He also employs complex algorithms to determine the likelihood of an event occurring, and he examines the data from a variety of perspectives. 

Data science is used to make educated judgments based on forecasts derived from existing data sets. To obtain this information, you may apply a variety of analytics to the data collection. In the next sections, we'll go through these in more detail. 


Predictive Causal Analytics 


Predictive causal analytics is required if you wish to create a model that predicts the possibilities or consequences of a future event. Assume you work for a credit firm and lend money to people depending on their credit scores. 

You'll be concerned about your clients' capacity to pay back the money you've given them. Using payment history, you may create models to do predictive analysis on the data. This might assist you in determining whether or not the consumer will pay you on time.
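
A minimal sketch of such a model, assuming scikit-learn and an entirely made-up payment-history data set; the two features and the threshold-free probability output are illustrative choices.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Toy features: [number of late payments, credit score]; label 1 = paid on time.
    X = np.array([[0, 720], [1, 680], [5, 560], [3, 600], [0, 750], [6, 540]])
    y = np.array([1, 1, 0, 1, 1, 0])

    model = LogisticRegression(max_iter=1000).fit(X, y)

    # Estimate the probability that a new customer with 2 late payments
    # and a credit score of 640 will pay on time.
    print(model.predict_proba([[2, 640]])[0][1])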


Prescriptive Analytics 


It's possible that you'll need to employ a model that can make the necessary judgments and adjust the parameters based on the data set or inquiry. 

You'll need to employ prescriptive analytics to do this. This type of analytics is mainly concerned with giving accurate data so that you can make an informed decision. 

This form of analytics may also be used to forecast a variety of related events and actions. 

A self-driving automobile is an example of this sort of analytics. This is something we've looked at before. You may utilize the data obtained from the automobiles to run a variety of algorithms and utilize the findings to make the car smarter. 

This makes it easy for the automobile to make the appropriate judgments when it comes to turning, slowing down, speeding up, or determining which way to go. 


Artificial Intelligence (AI) 


Using unstructured, semi-structured, and structured data sets, you can make forecasts with a variety of machine learning methods. Assume you work for a financial institution and have access to transactional data. 

To forecast future transactions, you'll need to create a model. You'll need a supervised machine-learning method to complete this analysis. These methods are used to train the computer with previously collected data. 

You may also design and train a model to detect potential frauds based on previous data using supervised machine learning methods. 
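
As a hedged sketch of such a supervised model (the transaction features, labels, and the choice of a random forest are invented for illustration; scikit-learn is assumed):

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    # Toy transactions: [amount, hour of day, foreign merchant flag]; label 1 = fraud.
    X = np.array([[20, 14, 0], [3500, 3, 1], [45, 10, 0], [2900, 2, 1], [15, 19, 0]])
    y = np.array([0, 1, 0, 1, 0])

    clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

    # Score a suspicious-looking new transaction; the model flags it as likely fraud.
    print(clf.predict([[3100, 4, 1]]))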


Pattern Recognition 


You won't find explicit variables in every data set that you can use to make the appropriate predictions. That does not mean nothing can be learned: every data collection contains hidden patterns, which you must discover in order to generate the needed predictions. 

Because there are no pre-defined labels in the data set with which to categorize the variables, you'll need to utilize an unsupervised model. Clustering is one of the most frequent techniques for detecting patterns. 

Assume you work for a telephone firm and are entrusted with determining where towers should be placed in order to construct a network. 

The clustering technique may then be used to determine where towers should be placed to guarantee that every user in the region receives the best signal strength. 
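
A minimal clustering sketch, assuming scikit-learn and a made-up set of user coordinates; each cluster centre is a candidate tower location.

    import numpy as np
    from sklearn.cluster import KMeans

    # Toy user locations as (x, y) coordinates in kilometres.
    users = np.array([[1.0, 2.0], [1.5, 1.8], [1.2, 2.2],   # one neighbourhood
                      [8.0, 8.0], [8.3, 7.9], [7.8, 8.4]])  # another neighbourhood

    # Group users into two clusters; the centroids are candidate tower sites.
    kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(users)
    print(kmeans.cluster_centers_)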


It's critical to grasp the differences between data science and data analytics methodologies, based on the examples above. The latter encompasses forecasts and descriptive analytics only to a limited extent, whereas data science is mainly concerned with machine learning and predictive causal analytics. Now that you know what data science is, let's look at why companies need to employ it in the first place.