Text mining - Data mining project

Introduction


Text mining, also known as text analysis or text data mining, is a technology that converts unstructured text into structured data. It is a part of data mining. Email is a familiar example: some messages are moved to the spam folder automatically because they are detected as unwanted mail. To see a practical approach to text mining, read this article to the end. Skyfi Labs helps students learn more technologies by providing many courses and technical articles.

Description

A huge amount of textual data is present in blogs, books, news articles and so on. To make effective and efficient use of it, the content has to be extracted automatically and then analysed. In this project we analyse textual data, both as individual texts and by comparing texts with each other. What follows is a brief overview of the technology. The article is useful for engineering students, especially those with a CS or IT background.


Practical Approach

  1. Install the following packages or libraries -
  • NumPy- Used for arrays and numerical operations
  • Pandas- Used for tables (data frames), sorting and saving to .csv
  • SciPy- Used for linear algebra, integration and statistics
  • Sklearn (scikit-learn)- Used for machine learning on the processed data
  • Matplotlib- Used for 2D graph plotting
  • NLTK- Used for natural language processing on unstructured text
  2. We are also going to use regular expressions and the codecs package for reading the text files. Download all of the NLTK data as well, for example with nltk.download('all').
  3. You can use any platform, such as Colab or Jupyter Notebook.
  4. Read the data from the first.txt file with the codecs package mentioned above.
  5. The next step is to work on the data. Filter it by using regular expressions, for example to keep only words and drop punctuation and digits.
  6. Create a new function to calculate the word frequency, e.g. 'Laptop' is a word which appeared 20 times in the text file.
  7. Find the most common words in the first.txt file and display their absolute frequency (the count) and relative frequency (the count divided by the total number of words). Save the result to a .csv file with the to_csv("name.csv") command.
  8. For comparison, do the same with the second.txt file: calculate the most common words and save them to a .csv file.
  9. These two .csv files will appear in the same location as the text files.
  10. To compare the texts, create a word frequency data frame that holds the frequencies from both files.
  11. Display the most distinctive words with the following command: dist_df.head()
  12. Save the list of most distinctive words in another .csv file, as we did before.
  13. You can save as many or as few of these words as you wish.
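The steps above can be sketched in Python roughly as follows. This is a minimal sketch and not the original project's code: the function names (read_text, tokenize, word_frequency, distinctive_words, compare_files), the output file names and the regular expression are assumptions made for illustration. It also assumes that "most distinctive" means the largest difference in relative frequency between the two files, which the article does not specify.

```python
import codecs
import re
from collections import Counter

import pandas as pd


def read_text(path, encoding="utf-8"):
    """Read a whole text file with the codecs package."""
    with codecs.open(path, "r", encoding=encoding) as handle:
        return handle.read()


def tokenize(text):
    """Lowercase the text and keep only alphabetic words (inner apostrophes allowed)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def word_frequency(text):
    """Return a data frame of absolute and relative word frequencies, most common first."""
    counts = Counter(tokenize(text))
    total = sum(counts.values())
    rows = [
        {"word": word, "abs_freq": n, "rel_freq": n / total}
        for word, n in counts.most_common()
    ]
    return pd.DataFrame(rows, columns=["word", "abs_freq", "rel_freq"])


def distinctive_words(freq_a, freq_b):
    """Rank words by the absolute difference in relative frequency (an assumed measure)."""
    merged = freq_a.merge(freq_b, on="word", how="outer", suffixes=("_a", "_b"))
    merged = merged.fillna(0)  # a word missing from one text has frequency 0 there
    merged["rel_diff"] = merged["rel_freq_a"] - merged["rel_freq_b"]
    order = merged["rel_diff"].abs().sort_values(ascending=False).index
    return merged.loc[order].reset_index(drop=True)


def compare_files(first_path, second_path, top_n=20):
    """Run the whole comparison and write three .csv files (hypothetical names)."""
    first_df = word_frequency(read_text(first_path))
    second_df = word_frequency(read_text(second_path))
    first_df.head(top_n).to_csv("first_common_words.csv", index=False)
    second_df.head(top_n).to_csv("second_common_words.csv", index=False)
    dist_df = distinctive_words(first_df, second_df)
    dist_df.to_csv("distinctive_words.csv", index=False)
    return dist_df
```

Calling compare_files("first.txt", "second.txt") and then dist_df.head() on the returned data frame would show the top rows. A positive rel_diff means the word is relatively more frequent in the first file, a negative one that it is more frequent in the second. The .csv files are written to the current working directory, so run the code from the folder that holds the text files if you want them side by side.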
So this is the basic overview and practical approach of text mining. You can learn more by enrolling in our courses. This article gives a basic overview of text mining, a part of data mining, and clarifies the concept.



