Oceanographic Big Data Text Categorization Algorithm Based on Improved Mutual Information
Abstract
Automatic text categorization of oceanographic big data has always been central to building an oceanographic information database on an oceanographic big data information platform. However, given the extreme complexity of Internet data and the specialized needs of the oceanographic field, the categorization accuracy of the traditional tf-idf algorithm falls short of what is required. Building on the traditional tf-idf algorithm and the mutual information algorithm, I propose an improved algorithm, tf-idf-miow, to meet the demands of text categorization of oceanographic big data. I first optimize the mutual information algorithm and use it to calculate the correlation coefficient between each characteristic word and the oceanographic field. I then set the oceanographic field weight, miow, from this correlation coefficient and introduce miow into the traditional tf-idf algorithm. Automatic text categorization experiments show that the recall rate of tf-idf-miow in the oceanographic field is 10.33% higher than that of the traditional tf-idf algorithm, and that the F1-score is improved by 6.92%.
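The abstract does not give the exact form of the optimized mutual information or of miow, so the following is only a minimal sketch of the general idea under stated assumptions: the correlation between a word and the field is taken to be pointwise mutual information estimated from document frequencies, the weight is assumed to be 1 + max(0, PMI), and it is assumed to enter tf-idf as a multiplicative factor. The function names `idf`, `domain_weight`, and `tfidf_miow` are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def idf(docs):
    """Plain inverse document frequency: log(N / df(t))."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {t: math.log(n / df[t]) for t in df}


def domain_weight(docs, labels, domain):
    """Assumed field weight: 1 + max(0, PMI(term, domain)).

    PMI is estimated from document counts:
    log( P(t, domain) / (P(t) * P(domain)) ).
    Terms that never occur in the domain keep the neutral weight 1.0.
    """
    n = len(docs)
    n_domain = sum(1 for y in labels if y == domain)
    df, df_domain = Counter(), Counter()
    for doc, y in zip(docs, labels):
        for t in set(doc):
            df[t] += 1
            if y == domain:
                df_domain[t] += 1
    weights = {}
    for t in df:
        if df_domain[t] == 0:
            weights[t] = 1.0
        else:
            pmi = math.log((df_domain[t] * n) / (df[t] * n_domain))
            weights[t] = 1.0 + max(0.0, pmi)
    return weights


def tfidf_miow(doc, idf_table, weights):
    """Assumed combination: tf * idf * field weight, per term."""
    counts = Counter(doc)
    total = len(doc)
    return {
        t: (c / total) * idf_table.get(t, 0.0) * weights.get(t, 1.0)
        for t, c in counts.items()
    }


# Toy corpus: two oceanographic documents and two unrelated ones.
docs = [
    ["tide", "salinity", "ocean"],
    ["tide", "current", "ocean"],
    ["stock", "market", "ocean"],
    ["stock", "price"],
]
labels = ["ocean", "ocean", "other", "other"]

idf_table = idf(docs)
weights = domain_weight(docs, labels, "ocean")
scores = tfidf_miow(["tide", "stock"], idf_table, weights)
```

In this toy corpus "tide" and "stock" have the same tf and idf in the query document, so any difference between their scores comes only from the assumed field weight, which favors the word concentrated in oceanographic documents.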
Keywords
Oceanographic big data, Text categorization, Mutual Information, TF-IDF
DOI
10.12783/dtcse/aita2017/15995