

P-SMOTE: One Oversampling Technique for Class Imbalanced Text Classification

Jingjing Wang, Wen Feng Lu, Han Tong Loh

National University of Singapore, Singapore

Paper No. DETC2011-47313, pp. 1089-1098; 10 pages
doi:10.1115/DETC2011-47313
From:
  • ASME 2011 International Design Engineering Technical Conferences and Computers and Information in Engineering Conference
  • Volume 2: 31st Computers and Information in Engineering Conference, Parts A and B
  • Washington, DC, USA, August 28–31, 2011
  • Conference Sponsors: Design Engineering Division and Computers and Information in Engineering Division
  • ISBN: 978-0-7918-5479-2
  • Copyright © 2011 by ASME

Abstract

The importance of mining patents to support product design has been recognized, because patents are the major source of information for innovation and contain novel ideas that usually cannot be found in published academic papers. A basic task in patent text mining is patent classification. However, automatic patent classification is difficult. One potential cause of this difficulty is an imbalanced dataset, i.e., the positive class of interest is the minority while the uninteresting negative class is the majority. To alleviate the class-imbalance problem and improve the performance of a Support Vector Machine (SVM) classifier, this study proposes P-SMOTE, a new oversampling technique that focuses on the blank spaces along the positive borderline of an SVM. The proposed technique was first investigated on Reuters-21578, a standard text classification dataset, and then applied to a design patent document dataset. An SVM classifier with P-SMOTE achieved better results than an SVM classifier alone.
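The abstract does not spell out P-SMOTE's procedure. The sketch below therefore shows only the standard SMOTE interpolation step that SMOTE-family oversamplers build on: each synthetic positive example is placed at a random point on the line segment between a minority sample and one of its k nearest minority neighbours. It does not implement P-SMOTE's targeting of blank spaces along the SVM's positive borderline. The function name `smote`, its parameters, the use of dense feature lists, and the random choice of base sample (a common variant) are illustrative assumptions.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=None):
    """Hypothetical helper: plain SMOTE interpolation (NOT P-SMOTE).

    minority: list of dense feature vectors (lists of floats) from the positive class.
    n_synthetic: number of synthetic positive vectors to create.
    k: number of nearest minority neighbours to interpolate towards.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)

    # Precompute each sample's k nearest minority neighbours (Euclidean distance).
    neighbours = []
    for i, x in enumerate(minority):
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(x, minority[j]))
        neighbours.append(others[:k])

    synthetic = []
    for _ in range(n_synthetic):
        # Pick a random minority sample and one of its nearest minority neighbours.
        i = rng.randrange(len(minority))
        x = minority[i]
        nn = minority[rng.choice(neighbours[i])]
        gap = rng.random()  # uniform in [0, 1)
        # The new point lies on the segment between x and its neighbour.
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nn)])
    return synthetic


if __name__ == "__main__":
    positives = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    for point in smote(positives, n_synthetic=6, k=2, seed=0):
        print(point)
```

In a text classification setting the vectors would typically be document feature vectors (for example TF-IDF weights), and the original positives plus the synthetic ones would form the rebalanced training set passed to the SVM.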
