A Weight Adjustment Technique with Feature Weight Function Named TEF-WA in Text Categorization

Original title: 文本分类中结合评估函数的TEF-WA权值调整技术
Abstract: Text categorization (TC) is an important research direction in text mining. It aims to assign one or more predefined category labels to a text document, and it provides efficient methods for document management and information search. A major problem in automatic text categorization is how to select, from the original high-dimensional feature space, the features that are effective for classification, so that the categorization algorithm works efficiently and precision improves. This paper analyzes and compares feature selection and weight adjustment techniques and points out their influence on classification precision and efficiency. It then introduces TEF-WA (term evaluation function-weight adjustment), a weight adjustment technique built on a new weight function that incorporates a feature evaluation function and adjusts each feature term's contribution to the classifier according to its ability to discriminate between categories. To evaluate TEF-WA, experiments were carried out on training document collections of several different scales, using various term evaluation functions: document frequency, information gain, expected cross entropy, CHI, the weight of evidence for text, and a term frequency formula or document frequency formula. The experimental results show that TEF-WA is effective both in improving classification precision and in reducing the time complexity of the algorithm.
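The abstract does not state the TEF-WA weight function itself, so the following is only an illustrative sketch of the general idea it describes: folding a term evaluation score into the term weight. It assumes a tf-idf base weight multiplied by a CHI score and then L2-normalized. The function names (`document_frequency`, `chi_square`, `adjusted_weights`), the choice of CHI, the idf variant, and the multiplicative combination are all assumptions made for illustration, not the formula from the paper.

```python
import math
from collections import Counter


def document_frequency(docs):
    """Count, for each term, the number of documents that contain it."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    return df


def chi_square(docs, labels, term, category):
    """CHI statistic of a term for one category, from a 2x2 contingency table."""
    n = len(docs)
    pairs = list(zip(docs, labels))
    a = sum(1 for d, y in pairs if term in d and y == category)      # term, in category
    b = sum(1 for d, y in pairs if term in d and y != category)      # term, other category
    c = sum(1 for d, y in pairs if term not in d and y == category)  # no term, in category
    d_ = n - a - b - c                                               # no term, other category
    denom = (a + c) * (b + d_) * (a + b) * (c + d_)
    return 0.0 if denom == 0 else n * (a * d_ - c * b) ** 2 / denom


def adjusted_weights(docs, labels, doc_index):
    """Hypothetical adjusted weight: tf * idf * max-over-categories CHI, L2-normalized.

    This combination is an assumption for illustration only.
    """
    n = len(docs)
    df = document_frequency(docs)
    tf = Counter(docs[doc_index])
    categories = set(labels)
    raw = {}
    for term, freq in tf.items():
        idf = math.log(n / df[term]) + 1.0  # one common idf variant (assumed)
        score = max(chi_square(docs, labels, term, cat) for cat in categories)
        raw[term] = freq * idf * score
    norm = math.sqrt(sum(w * w for w in raw.values()))
    return {t: (w / norm if norm else 0.0) for t, w in raw.items()}


# Toy corpus: "the" occurs in every document and so carries no category signal.
docs = [
    ["ball", "game", "the"],
    ["game", "team", "the"],
    ["stock", "market", "the"],
    ["market", "bank", "the"],
]
labels = ["sports", "sports", "finance", "finance"]
weights = adjusted_weights(docs, labels, 0)
```

In this toy corpus, the term `the` receives a weight of zero because its evaluation score is zero, while `game`, which appears only in the sports documents, is weighted above `ball`. Under this reading, weak terms stay in the vocabulary but contribute little to the classifier, rather than being discarded by a hard feature selection threshold.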
Authors: Tang Huanling (唐焕玲), Sun Jiantao (孙建涛), and Lu Yuchang (陆玉昌). Affiliations: Department of Computer and Information Engineering, Yantai Vocational Institute, Yantai 264025; Department of Computer Science and Technology, Tsinghua University, Beijing 100084
Source: Journal of Computer Research and Development (《计算机研究与发展》), indexed in EI, CSCD, and the Peking University Core Journals list; 2005, No. 1, pp. 47-53 (7 pages)
Funding: National Natural Science Foundation of China; National Key Basic Research Program of China (973 Program)
Keywords: vector space model (VSM); feature selection; weight adjustment techniques; feature evaluation function; text categorization

