An Anti-Spam Filter Based on One-Class IB Method in Small Training Sets
        
We present an approach to email filtering based on the one-class Information Bottleneck (IB) method in small training sets. When the themes of emails change continually, the available training set that is highly relevant to the current theme will be small. We therefore show how the learning algorithm can be estimated and how spam can be filtered using small training sets. First, to preserve classification accuracy and avoid over-fitting while substantially reducing the training set size, we formulate learning as finding a one-class centroid averaged only over highly positive emails. Second, we design a simple binary classification model that filters spam by comparing the similarity between emails and centroids. Experimental results show that, in small training sets, our method can significantly improve classification accuracy compared with currently popular methods such as Naive Bayes, AdaBoost, and SVM.
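The two steps in the abstract (build a one-class centroid from highly positive emails, then classify by comparing an email's similarity to the centroids) can be illustrated with a minimal sketch. This is not the authors' implementation; it rests on our own assumptions: emails are bag-of-words distributions with additive smoothing (`alpha`), "highly positive" emails are taken to be the fraction `keep` closest to the current centroid (a hypothetical rule), dissimilarity is measured by KL divergence (the distortion that arises in the IB method), and one centroid is built per class. All function names and parameters are hypothetical.

```python
import math
from collections import Counter


def to_distribution(tokens, vocab, alpha=0.1):
    """Additively smoothed word distribution p(w | email) over a fixed vocabulary."""
    counts = Counter(t for t in tokens if t in vocab)
    total = sum(counts.values()) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def kl(p, q):
    """KL divergence D(p || q) between two distributions over the same vocabulary."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)


def average(dists):
    """Component-wise mean of a list of distributions (still sums to 1)."""
    return {w: sum(d[w] for d in dists) / len(dists) for w in dists[0]}


def one_class_centroid(dists, keep=0.8, max_iters=20):
    """Hypothetical one-class step: repeatedly re-average only the `keep`
    fraction of emails closest (in KL) to the current centroid, so that
    outlying emails do not pull the centroid away from the class core."""
    centroid = average(dists)
    k = max(1, int(len(dists) * keep))
    chosen = None
    for _ in range(max_iters):
        ranked = sorted(range(len(dists)), key=lambda i: kl(dists[i], centroid))
        new_chosen = sorted(ranked[:k])
        if new_chosen == chosen:
            break  # the set of "highly positive" emails has stabilized
        chosen = new_chosen
        centroid = average([dists[i] for i in chosen])
    return centroid


def classify(tokens, vocab, spam_c, ham_c):
    """Label an email by whichever class centroid it is closer to in KL."""
    d = to_distribution(tokens, vocab)
    return "spam" if kl(d, spam_c) < kl(d, ham_c) else "ham"


# Toy usage with made-up emails (illustrative only, not the paper's data).
spam = [["win", "cash", "now"], ["cheap", "cash", "offer"], ["win", "prize", "now"]]
ham = [["meeting", "at", "noon"], ["project", "report", "due"], ["lunch", "at", "noon"]]
vocab = {w for msg in spam + ham for w in msg}
spam_c = one_class_centroid([to_distribution(m, vocab) for m in spam])
ham_c = one_class_centroid([to_distribution(m, vocab) for m in ham])
print(classify(["win", "cash", "prize"], vocab, spam_c, ham_c))  # spam
print(classify(["report", "at", "noon"], vocab, spam_c, ham_c))  # ham
```

With `keep=0.8` and three training emails per class, only two emails per class enter the centroid, so the outlying email in each class (for example `["cheap", "cash", "offer"]`) is left out; this mimics, under the stated assumptions, the idea of averaging only over highly positive emails.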
The International Arab Journal of Information Technology, Vol. 13, No. 6, November 2016

Chen Yang received his BE and ME degrees from the School of Information Engineering, Zhengzhou University. Currently, he is a PhD candidate in the School of Information, Renmin University of China, China, and is also an assistant in the School of Software Engineering at Zhengzhou University of Light Industry, China. His research interests include machine learning and Big Data systems.

Shaofeng Zhao received his BE and ME degrees from the School of Information Engineering, Zhengzhou University. Currently, he is an assistant in the library at Henan University of Economics and Law, China. His research interests include cloud computing and cloud storage.

Dan Zhang received her BE degree from the School of Computer, Henan University of Economics and Law, and her ME degree from the School of Information Engineering, Zhengzhou University. Currently, she is an engineer in the Geophysical Exploration Center of China Earthquake Administration. Her research interests include complex systems and machine learning.

Junxia Ma received her ME degree from Zhengzhou University. Currently, she is a lecturer in the School of Software Engineering at Zhengzhou University of Light Industry, China. Her research interests include artificial intelligence, data mining
