Knowledge Discovery in Databases (KDD) is the process of identifying valid, novel, useful, and understandable patterns in large data sets. Data mining is the core of the KDD process, involving algorithms that explore the data, develop models, and discover significant patterns. Data mining has emerged as a key tool for a wide variety of applications, ranging from national security to market analysis. Many of these applications involve mining data that include private and sensitive information about the subjects. One way to enable effective data mining while preserving privacy is to anonymize the data set that includes private information about subjects before it is released for data mining. One way to anonymize a data set is to manipulate its content so that the records adhere to k-anonymity. Two common manipulation techniques used to achieve k-anonymity of a data set are generalization and suppression. Generalization refers to replacing a value with a less specific but semantically consistent value, while suppression refers to not releasing a value at all. Generalization is more commonly applied in this domain, since suppression may dramatically reduce the quality of the data mining results if not properly used.
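To make the two manipulation techniques concrete, the following minimal Python sketch applies generalization and then suppression to a toy table. It assumes the usual reading of k-anonymity, in which every combination of quasi-identifier values must be shared by at least k records. The records, the choice of age and ZIP code as quasi-identifiers, the generalization rules (ten-year age bands, three-digit ZIP prefixes), and all function names are hypothetical illustrations, not the method proposed in this paper. Suppression is shown here at the record level, withholding whole records, which is one possible variant.

```python
from collections import Counter

# Hypothetical toy records: (age, zip_code, diagnosis). Age and ZIP code are
# assumed to be the quasi-identifiers; diagnosis is the sensitive value.
RECORDS = [
    (34, "47677", "flu"),
    (36, "47602", "asthma"),
    (38, "47678", "flu"),
    (52, "47905", "diabetes"),
    (57, "47909", "flu"),
    (61, "47906", "asthma"),
    (29, "10001", "flu"),
]


def generalize(record):
    """Replace each quasi-identifier with a less specific but consistent value."""
    age, zip_code, diagnosis = record
    decade = (age // 10) * 10
    # Assumed rules: ten-year age band, ZIP code truncated to three digits.
    return (f"{decade}-{decade + 9}", zip_code[:3] + "**", diagnosis)


def quasi_identifier(record):
    """Return the quasi-identifier part of a record (first two fields)."""
    return record[:2]


def is_k_anonymous(records, k):
    """True if every quasi-identifier combination occurs in at least k records."""
    counts = Counter(quasi_identifier(r) for r in records)
    return all(count >= k for count in counts.values())


def suppress_small_groups(records, k):
    """Record-level suppression: withhold records whose group has fewer than k members."""
    counts = Counter(quasi_identifier(r) for r in records)
    return [r for r in records if counts[quasi_identifier(r)] >= k]


generalized = [generalize(r) for r in RECORDS]
released = suppress_small_groups(generalized, k=2)

print("raw data 2-anonymous:        ", is_k_anonymous(RECORDS, 2))
print("generalized data 2-anonymous:", is_k_anonymous(generalized, 2))
print("after suppression 2-anonymous:", is_k_anonymous(released, 2))
print("records released:", len(released), "of", len(RECORDS))
```

In this toy table, generalization alone leaves two records in groups of size one, so they are withheld. This illustrates why suppression has to be used sparingly: every withheld record is lost to the subsequent data mining step.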
IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 3, March 2010
Privacy-preserving data mining, k-anonymity, deidentified data, decision trees.