KDD 2016 award list explained

Jointly compiled by: Zhang Min, Gao Fei

Introduction: KDD 2016 is the premier interdisciplinary conference bringing together researchers and practitioners in data science, data mining, knowledge discovery, large-scale data analytics and big data. A network-disk download link for the original papers is provided at the end of the article.

2016 SIGKDD Test of Time Award

The award recognizes the authors of outstanding papers from past KDD conferences that have had a significant impact on the data mining research community over the past decade.

Award winners:

Jure Leskovec (Stanford University)

Jon Kleinberg (Cornell University)

Christos Faloutsos (Carnegie Mellon University)

Significance: The paper presents new findings on how real-world graphs and networks grow and evolve over time. These findings fundamentally shaped our understanding of the evolution and growth of real-world networks, and stimulated a wealth of research on online measurement, structural modeling, and network evolution across many fields.

The paper studies a number of evolving real-world networks and identifies two laws of network growth: (1) the Densification Power Law and (2) the Shrinking Diameter principle. The densification power law states that the number of edges grows faster than the number of nodes, following a power law (for example, doubling the number of nodes triples the number of edges), so networks become denser over time. The shrinking-diameter principle states that the network diameter usually shrinks as the number of nodes grows. When they were proposed, both findings ran counter to the conventional view of network evolution, which held that the average degree stays constant over time and that the diameter slowly increases with the number of nodes.
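
As a rough illustration of the densification power law, the sketch below estimates the exponent a in the relation edges ∝ nodes^a by fitting a straight line to log(edges) against log(nodes). The function name and the snapshot numbers are hypothetical, chosen only to match the "double the nodes, triple the edges" example above; they are not data from the paper.

```python
import math

def densification_exponent(snapshots):
    """Least-squares slope of log(edges) against log(nodes).

    snapshots: list of (num_nodes, num_edges) pairs, one per point in time.
    """
    xs = [math.log(n) for n, _ in snapshots]
    ys = [math.log(e) for _, e in snapshots]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

# Hypothetical snapshots: every doubling of the nodes triples the edges.
snapshots = [(1000 * 2 ** k, 5000 * 3 ** k) for k in range(5)]
exponent = densification_exponent(snapshots)
print(round(exponent, 3))  # log(3) / log(2), about 1.585
```

An exponent of 1 would mean a constant average degree; any value above 1 means the graph densifies as it grows.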

At the time, no existing network evolution model could capture these empirical patterns, so the paper also proposes a family of network growth models, including the "Forest Fire" model, which generates graphs that exhibit the densification power law, shrinking diameters, and other basic graph properties such as strong clustering and skewed degree distributions.
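
To give a feel for the "forest fire" idea, here is a heavily simplified sketch: each new node picks a random older node as its "ambassador", then the fire spreads along a random number of that node's links, and the new node links to everything that burned. It only burns forward along existing out-links and uses a single assumed parameter, `forward_prob`; the function name and defaults are illustrative and the paper's full model has more detail than is shown here.

```python
import random

def forest_fire_graph(num_nodes, forward_prob=0.35, seed=0):
    """Simplified, forward-burning-only Forest Fire sketch.

    Returns a dict mapping each node to the set of older nodes it links to.
    """
    rng = random.Random(seed)
    out_links = {0: set()}
    for new_node in range(1, num_nodes):
        ambassador = rng.randrange(new_node)  # uniformly chosen older node
        burned = {ambassador}
        frontier = [ambassador]
        while frontier:
            current = frontier.pop()
            candidates = sorted(v for v in out_links[current] if v not in burned)
            rng.shuffle(candidates)
            # Follow a geometrically distributed number of links:
            # keep spreading while a coin with bias forward_prob comes up heads.
            spread = 0
            while spread < len(candidates) and rng.random() < forward_prob:
                spread += 1
            for v in candidates[:spread]:
                burned.add(v)
                frontier.append(v)
        out_links[new_node] = burned  # link to every burned node
    return out_links

graph = forest_fire_graph(500, forward_prob=0.35, seed=42)
num_edges = sum(len(targets) for targets in graph.values())
print(len(graph), num_edges)
```

Raising `forward_prob` makes fires spread further, so new nodes attach to more of the existing graph and the edge count grows faster relative to the node count.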

Award-winning paper: Graphs over Time: Densification Laws, Shrinking Diameters and Possible Explanations (KDD 2005)

Abstract: How do real graphs evolve over time? What are the "normal" growth patterns in social, technological, and information networks? Many studies have discovered patterns in static graphs, identifying properties in a single snapshot of a large network or in a very small number of snapshots; these include heavy tails for in- and out-degree distributions, communities, small-world phenomena, and others. However, given the lack of information about network evolution over long periods, it has been hard to convert these findings into statements about trends over time. We study a wide range of real graphs and observe some surprising phenomena. First, most of these graphs densify over time, with the number of edges growing super-linearly in the number of nodes. Second, the average distance between nodes often shrinks over time, in contrast to the conventional wisdom that such distance parameters should increase slowly as a function of the number of nodes (like O(log n) or O(log(log n))). Existing graph generation models do not exhibit these types of behavior, even at a qualitative level. We provide a new graph generator, based on a "forest fire" spreading process, that has a simple, intuitive justification, requires very few parameters (like the "flammability" of nodes), and produces graphs exhibiting the full range of properties observed both in prior work and in this study.

2016 SIGKDD Dissertation Award

This award is given to graduate students who have made outstanding contributions in the fields of data science, machine learning and data mining.

Review criteria:

· Relevance of the dissertation to KDD

· Originality of the dissertation's main ideas

· Scientific significance

· Technical depth and soundness of the dissertation (including experimental methodology, theoretical results, etc.)

· Overall presentation and readability of the dissertation (including organization, writing style and exposition, etc.)

Award winner:

Danai Koutra (student) and Christos Faloutsos (advisor) at Carnegie Mellon University

Award-winning dissertation: Exploring and Making Sense of Large Graphs

Abstract: Graphs naturally represent information ranging from links between web pages to connections between neurons in our brains, and they often span billions of nodes. Within this deluge of data, how can we find the most important structures? How can we detect critical events, such as an attack on a computer system or the formation of a disease in the human brain? This dissertation presents (i) scalable, principled algorithms that combine a global and a local understanding of graphs, and (ii) applications in two main areas:

· Single-graph exploration: We show how to summarize the important structures of a graph, and how to perform inference, using a small amount of prior information together with the network structure to efficiently learn about all entities.

· Multiple-graph exploration: We extend the idea of graph summarization to time-evolving graphs for pattern discovery. We also argue that similarity is a sub-problem in many applications involving multiple graphs, and we contribute methods for network alignment and graph similarity.

We have applied our methods to massive datasets, including a Web graph with 6.6 billion edges, a Twitter graph with 1.8 billion edges, and a brain graph with 90 million edges.

Applied Data Science Track

Best Paper: Ranking Relevance in Yahoo Search

Abstract: Search engines play a vital role in our daily lives. Relevance is the core problem of a commercial search engine. It has attracted thousands of researchers from academia and industry and has been studied for decades. Relevance in a modern search engine has gone far beyond text matching and now involves enormous challenges. The semantic gap between queries and URLs is the main barrier to improving base relevance. Clicks provide hints that help improve relevance, but unfortunately, for most tail queries the click information is too sparse, too noisy, or missing entirely. For comprehensive relevance, the recency and location sensitivity of results are also critical. In this paper, we give an overview of the relevance solutions in the Yahoo search engine. We introduce three key techniques for base relevance: ranking functions, semantic matching features, and query rewriting. We also describe solutions for recency-sensitive relevance and location-sensitive relevance. This work builds on 20 years of existing effort on Yahoo search, summarizes the most recent advances, and provides a series of practical relevance solutions. The reported performance is based on Yahoo's commercial search engine, where tens of billions of URLs are indexed and served by the ranking system.

First author introduction

Dawei Yin

Affiliation: Research Director, JD.COM

Research Interests: Machine Learning, Algorithms, Data Mining, Pattern Recognition, etc.

Best Student Paper: Contextual Intent Tracking for Personal Assistants

Abstract: A new paradigm of recommendation is emerging in intelligent personal assistants such as Apple's Siri, Google Now, and Microsoft Cortana, which recommend "the right information at the right time" and proactively help you "get things done". This type of recommendation requires precisely tracking the user's intent at the moment, i.e., what type of information the user currently intends to know (e.g., weather, stock prices) and what tasks they intend to do (e.g., playing music, getting a taxi). The user's intent is closely related to context, which includes both the external environment, such as time and location, and the user's internal activities (as sensed by the personal assistant). The relationship between context and intent exhibits complicated co-occurrence and sequential correlations, and contextual signals are also heterogeneous and sparse, which makes modeling the relationship between context and intent a challenging task. To solve the intent tracking problem, we propose the Kalman filter regularized PARAFAC2 (KP2) nowcasting model, which compactly represents the structure and co-movement of context and intent. The KP2 model exploits collaborative capabilities among users and learns a personalized dynamic system for each user, enabling efficient real-time prediction of user intent. Extensive experiments on real-world datasets from a commercial personal assistant show that the KP2 model significantly outperforms the other methods, and provide inspiring implications for deploying large-scale proactive recommendation systems in personal assistants.

First author introduction

Yu Sun

School: Department of Computing and Information Systems, University of Melbourne

Research direction: contextual behavior mining, reinforcement learning, optimal location discovery, spatial/temporal indexing, algorithm design/analysis.

Other publications:

· A Contextual Collaborative Approach for App Usage Forecasting (UbiComp 2016)

· Reverse Nearest Neighbor Heat Maps: A Tool for Influence Exploration (ICDE 2016, pp. 966-977)

Research Track

Best Paper: FRAUDAR: Bounding Graph Fraud in the Face of Camouflage

Abstract: Given a graph of users and the products they review, or of followers and followees, how can we detect fake reviews or fake follows? Existing fraud detection methods (spectral methods, etc.) try to identify dense subgraphs of nodes that are sparsely connected to the rest of the graph. Fraudsters can evade these methods with "camouflage", by adding reviews or follows aimed at honest targets so that they look "normal". Even worse, some fraudsters use hijacked accounts of honest users, in which case the camouflage is genuinely organic. Our focus is on spotting fraudsters in the presence of camouflage or hijacked accounts. We propose FRAUDAR, an algorithm that (a) is camouflage-resistant, (b) provides upper bounds on the effectiveness of fraudsters, and (c) is effective on real-world data. Experimental results under various attacks show that FRAUDAR outperforms its top competitor in accuracy when detecting both camouflaged and non-camouflaged fraud. In addition, in real-world experiments on a Twitter follower-followee graph with 1.47 billion edges, FRAUDAR successfully detected a subgraph of more than 4,000 accounts, a majority of which had tweets showing that they used follower-buying services.
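
The abstract describes fraud detection as a search for dense subgraphs. As a generic illustration of that idea (not FRAUDAR itself, whose density measure and guarantees are defined in the paper), the sketch below uses a classic greedy "peeling" heuristic: repeatedly remove the lowest-degree node and remember the densest node set seen, where density is edges per node. The function name and the toy review data are hypothetical, and the input is assumed to contain no self-loops.

```python
def densest_subgraph_greedy(edges):
    """Greedy peeling: repeatedly drop a lowest-degree node and remember
    the node set with the highest edges-per-node density seen so far."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    num_edges = sum(len(nbrs) for nbrs in adjacency.values()) // 2
    best_density, best_nodes = 0.0, set()
    while adjacency:
        density = num_edges / len(adjacency)
        if density > best_density:
            best_density, best_nodes = density, set(adjacency)
        node = min(adjacency, key=lambda n: len(adjacency[n]))
        for nbr in adjacency.pop(node):
            adjacency[nbr].discard(node)
            num_edges -= 1
    return best_nodes, best_density

# Toy data: four accounts that all review the same four products,
# plus a few ordinary users with a single review each.
block = [(f"f{i}", f"p{j}") for i in range(4) for j in range(4)]
ordinary = [("h0", "q0"), ("h1", "q0"), ("h2", "p0")]
best_nodes, best_density = densest_subgraph_greedy(block + ordinary)
print(sorted(best_nodes), best_density)
```

On this toy input the heuristic isolates the eight-node block at a density of 2.0 edges per node. A plain density measure like this one is exactly what camouflage tries to fool, which is the gap the paper addresses.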

First author introduction

Bryan Hooi

School: PhD student in Machine Learning and Statistics, Carnegie Mellon University

Research direction: Graph and time series anomaly detection.

Academic Achievements:

· A General Suspiciousness Metric for Dense Blocks in Multimodal Data. IEEE International Conference on Data Mining (ICDM), 2015.

· Matrices, Compression, Learning Curves: Formulation, and the GROUPNTEACH Algorithms. PAKDD 2016.

Best Student Paper: TRIEST: Counting Local and Global Triangles in Fully-Dynamic Streams with Fixed Memory Size

Abstract: We present TRIEST, a suite of one-pass streaming algorithms that compute unbiased, low-variance, high-quality approximations of the global and local (i.e., incident to each vertex) number of triangles in a fully dynamic graph, represented as an adversarial stream of edge insertions and deletions. Our algorithms use reservoir sampling and its variants to make full use of the user-specified memory space at all times. This is in stark contrast to previous approaches, which require hard-to-choose parameters (e.g., a fixed sampling probability) and offer no guarantee on the amount of memory they use. We give a full analysis of the variance of the estimates and show novel concentration bounds for these quantities. Our experimental results on very large graphs show that TRIEST outperforms state-of-the-art approaches in accuracy and exhibits a small update time.
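
The core mechanism the abstract mentions, reservoir sampling under a fixed memory budget, can be sketched for the simplest case: an insertion-only stream and a global triangle count. The class below keeps at most `memory_size` edges, tracks the triangles inside that sample, and scales the count up by the inverse probability that all three edges of a triangle are sampled. The class and method names are hypothetical; deletions, local counts and the paper's variance-reducing variants are not covered, and each edge is assumed to appear once with no self-loops.

```python
import random

class ReservoirTriangleCounter:
    def __init__(self, memory_size, seed=0):
        self.memory_size = memory_size  # fixed edge budget M (assumes M >= 3)
        self.rng = random.Random(seed)
        self.sample = []                # edges currently in the reservoir
        self.neighbors = {}             # adjacency sets of the sampled edges
        self.seen = 0                   # number of edges seen so far (t)
        self.sample_triangles = 0       # triangles fully inside the sample

    def _common(self, u, v):
        return len(self.neighbors.get(u, set()) & self.neighbors.get(v, set()))

    def _add(self, u, v):
        # Every common neighbor of u and v closes one new triangle.
        self.sample_triangles += self._common(u, v)
        self.neighbors.setdefault(u, set()).add(v)
        self.neighbors.setdefault(v, set()).add(u)
        self.sample.append((u, v))

    def _evict_random(self):
        index = self.rng.randrange(len(self.sample))
        self.sample[index], self.sample[-1] = self.sample[-1], self.sample[index]
        u, v = self.sample.pop()
        self.neighbors[u].discard(v)
        self.neighbors[v].discard(u)
        # Triangles that used the evicted edge no longer exist in the sample.
        self.sample_triangles -= self._common(u, v)

    def insert(self, u, v):
        self.seen += 1
        if self.seen <= self.memory_size:
            self._add(u, v)
        elif self.rng.random() < self.memory_size / self.seen:
            self._evict_random()
            self._add(u, v)

    def estimate(self):
        m, t = self.memory_size, self.seen
        scale = max(1.0, t * (t - 1) * (t - 2) / (m * (m - 1) * (m - 2)))
        return scale * self.sample_triangles

# A complete graph on 4 vertices has 4 triangles; with a budget larger
# than the stream, nothing is evicted and the count is exact.
counter = ReservoirTriangleCounter(memory_size=10, seed=1)
for edge in [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]:
    counter.insert(*edge)
print(counter.estimate())  # 4.0
```

The only parameter the user chooses is the memory budget, which mirrors the contrast the abstract draws with methods that need a fixed sampling probability.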

First author introduction

Lorenzo De Stefani

School: PhD student in Computer Science, Brown University

Academic Achievements:

· Reconstructing Hidden Permutations Using the Average-Precision (AP) Correlation Statistic (AAAI 2016: 1526-1532)

2016 SIGKDD Innovation Award

Winner: Philip S. Yu

ACM SIGKDD is pleased to announce that Philip S. Yu is the winner of the 2016 Innovation Award, for his influential research and scientific contributions on the mining, fusion and anonymization of big data.

The ACM SIGKDD Innovation Award is the highest award for technical achievement in the field of knowledge discovery and data mining (KDD). It is given to an individual or a team whose outstanding technical innovations in the KDD field have had a lasting influence on the theory and practice of the field. The contributions of these individuals or teams must have significantly influenced the direction of research and development in the field, or have been transferred to practice in significant and innovative ways, or have enabled the development of commercial systems.

Philip S. Yu's outstanding contributions to KDD and data mining over the years are widely recognized. He was working on big-data-related problems long before the term "big data" became popular. He has published more than 900 papers, with more than 73,000 citations. His contributions touch many areas of knowledge discovery, including frequent pattern mining, clustering, classification, anomaly detection, recommendation, feature extraction, similarity search, spam detection, and data anonymization. His research focuses on mining unconventional types of data, including data streams, graphs/networks, and text. In data stream mining, his key contribution is capturing concept drift in real time. In graph/network mining, his contribution lies in exploiting the structures or linkages of the data, which may be latent in nature or evolve over time; the networks may consist of heterogeneous types of links and nodes among the entity objects. To better exploit the variety of data available in the big data era, his recent work is on multi-source learning, i.e., the fusion of data from multiple sources, including multi-view and multi-modal data, with applications to social networks, e-commerce, healthcare, brain informatics and smart cities.

Dr. Yu has received many prestigious awards, including the 2013 IEEE Computer Society Technical Achievement Award for his pioneering and innovative contributions to the scalable indexing, querying, searching, mining, and anonymization of big data, and the 2003 IEEE ICDM Research Contributions Award for his pioneering contributions to the field of data mining. His papers also won the ICDM 2013 Highest-Impact Paper Award and the EDBT Test of Time Award (2014).

Dr. Yu is a Fellow of the ACM and the IEEE. He is the Editor-in-Chief of ACM Transactions on Knowledge Discovery from Data and was the Editor-in-Chief of IEEE Transactions on Knowledge and Data Engineering (2001-2004).

Dr. Yu received a bachelor's degree in EE from National Taiwan University, master's and doctoral degrees in EE from Stanford University, and an MBA from New York University.

Previous SIGKDD Innovation Award winners are: Rakesh Agrawal, Jerome Friedman, Heikki Mannila, Jiawei Han, Leo Breiman, Ramakrishnan Srikant, Usama M. Fayyad, Raghu Ramakrishnan, Padhraic Smyth, Christos Faloutsos, J. Ross Quinlan, Vipin Kumar, Jon Kleinberg, Pedro Domingos, and Hans-Peter Kriegel.

The SIGKDD Innovation Award includes a plaque and a check for $2,500, and will be presented at the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2016) in San Francisco on Sunday, August 14. After the award ceremony, Dr. Yu will deliver the Innovation Award talk.

2016 SIGKDD Service Award

Winner: Wei Wang

ACM SIGKDD is pleased to announce that Wei Wang is the winner of the 2016 Service Award, in recognition of her outstanding technical contributions to the foundations and practice of data mining and her outstanding service to the data mining community.

The ACM SIGKDD Service Award is the highest service award in the field of knowledge discovery and data mining (KDD). It is given to individuals or groups who have made outstanding professional service contributions to the field of knowledge discovery and data mining.

Wei Wang has a long record of service promoting the data mining field. A world-leading researcher in data mining, she has for many years been a core organizer of the premier data mining conferences, including ACM KDD, ICDM and SIAM Data Mining, and has served on more than 100 program committees. She has also chaired a number of award committees, and is an associate editor of ACM Transactions on Knowledge Discovery from Data, IEEE Transactions on Knowledge and Data Engineering, Knowledge and Information Systems, Data Mining and Knowledge Discovery, and IEEE Transactions on Big Data.

Wei Wang is also a pioneer in applying data mining methods to biomedicine. She has been a core organizer of the ACM Conference on Bioinformatics, Computational Biology, and Biomedical Informatics since the first conference was held. She has also served other premier bioinformatics conferences such as ISMB, RECOMB and BIBM, and is an associate editor of IEEE/ACM Transactions on Computational Biology and Bioinformatics. In recognition of her leadership in these interdisciplinary fields, she was elected to the Board of Directors of the ACM Special Interest Group on Bioinformatics, Computational Biology, and Biomedical Informatics in 2015.

Wei Wang has long devoted herself to recruiting, inspiring and advancing the careers of young researchers, especially women and minorities. To increase opportunities for students, especially female and minority students, to attend premier conferences, she took the lead in using NSF funds to support student travel fellowships. The fellowship funding grew to five times its previous level, giving hundreds of students the opportunity to attend these conferences.

At the ACM BCB conference, she worked hard to raise the profile of women in computing. The conference featured keynote talks by distinguished female scholars, forums promoting research by female faculty and students, and travel fellowships for female students.

Wei Wang holds a master's degree from Binghamton University and a doctorate from the University of California, Los Angeles. She is currently a professor at the University of California, Los Angeles, and serves as co-director of the university's Scalable Analytics Institute and of the NIH BD2K Centers-Coordination Center. Wei Wang has made outstanding contributions to high-dimensional data clustering, sequential pattern mining, and graph mining. She is well known as a pioneer in applying data mining methods to biomedicine and has published more than 150 research papers, two of which won best paper awards. Her contributions to data mining research have been widely recognized: she received the NSF CAREER Award, was named a Microsoft Research New Faculty Fellow, and received the Philip and Ruth Hettleman Prize for Artistic and Scholarly Achievement, the Okawa Foundation Research Award, and the ICDM Outstanding Service Award.

The 14 previous SIGKDD Service Award winners are: Gregory Piatetsky-Shapiro, Ramasamy Uthurusamy, Usama Fayyad, Xindong Wu, the Weka team, Won Kim, Robert Grossman, Sunita Sarawagi, Osmar R. Zaïane, Bharat Rao, Ying Li, Gabor Melli, Ted Senator, and Jian Pei.

The award, which includes a plaque and a check for $2,500, will be presented at the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2016) in San Francisco on Sunday, August 14.

Via: KDD 2016 Awards

Original paper download: Baidu network disk

PS: This article was compiled by Lei Feng Network (search for the "Lei Feng Network" public account). Reproduction without permission is prohibited.