Hinrich Schütze, LMU Munich
Multilinguality: Boon or Bane for Representation Learning?
Information-rich representations of text often decrease sample complexity when a natural language processing (NLP) system is trained on a task. One effective way of producing such representations is the traditional NLP pipeline: tokenization, tagging, parsing, etc. An alternative is embeddings, which represent text in a high-dimensional real-valued space that is smooth and thereby supports generalization. Most commonly, words are represented as embeddings, but more recently contextualized embeddings like ELMo have been proposed. I will address two challenges for embeddings in this talk. First, are induced representational spaces domain-specific, or can we learn a “universal” space that is a faithful text representation across text types and languages? I will report on experiments on a parallel corpus of 1200 languages that suggest the latter is true. Second, ambiguity is a problem for embeddings because it can conflate what should be different points in the space, and ambiguities multiply in multilingual embedding spaces. I will present experimental results suggesting that most problems with ambiguity stem from our intuitions about low-dimensional spaces and can be avoided in high-dimensional spaces if representations are properly designed.
Hinrich Schütze is professor of computational linguistics and director of the Center for Information and Language Processing at LMU Munich in Germany. Before moving to Munich in 2013, he taught at the University of Stuttgart. He received his PhD in Computational Linguistics from Stanford University in 1995 and worked on natural language processing and information retrieval technology at Xerox PARC, at several Silicon Valley startups, and at Google from 1995 to 2004 and in 2008/09. He is a coauthor of Foundations of Statistical Natural Language Processing (with Chris Manning) and Introduction to Information Retrieval (with Chris Manning and Prabhakar Raghavan).
Rushed Kanawati, Université Paris 13
Community Detection in Multiplex Networks: Algorithms and Applications
Research in modeling, analyzing, and mining large-scale networks has attracted increasing effort in recent years. Much of the work in network modeling and mining focuses on homogeneous static networks (i.e., a single snapshot of a network). However, in real-world settings, networks are often dynamic and heterogeneous, and both nodes and links can be described by a set of attributes. The concept of a multiplex network has recently been proposed to ease the modeling of real-world networks. A multiplex network is often represented as a multilayer network composed of a set of nodes linked to each other by different types of relations. In this talk, we first introduce the concept of multiplex networks and then focus on the active research problem of community detection in multiplex networks. We review recent approaches to this problem and present real-world applications of these approaches, mainly in the area of recommender systems.
Gemma Boleda, Pompeu Fabra University
At the Crossroads between Discrete and Continuous Aspects of Language
We use language to talk about the world. Linguistic reference is a unique phenomenon in that it needs to resolve a tension between generalization (words like “cat” need to be applicable to very different entities) and individuation (when we refer to a particular cat, we don’t want to mix it up with other cats). Traditional approaches to reference in Linguistics, Cognitive Science, and Artificial Intelligence have been biased towards one of the two aspects: symbolic approaches are geared towards individuation and struggle with generalization, while the converse holds for continuous, distributed approaches. I will report on our ongoing research modeling linguistic reference in distributed terms, accounting for both generalization and individuation, in two complementary lines of research: (1) modeling concepts and entities, and (2) tracking referents in dialogue.
Gemma Boleda is the head of the Computational Linguistics and Linguistic Theory (COLT) research group at Pompeu Fabra University in Barcelona, Spain, which she joined as a tenure-track researcher in 2017. Previously, she held postdoctoral positions in Spain, the USA (UT Austin), and Italy (CIMeC, U. Trento). In her research, currently funded by an ERC Starting Grant, Dr. Boleda uses computational methods to better understand the semantics of natural languages.
Eyke Hüllermeier, Paderborn University
Toward On-the-Fly Machine Learning
The talk starts with a brief historical outline of the evolution of intelligent systems design and the corresponding AI paradigms, elaborating specifically on the increasingly important role of learning from data. Motivated by today's widespread use of AI and the ever-growing quest for automation, the talk then sketches so-called On-the-Fly (OTF) Computing as a novel computing paradigm. OTF computing aims to provide individually configured software services in a market environment that comprises different types of agents, including service providers and end-users. We envision On-the-Fly Machine Learning (OTF-ML) as an instantiation of the OTF computing paradigm for the case of machine learning, that is, the on-the-fly selection, configuration, provision, and execution of machine learning and data analytics functionality as requested by an end-user. As such, OTF-ML can be seen as an extension of the idea of automated machine learning (AutoML). First attempts at addressing the challenges of OTF-ML are presented, including ML-Plan, a new AutoML tool based on hierarchical planning.
Eyke Hüllermeier is a full professor in the Department of Computer Science at Paderborn University, Germany, where he heads the Intelligent Systems Group. He graduated in mathematics and business computing, received his PhD in computer science from the University of Paderborn in 1997, and obtained a Habilitation degree in 2002. Prior to returning to Paderborn in 2014, he spent two years as a Marie Curie fellow at the Institut de Recherche en Informatique de Toulouse (IRIT) in France (1998-2000) and held professorships at the Universities of Marburg (2002-04), Dortmund (2004), Magdeburg (2005-06), and again Marburg (2007-14).
His research interests are centered around methods and theoretical foundations of artificial intelligence, with a specific focus on machine learning and reasoning under uncertainty. He has published more than 300 articles on these topics in top-tier journals and major international conferences, and several of his contributions have been recognized with scientific awards. Professor Hüllermeier is Co-Editor-in-Chief of Fuzzy Sets and Systems, one of the leading journals in the field of Computational Intelligence, and serves on the editorial board of several other journals, including Machine Learning, the International Journal of Approximate Reasoning, and the IEEE Transactions on Fuzzy Systems. He is a coordinator of the EUSFLAT working group on Machine Learning and Data Mining and head of the IEEE CIS Task Force on Machine Learning.
Lukas Vermeer, Booking.com
Data Science vs. Data Alchemy
The “Big Data” and “Data Science” rhetoric of recent years seems to focus mostly on collecting, storing and analysing existing data, of which many seem to think they already have “too much”. However, the greatest discoveries in both science and business rarely come from analysing things that are already there. True innovation starts with asking Big Questions. Only then does it become apparent which data is needed to find the answers we seek. In this session, we relive the true story of an epic voyage in search of data: a quest for knowledge that takes us around the globe and into the solar system. Along the way, we attempt to transmute lead into gold, use machine learning to optimise email marketing campaigns, experiment with sauerkraut, investigate a novel “Data Scientific” method for sentiment analysis, and discover a new continent. This ancient adventure brings new perspectives on the Big Data and Data Science challenges we face today. Come and see how learning from the past can help you solve the problems of the future.
Lukas combines industry experience in online experimentation and Data Science with an academic background in computing science and machine learning.
Highly motivated, quick-witted, eager to learn, coach and teach, and able to think outside of any given box, Lukas has excellent analytical skills, communicative abilities and technical dexterity.
He can unravel the problem, explain the answer and build the solution.
Currently employed by the world’s leading accommodation website, Lukas is responsible for the internal tooling and training that helps product development improve the customer experience in measurable steps through thousands of experiments.