Open Analytics Day - Wednesday (26.11)


Gaël Varoquaux

Researcher, INRIA

Gaël Varoquaux is an INRIA faculty researcher working on computational science for brain imaging in the Neurospin brain research institute (Paris, France). His research focuses on modeling and mining brain activity in relation to cognition. Years before the NSA, he was hoping to make bleeding-edge data processing available across new fields, and he has been working on a mastermind plan building easy-to-use open-source software in Python. He is a core developer of scikit-learn, joblib, and Mayavi, a nominated member of the PSF, and often teaches scientific computing with Python using scipy-lectures.github.com. His infrequent thoughts can be found at gael-varoquaux.info.

Simple big data, in Python

26 November, 2014 Wednesday

Big data: everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, so everyone claims they are doing it. Whether you have terabytes of data or not, a sophisticated analysis will quickly push the limits of your hardware and your engineering team.
Indeed, data, big or small, doesn't speak for itself. Machine learning can build predictive models with optimal decision logic from data. But machine learning can also be a tricky and expensive endeavor.
In this talk, I will discuss how the combination of a powerful Python machine learning stack, with scikit-learn and joblib at its core, and simple data-processing patterns can be used for leading-edge data analysis with efficiency in mind. The efficiency that matters is not only about using hardware well, but also about human factors: quick, agile, short cycles to understand the data best.
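As a small illustration of the stack the abstract refers to (not material from the talk itself; the dataset and model choices below are illustrative assumptions), a scikit-learn pipeline with joblib-backed caching might look like this:

```python
# Sketch of the scikit-learn + joblib pattern; dataset and model are
# illustrative choices, not taken from the talk.
from joblib import Memory
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# joblib's Memory caches fitted transformers on disk, so repeated
# exploratory runs skip redundant preprocessing.
memory = Memory("./joblib_cache", verbose=0)

model = Pipeline(
    [("scale", StandardScaler()), ("clf", LogisticRegression(max_iter=1000))],
    memory=memory,
)

scores = cross_val_score(model, X, y, cv=5)
print(round(scores.mean(), 2))
```

The caching is the point here: it supports exactly the quick, short-cycle iteration on data that the abstract emphasizes.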


Matt Dowle

Developer of the data.table R package

Matt's background is in investment banking and hedge funds in London: quantitative equity research and trading at both low and high frequency. He began as a programmer with Lehman Brothers in 1996, then moved to become an analyst with Salomon Brothers (later Citigroup), where he was fortunate to learn S-PLUS from Patrick Burns (author of S Poetry and The R Inferno).

He switched to R in 2002, after a comparison of predicted vs realised tracking errors for random portfolios that took 1 hour in S-PLUS was reduced to 1 minute in R with minimal code changes. He moved to Concordia Advisors in 2004 and Winton Capital in 2008, both hedge funds. He is now taking a career break and working on his R package data.table together with collaborators. In 2014 he is giving tutorials at R/Finance in Chicago and useR! in Los Angeles.

My Journey to R

26 November, 2014 Wednesday

In this presentation Matt speaks about how he discovered R and why he loves it. He shares his experiences of using R since 2002 and also tells the story of data.table, the package he created to make manipulating data in R comfortable.


Martin Alvarez

Advisory Board Coordinator, ePSI Platform / CTIC

Martin Alvarez-Espinar, Coordinator of the Advisory Board at the ePSI Platform and W3C Office Manager in Spain, has wide knowledge of Web standards and eGovernment. He works at CTIC as an Open Data consultant, designing Open Data and PSI re-use strategies for public administrations.

Open Data reuse in the European public sector

26 November, 2014 Wednesday

Open Data is more than a trend in Europe. It brings new business opportunities, increases trust in governments, and enables innovation. The recent amendment of Directive 2003/98/EC on the re-use of public sector information includes important changes, such as recognizing Open Data as a genuine right for citizens. July 2015, the deadline for transposing the Directive into national law, will be an important milestone for innovation, transparency and democracy in Europe.


Tomaž Kaštrun

BI-CRM Analyst & Developer, Spar ICS GmbH, Spar Slovenija d.o.o.

Tomaz Kastrun is a BI developer for Spar ICS GmbH Austria and for Spar Slovenija d.o.o. He focuses on data mining and programming. Currently he works mainly with SAP and SAS tools for BI, and he is an active member of several SQL Server communities, with more than 15 years of experience in statistics and databases.

Twitter text mining with segmentation in R

26 November, 2014 Wednesday

This session will focus on a use case of text mining tweets. After establishing a certified and authenticated connection between R and Twitter, the power of R's text-mining libraries can be used to extract terms, compare documents, and create statistics. In this session we will also use R to segment tweets, in order to see what groups of followers an account has, and what the main topics and most common words are for each group.


Romain François

co-author of the dplyr R package

Romain has been operating as a freelance R consultant since 2008. He is interested in all things R, but has focused a lot of energy in making the connection between R and C++ as a way to leverage performance. Lately, Romain has written the C++ internals of Hadley Wickham’s popular dplyr package for R.

Introduction to the dplyr R package

26 November, 2014 Wednesday

dplyr is an increasingly popular R package from Hadley Wickham that presents itself as a grammar of data manipulation. The package works with simple verbs, functions that take data frames as their first argument, each with a well-identified role in the data manipulation pipeline: filtering rows, selecting columns, mutating columns, grouping by one or several variables, summarizing and joining. Even though Romain is more focused on the C++ internals of dplyr, this presentation is a quick sightseeing tour of the various dplyr verbs.

Csernai Eszter

Eszter is a data scientist at BalaBit, where she is working on BalaBit's new security analytics product, Blindspotter. Blindspotter uses machine learning algorithms to model users' behaviour on a computer network with the goal of spotting unusual patterns possibly signalling security incidents.
Prior to BalaBit, Eszter worked at Gallup on comparative analyses of social science data from international survey projects, social media and open data, and at Morgan Stanley, where she reviewed and tested stochastic models used for the valuation of interest rate derivatives.
She studied linguistics at ELTE, and holds a master's degree in Quantitative Economics from Corvinus University of Budapest.

Analyzing text data using Python

26 November, 2014 Wednesday

Even though analyzing unstructured text data is rapidly becoming an everyday task for data scientists, it is never a routine task, due to the special challenges that free text presents for automated analysis. Fortunately, Python has an ecosystem of powerful packages which make the natural language processing workflow relatively smooth and manageable.
In this session, we are going to show through examples how the numpy, pandas, pytables, scikit-learn, and nltk libraries can be used to solve some typical natural language processing problems.
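A small, self-contained taste of the kind of workflow the session covers (the toy corpus and labels below are invented for illustration):

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Invented toy corpus: a typical NLP task is classifying short free-text
# documents, here positive vs. negative snippets.
df = pd.DataFrame({
    "text": [
        "what a wonderful, helpful product",
        "truly excellent support, very happy",
        "terrible experience, totally broken",
        "awful support and a broken product",
    ],
    "label": ["pos", "pos", "neg", "neg"],
})

# Turn unstructured text into a structured document-term matrix...
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(df["text"])

# ...which standard machine learning estimators can then consume.
clf = MultinomialNB().fit(X, df["label"])
pred = clf.predict(vectorizer.transform(["excellent and helpful"]))[0]
print(pred)  # -> "pos"
```

Real problems add the steps the abstract hints at: pandas and pytables for wrangling and storage, and nltk for tokenization, tagging and other linguistic preprocessing before vectorization.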


Merész Gergő

Health Economist, Syreon Research Institute

Merész Gergő (MSc) holds a master's degree in health policy with a specialization in health economics, and a bachelor's degree in sociology. He has been working at Syreon since September 2010, focusing mainly on health economic modeling, statistical analysis, and data extraction from international health-care databases.

Coauthor networks: beyond citations

26 November, 2014, Wednesday

The scientific community itself generates a significant amount of data, which is publicly available via literature databases. By collecting these data on publications, coauthor networks can be constructed, yet at larger scale this process involves data mining, cleaning and visualization. The flexibility of R provides a suitable framework for developing a script capable of covering the whole analysis, from data entry to visualization. The R script is presented via a working example of the Hungarian health economist community.
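The script presented in the talk is written in R; as a language-neutral sketch of the core construction step (with invented author names standing in for records pulled from literature databases), the pairing logic can be expressed in a few lines:

```python
from collections import Counter
from itertools import combinations

# Invented publication records; in practice these come from literature
# databases after data mining and cleaning.
publications = [
    ["Kovacs", "Nagy", "Szabo"],
    ["Nagy", "Toth"],
    ["Kovacs", "Nagy"],
]

# Every pair of authors on the same publication becomes an edge of the
# coauthor network; repeated collaborations raise the edge weight.
edges = Counter()
for authors in publications:
    for pair in combinations(sorted(set(authors)), 2):
        edges[pair] += 1

# Degree = number of distinct coauthors, a simple centrality measure.
degree = Counter()
for a, b in edges:
    degree[a] += 1
    degree[b] += 1

print(edges[("Kovacs", "Nagy")])   # -> 2 (they collaborated twice)
print(degree.most_common(1))       # -> [('Nagy', 3)]
```

The visualization step the abstract mentions then amounts to drawing this weighted edge list, which in R would typically be handed to a graph package.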


Nagy Dávid

Associate, i-insight

David Nagy has spent the last two years as a data scientist while finishing his master's degree in Chemical Engineering. He is currently working at i-insight, a data-intensive company engaged in supporting business decisions with insight all over the world. David has broad experience in probability theory and statistics, practice in mathematical and process modelling, and solid knowledge of Big Data technologies (Aster, Hadoop). Despite his young age, he has the necessary understanding of multiple industries, including oil and gas, healthcare and banking. Currently his focus is on strengthening his knowledge with a deep understanding of finance.

Social Network Analysis in R

26 November, 2014, Wednesday

“But SNA is not just a methodology; it is a unique perspective on how society functions. Instead of focusing on individuals and their attributes, or on macroscopic social structures, it centers on relations between individuals, groups, or social institutions.” The aim of the presentation is to give a basic understanding of social networks, their representations and analysis, and of the meanings of the derived metrics. The R programming language and its relevant packages, one of the most widely used open-source data analysis platforms, will be used.

Bo Werth

Statistician, OECD

After studying economics and Chinese in Germany, Bo lived in Brazil before moving to Paris. In 2010 he started working as a statistician, learning to create analytical datasets using SQL and SAS. Using R for linear algebra calculations, he discovered its strength in plotting; reporting systems and interactive applications are starting to change the way we work. Bo gives in-house seminars to inform curious colleagues about the capabilities of open-source software.

Data processing and analysis with R at OECD

26 November, 2014 Wednesday

Collecting and harmonising data from various formats and sources, and the reproducible, flexible generation of graphics for analysis and communication, are our core statistical activities. Recurring tasks suggest establishing reporting systems to better control the speed and quality of outputs. Developments in the R user community provide very suitable tools for rapidly setting up prototype systems with sufficient performance. Once in-house usage standards for open-source tools are defined, duplication of effort can be reduced when researchers document their procedures and make them available using version control and code-sharing platforms.

Gyalogh Kálmán

Director, Scheller & Walker Kft.

Gyalogh Kálmán studied Electrical Engineering at BME and graduated in 1985. In 1991 he finished a postgraduate course for engineers at the University of Economic Sciences. Since 2008 he has been trading forex full time and developing trading strategies.

FOREX algorithmic trading using „R”

26 November, 2014, Wednesday

The talk walks through a complete algorithmic trading workflow:

  • Forex market (short introduction): how the FOREX market works, how you can be part of it, and the reasons for choosing Forex.
  • Why „R”: the reasons behind using „R” for this project, showing the environment and packages applied in development.
  • Trading idea: setting up a hypothetical strategy idea after some analysis.
  • First test of the strategy: demonstrating a trading simulator we wrote ourselves, applied to the strategy detailed above.
  • Optimization (generate_heatmap): as the first simulation shows some hope, we check whether there is an optimal parameter set for the strategy.
  • Large-scale backtesting: using the optimal parameter set derived above on a long dataset.
  • Problems and solutions, backtesting with PBO: overfitting is a great problem when developing automated strategies; we try to solve it using the PBO package.
  • Optimization and forward testing: now convinced that the strategy is worth testing for real-life usage, we show the forward-testing stage.
  • Real-life example (MT4 + „R”): putting the whole thing to work by connecting Metatrader and „R” and showing it working in real time.

Pécsy Gábor

Senior Manager Data Enrichment, Meltwater

Gábor holds a master's degree in Computer Science and Mathematics, and is interested in programming methodology, in formal and natural languages, and in machine learning. Currently he is leading the Data Enrichment Team of Meltwater (http://www.meltwater.com/), a group of researchers and software engineers building an NLP and data enrichment system which can process large volumes of data and extract valuable information for the customers of Meltwater.

Finding Information Outside the Firewall

26 November, 2014, Wednesday

Traditionally BI focused on data owned by the organization, data within the firewall. However, in the last 15 years there’s been an explosion of information outside the firewall. Managing that data can be helpful for recruiting, marketing, sales development, but also for product development, by giving companies better information about what potential customers want.
Classical BI uses technologies, processes, and applications to analyze mostly internal, structured data and business processes. However, data on the Internet comes in various formats: mostly unstructured or semi-structured, often noisy or not completely reliable. The density of information is lower.
This presentation provides an overview of these challenges and how the combination of natural language processing, machine learning and data analytics can be used to overcome them for collecting valuable pieces of information from this vast ocean of data.

Business Analytics Day - Thursday (27.11)


Felienne Hermans

Assistant Professor, Delft University

Felienne is a professor and entrepreneur in the field of spreadsheets. Her PhD thesis, which she defended in early 2013, centers on applying techniques from software engineering, like testing, refactoring and visualization, to spreadsheets. This helps spreadsheet users to better understand and maintain them. In 2010 Felienne founded Infotron, a startup that uses the algorithms developed during the PhD project to analyze spreadsheet quality for large companies. In her spare time, Felienne volunteers as a judge for the First Lego League, a worldwide technology competition for kids.

Spreadsheet archaeology: what can we learn from examining spreadsheets?

27 November, 2014 Thursday

When the Enron energy corporation went bankrupt, university researchers acquired a subset of their emails. Felienne Hermans of Delft University has examined those emails, specifically looking for attached spreadsheets.
Felienne found that emailing spreadsheets was very common within Enron: the set of emails, spanning 15 months, contained over 50,000 attached spreadsheets! In this session Felienne will explain:

  • The most interesting results from the Enron set
  • What conclusions we can draw from it
  • Methods and techniques to battle spreadsheet problems

Benjamin Wiederkehr

Managing Director & Founding Partner, INTERACTIVE THINGS

Benjamin Wiederkehr is an Interaction Designer with a focus on information visualization and interface design. With his work, he explores opportunities to innovate through the combination of design and technology, to simplify complex data in order to raise awareness, and to tell stories with an open intent and meaningful impact. He is an editor of datavisualization.ch.

User-Centered & Data-Driven: A love story between user experience and data visualization

27 November, 2014 Thursday

The growing amounts of data, and the desire to gain insights from them, demand easy-to-use tools for the tasks of exploring, evaluating and communicating information. As a key component of such a "macroscope", interactive, dynamic visualizations can help the user gain an intuitively accessible, memorable and actionable understanding of the information. But if you're now thinking about the horrors of complicated business intelligence applications of the past, or futuristic holographic interfaces à la Iron Man, think again. Today's consumer applications present vast amounts of information that the user needs to handle with ease and confidence.

In the process of creating those kinds of applications we see two different approaches collide: On the one hand, the representation of data requires an intensive examination of the structure and texture of the underlying data set. On the other hand, a user-centered design process must be followed to ensure usefulness and usability for the human using it. Based on case studies from our daily work, I will talk about the pitfalls and highlights of marrying these two principles into a design process that is as robust as possible yet as flexible as necessary.


Giedre Aleknonyte

Analyst, Vodafone

Giedre is a data analysis and visualization enthusiast based in Munich, Germany. Currently an Analyst at Vodafone, she has worked in Billing & Revenue Assurance and Marketing areas in the telecommunication industry in Germany and Lithuania. She has a degree in Computer Science and a natural curiosity for all things data visualization. Giedre loves to spread the joy of data and has spoken at conferences in the US and Europe. She is part of the local geek scene and has presented her data vizzes at Munich’s Nerd Nite.

Introducing Your Business to Smart BI

27 November, 2014 Thursday

How do you get your company to move away from classical, “heavy” BI platforms? How do you motivate people to use more user-oriented BI tools and approach data analysis in a smarter way? What issues are you likely to encounter with business and IT teams, and how can you overcome those hurdles?

Join this session to learn about strategies to encourage the adoption of new BI tools in your organization, which include developing simple solutions to tedious tasks and showing your team that BI software can be more than just an application for answering serious business questions. Yes, data analysis and visualization can be fun!

The talk will feature a case study from Vodafone, where a small team of enthusiastic data-minded people were able to achieve great performance using Tableau Software’s products. You will see a live demo of interactive dashboards used in project management, understand how storytelling with data can help your business, and smile at examples of fun BI.

Jan Willem Tulp

Data Experience Designer, TULP interactive

Jan Willem Tulp (TULP interactive) is a Dutch Data Experience Designer. With his work he helps people understand complex phenomena by creating compelling data visualisations and interfaces. The projects of TULP interactive are custom data visualisations, ranging from non-interactive explanatory data visualisations to highly interactive custom exploratory visual analytics tools. Jan Willem Tulp is considered an expert in the field of data visualization, has been a jury member at Malofiej and Visualizing.org, and speaks regularly at national and international conferences. Works by TULP interactive have appeared in magazines and books, including Scientific American magazine, The Functional Art and Best American Infographics 2013. TULP interactive has won several awards at Malofiej (Spain), and some of his work has been exhibited at Ars Electronica (Austria) and The Art of Networks at the Florida Institute of Technology (USA). Jan Willem works for clients such as Scientific American, Popular Science, Nielsen, Unicef, Schiphol - Amsterdam Airport, ING Bank and Global Collect.

How to lift the veil?

27 November, 2014 Thursday

Visualizations are not created equally, and different situations require different solutions. In this talk Jan Willem will present a practical model for categorizing different types of visualizations and for how design decisions differ in each situation, with extra emphasis on visualizations in a business setting.

Björn Stiel

Founder & CEO, spreadgit.com

Björn Stiel is the Founder and CEO of London-based Spreadgit.com, a version control system for spreadsheets. Prior to starting spreadgit he worked at UBS Investment Bank in London for six years. He started his career as a rapid application and quant developer in the equities division and later moved into commodity index trading. Björn has extensive first-hand experience in building, maintaining and integrating complex spreadsheet applications. At UBS, he was responsible for building the commodity index risk and trading systems and helped the business grow to a multi-billion-dollar book. Björn is a hands-on developer, passionate about anything Excel- and Python-related, and is engaged in a couple of consulting projects. He holds a Master's degree in Business Economics and in Financial Engineering.

Version Control for Spreadsheets

27 November, 2014 Thursday

Version control and continuous integration have become de-facto standards in software engineering. Source control gives software engineers control over changes to their source code, removes the friction of team collaboration and is vitally important for locating and fixing bugs. For developers, it is unthinkable to work without revision control. Yet when it comes to spreadsheets, the best we seem to have come up with so far is file name timestamping and keeping long lists of file versions.

In this talk I will give an overview of why and how to think about version control for spreadsheets, with a strong emphasis on non-developer folks. I will also discuss technical and business lessons learned from building a product and a business in a market that many people consider stagnant.

Mester Tamás

Business Intelligence Analyst/Consultant, Adatlabor.hu

Mester Tomi is the author of and consultant behind the adatlabor.hu professional blog. Prior to this he worked at prezi.com as a Customer BI Analyst. He has been a speaker at several conferences, such as TEDxYouth 2013 (http://bit.ly/tenykonfliktus), Internet Hungary 2014, Kutatók Éjszakája 2014 and Pecha Kucha Nights. Tomi is interested in the exciting aspects of data analysis: human behaviour and the motivation behind certain events. He believes that with the help of data we can navigate more easily in the chaos, make better decisions, and ultimately make our clients happier. His other passion is presentation and its techniques: he is the co-founder and a CC-level speaker of the first Hungarian Toastmasters club. He places great importance on the meeting point of data communication and presentation: it doesn't matter how smart the data analyst is if the decision makers can't understand the results. It is not enough to possess the right data; communicating it within the company is just as important, in order to change the world.

Doesn’t the boss get the data?

27 November, 2014 Thursday

More than once I have seen ingenious research and professional analyses swept under the rug, perfectly set-up tests and their results ignored, and action plans deployed, and failed, that were the exact opposite of survey results. Without exception these were the consequences of wrongly communicated data within the organizations. Communication of data can happen on several levels, and each level has its own typical mistakes: from defining the wrong key messages or key results, through using too simple or too complicated charts, to choosing (or not choosing) the right communication channel (email, presentation, etc.).
I truly believe that there are no such things as dumb decision makers or short-sighted bosses... only poorly communicated data. If we manage to avoid the classic mistakes highlighted in this presentation, successful decisions will be made far more easily and everyone will be happy. I will use entertaining but instructive examples from the world of startups, multinationals, and small and medium-sized companies.

Szekeres Péter

Research Lead, Neticle Technologies

As one of the founders of Neticle Technologies, he works on the development of NLP applications, automatic opinion analysis, and text mining solutions. Besides maintaining one of the most accurate automatic opinion analysis applications for the Hungarian language, he investigates whose work could be supported effectively with automatic text processing.

Simple visualisation of large amount of textual data

27 November, 2014 Thursday

In this presentation the speaker will present the methods and solutions they have developed over the last two years to provide a synthetic visualization of large amounts of textual data (in this case social media content, online news and comments). He will speak about how the linguistic algorithms and visualisation modules gradually came together into a better and better composition. He will also give an overview of the problems they had to face and the feedback received from clients and users, and will present their unique indicators and charts (web opinion index, mention graph, attribute map).

Koren Miklós

Associate Professor, Central European University

Miklos Koren is an associate professor at Central European University and head of its newly starting M.Sc. in Business Analytics program. He is also leading CEU MicroData, a group of researchers, analysts and software developers working with large-scale socioeconomic databases. His research focuses on international trade and inequality, firm performance, and knowledge transfer. His work is published in leading international academic journals. Miklos holds a Ph.D. from Harvard University.

The role for economic theory in big data business analytics

27 November, 2014 Thursday

Predictive analytics, and big data, in particular, offer the promise of better business decisions without relying on untested hypotheses. In this talk I argue that there is a role for economic theory in setting up business analytics projects and implementing subsequent business decisions. I first show some examples where purely data-driven decisions have gone awry. The reason is that agents we are collecting data about respond to changes in incentives, and will change their behavior after a change in business practices. I propose a symbiotic approach to theory and analytics.
Detailed contents:
1. How not to design class composition? External validity of predicted results.
2. How did this song get on my last.fm? A Lucas critique of consumer recommendations.
3. A checklist can save your life. When simple is better.


Daróczi Gergely


Passionate R developer, former assistant professor who taught statistics and data analysis for 5 years, PhD candidate in Sociology, co-author of a quant book, currently working on "Mastering Data Analysis with R", maintainer of half a dozen CRAN packages, and founder of a cloud- and R-based reporting web application and of the Hungarian R User Group.

Data visualization with R at the Big Data Challenge of Telecom Italia

27 November, 2014 Thursday

R is often criticized for not being able to handle large numbers of records and, performance issues aside, for not handling data in a memory-efficient way. This talk will give a quick overview of how I sorted out these problems while preparing for the Big Data Challenge, and how I could process 500+ GB of CSV files to generate a short video on the volume of telecommunication and transportation in Milan. The resulting animation tells the story of human activity, targeted at an audience without any prior statistical or similar knowledge.

Tóth Zoltán

Tech Lead Manager Data Services, Prezi

Prior to joining Prezi, Zoltán worked as a developer and architect for pharmaceutical market research companies. Now, as a Senior Data Engineer, he helps Prezi build and operate a world-class data infrastructure.

The Usability of Charts

27 November, 2014 Thursday

There are a few simple rules for making an understandable chart, and most probably we all know what they are. In this presentation the speaker will sum these up and show some non-trivial caveats that should be avoided in order to make a chart easily understandable by everyone.


Erdőssy Janka

Lead Business Analyst - Analytical Excellence, InfomatiX

An experienced BI Analyst and Analytical Excellence consultant, she holds two bachelor's degrees in Applied Economics and a master's in International Business and Economics.
She spent over two years collaborating on-site with clients from various departments and industries, focusing on the implementation of BI initiatives. She leads data visualization workshops and manages prototypes, as well as conducting trainings for business users to enable them to leverage BI tools.
Her mission is to compress the time required for scaling solutions, and to convey a huge array of knowledge in the fastest-growing areas of BI.

What is Analytical Excellence in 2014? Leading powers of enterprise analytics

27 November, 2014 Thursday

We all face challenges once users start dreaming about new reporting initiatives. How do we let these dreams flow, while ensuring the data will be there once we want to populate these reports?
In her keynote presentation Janka Erdőssy showcases the most hyped trends of data visualization, drawing on her extensive hands-on experience with market leaders in different business sectors. She will give and discuss meaningful insights into the following questions:

  • At what point should advanced analytics be implemented in the pilots?
  • Which factors determine the service we need for maximizing ROI of our BI reporting?
  • Can we create pilots without data?

Fülöp Gábor

BI Consultant – Analytical Excellence, InfomatiX

Gábor Fülöp is an experienced BI Consultant at InfomatiX, responsible for developing analytical training programs as part of the Analytical Excellence services. Before joining InfomatiX he worked for GfK Hungary, managing data acquisition. He holds a BA in Business Studies and an MBA in Finance. Gábor has been working with clients in several industries, such as Professional Services, CPG and Retail, and his wide business experience serves as a great asset for his insightful approach to data visualization and business analytics.

Clarity from Complexity - Role of data visualization in the age of unstructured data

27 November, 2014 Thursday

Today’s data sets are so large and complex that the analytical methods of the past no longer work.
Do you know what makes the real difference between market leaders and market followers? BI technology and analytical culture.
The talk presents real business examples illustrating best practices, from capturing data through analysis to visualization.

The list of speakers is subject to change.