2017 Speakers 2016 Speakers

Chris D’Agostino | Capital One

Chris D’Agostino is a VP of Technology and helps lead Capital One’s big data initiative within its credit card business. Chris has over 25 years of software development experience and has been a hands-on contributor to enterprise applications in the areas of data processing, mobile and responsive web. He has served as an adjunct professor at George Mason University and the University of Virginia (UVa) and was the founder and CEO of a software company that exited successfully after ten years of steady growth.

Chris is the host and sponsor of the event, and will deliver a keynote.

Chris’s keynote address

https://www.linkedin.com/in/chrisd13/

Ted Malaska | Blizzard

Ted Malaska serves as a MetiStream Advisor. He brings deep industry expertise in big data, open source technologies such as Hadoop and Apache Spark, and advanced analytics. Ted is currently the Technical Group Architect at Blizzard Entertainment. He is the co-author of O’Reilly Media’s Hadoop Application Architectures and a recognized industry leader in designing complex data management and analytics solutions in healthcare and data-intensive industries. Prior to joining Blizzard, Ted was a Principal Architect at Cloudera Inc., a leader in the Hadoop and big data arena.

https://www.linkedin.com/in/tedmalaska/ 

ABSTRACT: In search of value. Today the pace and expectations of technology and data are moving faster than ever before, which makes it even more important to stay focused on our goals and select the right tools and direction for our organizations to reach them. As the Group Technical Architect at Blizzard, Ted Malaska lives at the center of cloud, deployments, monitoring, and data. Over the last 25 years, Blizzard has become known for some of the most iconic games ever made. Join us for a talk on how Blizzard is leading not only as a game developer but also as a technology company, pushing the bounds of cloud and data with the goal of increasing the value it can provide its players.

Ted’s presentation: In Search of Value

Charmee Patel | Syntasa

Charmee Patel is a Data Architect at Syntasa, where she is building a next-generation advanced analytics application that can handle many types of digital interactions and allows its users to gain behavioral insights quickly. Prior to Syntasa, she worked on distributed data processing at Telarix, mastering the concept of pushing computation to the data instead of moving data to compute nodes well before Hadoop revolutionized large-scale distributed storage and computing on commodity hardware.

https://www.linkedin.com/in/charmee-patel-4a851011/

ABSTRACT: Agile Machine Learning for Customer Behavioral Analytics. Even with technological advances like Apache Spark, most companies still take months to build scalable machine learning models and put them in production. In many cases, model predictions are not evaluated after they are put in production, and the interval between model retrainings may be six months to a year. Why? Is there a better way? Yes. “There is an App for that!”

In this session, we focus on a class of machine learning problems that aim to understand the behavior of individuals or entities based on all their online and offline interactions across multiple channels. We present an agile application that enables users to quickly build and experiment with machine learning models in days instead of months, and put successful models in production with a few clicks. We show how complex data preparation, machine learning model building, evaluation, model deployment, and production management can be abstracted using behavioral schemas, training data templates, and model templates. We discuss how Apache Spark provides the ideal framework for building scalable components that convert these abstractions to code that can be executed in different execution environments.

Charmee’s presentation: Zero to Production-Ready in Days, Not Months

Craig Lovell | ArctosAI

Craig Lovell is a software architect and computer scientist with over 20 years of experience designing and building mission-critical information systems in the financial, biotech, and intelligence industries. He specializes in Artificial Intelligence, Algorithms, and Software Architecture. Craig holds a BS and MS in Logic and Computation from Carnegie Mellon University. He currently works at ArctosAI, a company he co-founded to build Natural Language Processing and Text Mining applications.

https://www.linkedin.com/in/craiglovell/

ABSTRACT: Natural Language Processing is the branch of Artificial Intelligence that deals with understanding and computing on unstructured text as an input. NLP algorithms are in use today to accept application input from speech commands, translate between languages, gauge the sentiment of movie reviews and social media content, and automatically summarize and categorize documents. In this session we will explore the basic concepts and practical applications of NLP while highlighting areas where these algorithms can be implemented using Apache Spark to achieve scale.

Craig’s presentation: Natural-Language Processing

Pat Patterson | StreamSets

Pat Patterson has been working with Internet technologies since 1997, building software and communities at Sun Microsystems, Huawei, Salesforce and StreamSets. At Sun, Pat was the community lead for the OpenSSO open source project, while at Huawei he developed cloud storage infrastructure software. As a developer evangelist at Salesforce, Pat focused on identity, integration and the Internet of Things. Now community champion at StreamSets, Pat is responsible for the care and feeding of the StreamSets open source community.

https://www.linkedin.com/in/metadaddy/

ABSTRACT: StreamSets Data Collector (SDC) is designed to make data ingest and processing easy. SDC integrates at several levels with Apache Spark to make data analysis straightforward, and works with Databricks Cloud to trigger jobs based on incoming data. In this session, you’ll learn how a large retail player with thousands of outlets is using StreamSets to power Spark jobs on Databricks Cloud, combining real-time foot traffic data with historical behavioral and transaction data for analytic insights that improve revenue per square foot.

Pat’s presentation: Analytic Insights in Retail Using Apache Spark and StreamSets

Patrick Hall | H2O

Patrick Hall is a senior data scientist and product engineer at H2O.ai. Patrick works with H2O.ai customers to derive substantive business value from machine learning technologies, and his internal product work at H2O.ai focuses on model interpretability. Patrick is also currently an adjunct professor in the Department of Decision Sciences at George Washington University, where he teaches graduate classes in data mining and machine learning.

Prior to joining H2O.ai, Patrick held global customer-facing roles and R&D roles at SAS Institute. He holds multiple patents in automated market segmentation using clustering and deep neural networks. Patrick is the 11th person worldwide to become a Cloudera-certified data scientist. He studied computational chemistry at the University of Illinois before graduating from the Institute for Advanced Analytics at North Carolina State University.

https://www.linkedin.com/in/jpatrickhall/

ABSTRACT: Machine Learning with GLM and GBM. Generalized linear models (GLMs) and gradient boosting machines (GBMs) are two of the most widely used supervised learning approaches in all of commercial data science. GLMs have been the go-to predictive and inferential modeling tool for decades, but important mathematical and computational advances have been made in training GLMs in recent years. This talk will contrast H2O’s implementation of penalized GLM techniques with ordinary least squares and give specific hints for building regularized and accurate GLMs for both predictive and inferential purposes. As more organizations begin experimenting with and embracing algorithms from the machine learning tradition, GBMs have come to prominence due to their predictive accuracy, the ability to train on real-world data, and resistance to overfitting training data. This talk will give some background on the GBM approach, some insight into the H2O implementation, and some tips for tuning and interpreting GBMs in H2O.

Patrick’s presentation: Machine Learning With Gradient Boosting Machines

Richard Garris | Databricks

Richard Garris is a Principal Solutions Architect at Databricks focused on helping clients with their advanced analytics initiatives using Apache Spark and MLlib. He has spent 13 years working with enterprises in data management and analytics. Richard earned his undergraduate degree at The Ohio State University and a master’s in Software Management from CMU. His previous work experience includes Skytree, Google and PwC.

https://www.linkedin.com/in/rlgarris/

ABSTRACT: Apache Spark has rapidly become a key tool for data scientists to explore, understand and transform massive datasets and to build and train advanced machine learning models. The question then becomes, how do you deploy these ML models to a production environment? How do you embed what you’ve learned into customer-facing data applications? In this talk I will discuss best practices on how data scientists productionize machine learning models, do a deep dive with actual case studies, and show live tutorials of a few example architectures and code in Python, Scala, Java and SQL.

Richard’s presentation: ‘Productionizing’ ML Models

Silvio Fiorito | Databricks

Silvio is a Resident Solutions Architect with Databricks. He joined the company in May 2016 but has been using Spark since its early days back in v0.6. He’s delivered multiple Spark training courses and spoken at several meetups in the DC area. He’s worked with customers in the financial industry, digital marketing, and cyber security, all using Apache Spark. In addition to Spark development, Silvio also has a background in application security and forensics.

https://www.linkedin.com/in/silviofiorito

ABSTRACT: Apache Spark can truly accelerate your analytics, whether you’re doing ETL, Machine Learning, or Data Warehousing. However, to really make the most of Spark, it pays to understand best practices for data storage, file formats, and query optimization. This talk will cover best practices I’ve applied for several different customers to help them improve their Spark applications, as well as how to identify which patterns make sense for your use case.

Silvio’s presentation: Best Practices and Patterns with Apache Spark

Zach Hanif | Capital One

Zachary Hanif leads the security machine learning team at Capital One, where he currently works to create powerful analytics within batch and real-time data processing engines through applied statistics and rapid correlation. In addition to his individual contributions, Zachary is currently working to establish the Center for Machine Learning within Capital One. His research interests revolve around applications of machine learning and graph mining within the realm of massive security data and the automation of model validation and governance.

https://www.linkedin.com/in/zachary-hanif-942bab49/

Zach’s presentation: A Sufficiently Cheap Lunch: Model Optimizations for Humans