Michael Data


Communication is an integral part of all data analysis. Whether someone is presenting results or general updates to a small or large audience, or you are simply discussing work someone has done, it's important to participate in these discussions. This involves both asking useful questions when you are the audience and presenting your own work in a useful way.

Being an Audience

A quality data processing system, model, or design will serve some data mining goal and either provide understanding or prediction in some way. It can be tricky to figure out how that is happening.

A data project will involve the full range shown below, but may be directly responsible for only some slice of it. The connection from top to bottom (business to math) should be coherent and complete. The technology stack in the middle tends to run in a well-engineered manner if it is configured correctly at all; it is usually at the ends that problems occur. Reviewing this chain is a critical component of achieving the human validation goal of communicating in math.

A common frustration for an audience is not understanding where a project's focus lies in the above stack. Conversely, a presenter who fails to consider the remaining parts of the stack can end up with incorrect results or problems executing the project.

Guiding Questions

Business Application
  • What are you trying to achieve here?
  • How does this affect the bottom line $?
  • How might this require changes to existing business processes or workflows?
Data
  • Where did the data come from?
  • What did you have to do to prepare this data ahead of time?
  • How did you chunk/bin/discretize this to process it?
Model

Handicap Styles

Some presentations rely on particular communication styles to compensate for less rigorous work. These methods are effective at impressing certain audiences regardless of the quality of data analysis work behind them.

The Buzzword Number Cruncher

This person will mention techniques such as Random Forests and how many features they are using, typically while showing poor results that indicate a lack of familiarity with model selection or basic performance metrics. They know their audience has taken the same Intro to ML MOOC they've taken. They try to gain rapport and confidence by mentioning key shared concepts, then establish authority and experience by mentioning the large numbers involved in their work to distinguish it from a typical homework assignment.

The Futurist

This person will mention bleeding-edge machine learning techniques but display a poor grasp of when they are appropriate to use. They are riding the AI Hype Wave and making a lot of promises based on technology with a short track record.

The Technology Master

This person will talk a lot about the technology stack they are using. Look at how many machines are in their cluster! They will use the latest cloud-number-crunching hardware and probably a NoSQL database just to show you how much muscle is under the hood. The problems come when you ask what they plan to use that monster truck for.

The Impolite Questions

Try at home, kids! :-D

  • How did you determine the accuracy of this model?
    • Is that tested in real-world results, or on simulated or predicted data?
    • No, we understand the original data is from the real world. But is the testing?
  • How long is this report or prediction valid?
    • Does the model have to be retrained?
    • Do we have to reevaluate the model selection altogether?
  • How old are you?
  • How much do you weigh?

– approximately as impolite as each other, depending on whom you are asking. :-x
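The accuracy and validity questions above can be made concrete with a small sketch. It is an illustration only, not a method from any particular presentation: the synthetic data, the 1-nearest-neighbor model, and every name in it (make_data, predict_1nn, accuracy, the threshold and noise values) are assumptions chosen to keep the example self-contained. It compares accuracy measured on the data the model was fit on, on held-out data from the same process, and on later data after the underlying process has shifted.

```python
import random

def make_data(n, rng, threshold=0.5, noise=0.1):
    """Synthetic points: label is 1 when x > threshold, flipped with probability `noise`."""
    data = []
    for _ in range(n):
        x = rng.random()
        label = int(x > threshold)
        if rng.random() < noise:
            label = 1 - label
        data.append((x, label))
    return data

def predict_1nn(train, x):
    """Return the label of the training point closest to x."""
    nearest = min(train, key=lambda point: abs(point[0] - x))
    return nearest[1]

def accuracy(train, test):
    """Fraction of test points whose label the 1-NN model predicts correctly."""
    correct = sum(predict_1nn(train, x) == y for x, y in test)
    return correct / len(test)

rng = random.Random(0)  # fixed seed so the run is repeatable
train = make_data(200, rng)
held_out = make_data(200, rng)             # same process, unseen points
later = make_data(200, rng, threshold=0.9)  # the process has drifted

# Scoring on the training data flatters the model: 1-NN simply memorizes it.
train_accuracy = accuracy(train, train)
# Held-out data answers "is the testing from the real world?" more honestly.
held_out_accuracy = accuracy(train, held_out)
# Later data answers "how long is this prediction valid?"
later_accuracy = accuracy(train, later)

print(f"accuracy on training data: {train_accuracy:.2f}")
print(f"accuracy on held-out data: {held_out_accuracy:.2f}")
print(f"accuracy on later data:    {later_accuracy:.2f}")
```

A presenter who reports only the first number has not answered the question. The gap between the first and second numbers is the cost of testing on data the model has already seen, and the gap between the second and third is one signal that retraining, or a fresh look at model selection, may be due.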

Being a Better Presenter

Answer the above questions before they are asked!

  • Describe your tech stack.
  • Describe your techniques.
  • Describe which slice of the stack you are responsible for and which parts you are using out of the box.
  • Avoid answering questions with references to more material the audience is unfamiliar with.
    • Don't rely on giving the audience reading assignments for later in order to understand.
    • This is evasive behavior and erodes trust.