Human-in-the-loop Overfitting (How to Overfit series, 1)

One day, you read data from your parquet or CSV files, explore several key columns, and perform EDA. Then you import some amazing machine learning packages, split the data into training and testing sets, train, cross-validate, and tune on the training data, and evaluate on the testing data. Everything seems good, so far.

However, when you apply your model in production, damn, it’s bad. What could possibly have gone wrong?

Well, you might be overfitting your data.

The idea of splitting data into training and testing is that the testing data should be unseen by the model and, more importantly, by the modeler. Many developers overlook the ‘modeler’ part.

Loading the whole dataset introduces a risk of human-in-the-loop overfitting by giving the modeler insights into the testing data. If a modeler sees the whole dataset and makes decisions that affect the modeling, that is overfitting, even though the testing data is never available to model training.

Instead of splitting data at a later stage of the project, I believe it’s better to split the data before performing any analysis. If the data is not timestamped, it’s probably safe to randomly split the data file into two files: development and testing. Make sure you never touch testing until you are ready to evaluate your model.
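A minimal sketch of this split-first workflow in pandas (the column names, the 80/20 ratio, the seed, and the file names below are my illustrative assumptions, not from this post):

```python
import pandas as pd

# Stand-in for the raw data; in practice this would be something like
# pd.read_parquet("data.parquet") or pd.read_csv("data.csv").
df = pd.DataFrame({"x": range(100), "y": [i % 2 for i in range(100)]})

# Split once, up front, before any EDA. A fixed seed makes the
# split reproducible across sessions.
dev = df.sample(frac=0.8, random_state=42)
test = df.drop(dev.index)

# Persist the two halves; from here on, all EDA and modeling read
# only the development file, and the testing file stays untouched
# until final evaluation.
dev.to_csv("development.csv", index=False)
test.to_csv("testing.csv", index=False)
```

If the data is timestamped, a random split like this leaks future information into development, so a time-based cutoff would be the safer choice.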

After we evaluate the model on the testing set, the results will stick in our brains, and any decisions we make based on those testing results can lead to overfitting. This kind of overfitting is very subtle.

Product lessons I learned from teaching people Zoom

Four questions I got asked

For the past three months, I have been a tech support volunteer for the San Francisco Unified School District (SFUSD). SFUSD is the seventh-largest school district in California, educating over 57,000 students every year. SFUSD has done a great job helping teachers, students, and parents prepare for and get used to the digital mode of learning, which involves a lot of software and technical issues.

One of the challenges they face is Zoom, the most popular enterprise video communications tool. As an engineer at a tech company, I think Zoom is easy to use. It has had some security issues, but overall it’s great software.

(picture source: unsplash @cwmonty)

Before becoming a tech support volunteer, I didn’t expect there to be so many questions about how to use Zoom. After getting asked so many Zoom questions, I realized that we engineers may not understand real-world users as well as we think we do.

Zoom is far from perfect for users, especially for teachers and students.

You may think, well, it’s mostly for professionals.

Wrong! According to EdSurge, more than 100,000 schools in 25 countries are using Zoom and education is their largest vertical.

Here are four of the questions teachers asked me, along with the product lessons I learned from them.

  1. How to change the font?

Zoom has a whiteboard feature where users can type text on the shared screen. I didn’t know why changing the font mattered until a teacher told me that kids don’t recognize the double-storey “a” because they learned the single-storey “ɑ”. The latter is commonly used in handwriting and intended to be read by children. She wanted to switch to a font that uses “ɑ” so there is less confusion and kids can learn ɑ the correct way. However, no such option is available in Zoom.

Your users will always surprise you. I believe many people had doubts when they first saw a printed “a” or “g”, which look so different from their handwritten forms. When we design these products, no one remembers this because we are so used to them. Keep in mind that some users are not, and sometimes they are small kids.

  2. How to mute everyone?

Imagine a classroom setting: the teacher shows something amazing, and the classroom bursts into exclamations or laughter. The experience is not so fun when everyone is voicing through a microphone. The teacher asked me for a way to mute everyone immediately. There is a way to do that, but she was looking for something fast so she didn’t need to switch screens and be busy with mouse and clicks. Thanks to Zoom, there is a shortcut. On Mac, you can use Command(⌘)+Control+M to mute everyone except you.

  3. How to enlarge the cursor?

It’s easy for kids to lose attention, which is why materials for kids use lots of pictures and bigger fonts. One problem with Zoom is that, during screen share, you cannot change the size of the cursor. One can use “spotlight” to colorize the cursor, but according to this teacher, that’s not enough.

The workaround is actually through the Mac system: users can change the cursor size in the Accessibility settings. Allowing users to enlarge the cursor (or use a bigger spotlight) would be a great feature. It would help presenters better grab the audience’s attention.

  4. How to share files?

This teacher asked me how to share files. The session was done over a phone call, and it was hard for me to explain to her how to share files. For those of you who don’t know how to share files using Zoom, there are two ways:

  • The first is to click “Share Screen”; you will then see a tab panel at the top, and you can share files after selecting “Files”. But you cannot share files from your computer, because it only allows files from cloud storage like Dropbox and Google Drive. This one is confusing: why is “Files” under “Share Screen”?

  • The other is to share files through chat, where you can click the “Files” button and share files from your computer. This one is better, but why can you share files from your computer here and not in the first method?

This teacher is not familiar with tech, and she didn’t have her laptop on hand, so she needed to write the steps down. After the session was over, I thought to myself: why isn’t there a video demonstrating all of this? Actually, there are A LOT of these videos on YouTube. I found one that explains it very well and shared it with her via email. She immediately understood and followed the instructions. The lesson here is that video is a thousand times better than words.

I noticed that some startups are working on video messaging and I think they are on the right track. I believe that companies should also consider adding videos to their help center pages. 


Here is a video on YouTube 

Not only does it explain how to share files, it also covers situations like what happens if you share files with everyone but someone joins the Zoom session later. Video is a great way to explain things to people who are not so familiar with tech or have reading comprehension issues. If I were someone new to tech, I would definitely prefer to follow videos.

Overall, it has been a fruitful experience for me to be able to help teachers and students with technology issues. As engineers and product developers, we should always keep in mind that users will surprise us. It’s crucial to listen to and learn from them.

As the COVID situation gets worse, many activities are moving online. People who are not familiar with tech are the ones who need help the most, especially kids and seniors. This has been a rewarding experience, and I encourage you to look for tech support opportunities in your community.

The origin of machine learning and machines

There are lots of machines in machine learning: the Gradient Boosting Machine, the Boltzmann Machine, the Helmholtz Machine, the Support Vector Machine, etc. So, what’s with all these machines? More importantly, why is it called machine learning?

In this article, I will introduce the history of the following:

  • Machine

  • Learning Machines

  • Machine Learning

  • Machines in Machine Learning



Before we called them computers, people used the term “machine”, which Alan Turing introduced in his seminal 1936 paper in the form of the “universal Turing machine” (UTM). Back then, “computer” was used to describe a person who did calculations. Until the 1970s, it was not uncommon for companies and governments to advertise jobs for “computers”.

‘Machine’ also appears in the name of the ACM (Association for Computing Machinery), which was founded as the Eastern Association for Computing Machinery at a meeting at Columbia University in 1947.

ENIAC Programmers

Don’t forget that computers in the 50s were literally machines. The Electronic Numerical Integrator and Computer (ENIAC) was one of the first large general-purpose digital computers. By the end of its operation in 1956, ENIAC weighed more than 30 short tons, was roughly 8 ft × 3 ft × 98 ft in size, occupied 1,800 sq ft, and consumed 150 kW of electricity. Another important aspect: because its program was fixed, developing new functionality required rewiring, restructuring, or redesigning the machine, which is very different from the modern separation of software and hardware.

Learning Machines

“... what we want is a machine that can learn from experience.” ~ Alan Turing, London Mathematical Society, February 1947

The first combination of “machine” and “learning” comes from Turing. In his seminal 1950 paper, titled “Computing Machinery and Intelligence”, which introduced the Turing test to the general public, he coined the term “Learning Machines” and used it as one of the headings. He gave a quite interesting description of a learning machine, comparing it to the Constitution:

The idea of a learning machine may appear paradoxical to some readers. How can the rules of operation of the machine change? They should describe completely how the machine will react whatever its history might be, whatever changes it might undergo. The rules are thus quite time-invariant. This is quite true. The explanation of the paradox is that the rules which get changed in the learning process are of a rather less pretentious kind, claiming only an ephemeral validity. The reader may draw a parallel with the Constitution of the United States.

He also described the concept of the black box, which is now one of the most important topics in machine learning:

An important feature of a learning machine is that its teacher will often be very largely ignorant of quite what is going on inside, although he may still be able to some extent to predict his pupil's behaviour.

Machine Learning

In 1952, Arthur Samuel developed one of the first AI programs: a checkers program for the IBM 701. In his 1959 paper, titled Some Studies in Machine Learning Using the Game of Checkers, he popularized the term “Machine Learning”.

Machines in Machine Learning

However, where do these machines come from?

A graphical representation of an example Boltzmann machine.
  • The Boltzmann Machine was named by Ackley, D., Hinton, G., & Sejnowski, T. in their 1985 paper A Learning Algorithm for Boltzmann Machines.

  • The Helmholtz Machine was named by Dayan, Peter; Hinton, Geoffrey E.; Neal, Radford M.; Zemel, Richard S. in their 1995 paper The Helmholtz Machine.

  • The Support Vector Machine was originally called Support-Vector Networks. It was invented by Corinna Cortes and Vladimir Vapnik in their 1995 paper. The term Support Vector Machine probably comes from the paper’s first sentence:

The support-vector network is a new learning machine for two-group classification problems.

  • The Gradient Boosting Machine was introduced by Friedman in his 1999 paper Greedy Function Approximation: A Gradient Boosting Machine.

They were called ‘machines’ for many reasons; one of them is actually patent law.

In statutory United States patent law, software and computer programs are not explicitly mentioned. US courts have tried to clarify the boundary between patent-eligible and patent-ineligible subject matter for computers and software.

Section 101 of title 35, United States Code, provides:

Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

In case their invented “algorithms” ever needed to be patented, scientists preferred to use the word “machine”.

One great video I found while writing this article is Patrick Winston’s Lecture 16: Learning: Support Vector Machines.

I would like to share one snippet:

Well, I want to talk to you today about how ideas develop, actually.

Because you look at stuff like this in a book, and you think, well, Vladimir Vapnik just figured this out one Saturday afternoon when the weather was too bad to go outside.

That's not how it happens. It happens very differently.

Later in the video, he told the interesting history of Support Vector Machines:

So around 1992, 1993, Bell Labs was interested in hand-written character recognition and in neural nets.

Vapnik thinks that neural nets-- what would be a good word to use?

I can think of the vernacular, but he thinks that they're not very good.

So he bets a colleague a good dinner that support vector machines will eventually do better at handwriting recognition than neural nets.

And it's a dinner bet, right?

It's not that big of deal.

But as Napoleon said, it's amazing what a soldier will do for a bit of ribbon.

So that makes his colleague, who's working on this problem of handwritten recognition, decide to try a support vector machine with a kernel, in which n equals 2, just slightly nonlinear, and it works like a charm.

Was this the first time anybody tried a kernel?

Vapnik actually had the idea in his thesis but never thought it was very important.

As soon as it was shown to work in the early '90s on the problem of handwriting recognition, Vapnik resuscitated the idea of the kernel and began to develop it, and it became an essential part of the whole approach of using support vector machines.

So the main point about this is that it was 30 years in between the concept and anybody ever hearing about it.

It was 30 years between Vapnik's understanding of kernels and his appreciation of their importance.

And that's the way things often go, great ideas followed by long periods of nothing happening, followed by an epiphanous moment when the original idea seemed to have great power with just a little bit of a twist.

And then, the world never looks back.

And Vapnik, who nobody ever heard of until the early '90s, becomes famous for something that everybody knows about today who does machine learning.
