NLP: EMNIST Handwritten Character Classification
- 1 min- Using EMNIST data sets to create a more robust version of handwritten character classification.
Abstract
The MNIST database was a subset of a larger dataset known as the NIST Special Database 19 which contains digits, uppercase and lowercase handwritten letters. This project uses a variant of the full NIST dataset, which is called Extended MNIST(EMNIST), which follows the same conversion paradigm used to create the MNIST dataset. The result is a set of datasets that constitute a more challenging classification tasks involving letters and digits, and that shares the same image structure and parameters as the original MNIST task, allowing for direct compatibility with all existing classifiers and systems.
Introduction
Neural networks and deep learning currently provide the best solutions to many problems in image recognition, speech recognition, and natural language processing.Some advancements with the current data sets have led to create a more robust version of handwritten character classification.This project uses a latest dataset. This project uses several Python packages that come standard with the Anaconda Python distribution. The primary libraries that we’ll be using are:
- 1 NumPy(Provides a fast numerical array structure and helper fuctions)
- 2 pandas: Provides a DataFrame structure to store data in memory and work with it easily and efficiently.
- 3 scikit-learn: The essential Machine Learning package in Python.