INTRODUCTION
According to estimates from the World Health Organization (WHO), around 39 million people across the globe suffer from complete blindness and around 246 million have low vision, i.e. severe or moderate visual impairment. Around 90 percent of the world's visually impaired people live in developing countries. They face many challenges due to inaccessible infrastructure and other social barriers. Major difficulties include being unable to use smartphones for basic tasks such as messaging or calling, problems with navigation, and difficulty recognizing different currency denominations, which in turn makes many day-to-day chores inaccessible. Visually impaired people therefore need an assistive tool to help them cope with these difficulties and simplify them to an extent.
The proposed system implements a voice assistant app for visually impaired people that assists them in basic activities such as calling, messaging, and date/time access, and recognizes Indian currency denominations with ease and accuracy.
Many people today suffer from visual impairment, and in modern society they need helpful tools to perform routine tasks such as operating digital devices. In today's technologically advanced world, machine-learning algorithms, given a proper deployment platform, have become the go-to solution for almost everything. The proposed system is a low-cost and easy-to-use application that helps visually impaired people access the most important features of a mobile phone.
PROPOSED SYSTEM
The proposed system is a customized application that acts as a voice assistant and helps the visually impaired access the most important features of their mobile phones. The app consists of four modules:
1) Messaging Inbox – the system speaks new messages aloud, and the user can compose and send messages through the speech-recognition and text-to-speech APIs.
2) Phone Manager – the user can either use the provided dialer or speak the recipient's phone number to make a call.
3) Time/Date and Battery Status – the user can query the phone's current battery status as well as the date and time.
4) Camera – this module recognizes Indian currency denominations, predicting the value of notes scanned by the camera.
METHOD OF IMPLEMENTATION
The Android application is written in the Kotlin programming language. Kotlin is a modern, statically typed language that helps boost productivity, developer satisfaction, and code safety; among its features are expressive and concise syntax, safer code, interoperability, and structured concurrency. The built-in speech-to-text and text-to-speech APIs provide the voice assistant functionality. The speech-to-text API is intent-based: it launches Google's speech recognition service and returns the recognized text. The text-to-speech API, unlike speech recognition, is available without Google services and can be found in the android.speech.tts package.
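A minimal sketch of how these two APIs are typically wired together in an Activity; the class name, request code, and echo behavior here are illustrative, not taken from the paper:

    import android.app.Activity
    import android.content.Intent
    import android.os.Bundle
    import android.speech.RecognizerIntent
    import android.speech.tts.TextToSpeech
    import java.util.Locale

    class VoiceAssistActivity : Activity(), TextToSpeech.OnInitListener {

        private lateinit var tts: TextToSpeech
        private val speechRequestCode = 100  // arbitrary request code

        override fun onCreate(savedInstanceState: Bundle?) {
            super.onCreate(savedInstanceState)
            // android.speech.tts works without Google Services
            tts = TextToSpeech(this, this)
        }

        override fun onInit(status: Int) {
            if (status == TextToSpeech.SUCCESS) tts.setLanguage(Locale.US)
        }

        // Launch Google's speech recognition service via an implicit intent.
        private fun startListening() {
            val intent = Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH).apply {
                putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL,
                         RecognizerIntent.LANGUAGE_MODEL_FREE_FORM)
            }
            startActivityForResult(intent, speechRequestCode)
        }

        // The recognized text comes back in the activity result.
        override fun onActivityResult(requestCode: Int, resultCode: Int, data: Intent?) {
            super.onActivityResult(requestCode, resultCode, data)
            if (requestCode == speechRequestCode && resultCode == RESULT_OK) {
                val spoken = data
                    ?.getStringArrayListExtra(RecognizerIntent.EXTRA_RESULTS)
                    ?.firstOrNull()
                // Echo the recognized text back through TTS as a simple demo.
                spoken?.let { tts.speak(it, TextToSpeech.QUEUE_FLUSH, null, "echo") }
            }
        }
    }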
Implicit intents are used to place a phone call once the recipient's phone number is known. The ACTION_CALL action triggers the built-in phone call functionality available on Android devices. An implicit intent hands the user over to another app or service based on the action to be performed: here we have a phone number and want to make a call, so instead of building our own activity, we issue the request through an implicit intent.
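For example, a short sketch of placing a call with such an intent; note that ACTION_CALL requires the CALL_PHONE permission (ACTION_DIAL would open the dialer without it):

    import android.app.Activity
    import android.content.Intent
    import android.net.Uri

    // Place a call directly; requires android.permission.CALL_PHONE in the
    // manifest (and a runtime grant on API 23+).
    fun makeCall(activity: Activity, phoneNumber: String) {
        val intent = Intent(Intent.ACTION_CALL, Uri.parse("tel:$phoneNumber"))
        activity.startActivity(intent)
    }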
The next features are sending and receiving messages and reporting the current battery status and percentage. These are implemented with broadcast receivers: apps register for specific broadcasts, and when a broadcast is sent, the system automatically routes it to the apps that subscribed to that particular type. The BatteryManager class broadcasts all battery and charging details, and the onReceive() method of the BroadcastReceiver class handles the incoming broadcasts.
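A sketch of reading the battery level from the ACTION_BATTERY_CHANGED broadcast; the receiver class name is illustrative, and the percentage math follows the standard BatteryManager extras:

    import android.content.BroadcastReceiver
    import android.content.Context
    import android.content.Intent
    import android.os.BatteryManager

    class BatteryReceiver : BroadcastReceiver() {
        // onReceive() is invoked with the battery broadcast's extras.
        override fun onReceive(context: Context, intent: Intent) {
            val level = intent.getIntExtra(BatteryManager.EXTRA_LEVEL, -1)
            val scale = intent.getIntExtra(BatteryManager.EXTRA_SCALE, -1)
            val percent = level * 100 / scale
            // Hand the value to the TTS engine, e.g. "Battery at 80 percent".
        }
    }

    // Registered e.g. in onCreate():
    // registerReceiver(BatteryReceiver(), IntentFilter(Intent.ACTION_BATTERY_CHANGED))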
The currency detection model uses deep learning techniques to recognize Indian currency from a camera image. Deep learning is a machine learning technique that teaches computers to do what comes naturally to humans: a model learns to perform classification tasks directly from images, text, or sound.
The image classification is done with a convolutional neural network (CNN) built using the TensorFlow and Keras libraries in Python. CNNs are among the most popular techniques for improving image classification accuracy.
The steps involved are:
1. Training the model
2. Converting the model
3. Deploying it to the device
4. Optimizing the model
The model classifies currency images into 10 categories covering valid old and new notes. A simple sequential model is used for classification.
The steps followed in building the model are:
1) Dataset collection – the official IEEE dataset of Indian and Thai currency, comprising 2000 different images, is used.
2) Splitting the dataset – the dataset is partitioned into training and testing directories.
3) Building the model – a sequential model is built from convolutional and max-pooling layers (see the sketch after this list).
4) Image augmentation – the dataset is expanded by rotating, flipping, zooming, and shifting images, which improves the model's performance.
5) Training and testing – the model is trained on the images in the training directory and tested on images it has not seen before.
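A minimal sketch of steps 3–5; the input size, layer widths, epoch count, and directory names are illustrative assumptions, since the paper does not specify them:

    import tensorflow as tf
    from tensorflow.keras import layers, models
    from tensorflow.keras.preprocessing.image import ImageDataGenerator

    # Step 3: a simple sequential CNN ending in a 10-way softmax
    # (one class per currency denomination).
    model = models.Sequential([
        layers.Conv2D(32, (3, 3), activation="relu", input_shape=(150, 150, 3)),
        layers.MaxPooling2D(2, 2),
        layers.Conv2D(64, (3, 3), activation="relu"),
        layers.MaxPooling2D(2, 2),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="categorical_crossentropy",
                  metrics=["accuracy"])

    # Step 4: image augmentation expands the training set with rotated,
    # shifted, zoomed, and flipped variants of each image.
    train_gen = ImageDataGenerator(
        rescale=1.0 / 255, rotation_range=40,
        width_shift_range=0.2, height_shift_range=0.2,
        zoom_range=0.2, horizontal_flip=True,
    ).flow_from_directory("dataset/train", target_size=(150, 150),
                          class_mode="categorical")

    test_gen = ImageDataGenerator(rescale=1.0 / 255).flow_from_directory(
        "dataset/test", target_size=(150, 150), class_mode="categorical")

    # Step 5: train on the training directory, evaluate on unseen test images.
    model.fit(train_gen, epochs=20, validation_data=test_gen)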
Next, the model needs to be converted: the TensorFlow Lite converter Python API turns the trained TensorFlow model into a FlatBuffer, reducing its size and rewriting it in terms of TensorFlow Lite operations so it can run on mobile devices.
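A sketch of the conversion step using the TensorFlow Lite converter Python API; the optimization flag and output file name are common choices assumed here, not stated in the paper:

    import tensorflow as tf

    # Convert the trained Keras model into a FlatBuffer (.tflite).
    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]  # optional size/latency optimization
    tflite_model = converter.convert()

    with open("currency_classifier.tflite", "wb") as f:
        f.write(tflite_model)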
The next important step is deploying the model into the android application. When deploying a model for use on mobile devices, it is important to consider the model size, workload and the operations that are used.
Model Size – A model must be small enough to fit within your target device’s memory.
Workload – The size and complexity of the model has an impact on workload. Large, complex models might result in a higher duty cycle, which will increase power consumption and heat output.
The TensorFlow Lite interpreter runs the optimized model on edge devices such as mobile phones and microcontrollers.
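On the Android side, a sketch of running the converted model with the TensorFlow Lite interpreter; the org.tensorflow:tensorflow-lite dependency is assumed, and the model file name matches the hypothetical one above:

    import android.content.Context
    import org.tensorflow.lite.Interpreter
    import java.io.FileInputStream
    import java.nio.ByteBuffer
    import java.nio.MappedByteBuffer
    import java.nio.channels.FileChannel

    // Memory-map the .tflite model, assumed to be stored uncompressed in assets/.
    fun loadModel(context: Context,
                  name: String = "currency_classifier.tflite"): MappedByteBuffer {
        val fd = context.assets.openFd(name)
        FileInputStream(fd.fileDescriptor).channel.use { channel ->
            return channel.map(FileChannel.MapMode.READ_ONLY,
                               fd.startOffset, fd.declaredLength)
        }
    }

    // Run one preprocessed camera frame through the model and return the
    // index of the most probable denomination class.
    fun classify(context: Context, input: ByteBuffer): Int {
        val interpreter = Interpreter(loadModel(context))
        val output = Array(1) { FloatArray(10) }  // 10 denomination classes
        interpreter.run(input, output)            // single input, single output
        return output[0].indices.maxByOrNull { output[0][it] } ?: -1
    }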
CONCLUSION
This paper proposes a helpful voice assistant app for visually impaired people. The system is easy to use and runs on the Android operating system. The speech recognition and text-to-speech (TTS) APIs make it easy for users to navigate the app's different functionalities. With its deep-learning-based technique for recognizing and classifying Indian currency, the application achieves reasonable accuracy and will help visually impaired people improve their quality of life by reducing their dependence on others and aiding them in their day-to-day lives.