Imagining Patient Experiences with Face Recognition, IoT and OCR

Anup Vasudevan, Raj Saxena, Raju Reddy and Sai Kiran Naragam

Published: December 20, 2016

What cascade detectors, sensors, and scanners can bring to Bahmni, an open source hospital information system for low-resource environments.

Visiting a hospital is rarely a pleasant experience. Aside from the obvious trauma of needing to go to a hospital, there are other inconveniences such as long queues and having to carry physical documents and paper slips from one department to the other.

We looked at hospitals using Bahmni to identify common process inefficiencies that need improvement.

[Points of inefficiencies in the patient experience]

Inefficiency 1: Patient identification

To deliver appropriate clinical care, patients need to be accurately identified and their medical records updated. When returning patients have forgotten or misplaced their ID card, hospital staff search for them by name, village and so on. Given the high patient load, multiple patients with the same name from the same village, and similar-sounding names, this search is prone to errors and delays.

When a patient’s registration isn’t found on search, staff create a new profile, abandoning the existing profile and the medical history associated with it. This hampers good clinical care. It can also frustrate doctors, who know from memory that the patient has visited before but cannot find the patient’s records and have to rely on the patient’s own description of ailments and diagnoses.

Inefficiency 2: Vitals measurement

Once the patient is registered or found in the system, the next step is for their vitals to be taken by a nurse or doctor. Currently, these are recorded manually and then entered into the system against each patient.

Inefficiency 3: Paper-based medical records

Throughout the healthcare process, patients are given sheets of paper with their diagnosis, lab tests, treatment plan, prescription etc. When these are digitized as images, they cannot be indexed and are therefore useless for data analysis and reporting.

Currently, digitization is done manually by entering values into the system. This is burdensome, time-consuming and error-prone.

While exploring solutions for these inefficiencies in the patient experience, we must bear in mind that Bahmni is designed especially for low-resource environments. It needs to work in hospitals that experience interruptions in electricity and Internet, among other things.

In this article, we—a small team from Thoughtworks Hyderabad working on ideas around machine learning and IoT—aim to present experimental solutions. At this stage, these ideas are exploratory; we are still considering ways to make them work with Bahmni. We, and the Bahmni team, are excited by the possibilities.

Expediting registration and patient identification with face recognition

With an inexpensive webcam, facial recognition can instantly identify a patient and pull up their medical records. In addition to the existing search functions, a “search by face” feature can be introduced to launch the webcam, detect the patient’s face in the frame, and search the database of faces it has already learned.

If a face is identified, the record for that patient can be brought up on the screen. Otherwise, the user interface for adding a new patient can be launched so that hospital staff can enter patient details.

Our proof of concept in Python used OpenCV, an open-source image processing and computer vision library, for face detection and recognition. Face detection uses Haar cascade detectors, a well-established technique. In cases where there is more than one person in view, we also made sure to pick the face closest to the camera.
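The article does not say how the closest face was chosen. One common heuristic, shown here as a minimal sketch, is to treat the largest detected bounding box as the closest face. It assumes boxes arrive as (x, y, w, h) tuples, which is the shape of the rectangles returned by OpenCV's `CascadeClassifier.detectMultiScale`; the function name `closest_face` is hypothetical.

```python
def closest_face(boxes):
    """Return the (x, y, w, h) box with the largest area, or None if empty.

    Assumption: a larger bounding box means the face is nearer the camera.
    """
    if not boxes:
        return None
    # Area is width * height; x and y only locate the box in the frame.
    return max(boxes, key=lambda box: box[2] * box[3])


if __name__ == "__main__":
    detected = [(10, 10, 40, 40), (100, 50, 120, 120), (5, 5, 60, 60)]
    print(closest_face(detected))
```

The heuristic can be fooled by people with very different face sizes standing at similar distances, so it is best treated as a starting point.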

For face recognition, OpenCV provides several options like EigenFaces, FisherFaces and Local Binary Pattern Histograms (LBPH). We chose LBPH for its robustness under different lighting conditions and accuracy for straight faces. We also used the Bottle web framework to set up a simple REST API for this system to communicate with Bahmni.

This isn’t without challenges. In our experiment, the accuracy of face detection was good for straight faces but not so for oblique faces. We used the Yale dataset and the LFW (Labeled Faces in the Wild) dataset as benchmarks for the easy and hard face detection and recognition tasks respectively. On the easy task, we reached 100% accuracy: face recognition is reliable as long as people look straight into the camera. On the hard task, however, we were only 18% accurate, a long way from success.
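To illustrate why local binary patterns tolerate lighting changes, here is a minimal sketch of the basic 3x3 LBP operator in plain Python. It is not the OpenCV implementation: the bit ordering (clockwise from the top-left neighbour) and the function names are assumptions, and a full LBPH recognizer would also split the image into grid cells and concatenate their histograms.

```python
# Neighbour offsets, clockwise from the top-left corner (ordering is an assumption).
NEIGHBOURS = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]


def lbp_code(patch):
    """Compute the 8-bit LBP code of a 3x3 patch (list of three rows)."""
    center = patch[1][1]
    code = 0
    for bit, (row, col) in enumerate(NEIGHBOURS):
        # A neighbour at least as bright as the center sets its bit.
        if patch[row][col] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(image):
    """Histogram (256 bins) of LBP codes over all interior pixels."""
    hist = [0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            patch = [row[c - 1:c + 2] for row in image[r - 1:r + 2]]
            hist[lbp_code(patch)] += 1
    return hist
```

Because each bit only records whether a neighbour is at least as bright as the center pixel, adding the same brightness offset to every pixel leaves the code unchanged.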

[Straight faces]

[Oblique faces]

Another concern is the size of the data that needs to be stored. Each face requires multiple snapshots, and each snapshot is roughly 1 MB. A typical hospital may have 50,000 to 200,000 patient records, needing 50 to 200 GB of space even at a single snapshot per patient, and proportionally more with multiple snapshots.
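As a rough check on these figures, the 50 to 200 GB range corresponds to one 1 MB snapshot per patient. The sketch below makes the arithmetic explicit; the function name, the `snapshots_per_patient` parameter and the use of decimal units (1 GB = 1,000 MB) are assumptions made for illustration.

```python
def storage_gb(patients, snapshots_per_patient=1, mb_per_snapshot=1.0):
    """Estimate snapshot storage in GB, using decimal units (1 GB = 1000 MB)."""
    total_mb = patients * snapshots_per_patient * mb_per_snapshot
    return total_mb / 1000


if __name__ == "__main__":
    for patients in (50_000, 200_000):
        for snapshots in (1, 5):
            print(patients, snapshots, storage_gb(patients, snapshots), "GB")
```

With several snapshots per patient, the requirement grows linearly with the snapshot count.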

This is not a grave concern for Bahmni systems because they are localized to individual hospitals. But to scale this system, we need to explore more efficient methods.

Another challenge is how to handle people who look similar, say identical twins. This might lead to errors as hospital staff begin to rely on the system to find the correct match without verifying patient details themselves. OpenCV’s recognition API returns just a single face id, so only one face is returned even if there are several similar ones.

OpenFace offers solutions to these concerns. It is a neural network based open source solution, which converts each face into a vector of numbers to uniquely identify it. As there are only about 128 numbers per face, the data representation becomes compact (typically less than a KB), making storage easier. Similar faces can also be identified with reasonable accuracy by comparing a face’s vector with the vectors in the database.
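A minimal sketch of how such vectors might be compared: it returns every stored face within a distance threshold rather than a single id, so staff can choose between look-alike candidates. It does not call OpenFace; the function name, the threshold value and the use of Euclidean distance are assumptions, and the vectors would in practice come from the neural network.

```python
import math
from array import array


def find_matches(query, database, threshold=0.6):
    """Return (patient_id, distance) pairs within threshold, nearest first.

    database maps a patient id to its stored face vector.
    """
    matches = []
    for patient_id, vector in database.items():
        distance = math.dist(query, vector)  # Euclidean distance
        if distance <= threshold:
            matches.append((patient_id, distance))
    return sorted(matches, key=lambda pair: pair[1])


def vector_bytes(dimensions=128):
    """Bytes needed to store one vector as single-precision floats."""
    return dimensions * array("f").itemsize


if __name__ == "__main__":
    # Tiny 2-dimensional vectors stand in for 128-dimensional ones.
    db = {"p1": [0.0, 0.0], "p2": [0.1, 0.0], "p3": [1.0, 1.0]}
    print(find_matches([0.0, 0.0], db, threshold=0.5))
    print(vector_bytes(), "bytes per face")
```

An empty result would send the user to the new-patient registration screen, while multiple results would prompt manual verification.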

Automating vitals measurement

In our proof of concept, we integrated sensors for measuring temperature and pulse and for observing ECG. We used an Arduino to interface those sensors with a computer for data analysis.

The Arduino collects data from the sensors and writes it to the computer’s serial port, from which the data can be processed. We used an LM35 temperature sensor to record body temperature, a pulse sensor to measure bpm (beats per minute) and an AD8232 to observe ECG. This could significantly reduce human intervention in the vitals measurement process.
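The sketch below shows how a line read from the serial port might be turned into vitals on the computer side. The LM35 outputs 10 mV per degree Celsius; everything else is an assumption for illustration: a 10-bit ADC with a 5 V reference, a hypothetical line format such as `TEMP:76,BPM:72`, and the function names. Reading the port itself (for example with a serial library) is left out.

```python
def lm35_celsius(adc_reading, vref=5.0, adc_steps=1024):
    """Convert a raw ADC reading from an LM35 into degrees Celsius.

    Assumes a 10-bit ADC (1024 steps) and a 5 V reference voltage.
    """
    volts = adc_reading * vref / adc_steps
    return volts * 100.0  # LM35 scale: 10 mV per degree, i.e. 100 degrees per volt


def parse_vitals(line):
    """Parse a hypothetical 'TEMP:<adc>,BPM:<bpm>' line into a dict."""
    fields = dict(item.split(":") for item in line.strip().split(","))
    return {
        "temperature_c": round(lm35_celsius(int(fields["TEMP"])), 1),
        "pulse_bpm": int(fields["BPM"]),
    }


if __name__ == "__main__":
    print(parse_vitals("TEMP:76,BPM:72\n"))
```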

[More efficient patient experience with face recognition, automated vitals measurement, and OCR]

The main challenge currently is the availability of sensors. The sensors recommended for medical use are too expensive or difficult to set up in low-resource environments. Cheaper sensors tend to be noisy. To overcome this, we are considering signal processing through Baseline Estimation and Denoising with Sparsity (BEADS). We aim to use signal processing on noisy sensor data to isolate its signal component.
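BEADS itself is too involved to show briefly. As a much simpler stand-in that only illustrates the idea of suppressing noise in sensor readings, here is a moving-median filter; it is not BEADS, and the function name and window size are assumptions.

```python
import statistics


def moving_median(samples, window=3):
    """Replace each sample with the median of its neighbourhood.

    The window shrinks at the edges so the output has the same length.
    """
    half = window // 2
    smoothed = []
    for i in range(len(samples)):
        lo = max(0, i - half)
        hi = min(len(samples), i + half + 1)
        smoothed.append(statistics.median(samples[lo:hi]))
    return smoothed


if __name__ == "__main__":
    noisy = [1, 1, 1, 9, 1, 1, 1]  # one spurious spike
    print(moving_median(noisy))
```

A median filter removes isolated spikes but does not estimate a drifting baseline, which is one of the things BEADS is designed to handle.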

Digitizing paper records

Most medical records, such as prescriptions and lab results, are in the form of physical printouts, requiring the patient to carry them. In places where they are digitized, hospital staff manually enter the information into the EMR. This takes time and human effort, causing delays.

Optical Character Recognition (OCR) helps extract textual information from images to enable digitization in a searchable and indexable format. But OCR systems are not 100% accurate, and an inaccurate update of records can be life-threatening to the patient.

To reduce that risk, we aim to strengthen the process in two ways. First, medical reports and forms need to be pre-processed so that each segment of text corresponds to a field in the system. Second, when opened in Bahmni, the system entry and the digitized image would appear side-by-side, enabling manual verification as necessary.

In our proof of concept, we used the open-source OCR system Tesseract. We’re building an OCR pipeline around Tesseract with pre-processing and post-processing enhancements to improve predictions. Our pre-processing enhancements include image de-skewing (taking an image where the page is trapezoidal and making it rectangular and straight) and page segmentation (dividing the page into segments at a sentence or word level).

Post-processing stages would combine the predictions of the OCR with a statistical language model (a method to understand the statistical properties of words and character distributions in a language). For most of the pre-processing, we’re using the scikit-learn and OpenCV libraries in Python, and the language model can be created using KenLM.
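A minimal sketch of the post-processing idea: each OCR candidate for a word is rescored by adding its log confidence to a log probability from a language model. A toy unigram model with add-one smoothing stands in for KenLM here; the function name, the weighting scheme and the sample word counts are all assumptions.

```python
import math


def pick_word(candidates, word_counts, lm_weight=1.0):
    """Choose the candidate with the best combined OCR + language-model score.

    candidates: list of (word, ocr_confidence) with confidence in (0, 1].
    word_counts: toy unigram counts standing in for a real language model.
    """
    total = sum(word_counts.values())
    vocab = len(word_counts)

    def lm_logprob(word):
        # Add-one smoothing so unseen words keep a small non-zero probability.
        return math.log((word_counts.get(word, 0) + 1) / (total + vocab + 1))

    def score(candidate):
        word, confidence = candidate
        return math.log(confidence) + lm_weight * lm_logprob(word)

    return max(candidates, key=score)[0]


if __name__ == "__main__":
    counts = {"paracetamol": 50, "tablet": 30}
    ocr_output = [("paracetarnol", 0.55), ("paracetamol", 0.45)]
    print(pick_word(ocr_output, counts))
```

The `lm_weight` parameter controls how much the language model is trusted over the OCR engine. For medical terms, where a wrong correction is risky, it would need careful tuning alongside the manual verification step.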

Looking forward

We began this experiment hoping to solve some of Bahmni’s problems by pushing the boundaries of our capabilities. We have been able to build partial prototypes so far and there is a long way to go before they can be included in Bahmni.

At every step, we need to be exceptionally careful about maintaining accuracy, and that is the focus of our next endeavor. Our immediate challenges are improving segmentation of the words in a text for OCR, improving signal processing for real-time inputs, and improving user experience to reduce errors in face recognition.

At Thoughtworks, we have had the privilege of playing with futuristic technology, learning and discovering the various ways in which technology can transform lives. In this endeavor, we’ve looked for practical, yet affordable manifestations of our discoveries (experimental code base here). If you’ve worked on any of these technologies before, or have any feedback, suggestions or ideas, do leave a comment below.

Disclaimer: The statements and opinions expressed in this article are those of the author(s) and do not necessarily reflect the positions of Thoughtworks.
