Learning with Privacy

Distributed Deep-Learning Optimized System (DDOS) for Privacy-Sensitive Medical and Health Applications (Research Collaborators: Dr. Xusheng Xiao at CWRU Computer Science and Dr. Sunny Chung at Cleveland State University)

Deep learning has become a promising focus in data mining research. With deep learning techniques, researchers can discover deep properties and features of events from quantitative mobile sensor data. However, many data sources are geographically separated and subject to strict privacy, security, and regulatory constraints. Once privacy-sensitive data are released, the data owners generally no longer physically possess them and cannot control how their personal data are used. It is therefore necessary to explore distributed data mining architectures that can conduct consensus learning as needed. Accordingly, we propose a distributed, hierarchical, incremental learning optimized system that contains a cloud server and multiple smartphone devices with computation capabilities. Each device serves as a personal mobile data hub that enables mobile computing while preserving data privacy. We plan to advance data-driven AI learning for wearable computing by constructing a robust AI computing platform that addresses the following questions: i) how can an updated model be deployed seamlessly without manual operations, such as manually loading the trained model onto the wearable computers/microcontrollers? ii) how can sufficient data samples with good-quality labels be collected for AI learning? iii) what if a customized learning model is necessary to meet individual needs?

The proposed system keeps private data locally on smartphones, deploys the computation model seamlessly, shares trained parameters, and builds a global consensus model incrementally. The feasibility and usability of the proposed system are evaluated through multiple experiments and related discussions. User data privacy is protected on two levels. First, local private training data do not need to be shared with anyone else, and users retain full control of their personal training data at all times. Second, only a small fraction of the trained gradients of the local model is selected for sharing, which further reduces the risk of information leakage. Two realistic use cases are evaluated and discussed in this dissertation: identification of risk factors for work-related musculoskeletal disorders, and fall detection. Both use cases are challenging and crucial for connected healthcare services, which involve medical problem identification, feasible solution development, and performance verification.
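The second privacy level, sharing only a small fraction of the local gradients, can be illustrated with a minimal sketch. The selection rule (largest magnitude), the per-coordinate averaging on the server, and the names select_top_gradients and apply_shared_updates are all assumptions made for illustration; the description above does not specify how gradients are selected or combined, so this should not be read as the system's actual implementation.

```python
from typing import Dict, List


def select_top_gradients(gradients: List[float], fraction: float) -> Dict[int, float]:
    """Device side: keep only the largest-magnitude fraction of gradient entries.

    Returns a sparse update mapping parameter index -> gradient value.
    The magnitude-based rule is an assumption, not the documented method.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    k = max(1, int(len(gradients) * fraction))
    ranked = sorted(range(len(gradients)), key=lambda i: abs(gradients[i]), reverse=True)
    return {i: gradients[i] for i in ranked[:k]}


def apply_shared_updates(
    global_params: List[float],
    updates: List[Dict[int, float]],
    learning_rate: float,
) -> List[float]:
    """Server side: average the sparse updates per coordinate, then take one step.

    Coordinates that no device shared are left unchanged.
    """
    sums: Dict[int, float] = {}
    counts: Dict[int, int] = {}
    for update in updates:
        for index, grad in update.items():
            sums[index] = sums.get(index, 0.0) + grad
            counts[index] = counts.get(index, 0) + 1
    new_params = list(global_params)
    for index, total in sums.items():
        new_params[index] -= learning_rate * total / counts[index]
    return new_params


# Two hypothetical devices each share half of their local gradients.
device_a = select_top_gradients([0.1, -0.9, 0.05, 0.4], fraction=0.5)
device_b = select_top_gradients([0.2, -0.5, 0.01, 0.1], fraction=0.5)
global_model = apply_shared_updates([1.0, 1.0, 1.0, 1.0], [device_a, device_b], learning_rate=0.1)
```

In this sketch the raw sensor data never leave the device; only the sparse index-to-gradient map is sent to the server, and the server can fold in updates from each device as they arrive.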

We have successfully implemented and published a distributed deep learning testbed using 7 smartphones. The local Android-based smartphone service was implemented on the Google Nexus 6, and the cloud server ran on university-owned high-performance clusters. The Human Activity Recognition dataset was used to evaluate the testbed. The dataset was built from recordings of 30 volunteers performing activities of daily living while carrying a waist-mounted smartphone with embedded inertial sensors. The dataset contains 10,299 instances, and its total size is 58 MB.

This research was partially supported by the Institute for Smart, Secure and Connected Systems at Case Western Reserve University and by the Internet of Things Collaborative through a grant from the Cleveland Foundation.

Publications: [J36], [J29], [C21].