Enable breakthroughs in Parkinson's
disease research through wearables and
Big Data analytics technologies
About us…
• Part of the Big Data Analytics Solutions group @Intel reporting to
the Data Center Group
• Developing products & solutions leveraging:
• Big Data edge-technologies
• Self-developed machine learning & stream analytics algorithms
• Our team includes developers, data scientists and system analysts
• I am a Big Data Analytics Architect and Development Manager responsible for
leading-edge technology projects within Intel involving Big Data and stream
analytics solutions in the Internet of Things and Parkinson's disease research
How It All Started?
Big data analytics
IoT
Parkinson’s Disease
• 1 in 100 people over the age of 60
• 60,000 new patients diagnosed every year in the US
• 1M patients in the US, 5M worldwide
• No cure; medication only helps with symptoms
• There is no test and no progression marker
• Parkinson’s disease is caused by the death of dopamine cells.
60 to 80% of these cells are already lost by the time motor symptoms appear.
Challenges To Address
• No objective measure
• 3-6 months between physician visits
• Changes are slow and hard to detect
• Average trial size < 100 patients
• Very small number of patients contribute to research
• Cost of trials is on the scale of millions of dollars
HOW?
The Solution
1. Wear a watch
2. Start an application
Use Cases
• Manage the disease using data
• Researcher: free data for 1000’s of patients
• Clinician: accurate report since last visit
• Pharmaceutical: measure medication effect
Intel Big Data Cloud Analytics: insight / value
DEMO
THE APPLICATION
PATIENT REPORTED
• Medication reporting
• Medication reminder
• Report something
OTHER
• Configurable data collections
• Contribution score
• Login and registration
• Integrated Pebble notifications
OBJECTIVE MEASURES
• Gait
• Sleep
• Tremor
• Activity Level
• Controlled Tests
BIG DATA and IoT
TECHNOLOGIES
IoT Cloud Simplified Framework
[Diagram labels: Thing, Gateway, Network, Datacenter, Cloud Infrastructure, Data Platform, Analytics Platform, UI Services]
CLOUD COMPUTING SERVICES
SERVICE LAYER
BATCH LAYER
STREAM ANALYTICS LAYER
INGESTION LAYER
STORAGE LAYER
USER INTERFACE LAYER
Mosquitto
Storage Layer
• Cloudera Enterprise Data Hub
• HBase as main scalable time series data storage layer
• Allows high write throughput
• Random real-time access to stored data
• Highly available MySQL as metadata storage
Data Ingestion Layer
• Multi-protocol pipeline built over Akka & Kafka
• Kafka is a fast, scalable, durable & distributed messaging system
• Akka is an actor-based framework for highly concurrent, distributed
and resilient applications, based on events / messaging
• This layer is responsible for:
• Pulling messages
• Parsing & processing
• Concurrent & controlled writes
[Diagram labels: Devices, Load Balancer, Mosquitto, HBase]
Stream Analytics Layer
• Based on the Akka actors framework
• Contains millions of concurrent actors handling different
streams and operations
• Each actor is a small piece of code performing its role
• A set of actors creates a topology which is
responsible for processing a device’s data stream
[Diagram labels: Subscriber, Parser, Aggregator, HBase Writer, Analytics Manager, Change Detection, UnZip, Real Time Rules, Sleep Quality]
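The idea of a topology built from small actors can be illustrated with a minimal sketch. This is plain single-threaded Python, not Akka: the `Actor` class, the `run` scheduler and the three-stage parser → aggregator → writer chain are hypothetical names chosen for illustration, and the "writer" only appends to a list instead of writing to HBase.

```python
from collections import deque


class Actor:
    """Minimal actor: a mailbox plus a handler that may message other actors."""

    def __init__(self, name, handler):
        self.name = name
        self.handler = handler
        self.mailbox = deque()

    def tell(self, message):
        # Sending a message only enqueues it; the scheduler delivers it later.
        self.mailbox.append(message)


def run(actors):
    """Single-threaded scheduler: deliver messages until every mailbox is empty."""
    progressed = True
    while progressed:
        progressed = False
        for actor in actors:
            while actor.mailbox:
                actor.handler(actor.mailbox.popleft())
                progressed = True


def build_topology(window_size):
    """Wire a parser -> aggregator -> writer chain for one device stream."""
    stored = []   # stands in for what an HBase writer would persist
    window = []   # samples collected for the current aggregation window

    writer = Actor("writer", stored.append)

    def aggregate(sample):
        window.append(sample)
        if len(window) == window_size:
            writer.tell({"mean": sum(window) / window_size, "n": window_size})
            window.clear()

    aggregator = Actor("aggregator", aggregate)

    def parse(raw):
        aggregator.tell(float(raw))

    parser = Actor("parser", parse)
    return parser, [parser, aggregator, writer], stored


parser, actors, stored = build_topology(window_size=3)
for raw in ["1.0", "2.0", "3.0", "4.0", "5.0", "6.0", "7.0"]:
    parser.tell(raw)
run(actors)
print(stored)
```

Each actor only knows the next actor it sends to, which is what lets a real actor system place them on different threads or machines; the seventh sample stays in the aggregator's window until two more arrive.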
Batch Analytics Layer
• Based on Apache Spark over HBase
• Spark is a fast and general engine for large-scale data processing
• Algorithms & calculations are executed on large data sets on a daily basis
• Layer includes:
• Set of complex machine learning algorithms
• Baseline calculations for the rule engine's rules
Service Layer
• Interactive and scalable web services layer
• A set of RESTful APIs allowing:
• Registration to the platform
• Raw & calculated data retrieval from HBase
• Built on top of the Play framework,
providing a secured entry point
• Uses Apache Phoenix & the native HBase client
[Diagram labels: Load Balancer, HBase]
HBase Challenges
Data Ingestion to HBase
• Challenge
Concurrently ingesting millions of messages into HBase creates a massive load on
HBase region servers and causes disconnections
• Development Evolution
1. HBase client per topology (millions of writers)
2. Pool of HBase clients, each using a separate HTable
3. Pool of HBase clients, all using the same
HBase connection pool (HConnectionManager)
• Solution
Create a fixed number of connections to HBase, allowing batch writes and
load balancing
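The "fixed number of connections with batch writes and load balancing" pattern can be sketched as follows. This is an in-memory simulation, not the team's implementation and not the HBase client API: `BatchingWriter`, `WriterPool`, the round-robin routing policy and the batch size are all assumptions made for illustration.

```python
from itertools import cycle


class BatchingWriter:
    """Stand-in for one long-lived HBase connection (hypothetical, in-memory)."""

    def __init__(self, batch_size):
        self.batch_size = batch_size
        self.buffer = []
        self.flushed_batches = []  # each entry models one batched write call

    def write(self, record):
        self.buffer.append(record)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        if self.buffer:
            self.flushed_batches.append(list(self.buffer))
            self.buffer.clear()


class WriterPool:
    """Routes records round-robin over a fixed number of writers."""

    def __init__(self, pool_size, batch_size):
        self.writers = [BatchingWriter(batch_size) for _ in range(pool_size)]
        self._next_writer = cycle(self.writers)

    def route(self, record):
        next(self._next_writer).write(record)

    def flush_all(self):
        for writer in self.writers:
            writer.flush()


pool = WriterPool(pool_size=3, batch_size=4)
for i in range(20):
    pool.route({"device": i % 5, "seq": i})
pool.flush_all()

batches_per_writer = [len(w.flushed_batches) for w in pool.writers]
records_per_writer = [sum(len(b) for b in w.flushed_batches) for w in pool.writers]
print(batches_per_writer, records_per_writer)
```

However many producers call `route`, the number of writers (and therefore connections) stays at `pool_size`, and each writer turns many single records into a few batched writes.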
[Diagram: a Router in front of a Pool of HBase Writers]
Table Indicators over Large Tables
• Challenge
Gathering indicators (e.g. counts) on large HBase tables results in long table scans
and reduced performance
• Solution
• Update new indicator columns in real time using incrementColumnValue
• Allows atomic increment of a specific column
• Large table counts successfully implemented
• Allowed implementation of required indicators
• Real-time hourly counts
• Real-time max values (e.g. last time a user transmitted data)
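A minimal sketch of the idea: keep the indicators up to date at write time so that reading them never needs a table scan. This is an in-memory simulation, not HBase code; `IndicatorTable` and its methods are hypothetical, a lock stands in for the atomicity that a server-side increment provides, and the slides do not say how the max value was maintained, so the compare-and-store used here is an assumption.

```python
import threading
from collections import defaultdict


class IndicatorTable:
    """In-memory stand-in for indicator columns kept next to the raw data."""

    def __init__(self):
        self._lock = threading.Lock()
        self._counts = defaultdict(int)  # (user, hour bucket) -> message count
        self._last_seen = {}             # user -> latest timestamp seen

    def record(self, user, timestamp_s):
        hour_bucket = timestamp_s // 3600
        with self._lock:  # models the atomicity of a server-side increment
            self._counts[(user, hour_bucket)] += 1
            previous = self._last_seen.get(user, timestamp_s)
            self._last_seen[user] = max(previous, timestamp_s)

    def hourly_count(self, user, hour_bucket):
        return self._counts.get((user, hour_bucket), 0)

    def last_seen(self, user):
        return self._last_seen.get(user)


table = IndicatorTable()


def producer():
    # Each producer sends 250 messages, one every 20 seconds of device time.
    for i in range(250):
        table.record("p1", i * 20)


threads = [threading.Thread(target=producer) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(table.hourly_count("p1", 0), table.hourly_count("p1", 1), table.last_seen("p1"))
```

Reading an hourly count or the last transmission time is now a single lookup, regardless of how many raw rows the user has.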
Batch Processing Input Format
• Challenge
• Batch processing is done using Spark – InputFormat is required for scan
• TableInputFormat was used and is equivalent to a single scan
• Poor performance when data from “remote” parts of a table are required
• Solution
• Using MultiTableInputFormat
• Allows usage of multiple scans
• Successfully used with more than 100 scans per MultiTableInputFormat
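Why several narrow scans beat one wide scan can be shown with a small simulation over sorted row keys. This is not Spark or HBase code: the row-key layout `<patient id>|<hour>`, the `scan` helper and the chosen patients are assumptions made for illustration.

```python
from bisect import bisect_left


def scan(sorted_keys, start, stop):
    """Return keys in [start, stop), like a range scan over sorted row keys."""
    lo = bisect_left(sorted_keys, start)
    hi = bisect_left(sorted_keys, stop)
    return sorted_keys[lo:hi]


# Hypothetical row-key layout: "<patient id>|<hour of data>", zero padded.
keys = sorted(f"p{p:03d}|{h:04d}" for p in range(100) for h in range(24))

wanted_patients = ["p003", "p050", "p097"]
# "~" sorts after every digit, so "<id>|~" is an exclusive upper bound.
ranges = [(f"{p}|", f"{p}|~") for p in wanted_patients]

# One scan per range: only the rows that are actually needed are read.
multi_rows = [key for start, stop in ranges for key in scan(keys, start, stop)]

# A single scan must span from the first start to the last stop, reading
# every row in between and filtering afterwards.
single_rows = scan(keys, ranges[0][0], ranges[-1][1])
filtered = [k for k in single_rows if k.split("|")[0] in wanted_patients]

print(len(multi_rows), len(single_rows))
```

Both approaches return the same rows, but the single scan reads every row between the first and last key of interest, which is the "remote parts of a table" problem named on the slide.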
ANALYTICS
Activity Level
• A measure that continuously describes the
intensity of the patient’s activity throughout the day
• Motivates patients to be more active (known to
be important for PD patients)
• Personalized measure per patient based on their
average activity during walking periods (avoids
frustration)
• Based on intensity measurement from the
accelerometer
• Filters out tremor as
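A personalized, accelerometer-based activity level could be computed along the following lines. The slides do not give the actual formula, so everything here is an assumption: intensity is taken as the mean absolute deviation of the acceleration magnitude, the level is that intensity divided by the patient's own walking baseline, and tremor filtering is left out entirely.

```python
import math


def intensity(samples):
    """Mean absolute deviation of acceleration magnitude from its mean."""
    magnitudes = [math.sqrt(x * x + y * y + z * z) for x, y, z in samples]
    mean = sum(magnitudes) / len(magnitudes)
    return sum(abs(m - mean) for m in magnitudes) / len(magnitudes)


def activity_level(window, walking_baseline):
    """Window intensity relative to the patient's own walking baseline."""
    if walking_baseline <= 0:
        raise ValueError("walking baseline must be positive")
    return intensity(window) / walking_baseline


def synthetic_window(amplitude, sample_rate_hz=50, seconds=2, freq_hz=2.0):
    """Vertical-axis signal: 1 g of gravity plus a sinusoidal movement."""
    n = sample_rate_hz * seconds
    return [
        (0.0, 0.0, 1.0 + amplitude * math.sin(2 * math.pi * freq_hz * i / sample_rate_hz))
        for i in range(n)
    ]


walking = synthetic_window(amplitude=0.5)    # the patient's typical walking
resting = synthetic_window(amplitude=0.05)   # much smaller movements

baseline = intensity(walking)
level_walking = activity_level(walking, baseline)
level_resting = activity_level(resting, baseline)
print(round(level_walking, 3), round(level_resting, 3))
```

Dividing by the patient's own baseline means a level near 1 reads as "as active as your usual walking" for every patient, whatever their absolute movement intensity.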
Activity Level – An Example
Activity Level in Controlled Session (ON State)
Activity Level in Controlled Session (OFF State)
Tremor
• Tremor is one of the most obvious symptoms of PD
• Most PD patients experience tremor
• Tremor is detectable using signal processing
techniques
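One simple signal processing approach is to look at where the signal's energy sits in the frequency domain. The sketch below is an assumption, not the algorithm used in the project: it finds the dominant frequency with a naive discrete Fourier transform and flags the window when that frequency falls in a 4 to 6 Hz band, a range often quoted for parkinsonian rest tremor but not stated in the slides.

```python
import cmath
import math


def dominant_frequency(signal, sample_rate_hz):
    """Frequency (Hz) of the strongest non-DC component, via a naive DFT."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [s - mean for s in signal]
    best_bin, best_power = 0, 0.0
    for k in range(1, n // 2 + 1):
        coeff = sum(
            c * cmath.exp(-2j * math.pi * k * i / n) for i, c in enumerate(centered)
        )
        power = abs(coeff) ** 2
        if power > best_power:
            best_bin, best_power = k, power
    return best_bin * sample_rate_hz / n


def looks_like_tremor(signal, sample_rate_hz, band_hz=(4.0, 6.0)):
    """True when the dominant frequency lies inside the assumed tremor band."""
    low, high = band_hz
    return low <= dominant_frequency(signal, sample_rate_hz) <= high


SAMPLE_RATE_HZ = 50
N_SAMPLES = 200  # a 4 second window


def sine(freq_hz):
    return [math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE_HZ) for i in range(N_SAMPLES)]


tremor_like = sine(5.0)  # oscillation inside the assumed band
arm_swing = sine(1.0)    # slow voluntary movement

print(dominant_frequency(tremor_like, SAMPLE_RATE_HZ), looks_like_tremor(tremor_like, SAMPLE_RATE_HZ))
print(dominant_frequency(arm_swing, SAMPLE_RATE_HZ), looks_like_tremor(arm_swing, SAMPLE_RATE_HZ))
```

A production detector would use an FFT and also check how much of the total energy the band contains; the naive transform keeps the sketch dependency-free.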
TRIALS AND PARTNERS
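Trials And Partners
• REAL PD
• L-DOPA RESPONSE TRIAL
• DATA GATHERING TRIAL
• FOX INTEL APPLICATION TRIAL
• FOX INSIGHT WEAR: 1000
• SCRIPS TREMOR TRIAL: 1000
[Other figures from the slide graphic, not attributable to a specific trial in this extraction: 1000, 50, 30, 20; 20, 20, 30, 20, 10]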
WHAT’S NEXT?
SCALE PLATFORM
• Scale to 1000’s of patients in the US
• Scale to 1000’s of patients in the Netherlands
• iOS support
• Support additional wearables
• Build more value-generating capabilities
• Upgrade to HBase 1.0
• Upgrade Spark to 1.3
• Enrich Platform (e.g. Advanced Export, Reporting)
• Enrich Parkinson's Disease solution
• Analytics
• Value to patients
Q&A
Thank you!

HBaseCon 2015: HBase as an IoT Stream Analytics Platform for Parkinson's Disease Research


Editor's Notes

  • #2  On Wednesday, August 13, Intel and the Michael J. Fox Foundation (MJFF) announced a collaboration to improve research and treatment of Parkinson’s disease using wearable computing and big data analytics. The collaboration includes a multi-stage study using wearable devices to collect patient data and an Intel-built big data analytics platform to analyze the vast pools of data with the goal of developing objective measures for Parkinson’s disease progression, treatment response and drug development. Key Messages: Intel and the Michael J. Fox Foundation are joining forces to use wearable computing and big data analytics to help improve Parkinson’s disease research and treatments. The collaboration includes a multi-stage study to validate the use of wearable devices and big data analytics to track disease symptoms and develop objective measures for disease progression. The big data analytics platform combines Intel’s hardware and software technologies to provide a powerful cloud-based platform to collect, transform, store, and visualize data from sensors.
  • #4 The story: the man in the picture on the left is Andy Grove, one of Intel’s founders, who has Parkinson’s disease (PD). The story begins when he reads an article in the NY Times about Big Data and decides to start a project within Intel related to PD and Big Data. He contacts the Michael J. Fox Foundation, and they decide to start a joint effort. The idea is to leverage the Internet of Things, wearable technology and big data platforms to assist PD research.
  • #5 Disclaimer: I’m not a neurologist and do not intend to provide an extensive introduction to the disease. Parkinson’s disease (PD) is a degenerative disorder of the central nervous system characterized by serious motor disabilities, such as shaking, rigidity and slowness of movement. It is also characterized by complicated non-motor implications, such as poor sleep quality, depression and a tendency to compulsive behavior. There are ~6M Parkinson’s patients: about 1M in the US and about 5M in the rest of the globe. 1 out of 100 people over the age of 60 in the US is a Parkinson’s patient, and in the US alone ~60 thousand new patients are diagnosed every year. Parkinson’s patients’ life expectancy is usually between 10 and 15 years. There is no cure for the disease, and existing medications mainly improve patients’ quality of life by helping with symptoms. The disease progresses slowly, meaning that changes in a patient’s condition and disease progress can be observed only over the course of months or years, making management and research of the disease difficult. Parkinson’s testing and progression are assessed subjectively by physicians, and there is no standard test or progression marker. Parkinson’s is a complex disease with symptoms and treatment responses that vary widely. The disease progresses slowly, meaning that changes in clinical and molecular features can be observed only over the course of months or years, making management of and research into Parkinson’s disease difficult. Today, the diagnosis of Parkinson’s, assessment of disease progression, and clinical trials for treatments and medications have largely relied on periodic clinical assessments by a physician and on patient reports.
The advent of wearable computing and big data analytics could dramatically enhance our understanding of Parkinson’s disease by enabling scientists and physicians to gather vital data continuously and unobtrusively, without putting a burden on patients, and in significantly larger populations than in traditional clinical trials.
  • #6 So, why do we really need this solution? I’ll describe some of the challenges we’re addressing. One of the main challenges is the lack of objective measures, both for patients and for physicians. Today, patients are monitored only during occasional clinical visits, usually every 3-6 months. In those visits, mainly due to stress, patients behave differently, and their daily reports are subject to their own opinion. In addition, as I mentioned on the previous slide, the assessment of disease progress is also subjective and highly dependent on the physician’s observation during patient visits. An additional challenge is related to clinical trials. Today, only a small amount of data is available to the research community, and collecting meaningful amounts of good, reliable data is not trivial. One of the main reasons is that clinical trials cost millions of dollars and take extensive time and effort to arrange and complete. Very few patients contribute to research today, resulting in small trial sizes: the average trial has fewer than 100 patients. In addition, collecting fine-grained, good-quality data cannot scale due to the limitations of trial technology. Further points: the medication regime needs to be handcrafted; Levodopa’s positive effect progressively declines, and some patients suffer from dyskinesia; there are no biomarkers (diagnosis is hard: PD ≠ Parkinsonism).
  • #8 Actually, the theoretical solution is very simple: we just ask patients to follow 2 very simple steps, wear a watch and start a cell phone application. If we want patients to follow those two simple steps, we must make sure that the value our solution provides is greater than the burden on patients. That is our mission definition. Our solution addresses the main challenges I reviewed in the previous slides by continuously collecting movement data and objective measurements 24 hours a day, 7 days a week, 365 days a year, and by providing value to patients through real-time insights on their disease and condition (in the form of activity level, tremor detection and sleep quality indicators).
  • #9 So far, so good: we have provided value to patients and collected objective measurements. But what’s next? Who will use this? What are the main use cases? After the data is collected, advanced analytics algorithms are applied to it (I’ll elaborate on those later on) and it is saved in the Intel Big Data Cloud platform. Using this data, we provide: researchers with access to free, reliable data from thousands of patients (LDopa); clinicians with an accurate report of their patients’ condition since their last visit (RealPD); and pharmaceutical companies with the capability to measure their medication’s effectiveness during its test phase ().
  • #10 Either use real demo or show the next 2 slides
  • #12 As mentioned before, the application is based on sensor data.
  • #14 The solution is based on a self-developed, generic Internet of Things platform. The platform allows “Things”, which can be practically any type of device with some kind of internet connectivity (direct or through gateways), to send data to the platform. Data can be transmitted using different protocols and can be transformed in transit or after landing in the cloud. The entire code stack for this platform is based on open source, with the Hadoop ecosystem at its core. The platform is cloud based and offers application developers tools to develop their own applications on top of it. The key tools are: data storage built on top of Hadoop and HBase; an analytics platform that allows both batch and stream analytics development; built-in analytics features such as a near-real-time rule engine and change detection engines; and data extraction tools such as an export service.
  • #15 The Parkinson’s disease solution was developed on top of the generic IoT platform I described a moment ago. I’ll quickly review the different layers and dive into a few of them later on. The computing services are: a batch layer based on Spark; a storage layer using Hadoop, HBase and MySQL for metadata; a powerful, scalable ingestion layer based on Akka and Kafka; a dynamic stream analytics layer based on the Akka actor system framework; a scalable service layer providing a set of APIs for registration and data extraction from the platform; and a UI layer, the only layer in this diagram that is unique to the PD solution, which uses a Pebble watch and an Android application to collect data and interact with patients. Note that 5 of the 6 layers presented (all except the UI layer) are part of the IoT platform and can be used for similar products / verticals.
  • #18 Need to redo the whole slide
  • #19 Need to redo the whole slide
  • #20 Need to redo the whole slide
  • #27 This is the activity level of the same subject in two successive clinic visits, one in the ON state and one in the OFF state (the two sessions were recorded on different days). Although the patient repeated the same (or at least a highly similar) protocol in the two visits, his activity level while OFF is around half of his activity level while ON. We checked, and this result is also seen when comparing particular activities.
  • #30 Need to redo the whole slide