Tag: Java

A look back at EclipseCon France 2017

EclipseCon France took place in Toulouse on June 21 and 22. I attended for the first time, thanks to the invitations the Eclipse Foundation offered to Duchess France. I was lucky to accompany Aurélie Vache, well known among Duchess France members and an EclipseCon veteran. Read more

Simplon.co opens a new free training program!

Simplon.co is back with a brand-new training program! It is called CQP Développeur·se Nouvelles Technologies, and it specializes in the vast ecosystem of... Java!

Read more

Predict your activity using your Android phone, Cassandra and Spark

Lately, I took up a new sport: running. When you run, you get really curious about the acceleration, the distance, the elevation and the other metrics you can analyse when you practice this kind of sport. As a runner, I started using a phone application (runkeeper), and recently I bought a Garmin watch to get more information about my running sessions.
But how does this kind of application analyse data and compute all these metrics?
Let's focus on one metric: proper acceleration.

What is proper acceleration?

Proper acceleration, or physical acceleration, is the acceleration a body experiences relative to free fall; it is the acceleration felt by people and objects. It is measured by an accelerometer.
The accelerometer data consist of successive measurements made over a time interval: that is what we call a time series.

[Image: a time series of accelerometer measurements]

How can I get an accelerometer?

Luckily, most smartphones contain an accelerometer sensor.
The sensor measures three values along three different axes, as shown in the picture below:
[Image: the three accelerometer axes of a smartphone]

As an Android fan, I implemented an Android app, Basic Accelerometer, which shows the values of the three axes and the current date as a timestamp.

Let's create the Basic Accelerometer Android app!

All the source code is available on my Github repository here.
As a first step, I implemented the start activity:
https://gist.github.com/MiraLak/924090ad709098284d6c

After creating the starting menu, I have to collect the sensor values in a new activity, AccelerometerActivity.
To use the sensor, the activity class must implement SensorEventListener:
https://gist.github.com/MiraLak/cd7832d742d530598754
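The gist is linked above; as a rough idea of its shape, here is a minimal sketch of an activity registering an accelerometer listener (the class and field names are my assumptions, not the exact gist content):

import android.app.Activity;
import android.hardware.Sensor;
import android.hardware.SensorEvent;
import android.hardware.SensorEventListener;
import android.hardware.SensorManager;
import android.os.Bundle;

public class AccelerometerActivity extends Activity implements SensorEventListener {

    private SensorManager sensorManager;
    private Sensor accelerometer;

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        sensorManager = (SensorManager) getSystemService(SENSOR_SERVICE);
        accelerometer = sensorManager.getDefaultSensor(Sensor.TYPE_ACCELEROMETER);
    }

    @Override
    protected void onResume() {
        super.onResume();
        // register the listener while the activity is visible
        sensorManager.registerListener(this, accelerometer, SensorManager.SENSOR_DELAY_NORMAL);
    }

    @Override
    protected void onPause() {
        super.onPause();
        sensorManager.unregisterListener(this); // stop listening to save battery
    }

    @Override
    public void onSensorChanged(SensorEvent event) {
        // the three axis values and the capture date, as described above
        float x = event.values[0];
        float y = event.values[1];
        float z = event.values[2];
        long timestamp = System.currentTimeMillis();
        // display the values and post them to the REST service (see below)
    }

    @Override
    public void onAccuracyChanged(Sensor sensor, int accuracy) {
        // nothing to do here
    }
}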

Now I am able to get information from the sensor and post it to an online REST service.
I used Retrofit, a REST client for Android and Java:
https://gist.github.com/MiraLak/5a86bb3586204b8b290f
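For reference, a hedged sketch of what such a Retrofit client can look like. This uses the current Retrofit 2 style API (the gist may use the older RestAdapter API instead); the service interface name and the base URL follow the endpoint shown later in this post:

import retrofit2.Call;
import retrofit2.Retrofit;
import retrofit2.converter.gson.GsonConverterFactory;
import retrofit2.http.Body;
import retrofit2.http.POST;

// REST endpoint the Android app posts accelerations to
public interface AccelerometerService {
    @POST("accelerometer-api")
    Call<Void> saveAcceleration(@Body Acceleration acceleration);
}

public final class RestClient {
    // build the client once; replace myLocalIp with your machine's IP address
    public static AccelerometerService create() {
        Retrofit retrofit = new Retrofit.Builder()
                .baseUrl("http://myLocalIp:8080/")
                .addConverterFactory(GsonConverterFactory.create())
                .build();
        return retrofit.create(AccelerometerService.class);
    }
}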

After that, I added an asynchronous task to post the sensor values at each sensor update:
https://gist.github.com/MiraLak/0242d2caa0df24bee28f
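Again as a hedged sketch rather than the gist itself: with an AsyncTask, each sensor update can be posted off the main thread, using the hypothetical AccelerometerService above:

import android.os.AsyncTask;
import android.util.Log;
import java.io.IOException;

public class PostAccelerationTask extends AsyncTask<Acceleration, Void, Void> {

    private final AccelerometerService service;

    public PostAccelerationTask(AccelerometerService service) {
        this.service = service;
    }

    @Override
    protected Void doInBackground(Acceleration... accelerations) {
        try {
            // blocking HTTP POST, safe here because we are off the UI thread
            service.saveAcceleration(accelerations[0]).execute();
        } catch (IOException e) {
            Log.e("BasicAccelerometer", "failed to post acceleration", e);
        }
        return null;
    }
}

// typical call from onSensorChanged():
// new PostAccelerationTask(service).execute(acceleration);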

Now we’re able to launch our app!

How to install the app on your phone?

  • Download Android Studio
  • Clone the BasicAccelerometer project and open it in Android Studio
  • Activate developer mode on your Android phone (it must run Android 4.0.3 or above).
  • Plug your phone in, run the app and choose your phone as the target.

The application will start automatically on your phone and you will see the screen below:

[Screenshot: Basic Accelerometer start screen]

Now that the application is started, let's focus on the REST service.

REST Service and Cassandra DB

The Android app is ready to send us real-time data: a time series of our acceleration.
As you may have noticed, I used an acceleration bean in my Android app:
https://gist.github.com/MiraLak/63261abb04f17fc62e5f
The acceleration is posted to a REST service.
The REST API receives the accelerometer data and stores it into Cassandra. Each acceleration contains:

  • the acceleration capture date as a timestamp (e.g. 1428773040488)
  • acceleration force along the x axis (unit is m/s²)
  • acceleration force along the y axis (unit is m/s²)
  • acceleration force along the z axis (unit is m/s²)

The REST API sources are available on my Github here. All data are saved in a Cassandra database.

Apache Cassandra is a NoSQL database. When writing data to Cassandra, data is sorted and written sequentially to disk. When retrieving data by row key and then by range, you get a fast and efficient access pattern thanks to minimal disk seeks; time series data is an excellent fit for this pattern.

To start the Cassandra database:

  • download the Cassandra 2.1.4 archive
  • extract it
  • run this command from the extracted directory: sh bin/cassandra

In the REST application, I used Spring Data Cassandra, which relies on the DataStax Java driver, so I can easily interact with Cassandra to write, read, update or delete data.
Spring Data Cassandra helps to configure the Cassandra cluster and create my keyspace:
https://gist.github.com/MiraLak/14d835992cf2980ceebe
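As an illustration of that configuration step, here is a minimal Spring Data Cassandra configuration sketch; the keyspace name and contact point are assumptions, not the exact gist content:

import org.springframework.context.annotation.Configuration;
import org.springframework.data.cassandra.config.java.AbstractCassandraConfiguration;

@Configuration
public class CassandraConfiguration extends AbstractCassandraConfiguration {

    @Override
    protected String getKeyspaceName() {
        return "accelerometer";  // assumed keyspace name
    }

    @Override
    protected String getContactPoints() {
        return "127.0.0.1";      // the local Cassandra node started above
    }
}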

After configuration, I created my application model:

https://gist.github.com/MiraLak/1d2bf2d8e28bff7a3a98

We want to store historical data, so I used a compound key (user_id and timestamp), since the combination is unique:

https://gist.github.com/MiraLak/ac995c21ee05387cd893
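A sketch of what such an entity can look like with Spring Data Cassandra's mapping annotations; the table and column names are assumptions:

import org.springframework.cassandra.core.PrimaryKeyType;
import org.springframework.data.cassandra.mapping.Column;
import org.springframework.data.cassandra.mapping.PrimaryKeyColumn;
import org.springframework.data.cassandra.mapping.Table;

@Table("acceleration")
public class AccelerationEntity {

    // partition key: all points of one user stay on the same partition
    @PrimaryKeyColumn(name = "user_id", ordinal = 0, type = PrimaryKeyType.PARTITIONED)
    private String userId;

    // clustering column: points are sorted by capture time inside the partition
    @PrimaryKeyColumn(name = "timestamp", ordinal = 1, type = PrimaryKeyType.CLUSTERED)
    private long timestamp;

    @Column private double x;
    @Column private double y;
    @Column private double z;

    // getters and setters omitted for brevity
}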

Then I added the REST controller.
The controller receives POST requests containing an acceleration and inserts the values into Cassandra:
https://gist.github.com/MiraLak/fc187ed3a25d7de5d1e6
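As a sketch, the controller can be as small as this; the endpoint path matches the URL used by the Android app, while the CassandraOperations injection is an assumption about how the gist persists data:

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.data.cassandra.core.CassandraOperations;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestMethod;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class AccelerationController {

    private final CassandraOperations cassandraTemplate;

    @Autowired
    public AccelerationController(CassandraOperations cassandraTemplate) {
        this.cassandraTemplate = cassandraTemplate;
    }

    @RequestMapping(value = "/accelerometer-api", method = RequestMethod.POST)
    public void addAcceleration(@RequestBody AccelerationEntity acceleration) {
        // one insert per received sensor update
        cassandraTemplate.insert(acceleration);
    }
}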

The acceleration bean used in the controller is the same as the one defined for the Android app, with an extra attribute: userID (I will explain its usage later).

After defining the REST controller and the Cassandra configuration, we are able to run the application:

https://gist.github.com/MiraLak/a97d9b9375660b932e50

By default Spring Boot starts a Tomcat server, but here we use Jetty instead; the project dependencies exclude the Tomcat starter and pull in Jetty:
dependencies {
    compile("org.springframework.boot:spring-boot-starter-web") {
        exclude module: "spring-boot-starter-tomcat"
    }
    compile("org.springframework.boot:spring-boot-starter-jetty")
    compile("org.springframework.boot:spring-boot-starter-actuator")
    compile("org.springframework.data:spring-data-cassandra:1.2.0.RELEASE")
    testCompile("junit:junit")
}

Let's launch the REST app!

Our Android app is ready. Now we run the REST app, and then we have to set the REST app URL in the Android app:
http://myLocalIp:8080/accelerometer-api

As soon as you click on the start button, the Basic Accelerometer app begins to send acceleration data to the REST service:

[Screenshot: Basic Accelerometer sending acceleration data]

And then we start to see insertion logs in the REST app:

[Screenshot: Cassandra insertion logs in the REST app]

To check the data in Cassandra, we launch a CQL terminal and run some queries:
sh apache-cassandra-2.1.4/bin/cqlsh

Here is what the data look like in Cassandra:
[Screenshot: acceleration rows in the CQL terminal]

At this point, we collect data from the accelerometer and store it in the Cassandra database.

How to analyse accelerometer data?

Remember, we aim to analyse our acceleration data, and we need some reference data to be able to create a decision tree model.

Luckily, this model already exists: there is an interesting article explaining how to create a decision tree model based on acceleration data with Cassandra, Spark and MLlib here.

Apache Spark is a fast and general engine for large-scale data processing. MLlib is a standard component of Spark providing machine learning primitives on top of Spark; it contains common algorithms, as well as basic statistics and feature extraction functions.

The source code to run predictions with an existing model is available on my Github here. [Update: this is the latest version, with Scala]

We now want to guess, just by analysing our acceleration, whether we are walking, jogging, standing up, sitting down, or going up or down stairs.
The decision tree model contains a Resilient Distributed Dataset (RDD) of labeled points based on some features.

A Resilient Distributed Dataset (RDD) is the basic abstraction in Spark: it represents an immutable, partitioned collection of elements that can be operated on in parallel.

The features include different values:
https://gist.github.com/MiraLak/1cc323a08e3880da4695
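Concretely, the features can be packed into an MLlib vector. Here is a sketch assuming the per-window statistics have already been computed (the variable names are mine, not the gist's):

import org.apache.spark.mllib.linalg.Vector;
import org.apache.spark.mllib.linalg.Vectors;

// one feature vector per window, in the same order the model was trained with
Vector features = Vectors.dense(
        meanX, meanY, meanZ,                    // average acceleration per axis
        varianceX, varianceY, varianceZ,        // variance per axis
        avgAbsDiffX, avgAbsDiffY, avgAbsDiffZ,  // average absolute difference per axis
        avgResultant,                           // 1/n * sum of sqrt(x² + y² + z²)
        avgTimeBetweenPeaksY);                  // average time between peaks on the y axis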

So, to analyse the data collected by the Basic Accelerometer application, we have to compute the features exactly as they are defined in our decision tree model.
We initialize our Spark context:
https://gist.github.com/MiraLak/74be7d0db0eb81099fb9
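A minimal sketch of that initialization, assuming a local Spark master and a local Cassandra node (the application name is a placeholder):

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class PredictActivity {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setAppName("PredictActivity")
                .setMaster("local[*]")  // local mode, using all available cores
                .set("spark.cassandra.connection.host", "127.0.0.1");
        JavaSparkContext sc = new JavaSparkContext(conf);
        // ... read from Cassandra, compute features, predict (see below)
    }
}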

Then we read the data from the Cassandra database (the user ID "TEST_USER" is hard-coded in the REST service application; you can update it or add it to the Android app).

The Spark-Cassandra Connector links Spark and Cassandra for fast cluster computing. This library lets you expose Cassandra tables as Spark RDDs, write Spark RDDs to Cassandra tables, and execute arbitrary CQL queries in your Spark applications.

The connector transforms the data written into Cassandra into Spark RDDs:
https://gist.github.com/MiraLak/e9276bb60145245f359f
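For reference, a hedged sketch of reading the table back with the connector's Java API; the keyspace, table and column names are assumptions matching the model above:

import static com.datastax.spark.connector.japi.CassandraJavaUtil.javaFunctions;

import com.datastax.spark.connector.japi.CassandraRow;
import org.apache.spark.api.java.JavaRDD;

// expose the Cassandra table as an RDD, restricted to our hard-coded user
JavaRDD<CassandraRow> rows = javaFunctions(sc)
        .cassandraTable("accelerometer", "acceleration")
        .where("user_id = ?", "TEST_USER");

// turn each row into an {x, y, z} sample
JavaRDD<double[]> samples = rows.map(
        row -> new double[] { row.getDouble("x"), row.getDouble("y"), row.getDouble("z") });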

After creating our features and assembling them into vectors, we can call the model and try to predict our activity.
You must use spark-cassandra-connector-java-assembly-1.3.0-SNAPSHOT or above to be able to save and load models:
https://gist.github.com/MiraLak/a130849a9fa7a1916d55
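Loading a saved model and predicting then boils down to a couple of calls; the model path below is a placeholder:

import org.apache.spark.mllib.tree.model.DecisionTreeModel;

// load the decision tree trained in the companion article
DecisionTreeModel model = DecisionTreeModel.load(sc.sc(), "path/to/decisionTreeModel");

// 'features' is the vector computed above; the result maps to one of the activities
double predictedLabel = model.predict(features);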

Final step: prediction

Now we can launch our prediction to see if we can predict the activity based on acceleration:

  1. Launch the REST application
  2. Start the Android app with the REST application URL
  3. Perform an activity for 30 seconds (sitting, standing up, walking, jogging, or going up or down stairs) while holding the phone in one hand.
  4. Stop the Android app
  5. Launch the prediction activity:

https://gist.github.com/MiraLak/b213be868d61f29d227d

Then you will see the predicted activity as a result:

[Screenshot: the predicted activity]

Conclusion

We have seen how to use a connected object (a smartphone) to collect time series data and store it in Cassandra.
We then used the Spark-Cassandra Connector to transform the data into RDDs, and analysed those RDDs using a decision tree model created with Spark.
This is just a small sample of the infinite possibilities connected devices offer nowadays.

Meet Amira Lakhal

[Photo: Amira Lakhal] We continue our interview series with a portrait of Amira LAKHAL, Java developer at Valtech and a very active member of the Duchess France team.

Agnès: Can you introduce yourself: your job, your background? Do you have any geek activities outside work?

Amira: My name is Amira LAKHAL. I am a Java developer at Valtech Paris. I am passionate about agility and functional languages. I followed a fairly classic path: after a scientific baccalaureate, I went to an IUT and then to an engineering school. I have also been a member of the Duchess France association almost since the beginning, and I have been on the board for almost two years.
Apart from software development, I do some sport, more precisely running. In fact, I am part of the Duchess team that will run the Nike 10K on June 9.

Agnès: What gave you a taste for this job?

Amira: My story with computing started very early. Strangely enough, like most developers, I was hooked on Lego. After collecting various Lego series, I got my first PC at the age of twelve. My first instructions were DOS commands to launch my first game from a floppy disk. That is how my passion for computing was born.

Agnès: What are the latest talks you have given? The upcoming ones?

Amira: I co-presented talks and hands-on sessions about Cassandra with Duy Hai Doan. The goal of these sessions is to introduce beginners to the different characteristics of this NoSQL solution and to give a usage example through the practical workshop.

I will also co-present a session about Spark and Cassandra with Ludwine Probst. The talk will take place at the Geneva JUG at the end of April.

Agnès: What do these talks bring you? What are your tips for preparing?

Amira: Giving a talk is an exercise that requires a lot of preparation. On one side there is the preparation of the topic: research, tests, analyses and so on; and then there is the training for the oral presentation: building the slides, the flow, and keeping to the allotted time.

So it is not easy, but if you are motivated, you can learn and gain experience with presentations.

For my part, when I prepare a talk, I rehearse it several times before giving it in front of an audience. My rehears…

Analyze accelerometer data with Apache Spark and MLlib

Over the past months I have grown interested in Apache Spark, Machine Learning and Time Series, and I thought I would play around with them.

In this post I will explain how to predict user’s physical activity (like walking, jogging, sitting…) using Spark, the Spark-Cassandra connector and MLlib.

The entire code and data sets are available on my github account.

This post is inspired by the WISDM Lab's study, and the (uncleaned) data come from here.

 

[Image: Spark and accelerometer overview]

A FEW WORDS ABOUT APACHE SPARK & CASSANDRA

Apache Spark started as a research project at the University of California, Berkeley in 2009, and it is an open source project written mostly in Scala. In a nutshell, Apache Spark is a fast and general engine for large-scale data processing.
Spark's main strength is in-memory processing, but it can also process data on disk, and it can be fully integrated with Hadoop to process data from HDFS. Spark provides three main APIs, in Java, Scala and Python. In this post I chose the Java API.
Spark offers an abstraction called resilient distributed datasets (RDDs), which are immutable and lazy data collections partitioned across the nodes of a cluster.

MLlib is a standard component of Spark providing machine learning primitives on top of Spark; it contains common algorithms (regression, classification, recommendation, optimization, clustering...), and also basic statistics and feature extraction functions.

If you want a better look at Apache Spark and its ecosystem, just check out the Apache Spark website and its documentation.

Finally, the Spark-Cassandra connector lets you expose Cassandra tables as Spark RDDs, persist Spark RDDs into Cassandra tables, and execute arbitrary CQL queries within your Spark applications.

AN EXAMPLE: USER’S PHYSICAL ACTIVITY RECOGNITION

The availability of acceleration sensors creates exciting new opportunities for data mining and predictive analytics applications. In this post, I will consider data from accelerometers to perform activity recognition.

The data in my github account are already cleaned.
The data come from 37 different users. Each user recorded the activity he was performing, which is why some of the data are not relevant and need to be cleaned: some rows are empty in the original file, and some others are misrecorded.

DATA DESCRIPTION

I used labeled accelerometer data from users, collected via a device in their pocket during different activities (walking, sitting, jogging, ascending stairs, descending stairs, and standing).

The accelerometer measures acceleration in all three spatial dimensions, as follows:

  • Z-axis captures the forward movement of the leg
  • Y-axis captures the upward and downward movement of the leg
  • X-axis captures the horizontal movement of the leg

The plots below show the characteristics of each activity. Because of the periodicity of such activities, a window of a few seconds is sufficient to find specific characteristics for each activity.

[Plot: walking and jogging acceleration]

[Plot: ascending and descending stairs acceleration]

[Plot: standing and sitting acceleration]

We observe repeating waves and peaks for the repetitive activities: walking, jogging, ascending stairs and descending stairs. The upstairs and downstairs activities are very similar to each other. There is no periodic behavior for the more static activities like standing or sitting, but their amplitudes differ.

 

DATA INTO CASSANDRA

I have pushed my data into Cassandra using the cql shell.

https://gist.github.com/nivdul/88d1dbb944f75c8bf612

Because I need to group my data by (user_id, activity) and then sort them by timestamp, I defined ((user_id, activity), timestamp) as the primary key: (user_id, activity) is the partition key, and timestamp is the clustering column.

Just below is an example of what my data look like.

[Screenshot: sample acceleration rows in Cassandra]

And now, how to retrieve the data from Cassandra with the Spark-Cassandra connector:

 

https://gist.github.com/nivdul/b5a3654488886cd36dc5

 

PREPARE MY DATA

As you can imagine, my data were not clean, and I needed to prepare them to extract my features. It is certainly the most time-consuming part of the work, but also the most exciting one for me.

My data are contained in a CSV file, and they were acquired on different sequential days. So I needed to define the different recording intervals for each user and each activity. From these intervals, I extracted windows on which I computed my features.

Here is a diagram to explain what I did, followed by the code.

 

[Diagram: recording intervals and feature windows]

 

First, retrieve the data for each (user, activity) pair, sorted by timestamp.

 

https://gist.github.com/nivdul/6424b9b21745d8718036

 

Then, search for the jumps between records in order to define the recording intervals and the number of windows per interval.

https://gist.github.com/nivdul/84b324f883dc86991332
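The idea can be sketched as follows: walk through the sorted timestamps and close an interval whenever the gap to the previous record exceeds a threshold (the threshold value is an assumption and depends on the sampling rate):

import java.util.ArrayList;
import java.util.List;

// timestamps: sorted capture times for one (user, activity) pair
static List<long[]> findIntervals(List<Long> timestamps, long maxGap) {
    List<long[]> intervals = new ArrayList<>();
    long start = timestamps.get(0);
    long previous = start;
    for (int i = 1; i < timestamps.size(); i++) {
        long current = timestamps.get(i);
        if (current - previous > maxGap) {
            // jump detected: close the current recording interval
            intervals.add(new long[] { start, previous });
            start = current;
        }
        previous = current;
    }
    intervals.add(new long[] { start, previous }); // close the last interval
    return intervals;
}
// each interval can then be cut into fixed-length windows for feature computation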

DETERMINE AND COMPUTE FEATURES FOR THE MODEL

Each of these activities demonstrates characteristics that we will use to define the features of the model. For example, the plot for walking shows a series of high peaks on the y-axis, spaced at approximately 0.5-second intervals, while the interval is closer to 0.25 seconds for jogging. We also notice that the range of the y-axis acceleration is greater for jogging than for walking, and so on. This analysis step is essential and (again) takes time, in order to determine the best features to use for our model.

After several tests with different feature combinations, the ones I chose are described below (they are basic statistics):

  • Average acceleration (for each axis)
  • Variance (for each axis)
  • Average absolute difference (for each axis)
  • Average resultant acceleration (1/n * sum [√(x² + y² + z²)])
  • Average time between peaks (max) (Y-axis)

 

FEATURES COMPUTATION USING SPARK AND MLLIB

Now let’s compute the features to build the predictive model!

AVERAGE ACCELERATION AND VARIANCE

https://gist.github.com/nivdul/0ff01e13ba05135df09d

AVERAGE ABSOLUTE DIFFERENCE

https://gist.github.com/nivdul/1ee82f923991fea93bc6

AVERAGE RESULTANT ACCELERATION

https://gist.github.com/nivdul/666310c767cb6ef97503
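This feature is simple enough to sketch directly with the Spark Java API; the double[] sample layout {x, y, z} is an assumption carried over from the retrieval step:

import org.apache.spark.api.java.JavaRDD;

// 1/n * sum of sqrt(x² + y² + z²) over the samples of one window
static double averageResultantAcceleration(JavaRDD<double[]> samples) {
    double sum = samples
            .map(a -> Math.sqrt(a[0] * a[0] + a[1] * a[1] + a[2] * a[2]))
            .reduce(Double::sum);
    return sum / samples.count();
}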

AVERAGE TIME BETWEEN PEAKS

https://gist.github.com/nivdul/77225c0efee45a860d30

THE MODEL: DECISION TREES

Just to recap: we want to determine the user's activity from the data, where the possible activities are walking, jogging, sitting, standing, downstairs and upstairs. So it is a classification problem.

Here I chose the Decision Trees implementation from MLlib to create my model and then predict the activity performed by users.

You could also use other algorithms, such as Random Forest or Multinomial Logistic Regression (from Spark 1.3), available in MLlib.
Remark: with the chosen features, predictions for the "upstairs" and "downstairs" activities are pretty bad. One trick would be to define more relevant features to get a better prediction model.

Below is the code that shows how to load our dataset and split it into training and test datasets.

 

https://gist.github.com/nivdul/246dbe803a2345b7bf5b
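A common way to do that split with the Java API (the 70/30 ratio is an assumption):

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.mllib.regression.LabeledPoint;

// data: one LabeledPoint (activity label + feature vector) per window
JavaRDD<LabeledPoint>[] splits = data.randomSplit(new double[] { 0.7, 0.3 });
JavaRDD<LabeledPoint> trainingData = splits[0];
JavaRDD<LabeledPoint> testData = splits[1];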

 

Let's use DecisionTree.trainClassifier to fit our model. After that, the model is evaluated against the test dataset and an error is calculated to measure the accuracy of the algorithm.

https://gist.github.com/nivdul/f380586bfefc39b05f0c
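A sketch of this training and evaluation step with the MLlib Java API; the hyper-parameter values are assumptions, not necessarily those of the gist:

import java.util.HashMap;
import java.util.Map;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.mllib.regression.LabeledPoint;
import org.apache.spark.mllib.tree.DecisionTree;
import org.apache.spark.mllib.tree.model.DecisionTreeModel;
import scala.Tuple2;

static DecisionTreeModel trainAndEvaluate(JavaRDD<LabeledPoint> trainingData,
                                          JavaRDD<LabeledPoint> testData) {
    int numClasses = 6;  // walking, jogging, sitting, standing, downstairs, upstairs
    Map<Integer, Integer> categoricalFeaturesInfo = new HashMap<>(); // all features continuous
    String impurity = "gini";
    int maxDepth = 9;
    int maxBins = 32;

    DecisionTreeModel model = DecisionTree.trainClassifier(
            trainingData, numClasses, categoricalFeaturesInfo, impurity, maxDepth, maxBins);

    // test error = fraction of test points whose predicted label differs from the true label
    long wrong = testData
            .mapToPair(p -> new Tuple2<>(model.predict(p.features()), p.label()))
            .filter(t -> !t._1().equals(t._2()))
            .count();
    System.out.println("Test error: " + (double) wrong / testData.count());
    return model;
}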

RESULTS

Classes                Mean error (Random Forest)   Mean error (Decision Tree)
4 (4,902 samples)      1.3%                         1.5%
6 (6,100 samples)      13.4%                        13.2%

 

CONCLUSION

In this post we demonstrated how to use Apache Spark and MLlib to predict a user's physical activity.

The feature extraction step is pretty long, because you need to test and experiment to find the best possible features. We also had to prepare the data, and the data processing takes a long time too, but it is exciting.

If you find a better way/implementation to prepare the data or compute the features, do not hesitate to send a pull request or open an issue on github.

Hands-On Spark

For this first Duchess Hands-On of the year, we invite you to come and discover Apache Spark on March 10 at the Blablacar offices, during a coding session in Java and/or Scala (your choice! both will be available). Registration is on Meetup.


What is Spark?

Apache Spark is an open source project, written mostly in Scala, initially created at the University of California, Berkeley in 2009.

It is a framework for running analytics on large volumes of data, processing them mainly in memory but also on disk, with performance far superior to most Big Data tools, such as Hadoop.

Spark is particularly interesting for iterative processing, which is very common in Machine Learning.

 

The evening

The goal of this Hands-On is to get you working with the Spark API and its shell, and to discover the APIs of the Spark ecosystem (Spark SQL, MLlib for Machine Learning, and Spark Streaming) through several exercises.

The Hands-On will be available in Java and Scala.

 

Prerequisites for the evening

  • Java 8 (to benefit from lambda expressions), if you want to code in Java
  • Scala, for those who want to use the Scala API
  • an IDE (IntelliJ, Eclipse...) with Maven installed
  • A machine with at least 4 GB of RAM

 

Pair programming will be strongly encouraged during the evening, so don't panic if you forget your laptop!

 

Speakers

[Photo: Ludwine Probst]

Ludwine Probst is a Data Engineer at Cityzen Data, where she works on Machine Learning and on processing large volumes of real-time data, notably with Spark.

 

[Photo: Sam Bessalah]

Sam Bessalah is a freelance developer passionate about distributed systems and everything data-related. He co-organizes the Paris Datageeks meetup.

 

 

See you on March 10 at Blablacar with your laptop, from 7:15 pm. The evening will start at 7:30 pm sharp!

[Logo: BlaBlaCar]

Thanks to our host Blablacar for welcoming us and offering drinks and pizzas.

 

Meet Pauline Iogna, Java developer and university lecturer

Now it is the turn of Pauline Iogna, a Duchess member for several years, to play the interview game.

[Photo: Pauline Iogna]

Pauline, a Java developer and university lecturer, started speaking this year at technical events and conferences (Human Talk Paris, Devoxx France, JUG Summer Camp, Devoxx Antwerp...). She shares her experience with us and looks back on what motivated her to take the plunge, her apprehensions, and some advice for those who have not yet dared to do so.

As a reminder, our google group is open to everyone, and several threads offer advice and discussions about Call For Papers.

 


Download the mp3 here

 

The questions:

0'00 Introduction (background, studies and current job)

0'59 Teaching at the university

1'56 Becoming a speaker: the trigger? how did it happen?

5'00 Speaking at Devoxx Antwerp: behind the scenes

5'58 What does it bring you?

6'50 Your upcoming talk projects

7'13 Some advice for those who have not yet dared to take the plunge

8'09 Duchess France actions: CFP workshops, the google group mailing list, hangout rehearsals

9'22 Closing message
