Monday, August 5, 2013

Linear Discriminant Analysis

Linear Discriminant Analysis (LDA) is a classification algorithm used in machine learning and pattern recognition. For linearly separable data, it classifies faster than the Support Vector Machine. It also allows for dimensionality reduction if required: when two or more features convey the same information, the redundant features are left out of the classification criteria. At its core, LDA maximizes the ratio of the between-class variance to the within-class variance.
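As a rough sketch of how such a classifier can be fit (scikit-learn names; the features and labels below are synthetic placeholders, not our data):

    import numpy as np
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    rng = np.random.default_rng(0)
    X = rng.normal(size=(60, 4))        # e.g. 60 windows x 4 frequency bins
    y = np.repeat([0, 1], 30)           # 0 = non-quake, 1 = quake
    X[y == 1] += 1.0                    # give the classes separable means

    idx = rng.permutation(60)           # shuffle before splitting
    X, y = X[idx], y[idx]

    lda = LinearDiscriminantAnalysis()  # maximizes between/within-class variance
    lda.fit(X[:50], y[:50])
    print("test accuracy:", lda.score(X[50:], y[50:]))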

For practical purposes, LDA seems to be used more often than the SVM, so we tried this classification method on our data. We obtained the following results for a single-sensor LDA.

LDA classification results


Picking on Quiet Data too!

The idea of running the kSigma algorithm on the filtered data seemed promising after seeing the plots in the previous post. But what we had not realized all this while was that if the algorithm picked better during quake time, it would pick more during quiet time too. We were enhancing the frequencies that distinguish quiet time from quake time, so quiet-time noise in those same bands was amplified as well; that seems to be the reason why there were more picks during the quiet time.
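For context, here is a rough sketch of what we mean by a kSigma-style pick (the production picker differs in detail): a pick is declared whenever a sample strays more than k standard deviations from a trailing window's mean, so reshaping the spectrum changes which excursions clear the threshold during quiet time just as much as during quake time.

    import numpy as np

    def ksigma_picks(x, window=500, k=5.0):
        # Rough kSigma-style sketch: flag samples deviating more than
        # k standard deviations from a trailing window's mean.
        # (Illustrative only; the real picker differs in detail.)
        picks = []
        for i in range(window, len(x)):
            segment = x[i - window:i]
            mu, sigma = segment.mean(), segment.std()
            if sigma > 0 and abs(x[i] - mu) > k * sigma:
                picks.append(i)
        return picks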

Below is a plot showing this effect. The top subplot is the original data from the SAC file. The two blue lines are the TauP estimates of the arrival times: the first is for the P wave and the second for the S wave.

Quiet time picking

Friday, July 19, 2013

Trying to get cleaner picks!

The single-sensor SVM seemed to work fine, at least for the station that we picked randomly. We are currently trying the same for the other stations. Also, the previous results were for a single channel (N); we are now trying to combine channels.

Dr. Chandy suggested that a computationally inexpensive way to derive the frequency characteristics is to pass the data through different band-pass filters, which gives essentially the same information as the FFTs. Also, now that the SVM has been trained and the weights for the different features have been derived, the data obtained from each filter should be weighted with the weight obtained for the corresponding feature (the bins and the filters should cover the same frequency intervals). Since the data has then been tweaked to better separate quake and non-quake times, we expect it to pick better, and the kSigma picks should come with higher confidence values.

Below is an illustration of the algorithm.

Algorithm illustration
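In code, the same pipeline might look roughly like this (the SciPy filter design, the bin edges, and the 50 Hz sample rate are all assumptions made for illustration):

    import numpy as np
    from scipy.signal import butter, sosfilt

    FS = 50.0                                   # assumed sample rate, Hz
    BANDS = [(0.5, 2), (2, 4), (4, 6), (6, 8)]  # roughly the SVM bins; 0 Hz is
                                                # not a valid band-pass edge, so
                                                # the first band starts at 0.5

    def weighted_filter_bank(x, weights):
        # Band-pass the trace once per SVM bin and recombine the bands,
        # each scaled by the SVM weight learned for that bin. The result
        # is what we would hand to the kSigma picker.
        out = np.zeros(len(x))
        for (lo, hi), w in zip(BANDS, weights):
            sos = butter(4, [lo, hi], btype="bandpass", fs=FS, output="sos")
            out += w * sosfilt(sos, x)
        return out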

Thursday, July 18, 2013

Classification Results for the SVM of a single sensor

We implemented an SVM in Python to classify positive and negative picks for a single sensor, using the following algorithm.

The data in the SAC files is divided at the time of the first pick. Suppose the first pick occurs at 34 seconds for some earthquake. The algorithm takes an FFT of each 5-second interval before and after the pick, i.e. 0-5, 5-10, 10-15, 15-20, 20-25, and 25-30 seconds for the non-quake period and 35-40, 40-45, 45-50, 50-55, and 55-60 seconds for the quake period (and similarly for 10-second intervals). These windows are stored in an array, and the array is then randomly shuffled.
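The windowing and binning step looks roughly like this in code (the 50 Hz sample rate and the function name are our own assumptions for illustration; the sketch starts the quake windows at the pick itself rather than the next whole second):

    import numpy as np

    def fft_features(trace, pick_s, fs=50.0, win_s=5,
                     bins=((0, 2), (2, 4), (4, 6), (6, 8))):
        # Split the trace at the first pick, FFT each win_s-second window
        # on either side, and sum the spectral power into the given Hz
        # bins. Returns (features, labels): 0 before the pick, 1 after.
        n = int(win_s * fs)
        pick = int(pick_s * fs)
        starts = [(s, 0) for s in range(0, pick - n + 1, n)]            # non-quake
        starts += [(s, 1) for s in range(pick, len(trace) - n + 1, n)]  # quake
        freqs = np.fft.rfftfreq(n, d=1.0 / fs)
        X, y = [], []
        for s, label in starts:
            power = np.abs(np.fft.rfft(trace[s:s + n])) ** 2
            X.append([power[(freqs >= lo) & (freqs < hi)].sum()
                      for lo, hi in bins])
            y.append(label)
        return np.array(X), np.array(y)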

One concern was that the SVM seemed to be heavily biased towards whichever side (positive or negative) has more training examples, even by a single example.

We tried this for the following bin sizes.
  • 0-1, 1-2, 2-3, 3-4, 4-5, 5-6, 6-7, 7-8 Hz
  • 0-1, 1-2, 2-4, 4-8 Hz
  • 0-2, 2-4, 4-6, 6-8 Hz
These were all done for both 5-second and 10-second intervals, using both the linear and the RBF kernels. For obtaining the results, we kept the number of training examples for positive picks and negative picks exactly equal.
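A sketch of the training step under those constraints (the function name is ours; X and y would come from the feature extraction above):

    import numpy as np
    from sklearn.svm import SVC

    def train_balanced(X, y, kernel="linear", n_test=8, seed=0):
        # Equalize positive/negative counts, shuffle, hold out n_test
        # samples, and fit an SVM with the given kernel ("linear" or "rbf").
        rng = np.random.default_rng(seed)
        pos, neg = np.where(y == 1)[0], np.where(y == 0)[0]
        m = min(len(pos), len(neg))          # exactly equal class counts
        idx = rng.permutation(np.concatenate([pos[:m], neg[:m]]))
        train, test = idx[:-n_test], idx[-n_test:]
        clf = SVC(kernel=kernel)
        clf.fit(X[train], y[train])
        return clf, clf.score(X[test], y[test])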

The tables below show the results for bins of 2 Hz width, for both kernels and both interval lengths.


For bins 0-2, 2-4, 4-6, 6-8 Hz:

5-second intervals: 56 training samples, 8 test samples

Training Data (values in %)

                  Linear kernel   RBF kernel
True Positive         71.42           25
False Positive         7.2             0
True Negative         92.8           100
False Negative        28.58           75

Test Data (values in %)

                  Linear kernel   RBF kernel
True Positive        100              0
False Positive         0              0
True Negative        100            100
False Negative         0            100
10-second intervals: 24 training samples, 4 test samples

Training Data (values in %)

                  Linear kernel   RBF kernel
True Positive         71.42           25
False Positive         7.2             0
True Negative         92.8           100
False Negative        28.58           75

Test Data (values in %)

                  Linear kernel   RBF kernel
True Positive         50            100
False Positive         0              0
True Negative        100            100
False Negative        50              0

Thursday, July 11, 2013

Issues with Spectrograms

Since our last post on spectrograms and their utility, we have spent some time thinking about the current challenges to the approach and how to refine our spectrograms to meet our expectations.

Interestingly, all the answers are contained in the way ObsPy creates a spectrogram, or the way any spectrogram is created, for that matter. Here is a good link which explains some of the technical aspects of a spectrogram.

In our last post concerning spectrograms, we talked about the poor resolution of the spectrograms that we created. It appears to us that the whole issue was due to two important parameters of the ObsPy spectrogram function that we had failed to understand clearly:
  1. wlen (window length): In concept, every spectrogram is created by dividing the entire data stream into small windows; an FFT of each window is taken, and the FFTs are stacked side by side to produce the plot. 'wlen' therefore defines the size of each such window. A very large wlen means fewer windows (hence poorer time resolution), while a very small wlen means too little data for a single FFT (hence poorer frequency resolution). We had to tweak this setting carefully for our purpose.
  2. Overlap (ObsPy's per_lap): To produce better-quality spectrograms, two consecutive windows in a spectrogram usually share some overlapping data points. But a very high overlap can distort the X-axis scaling.
We had to test these settings one by one. By tweaking the above two parameters, we were able to create somewhat better spectrograms, like the one displayed below.
Several spectrograms stacked one above another for the LADWP building, with decreasing floor numbers (top to bottom)
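For reference, the corresponding ObsPy call looks roughly like this (the file name and the parameter values are placeholders, not our final settings):

    from obspy import read

    st = read("example.sac")      # placeholder SAC file
    tr = st[0]
    tr.spectrogram(wlen=2.0,      # window length in seconds
                   per_lap=0.5,   # fractional overlap between windows
                   log=False,
                   title=str(tr.stats.station))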

Wednesday, July 10, 2013

Issues with a single sensor SVM

It would be very useful if we could train an SVM on a single sensor and detect picks on that sensor correctly. There are a few issues with doing that. 

We classify non-quake times as negative picks and quake times as positive picks. The training data is separated at the first time-domain kSigma pick. Suppose it occurs at 34 seconds for some earthquake and some station. The algorithm takes an FFT of each 5-second interval before and after the pick, i.e. 0-5, 5-10, 10-15, 15-20, 20-25, and 25-30 seconds for the non-quake period and 35-40, 40-45, 45-50, 50-55, and 55-60 seconds for the quake period. We are not sure at this point whether this is in fact a good idea; it might not capture the information correctly. Also, since 5 seconds is a very small interval and therefore contains little data, even slight abnormalities can lead to disastrous results.

We also face data-insufficiency issues. There are only a few sensors for which we have data for all the earthquakes of the past few years. Even among those, a few stations did not pick for some of the earthquakes, and some stations picked as late as the 54th second of the 60 seconds of data. This leaves more quiet-time data than quake-time data. In such cases, the SVM is highly biased towards negative picks and is very likely to predict a positive pick as a negative one.
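One standard mitigation we could try (not something tested in this post) is to reweight the classes instead of discarding quiet-time windows; scikit-learn's SVC supports this directly:

    from sklearn.svm import SVC

    # "balanced" scales each class's penalty inversely to its frequency,
    # so a surplus of quiet-time (negative) windows no longer dominates.
    clf = SVC(kernel="linear", class_weight="balanced")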

Tuesday, July 9, 2013

Restrictions due to renaming of Phidgets

The SVM trained on the non-normalized data gave better results than expected. There is a very low probability that this happened by chance, since the training and test data were chosen so as to eliminate the possibility of good results by mere chance. Nevertheless, to be sure, it could be tested on a single sensor.

The availability of data is an issue here. The CSN, being a relatively new project, has recorded only a limited number of earthquakes. Also, once a Phidget is unplugged and set up again on a different machine, it is assigned a new identification number, so it becomes very difficult to figure out which pair of identification numbers actually belongs to the same Phidget. Perhaps the CSN will develop a workaround in the future.