Friday, July 19, 2013

Trying to get cleaner picks!

The single-sensor SVM seemed to work fine, at least for the station that we picked at random. We are currently trying the same for the other stations. Also, the previous results were for a single channel, N; we are now trying to combine channels.

Dr. Chandy suggested a computationally inexpensive way to derive the frequency characteristics: pass the data through a bank of different filters, which gives essentially the same information as the FFTs. Also, now that the SVM has been trained and the weights for the different features have been derived, the data obtained from each filter should be scaled by the weight obtained for the corresponding feature (the bins and the filters should cover the same frequency intervals). Since the data has now been tweaked to separate quake from non-quake times more cleanly, it should pick better, and the kSigma algorithm's picks should carry higher confidence values.
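As a rough sketch of this idea, assuming ObsPy-style traces, band edges matching the FFT bins, and a placeholder list of per-bin SVM weights (the helper name, filter corners, and combination step are our assumptions, not the exact implementation):

    import numpy as np
    from obspy import read

    # Hypothetical band edges matching the FFT bins the SVM was trained on (Hz)
    BAND_EDGES = [(0, 2), (2, 4), (4, 6), (6, 8)]
    svm_weights = [0.8, 1.3, 0.5, 0.2]   # placeholder weights from the trained SVM

    def weighted_filter_bank(trace, band_edges, weights):
        """Filter the trace once per FFT bin and sum the outputs,
        each scaled by the SVM weight of the corresponding feature."""
        combined = np.zeros(len(trace.data))
        for (fmin, fmax), w in zip(band_edges, weights):
            band = trace.copy()              # ObsPy filters in place, so copy
            if fmin <= 0:                    # the lowest bin needs a lowpass
                band.filter("lowpass", freq=fmax, corners=2)
            else:
                band.filter("bandpass", freqmin=fmin, freqmax=fmax, corners=2)
            combined += w * band.data
        return combined

    tr = read("example.sac")[0]              # placeholder SAC file
    weighted = weighted_filter_bank(tr, BAND_EDGES, svm_weights)

The weighted trace would then be handed to the kSigma picker in place of the raw data.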

Below is an illustration of the algorithm.

Algorithm illustration

Thursday, July 18, 2013

Classification Results for a Single-Sensor SVM

We implemented an SVM in Python to classify positive and negative picks for a single sensor, using the following algorithm.

The data in the SAC files is divided at the point of the first pick. Suppose the first pick occurs at 34 seconds for some earthquake. The algorithm takes an FFT of each 5-second interval before and after the pick, i.e. 0-5, 5-10, 10-15, 15-20, 20-25, 25-30 seconds for the non-quake period and 35-40, 40-45, 45-50, 50-55, 55-60 seconds for the quake period (and similarly for 10-second intervals). The resulting feature vectors are stored in an array, which is then randomly shuffled.
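A minimal sketch of this feature extraction, assuming the waveform is a NumPy array with a known sampling rate; the helper name and the binning into 2 Hz-wide features are our choices for illustration:

    import numpy as np

    def window_features(samples, fs, pick_time, win=5.0, bin_edges=(0, 2, 4, 6, 8)):
        """Cut fixed-length windows before and after the first pick, FFT each
        window, and sum spectral amplitudes into frequency bins (features)."""
        n = int(win * fs)                                      # samples per window
        pick = int(pick_time * fs)
        first_pos = int(np.ceil(pick_time / win) * win * fs)   # 35 s for a 34 s pick
        starts = [(s, 0) for s in range(0, pick - n + 1, n)]   # non-quake windows
        starts += [(s, 1) for s in range(first_pos, len(samples) - n + 1, n)]  # quake
        freqs = np.fft.rfftfreq(n, d=1.0 / fs)
        features, labels = [], []
        for start, label in starts:
            spec = np.abs(np.fft.rfft(samples[start:start + n]))
            feats = [spec[(freqs >= lo) & (freqs < hi)].sum()
                     for lo, hi in zip(bin_edges[:-1], bin_edges[1:])]
            features.append(feats)
            labels.append(label)
        return np.array(features), np.array(labels)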

One concern was that the SVM seemed to be heavily biased towards whichever side (positive or negative) had more training examples, even when the difference was a single example.

We tried this for the following bin sizes.
  • 0-1, 1-2, 2-3, 3-4, 4-5, 5-6, 6-7, 7-8 Hz
  • 0-1, 1-2, 2-4, 4-8 Hz
  • 0-2, 2-4, 4-6, 6-8 Hz
These were all done for both 5-second and 10-second intervals, using both the linear and RBF kernels. For the results below, we kept the number of training examples for positive and negative picks exactly equal; a sketch of the training step follows.
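A minimal sketch of the training step with scikit-learn, reusing the window_features helper sketched above; the subsampling helper reflects our keeping the positive and negative counts exactly equal, and all names are placeholders:

    import numpy as np
    from sklearn.svm import SVC

    def balance(X, y, seed=0):
        """Subsample the larger class so positives and negatives are equal."""
        rng = np.random.default_rng(seed)
        pos, neg = np.where(y == 1)[0], np.where(y == 0)[0]
        k = min(len(pos), len(neg))
        keep = np.concatenate([rng.choice(pos, k, replace=False),
                               rng.choice(neg, k, replace=False)])
        rng.shuffle(keep)
        return X[keep], y[keep]

    X, y = window_features(samples, fs, pick_time)   # samples, fs, pick_time as above
    Xb, yb = balance(X, y)
    for kernel in ("linear", "rbf"):
        clf = SVC(kernel=kernel).fit(Xb, yb)
        print(kernel, "training accuracy:", clf.score(Xb, yb))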

The tables below show the results for both kernels, for bins of width 2 Hz and both interval lengths.


For bins: 0-2, 2-4, 4-6, 6-8 Hz
5 second intervals: 56 training samples, 8 test samples
Training Data

                     5 s, linear kernel   5 s, RBF kernel
    True Positive    71.42%               25%
    False Positive   7.2%                 0%
    True Negative    92.8%                100%
    False Negative   28.58%               75%

Test Data

                     5 s, linear kernel   5 s, RBF kernel
    True Positive    100%                 0%
    False Positive   0%                   0%
    True Negative    100%                 100%
    False Negative   0%                   100%
10 second intervals: 24 training samples, 4 test samples
Training Data

                     10 s, linear kernel   10 s, RBF kernel
    True Positive    71.42%                25%
    False Positive   7.2%                  0%
    True Negative    92.8%                 100%
    False Negative   28.58%                75%

Test Data

                     10 s, linear kernel   10 s, RBF kernel
    True Positive    50%                   100%
    False Positive   0%                    0%
    True Negative    100%                  100%
    False Negative   50%                   0%

Thursday, July 11, 2013

Issues with Spectrogram

Since our last post on spectrograms and their utility, we have spent some time thinking about the current challenges to the approach and how to refine our spectrograms to meet our expectations.

Interestingly, all the answers are contained in the way ObsPy creates a spectrogram, or the way any spectrogram is created, for that matter. Here is a good link that explains some of the technical aspects of a spectrogram.

In our last post concerning spectrograms, we talked about the poor resolution of the spectrograms we created. It appears that the whole issue was due to two important parameters of the ObsPy spectrogram that we had failed to understand clearly:
  1. wlen (window length): Conceptually, every spectrogram is created by dividing the entire data stream into small windows, taking an FFT of each window, and stacking the results side by side to produce the plot. 'wlen' defines the size of each such window. A very large wlen means fewer windows (hence poorer time resolution), while a very small wlen means too little data per FFT (hence poorer resolution on the color scale). We had to tweak this setting carefully for our purpose.
  2. Overlap: To produce better-quality spectrograms, two consecutive windows usually share some overlapping data points, but a very high overlap can disturb the X-axis scaling.
We had to test these settings one by one. By tweaking the above two parameters, we were able to create somewhat better spectrograms, like the one displayed below.
Several spectrograms stacked one above another for the LADWP building, with floor number decreasing from top to bottom.
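For reference, these two parameters map directly onto ObsPy's spectrogram call; a rough sketch follows (the file name and the values shown are placeholders, not our final settings):

    from obspy import read
    from obspy.imaging.spectrogram import spectrogram

    tr = read("example.sac")[0]   # placeholder SAC file
    # wlen: window length in seconds; per_lap: fractional overlap of windows
    spectrogram(tr.data, tr.stats.sampling_rate, wlen=2.0, per_lap=0.9)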

Wednesday, July 10, 2013

Issues with a single-sensor SVM

It would be very useful if we could train an SVM on a single sensor and detect picks on that sensor correctly. There are a few issues with doing that. 

We classify non-quake times as negative picks and quake times as positive picks. The training data is separated at the first time-domain kSigma pick. Suppose it occurs at 34 seconds for some earthquake and some station. The algorithm takes an FFT of each 5-second interval before and after the pick, i.e. 0-5, 5-10, 10-15, 15-20, 20-25, 25-30 seconds for the non-quake period and 35-40, 40-45, 45-50, 50-55, 55-60 seconds for the quake period. We are not sure at this point whether this is in fact a good approach; it might not capture the information correctly. Also, since 5 seconds is a very small interval and therefore contains little data, even slight abnormalities can lead to disastrous results.

We also face data-insufficiency issues. There are only a few sensors for which we have data for all the earthquakes of the past few years. Even among those, a few stations did not pick for some of the earthquakes. Also, some stations picked at the 54th second of the 60 seconds of data, which leaves far more quiet-time data than quake-time data. In such cases the SVM is heavily biased towards negative picks and is very likely to predict a positive pick as a negative one.

Tuesday, July 9, 2013

Restrictions due to renaming of Phidgets

The SVM trained on the non-normalized data gave better results than expected. It is very unlikely that this happened by chance, since the training and test data were chosen so as to eliminate the possibility of good results by mere chance. Nevertheless, to be sure, it could be tested on a single sensor.

The availability of data is an issue here. The CSN, being a relatively new project, has recorded only a limited number of earthquakes. Also, once a Phidget is unplugged and set up again on a different machine, it is assigned a new identification number, so it becomes very difficult to figure out which pair of identification numbers actually belongs to the same Phidget. Perhaps the CSN will develop a workaround in the future.

Normalization of the FFT histograms

Dr. Bunn suggested normalizing the FFT histograms so that the SVM would generalize better. This seemed logical, so we tried it. Surprisingly, the results were not better; in fact, for quake predictions, a random guess would have been more likely to be correct (accuracy below 50%). The number of false positives, however, went down considerably.
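A minimal sketch of the normalization we tried, applied to the feature matrix from the extraction step; normalizing each histogram to unit sum is our assumption of the exact scheme:

    import numpy as np

    def normalize_histograms(X, eps=1e-12):
        """Scale each FFT-bin histogram (row) to unit sum, so the SVM sees
        relative spectral shape rather than absolute amplitude."""
        totals = X.sum(axis=1, keepdims=True)
        return X / np.maximum(totals, eps)   # eps guards against all-zero rows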

Results:
Training Data:
The SVM could not even fit the training data it was trained on.
True positive: 48.90%
True Negative: 97.23%
False Positive: 2.77%
False Negative: 51.10%

Test Data:
Since the SVM could not fit even the training data, we did not expect the test-data results to look good.
True positive: 30%
True Negative: 100%
False Positive: 0%
False Negative: 70%

Support Vector Machines

For a proof of concept, we tried to train a support vector machine to distinguish between quiet time and quake time periods. The results are promising!

The support vector machine was trained on 7 different earthquakes, for a set of stations, on both quiet-time and quake-time data. Currently, the first pick is taken as the boundary between the quiet-time and quake-time periods. A set of random data points, 3 random stations for each quake for both quiet time and quake time, was chosen as the training data; a sketch of this selection is below.
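A minimal sketch of that selection, assuming a hypothetical mapping from each earthquake to its available stations; the helper name and data layout are ours:

    import random

    def choose_training_stations(stations_by_quake, k=3, seed=42):
        """Pick k random stations for each earthquake to build the training set."""
        rng = random.Random(seed)
        return {quake: rng.sample(stations, k)
                for quake, stations in stations_by_quake.items()}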

We had a dataset of 2240 data points in all.

Results:

True positive: Correctly predicted quake time as quake time
True Negative: Correctly predicted quiet time as quiet time
False Positive: Incorrectly predicted quiet time as quake time
False Negative: Incorrectly predicted quake time as quiet time
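For concreteness, here is a minimal sketch of how these four rates can be computed from true and predicted labels; the percentages in this post follow the same per-class convention, so TP + FN = 100% and TN + FP = 100%:

    import numpy as np

    def confusion_rates(y_true, y_pred):
        """Return TP/TN/FP/FN as percentages of the corresponding true class."""
        y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
        pos, neg = (y_true == 1), (y_true == 0)
        tp = 100.0 * np.mean(y_pred[pos] == 1)   # quake predicted as quake
        tn = 100.0 * np.mean(y_pred[neg] == 0)   # quiet predicted as quiet
        return {"TP": tp, "TN": tn, "FP": 100.0 - tn, "FN": 100.0 - tp}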

Training Data:
The Support Vector Machine seemed to be able to fit the training data well enough.
True positive: 78.93%
True Negative: 84.19%
False Positive: 15.81%
False Negative: 21.07%

Test Data:
The Support Vector Machine was also able to correctly classify the test data.
True positive: 80.00%
True Negative: 88.57%
False Positive: 11.43%
False Negative: 20.00%

The idea of detecting earthquakes by capturing differences in frequency-domain properties now seems sound. We will try changing a few parameters to figure out the classification settings that give optimal results.

More on the Spectrograms

In the last post we discussed how we were able to plot spectrograms using the spectrogram module in ObsPy. A few days ago, we were able to show some of those plots to Dr. Monica Kohler. Our plots seemed very pixelated compared to the plots she normally uses. We were looking for a way to plot spectrograms with good resolution and a palpable color distribution across intensities for the Los Angeles Department of Water and Power (LADWP) multistory building. The following plot can be considered a perfect example of the scenario.
Spectrogram plotted by Ming Hei, a graduate student, for the Millikan Tower, a multistory building at Caltech. Note the intense horizontal lines, which correspond to the harmonics of the building.



Dr. Kohler offered a few suggestions that could help us improve our plots:
  1. Apply a 2-pass, 2-pole Butterworth bandpass filter from 0.1 to 20 Hz before calculating the spectrograms
  2. Change the color scheme so that the maximum spectral amplitudes appear bright red (see the spectrogram above)
We will try to incorporate these suggestions; a sketch of the filter step is below.
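A minimal sketch of the first suggestion using ObsPy's built-in filtering, where corners=2 gives the two poles and zerophase=True runs the filter forward and backward for the two passes (the file name is a placeholder):

    from obspy import read

    tr = read("example.sac")[0]   # placeholder SAC file
    tr.detrend("demean")          # remove the mean before filtering
    tr.filter("bandpass", freqmin=0.1, freqmax=20.0, corners=2, zerophase=True)
    # tr.data is now band-limited to 0.1-20 Hz, ready for the spectrogram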

Monday, July 1, 2013

The Spectrograms

Less than a week ago, we were able to show some of our work to Dr. Monica D. Kohler, a Senior Research Fellow in Mechanical and Civil Engineering at Caltech. Being the seismic expert in the CSN group, she was interested in the effects of seismic waves on tall buildings, and in whether it is possible to observe the oscillations of a tall building in the frequency transforms. She suggested the use of spectrograms: frequency-time distributions of the seismic data from the SAC files.

Fortunately, the ObsPy package contains the 'obspy.imaging.spectrogram' method, which allows spectrograms to be plotted directly from data read out of SAC files. This speed-optimized method takes the waveform data and a bunch of other (plotting) arguments as input, and plots a spectrogram within a second for 15 minutes of SAC data.
A spectrogram showing activity during the Anza (Mar 11, 2013) tremors for all 3 channels. Notice the red and yellow shades marking the beginning of the quake according to kSigma.


Some work is needed, though, if you wish to transfer these plots to matplotlib.pyplot or arrange them as subplots, because ObsPy uses its own plotting methods, which do not draw on a subplot by default.
The workaround is to use 'plt.gca()' to get the current axes and then let ObsPy plot onto them. Also make sure to pass 'show=False' as an argument to spectrogram, to keep a window from popping up at every subplot. Sample code for a single subplot looks like this:
    import matplotlib.pyplot as plt
    from obspy.imaging.spectrogram import spectrogram

    plt.subplot(311)      # first of three stacked panels, one per channel
    ax_HNN = plt.gca()    # grab the current axes for ObsPy to draw on
    spectrogram(tr_HNN.data, tr_HNN.stats.sampling_rate, show=False, axes=ax_HNN)

We are planning to plot spectrograms for some CSN sensors in the Los Angeles Department of Water & Power (LADWP) building, to look for a correlation between frequency content and the floor on which each sensor is installed.