We have a few extremely high volume endpoints - Golang GAE handlers - and so far we have come up with quite a few ways to query the data they collect, yet none was satisfactory.
Streaming live traffic to BigQuery is very appealing to us, because right now we have quite a process to go through before we can query our data. We need to log it, export it, and then load it into an analytics DB. At every step of the chain there are things that could be better - logging to the datastore is not cheap, exporting does not really work well (even after refactoring the code to use cursors, as we do in our open source GAE Remote utility, it fails at around 300,000 records), and loading the data into an analytics DB defeats the purpose of using a PaaS.
So we've decided to give BigQuery a shot. But to our surprise we couldn't find an example in the official GAE Go docs of streaming data into BigQuery from Golang. There is a client to be found, but it's not a GAE client (it uses OAuth, which is irrelevant for a GAE backend).
So we've started to write a nice little Go client for BigQuery on GAE. It currently supports only connecting and inserting rows, but there's more coming. Feel free to fork and add stuff!
go-gae-bigquery
A nice little package to abstract usage of the BigQuery service on GAE. Currently supports only inserting rows (queries coming soon, feel free to fork and add stuff!)
usage
Import the package:
import (
"github.com/streamrail/go-gae-bigquery"
)
and fetch it using the goapp command:
goapp get "github.com/streamrail/go-gae-bigquery"
The package is now available under the "gobq" name (the package name differs from the repository name).
example
Running the example:
git clone https://github.com/StreamRail/go-gae-bigquery.git
cd go-gae-bigquery
cd example-batch
goapp get "github.com/streamrail/go-gae-bigquery"
goapp serve
The example may be found at example-batch/example.go. The part you want to look at is the Track function:
func Track(w http.ResponseWriter, r *http.Request) {
	c := appengine.NewContext(r)
	// create an instance of the BigQuery client
	if client, err := gobq.NewClient(&c); err != nil {
		c.Errorf(err.Error())
	} else {
		// get some data to write
		rowData := GetRowData(r)
		// append the row to the buffer
		if err := buff.Append(rowData); err != nil {
			c.Errorf(err.Error())
		}
		c.Infof("buffered rows: %d\n", buff.Length())
		// if the buffer is full, flush it into BigQuery.
		// flushing resets the buffer so rows can accumulate again
		if buff.IsFull() {
			rows := buff.Flush()
			if err := client.InsertRows(*projectID, *datasetID, *tableID, rows); err != nil {
				c.Errorf(err.Error())
			} else {
				// note: buff.Length() is 0 after Flush, so log the flushed count
				c.Infof("inserted rows: %d", len(rows))
			}
		}
	}
}
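The buff, projectID, datasetID, and tableID variables above are package-level state not shown in the snippet. A buffer with the Append/Length/IsFull/Flush shape the example relies on can be sketched as follows - a minimal illustration, not the package's actual implementation; the mutex, capacity, and row type are assumptions:

```go
package main

import (
	"fmt"
	"sync"
)

// RowBuffer accumulates rows until a capacity is reached, then
// hands them off in one batch via Flush. Hypothetical sketch of
// the buffer used in the example above.
type RowBuffer struct {
	mu       sync.Mutex
	capacity int
	rows     []map[string]interface{}
}

func NewRowBuffer(capacity int) *RowBuffer {
	return &RowBuffer{capacity: capacity}
}

// Append adds one row to the buffer.
func (b *RowBuffer) Append(row map[string]interface{}) error {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.rows = append(b.rows, row)
	return nil
}

// Length reports how many rows are currently buffered.
func (b *RowBuffer) Length() int {
	b.mu.Lock()
	defer b.mu.Unlock()
	return len(b.rows)
}

// IsFull reports whether the buffer has reached capacity.
func (b *RowBuffer) IsFull() bool {
	return b.Length() >= b.capacity
}

// Flush returns the buffered rows and resets the buffer,
// so accumulation can start again.
func (b *RowBuffer) Flush() []map[string]interface{} {
	b.mu.Lock()
	defer b.mu.Unlock()
	out := b.rows
	b.rows = nil
	return out
}

func main() {
	buff := NewRowBuffer(2)
	buff.Append(map[string]interface{}{"event": "click"})
	buff.Append(map[string]interface{}{"event": "view"})
	if buff.IsFull() {
		rows := buff.Flush()
		fmt.Println(len(rows), buff.Length()) // 2 0
	}
}
```

Because each GAE instance holds its own buffer in memory, a few rows may be lost if an instance is shut down before a flush; for fire-and-forget analytics events that is often an acceptable trade-off for far fewer API calls.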