Simple Watson SDK for the Go programming language.
- Lightweight
- Native Go implementation. No C-bindings, just pure Go
So far only Speech To Text and Text To Speech functionalities have been implemented.
- Go 1.2 or higher
watson-go-sdk
is available as a normal Go package with this Github branch. Just include it in your dependencies (imports) on your code:
import "github.com/mediawen/watson-go-sdk"
-
Make sure Git is installed on your machine and in your system's
PATH
. -
Install the package to your $GOPATH with the go tool from shell:
$ go get "github.com/mediawen/watson-go-sdk"
Usage of this SDK is simple.
Go to IBM/Bluemix.
Once you allocate a Watson service, from the dashboard, you can go to the service settings, and get the service credentials where you can extract username and password required to initialize this SDK :
package main
import (
"github.com/mediawen/watson-go-sdk"
)
func main() {
w := watson.New("<username>", "<password>")
Replace 'username' and 'password' from the service credentials.
-
The Speech To Text service allows you to transcribe audio data to text. You have to specify the language in which the audio is spoken. For each language, the service supports two models:
- The broadband model with audio that is sampled at greater than or equal to 16 KHz.
- The narrowband model with audio that is derived from the telephone, which is typically recorded at 8 KHz.
Here is the list of Models in September, 2015:
en-US 8000 => en-US_NarrowbandModel ja-JP 16000 => ja-JP_BroadbandModel es-ES 16000 => es-ES_BroadbandModel ja-JP 8000 => ja-JP_NarrowbandModel es-ES 8000 => es-ES_NarrowbandModel en-US 16000 => en-US_BroadbandModel
-
Get list of models and languages:
-
Model Type
type Model struct { Rate int // Minimum Sampling Rate Name string // Model Name Lang string // Model Language Desc string // Model Description } type Models struct { Models []Model // List of Model(s) }
-
Example
package main import ( "fmt" "log" "github.com/mediawen/watson-go-sdk" ) func main() { w := watson.New("foo", "shhhht") ml, err := w.GetModels() if err != nil { log.Fatal(err) } for _, m := range ml.Models { fmt.Printf("%s %-8d=> %s\n", m.Lang, m.Rate, m.Name) } }
-
-
Transcribe an audio file:
-
Text Type
type Word struct { Token string Begin float64 End float64 Confidence float64 } type Text struct { Words []Word }
-
Example
package main import ( "fmt" "log" "os" "github.com/mediawen/watson-go-sdk" ) func main() { w := watson.New(cfg.User, cfg.Pass) is, err := os.Open("audio.wav") if err != nil { log.Fatal(err) } defer is.Close() tt, err := w.Recognize(is, "en-US_BroadbandModel", "wav") if err != nil { log.Fatal(err) } for _, w := range tt.Words { fmt.Printf("%v\n", w) } }
-
- The IBM Text to Speech service is designed for streaming low-latency synthesis of audio from written text. The service synthesizes natural-sounding speech from the text in a variety of languages and voices that speak with appropriate cadence and intonation. It is the inverse of the IBM Speech to Text automatic speech-recognition service.
To synthesize text, you have to specify the input text, the audio format and the voice.
Here is the list of Voices in September, 2015.
```
en-US male => en-US_MichaelVoice
en-US female => en-US_AllisonVoice
fr-FR female => fr-FR_ReneeVoice
it-IT female => it-IT_FrancescaVoice
es-ES female => es-ES_LauraVoice
de-DE female => de-DE_BirgitVoice
es-ES male => es-ES_EnriqueVoice
de-DE male => de-DE_DieterVoice
en-US female => en-US_LisaVoice
en-GB female => en-GB_KateVoice
es-US female => es-US_SofiaVoice
```
-
Get list of voices:
-
Model Type
type Voice struct { Name string // Name of the voice Lang string // Lang of the voice Gender string // Gender of the voice } type Voices struct { Voices []Voice // List of voices }
-
Example
package main import ( "fmt" "log" "github.com/mediawen/watson-go-sdk" ) func main() { w := watson.New("foo", "shhhht") vl, err := w.GetVoices() if err != nil { log.Fatal(err) } for _, v := range vl.Voices { fmt.Printf("%s %-8s => %s\n", v.Lang, v.Gender, v.Name) } }
-
-
Synthesize text to an audio file:
-
Example
package main import ( "fmt" "log" "os" "github.com/mediawen/watson-go-sdk" ) func main() { w := watson.New(cfg.User, cfg.Pass) a, err := w.Synthesize(text, voice, ext) if err != nil { log.Fatal(err) } defer a.Close() f, err := os.Create(out) if err != nil { log.Fatal(err) } defer f.Close() n, err := io.Copy(f, a) if err != nil { log.Fatal(err) } fmt.Printf("%s: %d bytes written\n", out, n) }
-
- Add live support (through websocket)
- Fix issues reported on this repository
- Add other Watson functionalities:
- IBM Language Translation
- IBM Natural Language Classifier
- IBM Concept Expansion
- IBM Message Resonance
- IBM Personality Insights
- IBM Question and Answer
- IBM Relationship Extraction
- IBM Retrieve and Rank
- IBM Tradeoff Analytics
- IBM Visual Recognition
-
Text To Speech API demo.
-
Speech To Text API demo.
-
At MediaWen International, we use these technologies to enhance STVHub, our platform for closed captioning, subtitling, and automatic dubbing.
By example, we generated the voice over (or Automatic Dubbing) on a video of the French Minister of Foreign Affairs anouncing the Climate Change Conference COP21 hosted in Paris, December 2015.
Just watch it and listen in Spanish and English IBM/Watson Speech Synthesis brought to you on video by STVHub.
- Philippe Anel - CTO of Mediawen
The file cookie.go has been picked from go net/http package. It's under the following BSD License:
Copyright (c) 2012 The Go Authors. All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above
copyright notice, this list of conditions and the following disclaimer
in the documentation and/or other materials provided with the
distribution.
* Neither the name of Google Inc. nor the names of its
contributors may be used to endorse or promote products derived from
this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
All the rest of the code is under the Apache License, Version 2.0 (the "License"); you may not use these files except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.