
NewsHack 2018

Editorial Tools for Global News


API Documentation



Each team will have access to the API of a simplified version of the SUMMA platform. Teams must use at least one of the SUMMA technologies in their final prototype, and may use as many external libraries and tools as they like.

Each team will have their own SUMMA server for the duration of the hack. You will receive details of your team’s server at the beginning of day one.

The IP address of your SUMMA server should replace [base] in the examples of this document.

Demo Server

To try the examples on the demo SUMMA server, replace [base] with summa-hackathon.newslabs.tools.bbc.co.uk. Each team will have their own dedicated SUMMA instance for the duration of the hack. Text surrounded by angle brackets (<>) is a placeholder and requires a real value.

Getting started

To test that your SUMMA server is running, perform an HTTP GET to http://[base]:8026/v1/api/mediaItems/. You should receive a JSON array as a response. Don’t worry if the array is empty; this means there are no articles yet.
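A minimal Python sketch of this check, using only the standard library. The function names `media_items_url` and `server_is_up` are illustrative and not part of SUMMA; the port and path are taken from the URL above.

```python
import json
import urllib.request


def media_items_url(base, port=8026):
    """Build the mediaItems URL for the SUMMA server at host or IP `base`."""
    return f"http://{base}:{port}/v1/api/mediaItems/"


def server_is_up(base, timeout=10):
    """Return True if the server answers the GET with a JSON array."""
    try:
        with urllib.request.urlopen(media_items_url(base), timeout=timeout) as resp:
            return isinstance(json.load(resp), list)
    except (OSError, ValueError):
        # OSError covers connection and HTTP errors; ValueError covers bad JSON.
        return False
```

Calling `server_is_up("<your server IP>")` should return True once the server is reachable, even when no articles have been uploaded yet.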

Notes on Security

Please be aware that the provided SUMMA servers have no authentication mechanism. All requests use HTTP, not HTTPS. Participants should not upload sensitive materials.

Basic Flow of Data

Data is processed as follows:

  1. Upload data to an API endpoint via HTTP POST.
  2. The SUMMA platform returns a response containing a transaction ID.
  3. Use the transaction ID to request the result at an API endpoint.
  4. When the result is ready, SUMMA responds with the processed result.

The uploaded data will be subject to translation and all other processing operations. Participants cannot submit text for processing by a single technology module (Named Entity Recognition, Summarisation, etc.).
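The request-then-poll part of this flow could be sketched in Python as below. This document does not say how to tell that processing has finished, so the readiness test here (a non-empty `summary` field) is an assumption, as are the function names, the retry count and the delay. `fetch` is any function you supply that returns the parsed JSON for your article.

```python
import time


def has_summary(article):
    """Readiness test (an assumption, not documented): non-empty "summary"."""
    return bool(article.get("summary"))


def poll_until_ready(fetch, is_ready=has_summary, attempts=30, delay=2.0,
                     sleep=time.sleep):
    """Call fetch() until is_ready(result) is true or attempts run out.

    fetch is a zero-argument callable returning the parsed JSON article,
    for example a function that GETs mediaItems/<id>.
    Returns the ready article, or None if it never became ready.
    """
    for attempt in range(attempts):
        article = fetch()
        if is_ready(article):
            return article
        if attempt < attempts - 1:
            sleep(delay)  # wait before asking again
    return None
```

Passing the readiness test as a parameter makes it easy to swap in a different check if the summary turns out not to be the last field to appear.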

API

The API is key to interacting with the SUMMA platform. It allows teams to upload text data into the SUMMA server and retrieve results.

Available Endpoints

The following endpoints are available:

| Endpoint | HTTP Method | Location | Use | Section |
|---|---|---|---|---|
| Upload Article | POST | http://[base]:8026/v1/api/newsItems/ | Upload a text article to SUMMA | Upload Article |
| Retrieve Article | GET | http://[base]:8026/v1/api/mediaItems/<id> | Gets a processed article from SUMMA | Retrieve Article |
| Retrieve Cluster | GET | http://[base]:8026/v1/api/storylines/<id> | Gets similar articles from SUMMA | Retrieve Article Cluster |
| Retrieve Latest Articles | GET | http://[base]:8026/v1/api/mediaItems/ | Gets a list of the latest articles processed. Can be more efficient than frequent polling using the Retrieve Article endpoint | Retrieve Latest Articles |
| Reset | DELETE | http://[base]:8026/v1/api/reset/ | Deletes all articles, storylines and feeds | Reset |

Upload Article

Request

| | Type | Headers | URL |
|---|---|---|---|
| Request | POST | Content-Type: application/json | http://[base]:8026/v1/api/newsItems/ |

Body: The body must be a JSON document containing the following:

| Field | Required? | Description |
|---|---|---|
| sourceItemTitle | Yes | Article title |
| sourceItemMainText | Yes | Article body |
| sourceItemLangeCodeGuess | Yes | Two-character language code: en, de, es, ru, ar, lv |
| feedURL | Yes | A URL to use as an ID. It’s OK if the URL doesn’t actually exist. Each article gets associated with a feed. |
| sourceItemOriginFeedName | Yes | A name for the feed |
| sourceItemIdAtOrigin | Yes | A unique identifier |

Example

{
 "sourceItemTitle": "Colapso en Génova: las 3 diferencias del emblemático puente del lago Maracaibo y el Morandi que se derrumbó en Italia, diseñado por el mismo ingeniero",
 "sourceItemMainText": "El desplome del puente en Génova llevó a muchos a preguntarse por la situación de su gemelo sobre el lago Maracaibo...",
 "sourceItemLangeCodeGuess": "es",
 "feedURL": "http://hack.summa-project.eu/team-x",
 "sourceItemOriginFeedName": "Team X",
 "sourceItemIdAtOrigin": "http://hackday.summa-project.eu/team-x/123"
}
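A hedged Python sketch of sending such a body with the standard library. The helper names `build_upload_request` and `upload_article` are illustrative; the field list, header and URL come from the tables above. Checking for missing fields on the client side is a convenience added here, not documented server behaviour.

```python
import json
import urllib.request

# The six fields the body must contain, as listed in the table above.
REQUIRED_FIELDS = (
    "sourceItemTitle",
    "sourceItemMainText",
    "sourceItemLangeCodeGuess",
    "feedURL",
    "sourceItemOriginFeedName",
    "sourceItemIdAtOrigin",
)


def build_upload_request(base, article, port=8026):
    """Return a POST request carrying `article` (a dict) as a JSON body."""
    missing = [name for name in REQUIRED_FIELDS if name not in article]
    if missing:
        raise ValueError("missing required fields: " + ", ".join(missing))
    return urllib.request.Request(
        f"http://{base}:{port}/v1/api/newsItems/",
        data=json.dumps(article).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def upload_article(base, article, timeout=30):
    """Send the article and return the parsed JSON response."""
    request = build_upload_request(base, article)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)
```

The `id` property of the object returned by `upload_article` is the identifier to use with Retrieve Article.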

Response

Body: The response is a JSON object. The main properties are:

| Field | Example value | Description |
|---|---|---|
| id | 2110ec4a-5c6b-48b1-9914-6fe98e51f2dc | Identifier (string) to use to retrieve translated text, auto-summary, entities, topics and cluster information. See the Retrieve Article section below for details. |

Example

{
 "customMetadata": {},
 "feedId": "1c350e2f-f747-44fb-8a8d-132f6b1d3a8f",
 "feedURL": "http://hack.summa-project.eu/team-x",
 "id": "2110ec4a-5c6b-48b1-9914-6fe98e51f2dc",
 "sourceItemIdAtOrigin": "http://hackday.summa-project.eu/team-x/123",
 "sourceItemLangeCodeGuess": "es",
 "sourceItemMainText": "El desplome del puente en Génova llevó a muchos a preguntarse por la situación de su gemelo sobre el lago Maracaibo...",
 "sourceItemOriginFeedName": "Team X",
 "sourceItemTitle": "Colapso en Génova: las 3 diferencias del emblemático puente del lago Maracaibo y el Morandi que se derrumbó en Italia, diseñado por el mismo ingeniero",
 // Additional fields omitted
 "timeAdded": "2018-08-22T15:24:32.663Z"
}

Retrieve Article

Request

| | Type | Headers | URL |
|---|---|---|---|
| Request | GET | | http://[base]:8026/v1/api/mediaItems/<id> |

| Placeholder | Example value | Description |
|---|---|---|
| <id> | 2110ec4a-5c6b-48b1-9914-6fe98e51f2dc | The identifier. Use Upload Article to get the identifier. |

Response

The response is a JSON object. The main properties are:

| Name | Example value | Description |
|---|---|---|
| detectedTopics | [ ["videos", 0.09571], ["top stories", 0.0938] ] | Topics (array) |
| mainText.english | The collapse of the bridge in Genoa led many to wonder about the situation... | English translation (string) |
| namedEntities.entities | { "m.01ncqr": { "baseForm": "Lake Maracaibo", "currlangForm": "Lake Maracaibo", "id": "m.01ncqr", "type": "places" } } | Named entities (object) |
| namedEntities.mentionsIn | { "mainText": { "m.01ncqr": [{ "endPosition": 109, "startPosition": 95, "text": "Lake Maracaibo" }] } } | Mentions of each entity, with character positions (object) |
| storyId | 1 | Identifier (integer) of the story cluster. Each cluster is a set of similar stories. |
| summary | [ "The collapse of the bridge in Genoa led many to..." ] | Array of strings. Each string is a “bullet point”. |
| title.english | Collapse in Genoa: the three differences from the flagship Lake Maracaibo Lake and Morandi, which collapsed in Italy, designed by the engineer himself. | English translation (string) |
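As an illustration, the properties above might be read from the parsed response as follows. This sketch assumes that dotted names such as `mainText.english` denote nested JSON objects, which the table suggests but does not state; the helper names are hypothetical.

```python
def top_topics(article, n=3):
    """Return up to n topic names, highest score first."""
    topics = article.get("detectedTopics") or []
    ranked = sorted(topics, key=lambda pair: pair[1], reverse=True)
    return [name for name, _score in ranked[:n]]


def english_text(article, field):
    """Return article[field]["english"], or None if it is not present.

    Assumes "mainText.english" means a nested object, as the table suggests.
    """
    return (article.get(field) or {}).get("english")


def entity_names(article):
    """Return the sorted, de-duplicated base forms of the named entities."""
    entities = (article.get("namedEntities") or {}).get("entities") or {}
    return sorted({entity["baseForm"] for entity in entities.values()})
```

Each helper tolerates missing fields, which is useful if an article is retrieved before every processing step has finished.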

Retrieve Article Cluster

Get the storyId from the Retrieve Article response described above.

Request

| | Type | Headers | URL |
|---|---|---|---|
| Request | GET | | http://[base]:8026/v1/api/storylines/<id> |

| Placeholder | Example value | Description |
|---|---|---|
| <id> | 1 | Story identifier |

Response

The response is a JSON object. The main properties are:

| Name | Example value | Description |
|---|---|---|
| highlightItems | [ { "highlight": "PM: Lebanon sees no reason for the Syrian refugees to...", "sentiment": null } ] | Array of highlighted objects |
| label | Collapse in Genoa: the three differences from the flagship Lake Maracaibo Lake and Morandi, which collapsed in Italy, designed by the engineer himself. | Label (string) |
| newsItems | { "00956029-0194-4498-a70c-a684172f49df": { "id": "00956029-0194-4498-a70c-a684172f49df", "title": "DW English Live Stream Chunk", ... } } | Object containing articles in the cluster |
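A short Python sketch of fetching a cluster and listing the titles of its articles. The function names are illustrative, and the `title` key inside each `newsItems` entry is taken from the example value above.

```python
import json
import urllib.request


def storyline_url(base, story_id, port=8026):
    """Build the storylines URL for a given storyId."""
    return f"http://{base}:{port}/v1/api/storylines/{story_id}"


def fetch_storyline(base, story_id, timeout=30):
    """GET the cluster and return the parsed JSON object."""
    with urllib.request.urlopen(storyline_url(base, story_id), timeout=timeout) as resp:
        return json.load(resp)


def cluster_titles(storyline):
    """Map each article id in the cluster to its title (None if absent)."""
    items = storyline.get("newsItems") or {}
    return {item_id: item.get("title") for item_id, item in items.items()}
```

For example, `cluster_titles(fetch_storyline("<your server IP>", 1))` would list the articles grouped under story 1.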

Retrieve Latest Articles

Request

| | Type | Headers | URL |
|---|---|---|---|
| Request | GET | | http://[base]:8026/v1/api/mediaItems/ |

Querystring parameters:

| Name | Required? | Description | Example |
|---|---|---|---|
| limit | No | Maximum number of items to return. The default is 20. | 50 |

Response

The response is a JSON array. Each article in the array has the fields described in Retrieve Article.
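One way to use this endpoint from Python is sketched below. The helper names are hypothetical, and `unseen` assumes that every article in the array carries the `id` field described in Retrieve Article.

```python
import json
import urllib.parse
import urllib.request


def latest_articles_url(base, limit=None, port=8026):
    """Build the mediaItems URL, adding ?limit=N only when a limit is given."""
    url = f"http://{base}:{port}/v1/api/mediaItems/"
    if limit is not None:
        url += "?" + urllib.parse.urlencode({"limit": limit})
    return url


def fetch_latest(base, limit=None, timeout=30):
    """GET the latest processed articles as a list of dicts."""
    with urllib.request.urlopen(latest_articles_url(base, limit), timeout=timeout) as resp:
        return json.load(resp)


def unseen(articles, seen_ids):
    """Return the articles whose id is not in seen_ids, then record their ids."""
    fresh = [article for article in articles if article["id"] not in seen_ids]
    seen_ids.update(article["id"] for article in fresh)
    return fresh
```

Keeping a set of seen ids lets a single periodic request pick up every newly processed article, rather than one request per uploaded article.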

Reset

Request

| | Type | Headers | URL |
|---|---|---|---|
| Request | DELETE | | http://[base]:8026/v1/api/reset/ |

Response

The response is a JSON object. The properties are:

| Name | Example value |
|---|---|
| mediaItems | {"deleted":22, "errors":0, "inserted":0, "replaced":0, "skipped":0, "unchanged":0} |
| feeds | {"deleted":1, "errors":0, "inserted":0, "replaced":0, "skipped":0, "unchanged":0} |
| storylines | {"deleted":5, "errors":0, "inserted":0, "replaced":0, "skipped":0, "unchanged":0} |
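A cautious Python sketch of issuing the reset and summarising the response. The function names are illustrative. Because the call deletes everything on the server, the request is built separately from the function that sends it.

```python
import json
import urllib.request


def build_reset_request(base, port=8026):
    """Return a DELETE request for the reset endpoint (nothing is sent yet)."""
    return urllib.request.Request(
        f"http://{base}:{port}/v1/api/reset/", method="DELETE"
    )


def reset_server(base, timeout=30):
    """Send the DELETE. This removes all articles, storylines and feeds."""
    with urllib.request.urlopen(build_reset_request(base), timeout=timeout) as resp:
        return json.load(resp)


def deleted_counts(reset_response):
    """Reduce the response to {collection name: number deleted}."""
    return {name: stats.get("deleted", 0) for name, stats in reset_response.items()}
```

With the example response above, `deleted_counts` would report 22 media items, 1 feed and 5 storylines.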

Developer Resources

Visualisation Resources

Collections

General purpose

Multivariate data

Text visualization

Networks


Feedback

Problems, suggestions, missing information? Contact me before the day at andrew.secker@bbc.co.uk or find me at the venue.