Import content from the ProPublica API with the Migrate API

ProPublica provides APIs for U.S. congressional information, such as candidate profiles and voting records.

We can use the JSON parser in the Migrate Plus module to read the API.

First, we'll need to sign up for an API key and pass it in the X-API-Key header whenever we make a request.
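
To sanity-check the key outside Drupal, the request can be sketched in Python with only the standard library. This is a minimal sketch, not part of the migration itself: the helper names build_request and fetch_members are hypothetical, and the URL pattern is assumed from the Senate members endpoint used in this post.

```python
import json
import urllib.request

# Assumed from the endpoint used in this post.
BASE_URL = "https://api.propublica.org/congress/v1"


def build_request(api_key, congress=116, chamber="senate"):
    """Build a GET request for a chamber's member list (hypothetical helper)."""
    url = f"{BASE_URL}/{congress}/{chamber}/members.json"
    # The API key travels in the X-API-Key request header.
    return urllib.request.Request(url, headers={"X-API-Key": api_key})


def fetch_members(api_key, congress=116, chamber="senate"):
    """Send the request and return the decoded JSON document."""
    request = build_request(api_key, congress, chamber)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

Calling fetch_members("YOUR_API_KEY") with a real key should return the same document the migration will read, which makes it easier to tell an API problem apart from a migration config problem.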

Take a look at the JSON example module (migrate_json_example) that ships with Migrate Plus to see how it sets up the migration YAML file.

You can copy this file into the migrations directory of your custom module and modify it there. You may need to clear the cache between changes. Alternatively, you can put it in your module's config/optional directory or paste it into the add config form. The various ways to supply migration config are covered in other tutorials, which you may need to reference.

I named the file migrate_plus.migration.propublica_congress.yml.

Here is the top of the file:

id: propublica_congress
label: JSON feed of congress members.
migration_group: Propublica
migration_tags:
  - propublica congress
source:
  plugin: url
  data_fetcher_plugin: http
  data_parser_plugin: json
  headers:
    X-API-Key: YOUR_API_KEY
  urls:
    - 'https://api.propublica.org/congress/v1/116/senate/members.json'
  item_selector: /results/0/members

Note that Migrate Plus has fetcher and parser plugins for the url source plugin. With the http fetcher and the json parser, we only need to supply the API key, the endpoint URL, and the item selector. Replace YOUR_API_KEY with your own API key.

The item selector uses an XPath-style syntax to locate the items in the response. Here, /results/0/members points to the members array inside the first element of results.
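
To make the path concrete, here is a small Python sketch of how such a selector can be resolved against the decoded response. It only illustrates the idea and is not the Migrate Plus parser's implementation; select_items is a hypothetical name, and the sample document is trimmed down from the API response.

```python
def select_items(document, selector):
    """Walk a slash-separated path such as /results/0/members."""
    current = document
    for segment in selector.strip("/").split("/"):
        if isinstance(current, list):
            # Inside a JSON array, the segment is a numeric index.
            current = current[int(segment)]
        else:
            # Inside a JSON object, the segment is a key.
            current = current[segment]
    return current


# Trimmed-down sample of the API response.
response = {
    "status": "OK",
    "results": [
        {
            "congress": "116",
            "chamber": "Senate",
            "members": [
                {"id": "A000360", "first_name": "Lamar", "last_name": "Alexander"},
            ],
        }
    ],
}

members = select_items(response, "/results/0/members")
print([member["id"] for member in members])
```

Each element of the selected list becomes one source row for the migration.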

I hit a bug with the 0 segment in the selector and had to patch the parser plugin locally.

Create a content type to import into, and set up fields that you want to use.

To test things, I created a Representative content type and added fields for the ProPublica ID, position title, and party affiliation.

You can use an application like Postman to test the API and see what the results look like. Here's a record from the endpoint for Senate members in the 116th Congress:

{
   "status":"OK",
   "copyright":" Copyright (c) 2021 Pro Publica Inc. All Rights Reserved.",
   "results":[
      {
         "congress": "116",
         "chamber": "Senate",
         "num_results": 102,
         "offset": 0,
         "members": [
              {
                 "id": "A000360",
                 "title": "Senator, 2nd Class",
                 "short_title": "Sen.",
                 "api_uri":"https://api.propublica.org/congress/v1/members/A000360.json",
                 "first_name": "Lamar",
                 "middle_name": null,
                 "last_name": "Alexander",
                 "suffix": null,
                 "date_of_birth": "1940-07-03",
                 "gender": "M",
                 "party": "R",
                 "leadership_role": null,
                 "twitter_account": "SenAlexander",
                 "facebook_account": "senatorlamaralexander",
                 "youtube_account": "lamaralexander",
                 "govtrack_id": "300002",
                 "cspan_id": "5",
                 "votesmart_id": "15691",
                 "icpsr_id": "40304",
                 "crp_id": "N00009888",
                 "google_entity_id": "/m/01rbs3",
                 "fec_candidate_id": "S2TN00058",
                 "url": "https://www.alexander.senate.gov/public",
                 "rss_url": "https://www.alexander.senate.gov/public/?a=RSS.Feed",
                 "contact_form": "http://www.alexander.senate.gov/public/index.cfm?p=Email",
                 "in_office": false,
                 "cook_pvi": null,
                 "dw_nominate": 0.324,
                 "ideal_point": null,
                 "seniority": "17",
                 "next_election": "2020",
                 "total_votes": 717,
                 "missed_votes": 133,
                 "total_present": 0,
                 "last_updated": "2020-12-30 19:01:18 -0500",
                 "ocd_id": "ocd-division/country:us/state:tn",
                 "office": "455 Dirksen Senate Office Building",
                 "phone": "202-224-4944",
                 "fax": "202-228-3398",
                 "state": "TN",
                 "senate_class": "2",
                 "state_rank": "senior",
                 "lis_id": "S289",
                 "missed_votes_pct": 18.55,
                 "votes_with_party_pct": 96.55,
                 "votes_against_party_pct": 3.45
               },

The members array continues with the remaining senators. Back in the migration file, the rest of the source section lists the fields to pull from each member record, followed by the process and destination sections:

  fields:
    -
      name: propublica_id
      label: 'Propublica id'
      selector: id
    -
      name: position_title
      label: 'Position title'
      selector: title
    -
      name: first_name
      label: 'First name'
      selector: first_name
    -
      name: middle_name
      label: 'Middle name'
      selector: middle_name
    -
      name: last_name
      label: 'Last name'
      selector: last_name
    -
      name: party
      label: 'Party'
      selector: party
  ids:
    propublica_id:
      type: string
process:
  type:
    plugin: default_value
    default_value: representative
  title:
    plugin: concat
    source:
      - first_name
      - middle_name
      - last_name
    delimiter: ' '
  field_propublica_id: propublica_id
  field_position_title: position_title
  field_party: party
  sticky:
    plugin: default_value
    default_value: 0
  uid:
    plugin: default_value
    default_value: 0
destination:
  plugin: 'entity:node'
migration_dependencies: {  }

In the fields list, I name the API fields I want to use; in the process section, I map them to the Drupal fields that will store them. The node title is concatenated from the first, middle, and last names.
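
The mapping can be traced by hand with a short Python sketch that mirrors the fields list and the process section. This is only an illustration of the data flow, not how Drupal executes a migration: the function names are hypothetical, and it assumes a null value is joined into the title as an empty string.

```python
# Source field name -> selector within each member record (from the fields list).
FIELDS = {
    "propublica_id": "id",
    "position_title": "title",
    "first_name": "first_name",
    "middle_name": "middle_name",
    "last_name": "last_name",
    "party": "party",
}


def extract_source_row(member):
    """Pick the named source fields out of one member record."""
    return {name: member.get(selector) for name, selector in FIELDS.items()}


def process_row(row):
    """Map a source row to destination values, mirroring the process section."""
    name_parts = [row["first_name"], row["middle_name"], row["last_name"]]
    # Assumption: a null part contributes an empty string to the joined title.
    title = " ".join("" if part is None else str(part) for part in name_parts)
    return {
        "type": "representative",
        "title": title,
        "field_propublica_id": row["propublica_id"],
        "field_position_title": row["position_title"],
        "field_party": row["party"],
        "sticky": 0,
        "uid": 0,
    }


member = {
    "id": "A000360",
    "title": "Senator, 2nd Class",
    "first_name": "Lamar",
    "middle_name": None,
    "last_name": "Alexander",
    "party": "R",
}
node_values = process_row(extract_source_row(member))
```

Under that assumption, a record with a null middle_name ends up with two spaces between the first and last name, so it may be worth checking the titles of the imported nodes.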

With Migrate Tools installed, I can now run the migration from Drush, for example: drush migrate:import propublica_congress
