Upload Data to Internet Source for R

Analysing or using data without software is incredibly cumbersome, if not impossible. Here we show you how you can import data from the web into a tool called R. R has become so popular, and continues to grow, because it is free and open source, follows state-of-the-art practices and has a fantastic community.

Data on the web comes in several forms, for example:

  • files that you can download
  • APIs
  • content such as HTML tables
  • custom data browsers
  • and more.

R, together with its IDE RStudio, is a statistical software and data analysis environment. You can find a quick interactive tutorial on Code School or well-designed courses on DataCamp. If you haven't installed R, you can paste and try the code at R-fiddle.

Comma separated values (CSV)

Reading a CSV file from a URL could not be simpler. Here is the number of police officers in Scotland over time.

          read.csv("http://www.quandl.com/api/v1/datasets/EUROSTAT/CRIM_PLCE_42.csv")

And yet it is not guaranteed that this works. Why? Many CSVs don't follow a minimal standard. For example, the first row of a CSV file should be a header row, but some data has its header row on a later line. In that case we use the skip option.

          read.csv("http://www.royalwolverhamptonhospitals.nhs.uk/files/mth%206%20september%202013%20(3).csv", skip = 2)
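To see what skip does without depending on a live URL, here is a minimal sketch that writes a small file with two lines of preamble before the header row and reads it back. The file contents and column names are made up for illustration.

```r
# Hypothetical CSV with two lines of notes before the real header row
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "Monthly report",
  "Generated for illustration only",
  "ward,patients",
  "A1,12",
  "B2,7"
), tmp)

# skip = 2 drops the two note lines, so "ward,patients" becomes the header
report <- read.csv(tmp, skip = 2)
print(report)
```

Without skip = 2, the "Monthly report" line would be taken as the header and the columns would be misread.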

Unfortunately, read.csv() does not cope well with SSL, that is, https connections. An alternative uses download.file, see below.

          # Fail
          read.csv("https://raw.github.com/sciruela/Happiness-Salaries/master/data.csv")

          # Win
          read.url <- function(url, ...){
            tmpFile <- tempfile()
            download.file(url, destfile = tmpFile, method = "curl")
            url.data <- read.csv(tmpFile, ...)
            return(url.data)
          }
          read.url("https://raw.github.com/sciruela/Happiness-Salaries/master/data.csv")
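The download-then-read pattern can be tried offline as well. The sketch below is a variation, not the helper above: read_url_tmp is a hypothetical name, it leaves the download method at "auto" and deletes the temporary file when it finishes. The demonstration uses a local file:// URL with made-up data and assumes a Unix-like file path.

```r
# Sketch of a download-then-read helper; read_url_tmp is a hypothetical name
read_url_tmp <- function(url, ..., method = "auto") {
  tmp_file <- tempfile(fileext = ".csv")
  on.exit(unlink(tmp_file), add = TRUE)  # remove the temporary file on exit
  status <- download.file(url, destfile = tmp_file, method = method, quiet = TRUE)
  if (status != 0) stop("download failed with status ", status)
  read.csv(tmp_file, ...)
}

# Offline demonstration with a local file:// URL (made-up data)
src <- tempfile(fileext = ".csv")
writeLines(c("country,score", "Denmark,7.6", "Norway,7.5"), src)
local_url <- paste0("file://", normalizePath(src, winslash = "/"))
scores <- read_url_tmp(local_url)
print(scores)
```

Any extra arguments, such as skip, are passed through to read.csv via the dots.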

What gifts did David Cameron receive in May-June 2013?

The UK government publishes data about gifts David Cameron receives and what happens with them. We will use it as another example.

The data is behind a secure connection, so we use our read.url function. Yet it still produces an error. The reason is a £ symbol in the header row.

          read.url("https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/246663/pmgiftsreceivedaprjun13.csv")

A faster and more flexible tool is fread from the data.table package (see the documentation).

          install.packages("data.table")
          library(data.table)

          read.url <- function(url, ...){
            tmpFile <- tempfile()
            download.file(url, destfile = tmpFile, method = "curl")
            url.data <- fread(tmpFile, ...)
            return(url.data)
          }

          read.url("https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/246663/pmgiftsreceivedaprjun13.csv")

And the results are:

Date received | From                            | Gift                 | Value      | Result
May-13        | President of UAE                | Model boat           | Over limit | Held by Department
Jun-13        | Tony Pontone, Albemarle Gallery | Art work             | Over limit | Held by Department
Jun-13        | President of the US             | Jewellery            | Over limit | Held by Department
Jun-13        | President of Pakistan           | Rug                  | Over limit | Held by Department
Jun-13        | President of Kazakhstan         | Medals & stamp album | Over limit | Held by Department

A useful trick is to read only a few lines. This makes sense especially when you have a big dataset like the Land Registry's Price Paid Data (several GB in its complete form).

          read.csv("http://publicdata.landregistry.gov.uk/market-trend-data/price-paid-data/a/pp-complete.csv", nrows = 10)
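The same trick can be tried on a local file. This is a minimal sketch in which a made-up 1,000-row CSV stands in for a large download; the column names and values are assumptions for illustration only.

```r
# Build a made-up CSV with 1000 data rows to stand in for a large file
big <- tempfile(fileext = ".csv")
write.csv(
  data.frame(id = 1:1000, price = 150000L + 1:1000),
  big,
  row.names = FALSE
)

# Read only the first 10 data rows; the header row is not counted
preview <- read.csv(big, nrows = 10)
str(preview)
```

A preview like this is enough to check column names and types before committing to reading the full file.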

APIs

R's community has built wrapper packages for many APIs. For example, the World Bank Development Indicators are available in R. A quick example with Google's Ngram Viewer is below.

What is more popular: line charts or line graphs?

          # Install the packages
          install.packages(c("ngramr", "ggplot2"))

          # Load them into R
          library(ngramr)
          library(ggplot2)

          # Case-insensitive search
          lines <- ngrami(c("line chart", "line graph"), year_start = 1913)
          ggplot(lines, aes(Year, Frequency, colour = Phrase)) +
            theme_minimal() +
            geom_line(lwd = 1)

rOpenSci has collected an extensive list of R packages that deal with APIs. It includes Twitter, the Guardian, Amazon Mechanical Turk and many more.

Scraping

Scraping is an art in itself and is perhaps best left in the hands of experts such as our friends at ScraperWiki. However, R has support (packages, no surprise here) for popular tools. Worth mentioning are RCurl and XML.

Xiao Nan made a useful table of the R scraping toolkit.

Source: Xiao Nan, @road2stat

Lastly, I wish I had known about parallelisation options earlier, for instance getURIAsynchronous from RCurl.

I also wrote a tutorial on how to import an HTML table into R.

What to do next

Using a tool like R has another great advantage: unlike manually downloading a file, you can easily re-use and share your work. Having some R code instead of an Excel file means your analysis is reproducible, and you may be able to adapt it for future projects or when an updated dataset is released.

If you need help, you can find support via Stack Overflow and the R-help mailing list. If you're looking for data, browse a catalogue (e.g. data.gov.uk), use a web search engine or ask me on Twitter.

Source: https://theodi.org/article/how-to-use-r-to-access-data-on-the-web/