# Extracting Data from your HTML Feed

### Extracting HTML

Feeds can be used to get content from ordinary HTML web pages. The fetched page is available through the Liquid `doc` object, as follows:

![](https://2516523503-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FMq0EQG9Ff46ZMsrIBcVn%2Fuploads%2Fgit-blob-05428b39f2c021dc0421842be71c05c3aaaeb67b%2Fimage27.png?alt=media)

On the test tab this looks like:

![](https://2516523503-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FMq0EQG9Ff46ZMsrIBcVn%2Fuploads%2Fgit-blob-a4b1b05da71d8add13b222ddc05c4a3a1dd55723%2Fimage11.png?alt=media)

This returns the entire HTML document. Three helpers are available for extracting content from it:

![](https://2516523503-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FMq0EQG9Ff46ZMsrIBcVn%2Fuploads%2Fgit-blob-011702e4707e17efb3448565c48651d4c6614f9f%2FScreenshot%2B2018-12-14%2Bat%2B13.40.04.png?alt=media)

### Tag Stripping

Liquid’s standard `strip_html` filter can be useful when working with HTML documents: [https://shopify.github.io/liquid/filters/strip\_html](https://shopify.github.io/liquid/filters/strip_html)
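As a small illustration of what `strip_html` does, here is a sketch in standard Liquid syntax. The sample string is illustrative only, and the plain `{{ ... }}` delimiters are an assumption made for this sketch; the worked example later on this page writes its expression with a leading `#`.

```liquid
{% comment %} Standard Liquid; the input string is a made-up sample. {% endcomment %}
{{ "Have <em>you</em> read <strong>Ulysses</strong>?" | strip_html }}
```

In standard Liquid this renders `Have you read Ulysses?`: the tags are removed and the text between them is kept.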

### HTML

In this example we will get the biography text from the [Taxi for Email Twitter](https://twitter.com/taxiforemail) profile.

![](https://2516523503-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FMq0EQG9Ff46ZMsrIBcVn%2Fuploads%2Fgit-blob-558e02561920c34e62b30687c13b3631263b1e03%2Fimage22.png?alt=media)

Feed set up

* Set the feed URL to [https://twitter.com/taxiforemail](https://twitter.com/taxiforemail)
* Set the method to ‘GET’
* Set the data type to ‘HTML’

Data Extraction

First, open the Twitter page in a browser, then use the browser’s ‘inspect’ tool to find the element we’re looking for:

![](https://2516523503-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FMq0EQG9Ff46ZMsrIBcVn%2Fuploads%2Fgit-blob-a7b74caafd57663fc442ee49d93db4e35c6c2ee9%2Fimage19.png?alt=media)

We can see that the text is in a \<p> tag with the class ‘ProfileHeaderCard-bio’. We can use this to make the following CSS selector:

`p.ProfileHeaderCard-bio`

We can get the content of this \<p> element through the doc object, using the find\_first\_by\_css filter:

![](https://2516523503-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FMq0EQG9Ff46ZMsrIBcVn%2Fuploads%2Fgit-blob-d8eeb634a89207a3a50493771cc0b4f1b77d63b0%2Fimage25.png?alt=media)
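For readers who cannot view the screenshot, the expression is presumably along these lines. This is a sketch inferred from the `strip_html` example later on this page with that filter removed, not a transcription of the image, so treat the screenshot as authoritative.

```liquid
#{{doc | find_first_by_css: 'p.ProfileHeaderCard-bio' }}
```

The filter takes the CSS selector as its argument and, as the name suggests, returns the first element in `doc` that matches it.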

This gives the following result:

![](https://2516523503-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FMq0EQG9Ff46ZMsrIBcVn%2Fuploads%2Fgit-blob-23576a5cef5bd399da80b02ce8b84ab49b277974%2Fimage26.png?alt=media)

If we want just the text from this, without the HTML tags, we can add the strip\_html filter:

`#{{doc | find_first_by_css: 'p.ProfileHeaderCard-bio' | strip_html }}`

This gives just the text:

![](https://2516523503-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FMq0EQG9Ff46ZMsrIBcVn%2Fuploads%2Fgit-blob-5c170b979005e1e55036291fc75bc37962c37918%2Fimage14.png?alt=media)

