
# Extracting Data from your HTML Feed

### Extracting HTML

Feeds can fetch content from ordinary HTML web pages. The fetched page is available in Liquid through the `doc` object, as follows:

![](https://2516523503-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FMq0EQG9Ff46ZMsrIBcVn%2Fuploads%2Fgit-blob-05428b39f2c021dc0421842be71c05c3aaaeb67b%2Fimage27.png?alt=media)

On the Test tab, this looks like:

![](https://2516523503-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FMq0EQG9Ff46ZMsrIBcVn%2Fuploads%2Fgit-blob-a4b1b05da71d8add13b222ddc05c4a3a1dd55723%2Fimage11.png?alt=media)

This returns the entire HTML document. To extract specific content from it, you can use three helpers:

![](https://2516523503-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FMq0EQG9Ff46ZMsrIBcVn%2Fuploads%2Fgit-blob-011702e4707e17efb3448565c48651d4c6614f9f%2FScreenshot%2B2018-12-14%2Bat%2B13.40.04.png?alt=media)

### Tag Stripping

Liquid’s standard `strip_html` filter, which removes HTML tags from a string, is useful when working with HTML documents: [https://shopify.github.io/liquid/filters/strip\_html](https://shopify.github.io/liquid/filters/strip_html)
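To check locally what tag stripping will do to a fragment, the filter can be approximated in Python. This minimal sketch follows the rules of the reference Ruby implementation of Liquid as we understand them, which may not match Taxi's exactly: `<script>` and `<style>` blocks and HTML comments are removed together with their contents, every other tag is removed, and the text between tags, including entities such as `&amp;`, is kept unchanged.

```python
import re

# Blocks removed together with their contents: scripts, HTML comments and styles.
# Modelled on the reference Ruby Liquid filter; Taxi's exact rules may differ.
_BLOCKS = re.compile(r"<script.*?</script>|<!--.*?-->|<style.*?</style>", re.DOTALL)
# Any remaining tag, matched lazily so each tag is removed on its own.
_TAGS = re.compile(r"<.*?>", re.DOTALL)


def strip_html(html: str) -> str:
    """Approximate Liquid's strip_html: drop tags, keep text and entities as-is."""
    without_blocks = _BLOCKS.sub("", html)
    return _TAGS.sub("", without_blocks)
```

For example, `strip_html('<p class="bio">Email <a href="/x">design</a> &amp; build</p>')` returns `Email design &amp; build`. Entities are not decoded, which is harmless when the result is placed back into HTML.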

### Example: Extracting a Twitter Bio

In this example, we get the biography text from the [Taxi for Email Twitter profile](https://twitter.com/taxiforemail).

![](https://2516523503-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FMq0EQG9Ff46ZMsrIBcVn%2Fuploads%2Fgit-blob-558e02561920c34e62b30687c13b3631263b1e03%2Fimage22.png?alt=media)

#### Feed Setup

* Set the feed URL to [https://twitter.com/taxiforemail](https://twitter.com/taxiforemail)
* Set the method to ‘GET’
* Set the data type to ‘HTML’
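The feed settings above amount to an HTTP `GET` request whose response body is read as HTML. As a rough local stand-in, useful for checking what the server actually returns before you write selectors, here is a minimal Python sketch using only the standard library. The `fetch_html` name, the timeout and the UTF-8 fallback are our own illustrative choices, not Taxi settings. A plain `GET` like this returns the page as the server sends it, so anything a browser adds later with JavaScript will be missing, even though the browser's inspect tool shows it.

```python
import urllib.request


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Fetch a page with a GET request and return its body as text.

    Hypothetical local stand-in for the feed settings (URL, GET, HTML);
    this is not Taxi's own fetching code.
    """
    request = urllib.request.Request(url, method="GET")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Use the charset the server declares, falling back to UTF-8.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

For example, `fetch_html("https://twitter.com/taxiforemail")` would return whatever HTML the server sends for the profile page to a non-browser client.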

#### Data Extraction

First, open the Twitter page in a browser, then use the browser’s Inspect tool to find the element we’re looking for:

![](https://2516523503-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FMq0EQG9Ff46ZMsrIBcVn%2Fuploads%2Fgit-blob-a7b74caafd57663fc442ee49d93db4e35c6c2ee9%2Fimage19.png?alt=media)

The text is in a `<p>` tag with the class `ProfileHeaderCard-bio`. We can use this to build the following CSS selector:

`p.ProfileHeaderCard-bio`
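A class selector such as `p.ProfileHeaderCard-bio` matches a `<p>` element whose `class` attribute contains `ProfileHeaderCard-bio` as one of its space-separated classes, so other classes on the same element do not stop it matching. The hypothetical `matches` helper below sketches that rule in Python for simple `tag.class` selectors only; real CSS selector engines support far more syntax.

```python
def matches(selector: str, tag: str, class_attr: str) -> bool:
    """Check a simple 'tag.class1.class2' selector against one element.

    Hypothetical helper for illustration; handles only tag and class parts.
    """
    wanted_tag, *wanted_classes = selector.split(".")
    # An empty tag part (e.g. '.bio') matches any element.
    if wanted_tag and wanted_tag.lower() != tag.lower():
        return False
    element_classes = set(class_attr.split())
    # Every class named in the selector must be present on the element.
    return set(wanted_classes) <= element_classes
```

For example, `matches("p.ProfileHeaderCard-bio", "p", "ProfileHeaderCard-bio u-dir")` is `True`, while the same selector does not match `<p class="ProfileHeaderCard-bioText">`, because class names must match exactly.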

We can get this `<p>` element from the `doc` object using the `find_first_by_css` filter:

![](https://2516523503-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FMq0EQG9Ff46ZMsrIBcVn%2Fuploads%2Fgit-blob-d8eeb634a89207a3a50493771cc0b4f1b77d63b0%2Fimage25.png?alt=media)

This gives the following result:

![](https://2516523503-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FMq0EQG9Ff46ZMsrIBcVn%2Fuploads%2Fgit-blob-23576a5cef5bd399da80b02ce8b84ab49b277974%2Fimage26.png?alt=media)

To get just the text, without HTML tags, add the `strip_html` filter:

`#{{doc | find_first_by_css: 'p.ProfileHeaderCard-bio' | strip_html }}`

This gives just the text:

![](https://2516523503-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FMq0EQG9Ff46ZMsrIBcVn%2Fuploads%2Fgit-blob-5c170b979005e1e55036291fc75bc37962c37918%2Fimage14.png?alt=media)
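The `find_first_by_css` lookup used in this example can be approximated outside Taxi with Python's built-in `html.parser`, which is handy for trying a selector against a saved copy of a page. This is a sketch under stated assumptions, not Taxi's implementation: it supports only a single `tag.class` selector, it expects reasonably well-formed markup, and it returns the first matching element's HTML including its own opening and closing tags, which may differ from what Taxi returns.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not change the depth count.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class _FirstMatch(HTMLParser):
    """Copy out the HTML of the first element matching 'tag.class'."""

    def __init__(self, selector: str):
        super().__init__(convert_charrefs=False)  # keep entities like &amp; as written
        tag, _, cls = selector.partition(".")
        self.tag, self.cls = tag.lower(), cls
        self.depth = 0      # > 0 while inside the matched element
        self.done = False   # set once the matched element has closed
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if self.depth:
            self.parts.append(self.get_starttag_text())
            if tag not in VOID_TAGS:
                self.depth += 1
        elif tag == self.tag and self.cls in (dict(attrs).get("class") or "").split():
            self.parts.append(self.get_starttag_text())
            if tag in VOID_TAGS:
                self.done = True  # a void element has no content to collect
            else:
                self.depth = 1

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> inside the match are copied as-is.
        if self.depth:
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID_TAGS:
            self.parts.append(f"</{tag}>")
            self.depth -= 1
            self.done = self.depth == 0

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)

    def handle_entityref(self, name):
        if self.depth:
            self.parts.append(f"&{name};")

    def handle_charref(self, name):
        if self.depth:
            self.parts.append(f"&#{name};")


def find_first_by_css(html: str, selector: str) -> str:
    """Return the first element matching a 'tag.class' selector, or '' if none."""
    parser = _FirstMatch(selector)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)
```

Tracking nesting depth, rather than stopping at the first end tag, means elements nested inside the match (links, or even another element of the same type) are copied along with it. For example, with a page containing `<p class="ProfileHeaderCard-bio">Email <a href="/x">design</a></p>`, `find_first_by_css(page, "p.ProfileHeaderCard-bio")` returns that whole `<p>` element.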

