Web scraper to get news article content

1875 developers have joined this project.

Discussion

Ask questions, discuss different approaches, and share your thoughts about this project.

Adder Allz

Dress Clothes For Men Who Workout & Bodybuilders Mens satin shirt. Shop Our Mens Tailored Fit Dress Shirts, Slim Fit Dress Pants, Suits, & Mens Formal Wear By Gerardo Collection

Goli Sriram

May I know if I can get a certificate after completing this course?

Adder Allz

At TwoBirds Bridal, we consider shopping for your bridal dress as an unforgettable and enjoyable experience plus size wedding dresses. We do not want it to be a stressful experience, which’s why our collection is very all-around.

Removed User

Our well established business has been built on the foundation More Info of providing 100% customer satisfaction.

Removed User

I’m glad to locate this say very beneficial for me, because it consists of lot of are seeking for. I constantly choose to admission freelance the man or woman content and this case i discovered in you proclaim. thank you for sharing.

Zuko Fernando

Hi, when I tried to scrape some news sites (e.g., washingtonpost.com) with Scrapy, it gave me a 403 error. I think I can avoid that by sending request headers with the request. I tried sending a custom User-Agent header, but that didn't work either. It seems I have to send the User-Agent header of a browser, or of a tool like Postman, to avoid the error. My concern is: is it legally OK to send headers like that?
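
For the mechanics of attaching browser-like headers to a request, here is a minimal sketch. It uses the standard library's urllib instead of Scrapy so that it runs without extra installs; in Scrapy, the same dictionary could be passed as the headers argument of scrapy.Request, or the USER_AGENT setting could be used. The URL, the header values, and the build_request helper are assumptions made up for illustration, not something from this thread, and the sketch only builds the request without sending it.

```python
from urllib.request import Request

# Hypothetical values for illustration only; not taken from the thread.
ARTICLE_URL = "https://www.example.com/news/some-article"
BROWSER_LIKE_HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0",
    "Accept": "text/html,application/xhtml+xml",
    "Accept-Language": "en-US,en;q=0.9",
}


def build_request(url, headers=BROWSER_LIKE_HEADERS):
    """Return a Request object carrying the given headers (nothing is sent)."""
    return Request(url, headers=headers)


request = build_request(ARTICLE_URL)
# urllib stores header names capitalized as "User-agent".
print(request.get_header("User-agent"))
```

Passing the request to urllib.request.urlopen would actually send it. Whether a site still answers with 403 depends on the site, so this may or may not be enough.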

vikz1071

Hello everyone,

I am new to Python and I have a question. I have scraped a news website for article content. Below is the output when I printed the variable.

[

South Australia's Independent Commissioner Against Corruption has issued a warning to politicians and their staff to report corruption or misconduct to her office, saying she has been disappointed to learn of serious allegations from the media.

,

"It is disappointing to learn of the possible existence of serious allegations of misconduct in public administration from the media," Ann Vanstone QC said.

,

In a statement, Ms Vanstone warned public officers they have a legal obligation to report corruption and serious misconduct and maladministration.

,

"I take this opportunity to remind all parliamentarians, parliamentary staffers and those working in electorate offices that this obligation extends to you," she said.

,

As you may notice, a tag appears before the beginning of each paragraph. Can someone suggest how I can get rid of these tags, please?

vikz1071

Python code [for reference]

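from bs4 import BeautifulSoup

def article_content_lis(html):
    # Return the second <div> inside the element with id="body".
    if html:
        content = html.find('div', id='body')
        content_p = content.find_all('div')[1]
        print(content_p)
        return content_p

# ABCnews is assumed to be the response object for the article page.
ABCnews_html = BeautifulSoup(ABCnews.text, 'html.parser')
lis = article_content_lis(ABCnews_html)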

Sylvia Shen

Hi vikz1071,
I think BeautifulSoup supports getting the text without the HTML tags.
You can try printing the content like this:

content = html.find('div', id = 'body')
content_p = content.find_all('div')[1]

print(content_p.text)

# or

print(content_p.get_text())

https://stackoverflow.com/questions/9662346/python-code-to-remove-html-tags-from-a-string
https://www.crummy.com/software/BeautifulSoup/bs4/doc/#get-text

By the way, using a code block in your comments would make your code easier for others to read 👍🏻
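
If BeautifulSoup is not at hand, the tag stripping discussed in the linked Stack Overflow question can also be sketched with the standard library's html.parser. This is only a rough alternative to get_text(), not how BeautifulSoup does it internally. The TextExtractor class, the strip_tags helper, and the sample markup are hypothetical names and data made up for this example.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect only the text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        # Called for the text between tags; the tags themselves are skipped.
        self.parts.append(data)


def strip_tags(markup):
    """Return the text content of markup with all tags removed."""
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()
    return "".join(parser.parts)


# Made-up sample markup, not the real article content.
sample = "<p>First paragraph.</p><p>Second paragraph.</p>"
print(strip_tags(sample))
```

Note that the pieces are joined without a separator, so text from adjacent paragraphs runs together. Joining with "\n" instead is one option if line breaks are wanted.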

peter zhuang

very interesting to implement

Abideen Bello

Yes

許挺誼

Any recommended Python modules for one to study before starting this project?

許挺誼

Sorry. I found there's already a Suggested Implementation part in the project description for questions like this.

Abideen Bello

This is definitely a good place to start.

Hulya Karakaya

this is the first time I needed to learn web scraping!

Sonia

Looks like a good place to start

N L

I know a little Python and how to create a web scraper. I think it's fun.
