For focused study, it's essential to get rid of all unnecessary context like fancy design and other UI effects. What's necessary is only the text, like a textbook.
Wikipedia is one of my most visited websites these days. Though I love the content, I would love it even more if it were more minimalist.
So here is the solution… I can now read Wikipedia articles in the Linux terminal! Too minimalist!
We can also save the article in txt format for offline reading. Maybe I will read it in VS Code!
Now back to the program…
We will use the Wikipedia API here. We send a request to the API and get JSON back in the response. Then we prepare the plain-text version with the BeautifulSoup library. That's it!
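The JSON that comes back nests each page under its numeric page ID, which is why the code later pulls a key out of query.pages instead of indexing by title. A minimal sketch with a hand-made sample response (the page ID and snippet here are placeholders, not real API output):

```python
# Hypothetical sample of the JSON shape the API returns;
# the page ID and the extract text are placeholders.
sample = {
    "query": {
        "pages": {
            "18393": {
                "pageid": 18393,
                "title": "Life",
                "extract": "Life is a quality that distinguishes matter...",
            }
        }
    }
}

# The page ID is not known in advance, so grab the first (only) page entry.
page = next(iter(sample["query"]["pages"].values()))
print(page["title"])
print(page["extract"])
```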
pip install beautifulsoup4
The complete Code
import requests
from bs4 import BeautifulSoup
from urllib.parse import urlparse

WIKIPEDIA_API_URL = "https://en.wikipedia.org/w/api.php"

def get_wikipedia_content(title):
    # "explaintext" asks the API for plain text instead of HTML
    params = {
        "action": "query",
        "format": "json",
        "prop": "extracts",
        "titles": title,
        "explaintext": True,
    }
    response = requests.get(WIKIPEDIA_API_URL, params=params)
    data = response.json()
    # Pages are keyed by numeric page ID, so take the first (only) key
    page_id = list(data["query"]["pages"].keys())[0]
    content = data["query"]["pages"][page_id]["extract"]
    return content

def get_wikipedia_content_format_with_beautifulsoup4(title):
    # Without "explaintext" the extract comes back as HTML,
    # which we strip with BeautifulSoup further below
    params = {
        "action": "query",
        "format": "json",
        "prop": "extracts",
        "titles": title,
    }
    response = requests.get(WIKIPEDIA_API_URL, params=params)
    data = response.json()
    page_id = list(data["query"]["pages"].keys())[0]
    content = data["query"]["pages"][page_id]["extract"]
    return content

# Wikipedia article URL
url = "https://en.wikipedia.org/wiki/Life"
parsed_url = urlparse(url)
title = parsed_url.path.split('/')[-1]
print(title)

wiki_content = get_wikipedia_content(title)
print(wiki_content)

# Save plain text content to a .txt file
with open("wikipedia_content.txt", "w", encoding="utf-8") as file:
    file.write(wiki_content)
print("Plain text content saved")

# With BeautifulSoup: fetch the HTML extract and strip the tags
wiki_content_page = get_wikipedia_content_format_with_beautifulsoup4(title)
soup = BeautifulSoup(wiki_content_page, 'html.parser')
plain_text = soup.get_text()
print(plain_text)

with open("wikipedia_content_plain_text_beautifulsoup.txt", "w", encoding="utf-8") as file:
    file.write(plain_text)
print("Plain text content saved")
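One caveat with taking the last path segment as the title: Wikipedia URLs are percent-encoded, so titles with spaces or non-ASCII characters won't come out clean. A small sketch using urllib.parse.unquote to decode them first (the São Paulo URL is just an illustrative example):

```python
from urllib.parse import urlparse, unquote

def title_from_url(url):
    # Take the last path segment and decode percent-escapes,
    # e.g. "S%C3%A3o_Paulo" -> "São_Paulo".
    return unquote(urlparse(url).path.split("/")[-1])

print(title_from_url("https://en.wikipedia.org/wiki/S%C3%A3o_Paulo"))  # São_Paulo
print(title_from_url("https://en.wikipedia.org/wiki/Life"))            # Life
```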
As the first function shows, you can get the plain text directly from the API, without BeautifulSoup — that is what the explaintext parameter does.
I have also released an app for this on the Snap Store:
https://snapcraft.io/wiki-txt
You can also try the API in the browser:
https://noncommercial.purnorup.com/wiki-txt