For focused study, it is essential to strip away all unnecessary context, such as fancy design and other UI effects. All you need is the text, just like in a textbook.
Wikipedia is one of my most visited websites these days. I love the content, but I would love it even more if it were more minimalist.
So here is the solution: I can now read Wikipedia articles in the Linux terminal! Maybe too minimalist!
We can also save the article as a .txt file for offline reading. Maybe I will read it in VS Code! Another reason is that I recently bought an MP4 player that supports e-books only in .txt format, so I can read articles on it too!
Now, back to the program.
We will use the Wikipedia API.
We send a request to the Wikipedia API and get JSON back in the response.
Then we prepare the plain text with the BeautifulSoup library. That's it!
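To make the JSON step more concrete, here is a minimal offline sketch of how the article text sits inside the response. The sample dictionary is hand-written to mimic the shape of a "prop=extracts" response, the page ID and text in it are made up, and extract_text is a hypothetical helper name, not part of any library.

```python
# Hand-written sample mimicking the shape of a "prop=extracts" JSON response.
# The page ID and the text are made up for illustration.
sample_response = {
    "batchcomplete": "",
    "query": {
        "pages": {
            "12345": {
                "pageid": 12345,
                "ns": 0,
                "title": "Life",
                "extract": "Sample article text.",
            }
        }
    },
}


def extract_text(data):
    """Return the extract of the first page, or None if there is none."""
    pages = data.get("query", {}).get("pages", {})
    for page in pages.values():
        # A missing page has no "extract" key, so .get() returns None.
        return page.get("extract")
    return None


print(extract_text(sample_response))
```

Using .get() instead of direct indexing means a misspelled or missing title gives None rather than a KeyError, which is easier to handle in a small script.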
pip install requests beautifulsoup4
The complete code
import requests
from bs4 import BeautifulSoup
from urllib.parse import urlparse
WIKIPEDIA_API_URL = "https://en.wikipedia.org/w/api.php"
def get_wikipedia_content(title):
    # Ask the API for the page extract as plain text
    params = {
        "action": "query",
        "format": "json",
        "prop": "extracts",
        "titles": title,
        "explaintext": True  # Fetch plaintext content
    }
    response = requests.get(WIKIPEDIA_API_URL, params=params)
    data = response.json()
    # The pages are keyed by page ID; take the first (and only) one
    page_id = list(data["query"]["pages"].keys())[0]
    content = data["query"]["pages"][page_id]["extract"]
    return content
# Wikipedia Article URL
url = "https://en.wikipedia.org/wiki/Life"
parsed_url = urlparse(url)
title = parsed_url.path.split('/')[-1]
print(title)
wiki_content = get_wikipedia_content(title)
soup = BeautifulSoup(wiki_content, 'html.parser')
plain_text = soup.get_text()
print(plain_text)
# Save plain text content to a .txt file
with open("wikipedia_content.txt", "w", encoding="utf-8") as file:
    file.write(plain_text)
print("Plain text content saved")
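One thing to watch out for: taking the last part of the URL path works for a simple title like Life, but not for titles that contain a slash or percent-encoded characters. Below is a minimal sketch of a more forgiving way to get the title, assuming the URL follows the usual /wiki/<title> pattern. The name title_from_url is my own hypothetical helper, not part of the Wikipedia API.

```python
from urllib.parse import unquote, urlparse


def title_from_url(url):
    """Extract the article title from a Wikipedia article URL (hypothetical helper)."""
    path = urlparse(url).path
    # Article URLs look like /wiki/<title>; keep everything after "/wiki/".
    prefix = "/wiki/"
    if not path.startswith(prefix):
        raise ValueError(f"Not a Wikipedia article URL: {url}")
    # Decode percent-escapes such as %C3%A3 back into real characters.
    return unquote(path[len(prefix):])


print(title_from_url("https://en.wikipedia.org/wiki/S%C3%A3o_Paulo"))  # São_Paulo
```

Decoding matters because requests encodes the parameters again when it builds the query string, so a title that is still percent-encoded would reach the API in a mangled form.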