For focused study, it's essential to get rid of all unnecessary context like fancy design and other UI effects. What's necessary is only the text, like a textbook.
Wikipedia is one of my most visited websites these days. Though I love the content, I would love it even more if it were more minimalist.
So here is the solution… I can now read Wikipedia articles in the Linux terminal! Too minimalist!
We can also save the article in txt format for offline reading. Maybe I will read it in VS Code!
Now back to the program…
We will use the Wikipedia API here. We send a request to the API and get JSON back in the response. Then we prepare the plain-text version with the BeautifulSoup library. That's it!
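The JSON that comes back nests each page under its numeric page ID, which is why the code later pulls a key out of query.pages instead of indexing by title. A minimal sketch with a hand-made sample response (the page ID and snippet here are placeholders, not real API output):

```python
# Hypothetical sample of the JSON shape the API returns;
# the page ID and the extract text are placeholders.
sample = {
    "query": {
        "pages": {
            "18393": {
                "pageid": 18393,
                "title": "Life",
                "extract": "Life is a quality that distinguishes matter...",
            }
        }
    }
}

# The page ID is not known in advance, so grab the first (only) page entry.
page = next(iter(sample["query"]["pages"].values()))
print(page["title"])
print(page["extract"])
```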
pip install beautifulsoup4
The complete Code
import requests
from bs4 import BeautifulSoup
from urllib.parse import urlparse

WIKIPEDIA_API_URL = "https://en.wikipedia.org/w/api.php"

def get_wikipedia_content(title):
    # "explaintext" asks the API for plain text instead of HTML
    params = {
        "action": "query",
        "format": "json",
        "prop": "extracts",
        "titles": title,
        "explaintext": True,
    }
    response = requests.get(WIKIPEDIA_API_URL, params=params)
    data = response.json()
    # Pages are keyed by numeric page ID, so take the first (only) key
    page_id = list(data["query"]["pages"].keys())[0]
    content = data["query"]["pages"][page_id]["extract"]
    return content

def get_wikipedia_content_format_with_beautifulsoup4(title):
    # Without "explaintext" the extract comes back as HTML,
    # which we strip with BeautifulSoup further below
    params = {
        "action": "query",
        "format": "json",
        "prop": "extracts",
        "titles": title,
    }
    response = requests.get(WIKIPEDIA_API_URL, params=params)
    data = response.json()
    page_id = list(data["query"]["pages"].keys())[0]
    content = data["query"]["pages"][page_id]["extract"]
    return content

# Wikipedia article URL
url = "https://en.wikipedia.org/wiki/Life"
parsed_url = urlparse(url)
title = parsed_url.path.split('/')[-1]
print(title)

wiki_content = get_wikipedia_content(title)
print(wiki_content)

# Save plain text content to a .txt file
with open("wikipedia_content.txt", "w", encoding="utf-8") as file:
    file.write(wiki_content)
print("Plain text content saved")

# With BeautifulSoup: fetch the HTML extract and strip the tags
wiki_content_page = get_wikipedia_content_format_with_beautifulsoup4(title)
soup = BeautifulSoup(wiki_content_page, 'html.parser')
plain_text = soup.get_text()
print(plain_text)

with open("wikipedia_content_plain_text_beautifulsoup.txt", "w", encoding="utf-8") as file:
    file.write(plain_text)
print("Plain text content saved")
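One caveat with taking the last path segment as the title: Wikipedia URLs are percent-encoded, so titles with spaces or non-ASCII characters won't come out clean. A small sketch using urllib.parse.unquote to decode them first (the São Paulo URL is just an illustrative example):

```python
from urllib.parse import urlparse, unquote

def title_from_url(url):
    # Take the last path segment and decode percent-escapes,
    # e.g. "S%C3%A3o_Paulo" -> "São_Paulo".
    return unquote(urlparse(url).path.split("/")[-1])

print(title_from_url("https://en.wikipedia.org/wiki/S%C3%A3o_Paulo"))  # São_Paulo
print(title_from_url("https://en.wikipedia.org/wiki/Life"))            # Life
```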
As the first function shows, you can get the plain text directly from the API, without BeautifulSoup — that is what the explaintext parameter does.
I have also released an app for this on the Snap Store:
https://snapcraft.io/wiki-txt
You can also try the API in the browser:
https://noncommercial.purnorup.com/wiki-txt