"""
*Don't edit the README.md or HTML files directly; your edits will be lost.
They are generated by the iq.py script from the module's docstring, so edit
the text there, in the header of the file.
The HTML and MD files are regenerated automatically when you run the script.*

# Wealth and Intellect

### Investigation of the Relationship of Intellectual and Academic Performance with Median and Average Wealth in International Comparison

## Motivation:
The motivation for this project was purely self-educational:
the author wanted to practice recently acquired skills in Python programming,
pandas, and data visualization in a small pilot project.
While the data come from reliable sources, mostly scraped from
the English-language Wikipedia, the project makes no claim to scientific rigor.
The data sources and analysis software are provided without any legal restrictions.

## Data Sources:
The following data sources were combined:

1. Data on average and median wealth in nearly all countries worldwide.
2. Data on the average intelligence quotient (IQ) in most countries worldwide, sourced from 
[Lynn and Becker](https://www.ulsterinstitute.org/ebook/THE%20INTELLIGENCE%20OF%20NATIONS%20-%20Richard%20Lynn,%20David%20Becker.pdf). 
Their validity is disputed by many, and I do not have the qualifications to form an opinion on this matter.
3. Data from the PISA tests in approximately 80 countries, which are generally accepted.

## Study Objectives:

1. Are intellectual and educational performance correlated with region?
The measured values were ranked and colored by continent.
2. Are median and average wealth correlated with region?
The available values were ranked and colored by continent.
3. Is there a correlation between educational success or national average intellectual ability
on the one hand and wealth on the other? If so, what is the mathematical relationship, and can it be quantified?
4. Are Lynn & Becker's measurements plausible in the light of the PISA results?

## Results

An interactive web page with graphics is generated from the data sources.
The plots can be zoomed, and individual data points can be inspected. The whole process is program-driven,
so the calculations are traceable, although not necessarily correct.

### The questions posed above can be answered as follows:

1. The top ranks in IQ and education, according to both Lynn & Becker's data and the PISA results,
are occupied primarily by East Asian countries, followed by European countries and
culturally European-influenced countries such as Canada,
the USA, Australia, and New Zealand.
The next group consists mainly of other Asian and South American countries.
The last group contains most Sub-Saharan African countries, as well as many Central American and Caribbean states.

2. Regarding wealth, the distribution looks different.
Notably, East Asian countries are not at the top in median or average wealth,
as would have been expected from their educational achievements and IQ data.
The top ranks are occupied primarily by European countries, as well as the USA, Canada, Australia, and New Zealand;
only Singapore and Hong Kong from East Asia are in the top group.

3. Average wealth increases exponentially with the education score measured by PISA and with IQ.
The data were fitted with a log-linear model, and the data points were weighted according to the population of each country.
A 15-point higher average IQ (one standard deviation of individual IQ)
is associated with three times the average wealth and 2.5 times the median wealth.
A similar relationship holds between median and average wealth and the education level measured by PISA:
a 100-point higher PISA result corresponds to a sixfold increase in both median and average wealth.
The effect is slightly stronger on the median than on the average.
A higher education level or IQ is associated with a slight financial equalization of societies.
Note that no PISA data are available for the most populous but financially weak countries,
particularly India and most of Africa.
The correlation coefficient in all fits is about 0.6-0.7, corresponding to an R^2 of roughly 0.4: about 40% of the variation
in wealth is statistically accounted for by education or intelligence. The fits show an association, not the direction of causation.

4. Lynn & Becker's numbers are compatible with the PISA results. The correlation is .....
Some outliers in both directions are noticeable, especially Cambodia and Saudi Arabia,
but not to an extent that fundamentally calls either measurement into question. 100 PISA points
correspond to about 15 IQ points.
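
The wealth fits described above model log10 of wealth as a straight line in the score, so every fixed score difference multiplies wealth by the same factor, `10**(slope * difference)`. A minimal sketch of such a fit, on synthetic numbers rather than the project's data; the helper name `wealth_factor_per_points` is hypothetical. One assumption differs from iq.py: the script passes the population itself as `w` to `np.polyfit`, but NumPy applies `w` to the unsquared residuals, so this sketch uses `w = sqrt(population)` to make each squared residual count in proportion to population.

```python
import numpy as np


def wealth_factor_per_points(score, wealth, population, delta):
    # Fit log10(wealth) = slope * score + intercept; return the fit, the wealth
    # factor for a score difference of `delta`, and the weighted R^2.
    score = np.asarray(score, dtype=float)
    log_wealth = np.log10(np.asarray(wealth, dtype=float))
    population = np.asarray(population, dtype=float)
    # np.polyfit weights the unsquared residuals, so sqrt(population) makes
    # each squared residual count in proportion to the population (assumption)
    slope, intercept = np.polyfit(score, log_wealth, 1, w=np.sqrt(population))
    predicted = slope * score + intercept
    # population-weighted coefficient of determination
    mean = np.average(log_wealth, weights=population)
    ss_res = np.sum(population * (log_wealth - predicted) ** 2)
    ss_tot = np.sum(population * (log_wealth - mean) ** 2)
    return slope, intercept, 10 ** (slope * delta), 1.0 - ss_res / ss_tot


# Synthetic data: wealth triples every 15 IQ points, without any noise
iq = np.array([80.0, 90.0, 100.0, 110.0])
population = np.array([5e6, 2e7, 1e8, 3e7])
wealth = 1000.0 * 3.0 ** ((iq - 80.0) / 15.0)
slope, intercept, factor, r_squared = wealth_factor_per_points(iq, wealth, population, 15)
print(round(factor, 2), round(r_squared, 2))  # 3.0 1.0
```

With noisy real data, the same factor is read off the fitted slope, and the weighted R^2 shows how much of the variation in log wealth the line accounts for.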

## Notes:
It is noteworthy that Western countries (Western Europe, the USA, Canada, Australia, and New Zealand) are significantly
wealthier than would be expected from the collected education and intelligence data.
Their wealth exceeds the fitted value by a factor of two to five.
East Asian countries, not only China but also Japan, are at or below the expected level.
Sub-Saharan countries also have wealth well below the expected level.
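
The factor of two to five mentioned above is the ratio between a country's actual wealth and the value the trendline predicts for its score. A minimal sketch with a hypothetical helper `factor_vs_fit` and made-up numbers, not fitted to the project's data:

```python
import numpy as np


def factor_vs_fit(score, wealth, slope, intercept):
    # Ratio of actual wealth to the wealth predicted by the log-linear
    # trendline log10(wealth) = slope * score + intercept:
    # > 1 means wealthier than expected, < 1 means less wealthy than expected.
    predicted_log = slope * np.asarray(score, dtype=float) + intercept
    return np.asarray(wealth, dtype=float) / 10 ** predicted_log


# Made-up trendline: 10,000 at a score of 90, tripling every 15 points
slope = np.log10(3.0) / 15.0
intercept = 4.0 - slope * 90.0
ratios = factor_vs_fit([90.0, 105.0], [40_000.0, 30_000.0], slope, intercept)
print(np.round(ratios, 2))  # [4. 1.]
```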

## Participation:
I am very grateful for any comments from the community, especially if they concern methodological errors, errors in the data,
or shortcomings and errors in the mathematics or programming.
I have no expertise in sociological matters; I will read comments on this topic with interest,
but, lacking the qualifications,
I will rarely respond to them.

Contact address: 
To avoid spam through web scraping, please replace the umlauts with the corresponding letters.

herr döt sackbauer ät gmäil döt cöm.



## Running the Code:
To try the code, all you need is a computer running Python with NumPy, pandas, Plotly, and Markdown installed,
plus the Python file iq.py and the CSV files it reads.
The Pipfile and the requirements file can help you with the installation via pipenv and pip, respectively.
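
To check beforehand whether these packages are available, a short sketch like the following can help; the package list mirrors the imports at the top of iq.py, and the helper name `missing_packages` is made up.

```python
import importlib.util

# Top-level modules imported by iq.py (installed from PyPI as numpy, pandas,
# plotly and Markdown)
REQUIRED = ["numpy", "pandas", "plotly", "markdown"]


def missing_packages(names=REQUIRED):
    # find_spec returns None when a top-level module cannot be found
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    print("Missing packages:", ", ".join(missing) if missing else "none")
```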

The program has no user interface of its own; just run it.
It will try to open a web browser with a page showing the results.
If this does not work on your system, open the file iframes.html manually.

You can, of course, clone the repository to submit pull requests.
"""
import math
import pathlib as path
import webbrowser
import numpy as np
import pandas as pd
import markdown
import plotly.io as pio
import plotly.express as px
import plotly.express.colors as col
import plotly.graph_objects as go
import plotly.figure_factory as ff

README_MARKDOWN = __doc__
# create the markdown file for github
with open('README.md', 'w', encoding="utf-8") as file:
    file.write(README_MARKDOWN)
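# create the HTML file for the web browser; the fenced_code extension renders
# fenced code blocks, and the charset declaration keeps the umlauts readable
html_content = markdown.markdown(README_MARKDOWN, extensions=['fenced_code'])
with open('README.html', 'w', encoding="utf-8") as file:
    file.write('<meta charset="utf-8">\n' + html_content)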
#
# the weird-looking '# %%' comments are Jupyter-style cell markers that VS Code uses to aid debugging
#
# %%
# Read the CSV tables for IQ, Pisa results and Wealth
iq1 = pd.read_csv("IQByCountry.csv")
iq1 = iq1.sort_values(by='smartestCountries_iq', ascending=False)
pisa = pd.read_csv("pisa-scores-by-country-2024.csv")
wealth = pd.read_csv("wealth.csv")
# Rename the join column so that both tables share the same key
wealth.rename(columns={'Location': 'country'}, inplace=True)
# %%
wealth_iq = wealth.merge(iq1[['country', 'smartestCountries_iq',
                              'region', 'cca3', 'pop2023']], on='country', how='left')
wealth_iq.rename(columns={'smartestCountries_iq': 'IQ1', 'region': 'Region',
                          'cca3': 'CCA3', 'pop2023': 'Population'}, inplace=True)
# only use countries with known populations as populations are needed for the fits
wealth_iq = wealth_iq.dropna(subset=['Population'])
wealth_iq = wealth_iq[wealth_iq['IQ1'] > 70]  # drop implausible (<= 70) and missing IQs
wealth_iq = wealth_iq.sort_values(by='IQ1', ascending=False)

# %%

# Shorten the long PISA column names
pisa.rename(columns={'PISAScoresOverallScore2022': 'PisaOverall',
                     'PISAScoresMathScore2022': 'PisaMath',
                     'PISAScoresReadingScore2022': 'PisaRead',
                     'PISAScoresScienceScore2022': 'PisaScience'},
            inplace=True)
# Join wealth_iq and pisa DataFrames
wealth_iq_pisa = wealth_iq.merge(pisa[['country', 'PisaOverall', 'PisaMath',
                                       'PisaRead', 'PisaScience']],
                                 on='country', how='left')


# save the complete data set as a CSV file
wealth_iq_pisa.to_csv("wealth_iq_pisa.csv", index=False)

# Save the DataFrame as a plain HTML table (Bootstrap-style classes and an id
# that a DataTables script could hook into); can be used for documentation

html_output = wealth_iq_pisa.to_html(
    classes='table table-striped table-bordered', index=False, table_id='example')
# Save the HTML output to a file
with open('wealth_iq_pisa.html', 'w', encoding='utf-8') as f:
    f.write(html_output)

# wealth_iq_pisa.dropna(subset=['PisaOverall'])

# list PISA countries missing from wealth_iq (e.g. because of differing country names)
pisa_names = set(pisa.country.unique().tolist())
wealth_iq_names = set(wealth_iq.country.unique().tolist())
pisa_names_not_in_wealth_iq = pisa_names - wealth_iq_names
print(pisa_names_not_in_wealth_iq)
# %%
# marker size: square of log10(population / 10,000), at least 1; populations
# below 10,000 are clamped so that they do not get oversized markers
bubble_size_lin = wealth_iq['Population'].astype(float).tolist()
bubble_size_log = [max(1, math.log10(max(x, 10_000) / 10_000)**2) for x in bubble_size_lin]

# %%
# Choose a nice pre-made color palette
# dir(col.qualitative)
palette = col.qualitative.G10
regions = wealth_iq['Region'].sort_values().unique().tolist()
# use the same color map throughout all plots.
color_map = dict(zip(regions, palette))
bar_order = {'CCA3': wealth_iq['CCA3'].tolist(), }
# %%
# prepare figure containing the table data to be displayed in the browser
columns = wealth_iq_pisa.columns.tolist()
# positions of the text columns whose multi-word entries get line breaks
text_columns = [columns.index('country'), columns.index('Region')]

# column names first, then the data rows sorted by country
data_matrix = [columns]
for row in wealth_iq_pisa.sort_values(by="country").values.tolist():
    data_matrix.append(row)
for row in data_matrix:
    # break long multi-word strings in country and region to make the columns narrower
    for column in text_columns:
        row[column] = str(row[column]).replace(" ", "<br>\n")
    # convert the remaining cells to floats so they are shown as numbers;
    # cells that are not numeric (names, codes) stay strings
    for column, value in enumerate(row):
        try:
            row[column] = float(value)
        except (TypeError, ValueError):
            pass
table = ff.create_table(data_matrix, height_constant=60)
table.update_layout(title='Wealth, IQ and PISA Scores by Country',
                    template='plotly_dark')
# %%
# make a bar chart with the ordered list of IQs of countries, using color for the continent
fig1 = px.bar(wealth_iq_pisa, y='IQ1', x='CCA3',
              color='Region',
              color_discrete_map=color_map,
              category_orders=bar_order,
              hover_data=['country', 'Region', 'Population'],)
fig1.update_layout(xaxis_title='Country', yaxis_title='IQ',
                   title='IQ by Country', template='plotly_dark', yaxis=dict(range=[70, None]))
# %%
# make a bar chart with the ordered list of PISA scores of countries, using color for the continent
#
wealth1 = wealth_iq_pisa.sort_values(
    by='PisaOverall', ascending=False).dropna(subset=['PisaOverall'])
bar_order = {'CCA3': wealth1['CCA3'].tolist(), }
fig2 = px.bar(wealth1, y='PisaOverall', x='CCA3', color='Region',
              color_discrete_map=color_map, category_orders=bar_order,
              hover_data=['country', 'Region', 'Population'],)
fig2.update_layout(xaxis_title='Country', yaxis_title='Pisa Score',
                   title='Pisa Score by country', template='plotly_dark',
                   yaxis=dict(range=[300, None]))

# %%
# make a bar chart with the ordered list of Median wealth of countries
# using color for the continent
wealth_iq_rs = wealth_iq.sort_values(by='Median wealth', ascending=False)
bar_order = {'CCA3': wealth_iq_rs['CCA3'].tolist(), }
fig1a = px.bar(wealth_iq_rs, x='CCA3', y='Median wealth',
               color='Region',
               color_discrete_map=color_map,
               category_orders=bar_order,
               hover_data=['country', 'Region', 'Population', "Median wealth"],
               )
fig1a.update_layout(xaxis_title='Country', yaxis_title='Median Wealth',
                    title='Median Wealth by Country', template='plotly_dark',
                    yaxis_type="log",
                    yaxis=dict(range=[2, 6]))
# %%
# make a bar chart with the ordered list of Mean wealth of countries, using color for the continent
wealth_iq_rs = wealth_iq.sort_values(by='Mean wealth', ascending=False)
bar_order = {'CCA3': wealth_iq_rs['CCA3'].tolist(), }
fig2a = px.bar(wealth_iq_rs,  x='CCA3', y='Mean wealth',
               color='Region',
               color_discrete_map=color_map,
               category_orders=bar_order,
               hover_data=['country', 'Region', 'Population', "Mean wealth"],)
fig2a.update_layout(xaxis_title='Country', yaxis_title='Mean Wealth',
                    yaxis_type="log",
                    title='Mean Wealth by Country', template='plotly_dark')

# %%
# make a scatter plot of the IQs of countries vs. their median wealth,
# using color for the continent
fig3 = px.scatter(wealth_iq, x='IQ1', y='Median wealth',
                  hover_data=['country', 'Region', 'Population'],
                  color="Region",
                  size=bubble_size_log,
                  color_discrete_map=color_map,)
fig3.update_layout(xaxis_type="linear", yaxis_type="log",
                   title='Median Wealth vs IQ by Region', template='plotly_dark')
# Perform log linear fit
fit = np.polyfit(wealth_iq['IQ1'].astype(float), np.log10(
    wealth_iq['Median wealth'].astype(float)), 1,
    w=wealth_iq['Population'].astype(float)
)
slope = fit[0]
intercept = fit[1]

# Create a trendline trace and overlay it to the data points
trendline_x = wealth_iq['IQ1']
trendline_y = 10**(slope * trendline_x + intercept)
trendline_trace = go.Scatter(
    x=trendline_x, y=trendline_y, mode='lines', name='Trendline')
fig3.add_trace(trendline_trace)
# Annotate the fit with the fit parameter
annot = (f"<br>15 more IQ points -> <br>{(10**(slope*15)):.2f} "
         + "times the Median wealth")
xcor = list(trendline_x)[0]  # first x value of the trendline
xend = list(trendline_x)[-1]  # last x value of the trendline
# On a log axis, Plotly expects annotation coordinates as log10 values, so
# ycor is the fitted log10(wealth) at xend, not the wealth itself.
# With wealth_iq sorted by descending IQ, this places the label at the
# high-IQ end at the height of the low-IQ end of the trendline, a corner
# that is usually free of data points.
ycor = slope*xend + intercept
# print(xcor, ycor, "--", annot)
fig3.add_annotation(x=xcor, y=ycor, text=annot,
                    xanchor="right",  showarrow=False)

# Update the layout of fig3
fig3.update_layout(xaxis_type="linear", yaxis_type="log",
                   title='Median Wealth vs IQ by Region', template='plotly_dark')

# %%
# make a scatter plot of the PISA scores of countries vs. their median wealth,
# using color for the continent
fig4 = px.scatter(wealth_iq_pisa, x='PisaOverall', y='Median wealth',
                  hover_data=['country', 'Region'], color="Region",
                  color_discrete_map=color_map, size=bubble_size_log)
fig4.update_layout(xaxis_type="linear", yaxis_type="log",
                   title='Median Wealth vs PISA score by Region',
                   template='plotly_dark',
                   yaxis=dict(range=[np.log10(1000), np.log10(1_000_000)]))
# Perform log  linear fit calculate trendline and overlay it to the plot,
# annotate it with the obtained fit parameters
# %%
wealth_iq_pisa = wealth_iq_pisa.dropna(subset=['PisaOverall'])
fit = np.polyfit(wealth_iq_pisa['PisaOverall'],
                 np.log10(wealth_iq_pisa['Median wealth']), 1)
slope = fit[0]
intercept = fit[1]
# Add the trendline trace to fig
# Create a trendline trace
trendline_x = wealth_iq_pisa['PisaOverall']
trendline_y = 10**(slope * trendline_x + intercept)
trendline_trace = go.Scatter(
    x=trendline_x, y=trendline_y, mode='lines', name='Trendline')
fig4.add_trace(trendline_trace)

# Annotate
annot = (f"<br>100 more PISA points -> <br>{(10**(slope*100)):.2f} "
         + "times the Median wealth")

xcor = list(trendline_x)[0]
xend = list(trendline_x)[-1]
ycor = (slope*xend) + intercept
print(xcor, ycor, "--", annot)
fig4.add_annotation(x=xcor, y=ycor, text=annot,
                    xanchor="right",  showarrow=False)


# %%
# make a scatter plot of the IQs of countries vs. their mean wealth,
# using color for the continent; again fit, overlay the fit line and annotate it

fig5 = px.scatter(wealth_iq, x='IQ1', y='Mean wealth', hover_data=['country', 'Region', 'Population'],
                  color="Region", size=bubble_size_log, color_discrete_map=color_map,)
fig5.update_layout(xaxis_type="linear", yaxis_type="log",
                   title='Mean Wealth vs IQ by Region', template='plotly_dark')
# Perform a population-weighted log-linear fit
fit = np.polyfit(wealth_iq['IQ1'].astype(float), np.log10(
    wealth_iq['Mean wealth'].astype(float)), 1,
    w=wealth_iq['Population'].astype(float)
)
slope = fit[0]
intercept = fit[1]
# Create a trendline trace
trendline_x = wealth_iq['IQ1']
trendline_y = 10**(slope * trendline_x + intercept)
trendline_trace = go.Scatter(
    x=trendline_x, y=trendline_y, mode='lines', name='Trendline')

# Add the trendline trace to fig
fig5.add_trace(trendline_trace)
# Annotate
annot = (f"<br>15 more IQ points -> <br>{10**(slope*15):.2f} "
         + "times the Mean Wealth")
xcor = list(trendline_x)[0]
xend = list(trendline_x)[-1]
ycor = ((slope*xend) + intercept)
fig5.add_annotation(x=xcor, y=ycor, text=annot,
                    xanchor="right",  showarrow=False)
# %%
# make a scatter plot of the PISA scores of countries vs. their mean wealth,
# using color for the continent; again fit log-linearly, overlay the fit line and annotate it
bubble_size_lin = wealth_iq_pisa['Population'].astype(float).tolist()
bubble_size_log = [max(1, math.log10(max(x, 10_000) / 10_000)**2) for x in bubble_size_lin]
fig6 = px.scatter(wealth_iq_pisa, x='PisaOverall', y='Mean wealth',
                  hover_data=['country', 'Region'], color="Region",
                  color_discrete_map=color_map, size=bubble_size_log)
fig6.update_layout(xaxis_type="linear", yaxis_type="log",
                   title='Mean Wealth vs PISA score by Region', template='plotly_dark',
                   yaxis=dict(range=[np.log10(1000), np.log10(1_000_000)]))

fit = np.polyfit(wealth_iq_pisa['PisaOverall'],
                 np.log10(wealth_iq_pisa['Mean wealth']), 1)
slope = fit[0]
intercept = fit[1]
# Create a trendline trace
trendline_x = wealth_iq_pisa['PisaOverall']
trendline_y = 10**(slope * trendline_x + intercept)
trendline_trace = go.Scatter(
    x=trendline_x, y=trendline_y, mode='lines', name='Trendline')
fig6.add_trace(trendline_trace)
# Annotate
annot = (f"<br>100 more PISA points -> <br>{(10**(slope*100)):.2f} "
         + "times the Mean Wealth")
xcor = list(trendline_x)[0]
xend = list(trendline_x)[-1]
ycor = (slope*xend) + intercept
# print(xcor, ycor, "--", annot)
fig6.add_annotation(x=xcor, y=ycor, text=annot,
                    xanchor="right",  showarrow=False)

# %%
# make a scatter plot of the IQs of countries vs. their PISA scores,
# using color for the continent; this time fit a straight line,
# overlay the fit line and annotate it

fig7 = px.scatter(wealth_iq_pisa, x='PisaOverall', y='IQ1',
                  hover_data=['country', "Population"], color="Region",
                  size=bubble_size_log,
                  color_discrete_map=color_map,)

# Extract the values of PisaOverall, IQ1 and Population from the DataFrame
# (rows without a PISA score were dropped above; the IQ filter removed missing IQs)

xfit = wealth_iq_pisa['PisaOverall'].values.astype(float)
yfit = wealth_iq_pisa['IQ1'].values.astype(float)
weights = wealth_iq_pisa['Population'].values.astype(float)

# Calculate the linear fit
coefficients = np.polyfit(xfit, yfit, 1,
                          w=weights
                          )
slope = coefficients[0]
intercept = coefficients[1]

# Plot the linear fit on Fig 7
fig7.add_trace(go.Scatter(x=xfit, y=slope*xfit + intercept,
                          mode='lines', name='Linear Fit'))

# Add annotations for the fit parameters
fig7.add_annotation(x=xfit[-1],  # x-coordinate for the annotation
                    # y-coordinate for the annotation
                    y=slope*xfit[-1] + intercept,
                    # text for the annotation
                    text=f'Linear Fit<br> IQ = {slope:.2f} * PISA + {intercept:.0f}',
                    showarrow=True,  # show an arrow pointing to the annotation
                    arrowhead=3,  # style of the arrowhead
                    ax=-40,  # x-component of the arrow tail
                    ay=100  # y-component of the arrow tail
                    )

fig7.update_layout(
    xaxis=dict(showgrid=True), yaxis=dict(showgrid=True),
    title='IQ vs PISA Score by Region', template='plotly_dark', yaxis_range=[70, None])

fig7.update_traces(marker_symbol="x")
# %%
# arrange the figures and save them as SEPARATE html files, gridding is done in HTML not in plotly
#

for i, fig in enumerate([fig1,  fig2, fig1a, fig2a, fig3, fig4, fig5, fig6, fig7, table]):
    pio.write_html(fig, file=f'frame{i+1}.html')
#
# Keep the root web page inside the Python script and write it at the end of the script
# to make distribution easier: only the CSV data files and the Python script are needed to run it
#
html_page = """
<!DOCTYPE html>
<html>

<head>
    <title>Comparison Lynn 2022 IQ and Pisa vs. Median and Mean Wealth</title>
    <style>
        body {
            display: flex;
            flex-direction: column;
        }

        .iframe-container {
            display: flex;
            flex-wrap: wrap;
        }


        .iframe {
            width: 45%;
            height: 600px;
        }

        .fullwidth iframe {
            width: 100%;
            height: 600px;
        }
    </style>
</head>

<body>
   <div> <iframe style="width: 100%; height: 250px" src="README.html"></iframe>
        <p>
            <a href="README.html" target="_blank">
                <img src="zoom-21-16.png" alt="zoom">
                Read more...
            </a><br>
    </div>

    <div class="iframe-container">

        <iframe class="iframe" src="frame1.html"></iframe><br>
        <a href="frame1.html" target="_blank">
            <img src="zoom-21-16.png" alt="zoom">
        </a><br>

        <iframe class="iframe" src="frame2.html"></iframe><br>
        <a href="frame2.html" target="_blank">
            <img src="zoom-21-16.png" alt="zoom">
        </a><br>

        <iframe class="iframe" src="frame3.html"></iframe><br>
        <a href="frame3.html" target="_blank">
            <img src="zoom-21-16.png" alt="zoom">
        </a><br>

        <iframe class="iframe" src="frame4.html"></iframe><br>
        <a href="frame4.html" target="_blank">
            <img src="zoom-21-16.png" alt="zoom">
        </a><br>

        <iframe class="iframe" src="frame5.html"></iframe><br>
        <a href="frame5.html" target="_blank">
            <img src="zoom-21-16.png" alt="zoom">
        </a><br>

        <iframe class="iframe" src="frame6.html"></iframe><br>
        <a href="frame6.html" target="_blank">
            <img src="zoom-21-16.png" alt="zoom">
        </a><br>

        <iframe class="iframe" src="frame7.html"></iframe><br>
        <a href="frame7.html" target="_blank">
            <img src="zoom-21-16.png" alt="zoom">
        </a><br>
        <iframe class="iframe" src="frame8.html"></iframe><br>
        <a href="frame8.html" target="_blank">
            <img src="zoom-21-16.png" alt="zoom">
        </a><br>

        <iframe class="iframe" src="frame9.html"></iframe><br>
        <a href="frame9.html" target="_blank">
            <img src="zoom-21-16.png" alt="zoom">
        </a><br>



    </div>
    <div> <iframe style="width: 100%; height: 400px" src="frame10.html"></iframe>
        <p>
            <a href="frame10.html" target="_blank">
                <img src="zoom-21-16.png" alt="zoom">
            </a><br>
    </div>
    <div class="iframe-container">
        <h2>Resources</h2>
        <p>
            <a href="wealth_iq_pisa.html" target="_blank">
                <img src="zoom-21-16.png" alt="zoom"> Raw Data</a>
            <a href="wealth_iq_pisa.csv" target="_blank">
                <img src="zoom-21-16.png" alt="zoom"> as CSV File for Excel</a>
            <a href="https://github.com/HelmutQualtinger/VermoegenWiki/blob/main/iq.py" target="_blank">
                <img src="zoom-21-16.png" alt="zoom"> Python Source File</a>
    </div>

</body>

</html>
"""
with open('iframes.html', 'w', encoding='utf-8') as file:
    file.write(html_page)
# Open iframes.html in a browser
# take care of different operating systems paths, use pathlib
# create an absolute URI from the path without disclosing
# the local file system structure in the python script
framesfile = path.Path('iframes.html').absolute()
webbrowser.get().open(framesfile.as_uri())

# %%

print(__doc__)