Aggregating functions for DOCS¶
Aggregating file to base DOCS file¶
- Documents.aggregate(filename: str, *, left_on: None | str | callable = None, right_on: None | str | callable = None, on: str | None = None, subset: None | str | List[str] = None, drop: None | str | List[str] = None, rename: dict | None = None, merge_policy: Literal['merge', 'keep', 'erase', 'replace', 'fill_na'] | None = 'merge', preprocessing: None | Callable | Operation = None, postprocessing: None | Callable | Operation = None, read_method: Callable | None = None, kw_read: dict | None = {})¶
Aggregation of a table from an Excel/CSV file.
The
left_onandright_onarguments are column names used to perform a join:left_onis a column in the central fileeffectif.xlsx, andright_onis a column from the file to aggregate. For more complex joins, you can use theguv.helpers.id_slug()andguv.helpers.concat()functions (see Examples). Ifleft_onandright_onhave the same value, you can simply specifyon.subsetis a list of columns to keep if you don’t want to aggregate all columns.dropis a list of columns to remove.renameis a dictionary of columns to rename.read_methodis a callable used to read the file and is called withkw_read.preprocessingandpostprocessingare callables that take a DataFrame as input and return a processed one — they are applied before and after the aggregation, respectively.merge_policyindicates how to handle merging columns with the same name ineffectif.xlsxand the aggregated file.- Parameters:
filename (
str) – Path to the CSV/Excel file to aggregate.left_on (
str) – Column name in theeffectif.xlsxfile used for the join. You can also use functions likeguv.helpers.id_slug()andguv.helpers.concat()for multi-column joins.right_on (
str) – Column name in the file to aggregate used for the join. Functions likeguv.helpers.id_slug()andguv.helpers.concat()are also supported.on (
str) – Shortcut whenleft_onandright_onare the same.subset (
list, optional) – List of columns to include. By default, all columns are incorporated.drop (
list, optional) – List of columns to exclude from aggregation.rename (
dict, optional) – Dictionary to rename columns after incorporation.read_method (
callable, optional) – Function used to load the file. Pandas functions likepd.read_csvandpd.read_excelare automatically selected based on file extension (“.csv”, “.xlsx”).kw_read (
dict, optional) –Keyword arguments passed to
read_method. For example, for a “.csv” file:kw_read={"header": None, "names": ["Email", "PW group"]} kw_read={"na_values": "-"}
preprocessing (
callable, optional) – Pre-processing function applied to the DataFrame before incorporation.postprocessing (
callable, optional) – Post-processing function applied to the DataFrame after incorporation.merge_policy (
str, optional) –Strategy for merging columns with the same name:
merge: Merge only if columns complement each other (NA values).replace: Use all non-NA values from the file to aggregate.fill_na: Only replace NA values ineffectif.xlsx.keep: Keep the original column fromeffectif.xlsxwithout changes.erase: Overwrite the column with values from the file to aggregate.
Examples
Aggregating columns from a CSV file, matching the
emailcolumn in the CSV withEmail addressin the central file:DOCS.aggregate( "documents/notes.csv", left_on="Email address", right_on="email" )
Aggregating only the
Notecolumn:DOCS.aggregate( "documents/notes.csv", left_on="Email address", right_on="email" subset="Note" )
Aggregating and renaming the
Notecolumn toNote_médian:DOCS.aggregate( "documents/notes.csv", left_on="Email address", right_on="email" subset="Note", rename={"Note": "Note_médian"} )
Aggregating using a CSV without a header:
DOCS.aggregate( "documents/notes.csv", on="Email", kw_read={"header": None, "names": ["Email", "Grade"]}, )
Aggregating based on
NameandLast nameusing a slug ID to allow for flexible matching (ignores accents, cases, hyphens, etc.):from guv.helpers import id_slug DOCS.aggregate( "documents/notes.csv", left_on=id_slug("Name", "Last name"), right_on=id_slug("Name", "Last name") )
Aggregating when the aggregated file has a single column with both names, while the main file separates them:
from guv.helpers import concat DOCS.aggregate( "documents/notes.csv", left_on=concat("Name", "Last name"), right_on="Full_name" )
- Documents.aggregate_jury(filename: str)¶
Aggregates the result of a jury from the task
XlsGradeBookJury.The equivalent using
aggregate()is written as:DOCS.aggregate( "generated/jury_gradebook.xlsx", on="Email", subset=["Aggregated grade", "ECTS grade"] )
- Parameters:
filename (
str) – The path to the file to aggregate.
Examples
DOCS.aggregate_jury("generated/jury_gradebook.xlsx")
- Documents.aggregate_moodle_grades(filename: str, rename: dict | None = None)¶
Aggregates grade sheets exported from Moodle.
The grade sheet must be an export of one or more grades from the Moodle gradebook in the form of an Excel or CSV file.
Unused columns will be removed. Column renaming can be performed by specifying
rename.- Parameters:
filename (
str) – The path to the file to aggregate.rename (
dict, optional) – Dictionary to rename columns after incorporation.
Examples
DOCS.aggregate_moodle_grades("documents/SY02 Notes.xlsx")
- Documents.aggregate_moodle_groups(filename: str, colname: str, backup: bool | None = False)¶
Aggregates group data from the “Group Choice” activity.
Since the group column is always named “Groupe”, the
colnameargument allows you to specify a different name. If the column already exists in the central file, it can be backed up using thebackupoption.- Parameters:
filename (
str) – The path to the file to aggregate.colname (
str) – The column name to store group information in.backup (
bool) – Save the column before making any changes
Examples
DOCS.aggregate_moodle_groups("documents/Paper study groups.xlsx", "Paper")
- Documents.aggregate_org(filename: str, colname: str, on: str | None = None, postprocessing: None | Callable | Operation = None)¶
Aggregation of a file in Org format.
The document to aggregate is in Org format. The headings serve as keys for aggregation, and the content under those headings is aggregated.
- Parameters:
filename (
str) – Path to the Org file to aggregate.colname (
str) – Name of the column in which to store the information from the Org file.on (
str, optional) – Column from theeffectif.xlsxfile used as the key to aggregate with the headings from the Org file. By default, the headings must contain the students’ first and last names.postprocessing (
callable, optional) – Post-processing to apply to the DataFrame after integrating the Org file.
Examples
Aggregating a file where the headings are student names:
The Org file:
* Bob Morane Frequently absent * Untel See excuse email
The aggregation command:
DOCS.aggregate_org("documents/infos.org", colname="Information")
Aggregating a file where the headings match elements from an existing column. For example, you can aggregate project grades from an Org file, grouped by project team. Specify the column containing the project groups, e.g., “Project1_group”.
The Org file:
* Project1_Group1 A * Project1_Group2 B
The aggregation command:
DOCS.aggregate_org("documents/infos.org", colname="Project 1 Grade", on="Project1_group")
- Documents.aggregate_self(*columns)¶
Aggregation of the central file
effectif.xlsxitself.It is sometimes more convenient to manually add or modify columns directly in the central file
effectif.xlsxrather than programmatically usingDOCS.aggregate(...), for example. Since guv cannot reliably detect which columns were manually added or modified, you must specify them explicitly using the*columnsargument.- Parameters:
*columns (any number of
str) – The manually added or modified columns to retain when updating the central file.
Directly modify base DOCS file¶
- Documents.add(filename, func=None, kw_func=None)¶
Adds a file to the central file
Simply adds the file if the central file does not yet exist. Otherwise, uses the function
functo perform the aggregation. The functionfunctakes as arguments the existing DataFrame and the path to the file, and returns the updated DataFrame. Keywords arguments can be given tofuncusingkw_func.If
funcis not provided, use eitherpd.read_csvorpd_read_excelbased on the extension of the file. Keywords arguments can be given topd.read_csvorpd_read_excelusingkw_func.See specialized functions for incorporating standard documents:
aggregate(): CSV/Excel documentaggregate_org(): Org document
- Parameters:
filename (
str) – The path to the file to aggregate.func (
callable, optional) – A function with signature DataFrame, filename: str -> DataFrame that performs the aggregation.kw_func (
dict, optional) – Keyword arguments to use withfuncorpd.read_csvorpd.read_excel.
Examples
Adding data when there is nothing yet:
DOCS.add("documents/base_listing.xlsx")
Adding data when there is nothing yet with extra keyword arguments:
DOCS.add("documents/base_listing.xlsx", kw_func={"encoding": "ISO_8859_1"})
Aggregating using a function:
def function_that_incorporates(df, file_path): # Incorporate the file at `file_path` into the DataFrame `df` # and return the updated DataFrame. DOCS.add("documents/notes.csv", func=function_that_incorporates)
- Documents.apply_cell(name_or_email: str, colname: str, value, msg: str | None = None)¶
Replaces the value of a cell.
name_or_emailis the student’s full name or email address, andcolnameis the name of the column where the change should be made. The new value is provided viavalue.- Parameters:
name_or_email (
str) – The student’s full name or email address.colname (
str) – The name of the column where the modification should be made.value – The value to assign.
msg (
str, optional) – A message describing the operation.
Examples
DOCS.apply_cell("Mark Watney", "DIY Grade", 20)
- Documents.apply_column(colname: str, func: Callable, msg: str | None = None)¶
Modifies an existing column using a function.
colnameis the name of an existing column, andfuncis a function that takes an element from the column as input and returns a modified value.A
msgcan be provided to describe the operation; it will be displayed when the aggregation is performed. Otherwise, a generic message will be shown.- Parameters:
colname (
str) – Name of the column where replacements will be made.func (
callable) – Function that takes an element and returns the modified element.msg (
str, optional) – A message describing the operation.
Examples
DOCS.apply("note", lambda e: float(str(e).replace(",", ".")))
- Documents.apply_df(func: Callable, msg: str | None = None)¶
Modifies the central file using a function.
funcis a function that takes a DataFrame representing the central file and returns the modified DataFrame.A
msgcan be provided to describe what the function does. It will be displayed when the aggregation is performed. Otherwise, a generic message will be shown.- Parameters:
func (
callable) – Function that takes a DataFrame and returns a modified DataFrame.msg (
str, optional) – A message describing the operation.
Examples
Add a student missing from the official listing:
import pandas as pd df_one = ( pd.DataFrame( { "Last name": ["NICHOLS"], "Name": ["Juliette"], "Email": ["juliette.nichols@silo18.fr"], } ), ) DOCS.apply_df(lambda df: pd.concat((df, df_one)))
Remove duplicate students:
DOCS.apply_df( lambda df: df.loc[~df["Email"].duplicated(), :], msg="Remove duplicate students" )
- Documents.compute_new_column(*cols: str, func: Callable, colname: str, msg: str | None = None)¶
Creating a column from other columns
The columns required for the calculation are specified in
cols. In case you want to change the column used for the calculation without modifying thefuncfunction, you can provide a tuple("col", "other_col")wherecolis the name used infuncandother_colis the actual column used.The
funcfunction that computes the new column receives a Pandas Series of all the values from the specified columns.- Parameters:
*cols (list of
str) – List of columns provided to thefuncfunctionfunc (
callable) – Function that takes a dictionary of “column names/values” as input and returns a computed valuecolname (
str) – Name of the column to createmsg (
str, optional) – A message describing the operation
Examples
Weighted average of two grades:
def average(grades): return .4 * grades["Note_médian"] + .6 * grades["Note_final"] DOCS.compute_new_column("Note_médian", "Note_final", func=average, colname="Note_moyenne")
Average ignoring undefined values:
def average(grades): return grades.mean() DOCS.compute_new_column("note1", "note2", "note3", func=average, colname="Note_moyenne")
Recalculation with a modified grade without redefining the
averagefunction:DOCS.compute_new_column( ("note1", "note1_fix"), "note2", "note3", func=average, colname="Note_moyenne (fix)" )
- Documents.fillna_column(colname: str, *, na_value: str | None = None, group_column: str | None = None)¶
Replaces undefined values in the
colnamecolumnOnly one of the options
na_valueandgroup_columnshould be specified. Ifna_valueis specified, values are unconditionally replaced with the provided value. Ifgroup_columnis specified, values are filled by grouping ongroup_columnand using the only valid value in that column per group.- Parameters:
colname (
str) – Name of the column whereNAvalues will be replacedna_value (
str, optional) – Value to replace undefined values withgroup_column (
str, optional) – Name of the column used for grouping
Examples
Replace undefined entries in the
notecolumn with “ABS”:DOCS.fillna_column("note", na_value="ABS")
Replace undefined entries within each group identified by the
groupe_projetcolumn with the only defined value in that group:DOCS.fillna_column("note_projet", group_column="groupe_projet")
- Documents.flag(filename_or_string: str, *, colname: str, flags: List[str] | None = ['Oui', ''])¶
Flag a list of students in a new column
The document to aggregate is a list of student names displayed line by line.
- Parameters:
filename_or_string (
str) – Path to the file to aggregate, or the file content as a string.colname (
str) – Name of the column in which to place the flag.flags (
str, optional) – The two flag values used, default is “Yes” and empty.
Examples
Aggregating a file with student names as header:
The file “tiers_temps.txt”:
# Comments Bob Morane # Robust to circular permutation and case aRcTor BoB
The aggregation instruction:
DOCS.flag("documents/tiers_temps.txt", colname="Tiers-temps")
- Documents.replace_column(colname: str, rep_dict: dict, new_colname: str | None = None, backup: bool | None = False, msg: str | None = None)¶
Replacements in a column
Replaces values specified in
rep_dictin the columncolname.If the
backupargument is provided, the column is saved before any modification (with a_origsuffix). If thenew_colnameargument is provided, the column is copied to a new column namednew_colnameand changes are made on this new column.A message
msgcan be specified to describe what the function does; it will be displayed when the aggregation is performed. Otherwise, a generic message will be shown.- Parameters:
colname (
str) – Name of the column in which to perform replacementsrep_dict (
dict) – Dictionary of replacementsnew_colname (
str) – Name of the new columnbackup (
bool) – Save the column before making any changesmsg (
str, optional) – A message describing the operation
Examples
DOCS.replace_column("group", {"TD 1": "TD1", "TD 2": "TD2"})
ECTS_TO_NUM = { "A": 5, "B": 4, "C": 3, "D": 2, "E": 1, "F": 0 } DOCS.replace_column("Note_TP", ECTS_TO_NUM, backup=True)
- Documents.replace_regex(colname: str, *reps, new_colname: str | None = None, backup: bool | None = False, msg: str | None = None)¶
Regex replacements in a column
Replaces, in the
colnamecolumn, all occurrences matching the regular expressions specified inreps.If the
backupargument is provided, the column is saved before any modification (with a_origsuffix). If thenew_colnameargument is provided, the column is copied to a new column namednew_colnameand the modifications are applied to that new column.A
msgmessage can be specified to describe what the function does; it will be displayed when the aggregation is performed. Otherwise, a generic message will be shown.- Parameters:
colname (
str) – Name of the column in which to perform replacements*reps (any number of
tuple) – Pairs of regex / replacementnew_colname (
str) – Name of the new columnbackup (
bool) – Save the column before making any changesmsg (
str, optional) – A message describing the operation
Examples
DOCS.replace_regex("group", (r"group([0-9])", r"G\1"), (r"g([0-9])", r"G\1"))
- Documents.switch(filename_or_string: str, *, colname: str, backup: bool = False, new_colname: str | None = None)¶
Performs value swaps in a column
The
colnameargument specifies the column in which to perform the swaps. If thebackupargument is provided, the column is saved before any modification (with a_origsuffix). If thenew_colnameargument is provided, the column is copied to a new column namednew_colnameand the modifications are applied to this new column.- Parameters:
filename_or_string (
str) – Path to the file to aggregate, or the content of the file as a string.colname (
str) – Name of the column where the changes will be appliedbackup (
bool) – Save the column before making any changesnew_colname (
str) – Name of the new column
Examples
DOCS.switch("fichier_échange_TP", colname="TP")
DOCS.switch("Dupont --- Dupond", colname="TP")