Privacy


To understand the privacy concerns surrounding this learning analytics tool, it is important to understand the legal framework that protects students' educational records and gives the parents of students under 18 the right to exercise those protections on their children's behalf. The Family Educational Rights and Privacy Act (FERPA) was created to protect students' privacy. All schools that receive funding from the United States Department of Education are required to know and comply with these regulations. A school that fails to do so can face investigations and audits, mandated compliance plans, and possibly funding sanctions, among other consequences. Schools, districts, and individual staff members, including educators, can also face consequences for FERPA violations. Therefore, schools must have established policies and procedures to ensure student privacy. Furthermore, with technological tools increasingly within reach for both students and school staff, the Children's Online Privacy Protection Act (COPPA) and the Children's Internet Protection Act (CIPA), both meant to protect children's activity and privacy online, should be considered as well. These regulations protect students' privacy, but what about teachers' personal information? Educators' information is protected by the Federal Trade Commission (FTC) and, in some cases, by state regulations such as the California Online Privacy Protection Act (CalOPPA).

To dig deeper into the main privacy concerns surrounding our learning analytics tool, it is important to recognize that these concerns may arise from either the student's or the teacher's perspective. Therefore, we will walk through the four categories that Solove's taxonomy identifies as areas where privacy violations can lead to potential harms (Daniel J. Solove, A Taxonomy of Privacy, 2006): information collection, information processing, dissemination of information, and invasion.

This document will not analyze each of Solove's concepts in depth; instead, it will focus on the potential harms that might occur through use of this tool.

Information Collection

Since this learning analytics tool tracks students' performance and teachers' rigor throughout its use, there is a risk of surveillance harm. Because the tool relies on continuous data collection and analysis, both students and teachers may experience anxiety and discomfort from being constantly monitored for the sake of improving their school's performance.

Information Processing

Even though anonymization of students and teachers is a requirement for using this tool, an Identification risk remains from the information that is included: because the dataset contains demographic information for both teachers and students, records could be re-identified, which would violate FERPA. Combining these same fields also creates an Aggregation risk, because our analysis could reveal new facts about a teacher or student that they did not expect to be known about them. Finally, an Exclusion risk may appear if the information is used to decide which students can enroll in intervention classes or which educator is assigned more students or courses, leaving others out without understanding how their information was used in those decisions.
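To make the Identification and Aggregation risks concrete, consider a simple k-anonymity check: counting how many records share each combination of demographic fields. The sketch below is a hypothetical illustration in Python, not part of the tool itself, and the column names and data are assumptions.

    import pandas as pd

    # Hypothetical anonymized records; column names are illustrative assumptions.
    records = pd.DataFrame({
        "grade_level": [9, 9, 10, 10, 10],
        "gender":      ["F", "M", "F", "F", "M"],
        "ethnicity":   ["Latinx", "Black", "White", "White", "White"],
    })

    QUASI_IDENTIFIERS = ["grade_level", "gender", "ethnicity"]

    # Count how many records share each combination of demographic fields.
    group_sizes = records.groupby(QUASI_IDENTIFIERS).size()

    # Any combination held by exactly one record can single a person out,
    # even though names and IDs were removed during anonymization.
    unique_rows = group_sizes[group_sizes == 1]
    print(f"{len(unique_rows)} of {len(records)} records are uniquely "
          "identifiable from demographics alone")

In this toy example, three of the five records are unique on these three fields, so anyone who knows those students' grade level, gender, and ethnicity could re-identify them despite the anonymization.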

Dissemination of Information

Breach of Confidentiality and Exposure risks are always present, and even more so given the Identification risk described above. Leaking or disclosing information about students' performance in school may cause embarrassment and humiliation.

Invasion

Lastly, the Decisional Interference risk matters because decisions made by boards or principals using the information this tool provides can affect students' academic and even professional choices, as well as teachers' professional careers. Therefore, our tool must provide the strongest security protocols, the information and reports it produces should be aggregated rather than individualized, and schools should obtain teachers' informed consent to process and use their information.
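One common way to honor the recommendation that reports be aggregated rather than individualized is to suppress any group that falls below a minimum size before publishing. The following is a minimal sketch under assumed column names and an assumed threshold, not the tool's actual reporting pipeline.

    import pandas as pd

    MIN_GROUP_SIZE = 5  # assumed suppression threshold; real policies vary

    def aggregate_report(df: pd.DataFrame, group_col: str) -> pd.DataFrame:
        """Mean score per group, suppressing groups too small to stay anonymous."""
        summary = df.groupby(group_col)["score"].agg(["mean", "count"])
        # A mean computed over very few people effectively describes
        # individuals, so drop those rows instead of reporting them.
        return summary[summary["count"] >= MIN_GROUP_SIZE]

A report built this way can still surface school-level patterns while making it harder to trace any published number back to a single student or teacher.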

Ethics


Aside from the legal concerns that come with the territory when handling the data of students under the age of 18, it is important to consider the ethical implications of this work as well. In particular, we need to take into consideration what bias might be inherent in our data. The specific data we are working with, from a high school in the American Midwest, was provided to us already cleaned; because of this, we do not know what the data collection process was like. There is a great deal of room for bias to be inserted, consciously or unconsciously, into any dataset where human decision-making shapes how the data is documented, included, or excluded.

One example of this that is likely an issue in our dataset has to do with how student identities are generalized in the data. In this context, students' demographic data is often collected using school-wide surveys. For the information collected on these surveys to be useful in understanding demographic trends, demographic identities must be defined in terms narrow enough to create a reasonably low number of categories. Humans, however, have no such limits imposed on their actual experience and identity formation. This means that what students end up writing down on the survey may not always fully and accurately reflect their identities. Generalizing demographic data in this way is necessary for the data to provide useful categories for comparison, but it also inherently limits the nuance of the insights that can be drawn from it and, in a sense, their reliability. Imagine that a brown-skinned Latinx student selects that they are Latinx on a survey, and then is also made to select a race, but the only options are "Black," "White," "Asian or Asian American," "American Indian or Alaska Native," or "Native Hawaiian or other Pacific Islander." None of these apply exactly to this student, so they choose the one they most identify with, perhaps "White," even though it does not fully reflect their identity. The dataset now associates this student with a white Latinx identity, which could lead to slightly unreliable conclusions, since the data does not tell the whole story of this student's identity.

This example is part of a bigger concern about this data: each line in our dataset represents a student, and each student is a full human being whose motivations and competencies cannot be easily reduced to a single line of information. It is inherently risky to boil down the human experience this much; we are very aware of the risk of overgeneralizing, and for this reason we use this data to provide a reference resource, not an identification of a definite cause of, or solution to, any problem. In reality, there are many factors beyond the ones we investigate here that could impact a student's performance in class and/or on standardized tests. For example, some students struggle with test anxiety, some experience homelessness, and some have undiagnosed or unaccommodated learning differences. Any of those all-too-common life experiences could impact a student's learning trajectory, just as their teachers or courses can.

The goal of this tool is not to establish the cause of any insight it generates. The goal of this tool is simply to share information based on school data. This landscape is tough to navigate because it is easy to jump to the conclusion that if all of the Black students in a teacher’s course are failing, and everybody else is passing, then the teacher must be racist. In reality, there are many reasons why certain students fail while others do not, and the situation is likely far more nuanced than we assume at first glance, both in its cause and in the intervention that may be necessary to rectify it. This tool reveals correlations between different teachers, courses, and student demographics, and supplies novel metrics for measuring teacher effectiveness. It does not make the leap into establishing the causes of performance differences or prescribing interventions to address them.
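In that spirit, the kind of output this tool produces stays descriptive. As a hypothetical sketch (the column names and passing cutoff are assumptions, not the tool's actual metrics), a pass-rate table by teacher and demographic group reports the correlation and nothing more:

    import pandas as pd

    def pass_rate_table(df: pd.DataFrame) -> pd.DataFrame:
        """Descriptive pass rates by teacher and demographic group.
        These are correlations only; they say nothing about why groups differ."""
        df = df.assign(passed=df["final_grade"] >= 60)  # assumed passing cutoff
        return (df.groupby(["teacher_id", "ethnicity"])["passed"]
                  .mean()
                  .rename("pass_rate")
                  .reset_index())

Whether a gap in such a table reflects curriculum, scheduling, outside circumstances, or something else entirely is exactly the question this tool leaves to human judgment.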