Using Detailed Access Trajectories for Learning Behavior Analysis

Yanbang Wang; Nancy Law; Erik Hemberg; Una-May O'Reilly

arXiv:1812.05767·cs.HC·December 17, 2018


TL;DR

This paper introduces Detailed Access Trajectories (DATs), a new data organization method for MOOC learner activity that captures rich behavioral information at an intermediate granularity, enabling improved analysis of learning behaviors.

Contribution

The paper proposes DATs as a novel data structure for MOOC activity analysis and demonstrates their usefulness through four empirical studies.

Findings

01

DATs contain rich behavioral information

02

DATs facilitate detailed MOOC learning analysis

03

Empirical studies validate DATs' effectiveness

Abstract

Student learning activity in MOOCs can be viewed from multiple perspectives. We present a new organization of MOOC learner activity data at a resolution that is in between the fine granularity of the clickstream and coarse organizations that count activities, aggregate students or use long duration time units. A detailed access trajectory (DAT) consists of binary values and is two dimensional, with one axis that is a time series, e.g. days, and the other that is a chronologically ordered list of a MOOC component type's instances, e.g. videos in instructional order. Most popular MOOC platforms generate data that can be organized as detailed access trajectories (DATs). We explore the value of DATs by conducting four empirical mini-studies. Our studies suggest DATs contain rich information about students' learning behaviors and facilitate MOOC learning analyses.

Tables

Table 1. Number of students and log events for 6.00.1x and 6.00.2x. We use the notation in brackets as identifiers throughout the paper.

Course	# Students	# Log Events
6.00.1x Summer 2016A (1A)	113,099	17,333,974
6.00.1x Summer 2016B (1B)	40,727	7,900,908
6.00.1x Spring 2017 (1C)	69,399	13,176,220
6.00.2x Spring 2016 (2A)	18,362	2,642,528
6.00.2x Fall 2016 (2B)	22,023	2,501,276
6.00.2x Spring 2017 (2C)	18,281	2,034,539

Table 2. Resource quantities in terms of videos and finger exercises, with 6.00.1x having many more than 6.00.2x.

Course	# Videos	# Problems
6.00.1x	81	212
6.00.2x	43	156

Table 3. Cutoff frequencies that affect grade distribution and the corresponding p-values, all of which are statistically significant ($p < 0.05$). $N = 591$.

LBP	Cutoff Frequency	p-value
“return to most recently watched”	7	0.014
“return after long time”	10	0.033
“return to previously skipped”	8	0.012

Table 4. Size of some potentially marginalized student groups in MITx 6.001, spring 2016.

Group Category	Proportion	Absolute Amount
Students with low education degree [Footnote 6]	3.5%	3,924
Students from low-income economies	1.4%	1,547

[Footnote 6: Contains students who declare primary school or junior high school as their highest level of education.]

Table 5. Convolutional autoencoder structure for Courses 2A, 2B and 2C. Activation function: ReLU. Input DATs are zero-padded to $44 \times 64$.

Layer	Size-in	Size-out	Kernel, Stride
Conv1	$44 \times 64 \times 1$	$22 \times 32 \times 16$	( $2 \times 2$ ), $2$
Conv2	$22 \times 32 \times 16$	$11 \times 16 \times 32$	( $2 \times 2$ ), $2$
Conv3	$11 \times 16 \times 32$	$5 \times 8 \times 64$	( $2 \times 2$ ), $2$
Conv4	$5 \times 8 \times 64$	$2 \times 4 \times 128$	( $2 \times 2$ ), $2$
Fc1	$1024$	$10$	-
Fc2	$10$	$1024$	-
ConvTranspose1	$2 \times 4 \times 128$	$5 \times 8 \times 64$	( $3 \times 2$ ), $2$
ConvTranspose2	$5 \times 8 \times 64$	$11 \times 16 \times 32$	( $3 \times 2$ ), $2$
ConvTranspose3	$11 \times 16 \times 32$	$22 \times 32 \times 16$	( $2 \times 2$ ), $2$
ConvTranspose4	$22 \times 32 \times 16$	$44 \times 64 \times 1$	( $2 \times 2$ ), $2$
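
The layer sizes in Table 5 can be checked with a little arithmetic. The sketch below is only a consistency check, not the authors' implementation: it assumes unpadded convolutions and unpadded transposed convolutions with no dilation, and the names `conv_out`, `conv_transpose_out` and `trace_shapes` are hypothetical helpers introduced here. Under those assumptions the listed kernels and strides reproduce every Size-in and Size-out entry, and the flattened encoder output has the 1024 values that Fc1 expects.

```python
def conv_out(size, kernel, stride):
    """Output length of an unpadded convolution along one axis."""
    return (size - kernel) // stride + 1


def conv_transpose_out(size, kernel, stride):
    """Output length of an unpadded transposed convolution along one axis."""
    return (size - 1) * stride + kernel


# ((kernel_height, kernel_width), stride, output_channels), copied from Table 5.
ENCODER = [((2, 2), 2, 16), ((2, 2), 2, 32), ((2, 2), 2, 64), ((2, 2), 2, 128)]
DECODER = [((3, 2), 2, 64), ((3, 2), 2, 32), ((2, 2), 2, 16), ((2, 2), 2, 1)]


def trace_shapes(height=44, width=64, channels=1):
    """Return the (height, width, channels) shape after every layer,
    plus the number of values fed into the first fully connected layer."""
    shapes = [(height, width, channels)]
    for (kh, kw), stride, channels in ENCODER:
        height = conv_out(height, kh, stride)
        width = conv_out(width, kw, stride)
        shapes.append((height, width, channels))
    flat = height * width * channels  # input size of Fc1
    for (kh, kw), stride, channels in DECODER:
        height = conv_transpose_out(height, kh, stride)
        width = conv_transpose_out(width, kw, stride)
        shapes.append((height, width, channels))
    return shapes, flat


shapes, flat = trace_shapes()
for shape in shapes:
    print(shape)
print("Fc1 input size:", flat)
```

The $3 \times 2$ kernels in the first two transposed convolutions are what restore the odd heights 5 and 11 that the encoder produced by rounding down.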

Taxonomy

Topics: Online Learning and Analytics · Software System Performance and Reliability · Data Stream Mining Techniques

Full text

Yanbang Wang

Department of Computer Science and Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong SAR, China

Nancy Law

Faculty of Education, The University of Hong Kong, Pokfulam, Hong Kong SAR, China

Erik Hemberg

Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, 32 Vassar Street, Cambridge, MA, USA

Una-May O’Reilly

Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, 32 Vassar Street, Cambridge, MA, USA

(2019)

Keywords: Massive Open Online Course (MOOC), learning behavior pattern, learning design pattern, marginalized learners, representation learning

Conference: International Conference on Learning Analytics & Knowledge; March 2019; Arizona, US. CCS concepts: Information systems, Clustering and classification; Human-centered computing, Visual analytics; Applied computing, E-learning.

1. Introduction

The analysis of students’ learning behavior has been a major focus for MOOC learning analytics (Zhuoxuan et al., 2015; Rai and Chunrao, 2016; Davis et al., 2018; Boroujeni and Dillenbourg, 2018). Popular MOOC platforms, like edX and Coursera, usually provide comprehensive clickstream logs of all interactions with the platform, or organized data in BigQuery tables. This data enables learning behavior analytics at many different granularities and across many behavior categories. Aggregation is often an efficient way to analyze large numbers of students and large volumes of activity data. Existing works have examined a wide range of perspectives, from general group behaviors to detailed, event-wise individual browser POST and GET requests.

Here, we study MOOC learning behaviors with a detailed access trajectory (DAT). This representation allows us to study learning patterns and behaviors from the perspective of when a particular student accesses a particular component, assuming the components are ordered by when they appear in the course material. The DAT representation is inspired by (Halawa et al., 2014), where student learning trajectories are visualized as a step-like signal. This is shown in the top plot of Figure 1 as an “activity plot”. It shows the number of days from the start of the course on the horizontal axis and the unit accessed on the vertical axis. For example, if on a day $d$ a student accesses material from unit $u$, then the plot has a mark at coordinate $(d,u)$. This reflects how a student proceeds through the course, accessing unit material over time. DATs expand the original work by detailing one type of course component, e.g.:

A) video watching,

B) problem submission, or

C) active and/or passive forum participation (reading and submitting).

The DAT gives insights into how many and which course components are viewed, skipped and revisited, when this happens, how long a student is absent, when a student stops out, etc. In this paper, we leverage the advantages of DATs for four explorations:

(1) Learning Behavior Patterns. We visualize video-watching DATs and observe a distinctive behavior pattern where the last video of the previous day is the first video the student watches on the next day. We ask whether this behavior, which could be interpreted as either knowledge reinforcement or video watching completion, is correlated with grade. We also observe two more distinctive patterns, where a video introducing material early in the course is revisited much later on, or is skipped at first but visited much later on. These behaviors, again, could have multiple interpretations. We also investigate their correlation with grade.

(2) Learning Design Patterns. We probe the possibility that DATs can inform learning design hypotheses about Learning Design Patterns. This is a core concern of designers and instructors when they design their course for efficient student learning.

(3) Background Examination. We use DATs to examine students from educational and geographical backgrounds that make them potentially marginalized. The DAT helps identify whether these students are struggling with their MOOC studies, potentially allowing them to receive appropriate help.

(4) Dimensionality Reduction. We ask how a large quantity of DATs can be summarized and mapped into a low-dimensional embedding that can be used as input to modeling or analyzed with 2D visualization, with the potential for observing clusters.

The rest of the paper is structured as follows: Section 2 presents related work. The courses we use for demonstration are introduced in Section 3. Section 4 defines a DAT. Section 5 covers Explorations (1) and (2). Section 6 covers Exploration (3). Section 7 presents findings for Exploration (4). Finally, Section 8 concludes and discusses future work.

2. Related Work

Explorations(1, 2) Learning Patterns

There has been a lot of research over the past two decades on teaching as a design science (Laurillard, 2013). Some work in studying learning design in the context of learning technologies has been inspired by (Alexander, 1979)’s concept of design patterns, which are “invariants” underlying successful designs. In the context of learning design, the core elements of a design pattern comprise descriptions of the “problem” (the learning outcome to be achieved), the context (the learning situation, including the course and student contexts), and the “solution” (the sequences of learning activities involving tangible, virtual and social interactions). Learning design patterns (LDP) are the (not necessarily conscious) assumptions teachers have about how students should interact with specific materials and engage in designated learning activities for effective learning (Law et al., 2017). Learning behavior patterns (LBP) are, in contrast, what students actually do. By identifying empirically the localized learning behavior patterns exhibited by students, teachers can find out:

(1) what proportion of the students actually exhibit behavior as intended by the teacher’s LDP, and whether those following the intended LDP exhibit better learning outcomes;

(2) what other learning behavior patterns exist, and whether any of these patterns are strongly correlated with students’ learning success or failure;

(3) whether students’ adoption of the observed patterns depends on their contextual backgrounds, and

(4) whether the effectiveness of the observed learning behavior patterns interacts with students’ contextual backgrounds.

Answers to the above questions would make significant contributions to teachers’ learning design knowledge and practices, providing evidence-based input to personalized learning designs that are sensitive to both the specific learning objectives targeted and the students’ contexts.

Exploration(3) Background Examination

Previous works have noted that students on popular MOOC platforms have highly diverse backgrounds (DeBoer et al., 2013a; Pursel et al., 2016; DeBoer et al., 2013b). Some studies went one step further by examining the correlation between students’ backgrounds and their learning behaviors. For example, (Hood et al., 2015) analyzed survey data from MOOC users and found that those with strong data science backgrounds differ significantly from other students in terms of their self-regulated learning; (Guo and Reinecke, 2014) investigated how students’ demographic background could affect their navigation strategies, and found that older students and students from countries with a low student-teacher ratio are more likely to follow a steady learning pattern.

Though the connection between background and learning behaviors is widely studied, very few works have studied the learning behaviors of marginalized groups by analyzing their learning data, even though many studies have mentioned the importance of studying marginalized student groups (Brugha and Restoule, 2016; Wilson, 2018; McAndrew and Scanlon, 2013).

Exploration(4) Dimensionality Reduction

The compact representation of a DAT has variable length (and the matrix representation is impractical). Finding a fixed-length numerical vector that could represent a DAT would support its use in existing modeling contexts, such as

1) predicting grades by student’s learning behaviors (Ren et al., 2016; Elbadrawy et al., 2016; Meier et al., 2016; Xu and Yang, 2016);

2) student grouping (clustering) or subpopulation analysis (Corrin et al., 2017; Kizilcec et al., 2013; Ferguson and Clow, 2015);

3) transferring knowledge about student populations across courses (Boyer and Veeramachaneni, 2015; He et al., 2015).

These works require numerical vector representations as input and rely on calculating a number of learning-related features (e.g. number of watched videos, frequency of login, etc.). Three major problems exist with this approach:

(1) Many features are highly correlated with each other. For example, “number of watched videos” is a ubiquitously used strong feature that is highly correlated with other features such as “number of assignment submissions”, “number of forum posts”, “number of video pauses”, etc. (used by (Corrin et al., 2017)).

(2) Since all the features are manually designed, very often some aspects of learning behavior are subjectively overemphasized, while others are ignored. This problem, along with (1), often leads to strong bias in the final feature vector.

(3) The handcrafted features are usually high-level statistical aggregations. However, a lot of information is contained in shorter time windows, such as the periodicity of material access and frequent material revisits over a short time. The manually designed features usually fail to capture such subtleties.

3. Demonstration Courses

We analyze two courses on edX:

A) MITx 6.00.1x Introduction to Computer Science and Programming Using Python,

B) MITx 6.00.2x Introduction to Computational Thinking and Data Science.

Each course has three offerings in 2016 and 2017. The student population and total activity of each offering vary (see Table 1), and the demographics are diverse. Each offering lasts 10 weeks. The final grade is the weighted sum of scores in finger exercises (weight = 0.1), problem sets (weight = 0.4), midterm or quiz (weight = 0.25), and final exam (weight = 0.25). Both courses have multiple units, where each unit has an associated graded problem set. Students are expected to watch lecture videos narrated by instructors and complete “finger exercises”, optional problems interspersed among the lecture videos that teach the content discussed in the video. Forum participation is optional and each discussion forum contains thousands of posts. The topics of the two courses differ because one course is the continuation of the other. The quantities of videos and finger exercises are much higher in 6.00.1x (see Table 2).

For data preprocessing, we reference multiple BigQuery tables for each offering and extract three DATs for each student, using the tables named, respectively:

A) video_stats_day,

B) person_problem,

C) forum_person.

These correspond to video accesses, problem set accesses and forum accesses (read or write). The first two DATs can be generated just as straightforwardly on the Coursera platform, but generating a forum-participation DAT is slightly more platform-dependent.

4. Definition: Detailed Access Trajectory

A DAT is logically envisioned as a 2D matrix of size $D \times N$, where the rows are course days $(1,\dots,D)$ and the columns are ordered course components $(1,\dots,N)$. The components are ordered as they are presented within the course structure. The entry at $(d_{i},c_{j})$ is either $1$ or $0$, with $1$ denoting that the student accessed the $j$-th component on the $i$-th day. Because this matrix would be large and sparse, we use a compact representation: a series that lists only the entries set to $1$ (for access) by their day and component indices.
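
The relationship between the matrix view and the compact series can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' code: the function names `to_compact` and `to_matrix` are hypothetical, and the compact form is assumed to be a sorted list of 1-indexed `(day, component)` pairs.

```python
def to_compact(matrix):
    """List the (day, component) index pairs of all entries equal to 1.

    `matrix` has one row per course day and one column per component;
    indices in the result are 1-based, as in the definition above.
    """
    return [
        (d + 1, c + 1)
        for d, row in enumerate(matrix)
        for c, value in enumerate(row)
        if value == 1
    ]


def to_matrix(compact, n_days, n_components):
    """Rebuild the binary D x N matrix from the compact series."""
    matrix = [[0] * n_components for _ in range(n_days)]
    for day, component in compact:
        matrix[day - 1][component - 1] = 1
    return matrix


# A toy student: 3 course days, 4 videos.
dat_matrix = [
    [1, 1, 0, 0],  # day 1: videos 1 and 2
    [0, 0, 0, 0],  # day 2: inactive
    [0, 1, 1, 0],  # day 3: video 2 again, then video 3
]
dat = to_compact(dat_matrix)
print(dat)
```

The compact form grows with the number of accesses rather than with $D \times N$, which is why its length varies from student to student.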

5. Learning Patterns

5.1. Learning Behavior Patterns (LBP)

The point of this exploration is to determine whether any learning behavior pattern (LBP) we can discern by visualizing the DAT is informative to teachers who start from an assumed LDP. We visualize video-watching DATs to look for multiple “localized video watching patterns”. We observe a distinctive behavior pattern where the last video of the previous day is the first video the student watches on the next day. For simplicity, we dub it “return to most recently watched”. When visualized, the pattern looks like a step signal, as shown in the top plot of Figure 2. It suggests that the student reviews, or finishes learning, previously accessed material when he/she starts learning each day. We ask whether this behavior, which could be interpreted as either knowledge reinforcement or video watching completion, is correlated with grade.

Adopting the same method, we also observe two more distinctive patterns: 1) “return after long time”, where a video introducing material early in the course is revisited much later on; 2) “return to previously skipped”, where a video introducing material early in the course is skipped at first but visited much later on. Both behaviors, again, could have multiple interpretations. For “return after long time”, one possible interpretation is that the student is actively learning, explicitly deciding to review previous material. Another is that the video was left unfinished. To distinguish this behavior from the “return to most recently watched” behavior, we only count a pattern as “return after long time” when a person re-watches a video after at least one active day on which he/she did not watch that video, as illustrated in the top plot of Figure 3. “Return to previously skipped” possibly illustrates a latent pattern of active learning: a student realizes that certain lecture videos skipped early on actually matter, and so he/she consciously seeks out those lecture videos to pick up the missing knowledge.

The scatter plots at the bottom of the three figures visualize the correlation between the occurrence frequency of each local pattern and grade. Figure 2 shows that most students do not “return to most recently watched” video very often. While no significant difference in grade distribution can be observed among students who do it occasionally, students who “return to most recently watched” more regularly often get high grades. This intriguing pattern is observed in all six offerings, which suggests that regular “return to most recently watched” behavior is strongly correlated with a high course grade. Figures 3 and 4 can be interpreted in a similar way (observe that the grades of students with high pattern frequency concentrate in the upper half of each plot). This indicates that a student with more frequent “return after long time” and “return to previously skipped” behavior has a higher likelihood of receiving a high grade in the course.

Further statistical testing is conducted to verify our observations and to determine the exact cutoff pattern frequencies that most prominently affect students’ grades. This is done by iterating through all possible values of the cutoff frequency (from 1 to the maximum pattern frequency $-$ 1) and selecting the one that yields the lowest p-value in a one-tailed t-test. Table 3 summarizes the resulting cutoff frequencies of the LBPs that affect grade, and their corresponding p-values, which measure how strongly students’ grades are affected by exhibiting pattern frequencies below or above the cutoff value.
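
One way to count the three patterns from a compact video-watching DAT is sketched below. This is a hedged illustration, not the authors' procedure. It assumes the DAT is a list of `(day, video)` pairs, that "previous day" means the previous active day, and, because a binary day-level DAT does not record the order of accesses within a day, that the highest-indexed video of a day stands in for the "last" video and the lowest-indexed one for the "first". All function names are hypothetical.

```python
from collections import defaultdict


def videos_by_active_day(dat):
    """Group a compact DAT into [(day, {videos})], ordered by day."""
    by_day = defaultdict(set)
    for day, video in dat:
        by_day[day].add(video)
    return [(day, by_day[day]) for day in sorted(by_day)]


def count_return_to_most_recent(dat):
    """Active days that start at the video on which the previous one ended."""
    days = videos_by_active_day(dat)
    return sum(
        1 for (_, prev), (_, cur) in zip(days, days[1:]) if max(prev) == min(cur)
    )


def count_return_after_long_time(dat):
    """Re-watches separated by at least one active day without that video."""
    last_seen = {}  # video -> position of the active day it was last watched
    count = 0
    for position, (_, videos) in enumerate(videos_by_active_day(dat)):
        for video in videos:
            if video in last_seen and position - last_seen[video] >= 2:
                count += 1
            last_seen[video] = position
    return count


def count_return_to_skipped(dat):
    """First-time views of a video that lies before one already watched earlier."""
    seen = set()
    count = 0
    for _, videos in videos_by_active_day(dat):
        furthest = max(seen) if seen else 0
        count += sum(1 for v in videos if v not in seen and v < furthest)
        seen |= videos
    return count


dat = [(1, 1), (1, 2), (2, 2), (2, 3), (3, 5), (4, 1), (5, 4)]
print(count_return_to_most_recent(dat))   # day 2 starts at video 2
print(count_return_after_long_time(dat))  # video 1 is re-watched on day 4
print(count_return_to_skipped(dat))       # video 4 is first seen after video 5
```

The resulting counts per student are the pattern frequencies that can then be compared against grades.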
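
The cutoff scan can be sketched as follows. This is a simplified stand-in, not the test the paper reports: to stay within the Python standard library it ranks cutoffs by the Welch t statistic for "grades at or above the cutoff are higher" instead of by the p-value of a one-tailed t-test. The two rankings need not agree exactly, because the degrees of freedom change with the cutoff; turning each statistic into a p-value would require a t-distribution CDF from a statistics library. The names `welch_t`, `best_cutoff` and the `min_group` guard are assumptions introduced here.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(high, low):
    """Welch t statistic for mean(high) - mean(low)."""
    standard_error = sqrt(variance(high) / len(high) + variance(low) / len(low))
    return (mean(high) - mean(low)) / standard_error


def best_cutoff(frequencies, grades, min_group=2):
    """Scan cutoffs 1 .. max(frequencies) - 1 and return (cutoff, t) with the
    largest t statistic, or None if no cutoff leaves two usable groups."""
    best = None
    for cutoff in range(1, max(frequencies)):
        above = [g for f, g in zip(frequencies, grades) if f >= cutoff]
        below = [g for f, g in zip(frequencies, grades) if f < cutoff]
        if len(above) < min_group or len(below) < min_group:
            continue  # the sample variance needs at least two grades per group
        t = welch_t(above, below)
        if best is None or t > best[1]:
            best = (cutoff, t)
    return best


# Toy data: pattern frequency per student and the matching course grade.
frequencies = [0, 1, 2, 3, 4, 5, 6, 7]
grades = [0.50, 0.55, 0.52, 0.58, 0.90, 0.92, 0.95, 0.91]
print(best_cutoff(frequencies, grades))
```

On the toy data the scan settles on the cutoff that separates the four low grades from the four high ones.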

5.2. Learning Design Pattern

Learning is not only about how much effort one spends on learning, but also about how one distributes the effort. Various hypotheses and theories exist regarding specific patterns of learning material access and learning activity engagement that students can follow to achieve better course performance. Course instructors can design their courses based on pedagogical theories and their own professional experience to encourage such learning patterns and thus help students learn better. We refer to the design patterns that instructors adopt in their course design for students to follow as learning design patterns (LDPs) (Law et al., 2017).

LDPs are usually difficult to evaluate and justify with traditional experimental methods, due to limited observation size and data collection difficulties. The advent of MOOCs provides valuable opportunities to examine LDPs on a larger scale with more visual and statistical analysis. For example, education experts and the MITx course designer have jointly identified an LDP regarding watching videos and participating in forum discussion in the MITx 6.00.1x course. Course instructors from MIT identified similar LDPs in the other computational-thinking course we investigate. Supported by DATs, we explore one LDP identified by experts. The LDP states that “students should participate in forum discussions shortly after they watch a video”. Our analysis will help answer the following questions:

•

How many students exhibit the learning behavior pattern that aligns with the LDP? Is the number significant enough relative to the course population?

•

What are the grades of students who did or did not exhibit such LDP-aligned behavior patterns?

•

Are there any ambiguities in the definition of the LDP? How might such ambiguities affect designer conclusions?

We start by counting the number of certified participants in different groups [Footnote 1: We restrict our exploration to certified students, who received a certificate for finishing the course after the course ended], including video watchers, forum viewers and forum participants. We calculate the average overall course grade (on a unit scale) of the students within each group, along with the grade variance. The aggregations are visualized in Figure 6. In the smallest offering, MITx 6.00.2x Spring 2017, almost all certified course participants watched the videos and viewed the forums. However, fewer than half of the certified students participated in the forum discussions.

With regard to the specific learning design pattern, we see from the fifth and sixth bars that most students who watch videos and participate in forums follow the specific learning design pattern [Footnote 2: There is ambiguity in the statement as to what exactly “shortly” means: it could refer to a time lag of one, two or three days, or even longer].

The red line in Figure 6 indicates that average grades do not vary significantly among the different groups of students, especially the last two groups, which we care most about. We then performed a two-tailed t-test on the last two groups:

•

$H_{0}$ : Students who view forums within two days after watching videos (Group Y, hereafter) have the same average grade as students who never view forums within two days after watching videos (Group N, hereafter);

•

$H_{1}$ : Group Y does not have the same average grade as Group N.

The p-value is 0.6841, which is statistically insignificant. The same analysis was repeated on the other five offerings, again with no significant differences. This leads us to conclude that, given the large within-group grade variance (as visualized by the red error bars), no strong correlation is observed between the LDP and grade. Previous studies similarly reported a weak correlation between forum participation and grade among students who pass the course (Wise and Cui, 2018).

Despite the insignificance, we can still investigate this LDP in more detail. We ask the question: are students more likely to go to a forum shortly after watching a lecture video? Notice that the key word “shortly” is imprecisely defined. In the previous analysis, we heuristically set the length of “shortly” to two days. Here, “shortly after” is further parameterized as an offset of $n$ days, where $n$ takes an integer value [Footnote 3: $n$ can be negative, in which case the LDP becomes “viewing a forum $|n|$ days before watching a lecture video”]. We define two events for a student [Footnote 4: 1) The problem is modeled such that, for a fixed $n$, the following events are independent and identically distributed across students. 2) We discuss only students who both watch videos and view forums]:

•

Event $A$ : the student watches at least one video on day $x$ ;

•

Event $B$ : the student views at least one forum thread on day $x+n$ .

$x$ can be any integer within the range of the course duration [Footnote 5: For simplicity, we ignore the edge effect of $x$ approaching the beginning or end of the course]. The Bernoulli random variables $A$ and $B$ respectively indicate whether events $A$ and $B$ happen (r.v. $A$ and event $A$ are used interchangeably hereafter, and likewise for $B$). An unbiased estimate of $P(B|A)$ for fixed $n$ can therefore be obtained from the DATs in three steps:

(1) For each student, count the number of video-watching days $v\_days$ from the student’s video-watching DAT. For each video-watching day, check the forum DAT for a forum-view record on the $n$-th day after that day. Count the number of video-watching days that have such a paired forum-view day, $v\_f\_days$.

(2) Sum $v\_days$ and $v\_f\_days$ over all students to obtain $total\_v\_days$ and $total\_v\_f\_days$.

(3) $\hat{P}(B|A) = total\_v\_f\_days / total\_v\_days$.

With this analysis we find that the sample size $N$ for the smallest offering, Course 2C, is 20,541, which is large enough for the estimate to be statistically significant, with a narrow 99% confidence interval. Figure 5 plots the estimated conditional probability $\hat{P}(B|A)$ (red) against the corresponding time offset parameter $n$. $\hat{P}(B|A)$ clearly peaks at zero offset and drops rapidly on either side, meaning that if a random student in the course watches a video, the student is more likely to view a forum on that same day than on any other day within a 10-day window centered on it. The second highest probability occurs exactly one day after watching a video. Similar patterns are observed in all other offerings.

The blue line in Figure 5 further shows the estimated natural distribution of r.v. $B$, with $N=153,792$. It means that on a random day, a random student views at least one forum thread with probability around 0.075. More importantly, $\hat{P}(B|A)$ and $\hat{P}(B)$ clearly differ across the offsets $n$. Further KS tests reject the hypothesis that the two have the same distribution, with p-value < 0.0001. Therefore, we conclude with high confidence that $P(B|A)\neq P(B)$, and thus that $A$ and $B$ are dependent. In other words, we conclude that the LDP is a strong pattern, though its implication for grade is quite weak.
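
The three-step estimate, taken as the share of video-watching days that have a paired forum-view day $n$ days later, can be sketched as below. This is a minimal illustration under stated assumptions: each DAT is a list of `(day, component)` pairs, the DATs are held in dictionaries keyed by a student identifier, and the function name `estimate_p_b_given_a` is hypothetical. Following the modeling footnote, students who lack either video or forum activity are skipped.

```python
def estimate_p_b_given_a(video_dats, forum_dats, n):
    """Estimate P(B|A): the chance of a forum view on day x + n given that
    at least one video was watched on day x, pooled over all students."""
    total_v_days = 0
    total_v_f_days = 0
    for student, video_dat in video_dats.items():
        video_days = {day for day, _ in video_dat}
        forum_days = {day for day, _ in forum_dats.get(student, [])}
        if not video_days or not forum_days:
            continue  # keep only students who both watch videos and view forums
        total_v_days += len(video_days)
        total_v_f_days += sum(1 for day in video_days if day + n in forum_days)
    if total_v_days == 0:
        return float("nan")
    return total_v_f_days / total_v_days


# Toy data: two students, (day, video index) and (day, thread index) pairs.
video_dats = {"s1": [(1, 1), (1, 2), (3, 3)], "s2": [(2, 1)]}
forum_dats = {"s1": [(1, 10), (4, 11)], "s2": [(2, 5)]}

for offset in (-1, 0, 1):
    print(offset, estimate_p_b_given_a(video_dats, forum_dats, offset))
```

Evaluating the function over a range of offsets gives one estimated probability per value of $n$.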

6. Under-Represented Student Groups

Few works have concretely identified different potentially marginalized groups. In this section we identify several under-represented groups and ask, by examining their DATs, whether they experience any observable marginalization. We adopt the definition of a “marginalized group” as “groups who have not been as successful as others at achieving educational success, students who find their current curriculum either too challenging or not sufficiently demanding”.

We first extract some under-represented groups based on the distribution of highest education level (Figure 7) and Gross National Income (GNI) per capita (USD) of country of origin (Table 4).

We use DATs in the following ways. First, a student’s DATs potentially contain a dropout timestamp [Footnote 7: A dropout timestamp is, e.g., the last time the student accesses videos, or the last time the student accesses any material]. They also contain information about “video replay” behavior, one possible interpretation of which is “finding certain videos difficult to digest”, and which could thus indicate where students need help. Finally, a DAT can be intuitive to interpret, so in small quantities we can visually inspect the DATs of a group. The under-represented groups we study here are:

•

students whose declared highest education level is either primary school or junior high school

•

students from low-income economy entities classified by World Bank based on GNI per capita (Group, 2016)

6.1. Students with Low Education Background

Before MIT released Introduction to Computational Thinking and Data Science on edX, the course was offered in a traditional classroom to MIT students, with the prerequisite of having taken Introduction to Computer Science and Programming Using Python or having the same level of background knowledge. The MOOC attracts students from all educational backgrounds, with 3.5% of the students having only primary school or junior high school degrees. Therefore, one concern is whether those students find the course materials too challenging and, if they do, which part is most challenging. We leverage the latent cognitive meaning of “replaying videos”: a student plays a lecture video a second time because the student finds the content challenging. We group the students considered to have a low education background, and count the total number of their replays for each video. As a control, we do the same for the group of students who have college-level education, who account for the majority of the course population (around 60%).

Figure 8 visualizes the result after scaling the y-axis for both groups. Both groups of students follow similar patterns of replaying lecture videos throughout the course. However, the red (low-education) line fluctuates more than the blue (control) line, which indicates that students with a low education background are at least distinctive and could be sensitive to the varying difficulty of course materials. A one-tailed t-test is performed on the mean absolute values of the normalized replay-frequency fluctuations of the low-education (target) group and the high-education (control) group. We conclude with 99% confidence that the target group is more sensitive to difficulty changes in lecture videos.

This prompts us to investigate the videos associated with prominent fluctuations. Video 8 introduces students to the implementation of graph models and video 22 introduces confidence intervals. Both cover advanced topics that students would rarely encounter before college, so it makes sense that students with a lower education background replay these more. For example, Figure 9 shows a typical student struggling at video 22. The analysis indicates that, to help this group of students, edX could provide more reference materials for these two videos.
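
The quantities behind this comparison can be sketched from video-watching DATs. The sketch rests on assumptions the text does not spell out: a "replay" is counted whenever a student accesses the same video on an additional day (a binary day-level DAT cannot see repeats within a day), "normalized" means dividing each group's per-video replay counts by the group total, and a "fluctuation" is the change between consecutive videos. The helper names are hypothetical.

```python
from collections import Counter


def replay_counts(dats, n_videos):
    """Total replays per video for a group of compact video-watching DATs."""
    counts = [0] * n_videos
    for dat in dats:
        days_per_video = Counter(video for _, video in set(dat))
        for video, n_days in days_per_video.items():
            counts[video - 1] += n_days - 1  # the first viewing is not a replay
    return counts


def normalize(counts):
    """Scale counts to sum to 1 so groups of different size are comparable."""
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)


def mean_abs_fluctuation(frequencies):
    """Mean absolute change between consecutive videos."""
    diffs = [abs(b - a) for a, b in zip(frequencies, frequencies[1:])]
    return sum(diffs) / len(diffs)


# Toy group of two students in a course with 3 videos.
group = [
    [(1, 1), (2, 1), (3, 2)],   # video 1 on two days: one replay
    [(1, 2), (2, 2), (4, 2)],   # video 2 on three days: two replays
]
counts = replay_counts(group, n_videos=3)
print(counts, mean_abs_fluctuation(normalize(counts)))
```

Computing the same summary for the target and control groups gives the two fluctuation measures that a significance test would then compare.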

6.2. Students from Low-income Economies

Students from low-income economies, as classified by the World Bank (Group, 2016), have limited access to education. The lack of computers, adequate bandwidth, and quality education all pose great challenges for them to perform well in a MOOC. In the largest offering, Course 1A, we discover that out of the 182 students from low-income economies (top 3: Uganda, Nepal, Ethiopia), only 50 watched any videos at all, and only 4 watched more than a quarter of the lecture videos. The four students’ video-watching DATs are shown in Figure 10. According to the DATs, all four struggle with their learning: although none dropped out early, none proceeded beyond video 11. By comparison, around 45% of the MOOC users in the same offering proceed beyond video 11. This suggests that MOOCs are a form of remote education that is still inaccessible to many marginalized students, and that many of our previous dropout analyses fall short of representing some of the students who are most in need of help.

7. Student Learning Representation with DAT

In this section, we present the empirical studies we conduct with DATs to learn 10-dimensional vector representations and to visualize the data of many students. To examine the effectiveness of the learned representations, we project them to 2D space and visualize them together with the students’ grades (encoded by color).

7.1. Traditional Feature Embedding as Baseline

DAT supports the traditional scheme of feature extraction. For the video-watching DAT, we devised 10 features to summarize a user’s video-watching behavior: n_unique_videos (number of unique videos watched), n_days (number of active days with video access), ave_day_intervals (average length of the gaps between two consecutive active days), var_days (variance of the active days), ave_video_intervals (average index gap between two consecutive watched videos), var_videos (variance of the indices of watched videos), rate_videos_repeats (proportion of replayed videos), n_videos_per_day (average number of videos watched per active day), var_day_intervals (variance of the gap lengths between two consecutive active days), and var_video_intervals (variance of the index gaps between two consecutive watched videos).
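The ten features can be computed from a binary DAT along the following lines. This is a sketch under stated assumptions rather than the authors' implementation: the DAT is a list of day rows of 0/1 flags, "consecutive watched videos" is read as consecutive entries in the sorted list of distinct watched video indices, a "replayed" video is one accessed on more than one day, variances are population variances, and empty gap lists yield 0. The function `dat_features` and its helpers are hypothetical names.

```python
from statistics import mean, pvariance


def _gaps(sorted_values):
    return [b - a for a, b in zip(sorted_values, sorted_values[1:])]


def _mean(xs):
    return mean(xs) if xs else 0.0


def _var(xs):
    return pvariance(xs) if xs else 0.0


def dat_features(dat):
    """dat[d][v] == 1 iff the student accessed video v on day d."""
    days = [d for d, row in enumerate(dat) if any(row)]
    days_per_video = [sum(row[v] for row in dat) for v in range(len(dat[0]))]
    videos = [v for v, k in enumerate(days_per_video) if k > 0]
    day_gaps, video_gaps = _gaps(days), _gaps(videos)
    n_replayed = sum(1 for k in days_per_video if k > 1)
    return {
        "n_unique_videos": len(videos),
        "n_days": len(days),
        "ave_day_intervals": _mean(day_gaps),
        "var_days": _var(days),
        "ave_video_intervals": _mean(video_gaps),
        "var_videos": _var(videos),
        "rate_videos_repeats": n_replayed / len(videos) if videos else 0.0,
        "n_videos_per_day": sum(days_per_video) / len(days) if days else 0.0,
        "var_day_intervals": _var(day_gaps),
        "var_video_intervals": _var(video_gaps),
    }


# Toy DAT (made-up): 3 days, 4 videos.
toy = [
    [1, 1, 0, 0],  # day 0: videos 0 and 1
    [0, 0, 0, 0],  # day 1: inactive
    [0, 1, 0, 1],  # day 2: video 1 again, then video 3
]
features = dat_features(toy)
```

Returning a dictionary keyed by the feature names keeps the mapping to the list above explicit; the values can be read out in a fixed order to form the 10-dimensional vector.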

We obtain a 10-dimensional feature vector for each student by calculating the corresponding features. The features are standardized to eliminate the bias introduced by their absolute values. To examine the obtained representations, we project the features to 2D space with t-SNE and visualize the coordinate vectors with grades encoded by color. The result is shown in Figure 11.
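Standardization here is taken to mean the usual per-feature z-score, which is an assumption since the text does not name the method. A minimal sketch, with the hypothetical helper `standardize` operating on one row per student:

```python
from statistics import mean, pstdev


def standardize(rows):
    """Column-wise z-score: subtract the mean, divide by the std deviation."""
    columns = list(zip(*rows))
    mus = [mean(c) for c in columns]
    # A constant column has zero deviation; dividing by 1 maps it to 0.
    sigmas = [pstdev(c) or 1.0 for c in columns]
    return [[(x - m) / s for x, m, s in zip(row, mus, sigmas)] for row in rows]


# Made-up feature matrix: 3 students, 2 features.
z = standardize([[1, 10], [3, 10], [5, 10]])
```

After this step every feature has zero mean and, unless it is constant, unit variance, so features measured in days and features measured in video indices contribute on a comparable scale to the subsequent projection.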

7.2. Feature Embedding from a Distance Matrix

The time series described by a DAT can be automatically transformed into features by machine learning. Motivated by Mikolov et al. (2013), we notice that directly declaring a pool of “good features” to summarize learning behaviors is difficult and sometimes controversial. We therefore convert the problem into the challenge of finding a measure of the distance between a pair of two-dimensional time series. Dynamic Time Warping (DTW), proposed by Berndt and Clifford (1994), serves this purpose well because it measures the distance between the most similar segments of two 2D time series. The overall distance is obtained by summing the pairwise distances between points on the most similar segments of the time series; see Figure 13. We also normalize the distance by the number of matched pairs to eliminate the effect of the time series length. When we perform the distance calculation for each pair of students, we obtain a distance matrix of size $n\times n$, where $n$ is the total number of students in the course iteration.
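The normalized DTW distance described above can be sketched with the standard dynamic-programming recurrence. The Euclidean distance between (day, video) points, the tie-breaking toward shorter paths, and the function name `dtw_distance` are assumptions made for illustration; both sequences must be non-empty.

```python
import math


def dtw_distance(a, b):
    """DTW distance between two sequences of (day, video) points,
    divided by the number of matched pairs on the optimal warping path."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j]: minimal cumulative distance aligning a[:i] with b[:j].
    # steps[i][j]: number of matched pairs on that optimal alignment.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    steps = [[0] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # Euclidean (assumed)
            prev_cost, prev_steps = min(
                (cost[i - 1][j - 1], steps[i - 1][j - 1]),
                (cost[i - 1][j], steps[i - 1][j]),
                (cost[i][j - 1], steps[i][j - 1]),
            )
            cost[i][j] = prev_cost + d
            steps[i][j] = prev_steps + 1
    return cost[n][m] / steps[n][m]


def distance_matrix(trajectories):
    """Symmetric n x n matrix of pairwise normalized DTW distances."""
    n = len(trajectories)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist[i][j] = dist[j][i] = dtw_distance(trajectories[i], trajectories[j])
    return dist
```

Each pairwise computation costs time proportional to the product of the two sequence lengths, and the matrix needs n(n-1)/2 such computations, so for large courses this step dominates the running time.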

Next, given the distance matrix, we use multidimensional scaling (MDS) to find a set of embeddings (vectors) in 10-dimensional space that best preserve the pairwise distances in the matrix. In this way, the embeddings become representations of a student’s learning trajectory. Again, we project the representations to 2D space with t-SNE and visualize the 2D coordinates (vectors), encoding students’ grades with colors. Figure 11 shows that video-day graphs correlate with a student’s grade, with high-grade students concentrated in one cluster. Moreover, students with higher grades watch videos in a more similar manner than students with lower grades do, so the high-grade samples cluster densely in one place while low-grade students are scattered.
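The text does not state which MDS variant is used. As one possibility, the sketch below implements classical (Torgerson) MDS with NumPy: it double-centers the squared distances and keeps the leading eigenvectors. The function name `classical_mds` is hypothetical, and other variants (for example stress-minimizing metric MDS) would give different embeddings.

```python
import numpy as np


def classical_mds(dist, n_components=10):
    """Embed n items so that Euclidean distances approximate `dist`."""
    dist = np.asarray(dist, dtype=float)
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    # Double-centered Gram matrix recovered from squared distances.
    gram = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)  # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]
    # Negative eigenvalues (non-Euclidean distances) are clipped to zero.
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale


# Toy check on made-up distances: three points on a line at 0, 1 and 3.
toy_dist = np.abs(np.subtract.outer([0.0, 1.0, 3.0], [0.0, 1.0, 3.0]))
embedding = classical_mds(toy_dist, n_components=2)
```

Because DTW distances need not satisfy the triangle inequality, the Gram matrix can have negative eigenvalues; clipping them means the embedding only approximates the original distances.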

More importantly, since DTW considers both content and temporal domain information, it takes more comprehensive information into account than the traditional method does. While studies such as Halawa et al. (2014) report “recognizing four common persistence patterns that represent the majority of MOOC students”, it is also important to point out that, technically, the inter-cluster distances are not large enough compared to the within-cluster ones. In other words, those prototypes are not observed in the video-watching visualization.

7.3. Feature Embedding with a Convolutional Autoencoder

A drawback of DTW is that it essentially “smooths” all items in the time series, rendering it less sensitive to local video-watching patterns (e.g., replaying the same videos several days in a row, or watching many videos in a single day). This inspires us to harness the power of a convolutional neural network (CNN) to recognize local video-watching patterns. We construct a CNN autoencoder (CNN-AE) and treat each student’s video-day activity graph as an input image to the autoencoder. In other words, every two-tuple $(d_{i},v_{i})$ in the time series maps to a pixel with value 1 at position $(d_{i},v_{i})$ in the image. Table 5 in the Appendix gives the CNN-AE’s structure.
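The mapping from two-tuples to a binary input image can be sketched as below. The function name `dat_to_image`, the 0-based indices, and the fixed image size passed as arguments are assumptions for illustration; the network itself is not reproduced here.

```python
def dat_to_image(points, n_days, n_videos):
    """Rasterize (day, video) tuples into an n_days x n_videos 0/1 image."""
    image = [[0] * n_videos for _ in range(n_days)]
    for day, video in points:
        image[day][video] = 1  # repeated tuples leave the pixel at 1
    return image


# Made-up trajectory: video 1 on day 0 (listed twice), video 0 on day 2.
img = dat_to_image([(0, 1), (2, 0), (0, 1)], n_days=3, n_videos=2)
```

Using the same `n_days` and `n_videos` for every student gives all inputs the same shape, which a convolutional autoencoder with a fixed-size bottleneck requires.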

We encode the activity to 10 dimensions and then decode it, with the aim of condensing the most important information into the reduced embedding. The embedded vector representation is then treated in the same way as the feature vector obtained in Section 7.1: it is projected to 2D space and plotted. Figure 11 plots the 2D embeddings colored by course grade on a unit scale, with high-grade students forming one cluster. Figure 12 shows two typical video-day graphs and their reconstructed counterparts.

Figure 12a shows that the CNN-AE is capable of closely reconstructing the video-watching patterns of the video-day graphs. Despite visible blurring and missing pixels, the reconstruction result is by and large satisfactory. The CNN-AE embeddings do not form multiple clusters. We also notice from panels (a) and (c) of the figure that the projections separate clearly at around grade = 0.6, which is the passing grade of the course. Students’ learning behaviors become more distinct as they approach the passing grade: some give up and their projections go to one side, while others persist and their projections move to the other side.

8. Conclusions & Future Work

In this paper, we introduce a new perspective on learning behavior analytics in MOOCs: the detailed access trajectory. Its underlying two-dimensional time series facilitates quantitative analysis of learning behaviors. We demonstrate the research value of DATs via empirical studies, including representation learning, on two courses covering introductory programming and computational thinking.

We observe learning behavior patterns in video watching with DATs and identify three distinctive behavior patterns that are correlated with high grades:

(1) “return after long time”,

(2) “return to most recently watched”,

(3) “return to previously skipped”.

DATs are also capable of informing learning designers about LDPs, a core concern of designers and instructors. We find evidence of an LDP in which students watch a video and then go to the forum, but we find no significant correlation with grade for students who exhibit this behavior. In addition, we use DATs to examine students whose educational and geographical backgrounds make them potentially marginalized. The DAT helps identify specific videos that students with a lower educational background struggle with, and it shows that students watching MOOC videos from countries with low GNI watch very few videos. Finally, we explore summarizing DATs and mapping them into a low-dimensional embedding for visualization and clustering.

In the future, a more systematic examination of combinations of the three categories of DAT can be performed. More LDPs can be identified and queried. In addition, DATs enable dropout labeling, and an interesting topic going forward is to look for local learning patterns that indicate dropout in the near future. Finally, there is more work to be done on under-represented groups of students.

Appendix

Bibliography

Alexander (1979) Christopher Alexander. 1979. The timeless way of building. Vol. 1. Oxford University Press, New York, New York.
Berndt and Clifford (1994) Donald J Berndt and James Clifford. 1994. Using dynamic time warping to find patterns in time series. In KDD workshop. AAAI Press, 44 West 4th Street, New York, New York 10012-1126, 359–370.
Boroujeni and Dillenbourg (2018) Mina Shirvani Boroujeni and Pierre Dillenbourg. 2018. Discovery and temporal analysis of latent study patterns in MOOC interaction sequences. In Proceedings of the 8th International Conference on Learning Analytics and Knowledge (LAK ’18). ACM, New York, NY, USA, 206–215.
Boyer and Veeramachaneni (2015) Sebastien Boyer and Kalyan Veeramachaneni. 2015. Transfer Learning for Predictive Models in Massive Open Online Courses. In Artificial Intelligence in Education, Cristina Conati, Neil Heffernan, Antonija Mitrovic, and M. Felisa Verdejo (Eds.). Springer International Publishing, Cham, 54–63.
Brugha and Restoule (2016) Meaghan Brugha and Jean-Paul Restoule. 2016. Examining the learning networks of a MOOC. Data Mining and Learning Analytics: Applications in Educational Research (2016), 121.
Corrin et al. (2017) Linda Corrin, Paula G. de Barba, and Aneesha Bakharia. 2017. Using Learning Analytics to Explore Help-seeking Learner Profiles in MOOCs. In Proceedings of the Seventh International Learning Analytics & Knowledge Conference (LAK ’17). ACM, New York, NY, USA, 424–428. https://doi.org/10.1145/3027385.3027448
Davis et al. (2018) Dan Davis, René F. Kizilcec, Claudia Hauff, and Geert-Jan Houben. 2018. The Half-life of MOOC Knowledge: A Randomized Trial Evaluating Knowledge Retention and Retrieval Practice in MOOCs. In Proceedings of the 8th International Conference on Learning Analytics and Knowledge (LAK ’18). ACM, New York, NY, USA, 1–10. https://doi.org/10.1145/3170358.3170383