Optimizing for Aesthetically Pleasing Quadrotor Camera Motion
Christoph Gebhardt, Stefan Stevsic, Otmar Hilliges

TL;DR
This paper investigates aesthetic preferences in aerial videography, then develops a novel optimization method that balances smoothness and user control, resulting in improved video quality and usability.
Contribution
It introduces a large-scale study on aerial video aesthetics and proposes a joint optimization method for smooth and user-controlled quadrotor camera trajectories.
Findings
Optimized trajectories are perceived as more aesthetically pleasing.
The method outperforms state-of-the-art in usability and preference.
Novices and experts both benefit from the proposed approach.
Abstract
In this paper we first contribute a large scale online study (N=400) to better understand aesthetic perception of aerial video. The results indicate that it is paramount to optimize smoothness of trajectories across all keyframes. However, for experts timing control remains an essential tool. Satisfying this dual goal is technically challenging because it requires giving up desirable properties in the optimization formulation. Second, informed by this study we propose a method that optimizes positional and temporal reference fit jointly. This allows to generate globally smooth trajectories, while retaining user control over reference timings. The formulation is posed as a variable, infinite horizon, contour-following algorithm. Finally, a comparative lab study indicates that our optimization scheme outperforms the state-of-the-art in terms of perceived usability and preference of…
| Symbol | Description |
|---|---|
| , , , | Quadrotor position, velocity, acceleration and jerk |
| , , , | Quad. yaw and angular velocity/acceleration/jerk |
| , , , | Gimbal yaw and angular velocity/acceleration/jerk |
| , , , | Gimbal pitch and angular velocity/acceleration/jerk |
| , | Quadrotor states and inputs |
| , | System matrices of quadrotor |
| | Gravity |
| | Trajectory end time |
| | Horizon length |
| | Progress parameter |
| | Reference spline () |
| | Positional reference () |
| | Pitch reference |
| | Yaw reference |
| | Time reference |
| | Velocity reference |
| , | Progress state and input |
| , | System matrices of progress |
| , | Approximate lag and contour error |

| Weight (applied to) | Value (online survey) | Value (user study) |
|---|---|---|
| (position) | 1 | (user-specified) |
| (lag, contour err.) | | |
| (yaw) | 1 | (user-specified) |
| (pitch) | 1 | (user-specified) |
| (jerk) | 10 | 100, 10 (if ) |
| (end-time) | 0 | 1, 0 (if ) |
| (length in t.) | 1 | 1 |
| (timing) | 0 | 100 |
| (velocity) | 0 | 100 |
Optimizing for Aesthetically Pleasing Quadrotor Camera Motion
Christoph Gebhardt, Stefan Stevšić, and Otmar Hilliges
AIT Lab, ETH Zürich
Abstract.
In this paper we first contribute a large-scale online study (N=400) to better understand aesthetic perception of aerial video. The results indicate that it is paramount to optimize smoothness of trajectories across all keyframes. However, for experts, timing control remains an essential tool. Satisfying this dual goal is technically challenging because it requires giving up desirable properties in the optimization formulation. Second, informed by this study, we propose a method that optimizes positional and temporal reference fit jointly. This allows us to generate globally smooth trajectories while retaining user control over reference timings. The formulation is posed as a variable, infinite horizon, contour-following algorithm. Finally, a comparative lab study indicates that our optimization scheme outperforms the state-of-the-art in terms of perceived usability and preference of resulting videos. For novices, our method produces smoother and better looking results, and experts also benefit from generated timings.
computational design, aerial videography, quadrotor camera tools, trajectory optimization
ACM Transactions on Graphics (TOG), Vol. 37, No. 4, Article 90, August 2018. DOI: 10.1145/3197517.3201390. Copyright: ACM licensed.
CCS Concepts: Computing methodologies → Computer graphics; Computing methodologies → Motion path planning; Computing methodologies → Robotic planning; Computer systems organization → External interfaces for robotics.
1. Introduction
Camera quadrotors have become a mainstream technology but fine-grained control of such camera drones for aerial videography is a high-dimensional and hence difficult task. In response several tools have been proposed to plan quadrotor shots by defining keyframes in virtual environments [Gebhardt and Hilliges, 2018; Gebhardt et al., 2016; Joubert et al., 2015; Roberts and Hanrahan, 2016]. This input is then used in an optimization algorithm to automatically generate quadrotor and camera trajectories. Intuitively, smooth camera motion is an obvious factor impacting the visual quality of a shot. This intuition alongside expert-feedback [Joubert et al., 2015] and literature on (aerial) cinematography [Arijon, 1976; Audronis, 2014; Hennessy, 2015] forms the basis for most existing quadrotor tools. These take a spline representation, connecting user specified keyframes, and optimize higher derivatives of these splines, such as jerk.
Balasubramanian et al. [2015] define global smoothness as “a quality related to the continuity or non-intermittency of a movement, independent of its amplitude and duration”. However, because keyframe timings are kept fixed in current quadrotor camera optimization schemes [Gebhardt et al., 2016; Joubert et al., 2015], or close to the user input [Roberts and Hanrahan, 2016], smooth motion can only be generated subject to these hard-constraints. This can cause strong variation of camera velocities across different trajectory segments and result in visually unpleasant videos.
Consider popular fly-by shots, such as the one illustrated in Figure 1, where an object is filmed first from one direction and then the camera gradually yaws around its own z-axis as the quadrotor flies past the object, until the object is filmed from the opposing direction. To achieve visually pleasing footage, both the quadrotor motion and the camera's angular velocity need to be smooth. Users generally struggle with this or similar problems in which they place the keyframes in the correct spatial location but too close to (or too far from) each other temporally (see Figure 1, top and [video]). This is indeed a difficult task because keyframes are specified in 5D (3D position and camera pitch and yaw) and imagining the resulting translational and rotational velocities is cognitively demanding.
Although existing work provides UI tools (i.e. progress curves, timelines) to cope with this problem, it has been shown that users, especially novices, struggle to create smooth camera motion over a sequence of keyframes [Gebhardt and Hilliges, 2018]. While optimizing for global smoothness may address this issue for novices, an interesting tension arises when looking at experienced users. Experts explicitly time the visual progression of a shot in order to achieve desired compositional effects [Joubert et al., 2015] (e.g. ease-in, ease-out behavior). Our first contribution is a large online study (N=400) highlighting this issue: videos designed by non-experts were rated more favorably when optimized for global smoothness, while expert-designed videos were perceived as more pleasing with hard-constrained timings. To the best of our knowledge, this is the first study that provides empirical evidence that global smoothness is indeed important for the perception of aerial videography.
Embracing this dichotomy (of smoothness versus timing control), our second contribution is a trajectory optimization method that takes smoothness as primary objective and can re-distribute robot positions and camera angles in space-time. We propose the first algorithm in the area of quadrotor videography that treats keyframe timings and positions, and reference velocities as soft-constraints. This extends the state-of-the-art in that it allows users to trade off path-following fidelity with temporal fidelity. Such a formulation poses significant technical difficulties. Prior methods incorporate keyframe timings as hard-constraints, yielding a quadratic and hence convex optimization formulation (depending on the dynamical model), allowing for efficient implementation. In contrast, we formulate the quadrotor camera trajectory generation problem as a variable, infinite horizon, contour-following algorithm applicable to linear and non-linear quadrotor models. Our formulation has to discretize the model at each solver iteration according to the optimized trajectory end time. Although this formulation is no longer convex, it is formulated as a well-behaved non-convex problem and our implementation runs at interactive rates.
Finally, we show the benefit of our method compared to the state-of-the-art in a lab study in which we compare different variants of our method with [Gebhardt et al., 2016]. It can be shown that our method positively affects the usability of quadrotor camera tools and improves the visual quality of video shots for experts and non-experts. Both benefit from using an optimized timing initially and then fine-tuning it according to their intention. In addition, the user study revealed that timing control does not need to be precise but is rather used to control camera velocity in order to create a certain compositional effect.
2. Related Work
Camera Control in Virtual Environments:
Camera placement [Lino and Christie, 2012], path planning [Yeh et al., 2011; Li and Cheng, 2008] and automated cinematography [Lino et al., 2011] have been studied extensively in virtual environments (VE), for a survey see [Christie et al., 2008]. These works share our goal of assisting users in the creation of camera motion (e.g., [Drucker and Zeltzer, 1994; Lino and Christie, 2015]). Nevertheless, it is important to consider that VEs are not limited by real-world physics and robot constraints, hence may yield trajectories that can not be flown by a quadrotor.
Character Animation:
In character animation, a variety of methods exist which are capable of trading off positional and temporal reference fit to optimize for smoother character motion. In [Liu et al., 2006], the authors specify constraints in warped time and then optimize the mapping between warped and actual time according to their objective function. For an original motion, [McCann et al., 2006] find the convex hull of all physically valid motions attainable via re-timing. Plausible new motions are then found by performing gradient descent and penalizing distance between possible solutions and the feasible hull. Like [Liu et al., 2006], our formulation is based on a time-free parameterization of a reference path. In contrast to the character animation methods, we adjust timings by optimizing the progress of the quadrotor camera on the reference according to an objective favoring smoothness. Unlike [McCann et al., 2006], our formulation does not require nested optimization.
Trajectory Generation:
Trajectory generation for dynamical systems is a well studied problem in computer graphics [Geijtenbeek and Pronost, 2012] and robotics [Betts, 2009]. Approaches that encode the system dynamics as equality constraints to solve for the control inputs along a motion trajectory are referred to as spacetime constraints in graphics [Witkin and Kass, 1988] and direct collocation in robotics [Betts, 2009]. Used out-of-the box such approaches can lead to slow convergence time especially with long time horizons (cf. [Roberts and Hanrahan, 2016]).
With the commoditization of quadrotors, the generation of drone trajectories shifted into the focus of research. Exploiting the differential flatness of quadrotors in the output space, [Mellinger and Kumar, 2011] generated physically feasible minimal snap trajectories. Several methods exist for the generation of trajectories for aggressive quadrotor flight [Mellinger et al., 2012; Bry et al., 2015]. Traditionally, these methods convert a sequence of input positions into a time-dependent reference and, based on a dynamical model, generate a trajectory which follows this reference. For [Mellinger and Kumar, 2011; Bry et al., 2015], time optimization is done in a cascaded manner where an approximated gradient descent for keyframe timings is calculated based on the original optimization problem. These formulations suffer from very long runtimes as the original problem needs to be solved once for each keyframe to calculate the gradient approximation. In contrast, our method optimizes keyframe timings and trajectory jointly, which reduces optimization runtime and makes it possible to trade off temporal and positional fit. In [Mellinger et al., 2012], sequentially composed controllers are used to optimize the timing of a trajectory such that physical limits are not violated given desired feed-forward terms. Our work not only ensures physical feasibility but is also capable of generating trajectories with different dynamics (smooth and more aggressive) for the same spatial input.
Computational Support of Aerial Videography:
A number of tools for the planning of aerial videography exist. Commercially available applications and consumer-grade drones often place waypoints on a 2D map [APM, 2016; DJI, 2016; Technology, 2016] or allow users to interactively control the quadrotor's camera as it tracks a pre-determined path [3D Robotics, 2015]. These tools generally do not provide means to ensure feasibility of the resulting plans and do not consider aesthetic or usability objectives in the video composition task. The planning of physically feasible quadrotor camera trajectories has recently received a lot of attention. Such tools allow for planning of aerial shots in 3D virtual environments [Gebhardt and Hilliges, 2018; Joubert et al., 2015; Gebhardt et al., 2016; Roberts and Hanrahan, 2016] and employ optimization to ensure that both aesthetic objectives and robot modeling constraints are considered.
In [Joubert et al., 2015] and [Gebhardt et al., 2016], users specify keyframes in time and space. These are incorporated as hard-constraints into an objective function. Solving for the trajectory only optimizes camera dynamics and positions. This causes the generation of locally smooth camera motion (between keyframes) but can lead to varying velocities across keyframes. Joubert et al. [2015] detect violations of the robot model constraints. However, correcting these violations is offloaded to the user. In contrast, by generating timings or incorporating them as soft-constraints, our optimization returns the closest feasible fit of the user-specified inputs, subject to our robot model, and generates globally smooth quadrotor camera trajectories. [Gebhardt and Hilliges, 2018] re-optimizes keyframe timings in a cascaded optimization scheme. Here an approximated gradient on the keyframe times produced by the optimization formulation of [Gebhardt et al., 2016] is calculated and used to improve visual smoothness. However, this approach is relatively slow and the paper reports that users therefore did not make significant use of it in the evaluation. In contrast, our method runs at interactive rates, optimizing trajectories of different durations within seconds (see Section 5). Roberts and Hanrahan [2016] take physically infeasible trajectories and compute the closest possible feasible trajectory by re-timing the trajectories subject to a non-linear quadrotor model. In contrast, we prevent trajectories from becoming infeasible at optimization time. Although the method of [Roberts and Hanrahan, 2016] theoretically can be used to adjust timings based on a jerk minimization objective, our method can also trade off the positional fit of a reference path in order to achieve even smoother motion.
Recently, several works address the generation of quadrotor camera trajectories in real-time to record dynamic scenes. [Galvane et al., 2016; Joubert et al., 2016] plan camera motion in a lower dimensional subspace to attain real-time performance. Using a Model Predictive Controller (MPC), [Naegeli et al., 2017] optimizes cinematographic constraints, such as visibility and position on the screen, subject to robot constraints for a single quadrotor. [Nägeli et al., 2017] extends this work for multiple drones and allows actor-driven tracking on a geometric path. Focusing on dynamic scenes, this work does not cover the global planning aspects of aerial videography.
Online Path Planning:
Approaches that address trajectory optimization and path following have been proposed in the control theory literature. They allow for optimal reference following given real world influences. Methods like MPC [Faulwasser et al., 2009] optimize the reference path and the actuator inputs simultaneously based on the system state. MPC has been successfully used for the real-time generation of quadrotor trajectories [Mueller and D'Andrea, 2013]. Nevertheless, [Aguiar et al., 2008] show that the tracking error for following timed trajectories can be larger than for following a geometric path only. Motivated by this observation, Model Predictive Contouring Control (MPCC) [Lam et al., 2013] has been proposed to follow a time-free reference, optimizing system control inputs for time-optimal progress. MPCC approaches have been successfully applied in industrial contouring [Lam et al., 2013] and RC racing [Liniger et al., 2014]. Recently, [Nägeli et al., 2017] extended the MPCC-framework to allow for real-time path following in 3D space with quadrotors. We propose a trajectory generation method that is conceptually related to MPCC formulations in that it optimizes timings for a quadrotor camera trajectory based on a time-optimal path-following objective. Our formulation treats keyframes, user-specified reference timings and velocities as well as smoothness across the entire trajectory jointly in a soft-constrained formulation and allows users to produce aesthetically more pleasing videos.
3. Method
We propose a new method to generate globally smooth quadrotor camera trajectories. Our aim is to allow even novice users to design complex shots without having to explicitly reason about 5D spatio-temporal distances. Our central hypothesis is that smoothness across the entire trajectory matters and hence is the main objective of our optimization formulation. We first introduce the model of the system dynamics in Section 3.1 and discuss our optimization formulation in Section 3.2-3.5. See Appendix A for a table of notations.
3.1. Dynamical Model
We use the approximated quadrotor camera model of [Gebhardt et al., 2016]. This discrete first-order dynamical system is incorporated as equality constraint into our optimization problem:
[TABLE]
Here the state vector collects the quadrotor camera states and the input vector collects the inputs to the system at each stage. The states include the position of the quadrotor, the quadrotor's yaw angle, and the yaw and pitch angles of the camera gimbal. The system matrix propagates the state forward, the input matrix defines the effect of the input on the state, and a constant vector captures the effect of gravity for one time-step. The inputs are the force acting on the quadrotor, the torque along its z-axis, and the torques acting on pitch and yaw of the gimbal.
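A discrete first-order system of the kind described above can be written in the following generic form. The symbol names (state $\mathbf{x}_i$, input $\mathbf{u}_i$, matrices $A$ and $B$, gravity vector $\mathbf{g}$, stage index $i$) are assumptions chosen here for illustration and may differ from the notation of the paper.

```latex
\begin{equation*}
  \mathbf{x}_{i+1} = A\,\mathbf{x}_i + B\,\mathbf{u}_i + \mathbf{g}
\end{equation*}
```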
Please note that our formulation is agnostic to the dynamical model of the quadrotor. We verified this by incorporating the non-linear model of [Nägeli et al., 2017]. Qualitatively this does not impact results, yet the computational cost increases (see Figure 5).
3.2. Variable Horizon
In space-time optimization, the horizon length is defined by dividing the timing of the last keyframe by the discretization step. However, one key idea in our formulation is that we treat trajectories, at least initially, as time-free. In particular, our method does not take timed keyframes as input and therefore traditional approaches to determining the horizon length are not applicable.
Taking inspiration from MPC literature [Michalska and Mayne, 1993], we make the length of the horizon an optimization variable itself by adding the trajectory end time into the state space of our model. This has implications for the dynamical model. At each iteration of the solver we adjust the discretization step, which is the current trajectory end time divided by the number of stages in the horizon spanning the entire trajectory. The forward propagation matrices are also recalculated based on the current discretization step.
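The re-discretization step can be illustrated with a deliberately simplified stand-in for the quadrotor model. The sketch below assumes a one-dimensional double integrator (position and velocity driven by acceleration) rather than the model used in the paper; the function names `discretize_double_integrator` and `step` are hypothetical. It shows how the same number of stages yields different propagation matrices as the candidate end time changes.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def discretize_double_integrator(end_time: float, num_stages: int) -> Tuple[float, Matrix, Matrix]:
    """Return (dt, A, B) for a 1D double integrator sampled with dt = end_time / num_stages."""
    if end_time <= 0.0 or num_stages <= 0:
        raise ValueError("end_time and num_stages must be positive")
    dt = end_time / num_stages
    # Exact zero-order-hold discretization of [position, velocity] driven by acceleration.
    A = [[1.0, dt], [0.0, 1.0]]
    B = [[0.5 * dt * dt], [dt]]
    return dt, A, B


def step(A: Matrix, B: Matrix, state: List[float], accel: float) -> List[float]:
    """Propagate [position, velocity] by one stage: x_next = A x + B u."""
    return [
        A[0][0] * state[0] + A[0][1] * state[1] + B[0][0] * accel,
        A[1][0] * state[0] + A[1][1] * state[1] + B[1][0] * accel,
    ]


# The same horizon (4 stages) re-discretized for two candidate end times.
for candidate_end_time in (4.0, 8.0):
    dt, A, B = discretize_double_integrator(candidate_end_time, 4)
    print(candidate_end_time, dt, step(A, B, [0.0, 0.0], 1.0))
```

In a solver loop, this recomputation would be repeated whenever the end-time estimate changes, which is why the resulting problem is no longer a fixed quadratic program.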
3.3. Reference Tracking Metric
We require a time-free parameterization of the reference to optimize the timing of keyframes. We use a chord-length-parameterized, piecewise cubic polynomial spline in Hermite form (PCHIP) to interpolate the user-defined keyframes [Fritsch and Carlson, 1980]. The resulting chord length parameter describes progress on the spatial reference path. To prevent sudden changes of the progress parameter, we add it into our model and formulate its dynamics with the following linear discrete system equation:
[TABLE]
where the progress state and the progress input at each step are related through the discrete system matrices of the progress model. Intuitively, the progress input approximates the quadrotor's acceleration, as the progress parameter is an approximation of the trajectory length.
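The chord length parameterization of the reference can be sketched as follows. This is a minimal illustration that assumes the parameter is accumulated over keyframe positions only; the function name `chord_length_parameters` is hypothetical. The returned values would serve as the knots handed to a PCHIP interpolator (for instance SciPy's `PchipInterpolator`, which is not used here to keep the sketch dependency-free).

```python
import math
from typing import List, Sequence


def chord_length_parameters(keyframes: Sequence[Sequence[float]]) -> List[float]:
    """Return the cumulative straight-line distance at each keyframe position."""
    if not keyframes:
        return []
    params = [0.0]
    for previous, current in zip(keyframes, keyframes[1:]):
        # Each knot is the previous knot plus the chord to the next keyframe.
        params.append(params[-1] + math.dist(previous, current))
    return params


keyframe_positions = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (3.0, 4.0, 12.0)]
print(chord_length_parameters(keyframe_positions))
```

Because the parameter depends only on geometry, keyframes carry no timing information at this stage; timing is recovered later from how quickly the optimized progress parameter advances.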
With this extension of the dynamic model in place, we now formulate an objective to minimize the error between the desired quadrotor position on the reference and the current quadrotor position. With respect to the time optimization, we want the quadrotor to follow the reference as closely as possible in time (no lag) but allow deviations from its contour for smoother motion. This distinction is not possible when minimizing the 2-norm distance to the reference point. For this reason, we differentiate between a lag and a contour error similar to MPCC approaches (e.g., [Lam et al., 2013]). We approximate the true error from the spline by using the 3D-space approximation of lag and contour error of [Nägeli et al., 2017] (see Figure 2, inset). The approximated lag error is defined as,
[TABLE]
where the error vector is the relative vector between desired and actual positions and the tangent is the normalized tangent vector of the reference spline at the current progress parameter. The resulting contour error approximation is given by:
[TABLE]
Both error terms are then inserted into the cost term,
[TABLE]
where the weight is a diagonal positive definite matrix. Minimizing this cost term will move the quadrotor along the user-defined spatial reference.
Our experiments have shown that distinguishing between lag and contour error is important for the temporal aspects of the optimization. Trajectories generated by minimizing only the 2-norm distance to the reference point, depending on the weighting of the term, either lag behind the temporal reference or cannot trade off positional fit for smoother motion. With appropriate weights for lag and contour error this behavior is avoided.
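The decomposition into lag and contour error can be sketched as below, following the verbal description: the relative vector between desired and actual position is projected onto the normalized tangent (lag), and the remainder is the contour error. The function name `lag_and_contour_error` and the argument layout are assumptions made for illustration, not the paper's implementation.

```python
import math
from typing import Sequence, Tuple


def lag_and_contour_error(
    position: Sequence[float],
    reference_point: Sequence[float],
    tangent: Sequence[float],
) -> Tuple[float, float]:
    """Split the positional error into a part along the path and a part orthogonal to it."""
    norm = math.sqrt(sum(t * t for t in tangent))
    if norm == 0.0:
        raise ValueError("tangent must be non-zero")
    unit_tangent = [t / norm for t in tangent]
    # Relative vector between desired and actual position.
    relative = [d - p for d, p in zip(reference_point, position)]
    # Lag error: magnitude of the projection onto the path tangent.
    along = sum(r * t for r, t in zip(relative, unit_tangent))
    lag = abs(along)
    # Contour error: norm of what remains after removing the tangential part.
    orthogonal = [r - along * t for r, t in zip(relative, unit_tangent)]
    contour = math.sqrt(sum(c * c for c in orthogonal))
    return lag, contour


print(lag_and_contour_error((1.0, 2.0, 0.0), (4.0, 0.0, 0.0), (2.0, 0.0, 0.0)))
```

Weighting the two components differently is what lets an optimizer keep the camera from falling behind the reference while still permitting small sideways deviations for smoothness.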
To give users fine-grained control over the target framing, we follow user-specified viewing angles in an analogous fashion. To attain the camera yaw and pitch, we minimize the 2-norm discrepancy between desired and actual orientation of the quadrotor and camera gimbal, given by the following cost terms:
[TABLE]
where the cost terms compare the current yaw and pitch angles with their references. Furthermore, we preprocess every keyframe by adding a multiple of 2π to yaw and pitch such that the absolute distance to the respective angle of the previous keyframe has the smallest value.
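The keyframe preprocessing described above amounts to angle unwrapping. The following sketch assumes angles in radians and shifts each keyframe angle by whole turns so that it lies as close as possible to the previous (already shifted) one; the function name `unwrap_keyframe_angles` is hypothetical.

```python
import math
from typing import List, Sequence


def unwrap_keyframe_angles(angles: Sequence[float]) -> List[float]:
    """Shift each angle by a multiple of 2*pi to minimize the jump from its predecessor."""
    if not angles:
        return []
    unwrapped = [angles[0]]
    for angle in angles[1:]:
        # Number of whole turns that brings `angle` closest to the previous keyframe.
        turns = round((unwrapped[-1] - angle) / (2.0 * math.pi))
        unwrapped.append(angle + 2.0 * math.pi * turns)
    return unwrapped


# A yaw sequence crossing the +/- pi boundary becomes a small step instead of a near-full turn.
print(unwrap_keyframe_angles([3.0, -3.0]))
```

Without this step, a 2-norm cost on the raw angles would penalize a short rotation across the wrap-around as if it were almost a full revolution.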
3.4. Smooth Progress
For the camera to smoothly follow the path, we need to ensure that the progress parameter advances. By specifying an initial progress value and demanding that the progress parameter reaches the end of the trajectory in the terminal state, progress can be forced with an implicit cost term. We simply penalize the trajectory end time by minimizing the corresponding state space variable,
[TABLE]
Minimizing the end-time can be interpreted as optimizing trajectories to be as short as possible temporally (while respecting smoothness and limits of the robot model). This forces the progress parameter to advance such that the terminal state is reached within the trajectory end time. (Footnote 1: This also prevents solutions of infinitely long trajectories in time, where adding further steps would be free with respect to Eq. (9).)
To ensure that the generated motion for the quadrotor is smooth, we introduce a cost term on the modelâs jerk states,
[TABLE]
where the penalized states are the jerk and angular jerk. We minimize jerk since it provides a commonly used metric to quantify smoothness [Hogan, 1984] and is known to be a decisive factor for the aesthetic perception of motion [Bronner and Shippen, 2015]. This cost term again implicitly affects the progress parameter by only allowing it to advance such that the quadrotor motion following the reference path is smooth according to (9). This is illustrated in Figure 2, left. The blue dot progresses on the reference path such that the generated motion of the quadrotor following it is smooth.
To still be able to specify the temporal length of a video shot with this formulation, we define the cost term,
[TABLE]
where we minimize the 2-norm discrepancy between the trajectory end time and a user-specified video length. If a trajectory is optimized for Eq. (10), the weight for Eq. (8) is set to zero.
3.5. Optimization Problem
We construct our overall objective function by linearly combining the cost terms from Eq. (5), (6), (7), (8), (9), (10) and an additional 2-norm minimization term. The final cost is:
[TABLE]
where the scalar weight parameters are adjusted for a good trade-off between positional fit and smoothness. The final optimization problem is then:
[TABLE]
where the objective contains both quadratic and linear terms in the optimization variables. When flying a generated trajectory, we follow the optimized positional trajectory with a standard LQR-controller and use the velocity and acceleration states of the optimized trajectory as feed-forward terms.
4. Implementation
We implemented the above optimization problem with MATLAB and solve it with the FORCES Pro software [Domahidi and Jerez, 2017], which generates fast solver code, exploiting the special structure in the non-linear program. The horizon length of our problem is set to a fixed value. The solver requires a continuous path parametrization. To attain a description of the reference spline across the piecewise sections of the PCHIP spline, we need to locally approximate it. Therefore, we implemented an iterative programming (IP) scheme able to generate trajectories at interactive rates. For further details on the IP-scheme and the empirically derived weights of the optimization problem, we refer the interested reader to Appendix B.
5. Technical Evaluation
To evaluate the efficacy of our method in creating smooth camera motion even on problematic (for hard-constrained methods) inputs, we designed a challenging shot and generated two trajectories, one with [Gebhardt et al., 2016] and the other one with our method.
Figure 3 plots the resulting positions and the corresponding camera angles. Our method adjusts the timing of the original keyframes to attain smoother motion over time. This is visible when comparing the x-dimension of ours and [Gebhardt et al., 2016]. The need to trade off timing and spatial location is illustrated by the orientation plot (Figure 3, bottom). The keyframes have been moved very close to each other, which would cause excessive yaw velocities since the quadrotor would need to perform a 180° turn. Since our method trades off the positional fit, it generates smooth motion also for the camera orientation.
We also conducted a qualitative comparison by recording different videos with the same consumer grade drone. The quadrotor followed trajectories generated with our method and with [Gebhardt et al., 2016] using the same input. Figure 4 shows resulting video frames and jerk profiles (also see [video]). Although the timing of keyframes was improved for smoothness, our method still generates trajectories with lower magnitudes of positional jerk and less variation in angular jerk.
To assess quantitatively that our method generates smoother camera motion, we compare the averaged squared jerk per horizon stage of user-designed trajectories generated with our method, with [Joubert et al., 2015] and with [Gebhardt et al., 2016]. Figure 5 shows lower jerk and angular jerk values for our optimization scheme compared to both baseline methods, across all trajectories.
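The comparison metric named above, averaged squared jerk per horizon stage, can be computed as in the following sketch. It assumes the jerk at each stage is available as a vector (for example the jerk states of an optimized trajectory); the function name `mean_squared_jerk` is hypothetical.

```python
from typing import Sequence


def mean_squared_jerk(jerk_per_stage: Sequence[Sequence[float]]) -> float:
    """Average of the squared jerk magnitude over all horizon stages."""
    if not jerk_per_stage:
        raise ValueError("at least one stage is required")
    # Sum the squared vector norm at every stage, then divide by the stage count.
    total = sum(sum(component * component for component in jerk) for jerk in jerk_per_stage)
    return total / len(jerk_per_stage)


print(mean_squared_jerk([(1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]))
```

Normalizing by the number of stages makes trajectories of different temporal length comparable, which matters when one method is free to change the trajectory duration.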
Finally, we evaluate the optimization runtime of our method. To this end, we generated trajectories from the studies of [Gebhardt and Hilliges, 2018; Joubert et al., 2015] using the approximated linear quadrotor model of Sec. 3.1 and the non-linear model of [Nägeli et al., 2017]. We measured runtime on a standard desktop machine (Intel Core i7 4GHz CPU, Forces Pro NLP-solver). The computation times for the trajectories are shown in Figure 5. On average, it took 2.41 s (SD = 2.50 s) to generate a trajectory with the linear model and 14.79 s (SD = 15.50 s) with the non-linear model.
6. Perceptual Study
Our technical evaluation shows that the proposed method generates smoother trajectories. However, it has not yet been validated that these trajectories result in aesthetically more pleasing video. To this end, we conduct an online survey comparing videos which follow user-specified timings, generated with the methods of [Gebhardt et al., 2016; Joubert et al., 2015], with videos generated by our method. For this comparison, we use user-designed trajectories from prior work [Gebhardt and Hilliges, 2018; Joubert et al., 2015]. For each question we take the user-specified keyframes of the original trajectory and generate a time-optimized trajectory of the same temporal duration (via Equation 10) using our method. We then render videos for the original and time-optimized trajectory using Google Earth (based on GPS-coordinates and camera angles). The two resulting videos are placed side-by-side, randomly assigned to the left or right, and participants state which video they prefer on a forced alternative choice 5-point Likert scale. The five responses are: “shot on the left side looks much more pleasing”, “shot on the left side looks more pleasing”, “both the same”, “shot on the right side looks more pleasing”, and “shot on the right side looks much more pleasing”. Each participant had to compare 14 videos.
6.1. Results
In total, 424 participants answered the online survey. Assuming equidistant intervals, we mapped survey responses onto a scale from -2 to 2, where negative values mean that the original, timed video is aesthetically more pleasing, 0 indicates no difference and a positive value indicates a more aesthetically pleasing time-optimized video. In order to attain interval data, our samples are built by taking the mean of the Likert-type results of the expert and non-expert designed videos per participant. Visual inspection of residual plots did not reveal any obvious deviations from normality.
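The response coding can be sketched as follows. Because the time-optimized video is randomly shown on the left or right, the sign of the score depends on the side it was shown on. The response labels, the tuple layout, and the function names `score_response` and `participant_means` are hypothetical and chosen only for illustration.

```python
from statistics import mean
from typing import Dict, List, Sequence, Tuple

# Hypothetical response labels; scores assume the time-optimized video is on the right.
RIGHT_ORIENTED_SCORE: Dict[str, int] = {
    "left_much_more": -2,
    "left_more": -1,
    "same": 0,
    "right_more": 1,
    "right_much_more": 2,
}


def score_response(choice: str, optimized_side: str) -> int:
    """Map a response to [-2, 2], positive when the time-optimized video is preferred."""
    if optimized_side not in ("left", "right"):
        raise ValueError("optimized_side must be 'left' or 'right'")
    score = RIGHT_ORIENTED_SCORE[choice]
    return score if optimized_side == "right" else -score


def participant_means(
    responses: Sequence[Tuple[str, str, str, str]],
) -> Dict[Tuple[str, str], float]:
    """Average scores per (participant, designer group); rows are (id, group, choice, side)."""
    grouped: Dict[Tuple[str, str], List[int]] = {}
    for participant, group, choice, side in responses:
        grouped.setdefault((participant, group), []).append(score_response(choice, side))
    return {key: mean(scores) for key, scores in grouped.items()}


rows = [
    ("p1", "expert", "right_more", "right"),
    ("p1", "expert", "left_more", "right"),
    ("p1", "non_expert", "left_much_more", "left"),
]
print(participant_means(rows))
```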
Evaluating all responses of the survey, we try to attain a mean which compensates for random participant effects. Therefore, we construct a linear mixed model using the participant as random intercept, the video as fixed intercept and introducing a fixed-effect intercept to represent the overall mean. The adjusted mean of the data has a positive value with a high confidence (see Figure 6). A type III ANOVA showed that there is a significant effect of our method on the aesthetics of videos. Unpacking this result further, we distinguish between videos that have been designed by non-expert users (data from [Gebhardt and Hilliges, 2018]) and expert users (i.e. cinematographers, data from [Joubert et al., 2015]). To test for significance, we perform a one-sample t-test on the averaged Likert ratings for expert- and non-expert-designed videos. The effect of both conditions and their confidence intervals are shown in Figure 6. While the effects are significant for both conditions, the effect is positive and amplified for non-expert designed videos and negative for expert designed videos.
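The one-sample t-test on per-participant mean ratings can be sketched as below. This is a generic textbook computation of the test statistic against a mean of zero (no preference), not the authors' analysis script, and the function name `one_sample_t` is hypothetical. Obtaining a p-value additionally requires the Student t distribution, for which a statistics library such as SciPy (`scipy.stats.ttest_1samp`) could be used.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple


def one_sample_t(samples: Sequence[float], null_mean: float = 0.0) -> Tuple[float, int]:
    """Return the t statistic and degrees of freedom for H0: population mean == null_mean."""
    n = len(samples)
    if n < 2:
        raise ValueError("at least two samples are required")
    spread = stdev(samples)  # sample standard deviation (n - 1 in the denominator)
    if spread == 0.0:
        raise ValueError("samples have zero variance")
    t_statistic = (mean(samples) - null_mean) / (spread / math.sqrt(n))
    return t_statistic, n - 1


# Toy per-participant mean ratings on the [-2, 2] scale.
print(one_sample_t([1.0, 2.0, 3.0]))
```

A positive statistic corresponds to a preference for the time-optimized videos under the coding described above, and a negative one to a preference for the originally timed videos.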
6.2. Discussion
The perceptual study provides strong evidence that our method has a positive effect on the aesthetic perception of aerial videos. Furthermore, it shows that this effect is even stronger for videos by non-experts. This supports our hypothesis that non-experts benefit from generating trajectories with global smoothness as the main criterion. For expert-created videos the picture is different. These videos were rated as more pleasing when generated with methods that respect user-specified timings. This can be explained by the fact that experts explicitly leverage shot timings to create particular compositional effects. Optimizing for global smoothness removes this intention from the result. However, two observations indicate that smooth motion is a more important factor for the aesthetic perception of aerial videos than timing: our method has a significant positive effect across all responses, and the positive effect for non-expert-designed videos is larger than the negative effect for expert-designed videos. This suggests that users, especially experts, could benefit from a problem formulation that allows soft-constrained instead of hard-constrained timings. In this way, users could still employ shot timings to create compositional effects, while the optimization scheme generates trajectories that trade off user-specified timings against global smoothness.
Based on these results, we formulate three requirements for quadrotor camera generation schemes:
- smoothness should be the primary objective of quadrotor camera trajectory generation,
- methods should auto-generate or adjust keyframe timings to better support non-experts,
- while providing tools for experts to specify soft-constrained timings.
The proposed method already fulfills the first two requirements. In the next section, we propose how our method can be extended such that *all* requirements are met.
7. Method Extensions
Recognizing the need to provide both global smoothness and explicit user control over camera timings, we present two method extensions to control camera motion: an approach based on "classic" keyframe timings and a further approach based on velocity profiles.
7.1. Keyframe Timings
We augment our objective function with an additional term for soft-constrained keyframe timings. The original formulation does not allow timing references to be set based on horizon stages: due to the variable horizon we lack a fixed mapping between time and stage. To map timings to the spatial reference, we use the $\theta$-parameterization of the reference spline. Reference timings hence need to be strictly monotonically increasing in $\theta$. Based on the reference timings and the corresponding $\theta$-values we interpolate a spline through these points, which results in a timing reference function $t_d(\theta)$ that can be followed analogously to spatial references by minimizing the cost,
$$c_t(\theta, i, \Delta t) = \left\lVert t_d(\theta) - i \, \Delta t \right\rVert^2,$$
where $i$ is the current stage of the horizon and $\Delta t$ is the current discretization of the model. We add this cost to (11) and assign a weight $w_t$ to specify its importance. By setting $w_t$ to a very large number, quasi hard-constrained keyframes are attainable.
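A small numerical sketch may help make the timing cost concrete. It assumes the path parameter is written `theta`, the stage index `stage`, and the model discretization `dt`; these names, the helper functions, and the keyframe values are illustrative assumptions. Piecewise-linear interpolation stands in for the spline interpolation described in the text, purely to keep the example dependency-free.

```python
from bisect import bisect_right


def make_timing_reference(thetas, times):
    """Return t_d(theta), interpolated through (theta, time) keyframe pairs.

    Piecewise-linear interpolation is a stand-in for the spline in the text.
    """
    if any(b <= a for a, b in zip(thetas, thetas[1:])):
        raise ValueError("keyframe theta values must be strictly increasing")
    if any(b <= a for a, b in zip(times, times[1:])):
        raise ValueError("reference timings must be strictly increasing")

    def t_d(theta):
        # Clamp to the covered range, then interpolate inside the segment.
        theta = min(max(theta, thetas[0]), thetas[-1])
        k = max(1, min(bisect_right(thetas, theta), len(thetas) - 1))
        frac = (theta - thetas[k - 1]) / (thetas[k] - thetas[k - 1])
        return times[k - 1] + frac * (times[k] - times[k - 1])

    return t_d


def timing_cost(t_d, theta, stage, dt, weight=1.0):
    """Weighted squared gap between the reference time at theta and stage*dt."""
    return weight * (t_d(theta) - stage * dt) ** 2


# Placeholder keyframes: reach theta=1 after 2 s and theta=3 after 4 s.
t_d = make_timing_reference(thetas=[0.0, 1.0, 3.0], times=[0.0, 2.0, 4.0])

# At stage 30 with dt = 0.1 s, 3 s have elapsed.
cost_on_time = timing_cost(t_d, theta=2.0, stage=30, dt=0.1)  # reference 3 s
cost_late = timing_cost(t_d, theta=1.0, stage=30, dt=0.1)     # reference 2 s
```

Because the term is a soft constraint, a large `weight` makes deviations from the reference timing expensive, while a small one lets the optimizer favor smoothness.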
7.2. Reference Velocities
The above extension enables mimicry of timing control in prior methods. However, the actual purpose of specifying camera timings in a video is to control or change camera velocity to achieve a desired effect (recall the fly-by example). Since determining the timing of a shot explicitly is difficult, we propose a way for users to directly specify camera velocities. We extend the formulation of our method to accept reference velocities as input. Again, we use the $\theta$-parameterization to assign velocities to the reference spline. To minimize the difference between the velocity of the quadrotor and the user-specified velocity profile $v_d(\theta)$, we specify the cost,
$$c_v(\theta, \dot{\mathbf{r}}) = \left\lVert v_d(\theta) - \dot{\mathbf{r}}^{T} \mathbf{n} \right\rVert^2,$$
where we project the current velocity of the quadrotor $\dot{\mathbf{r}}$ onto the normalized tangent vector $\mathbf{n}$ of the positional reference function. We add this cost term and a weight $w_v$ to (11).
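The projection in the velocity cost can be illustrated with a short sketch. All names here (`unit_tangent`, `velocity_cost`, `straight_path`, `constant_speed`) are hypothetical, the reference path and velocity profile are placeholder functions of an assumed path parameter `theta`, and the tangent is approximated by finite differences rather than taken from a spline.

```python
import math


def unit_tangent(ref_pos, theta, h=1e-5):
    """Normalized tangent of a reference path, by central finite differences."""
    ahead, behind = ref_pos(theta + h), ref_pos(theta - h)
    d = [(a - b) / (2.0 * h) for a, b in zip(ahead, behind)]
    norm = math.sqrt(sum(c * c for c in d))
    if norm == 0.0:
        raise ValueError("reference path has zero tangent at this theta")
    return [c / norm for c in d]


def velocity_cost(velocity, ref_pos, v_ref, theta, weight=1.0):
    """Weighted squared gap between along-path speed and reference speed."""
    n = unit_tangent(ref_pos, theta)
    # Project the quadrotor velocity onto the path tangent.
    along_path_speed = sum(v * c for v, c in zip(velocity, n))
    return weight * (v_ref(theta) - along_path_speed) ** 2


def straight_path(theta):
    """Placeholder reference: a straight line along x at 5 m altitude."""
    return (theta, 0.0, 5.0)


def constant_speed(theta):
    """Placeholder velocity profile: 2 m/s everywhere."""
    return 2.0


# Sideways motion (y component) does not count towards along-path speed.
cost_matched = velocity_cost((2.0, 1.0, 0.0), straight_path, constant_speed, 0.5)
cost_too_slow = velocity_cost((0.5, 0.0, 0.0), straight_path, constant_speed, 0.5)
```

Only the velocity component along the path enters the cost, so deviations perpendicular to the reference are left to the positional terms of the objective.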
8. User Study
We conduct an additional user study to understand three questions: whether our final method has the potential to improve the usability of quadrotor camera tools, whether soft-constrained timing methods produce videos of similar perceived aesthetics to hard-constrained timing methods, and whether experts can benefit from our method. In this study, we compare different variants of our method with the method of [Gebhardt et al., 2016] as a representative of quadrotor camera optimization schemes that use hard-constrained keyframe timings.
8.1. Experimental Design
User Interface:
In our experiment we used the tool of [Gebhardt and Hilliges, 2018] and extended the UI with a toolbar. This toolbar contains a slider to specify (see Equation 11) and depending on the condition, a progress curve or a velocity profile. A progress curve allows for the editing of the relative progress on a trajectory over time (see Figure 7, a). A velocity profile enables editing of the camera speed over the progress on the trajectory (see Figure 7, b).
Experimental conditions:
We investigate four different conditions:
- In timed, participants work with the optimization method of [Gebhardt et al., 2016] and a progress curve (see Figure 7, a).
- Soft-timed uses our optimizer and the progress curve. Participants can decide whether they want to specify keyframe timings (see Equation 13) or use the auto-generated timings.
- In auto, participants work with our optimization and the keyframe timings it provides. They can choose to fix the end time of a trajectory (see Equation 10).
- Velocity uses our method and a velocity profile (see Figure 7, b). Participants can decide whether they want to specify camera velocity (see Equation 13) or use the auto-generated speed.
Tasks:
The study comprises two tasks:
- Participants were asked to design a free-form video of a building in a virtual environment (T1). We asked participants to keep the spatial trajectory as similar as possible across conditions, whereas the dynamics of camera motion were allowed to differ. They performed the task in the conditions timed, soft-timed and auto.
- Participants were asked to faithfully reproduce an aerial video shot with varying camera velocity (T2). Participants should try to reproduce the camera path and dynamics of the reference video. This task was performed with the conditions timed, soft-timed and velocity to investigate the level of control over timing afforded by the different conditions.
We use a within-subjects design and counterbalance order of conditions within a task to compensate for learning effects.
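One common way to counterbalance three conditions is to cycle through all six possible orders. The sketch below assumes this full-permutation scheme; the text does not state which counterbalancing scheme was used, and `counterbalanced_orders` is a hypothetical helper.

```python
from collections import Counter
from itertools import permutations


def counterbalanced_orders(conditions, n_participants):
    """Cycle through all condition orders so each is used equally often.

    Usage is exactly equal when n_participants is a multiple of the
    number of possible orders.
    """
    orders = list(permutations(conditions))
    return [orders[p % len(orders)] for p in range(n_participants)]


# T1 conditions with 12 participants: 6 orders, each assigned twice.
t1_orders = counterbalanced_orders(["timed", "soft-timed", "auto"], 12)
usage = Counter(t1_orders)
```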
Procedure:
Participants were introduced to the system and the four conditions and were given time to practice using the tool in a tutorial task. Participants then solved T1 and T2 in the respective conditions. Tasks were completed when participants reported being satisfied with the designed shot (T1) or the similarity to the reference (T2). For each task and condition participants completed the NASA-TLX and a questionnaire on their level of satisfaction with the result. Finally, a short exit interview was conducted. A session took approximately 70 min on average (introduction 9 min, tutorial 7 min, T1 22 min, T2 23 min).
Participants:
We recruited 12 participants (5 female, 7 male). We purposely included 3 experts: an avid hobby quadrotor videographer, a professional videographer who experiments with quadrotor videography in his free time, and a professional quadrotor videographer. The remaining participants reported no experience in aerial or conventional photography or videography.
8.2. Results
We analyze the effect of the conditions on the usability of the tool and the aesthetics of the resulting videos. For significance testing, we ran a one-way ANOVA if the normality assumption holds and a Kruskal-Wallis test when it is violated. Analyzing the data of experts and non-experts separately, we found no significant differences in results and thus will not differentiate between them in this section.
Usability
To assess the effect of our method on the usability of the tool, we asked participants to fill out the NASA-TLX and collected interaction logs (e.g. task execution time). In T1, auto has the lowest median task load, followed by soft-timed and timed (see Figure 8). This ranking remains the same for all interaction logging measures of T1 (task execution time (TET), time updates and number of generations). Although there is no significant effect of condition on task load in T1 (), the other measures do differ significantly (task execution time: ; time updates: ; number of generations: ). Pairwise comparison indicates that for TET and time updates auto differs significantly from timed (TET: ; time updates: ). For number of generations, auto and soft-timed differ significantly from timed (auto-timed: , soft-timed-timed: ). Other differences are not significant. Auto automatically generates timings and thereby camera velocities. This simplifies the task drastically and explains the condition's first rank in terms of task load and interaction logs. For T2, velocity and soft-timed yield a lower task load than timed, indicating a slight advantage of our method in terms of usability (differences are not significant: ). This ranking is confirmed by the interaction logs, where soft-timed and velocity perform similarly and are followed by timed. The number of generations differs significantly between conditions (). A pairwise comparison indicates that velocity and soft-timed differ significantly from timed (velocity-timed: ; soft-timed-timed: ). Other differences are not significant (TET: ; time updates: ).
Aesthetics
We are also interested in participants' perceived difference in aesthetics of the generated videos. We asked participants in both tasks to assess the visual quality of the video they designed on a scale ranging from 1 (not at all pleasing) to 7 (very pleasing, see Figure 8). Although differences are not significant (aesthetics in T1: ; aesthetics in T2: ), the variants of our method are perceived to produce aesthetically more pleasing videos in both tasks. For T1, we also asked users to rate the smoothness of videos on a scale from 1 (non-smooth) to 7 (very smooth). Figure 8 summarizes the results, which do not differ significantly between conditions ().
8.3. Discussion
Despite the small sample size of our experiment, the results indicate a positive effect of our method on both the perceived aesthetics of results and the usability of the tool. Auto caused the lowest task load among conditions and participants were satisfied with the generated results. Although soft-timed and timed allow the timing of a shot to be specified in the same manner (using the progress curve or the timeline), soft-timed performed better than timed in terms of task load (T2) and aesthetics (T1 and T2). We think that this preference can be explained by two factors. First, participants in soft-timed generally used a workflow in which they initially took the generated timings and then adjusted keyframe times to create a desired visual. This workflow was used by experts but also by non-experts (if they used keyframe timings at all). Second, in soft-timed keyframe timings are specified as soft constraints, allowing the optimizer to trade off temporal fit for a smoother trajectory. This makes soft-timed more forgiving than timed with respect to the space-time ratio between keyframes, reducing the adjustments participants had to make in order to solve a task in this condition (see time/velocity updates and number of generations in Figure 8). In addition, soft-constrained timings allow the optimizer to still generate feasible trajectories even if the underlying user input would not yield a feasible result.
The preference for soft-constrained keyframe timings also supports our general assumption that timing control is not used to precisely specify the time at which a camera should be at a certain position. Instead, users employ timing to specify the velocity of the camera along the path. This is also suggested by the results of the velocity condition. In T2, it performed similarly to soft-timed and better than timed for task load and aesthetics, indicating that specifying camera dynamics via a velocity profile is an intuitive alternative to providing keyframe timings.
9. Conclusion
In this paper, we addressed the dichotomy of smoothness and timing control in current quadrotor camera tools. Following design requirements in the literature [Joubert et al., 2015], their optimizers incorporate keyframe timings as hard constraints, providing precise timing control. A recent study [Gebhardt and Hilliges, 2018] has shown that this causes users to struggle when specifying smooth camera motion over an entire trajectory. Current optimization formulations require the distances between keyframes in their 5 dimensions (position, yaw and pitch of the camera angle) to match their distances in time. This poses a particularly hard interaction problem for users, especially novices. We proposed a method which generates smooth quadrotor camera trajectories by taking keyframes specified only in space and optimizing their timings. We formulated this non-linear problem as a variable horizon trajectory optimization scheme which is capable of temporally optimizing positional references.
In a large-scale online survey we compared videos generated with our method to videos generated with [Gebhardt et al., 2016] and [Joubert et al., 2015]. The results indicate a general preference for videos generated according to a global smoothness objective, but also highlight that videos of experts are aesthetically more pleasing when timing control is provided. Based on these insights, we extend our method such that users can specify keyframe timings as soft constraints while globally smooth trajectories are still attained. In addition, we allow users to specify camera reference velocities, which are set as soft constraints in the optimization.
We test the efficacy and usability of our optimization in a comparative user study (baseline is [Gebhardt et al., 2016]). The results indicate that our method positively affects the usability of quadrotor camera tools and improves the visual quality of video shots for experts and non-experts. Both benefit from using an optimized timing initially and having the possibility of fine-tuning it according to their intention. In addition, the user study revealed that timing control does not need to be precise but is rather used to control camera velocity in order to create a desired compositional effect.
Acknowledgements.
We thank Yi Hao Ng for his work in the exploratory phase of the project, Chat Wacharamanotham for helping with the statistical analysis of the perceptual study and Velko Vechev for providing the video voice-over. We are also grateful for the valuable feedback of Tobias Nägeli on problem formulation and implementation. This work was funded in part by the Swiss National Science Foundation (http://www.snf.ch/en/Pages/default.aspx), Grant UFO 200021L_153644.
Appendix A Notation
For completeness and reproducibility of our method we provide a summary of the notation used in the paper in Table 1.
Appendix B Implementation Details
In this section, we provide details on the weights we use in the objective function, the iterative programming scheme we implemented to attain a continuous path parametrization and its performance.
B.1. Optimization Weights
The values for the weights of the objective function we used in the user study and the online survey are listed in Table 2.
B.2. Iterative Programming Scheme
The solver requires a continuous path parametrization. To attain a description of the reference spline even in between the piecewise sections of the PCHIP spline, we need to approximate it locally. We therefore implement an iterative programming scheme in which we compute a quadratic approximation of the reference spline around the $\theta$-value of each stage in the horizon. This process is described in Figure 9. In the first iteration of the scheme we initialize all $\theta$ to zero and fit the entire reference trajectory (black spline) with a single quadratic approximation (blue spline). By solving the optimization problem of Equation 12, the progression of $\theta$-values is determined based on the quadratic approximation (yellow dots). In each subsequent iteration, we take the value of $\theta$ from the last run of the solver, project it onto the reference spline (green dots) and fit a local quadratic approximation (red splines). Based on these fits the progress of $\theta$-values is optimized again. We stop the optimization when the change in the $\theta$-values of all stages between iterations is smaller than a pre-defined threshold.
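The loop structure of such a scheme (fit a local quadratic model around the current path parameter of every stage, solve on the model, refit, and stop once no stage moves by more than a threshold) can be sketched on a toy problem. Everything specific here is an assumption for illustration: the path parameter is called `theta`, the "solver" is a stand-in that simply minimizes the quadratic model, the toy objective replaces the trajectory optimization of Equation 12, and the initial values are arbitrary rather than all zero.

```python
import math


def quadratic_fit(f, theta0, h=1e-4):
    """Coefficients (f0, d1, d2) of a local second-order model of f at theta0,
    obtained by central finite differences."""
    f0 = f(theta0)
    d1 = (f(theta0 + h) - f(theta0 - h)) / (2.0 * h)
    d2 = (f(theta0 + h) - 2.0 * f0 + f(theta0 - h)) / (h * h)
    return f0, d1, d2


def solve_on_model(theta0, coeffs):
    """Stand-in for the solver: minimize the quadratic model around theta0."""
    _, d1, d2 = coeffs
    if d2 <= 0.0:
        raise ValueError("toy solver needs a convex local model")
    return theta0 - d1 / d2


def iterate(f, thetas, tol=1e-6, max_iters=50):
    """Refit around the latest theta of every stage until no stage moves
    by more than tol between two iterations."""
    for iteration in range(1, max_iters + 1):
        new = [solve_on_model(t, quadratic_fit(f, t)) for t in thetas]
        if max(abs(a - b) for a, b in zip(new, thetas)) < tol:
            return new, iteration
        thetas = new
    raise RuntimeError("no convergence within max_iters")


def toy_objective(theta):
    """Placeholder objective with its minimum at theta = 1 (not a camera path)."""
    return math.cosh(theta - 1.0)


# Three "stages" with arbitrary initial theta values.
thetas, iterations = iterate(toy_objective, [0.0, 0.5, 2.0])
```

The stopping rule mirrors the one described above: iteration ends when the largest change of any stage's `theta` between two consecutive runs falls below the threshold `tol`.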
References
- 3D Robotics [2015] 3D Robotics. 2015. 3DR Solo. (2015). Retrieved September 13, 2016 from http://3drobotics.com/solo
- Aguiar et al. [2008] A. Pedro Aguiar, Joao P. Hespanha, and Petar V. Kokotovic. 2008. Performance limitations in reference tracking and path following for nonlinear systems. Automatica 44, 3 (2008), 598–610. https://doi.org/10.1016/j.automatica.2007.06.030
- APM [2016] APM. 2016. APM Autopilot Suite. (2016). Retrieved September 13, 2016 from http://ardupilot.com
- Arijon [1976] Daniel Arijon. 1976. Grammar of the film language. (1976).
- Audronis [2014] Ty Audronis. 2014. How to Get Cinematic Drone Shots. (2014). Retrieved August 29, 2017 from https://www.videomaker.com/article/c6/17123-how-to-get-cinematic-drone-shots
- Balasubramanian et al. [2015] Sivakumar Balasubramanian, Alejandro Melendez-Calderon, Agnes Roby-Brami, and Etienne Burdet. 2015. On the analysis of movement smoothness. Journal of neuroengineering and rehabilitation 12, 1 (2015), 112. https://doi.org/10.1186/s12984-015-0090-9
- Betts [2009] John T. Betts. 2009. Practical Methods for Optimal Control and Estimation Using Nonlinear Programming (2nd ed.). Cambridge University Press, New York, NY, USA.
