Semi-strong Efficient Market of Bitcoin and Twitter: an Analysis of Semantic Vector Spaces of Extracted Keywords and Light Gradient Boosting Machine Models
Fang Wang (Florence Wong), Marko Gacesa

TL;DR
This paper investigates Bitcoin market efficiency during 2017-2022 by analyzing a large dataset of tweets and applying semantic vector space analysis and Light Gradient Boosting Machine models to predict market movements.
Contribution
It introduces a novel approach using fundamental keyword extraction and semantic vector analysis, moving beyond sentiment analysis to assess market efficiency.
Findings
High accuracy in predicting market movements from tweet data
Market reactions are faster and more precise than previously thought
Semantic analysis of keywords effectively captures market signals
Abstract
This study extends the examination of the Efficient-Market Hypothesis in Bitcoin market during a five year fluctuation period, from September 1 2017 to September 1 2022, by analyzing 28,739,514 qualified tweets containing the targeted topic "Bitcoin". Unlike previous studies, we extracted fundamental keywords as an informative proxy for carrying out the study of the EMH in the Bitcoin market rather than focusing on sentiment analysis, information volume, or price data. We tested market efficiency in hourly, 4-hourly, and daily time periods to understand the speed and accuracy of market reactions towards the information within different thresholds. A sequence of machine learning methods and textual analyses were used, including measurements of distances of semantic vector spaces of information, keywords extraction and encoding model, and Light Gradient Boosting Machine (LGBM)…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsBlockchain Technology Applications and Security
MethodsSPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
